
I think it really depends on the application of web scraping. (I say this as someone who does what is, in my mind, ethical web scraping.)

- Scraping public information from government websites to do analysis: ethical, it's the public's data

- Scraping to help some company's customers use that company's product more effectively, for example scraping a medical office's insurance claims to help them automate their insurance remittance process: ethical

- Scraping faces to build a surveillance-tech company: disgusting

- Scraping your own website because your internal processes are so broken you can't get it any other way: ethical

- Scraping to just copy someone's data they worked hard to generate to go and resell: unethical



The first one here is important. Despite the open data movement pressuring governments to provide their data in easily consumable forms, a lot of government organizations are still unable or unwilling to do so.

Political advocacy orgs rely a lot on scraping to collect political representative data that isn't available through any other means.


Yes, and so do research orgs. My organization does a lot of scraping because we deal with local election data and that's. Uh. Let's just say that if all counties had websites that were like Web 1.0, that would be an improvement over the current situation.


- Scraping faces to find missing persons: ethical

- Scraping photos to create a deep learning VQGAN+CLIP art generator: ethical

... We can go on and on, but we should all agree scraping is a useful tool that should never be outlawed.


I don't think that it is outlawed though, at least in practical terms: no one is gonna sue you for scraping government websites. You really only think about the legal aspect when you do it for commercial gain.

It would be interesting to know if that data can be used in a court case against a government agency though.


> - Scraping to just copy someone's data they worked hard to generate to go and resell: unethical

Wanted to include a slightly different application:

- Scraping multiple websites and organizing data in a new and useful way for customers: To me this would be ethical since it produces new value and does not just copy someone else's data as-is


So it's really not about the "scraping" here, it's about the kind of business you're building. I don't think any of your definitions would change if you simply employed people to check the websites instead of scripts.


Re government websites: they're often terrible. I've occasionally contemplated a side project just to scrape and restructure some local/state websites into usable forms with search and whatnot.


And if you manually copy someone's data they worked hard to generate to go and resell, then it's ethical?



