First off: Google has not once crashed one of our sites with GoogleBot. They have never tried to bypass our caching, and they are open and honest about their IP ranges, allowing us to rate-limit if needed.
Residential proxies are not needed if you behave. My take is that you want to scrape stuff that site owners do not want to give you, and you don't want to be told no or perhaps pay a license. That is the only case where I can see you needing residential proxies.
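On the published IP ranges: here's a minimal sketch of how a site can allowlist or rate-limit a known crawler precisely because its ranges are public. It assumes Google's documented googlebot.json feed with `ipv4Prefix`/`ipv6Prefix` entries; check the current docs before relying on the exact URL or JSON shape.

```python
# Minimal sketch: check whether a client IP sits inside a crawler's published
# ranges, so "known good" bots can be allowlisted or rate-limited separately.
# Assumes Google's documented Googlebot feed (ipv4Prefix/ipv6Prefix entries);
# verify the URL and JSON layout against Google's current docs before use.
import ipaddress
import json
import urllib.request

GOOGLEBOT_RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json"

def load_crawler_networks(url: str = GOOGLEBOT_RANGES_URL):
    """Fetch the published prefixes and parse them into network objects."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks

def is_known_crawler(client_ip: str, networks) -> bool:
    """True if the client IP falls inside any published crawler range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)

if __name__ == "__main__":
    nets = load_crawler_networks()
    print(is_known_crawler("66.249.66.1", nets))   # typically a Googlebot address
    print(is_known_crawler("203.0.113.5", nets))   # documentation/test address
```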
> Residential proxies are not needed if you behave
I'm starting to think that some users on Hacker News do not 'behave', or at least think that they do, and end up providing an alibi for those who do not 'behave'.
That the 'hacker' in Hacker News attracts not just hackers as in 'hacking together features', but also hackers as in 'illegitimately gaining access to servers/data'.
As far as I can tell, as a hacker who hacks features together, resi proxies are something the enemy uses. Whenever I boot up a server and get 1000 login requests per second, plus requests for commonly exploited files, from Russian and Chinese IPs, those come from resi IPs, no doubt. There are two sides to this match, no more.
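For illustration, a minimal sketch of how that traffic shows up in practice: tally per-IP hits on a few commonly probed paths from an access log. The log path, log format, and probe list are assumptions; adapt them to your own server.

```python
# Minimal sketch of the pattern described above: count per-IP hits on a few
# commonly probed paths in an access log. File path, log format, and the
# PROBE_PATHS list are assumptions; adjust to your own setup.
import re
from collections import Counter

ACCESS_LOG = "/var/log/nginx/access.log"          # assumed location
PROBE_PATHS = ("/wp-login.php", "/.env", "/xmlrpc.php", "/phpmyadmin")

# Rough match for common/combined log format: client IP first, request line quoted.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

hits = Counter()
with open(ACCESS_LOG, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.match(line)
        if m and m.group(2).startswith(PROBE_PATHS):
            hits[m.group(1)] += 1

# IPs hammering exploit probes float to the top; feed them into a rate limit or block list.
for ip, count in hits.most_common(20):
    print(f"{count:6d}  {ip}")
```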
> You can’t get much crawling done from published cloud IPs.
Think about why that might be. I'm sorry, but if you legitimately need to crawl the net and do so from a cloud provider, your industry screwed you over with its bad behaviour. Go get hosting with a company that cares about who its customers are; you're hanging out with a bad crowd.
No, no they really aren't, but I was thinking of the "scraping industry" in the sense that that's a thing. Getting hosting in smaller datacenters is simple enough, but you may need to manage your own hardware or VMs. Many will help you get your own IP ranges and an ASN, and that's going to go a long way if you don't want to get bundled in with the bad bots.
This obviously differs, but having an ASN in our case means that we can deal with you, contact you, and assume that you're better than random bot number 817.
Thank you for speaking some sense. As a site operator who's been inundated with junk traffic over the past ~month, well in excess of 99% of which has to be blocked, I'd say the scrapers have brought this upon themselves.
I actually do let quite a few known, "good" scrapers scrape my stuff. They identify themselves, they make it clear what they do, and they respect conventions like robots.txt.
These residential proxies have been abused by scrapers that use random legit-looking user agents and absolutely hammer websites. What is it with these scrapers just not understanding consent? It's gross.
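For contrast, a minimal sketch of what that "good scraper" behaviour looks like in code: identify yourself with a real User-Agent, check robots.txt before fetching, and honour a crawl delay. The bot name, contact URL, and default delay below are placeholders.

```python
# Minimal sketch of a polite scraper: a descriptive User-Agent, a robots.txt
# check before every fetch, and a crawl delay between requests.
# USER_AGENT and DEFAULT_DELAY are hypothetical values; set your own.
import time
import urllib.request
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/crawler-info)"  # placeholder identity
DEFAULT_DELAY = 5.0  # seconds between requests when the site sets no crawl-delay

def polite_fetch(url: str) -> bytes | None:
    parsed = urlparse(url)
    robots_url = urljoin(f"{parsed.scheme}://{parsed.netloc}", "/robots.txt")

    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site said no; respect it

    # Honour a crawl-delay directive if the site sets one.
    delay = rp.crawl_delay(USER_AGENT) or DEFAULT_DELAY
    time.sleep(delay)

    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()

if __name__ == "__main__":
    page = polite_fetch("https://example.com/")
    print("fetched" if page is not None else "disallowed by robots.txt")
```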