
I run a data aggregation company that has a fairly advanced scraping infrastructure for collecting data across the web. Having built the scraping side, I'm pretty familiar with most of the strategies for avoiding bot detection.

Coming from that perspective, detecting and stopping at least the majority of bots out there is fairly doable, and I put together a rudimentary system to do that for a side project.

The core of it uses an IP API to look up the requesting IP and identify the country and whether it's coming from a data center, VPN, Tor, etc. If the request passes that check, I show a Google Captcha. Lastly, I track the IPs that make it through and have some basic rules in place to try to detect patterns and block offenders that way.
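To make the flow concrete, here's a rough Python sketch of those three stages. It's an illustration under assumptions, not my actual code: `IpInfo`, `BotFilter`, the `lookup` callable (standing in for whatever IP API client you use), and the 30-requests-per-60-seconds threshold are all made up for the example, and the "pattern" rule is just a simple sliding-window rate limit.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Set


@dataclass
class IpInfo:
    # Hypothetical shape of an IP lookup result; real services differ.
    country: str
    is_datacenter: bool = False
    is_vpn: bool = False
    is_tor: bool = False


@dataclass
class BotFilter:
    lookup: Callable[[str], IpInfo]        # stand-in for an IP API client
    blocked_countries: frozenset = frozenset()
    max_requests: int = 30                 # assumed threshold
    window_seconds: float = 60.0           # assumed window
    hits: Dict[str, Deque[float]] = field(default_factory=lambda: defaultdict(deque))
    blocked: Set[str] = field(default_factory=set)

    def check(self, ip: str) -> str:
        """Stage 1: return 'block', or 'captcha' if the IP looks residential."""
        if ip in self.blocked:
            return "block"
        info = self.lookup(ip)
        if info.is_datacenter or info.is_vpn or info.is_tor:
            return "block"
        if info.country in self.blocked_countries:
            return "block"
        # Stage 2 happens outside this class: show the captcha to the client.
        return "captcha"

    def record_pass(self, ip: str, now: float) -> bool:
        """Stage 3: log a request that got through; False means now blocked."""
        window = self.hits[ip]
        window.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True
```

Usage would be something like calling `check(ip)` on every request, only rendering the captcha when it returns `"captcha"`, and calling `record_pass(ip, time.monotonic())` after a successful solve. Since the IP check runs first, data center traffic never reaches the captcha step.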

There's a bunch more stuff you can check for, but the core of it is basically filtering out data center traffic to minimize the requests going to Google Captcha.


