Hacker News

1. You put it in a URL marked as "noindex-nofollow". Google will avoid it. You are supposed to only serve the page to identified spam bots anyway.

2. You create an exception so that they never cache the page and don't proxy this exact URL.



> 1. You put it in a URL marked as "noindex-nofollow".

Better yet, mark it Disallow in robots.txt. To see "noindex, nofollow", they'd still need to request the URL, running the risk of being served the bomb.
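To illustrate why Disallow helps here, a rough sketch of the check a well-behaved crawler does before it requests anything, using Python's standard urllib.robotparser. The /trap/ path and the example.com URLs are made up for illustration and are not from this thread; real crawlers may interpret robots.txt somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/trap/" is an illustrative path only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent request url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler skips the disallowed URL without requesting it.
    print(may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/trap/bomb.html"))
    # Everything else stays fetchable.
    print(may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/index.html"))
```

The point is that the decision is made from robots.txt alone, whereas a "noindex, nofollow" marker is only visible after the page itself has been fetched.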

> 2. You create an exception so that they never cache the page and don't proxy this exact URL.

They work as reverse proxies on a per-host basis, so I don't think you can exclude a single URL. CF at least will never cache text/html (unless specifically told to), but I don't know whether they will unpack the content (and possibly re-compress it with a better-suited compression algorithm) while transmitting.


I put, as a test and for fun, a "Disallow" entry in my robots.txt (with a campy name, to be honest) and not a single crawler hit that dir in more than three years. I don't know if others have had the same experience.


I was suggesting Disallow to make sure Google doesn't request it ;) I don't know if any bots look at robots.txt to see potentially interesting URLs. I do when I take a better look at sites, but I usually don't qualify as a bot.

My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc., regardless of whether they are in robots.txt or not.


> My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc., regardless of whether they are in robots.txt or not.

Yeah, that's basically what I see in my logs. To be clearer, the Disallow is for a nonexistent path in the document dir. I somewhat expected to find at least one script actively crawling it, but it makes sense, as no sane person would put secrets on a website and protect them with a robots.txt... ^__^;
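For anyone who wants to run the same check on their own logs, a minimal sketch that counts requests per path prefix. It assumes the common/combined access log format; the sample lines and the /secret-stuff/ decoy path are invented for illustration and are not real data.

```python
import re
from collections import Counter

# Pulls the request path out of a common/combined log format line.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

# Invented sample lines (documentation IP ranges), not real log data.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 153',
    '203.0.113.5 - - [01/Jan/2024:00:00:02 +0000] "POST /wp-login.php HTTP/1.1" 404 153',
    '198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET /phpmyadmin/index.php HTTP/1.1" 404 153',
    '192.0.2.9 - - [01/Jan/2024:00:00:04 +0000] "GET /index.html HTTP/1.1" 200 1024',
]


def count_prefix_hits(log_lines, prefixes):
    """Count requests whose path starts with each of the given prefixes."""
    hits = Counter({prefix: 0 for prefix in prefixes})
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # Not a request line this pattern understands.
        path = match.group(1)
        for prefix in prefixes:
            if path.startswith(prefix):
                hits[prefix] += 1
    return dict(hits)


if __name__ == "__main__":
    # "/secret-stuff/" stands in for the decoy path listed under Disallow.
    watched = ["/wp-login.php", "/phpmyadmin/", "/secret-stuff/"]
    print(count_prefix_hits(SAMPLE_LOG, watched))
```

A decoy prefix that stays at zero while the usual suspects keep getting hit would match what is described above.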


Thank you (both you and GP)!



