> 1. You put it in a URL marked as "noindex-nofollow".
Better yet, mark it Disallow in robots.txt - to see the "noindex, nofollow" directive, they'd still have to request the URL, running the risk of being served the bomb.
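For illustration, the robots.txt entry could look something like this (the path name here is made up):

```text
User-agent: *
Disallow: /totally-not-a-honeypot/
```

Well-behaved crawlers will never request anything under that path; anything that does fetch it anyway has ignored robots.txt by definition.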
> 2. You create an exception so that they never cache the page and don't proxy this exact URL.
They work as reverse proxies on a per-host basis; I don't think you can exclude a single URL. CF at least will never cache text/html (unless specifically told to), but I don't know whether they unpack (and possibly re-compress to a better-suited algorithm) the content in transit.
As a test, and for fun, I put a "Disallow" entry in my robots.txt (with a campy name, to be honest), and not a single crawler has hit that dir in more than three years. I don't know if others have had the same experience.
I was suggesting Disallow to make sure Google doesn't request it ;) I don't know if any bots look at robots.txt for potentially interesting URLs. I do when I take a closer look at a site, but I usually don't qualify as a bot.
My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc, regardless of whether they are in robots.txt or not.
> My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc, regardless of whether they are in robots.txt or not.
Yeah, that's basically what I see in my logs. To be clearer: the Disallow is for a nonexistent path in the document dir. I somewhat expected at least one script to actively crawl it, but it makes sense, as no sane person would put secrets on a website and protect them with a robots.txt... ^__^;
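As a minimal sketch of the "bomb" mentioned upthread: highly repetitive data compresses at extreme ratios, so a tiny gzip payload can expand to many megabytes in the client. This is just an illustration with Python's standard library, not a recommendation of any particular payload size:

```python
import gzip
import io

# Build a small gzip payload that expands to 10 MiB of zero bytes.
# Zeros compress extremely well, so the compressed size ends up
# in the low-kilobyte range.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as f:
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros
    for _ in range(10):            # 10 MiB uncompressed in total
        f.write(chunk)

payload = buf.getvalue()
print(len(payload))  # compressed size: a few KB
```

Served with a `Content-Encoding: gzip` header to a client that auto-decompresses, this costs the server almost nothing while the misbehaving crawler inflates it on its end.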