
10 requests per second doesn't sound especially high. That's 36,000 pages per hour, which, while big, doesn't seem excessive for a site as popular as SO (Alexa puts it at the 137th most popular site; granted, Alexa isn't the most accurate).


This is addressed in the post - apparently it's hitting pages that haven't been accessed in a while, starting background tasks - but it still seems odd to me. I'd have expected a huge amount of Stack Overflow's traffic to come from long tail searches, which should be basically the same thing. Excerpt for the lazy:

"and when Google hits thousands of pages in a few minutes, that can kick off a lot of background work, such as rebuilding related questions. Not expensive by itself, but when multiplied by a hundred at once.. can be quite painful."
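The "multiplied by a hundred at once" problem is essentially a thundering herd of background jobs. One common mitigation (a sketch only, not what Stack Overflow actually does; the class and names are hypothetical) is to coalesce duplicate rebuild jobs, so a crawler sweep that re-triggers the same question enqueues its rebuild at most once:

```python
from collections import deque

class RebuildQueue:
    """Coalesces duplicate 'rebuild related questions' jobs.

    Hypothetical sketch: if a crawler sweep hits the same question
    repeatedly, the rebuild is queued only once until it runs.
    """

    def __init__(self):
        self._pending = set()   # question ids already waiting in the queue
        self._queue = deque()

    def enqueue(self, question_id):
        if question_id in self._pending:
            return False        # already queued; skip the duplicate work
        self._pending.add(question_id)
        self._queue.append(question_id)
        return True

    def pop(self):
        question_id = self._queue.popleft()
        self._pending.discard(question_id)
        return question_id
```

Each rebuild still happens, but at most once per question per sweep instead of once per crawler hit.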


The rules are that you can't send Google different page content than regular browsers get, but there's no reason they have to run all the background processes on Googlebot requests -- can't they just send it the most recent cached version?
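A minimal sketch of that idea, assuming a user-agent check and an in-memory cache (all names here are hypothetical, not Stack Overflow's actual code): crawlers get the same HTML as everyone else, just from the cache, so the expensive render path that kicks off background work never runs for them.

```python
import time

# Hypothetical in-memory cache: page_id -> (html, rendered_at)
CACHE = {}

CRAWLER_TOKENS = ("Googlebot",)  # user-agent substrings treated as crawlers

def is_crawler(user_agent):
    return any(token in user_agent for token in CRAWLER_TOKENS)

def serve_page(page_id, user_agent, render):
    """Serve identical HTML to crawlers and browsers, but satisfy
    crawler requests from cache when possible, skipping the render
    path that may start background tasks (e.g. rebuilding related
    questions)."""
    cached = CACHE.get(page_id)
    if cached is not None and is_crawler(user_agent):
        return cached[0]            # most recent cached copy, no side effects
    html = render(page_id)          # full render; may enqueue background work
    CACHE[page_id] = (html, time.time())
    return html
```

Since the cached copy is byte-for-byte what a browser saw, this isn't cloaking -- it's just serving a slightly older version of the same content.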


Not a bad idea, but seems like it would be tricky to get right. You do kinda want Google to have the most recent version of a page, all other things being equal.


Agreed; there's a sweet spot somewhere, though, and it might not be the same for Googlebot as for a regular viewer.


Why not put the cached content in the page by default, then do an update via AJAX only in the case where the cached version is old? That way it's not triggered for crawlers. It's probably secondary content anyway.
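A sketch of the server side of that approach, under the assumption that staleness is marked with a data attribute which a client-side script (which crawlers typically don't execute) would pick up; the function, threshold, and attribute names are hypothetical:

```python
import time

STALE_AFTER = 3600  # hypothetical freshness window, in seconds

def related_questions_block(cached_html, cached_at, now=None):
    """Embed the cached 'related questions' HTML directly in the page.

    If the cache is older than STALE_AFTER, add a data attribute
    that client-side script can use to refresh the block via AJAX.
    Crawlers don't run that script, so only real browsers ever
    trigger the expensive rebuild.
    """
    now = time.time() if now is None else now
    attrs = ' data-refresh="1"' if now - cached_at > STALE_AFTER else ""
    return '<div id="related"' + attrs + '>' + cached_html + '</div>'
```

Every visitor (and every crawler) sees the same cached markup, and the rebuild cost is paid only when a human with a fresh-enough browser actually asks for it.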



