Hacker News | techarity's comments

A lot of people seem to think Google only crawls content found via ANCHOR elements, but for a long time they've been able to extract paths from EMBED, SRC, and other markup that indicates a remote resource is being included. That's still a far cry from being able to process and execute scripting languages and understand the DOM transformations happening from AJAX requests.

In your case, I'd suspect they were simply following the src of your <script src="path here"></script> markup... though if you read the articles cited, we suspect they've been crawling and understanding JavaScript for a pretty long time now.
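To illustrate the point, here's a minimal sketch of that kind of src-attribute extraction. The function name and regex approach are my own hypothetical illustration, not anything Googlebot is known to use; the idea is just that you can harvest resource URLs from markup without executing any JavaScript:

```javascript
// Hypothetical sketch: collect every URL referenced by a src attribute
// (script, embed, img, iframe, etc.) without executing the page.
function extractSrcUrls(html) {
  const urls = [];
  const re = /\bsrc\s*=\s*['"]([^'"]+)['"]/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    urls.push(m[1]); // the captured attribute value
  }
  return urls;
}

extractSrcUrls('<script src="/app.js"></script><embed src="movie.swf">');
// → ['/app.js', 'movie.swf']
```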


Haha the article that just won't die!

For those who are interested, there was a follow up to the article here: http://www.distilled.net/blog/seo/google-stop-playing-the-ji...

And Dan Clarke did some independent tests here: http://www.danclarkie.co.uk/can-the-googlebot-read-javascrip...

This was all back in Oct - Dec of '11. Basically we learned that Googlebot handles JavaScript and AJAX pretty much like a browser.

When it comes to AJAX, it appears to index the content under the destination URL of the XHR in some cases, while indexing it as part of the page making the XHR in other instances. Something about the way the AJAX request is made causes Google to treat it like a 302 redirect at times.

Standard JS window.location redirects also appear to be treated as equivalent to 302 redirects.
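As a hedged illustration of the redirect case: even short of fully executing a page's JavaScript, a crawler could spot a simple window.location redirect by pattern-matching. This sketch is purely hypothetical (Googlebot's actual mechanism isn't public), but it shows the kind of signal that could be mapped onto 302-style handling:

```javascript
// Hypothetical sketch: detect a simple client-side redirect in page
// source by matching window.location / window.location.href assignments.
// A crawler could then treat the target like a 302 destination.
function detectJsRedirect(html) {
  const m = html.match(/window\.location(?:\.href)?\s*=\s*['"]([^'"]+)['"]/);
  return m ? m[1] : null; // redirect target, or null if none found
}

detectJsRedirect('<script>window.location = "http://example.com/";</script>');
// → "http://example.com/"
```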

@dsl - I suspect you're correct. The Google Toolbar, Chrome's Opt-In Program, The Search Quality Program, and now Google Analytics Data (since the TOS change) are probably all being used to train the behavior of Googlebot when interacting with elements on a page.

Google also has plenty of patents related to computer vision, and their self-driving car is road-worthy... so processing DOM renders of the page, à la Firefox's 3D View/Tilt, is probably small potatoes for them.


I agree on this one... just because Google CAN do it doesn't mean they will.

On that note, I am personally of the belief that the fragments are part of Google's learning/training process for their spiders.

If they sniff the XHR traffic on every domain where they encounter a HashBang, they can learn a lot about the use of AJAX and the types of content being exposed via AJAX.
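For context, the HashBang convention comes from Google's published AJAX crawling scheme: crawlers rewrite #! URLs into an _escaped_fragment_ query parameter and fetch that URL to get a crawlable snapshot. This is a minimal sketch of that mapping (the function name is mine, and real implementations handle more edge cases):

```javascript
// Sketch of the hashbang → _escaped_fragment_ rewrite from Google's
// AJAX crawling scheme. '#!key=value' becomes a query parameter the
// server can respond to with a static snapshot of the AJAX content.
function escapedFragmentUrl(url) {
  const [base, fragment] = url.split('#!');
  if (fragment === undefined) return url; // no hashbang, nothing to rewrite
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

escapedFragmentUrl('http://example.com/ajax.html#!key=value');
// → 'http://example.com/ajax.html?_escaped_fragment_=key%3Dvalue'
```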


Google's ability to crawl JavaScript in IFRAMES was confirmed just today by a Google Search Quality Engineer.

https://twitter.com/#!/mattcutts/status/131425949597179904

There might be something to this theory...


Haha, those are AJAX calls for the social sharing widgets on IPullRank. It's Twitter, Digg, ShareThis, etc.

Thank you for looking out for the fine folks at HN though!


I was actually talking about requests to the website roots of various porn and p2p domains.

EDIT: Please contact me at the address in my profile if you'd like me to send you a list of URLs that my browser accessed when I loaded that cached page.


Yeah, when I loaded the original article, I got a lot more loaded than you would expect (hundreds of domains, most of which don't appear to be relevant), leading me to believe that something shady is going on.


A quick run through the site with Firebug's XHR and Dependency Logger didn't show any Porn or P2P links or strange script calls.

I'll see if the site owner wants to reach out to you via email; thanks so much for speaking up.


I can't edit that link anymore. Would suck if the Wordpress install was hacked. I'll try to contact the author. Thank you for the heads up.


Hello Azaki, thanks for the feedback.

The article was written as a simplification, as the target audience was SEO professionals who may or may not have a development background. I'm really surprised by the interest so far; I didn't expect this to get outside of its intended audience.

The "inaccuracies" are largely simplifications, rather than deliberate inaccuracies, but please do fact check it; I appreciate the feedback. It's especially valuable to hear from developers.

If you read the article more closely, I'm not at all dismissing Google's efforts. I just place more emphasis on their effort to make Chrome multi-threaded, as I believe that functionality is absolutely necessary to deploy a browser as a spider.

It's also an amazing piece of engineering, and has loads of benefits. As for V8's speed, I couldn't find any recent benchmarks, so I leaned on anecdotal evidence. As you said, it really depends on the benchmark.

Mentioning the programming language was more about hinting at Google's proficiency in the space; C++ is one of their core development languages.

Anywho, thanks again for the feedback. If nothing else, I hope you found it interesting!


Hi All!

7DaystoChangetheWorld.com is an experimental fundraising platform we were working on prior to the recent tragedy in Japan.

We decided to launch an alpha version to the public in support of the Japanese relief effort. We're hoping the resident geeks @ HN can provide some feedback and criticism.

