
I wish web.archive.org had an index like the one Common Crawl provides. There is lots of great stuff on archive.org.


web.archive.org has a CDX index, similar to Common Crawl.

Since I use both of these archives together, I wrote this code to iron out the differences between them:

https://github.com/cocrawler/cdx_toolkit
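For anyone who wants to poke at the Wayback Machine's CDX index directly, here is a minimal sketch using only the Python standard library. It is not how cdx_toolkit works internally. It assumes the public endpoint at web.archive.org/cdx/search/cdx and its `output=json` mode, where the first row of the response lists the field names. The helper names (`build_query`, `parse_rows`, `fetch`) are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine CDX server.
WAYBACK_CDX = "http://web.archive.org/cdx/search/cdx"


def build_query(url_pattern, limit=5):
    """Build a CDX query URL for a URL pattern such as 'example.com/*'."""
    params = {"url": url_pattern, "output": "json", "limit": limit}
    return WAYBACK_CDX + "?" + urlencode(params)


def parse_rows(body):
    """Turn a JSON CDX response (header row, then records) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def fetch(url_pattern, limit=5):
    """Query the index over the network and return parsed capture records."""
    with urlopen(build_query(url_pattern, limit)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

Calling `fetch("example.com/*", limit=3)` should return up to three capture records as dicts keyed by the field names in the header row. Common Crawl's index server is queried differently, which is the kind of difference the toolkit above smooths over.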


Hey! I was using your tool a couple months ago. It was super helpful for my project.


Thanks! I rarely hear from users, great to hear from you!


They do, and it's better than Common Crawl's in my testing.



