> Those aren't proper audits. And again, bringing up the fact that it's open sou...

unknownaccount · on Aug 25, 2022

>self hosting isn't feasible for 99% of the population

It's only this way because companies have a vested interest in keeping it like that; it's how they make their money. It is absolutely within the realm of possibility for people to host their own search engine. 99% of people know how to install Google Chrome, right? This should be no different. The search engine and the entire webserver stack it depends on could be bundled into a .exe/.app installer with simple instructions people can understand. Consider XAMPP, which already provides a webserver stack that is extremely easy to install on Windows/Mac through a simple .exe/.app that 'just works'. This hypothetical search engine could use the same approach as the XAMPP installer. There is no technical reason why this can't happen. It just isn't happening because it'd increase competition, cutting into DDG's profits.

tsimionescu · on Aug 25, 2022

Sure, the problem with installing a local search engine is the installer technology. It can't be the petabyte of index information that the search engine actually needs, and the petaflops of CPU it would need to search through it.

Everyone has a PB of SSD disk space, some few TB of RAM and a few thousand CPUs to throw at the search problem, or is happy to type in a search query and give a 16 core CPU a few days to execute it, right?

josefx · on Aug 25, 2022

> or is happy to type in a search query and give a 16 core CPU a few days to execute it, right?

That is just a naive implementation. For the first 10 results you grab ads; the database of those is significantly smaller. For the next 20 results you look at Wikipedia and Stack Exchange clone sites. Everything after that is indexed using math.random(). If you want to get fancy, run the query through a fact-creating AI and present the results inline; people are always happy to know that the color of the sky is purple or that the ideal number of chess players is 5. Disclaimer: I have never seen Google's source code nor any patents related to it; any similarity with existing search engines is pure coincidence.

unknownaccount · on Aug 25, 2022

I don’t know why you are framing this as an impossible task. It doesn’t need to be on the scale of Bing/Google to function. There are already some self-hosted search engine solutions that work okay. Just filter out all the trash sites with low-quality content like Facebook/Twitter from the database and that 300TB Common Crawl could probably be cut down to a more reasonable 200TB. Filter out non-English results and that probably halves it further. I’m seeing 8TB drives on Newegg for $129. It absolutely does not take anywhere on the order of “days” to query a properly optimized db of this size.
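
As a rough back-of-the-envelope check of the figures in this comment, here is a minimal sketch. It only takes the numbers quoted above at face value (200TB after filtering, halved for English-only, 8TB drives at $129); the function names are made up for illustration, and nothing here is measured.

```python
import math

# Figures quoted in the comment above, taken at face value (assumptions, not measurements).
FILTERED_TB = 200        # Common Crawl after dropping low-quality sites (commenter's guess)
ENGLISH_FRACTION = 0.5   # "probably halves it further"
DRIVE_TB = 8             # capacity of one drive
DRIVE_PRICE_USD = 129    # quoted price per drive


def drives_needed(dataset_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold dataset_tb of raw data."""
    return math.ceil(dataset_tb / drive_tb)


def estimate(dataset_tb: float = FILTERED_TB,
             english_fraction: float = ENGLISH_FRACTION,
             drive_tb: float = DRIVE_TB,
             price_usd: int = DRIVE_PRICE_USD):
    """Return (size in TB, drive count, total drive cost in USD)."""
    english_tb = dataset_tb * english_fraction
    count = drives_needed(english_tb, drive_tb)
    return english_tb, count, count * price_usd


if __name__ == "__main__":
    tb, count, cost = estimate()
    print(f"{tb:.0f} TB -> {count} drives -> ${cost}")
    # prints: 100 TB -> 13 drives -> $1677
```

This covers raw storage only. It assumes nothing about the size of a search index built on top of the data, redundancy, enclosures, or query latency, all of which would change the total.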