Can someone please explain why we are supposed to trust DDG? Isn't it just a random website that popped up out of nowhere claiming to be private, yet no audit has ever been conducted which substantiated those claims?
And finally, I’m not sure that random or just popped up is an accurate characterization for us. We’re pretty well established at this point, having been around for nearly 15 years! I was an early user of this site and a frequent contributor during the early days of DuckDuckGo.
Those aren't proper audits. And again, bringing up the fact that it's open source is a meaningless piece of information since there is no way to verify it's the same software code on production. It only serves to trick the average user who doesn't understand how web servers work into trusting your service more.
The best thing you could do, if you actually care about privacy and not just $$$, is to open-source the entire search index db and accompanying webserver software, making it easy for users to set up their own local instance of DDG which is actually auditable. Additionally, posting a notice on-site which notifies your users that their searches may be recorded and tracked in spite of what the privacy policy says (due to the USA jurisdiction of the company making it susceptible to National Security Letters and secret gag orders) would be the right thing to do.
> open-source the entire search index db and accompanying webserver software, making it easy for users to set up their own local instance of DDG which is actually auditable
Easy to self-host? How large do you suppose the Bing index is, for example? Simply storing the index would be an immense undertaking beyond the reach of probably everyone who has ever self-hosted anything, ever. This ignores the compute required to actually search it, as well as how it would get updated.
I was curious, so as a point of comparison, the latest Common Crawl [0] is 3.1 billion pages and 370 TB uncompressed. I would presume that Bing would be significantly larger given commercial interests.
If somehow Google and AskJeeves worked perfectly fine 20 years ago for millions of monthly users, I find it hard to believe a modern powerful computer lacks the resources to support a search engine for a single person.
What is the largest hard disk one can buy nowadays? I found a WD Gold 20TB. You'd need 19 of them plugged into your computer just to hold the uncompressed archive from Common Crawl.
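A quick back-of-the-envelope check of that drive count. This is only a sketch: it assumes decimal terabytes and ignores filesystem overhead, redundancy, and any extra space an actual search index would need. The function name is made up for illustration.

```python
import math

def drives_needed(dataset_tb: float, drive_tb: float) -> int:
    """Whole number of drives needed to hold dataset_tb of raw data."""
    # Round up: a partially filled drive still has to be bought.
    return math.ceil(dataset_tb / drive_tb)

# Figures from the thread: 370 TB uncompressed crawl, 20 TB drives.
print(drives_needed(370, 20))
```

370 / 20 is 18.5, so rounding up gives the 19 drives mentioned above.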
Your assumption is correct if you look at supercomputers, where the fastest in the world in 1999 could produce ~2.3 TFLOPS and in 2018 it could produce 122 PFLOPS, which is roughly a 53,000-fold increase in FLOPS.
But I doubt most of the people you would want to go through this index have access to a supercomputer.
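The ratio between the two supercomputer figures quoted above can be checked directly. A minimal sketch, taking the ~2.3 TFLOPS (1999) and 122 PFLOPS (2018) numbers from the comment at face value; the variable names are just for illustration.

```python
TFLOPS = 1e12  # floating-point operations per second
PFLOPS = 1e15

fastest_1999 = 2.3 * TFLOPS   # figure quoted in the thread
fastest_2018 = 122 * PFLOPS   # figure quoted in the thread

speedup = fastest_2018 / fastest_1999
print(f"{speedup:,.0f}x increase")
```

With those inputs the ratio comes out to roughly 53,000.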
I wouldn't be surprised if the indexed subset of Facebook alone were more than 1000x larger than all of the indexed web 20 years ago. The web in general has probably expanded many millions or hundreds of millions of times.
Personally I wouldn’t mind if trash/spam sites like Facebook/Twitter were omitted from the database, as well as non-English content, since I only speak English. Remove trash/spam/non-English from the db and the size of that 300TB will be cut down substantially, to the point where it is feasible for a single person to store. After all, even if somebody wanted to store the whole 300TB db, it would cost about $4000 in hard drives, which is not as totally out of reach as some people here are making it seem.
That was a very different internet. Search engines aren't something you build once and then you just have them. Constant, extensive work is necessary. It's quite literally a global-scale task to do this effectively.
> Those aren't proper audits. And again, bringing up the fact that it's open source is a meaningless piece of information since there is no way to verify it's the same software code on production.
> The best thing you could do, if you actually care about privacy and not just $$$, is to open-source the entire search index db and accompanying webserver software, making it easy for users to set up their own local instance of DDG which is truly auditable.
Self-hosting isn't feasible for 99% of the population. DDG is aiming to be the mainstream privacy-protecting search engine. I used them for a while and can appreciate their efforts. If you want something nerdy and self-hosted, use a searX instance or host it yourself.
> self hosting isn't feasible for 99% of the population
It's only this way because companies have a vested interest in keeping it like that. It's how they make their money. It is absolutely within the realm of possibility that people host their own search engine. 99% of people know how to install Google Chrome, right? This should be no different. The entire search engine and the webserver stack it depends on could be bundled into a .exe/.app installer with simple instructions people can understand. Consider XAMPP, which already provides a webserver stack that is extremely easy to install on Windows/Mac with a simple .exe/.app that 'just works'. This hypothetical search engine could use similar methods to the XAMPP installer. There is no technical reason why this can't happen. It just isn't happening because it'd increase competition, cutting into DDG's profits.
Sure, the problem with installing a local search engine is the installer technology. It can't be the petabyte of index information that the search engine actually needs, and the petaflops of CPU it would need to search through it.
Everyone has a PB of SSD disk space, some few TB of RAM and a few thousand CPUs to throw at the search problem, or is happy to type in a search query and give a 16 core CPU a few days to execute it, right?
> or is happy to type in a search query and give a 16 core CPU a few days to execute it, right?
That is just a naive implementation. For the first 10 results you grab ads; the database of those is significantly smaller. For the next 20 results you look at Wikipedia and stackexchange clone sites. Everything after that is indexed using math.random(). If you want to get fancy, run the query through a fact-creating AI and present the results inline; people are always happy to know that the color of the sky is purple or that the ideal number of chess players is 5. Disclaimer: I have never seen Google's source code nor any patents related to it; any similarity with existing search engines is pure coincidence.
I don’t know why you are framing this as an impossible task. It doesn’t need to be on the scale of Bing/Google to function. There are already some self-hosted search engine solutions that work okay. Just filter out all the trash sites with low-quality content like Facebook/Twitter from the database and that 300TB Common Crawl could probably be cut down to a more reasonable 200TB. Filter out non-English results and it probably halves it further. I’m seeing 8TB drives on Newegg for $129. It absolutely does not take anywhere on the order of “days” to query a properly optimized db of this size.
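To put rough numbers on that, here is a small sketch of the drive cost at each stage of filtering. It takes the comment's own estimates as given (300TB, then 200TB, then about half of that) along with the quoted $129 per 8TB drive. It covers raw storage only, with no redundancy and no allowance for the index structures a search engine would build on top; `drive_cost` is a hypothetical helper.

```python
import math

def drive_cost(dataset_tb, drive_tb=8, drive_price_usd=129):
    """Return (number of drives, total price in USD) for raw storage."""
    drives = math.ceil(dataset_tb / drive_tb)  # partial drives round up
    return drives, drives * drive_price_usd

# Sizes are the commenter's guesses, not measured values.
scenarios = [
    ("full crawl", 300),
    ("minus low-quality sites", 200),
    ("English only", 100),
]

for label, size_tb in scenarios:
    drives, cost = drive_cost(size_tb)
    print(f"{label}: {size_tb} TB -> {drives} drives, ${cost:,}")
```

Under those assumptions the three cases need 38, 25, and 13 drives, or $4,902, $3,225, and $1,677 respectively.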
I stopped trusting DDG when they said they were going to censor Russian news. I assume Google and other major search engines censor political issues, but I didn’t think DDG would.
You're not supposed to totally trust DDG, but they are a better default search engine than Google if you care about privacy.
- they are less likely to throw a captcha in your face if you connect over VPN
- they have less surveillance infrastructure and run less code clientside than Google does
- they are at least not explicitly tracking you
- they have a lower number of secondary data-points from other services that can be connected to your searches
The list kind of goes on. I don't assume that DuckDuckGo is perfectly trustworthy just because they say so, but Debian has a choice of a couple of different default search engines that are mature enough and give good enough results to use as a default search tool: Google, Bing, DuckDuckGo, etc...
Of those choices, DuckDuckGo seems to be a pretty reasonable decision.
At the very least, DuckDuckGo lets me search when I'm behind a VPN and have anti-fingerprinting tools turned on, Google very often doesn't. It's not a super-hard decision for me which one is more private.
Privacy isn't the only criterion for choosing a search engine. If search quality is very poor for a language, it's not suitable even if it's privacy-oriented.
Sure, but that's really a separate conversation from what this thread has been talking about so far -- which is whether or not we can trust that DDG is more private than Google.
You aren't supposed to. Even if you assume they lie in every sentence about their data collection, with their current setup it would be much harder for them to build a valuable shadow profile about you.
They haven't been caught running fingerprinting scripts yet and they don't have an account system to tie to your searches. At best they could use your IP to build a shadow profile, and that's wildly inaccurate in our mostly IPv4 world.
How do you know what server-side profiling occurs or does not occur? There is no way to know that. DDG gives people a completely misplaced and false sense of security, when they are just as easily compromisable/corruptible/subpoenable/susceptible to NSLs, EDRs and secret court orders as any other company.
And I disagree with your premise that it's particularly difficult to link a person's IP to their real-world identity. There are organized fraud gangs who have it down to a science. They know exactly what dept. of the ISP to call, what to say, etc. Basically, if someone knows your IP and your ISP account is registered in your name, it's game over.
I am aware that they are susceptible to nation-state-level data collection, just like every site on the internet. I conduct all my non-e2e-encrypted communications/interactions with this in mind.
I'm more worried about teenage crooks equipped with Emergency Data Request PDF templates than any nation state. We know Google, Facebook, Snapchat etc. were all giving up information on users without a court order to these crooks. All it took (and probably still takes) was an EDR notice alleging that an imminent threat to human life is about to occur, sent from a real or fake police department email, and companies will hand over your data without a second thought.
Even if they do server-side profiling, they can only track you on duckduckgo.com. Last I checked, DDG did not also own an analytics service that has infested half the world's websites.
> Last I checked, DDG did not also own an analytics service that has infested half the world's websites.
uMatrix shows a 3rd party request to improving.duckduckgo.com every time I visit a page from DDG search results, ostensibly to measure click-through rate. This is claimed to be anonymous, but in principle it gives DDG the opportunity to log much about their users' browsing habits.
Even in the worst case scenario you propose, where DuckDuckGo is deliberately lying and collecting more information than they claim and where those clickthrough requests are sending as much information as is possible for them to send, this is still exposing you to way less risk than Google Analytics.
It is still, I would claim, objectively more private to use DuckDuckGo than Google even in a world where they are lying about their privacy policies, purely because DuckDuckGo does not have the same surveillance scope and level of infrastructure as Google.
And that's really what we're arguing about here, unless you have a more private alternative to DuckDuckGo that has been subject to more rigorous audits and can scale to support being the default search engine for a bunch of nontechnical users?
Cynically speaking, I am not sure that there is an audit you're going to be able to do that won't cost a ton of money that the people in this thread would trust as definitive "proof" of anything[0].
I think a big part of what I'm personally getting at with the comment above is that I'm not looking for perfect proof of anything; independent audits are great and I love to see them and I absolutely encourage them, but remember that the point of comparison here is Google/Bing. Take it with a grain of salt, and this is purely my opinion, but I think it's fine for private search engines to offer the best proof of their claims that they can and to otherwise ignore people who demand perfection or nothing.
It's great to see more search engines in the space with a focus on privacy, and if you're able to pull off building your own indexes, that's also a pretty big win. I wish there was a more obvious path forward for your company to make money (I get nervous when companies say, "we'll figure out funding later", to me that comes across as a little bit of a time bomb). But in general, always good to see more private options for people available.
If I was in your position and I was looking for audits, I'd honestly be looking at the same sources that DuckDuckGo's founder talks about further up-thread, because that would at least allow me to say, "the same sources that claim DuckDuckGo is private have also said that we are private." But it's not my area of expertise, so maybe that's bad advice.
As a regular user of the Javascript-less page, several months ago it started returning wildly different results than the “fully featured” version for the same queries. My uneducated guess is that it’s using a different index. There also appears to be some sort of rate-limiting wherein the results will frequently just be empty (using the JS version and same query resolves the issue).
I’m guessing they’re intentionally degrading the non-Javascript page as an anti-bot measure, but it’s so bad that I find it disingenuous to suggest that the non-Javascript page is even a valid alternative at this point.
It is a legit company based in Pennsylvania, not some random website. Their privacy policy explicitly states they do not collect user info. If they are caught doing it anyway they could be open to legal action. While they may be lying, at least it's better than other search engines where collecting data is explicit and built into their business model.
That doesn't mean anything. I can go ahead and register an LLC in Pennsylvania too for a few hundred bucks and then put up a website with a completely fictional privacy policy. I could collect everyone's IPs despite claims that we do not, and no one would be able to prove it.
I don’t understand why folk seem to think that admitting they are capable of fraud is some kind of dunk.
I mean - yea. Maybe your dentist never actually graduated dental school. Did you call the dental association to check he’s a member? Anyone can just print out a certificate on their home printer and put it on their wall, y’know. And even if he is, do you think the dental association actually called his college to verify the transcript he gave them when he joined in 1988 or whatever?
You really should do an independent audit of your dentist’s dentistry skills. Perhaps you should demand he does some kind of standardised test. But he can’t just go to a testing centre, you have no way of verifying that. He must do the test in front of you.
And how do you know the answers to the standardised test are correct, anyway? You will need to do a dentistry degree yourself first.
TLDR: A trustless society doesn’t work, and most people aren’t out to pull one over on you.
> While they may be lying, at least it's better than other search engines where collecting data is explicit and built into their business model.
Just to be clear, are you saying that given the choice between collecting data and lying about it vs. collecting data and being explicit about it, you’d choose the first option?
> Just to be clear, are you saying that given the choice between collecting data and lying about it vs. collecting data and being explicit about it, you’d choose the first option?
Yes. Absolutely. Because that would give me some legal recourse.
Would you hire someone who hides in the fine print that they can steal from you and you can't do anything about it, or hire someone else and accept the chance that they might steal?
The choice is between a bad thing definitely happening, or a bad thing possibly happening.
Thanks for bringing this up. I don't understand why people seem to automatically trust things just because they advertise as being more "private" than the alternatives. I guess none of us are immune to advertising tactics, but it's so important to remember that they have no obligation to be truthful and will lie every chance they get.
I am not claiming that DDG is bad or anything, I just don't like putting trust into something just because it says "you can totally trust me!".