quickly realized that some of the fingerprinting information could be useful for VM detection because vendor names were exposed. In this particular instance the string "VMWare" was contained within the WebGL information. After some more testing I also discovered that VirtualBox reported the same kind of information.
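To make the check concrete, here is a minimal sketch of how a detector might scan a reported renderer string for hypervisor names. The function name, the marker list, and the sample strings are all assumptions for illustration; real WebGL renderer strings vary by driver and browser.

```python
from typing import Optional

# Assumed markers, based only on the two products named in this comment.
VM_MARKERS = ("vmware", "virtualbox")


def find_vm_marker(renderer: str) -> Optional[str]:
    """Return the first VM vendor marker found in a renderer string, else None."""
    lowered = renderer.lower()  # case-insensitive match ("VMWare" vs "VMware")
    for marker in VM_MARKERS:
        if marker in lowered:
            return marker
    return None


# Hypothetical sample strings, not captured from real browsers.
print(find_vm_marker("VMWare SVGA 3D"))
print(find_vm_marker("Generic GPU 1000"))
```

A substring match like this is trivially defeated by renaming the device, which is why the other signals discussed in this thread matter more.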
I believe there are patches that can close those holes, but I've always found it puzzling that such information is exposed by default, making a VM obviously not look like real hardware. Ideally, a VM should be indistinguishable from real hardware. In practice that ideal is difficult to achieve, especially with timing-based detections, but you'd think such obvious signs wouldn't appear.
Also, the amount of information that can be gathered via JS is disturbingly immense. To me, this is just further validation of the fact that JS needs to be off by default and whitelisted only for the (very few) sites that one truly trusts.
There are a lot of paravirtualized devices even within an HVM type of VM. This is necessary for performance. You don't want to have to emulate a lot of things if you don't have to.
I feel like this might be a better and more feasible solution. Is there a reason that JS can collect as much data as it does about our hardware? I feel like I always hear the mantra, "The browser is a sandbox," but reading articles like this makes me doubt that that is true. I don't really have too much of an idea of how WebGL works, but I wonder if there's a way to create some sort of additional abstraction layer between a website and the hardware? So JS just has access to "hardware interfaces" like a CPU and a GPU but can only interact with them via this interface. That way, even if a site wanted to, the best it could do is determine that your computer has a GPU or a CPU but not how many cores or what type of GPU?
This would however have the downside of incurring additional latency from the additional abstraction, but if there were a way to turn this off for trusted websites and only leave it on when you're using a site you don't trust, it could be more usable? It just seems better than disabling JS entirely since a lot of websites just completely break without JS.
It just seems like with the amount of information that JS can collect, even if you're using Tor or a VPN, if you crunch all the information about a particular user, like the kind of OS they're running, the version of the browser, the screen ratio, mouse click movements, time of access, number of CPU cores, type of GPU, whether or not it's a VM, etc., it just feels like you might be able to devise a pretty reasonable heuristic for where this person is and the kind of computer they're using. I can't really say I know the extent of browser and JS capabilities, but these things already seem alarming enough that I wouldn't really feel super confident that I can't be tracked even with Tor or a VPN.
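A rough sketch of why combining attributes is so effective: each value alone is common, but hashing them together gives an identifier that changes whenever any single attribute differs. The attribute names and values below are hypothetical, and this is not how any particular tracker is known to work.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of observed attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical visitors that differ only in CPU core count.
visitor_a = {"os": "Linux", "browser": "Firefox 115", "cores": 8, "vm": True}
visitor_b = {"os": "Linux", "browser": "Firefox 115", "cores": 4, "vm": True}

print(fingerprint(visitor_a))
print(fingerprint(visitor_b))
```

Note that none of the inputs is an IP address, so the identifier stays the same across Tor circuits or VPN endpoints as long as the attributes do.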
The user should have full control over everything. For example, if the user wants to configure it so that all JavaScript time reporting reports that everything takes zero time, then that is what it should do in that case. If the user wants all timeouts to expire immediately (so that JavaScript-based animations will take zero time), that can also be done, then.
You should have enough ropes to hang yourself, and also a few more just in case.
> I've always found it puzzling that such information is exposed by default, making a VM obviously not look like real hardware.
Eh, it seems to me that software ought to be cooperative by default. Plenty of programs will detect whether you have AMD or nVidia graphics and optimize themselves for your hardware. Why not VMWare graphics?
Where I agree is that there ought to be an easy checkbox to hide it.
I agree with this in most cases, but when it comes to JavaScript and fingerprinting, I think systems should be as generic as possible. This isn't even a VM versus bare metal thing.
...I don't know. I realize this is an entirely subjective view of what a web page should be, but I just don't think any website should know the intricacies of my hardware. The web is a low-friction, low-trust environment; installing a desktop app has more friction, but it also acts as a signal of greater trust.
If a website ever really needs to know my hardware, it can ask me to choose from a drop-down. A lot of users won't know what hardware they have—but, those users are also unlikely to understand the implications of a hardware-detection permission prompt.
I mean, it's the usual duality. If a browser is for browsing documents, then of course you don't need that. And if a browser is a method for running arbitrary applications pseudo-safely, then it absolutely should be doing that.
The web APIs seem to be filled with features that in theory could be useful, but you would have to do some serious hunting to find a legitimate use while you are flooded with examples of evil uses.
Firefox removed the battery API for this. In theory you could do something like show a stripped down site for low power users or something but it was only ever used for tracking.
While browsers are used for a lot of things now, gaming seems like the one area where we have seen virtually no use outside of random 2D games. I doubt there is a single web game that actually makes good use of the GPU vendor details.
The problem with limiting browser features is that it makes web apps less competitive with apps on proprietary platforms. I agree not all websites should have access to the battery API, but the user should decide on that, not a browser vendor. The same goes for all other limitations imposed.
I think the sane argument here is for sensible defaults. Leaving all those switches turned on is just opening the door for adtech. The set of information that's made available out of the box should be small, and if you need to access information about my graphics card, you can ask for it.
I think this only works if the average consumer can assess what is being asked for, though. “Do you want to let this website know what hardware you have?” is not a simple question.
“What is hardware?”
“Should I let a game know my hardware? Should I let a news website know my hardware?”
It is a discussion about how to do it right. And surely, if proprietary platforms can do it, browsers can do it too. Especially since a proprietary platform app has permission to access system data enabled by default. A browser asking for it would make web apps safer than native ones.
Makes sense to me. There are a lot of legitimate reasons software on your computer might need to know it's in a VM or what the limits of your VR engine are.
...no, that's where I don't agree. Again, software should attempt to be truthful by default. There's a reason we allow programs to detect the hardware they're running on—it allows for all sorts of optimizations.
Does the VM claim its network driver was manufactured by Broadcom, or does it go with Cambridge Silicon Radio? Or does it decline to provide a vendor, and if it does, how long until software starts assuming that is the sign of a VM, except this time with potential false positives for users of niche hardware?
I think a key distinction is whether we're in an adversarial context or not. And I think loading untrusted applications over a network connection from arbitrary third parties is absolutely a context in which to be distrustful.
> Ideally, a VM should be indistinguishable from real hardware
VMs wouldn't achieve the performance they do without paravirtualization. VMs don't meticulously emulate all attached virtual devices. Paravirtualized device drivers more or less forward the I/O request to the hypervisor, which handles it in a VM-specific way. For example, it's not super useful for a VM to emulate all the bitwise register-twiddling dances needed to talk to a SATA controller; it can simply have some "backdoor" channel into the hypervisor that says "Queue request to write X to LBA Y for this VM".
Since paravirtualization requires specific drivers, it will always be detected.
While changing the reported device names can just make the VM a little less obvious, I suspect there will always be clues that indicate a VM. For example:
- Do network adapter MAC vendor IDs make sense?
- Does the hard drive size make sense?
- Which 3D acceleration features are supported/work correctly?
- How much graphics RAM is there?
- Timing-based methods
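The first item in the list above can be sketched as a lookup of the MAC address prefix (OUI) against prefixes that hypervisors assign by default. This is a sketch for software running inside the guest, not for web pages; the function name is hypothetical, the table is deliberately incomplete, and a user can override the MAC address in the VM settings.

```python
from typing import Optional

# A few default prefixes used by common hypervisors (not an exhaustive list).
VM_OUIS = {
    "00:05:69": "VMware",
    "00:0C:29": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
    "00:16:3E": "Xen",
}


def vm_vendor_for_mac(mac: str) -> Optional[str]:
    """Return the hypervisor associated with a MAC address prefix, if any."""
    # Normalize separators and case so "00-0c-29-..." matches "00:0C:29".
    normalized = mac.strip().upper().replace("-", ":")
    return VM_OUIS.get(normalized[:8])


print(vm_vendor_for_mac("08:00:27:12:34:56"))
print(vm_vendor_for_mac("00-0c-29-ab-cd-ef"))
```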
It's a bit like detecting private/incognito mode in a browser. Everything worked great until websites realised there are ways to detect it; then it became a game of cat and mouse.
>- Which 3D acceleration features are supported/work correctly?
vmware workstation has 3d acceleration support targeting directx 11, so I'd imagine most features are supported and are passed through to the host gpu for execution. I doubt you'll be able to detect whether it's a vm or not based on that. In addition, resistfingerprinting (on firefox) hides this kind of stuff.
>- How much graphics RAM is there?
vmware workstation has vram selectable from 32MB all the way to 8GB, so that covers the entire range of plausible vram sizes.
>- Timing-based methods
what else can you test that is both accessible via browser api and would yield big differences between a vm and a slow computer?
>It's a bit like detecting private/incognito mode in a browser. Everything worked great until websites realised there are ways to detect it; then it became a game of cat and mouse.
not really, it's a solved problem: make a new browser profile and then delete it after you're done. I've seen a few HN commenters post their (relatively short) scripts to make a new firefox/chromium, start it, and then automatically delete it once it exits.
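A minimal sketch of such a throwaway-profile script, assuming a `firefox` binary on the PATH. The function names are hypothetical; `-no-remote` and `-profile` are Firefox command-line options, and a Chromium variant would need different flags.

```python
import subprocess
import tempfile
from typing import List


def build_command(profile_dir: str, browser: str = "firefox") -> List[str]:
    """Build the command line that starts the browser on a specific profile."""
    # -no-remote avoids attaching to an already running Firefox instance.
    return [browser, "-no-remote", "-profile", profile_dir]


def run_throwaway_browser(browser: str = "firefox") -> None:
    """Start the browser on a fresh profile and delete it after exit."""
    with tempfile.TemporaryDirectory(prefix="throwaway-profile-") as profile_dir:
        # Blocks until the browser window is closed.
        subprocess.run(build_command(profile_dir, browser), check=False)
    # Leaving the "with" block removes the profile directory and its contents.
```

Calling `run_throwaway_browser()` opens a clean session. This clears stored state such as cookies and cache, but it does nothing about the hardware fingerprint discussed elsewhere in this thread.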
> Ideally, a VM should be indistinguishable from real hardware
Why? I mean, it's possible to make a VM that's indistinguishable (except for speed), but what's the purpose of doing so?
Most people who run VMs have the purpose of "I want this application to run more conveniently than having dedicated hardware for it." For that purpose, it's useful to provide abstractions (e.g., providing dedicated access to CPUs in a way normal kernels usually don't) and usually to provide sandboxing (e.g., prohibiting disk writes outside of the VM disk), but there's generally little point in lying, unless the software you want to run won't run right without lying. And it's often counterproductive to lie, because software can adapt to the ways the abstraction is leaky if you're truthful about the nature of the abstraction. (In this case, the VMware graphics driver can achieve much better performance by cooperating with the host than a normal graphics driver expecting physical hardware could get on a software emulation of that hardware.)
It's also often pointless to lie - if you pay for a VM from Amazon EC2, and you log into it and it pretends to be a 1U physical server, are you going to believe it?
I guess the line of thinking is "the more differences there are, the leakier the abstraction is".
The obvious case people will think about is the security angle (by using a vm you impersonate a consumer end user, so that malware, for example, doesn't realize it's running in a vm). But there are other cases where Murphy's law will bite you.
For example I bet some WebGL apps manage to trip themselves up over this feature string because of whitelists or buggy logic in code that tries to be clever about used features vs underlying platform.
I didn't know WebGL leaks so much information about my machine. I already open a browser just in incognito/private mode 90% of the time. Maybe it's time to start a different VM each time for browsing.
That wouldn't really help. The containers/VMs might be separated from each other, but they're running on the same hardware/software stack, so they'll behave the same should you decide to fingerprint it.
I meant for convenience, since launching a VM is part of the OS.
Wouldn't every Qubes VM (whatever the underlying physical machine) return the same fingerprint? Something like VM Fedora version XXX running on Xen hypervisor.
Depends how they do 3d rendering. If it's passed through to the host gpu, then it's fingerprintable. If it's using some sort of software renderer that might be fine, but the performance is going to be garbage.
For timing based attacks, couldn't we start looking at injecting random jitter into the JS runtime? You could source the timing entropy from a CSPRNG so that attackers would be unable to distinguish from random noise. Obviously, this would have an impact on performance, but it could be something that is controlled on a per-host basis as required.
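A minimal sketch of the jitter idea, written as a pure function so the behaviour is easy to inspect. The function name and the default values are assumptions. Coarsening the timestamp before adding noise is an extra step not mentioned in the comment, added here because jitter alone may be averaged out over many repeated measurements.

```python
import secrets


def fuzz_timestamp(real_ms: int, resolution_ms: int = 100, jitter_ms: int = 100) -> int:
    """Coarsen a millisecond timestamp and add CSPRNG-sourced jitter."""
    # Round down to the nearest multiple of the resolution.
    coarse = (real_ms // resolution_ms) * resolution_ms
    # secrets.randbelow(n) returns an int in [0, n), drawn from the OS CSPRNG.
    return coarse + secrets.randbelow(jitter_ms + 1)


# The reported value lands somewhere in [12300, 12400] for this input.
print(fuzz_timestamp(12345))
```

One trade-off of this sketch is that two consecutive readings can appear to go backwards, so a real implementation would probably need to clamp the output to stay monotonic.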
I am honestly more surprised by the fact that we let these kinds of APIs creep into our browsers. What’s the scenario where a website needs to know how much RAM or what kind of video adapter I have? I get it for a game or a desktop app, but a website?
At the end of the day, we all know that any kinds of unique identifiers will be used in combination. We need to reduce those to an absolute minimum. Today’s browser APIs are leaking information like a 1920s faucet.
Sometimes people put games and desktop apps in websites.
And, really, when you download a game or desktop app as a traditional executable, the OS gives it so much access to your private data that the term "leak" isn't meaningful any more. Any video game you install as a .exe can silently access every email and online banking account you either are logged into or will log into in the future.
A video game installation is a conscious choice that I can make depending on whether I trust the vendor or not. Me visiting New York Times and getting 58 trackers scraping my device configuration and preferences is not a choice.
It is not. I can open someone’s blog without knowing what trackers they have. A site has inherently a different trust boundary than an executable, and it should stay that way.
Not only that, you can vet someone's blog, decide you are ok with the trackers and revisit 24 hours later only to find that they've changed since you last visited.
And? Why should we still be adapting the security and privacy model we had 20 years ago?
To your analogy, 20 years ago this bit us all in the ass just as much because every EXE brought with it all sorts of toolbars and adware. I don’t want the web to become this.
20 years ago laptop weight was measured in pounds, clock speeds were in MHz, storage was in single GB ranges, and battery life was a quarter of what it is today. Dial up was common, there was no youtube or spotify. Let's not hold ourselves to the standards of technology from 3 generations ago.
> I get it for a game or a desktop app, but a website?
Browsers "had to" replace Java Applets and Flash. With the great side effect that users can no longer easily just disable those plugins to get rid of the malware built on top.
You already get banished from half of the internet for hiding your IP address, I hope this wouldn't be used to make our lives even worse.
Imagining a grim future where sites block all ad blockers (how about some detecting if an ad blocker exists at all, rather than its usage?), VPNs, virtual machines, even incognito mode. No full trust = no website.
Then don't go to those websites. It's not a right that you have access to a website.
Seek out only sites that don't block adblock, or have no ads. This might include a paid option (for example, youtube premium, or twitch subscription for ad-free viewing).
I may not have the right to have access to most of these websites but I do have the right to complain about how unethical I find it.
And of course this situation has different implications if the website offers some sort of an essential service, but that's an entirely different discussion.
There are many ways to pay for hosting. Ads are just one way. Since they have proven so lucrative for some well positioned businesses, those businesses have pushed the meme that the only choices are ads or paid subscription. There are other ways. I’ve run sites where I happily paid the hosting bill because I correctly suspected it would lead to me getting the job I wanted down the road. You could say that sounds like advertising, but it was not run as a business, much less an advertising business. There was no self promotion whatsoever other than bragging rights.
Many people have agendas, including sometimes very positive agendas, that lead them to support online services of various types in order to further those agendas.
None of this is to say there’s no place for advertising or for paid subscriptions. I’m just saying there are often more options.
Ads, fine. Participating in a pervasive tracking infrastructure that strips you bare of your privacy to optimize manipulation of your individual cognitive faults is entirely another thing.
Ooof, yeah. I was trying to make a new, purely anonymized identity. Went through an anonymized bitcoin VPN with Tor on top. Registered an email through Protonmail.
Pretty much no social media platform will accept Protonmail as an address without also having a phone number.
Got banned from Discord within 3 hours, literally all I'd done was send three friend requests and join one discord. My IP was rotating and I then needed to have 2-factor authentication (and protonmail wasn't allowed, I needed that phone number).
So, I went out and bought a burner phone, cash, with a 1-year prepaid account. Got it set up over a wired proxy with all radios turned off. Now at least I had a Google account! (they also require a phone number)
And Discord proceeded to reject it, because I needed to have a 'real' phone number from a major carrier.
I essentially needed to craft an entirely new identity if I wanted to be truly anonymous. It was eye-opening how invasive and pervasive the 'track you down to a real identity' accounts have become.
I know Twitter does this. Also saw someone mention Facebook asked for the same when they tried to delete their account. It's a mad mad world out there.
That might be a consequence of GDPR: they need legal proof of the legitimacy of the request and of its thorough execution.
Just as you can’t really “unsee”, you can’t fully delete data once you’re exposed to it. Something documenting its previous existence will always remain
I had to create a FB account for college, and somehow my account was suspicious according to them. They locked it and said that I needed to provide a picture of my national ID in order to unlock it.
I was baffled and just created another account to be honest.
Great question! I've heard that in-person cash exchange is your best bet, if you're wanting to be completely off the grid. There are services out there that can help with the exchange (early in Bitcoin, there was an escrow service in which you deposited cash into a savings account through an ATM, and an anonymous wallet would be credited with the appropriate Bitcoin). Not sure where that stands now.
Oh! But to your question: Mullvad VPN seems to be highly regarded, and has an option to configure a recurring payment that they (claim to) decouple from your identity. Their service even supports defining multi-hop routes.
And to be clear: My goal is to obfuscate my identity from malicious individuals (think: politician that wants to be kinky, but has to resort to online interactions during COVID lockdown, and wants to avoid both simple tracert IP identification as well as a potential password breach & leak of social media platform X). Hiding my identity from governments is not my goal, so trusting Mullvad was an acceptable risk assessment. I'd add additional layers if I wanted to be more anonymous.
My next step, when I get around to it, is to try to track down a cheap anonymous virtual host to SSH into... then at least I'll have a static IP. But I'd still be up a creek if they ever wanted to do two factor for some reason.
You can buy Monero instead, and funnel it through xmr.to to pay in Bitcoins. Another alternative might be fixedfloat.com. Tested both, but you had better not exchange big amounts.
Sorry to say, but if you act like someone you aren't, you are not going to be welcome at websites that want to know they are responding to a real person.
Basically you are acting like you have something to hide, which is typically something that people up to no good would do. I'm not saying you are up to no good, but your activity looks the same.
It's just unfortunate that companies are employing precog future-crime concepts to what (should) be standard privacy approaches.
And, to be clear, I'm not acting like someone I'm not. I'm forthcoming that I have an identity, and I'm even willing to prove that I'm a self-consistent individual. I'm acting like someone that has purchased a month-to-month phone, has signed up for a free email account, and values their internet privacy. It's become apparent to me that a number of companies discriminate against people for whom this is their only choice... heaven help you if you can't afford a proper phone (or don't have the credit to open a proper cellphone account).
As a transgendered person that's been discriminated against, I do indeed have valid reasons for wanting my most personal conversations to be secure from being used against me. There are other aspects of my private life that would be wildly misinterpreted if taken out of context. I would prefer to keep my private life wholly separate from my public persona, and that's... harder to do in the age of COVID isolation where everything is online.
It's astonishing to me how many people just... don't care about their privacy these days. And that the bar is so crazy high to be anonymous. To the point that wanting to be anonymous for a few hours is indistinguishable from being up to no good?!
Hell, I'd have to _break_ the law and craft a truly new identity just to be able to be anonymous on some of these platforms that don't have real-world identities as a sign-up requirement.
For my money, I can't help but think of people who have legitimate reasons for wanting privacy and how my relative lack thereof deprives them of it. For every one of us that has a well defined presence, it becomes that much easier to spot people who hide, and like you say, there are perfectly legitimate reasons to do so.
I would even venture to say that categorically, there are unjust laws that people should be able to hide from - I would want to help them do that. By creating a culture that respects privacy, we insulate ourselves against a lot of the damage that can be done by a poorly managed legal system / toxic culture / opportunistic economic structure. I was just recently reading about how the military buys adtech data to track foreign nationals. I feel like we're living out a Gibson novel.
I do try to "fuzz" my presence as much as possible, because of this. However, like you mention, it ain't easy.
Thank you so much. I was worried recounting my experience would be perceived much the way the first person responded. It feels good to know there are those out there that get it.
I think the thrust of the complaint is that society doesn't sufficiently value privacy for its own sake.
In a different kind of society, websites of the sort that you describe would be unpopular, because users at large would place less value on the real-person guarantee than on the non-collection of their identifying information.
Closely related to this is the topic of end-to-end encryption. To withstand attacks that I expect will continue to be mounted on it, I think society has to believe in privacy as a terminal good. The answer to the argument that "we could catch such and such criminals if we had key escrow" ought to be "yes, and not catching those criminals is a price we agree to pay, because privacy is just that valuable to us".
The fact that browsers allow websites to see what GPU you have installed and its driver version, and to enumerate your fonts, is clearly there for the benefit of tracking people.
Otherwise browser developers would create generic classes of device that segment users into large groups based on features.
The fonts thing is proof. For years, you install a custom font on your pc. Then you are unique.
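The "generic classes of device" idea could look something like the sketch below: instead of exposing exact values, a browser would report only a coarse bucket. The class names and thresholds are invented for illustration and do not correspond to any existing browser API.

```python
def device_class(cpu_cores: int, memory_gb: float) -> str:
    """Map exact hardware figures onto one of three coarse classes."""
    # Hypothetical thresholds; the point is that many machines share a class.
    if cpu_cores <= 2 or memory_gb <= 4:
        return "low"
    if cpu_cores <= 8 and memory_gb <= 16:
        return "mid"
    return "high"


# Two different machines become indistinguishable by this signal.
print(device_class(6, 16))
print(device_class(8, 12))
```

A site could still adapt its content to the reported class, while the signal carries far less identifying information than exact core counts, GPU model, or an installed-font list.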
I understand the technical challenge here, but spare a thought for students who are subjected to online proctoring software that takes a huge amount of onerous control over a student's computer [1]. VM detection of this kind means that students who run a vm to shield their computer from heavy-handed software may find it harder to remain in control of their own system. I don't think this knowledge is bad to share, but a person using a virtual machine should never have their degree penalised for it.
The browser should ask permission from the user to divulge hardware information like this (something that looks like “This website wants to display 3D graphics, but this may divulge information used to track you”, etc.) We already have this for divulging your GPS location and notifications, why not add this too?
Yes, it means you’d get a lot more nagging, but to me, that’s good signal for what sites I should be avoiding in the first place.
I think you severely overestimate the resolve of the average user. They'd be very easily socially engineered into clicking yes, especially when the website is holding the content hostage. Not to mention, there's a wide array of fingerprinting techniques, and many of them have legitimate/common uses in normal sites. eg. window size, browser timezone, canvas.
For those that have access to Menlo Security’s safe browsing saas service, it’s worthwhile analysing what client side JavaScript can tell you about the runtime environment. Device aspect ratio is just one of the strange things reported.
That looks very interesting indeed! But it's so absurdly expensive that you have to wonder who the target audience for this service is? From their "Use Cases" page it seems that this is geared towards (shady?) online marketers (which would explain the price) rather than privacy-conscious individuals. Would be awesome if they offered a plan for normal users priced similar to a normal VPN service...