I was expecting an ad for their product somewhere towards the end, but it wasn't there!
I do wonder though: why would this company report this vulnerability to Mozilla if their product is fingeprinting?
Isn't it better for the business (albeit unethical) to keep the vulnerability private, to differentiate from the competitors? For example, I don't see many threat actors burning their zero days through responsible disclosure!
I don't understand what you mean. What separates this from other fingerprinting techniques your company monetizes?
No software wants to be fingerprinted. If it did, it would offer an API with a stable identifier. All fingerprinting is exploiting unintended behavior of the target software or hardware.
It makes sense to me, they're likely not trying to actually fingerprint Tor users. Those users will likely ignore ads, have JS disabled, etc. the real audience is people on the web using normal tooling.
Side channels that enable intended behavior, versus a flat-out bug like the above, though the line can often be muddied by perspective.
An example that comes to mind that I've seen is an anonymous app that allows for blocking users; you can programmatically block users, query all posts, and diff the sets to identify stable identities. However, the ability to block users is desired by the app developers; they just may not have intended this behavior, but there's no immediate solution to this. This is different than 'user_id' simply being returned in the API for no reason, which is a vulnerability. Then there's maybe a case of the user_id being returned in the API for some reason that MIGHT be important too, but that could be implemented another way more sensibly; this leans more towards vulnerability.
Ultimately most fingerprinting technologies use features that are intended behavior; Canvas/font rendering is useful for some web features (and the web target means you have to support a LOT of use cases), IP address/cookies/useragent obviously are useful, etc (though there's some case to be made about Google's pushing for these features as an advertising company!).
The real reason is that fingerprint.com's selling point is tracking over longer periods (months, their website claims), and this doesn't help them with that.
Honestly it seems that most of Web Standards are used mostly for fingerprinting - I think a small number of websites uses IndexedDB (who even needs it) for actually storing data rather than fingerprinting.
That's why expansion of web standards is wrong. Browser should provide minimal APIs for interacting with device and features like IndexedDB can be implemented as WebAssembly library, leaking no valuable data.
For example, if canvas provided only access to picture buffer, and no drawing routines calling into platform-specific libraries, it would become useless for fingerprinting.
This excerpt from the article describes the risk well.
> In Firefox Private Browsing mode, the identifier can also persist after all private windows are closed, as long as the Firefox process remains running. In Tor Browser, the stable identifier persists even through the "New Identity" feature, which is designed to be a full reset that clears cookies and browser history and uses new Tor circuits.
Would it though? I guess state agencies already know all nodes or may know all nodes. When you have a ton of meta-information all cross-linked, they can probably identify people quite accurately; may not even need 100% accuracy at all times and could do with less. I was thinking about that when they used information from any surrounding area or even sniffing through walls (I think? I don't quite recall the article but wasn't there an article like that in the last 3-5 years? The idea is to amass as much information as possible, even if it may not primarily have to do with solely the target user alone; e. g. I would call it "identify via proxy information").
There's an instructive example on the page. Suppose a page creates the databases `a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p`, then queries their order. They might get, for example `g,c,p,a,l,f,n,d,j,b,o,h,e,m,i,k`, based on the global mapping of database names to UUIDs.
The key vulnerability here is that, for the lifetime of that Firefox process, any website that makes that set of databases is going to see the exact same output ordering, no matter what the contents of those databases are. That makes this a fingerprint: it's a stable, high-entropy identifier that persists across time, even if the contents of those databases are not preserved. It is shared even across origins (where the contents would not be), and preserved after website data is deleted -- all a website has to do to re-acquire the fingerprint is recreate the databases with the same names and observe their ordering.
It's the mapping of UUIDs to databases that is shared across origins in the browser. Only the subset of databases associated with an origin are exposed to that origin.
Disabling JavaScript actually greatly increases your fingerprint as not many users turn it off, so that instantly puts you in a much smaller bucket that you need to be unique in. Yes, not having JS means it limits your options for gathering other details, but it also requires much less effort to be unique now without JS.
Tor Browser also doesn't spoof navigator.platform at all for some reason, so sites can still see when you use Linux, even if the User-Agent is spoofing Windows.
> Disabling JavaScript actually greatly increases your fingerprint as not many users turn it off, so that instantly puts you in a much smaller bucket that you need to be unique in.
I've heard a handful of people say this but are there examples of what I would imagine would have to be server-side fingerprinting and the granularity? Since most fingerprinting I'm aware of is client-side, running via JS. While I expect server-side checks to be limited to things like which resources haven't be loaded by a particular user and anything else normally available via server logs either way, which could limit the pool but I wonder how effective in terms of tracking uniqueness across sites.
I have my problems with that argument. Yes, less identifying bits means a smaller bucket but for the trackers, it also means more uncertainty, doesnt it? So when just a few others without JS join your bucket eg. via a VPN, profiling should become harder.
> increases your fingerprint as not many users turn it off
We're talking about users of the Tor browser, and I'd be very surprised if this was the case (that a majority keep JS turned on)
Basically every Tor guide (heh) tells you to turn it off because it's a huge vector for all types of attacks. Most onion sites have captcha systems that work without JS too which would indicate that they expect a majority to have it disabled.
I would imagine most users of Tor are using Tor Browser. I am reading there was a responsible disclosure to Mozilla but is it me or did that section leave out when the Tor Project planned to respond or release a fixed Tor Browser? Do they like keep very close or is there a large lag?
The best for Tor would just be Links2/Links+ with the socks4a proxy set to 127.0.0.1:9050, enforcing all connection thru a proxy in the settings (mark the checkbox) and disabling cookies altogether.
In the last ten years has qubes moved on to support more hardware? Every 4 years I would try to use it only to find it didn't support any of my hardware.
We buy off the shelf laptops, not sure anyone ever checked that it can run Qubes specifically before trying to install it (I'm sure of at least one person: myself). Doesn't just about any x64 machine with hardware where drivers are available in standard kernels also work with Qubes? What have you bought that's not supported?
Most hardware (especially GPUs) is hard to virtualize in a secure manner, which is the entire point of Qubes. People who use it typically buy compatible hardware.
Tested hardware can be found here https://qubes-os.org/hcl. New hardware is being constantly added. If you plan to switch to Qubes, consider buying something from that list or, better, certified, or community-recommended hardware linked there.
On Qubes, you do not create a new identity in the same VM. This would go against the Qubes approach to security/privacy. Using separate VMs for independent tasks is the whole point of using Qubes.
> For developers, this is a useful reminder that privacy bugs do not always come from direct access to identifying data. Sometimes they come from deterministic exposure of internal implementation details.
> For security and product stakeholders, the key point is simple: even an API that appears harmless can become a cross-site tracking vector if it leaks stable process-level state.
This reads almost LLM-ish. The article on the whole does not appear so, but parts of it do.
Well that sucks. I guess in the long run we need a new engine and different approach. Someone should call the OpenBSD guys to come up with working ideas here.
> Mozilla has quickly released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220.
Did you even read the article at all? Ah my children did bad in school, time to replace them with new children and a different spouse. This is what you're suggesting essentially. A browser is not just something you simply make out of thin air. There's decades of nuance to browser engines, and I'm only thinking of the HTML nuances, not the CSS or JS nuances.
Given the dangers of JS and WASM they could just fork Netsurf and enhance the CSS3 support. If you are a journalist, running Tor with JS and tons of modern web tech enable makes you a bright white spot in a sea of darkness.
Very cool research and wonderfully written.
I was expecting an ad for their product somewhere towards the end, but it wasn't there!
I do wonder though: why would this company report this vulnerability to Mozilla if their product is fingeprinting?
Isn't it better for the business (albeit unethical) to keep the vulnerability private, to differentiate from the competitors? For example, I don't see many threat actors burning their zero days through responsible disclosure!
We don't use vulnerabilities in our products.
I don't understand what you mean. What separates this from other fingerprinting techniques your company monetizes?
No software wants to be fingerprinted. If it did, it would offer an API with a stable identifier. All fingerprinting is exploiting unintended behavior of the target software or hardware.
It makes sense to me, they're likely not trying to actually fingerprint Tor users. Those users will likely ignore ads, have JS disabled, etc. the real audience is people on the web using normal tooling.
Uhh okay, so they do exploit vulnerabilities, they just try to target victims who can be served ads? What a weird distinction.
Well presumably they want to make money.
Side channels that enable intended behavior, versus a flat-out bug like the above, though the line can often be muddied by perspective.
An example that comes to mind that I've seen is an anonymous app that allows for blocking users; you can programmatically block users, query all posts, and diff the sets to identify stable identities. However, the ability to block users is desired by the app developers; they just may not have intended this behavior, but there's no immediate solution to this. This is different than 'user_id' simply being returned in the API for no reason, which is a vulnerability. Then there's maybe a case of the user_id being returned in the API for some reason that MIGHT be important too, but that could be implemented another way more sensibly; this leans more towards vulnerability.
Ultimately most fingerprinting technologies use features that are intended behavior; Canvas/font rendering is useful for some web features (and the web target means you have to support a LOT of use cases), IP address/cookies/useragent obviously are useful, etc (though there's some case to be made about Google's pushing for these features as an advertising company!).
The real reason is that fingerprint.com's selling point is tracking over longer periods (months, their website claims), and this doesn't help them with that.
So it's the criminal that convinced themselves they are the good guys, I didn't expect that one. You are a malware company get a grip.
Would you prefer that they kept this for themselves instead of disclosing it?
I get criticizing their business and what they do wrong, but doesn't seem right to criticizing them for doing the right thing.
They probably are not relying on it and disclosure means others can't either.
I question why websites can even access all this info without asking or notifying the user.
Why don't browsers make it like phones where the server (app) has to be granted permission to access stuff?
Honestly it seems that most of Web Standards are used mostly for fingerprinting - I think a small number of websites uses IndexedDB (who even needs it) for actually storing data rather than fingerprinting.
That's why expansion of web standards is wrong. Browser should provide minimal APIs for interacting with device and features like IndexedDB can be implemented as WebAssembly library, leaking no valuable data.
For example, if canvas provided only access to picture buffer, and no drawing routines calling into platform-specific libraries, it would become useless for fingerprinting.
From the sounds of this it sounds like it doesn't persist past browser restart? I think that would significantly reduce the usefulness to attackers.
This excerpt from the article describes the risk well.
> In Firefox Private Browsing mode, the identifier can also persist after all private windows are closed, as long as the Firefox process remains running. In Tor Browser, the stable identifier persists even through the "New Identity" feature, which is designed to be a full reset that clears cookies and browser history and uses new Tor circuits.
This is where you use id bridging.
1. Website fingerprints the browser, stores a cookie with an ID and a fingerprint.
2. During the next session, it fingerprints again and compares with the cookie. If fingerprint changed, notify server about old and new fingerprint.
Many users leave their browsers open for months.
Would it though? I guess state agencies already know all nodes or may know all nodes. When you have a ton of meta-information all cross-linked, they can probably identify people quite accurately; may not even need 100% accuracy at all times and could do with less. I was thinking about that when they used information from any surrounding area or even sniffing through walls (I think? I don't quite recall the article but wasn't there an article like that in the last 3-5 years? The idea is to amass as much information as possible, even if it may not primarily have to do with solely the target user alone; e. g. I would call it "identify via proxy information").
I'm confused.
The IndexedDB UUID is "shared across all origins", so why not use the contents of the database to identify browers, rather than the ordering?
There's an instructive example on the page. Suppose a page creates the databases `a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p`, then queries their order. They might get, for example `g,c,p,a,l,f,n,d,j,b,o,h,e,m,i,k`, based on the global mapping of database names to UUIDs.
The key vulnerability here is that, for the lifetime of that Firefox process, any website that makes that set of databases is going to see the exact same output ordering, no matter what the contents of those databases are. That makes this a fingerprint: it's a stable, high-entropy identifier that persists across time, even if the contents of those databases are not preserved. It is shared even across origins (where the contents would not be), and preserved after website data is deleted -- all a website has to do to re-acquire the fingerprint is recreate the databases with the same names and observe their ordering.
It's the mapping of UUIDs to databases that is shared across origins in the browser. Only the subset of databases associated with an origin are exposed to that origin.
The content is obviously scoped to an origin, or IndexedDB would be a trivial evercookie.
Does Tor Browser still allow JavaScript by default? Because if you block execution of JavaScript, you won't be affected from what I understand.
Disabling JavaScript actually greatly increases your fingerprint as not many users turn it off, so that instantly puts you in a much smaller bucket that you need to be unique in. Yes, not having JS means it limits your options for gathering other details, but it also requires much less effort to be unique now without JS.
Tor Browser also doesn't spoof navigator.platform at all for some reason, so sites can still see when you use Linux, even if the User-Agent is spoofing Windows.
> Disabling JavaScript actually greatly increases your fingerprint as not many users turn it off, so that instantly puts you in a much smaller bucket that you need to be unique in.
I've heard a handful of people say this but are there examples of what I would imagine would have to be server-side fingerprinting and the granularity? Since most fingerprinting I'm aware of is client-side, running via JS. While I expect server-side checks to be limited to things like which resources haven't be loaded by a particular user and anything else normally available via server logs either way, which could limit the pool but I wonder how effective in terms of tracking uniqueness across sites.
I have my problems with that argument. Yes, less identifying bits means a smaller bucket but for the trackers, it also means more uncertainty, doesnt it? So when just a few others without JS join your bucket eg. via a VPN, profiling should become harder.
> increases your fingerprint as not many users turn it off
We're talking about users of the Tor browser, and I'd be very surprised if this was the case (that a majority keep JS turned on)
Basically every Tor guide (heh) tells you to turn it off because it's a huge vector for all types of attacks. Most onion sites have captcha systems that work without JS too which would indicate that they expect a majority to have it disabled.
I would imagine most users of Tor are using Tor Browser. I am reading there was a responsible disclosure to Mozilla but is it me or did that section leave out when the Tor Project planned to respond or release a fixed Tor Browser? Do they like keep very close or is there a large lag?
Tor Browser is always quick to rebase on the latest Firefox ESR. They released an update the next day:
https://blog.torproject.org/new-release-tor-browser-15010/
The best for Tor would just be Links2/Links+ with the socks4a proxy set to 127.0.0.1:9050, enforcing all connection thru a proxy in the settings (mark the checkbox) and disabling cookies altogether.
Would whonix fit that bill?
It seems Qubes OS and Qubes-Whonix are not affected.
In the last ten years has qubes moved on to support more hardware? Every 4 years I would try to use it only to find it didn't support any of my hardware.
No problems on framework laptop that I've run into at least.
We buy off the shelf laptops, not sure anyone ever checked that it can run Qubes specifically before trying to install it (I'm sure of at least one person: myself). Doesn't just about any x64 machine with hardware where drivers are available in standard kernels also work with Qubes? What have you bought that's not supported?
Actually, it should work indeed, unless it lacks some Linux drivers or VT-d.
Most hardware (especially GPUs) is hard to virtualize in a secure manner, which is the entire point of Qubes. People who use it typically buy compatible hardware.
I would expect that most Qubes users (including myself) do not virtualize GPUs and use the CPU to render graphics outside of dom0.
Tested hardware can be found here https://qubes-os.org/hcl. New hardware is being constantly added. If you plan to switch to Qubes, consider buying something from that list or, better, certified, or community-recommended hardware linked there.
How so? If you kept a disposable VM open and just created new identities in tor browser, how does Qubes mitigate the threat here?
On Qubes, you do not create a new identity in the same VM. This would go against the Qubes approach to security/privacy. Using separate VMs for independent tasks is the whole point of using Qubes.
Source?
Different VMs result in different identifiers.
> For developers, this is a useful reminder that privacy bugs do not always come from direct access to identifying data. Sometimes they come from deterministic exposure of internal implementation details.
> For security and product stakeholders, the key point is simple: even an API that appears harmless can become a cross-site tracking vector if it leaks stable process-level state.
This reads almost LLM-ish. The article on the whole does not appear so, but parts of it do.
Well that sucks. I guess in the long run we need a new engine and different approach. Someone should call the OpenBSD guys to come up with working ideas here.
> Mozilla has quickly released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220.
Did you even read the article at all? Ah my children did bad in school, time to replace them with new children and a different spouse. This is what you're suggesting essentially. A browser is not just something you simply make out of thin air. There's decades of nuance to browser engines, and I'm only thinking of the HTML nuances, not the CSS or JS nuances.
Given the dangers of JS and WASM they could just fork Netsurf and enhance the CSS3 support. If you are a journalist, running Tor with JS and tons of modern web tech enable makes you a bright white spot in a sea of darkness.
Here you go: https://qubes-os.org.