https://github.com/WICG/floc/issues/100

WICG / floc

Cohort IDs can be collected over time to create cross-site tracking IDs #100
Open * johnwilander opened this issue Apr 14, 2021 * 6 comments

johnwilander commented Apr 14, 2021 * edited

In #99, it is stated that "FLoC is not useful for tracking." I don't think that's accurate.
As far as I know, the user's cohort will not be partitioned per first party site so multiple sites can observe the cohort ID in sync as it changes week after week. A hash of the cohorts seen so far will likely get more and more unique as the weeks go by. Websites or tracker scripts on websites can expose arrays of the cohorts they've seen to help all trackers identify the user, like this:

    let cohortCollectionForWebsiteA = [
        "week01_2022" : "0666",
        "week03_2022" : "A566",
        "week04_2022" : "2111",
        "week05_2022" : "1171",
        "week07_2022" : "749B",
    ]

    let cohortCollectionForWebsiteB = [
        "week01_2022" : "0666",
        "week02_2022" : "0030",
        "week05_2022" : "1171",
        "week06_2022" : "7311",
        "week07_2022" : "749B",
    ]

Trackers send these to a server for matching across websites; in the example above, this results in the intersection [ "week01_2022", "week05_2022", "week07_2022" ].

The cohort collections can be tied to PII on sites that have access to such information about the user. This would allow a tracker with just a collection on one site to call a server and get back PII for that user.

If cohorts were partitioned (maybe they are?), the tracking effort would take longer but observed partitioned cohort IDs can be sorted to potentially create a unique ID across websites. You get something like snippets of DNA that eventually become unique, as a set, and trackers will know which cohorts are widespread and which quickly reduce the search space.

Even if the tracker cannot get to a unique ID for a particular user, the entropy boost from collected cohort IDs is tremendous and can easily be combined with existing fingerprinting entropy such as language settings.

Sorry if I'm missing something in the above analysis or if this was filed earlier.
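A minimal Python sketch of the server-side matching step described above. It is an illustration only, not any tracker's actual code: the helper names `shared_observations` and `fingerprint` are hypothetical, the cohort values are copied from the example in the post, and SHA-256 is just one possible choice for "a hash of the cohorts seen so far".

```python
import hashlib

# Observations copied from the example above: week label -> cohort ID.
site_a = {
    "week01_2022": "0666",
    "week03_2022": "A566",
    "week04_2022": "2111",
    "week05_2022": "1171",
    "week07_2022": "749B",
}
site_b = {
    "week01_2022": "0666",
    "week02_2022": "0030",
    "week05_2022": "1171",
    "week06_2022": "7311",
    "week07_2022": "749B",
}


def shared_observations(a, b):
    """Keep the weeks that both sites observed and where the cohort IDs agree."""
    return {week: a[week] for week in sorted(a.keys() & b.keys()) if a[week] == b[week]}


def fingerprint(observations):
    """Hash the (week, cohort) pairs in a canonical order into a short ID.

    Hypothetical helper: sorting first makes the ID independent of the
    order in which the observations were collected.
    """
    canonical = "|".join(f"{week}={cohort}" for week, cohort in sorted(observations.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


shared = shared_observations(site_a, site_b)
print(sorted(shared))  # ['week01_2022', 'week05_2022', 'week07_2022']
print(fingerprint(shared))
```

Every additional week in which both sites observe the same cohort adds one more pair to the hashed input, which is why the resulting ID tends to become more distinctive over time.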
johnwilander (Author) commented Apr 14, 2021

To take this to the crowd metaphor: Before the pandemic and some time back, I attended a Mew concert, a Ghost concert, Disney on Ice, and a Def Leppard concert. At each of those events I was part of a large crowd. But I bet you I was the only one to attend all four.

dmarti (Contributor) commented Apr 14, 2021

There is a suggestion to make the cohort "sticky" for a given site, so that once a site has seen the cohort ID once, it will not see a different one. ("Longitudinal Privacy" section: d822a35)

johnwilander (Author) commented Apr 14, 2021

> There is a suggestion to make the cohort "sticky" for a given site, so that once a site has seen the cohort ID once, it will not see a different one. ("Longitudinal Privacy" section: d822a35)

Thanks. That's more or less the same analysis. I don't think updating cohort IDs at different times solves the problem. See the sorting attack I mentioned.

Making it sticky is interesting. First of all, I assume the website will not be allowed to delete it so it becomes a persistent "visited" flag. Second, the sticky cohort ID becomes a persistent fingerprinting signal per website that carries over even if different accounts log in to the site or the site tries to clear its state. Third, sticky cohort IDs could be set up for a small set of bounce tracking domains and be used to pick up a persistent ID. Finally, being persistently assigned a cohort for ad targeting purposes can be really bad for users (see the stories on baby ads after miscarriage and marriage ads after cancelled wedding) and probably not popular with advertisers who want "fresh" interest signals to target.
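The crowd metaphor earlier in this thread can be illustrated with a toy simulation in Python. This is a rough sketch under a strong simplifying assumption that is not from the thread: every user receives an independent, uniformly random cohort each week. Real cohorts are derived from browsing history and are correlated from week to week, so actual numbers would differ. The function name and all parameter values are hypothetical.

```python
import random


def anonymity_set_sizes(population, cohorts, weeks, seed=0):
    """For user 0, count how many users share their full cohort history after each week."""
    rng = random.Random(seed)
    candidates = set(range(population))  # before any observation, everyone matches
    sizes = []
    for _ in range(weeks):
        # Simplifying assumption: independent, uniform cohort per user per week.
        assignment = [rng.randrange(cohorts) for _ in range(population)]
        target = assignment[0]
        # Keep only users who were in the same cohort as user 0 this week too.
        candidates = {user for user in candidates if assignment[user] == target}
        sizes.append(len(candidates))
    return sizes


sizes = anonymity_set_sizes(population=100_000, cohorts=50, weeks=4)
print(sizes)
```

Under this assumption the crowd that shares the whole history shrinks by roughly a factor equal to the number of cohorts each week, so even large weekly crowds leave very few candidates after a handful of observations.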
dmarti mentioned this issue Apr 14, 2021: Sites recording a user entering and leaving a sensitive category #77 (Closed)

dmarti (Contributor) commented Apr 14, 2021

Thank you, good points about how sticky cohort IDs could interact with other state preserved by a site. (#77 covers the similar issue of sites being able to observe the timing of when a user joins and leaves the "null cohort").

othermaciej commented Apr 15, 2021

If cohort IDs are sticky, would the user still be able to delete/reset their cohort ID, e.g. by deleting website data or clearing history?

michaelkleber (Collaborator) commented Apr 15, 2021

Hi John,

Right, this is indeed the "Longitudinal privacy" question. We've been considering a few different mitigations. As you know, this is an iterative and open process, and we expect to implement one or more of these solutions in future versions of FLoC. (Remember that third-party cookies are still around in Chrome, so FLoC-based "slow fingerprinting" does not pose any tracking risk beyond what 3p cookies are already offering today.)

1. There's stickiness, as Don pointed out -- or maybe not permanent stickiness, but the cohort changing only slowly on each site. (Of course it would still need to be cleared along with any other first-party state, for the reasons you mentioned above. That would put a person into the has-no-cohort category until the next time it would get re-calculated.)

2. There's the related idea of computing a person's cohort at different times on different sites. This isn't the same as updating at different times, which I think you were referring to above. The idea here is that different sites a person visits would see a flock derived from a different time window. As @npdoty pointed out in #69, how useful this is depends on how different a person's browsing is on different days.
   Real-world data seems like the best way to measure the decrease in fingerprintability here.

3. There's the idea of adding per-site noise to the output of the hash function, as mentioned in the original explainer. This is mostly a Differential Privacy approach to further address the concern about leaking browsing history. But once you're adding noise, that noise can vary by which site you're on, so that your history of cohorts-over-time on different sites look pretty different from each other. This requires measuring the privacy/utility trade-off as you vary the amount of noise. If the noise weight is 0, we need to worry about the attack you described; if the noise weight is large, it drowns out the browsing-based signal entirely, and the cohort ID is effectively your per-site random number of the week. The question is whether there is a useful value in between.

Those aren't the only possibilities, but they do seem collectively promising enough to warrant further exploration.

4 participants: @dmarti @johnwilander @othermaciej @michaelkleber
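The noise-weight trade-off in mitigation 3 can be sketched with a toy model in Python. This is not Chrome's algorithm: the `cohort_id` function, the 8-number "signal", the Gaussian noise, and the site names are all hypothetical, chosen only to show the two extremes mentioned above (weight 0 versus a large weight).

```python
import random


def cohort_id(signal, site, week, noise_weight):
    """Toy cohort ID: sign bits of (browsing signal + weighted per-site noise)."""
    # Seeding with the site and week keeps the noise fixed for that site and week.
    rng = random.Random(f"{site}|{week}")
    bits = 0
    for value in signal:
        noisy = value + noise_weight * rng.gauss(0.0, 1.0)
        bits = (bits << 1) | (1 if noisy >= 0 else 0)
    return format(bits, "02x")


# Hypothetical browsing-derived signal, one number per output bit.
signal = [0.9, -0.4, 0.7, -1.2, 0.3, 0.8, -0.6, -0.1]
sites = ["a.example", "b.example", "c.example"]

for weight in (0.0, 0.5, 50.0):
    ids = [cohort_id(signal, site, "week01_2022", weight) for site in sites]
    print(weight, ids)
```

With weight 0.0 every site sees the same ID ("ac" for this signal), which is the situation the attack relies on. With a very large weight the sign bits are determined almost entirely by the per-site noise, so the ID behaves like a per-site random number and carries little browsing signal.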