https://github.com/WICG/floc/issues/100

Cohort IDs can be collected over time to create cross-site tracking
IDs #100

Open
johnwilander opened this issue Apr 14, 2021 * 6 comments

@johnwilander johnwilander commented Apr 14, 2021 (edited)

In #99, it is stated that "FLoC is not useful for tracking." I don't
think that's accurate.

As far as I know, the user's cohort will not be partitioned per first
party site so multiple sites can observe the cohort ID in sync as it
changes week after week. A hash of the cohorts seen so far will
likely get more and more unique as the weeks go by.

Websites or tracker scripts on websites can expose arrays of the
cohorts they've seen to help all trackers identify the user, like
this:

// Week label -> cohort ID observed on that site during that week.
let cohortCollectionForWebsiteA = {
  "week01_2022": "0666",
  "week03_2022": "A566",
  "week04_2022": "2111",
  "week05_2022": "1171",
  "week07_2022": "749B",
};

let cohortCollectionForWebsiteB = {
  "week01_2022": "0666",
  "week02_2022": "0030",
  "week05_2022": "1171",
  "week06_2022": "7311",
  "week07_2022": "749B",
};

Trackers send these to a server for matching across websites. In the
example above, the two sites saw the same cohort ID in the weeks
[ "week01_2022", "week05_2022", "week07_2022" ].
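
A minimal sketch of that server-side matching step, assuming each
collection is a plain object mapping a week label to the cohort ID
seen that week. The function name matchingWeeks is hypothetical and
only illustrates the idea:

```javascript
// Example data copied from the collections above.
const cohortCollectionForWebsiteA = {
  week01_2022: "0666",
  week03_2022: "A566",
  week04_2022: "2111",
  week05_2022: "1171",
  week07_2022: "749B",
};

const cohortCollectionForWebsiteB = {
  week01_2022: "0666",
  week02_2022: "0030",
  week05_2022: "1171",
  week06_2022: "7311",
  week07_2022: "749B",
};

// Return the weeks in which both sites observed the same cohort ID.
// A week missing from `b` yields undefined, which never equals a
// cohort ID string, so it is filtered out.
function matchingWeeks(a, b) {
  return Object.keys(a)
    .filter((week) => a[week] === b[week])
    .sort();
}

const shared = matchingWeeks(
  cohortCollectionForWebsiteA,
  cohortCollectionForWebsiteB
);
console.log(shared);
// -> [ 'week01_2022', 'week05_2022', 'week07_2022' ]
```

The more weeks that match, the less likely it is that the two
collections belong to different users.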

The cohort collections can be tied to PII on sites that have access
to such information about the user. This would allow a tracker with
just a collection on one site to call a server and get back PII for
that user.

If cohorts were partitioned (maybe they are?), the tracking effort
would take longer, but observed partitioned cohort IDs can be sorted
to potentially create a unique ID across websites. You get something
like snippets of DNA that eventually become unique, as a set, and
trackers will know which cohorts are widespread and which quickly
reduce the search space.

Even if the tracker cannot get to a unique ID for a particular user,
the entropy boost from collected cohort IDs is tremendous and can
easily be combined with existing fingerprinting entropy such as
language settings.
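
To put a rough number on that entropy boost, here is a
back-of-the-envelope sketch. It assumes weekly cohort IDs are
independent and uniformly distributed, which overstates the gain
because a user's cohort is likely correlated from week to week. The
population and cohort counts are hypothetical, not taken from the FLoC
explainer:

```javascript
// Upper bound on identifying bits gained from a sequence of cohort IDs,
// assuming independent, uniformly distributed weekly cohorts.
function cohortEntropyBits(numCohorts, weeksObserved) {
  return weeksObserved * Math.log2(numCohorts);
}

// Expected number of users sharing the same cohort sequence,
// under the same simplifying assumptions.
function expectedAnonymitySet(population, numCohorts, weeksObserved) {
  return population / Math.pow(numCohorts, weeksObserved);
}

// Hypothetical parameters for illustration only.
const population = 1e9;
const numCohorts = 1024;

for (const weeks of [1, 2, 3, 4]) {
  console.log(
    `weeks=${weeks}`,
    `bits=${cohortEntropyBits(numCohorts, weeks)}`,
    `expected users with same sequence=${expectedAnonymitySet(population, numCohorts, weeks)}`
  );
}
```

With these made-up numbers, three weeks of observations already give
30 bits, and the expected number of users sharing a sequence drops
below one. Real-world correlation between weeks would slow this down,
but not change the direction.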

Sorry if I'm missing something in the above analysis or if this was
filed earlier.

Author

@johnwilander johnwilander commented Apr 14, 2021

To take this to the crowd metaphor: Before the pandemic and some time
back, I attended a Mew concert, a Ghost concert, Disney on Ice, and a
Def Leppard concert. At each of those events I was part of a large
crowd. But I bet you I was the only one to attend all four.

Contributor

@dmarti dmarti commented Apr 14, 2021

There is a suggestion to make the cohort "sticky" for a given site,
so that once a site has seen the cohort ID once, it will not see a
different one. ("Longitudinal Privacy" section: d822a35)
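
As a rough illustration of what "sticky" could mean, here is a sketch
of a browser-side store that remembers the first cohort ID each site
saw. The class and method names are hypothetical and not part of any
FLoC proposal, and the clear method reflects an assumption that
removing a site's data also resets its sticky value:

```javascript
// Hypothetical per-site store: the first cohort ID exposed to a site
// is the one it keeps seeing, even after the user's cohort changes.
class StickyCohortStore {
  constructor() {
    this.seenBySite = new Map();
  }

  // Return the cohort ID to expose to `site`, given the user's
  // current cohort. Only the first call per site records a value.
  cohortFor(site, currentCohort) {
    if (!this.seenBySite.has(site)) {
      this.seenBySite.set(site, currentCohort);
    }
    return this.seenBySite.get(site);
  }

  // Assumption: clearing site data also forgets the sticky cohort.
  clear(site) {
    this.seenBySite.delete(site);
  }
}

const store = new StickyCohortStore();
console.log(store.cohortFor("a.example", "0666")); // first visit
console.log(store.cohortFor("a.example", "A566")); // later week, same site
console.log(store.cohortFor("b.example", "A566")); // different site
```

Here a.example keeps seeing its original value, while b.example sees
whatever the cohort was on its own first visit.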

Author

@johnwilander johnwilander commented Apr 14, 2021

    There is a suggestion to make the cohort "sticky" for a given
    site, so that once a site has seen the cohort ID once, it will
    not see a different one. ("Longitudinal Privacy" section:
    d822a35)

Thanks. That's more or less the same analysis.

I don't think updating cohort IDs at different times solves the
problem. See the sorting attack I mentioned.

Making it sticky is interesting. First of all, I assume the website
will not be allowed to delete it so it becomes a persistent "visited"
flag. Second, the sticky cohort ID becomes a persistent
fingerprinting signal per website that carries over even if different
accounts log in to the site or the site tries to clear its state.
Third, sticky cohort IDs could be set up for a small set of bounce
tracking domains and be used to pick up a persistent ID. Finally,
being persistently assigned a cohort for ad targeting purposes can be
really bad for users (see the stories on baby ads after miscarriage
and marriage ads after cancelled wedding) and probably not popular
with advertisers who want "fresh" interest signals to target.

@dmarti dmarti mentioned this issue Apr 14, 2021
Sites recording a user entering and leaving a sensitive category #77
Closed
Contributor

@dmarti dmarti commented Apr 14, 2021

Thank you, good points about how sticky cohort IDs could interact
with other state preserved by a site. (#77 covers the similar issue
of sites being able to observe the timing of when a user joins and
leaves the "null cohort").


@othermaciej othermaciej commented Apr 15, 2021

If cohort IDs are sticky, would the user still be able to delete or
reset their cohort ID, e.g. by deleting website data or clearing
history?

Collaborator

@michaelkleber michaelkleber commented Apr 15, 2021

Hi John,

Right, this is indeed the "Longitudinal privacy" question. We've been
considering a few different mitigations. As you know, this is an
iterative and open process, and we expect to implement one or more of
these solutions in future versions of FLoC. (Remember that
third-party cookies are still around in Chrome, so FLoC-based "slow
fingerprinting" does not pose any tracking risk beyond what 3p
cookies are already offering today.)

 1. There's stickiness, as Don pointed out -- or maybe not permanent
    stickiness, but the cohort changing only slowly on each site. (Of
    course it would still need to be cleared along with any other
    first-party state, for the reasons you mentioned above. That
    would put a person into the has-no-cohort category until the next
    time it would get re-calculated.)

 2. There's the related idea of computing a person's cohort at
    different times on different sites. This isn't the same as
    updating at different times, which I think you were referring to
    above. The idea here is that different sites a person visits
    would see a flock derived from a different time window.

    As @npdoty pointed out in #69, how useful this is depends on how
    different a person's browsing is on different days. Real-world
    data seems like the best way to measure the decrease in
    fingerprintability here.

 3. There's the idea of adding per-site noise to the output of the
    hash function, as mentioned in the original explainer. This is
    mostly a Differential Privacy approach to further address the
    concern about leaking browsing history. But once you're adding
    noise, that noise can vary by which site you're on, so that your
    histories of cohorts-over-time on different sites look pretty
    different from each other.

    This requires measuring the privacy/utility trade-off as you vary
    the amount of noise. If the noise weight is 0, we need to worry
    about the attack you described; if the noise weight is large, it
    drowns out the browsing-based signal entirely, and the cohort ID
    is effectively your per-site random number of the week. The
    question is whether there is a useful value in between.

Those aren't the only possibilities, but they do seem collectively
promising enough to warrant further exploration.
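
For concreteness, the per-site noise idea (item 3) could look roughly
like the sketch below. Everything here is an assumption made for
illustration: the function names, the browserSecret parameter, the use
of integer cohort IDs, and the non-cryptographic FNV-1a hash standing
in for a proper source of randomness. It is not how Chrome implements
or plans to implement FLoC.

```javascript
// 32-bit FNV-1a string hash. Not cryptographically secure; used here
// only to get a deterministic pseudo-random value from a seed string.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// With probability `noiseWeight`, replace the true cohort with a
// value derived only from (browserSecret, site, week). The result is
// stable for a given site and week but differs between sites.
function noisyCohort(trueCohort, site, week, noiseWeight, numCohorts, browserSecret) {
  const seed = `${browserSecret}|${site}|${week}`;
  const coin = fnv1a(`coin|${seed}`) / 2 ** 32; // value in [0, 1)
  if (coin < noiseWeight) {
    return fnv1a(`pick|${seed}`) % numCohorts;
  }
  return trueCohort;
}

// Hypothetical usage: same user and week, two sites, three noise levels.
for (const weight of [0, 0.5, 1]) {
  console.log(
    weight,
    noisyCohort(42, "a.example", "week01_2022", weight, 1024, "local-secret"),
    noisyCohort(42, "b.example", "week01_2022", weight, 1024, "local-secret")
  );
}
```

At weight 0 every site sees the true cohort, which leaves the
collection attack intact. At weight 1 each site sees only its own
pseudo-random number of the week. The open question from the comment
above is whether any value in between keeps enough signal to be
useful.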

4 participants
@dmarti @johnwilander @othermaciej @michaelkleber
