[HN Gopher] What's in email tracking links and pixels?
       ___________________________________________________________________
        
       What's in email tracking links and pixels?
        
       Author : bengtan
       Score  : 101 points
       Date   : 2021-06-09 12:01 UTC (11 hours ago)
        
 (HTM) web link (bengtan.com)
 (TXT) w3m dump (bengtan.com)
        
       | dynm wrote:
       | Here's a question... Suppose I'd like to send emails that
       | include images. The images are content, I don't care about
       | tracking. Is there any way to do that in a way that's privacy
       | friendly?
       | 
       | The natural way of doing this would be embedded images. However,
       | it seems that many mail clients don't support these.
       | (https://www.emaillistvalidation.com/blog/embedded-image-supp...)
       | 
       | Are there any other options? The only other option I can see
       | would be to use SVG images and then sort of "compile" the SVG
       | into the html source. However, given how email clients have
       | limited html support, this doesn't seem workable either...
       | 
       | It's frustrating that these tracking pixels have made genuine
       | content images so unreliable.
        
         | colechristensen wrote:
         | Gmail proxies images, if you send everybody the same image you
         | will get very little information about who is grabbing the
         | image and when (i.e. you'll be able to tell when google
         | (re)populates the cache which gives some small indication that
         | your email is being opened).
        
           | dynm wrote:
           | This indeed prevents me from tracking. I should have been
           | more clear that my "real" goal is that privacy-sensitive
           | readers will be able to see images. I think these people
           | won't know that the image isn't unique, and so won't load the
           | images.
        
         | legitster wrote:
         | Tracking pixels and tracking links only work because there are
         | unique identifiers in the URL. So if you just reference the
         | image's direct link in the HTML of the email there's really no
         | information to be gleaned outside of the normal email server
         | handshake.
         | 
         | However, when Google proxies the image in an email, there is no
         | way for the user to know the original URL and see if it has a
         | unique identifier or not.
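The difference between a shared content image and a tracking pixel can be sketched as follows. This is a minimal illustration, not any provider's actual scheme: the host `img.example.com`, the `/open/` path, and both function names are assumptions made up for the example.

```python
import secrets

# Hypothetical image host, used only for illustration.
IMAGE_HOST = "https://img.example.com"


def shared_image_url(filename: str) -> str:
    """Same URL for every recipient: a fetch says nothing about who opened."""
    return f"{IMAGE_HOST}/{filename}"


def tracking_pixel_url(token_to_recipient: dict[str, str], recipient: str) -> str:
    """Unique URL per recipient: the token ties a later fetch to one address."""
    token = secrets.token_urlsafe(16)
    token_to_recipient[token] = recipient  # sender-side lookup table
    return f"{IMAGE_HOST}/open/{token}.gif"


tokens: dict[str, str] = {}
url_a = tracking_pixel_url(tokens, "alice@example.com")
url_b = tracking_pixel_url(tokens, "bob@example.com")
```

Once a proxy rewrites both kinds of URL into its own opaque link, the reader can no longer tell which of the two they are looking at.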
        
         | kayodelycaon wrote:
         | The email validation page is incorrect (possibly due to being
         | out of date). Apple Mail on iPhone can render embedded images
         | just like Safari can. I use them in a few personal projects.
        
         | crispyporkbites wrote:
         | If it's just one or a handful of emails, or a small image,
         | attach the image and use its Content ID to refer to it in the
         | HTML of the email:
         | 
         | <img src="cid:some-image-cid" alt="img" />
         | 
         | pretty much all email clients support it
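A minimal sketch of the Content-ID approach using Python's standard `email` package. The addresses, subject, and the `build_message` name are placeholders invented for the example, and the image bytes are assumed to be a PNG.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_message(png_bytes: bytes) -> EmailMessage:
    """Build an HTML email whose image travels inside the message itself."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image example"
    msg["From"] = "sender@example.com"
    msg["To"] = "reader@example.com"
    msg.set_content("Plain-text fallback for clients that do not render HTML.")

    # make_msgid returns "<id@domain>"; the cid: URL uses it without the brackets.
    cid = make_msgid(domain="example.com")
    html = f'<p>Hello</p><img src="cid:{cid[1:-1]}" alt="img" />'
    msg.add_alternative(html, subtype="html")

    # Attaching to the HTML part turns it into multipart/related,
    # so the image is bound to that HTML body.
    html_part = msg.get_payload()[1]
    html_part.add_related(png_bytes, "image", "png", cid=cid)
    return msg
```

Because the image is carried in the message, rendering it requires no remote request, which is what makes it privacy friendly. The cost is a larger message for every recipient.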
        
           | dynm wrote:
           | Thanks! Do you think this might increase the odds of the
           | email going to spam? (This might be why you mentioned not
           | having too many images.)
        
       | austinkhale wrote:
       | Per my most recent Substack email, they have 55k+ publications,
       | 37M+ posts, and 19M+ users. Interesting.
        
       | zzyzxd wrote:
       | This is an interesting read, although there are more tracking
       | mechanisms than pixels. Surely you can configure your email
       | client not to load remote content automatically, but most of
       | the clients will still leak information in various html/css
       | elements.
       | 
       | A while ago, I used https://www.emailprivacytester.com/ to test
       | several famous iOS email clients, and most of them more or less
       | leaked _something_, even without loading remote content. In the
       | end, I found Fastmail and Apple's built-in iOS mail client to be
       | the top-notch in terms of privacy (Fastmail leaked nothing but
       | only their server side DNS server via DNS prefetch[1][2], which
       | has nothing to do with client. Apple is slightly worse, but still
       | far better than any other email clients like Outlook, Spark,
       | Edison...)
       | 
       | 1.
       | https://www.emailprivacytester.com/testDescription?test=dnsL...
       | 
       | 2.
       | https://www.emailprivacytester.com/testDescription?test=dnsA...
        
         | ipaddr wrote:
         | Thunderbird by default.
         | 
         | Turning on HTML should be an option done only when really
         | needed.
        
           | zzyzxd wrote:
           | I think it depends on the software's target user group.
           | This is okay, and probably the preferred behavior if your
           | users are all tech-savvy. But it is hard to explain to
           | non-technical users why this ugly text email is better than
           | that email with beautiful pictures, or even what HTML is.
        
             | wizzwizz4 wrote:
             | The pictures aren't in the email. The email contains
             | instructions saying "phone Steve and ask for the images,
             | then put them in this gap", but if your computer follows
             | those instructions then Steve knows when you're reading
             | your emails, and where.
             | 
             | Who is Steve? Nobody knows, but he's in the "knowing who's
             | reading emails and when" business. It's a shady business.
             | Don't let your computer phone Steve.
        
         | entropyie wrote:
         | My email client / provider leaked only DNS prefetch... nothing
         | else... Before I even opened the message! I reckon it was my
         | provider, as the IP address reported was wrong for me.
        
         | rectang wrote:
         | > _Surely you can configure your email client to not to load
         | remote content automatically_
         | 
         | Last time I checked, although I could prevent image loading in
         | Gmail for desktop web browser, I could _not_ do so in the Gmail
         | iOS app.
        
           | yosito wrote:
           | If you're using Gmail, all hope is lost for not being tracked
           | anyway.
        
             | wizzwizz4 wrote:
             | You can still _reduce_ your tracking... even if the
             | companies can still get that information of yours, it's at
             | a slightly higher cost to them.
        
         | lprd wrote:
         | > Surely you can configure your email client to not to load
         | remote content automatically, but most of the clients will
         | still leak information in various html/css elements.
         | 
         | I believe MailMate does this by default? I've been using
         | MailMate for a little over a year now and I've fallen
         | completely in love with it.
         | 
         | https://freron.com/
        
           | defaultuser9 wrote:
           | Long time user of MailMate and was just about to ask this! I
           | love MailMate for this privacy feature and ability to compose
           | in markdown (P.S. - this is also my first HN comment ever)
        
       | OldGoodNewBad wrote:
       | Do people load remote images in 2021?
        
         | LeifCarrotson wrote:
         | Those that do get counted and optimized for. The rest of us
         | might as well not exist.
        
         | jabroni_salad wrote:
         | Yeah. Something I did not expect when I became a mail
         | administrator was meeting a lot of people who actually read
         | those marketing newsletters I spend so much time trying to
         | avoid.
         | 
         | I've got a constant contact sender (a local chamber of
         | commerce) in my tickets right now who sends exclusively
         | pictures of text.
        
         | doc_gunthrop wrote:
         | That's like asking "do people allow javascript when opening
         | webpages in 2021?"
         | 
         | It's common for browser-based email services (such as Gmail) to
         | default to loading remote images.
        
           | seedless-sensat wrote:
           | My impression is that Gmail prefetches ALL email images, and
           | then serves them to the reader via their CDN. (Checking a
           | random email in my inbox demonstrates this,
           | https://ci3.googleusercontent.com/proxy/...)
           | 
           | As a result, I thought there was no signal for tracking
           | pixels? I might be wrong though
        
             | neolog wrote:
             | They know when google loads the image, which is when you
             | open the email.
        
               | spicybright wrote:
               | They only know when google fetches the image, which can
               | be any time between you receiving it and opening it. I
               | highly doubt it's on the fly right when you open it.
        
               | snowwrestler wrote:
               | It is in fact on the fly when you open it.
               | 
               | All Gmail does is proxy the request to hide your IP from
               | the server hosting the image file. Gmail does not change
               | the timing of the request, the URL, or the image file.
        
         | dheera wrote:
         | The default setting in Gmail is to load remote images. You can
         | disable it in Settings but 99% of people don't know that.
         | 
         | I really don't think it should be the default setting, but it
         | is.
        
       | legitster wrote:
       | There's also Litmus, which uses a really advanced set of multiple
       | pixels to give data on how long a user is reading an email.
       | Presumably, they insert delays into how long it takes to load
       | each pixel, and if any of the requests get cancelled they can get
       | an idea of how long the email was open for.
       | 
       | The Litmus pixels are usually dropped into another ESP's
       | template, so the data you get would be used to supplement the
       | normal tracking pixel email.
        
         | cmehdy wrote:
         | Is it done with the "loading" attribute[1] for the img tag?
         | (i.e. lazy loading)
         | 
         | (in which case I assume it's only useful in some instances,
         | since viewports might be of various sizes and there aren't that
         | many emails that are long enough[2] to involve much scrolling
         | for example.)
         | 
         | [1] https://developer.mozilla.org/en-US/docs/Web/Performance/Laz...
         | 
         | [2] https://sleeknote.com/blog/ideal-email-length
        
           | anonred wrote:
           | Presumably the server just delays the response for x seconds,
           | with the assumption that any in-flight network requests are
           | cancelled by the email client when the user closes the window
           | or app.
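The delayed-response idea can be sketched as a bound on reading time. This illustrates the guess made in the thread and is not Litmus's documented method; the function name and the delay values in the usage note are assumptions.

```python
from typing import Optional


def estimate_open_seconds(
    delays: list[float], completed: list[bool]
) -> tuple[float, Optional[float]]:
    """Bound how long an email stayed open.

    delays[i] is how long the server held pixel i before responding.
    completed[i] is True if the client was still connected at that point.
    Returns (lower_bound, upper_bound); the upper bound is None when
    every pixel completed, since the email may have stayed open longer.
    """
    finished = [d for d, ok in zip(delays, completed) if ok]
    cancelled = [d for d, ok in zip(delays, completed) if not ok]
    lower = max(finished, default=0.0)
    upper = min(cancelled) if cancelled else None
    return lower, upper
```

For example, with pixels held for 2, 5, 10 and 30 seconds, where only the first two completed, the email was open for somewhere between 5 and 10 seconds.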
        
           | legitster wrote:
           | In general, email clients are really, really, really dumb.
           | Everything gets loaded at once. So unless it was an HTML
           | attribute that was available in the 90s, it's better to
           | assume the magic is happening server side.
        
       | bengtan wrote:
       | Hi,
       | 
       | Author here.
       | 
       | This investigation into email tracking attempts to deconstruct
       | tracking links and pixels and highlight the data that is being
       | collected. It covers Mailchimp, ConvertKit, Substack and other
       | Mailgun retailers.
       | 
       | There's also some attempted (albeit unsuccessful) reverse-
       | engineering of an opaque token in the Substack section (If you
       | like reading stuff about reverse-engineering).
       | 
       | Happy to answer any questions.
       | 
       | Thanks.
        
         | verdverm wrote:
         | Nice work!
         | 
         | Have you considered what Salesforce, HubSpot, and the like
         | have? They use the BCC to record entire email chains and
         | users...
        
           | codingdave wrote:
           | How does that work? I mean, sure they can BCC an address when
           | they send an email, but any replies that I send back won't
           | include that BCC?
        
             | verdverm wrote:
             | If I reply to your reply, then they see the chain. I can
             | only imagine how much insider info they could be holding on
             | to.
             | 
             | Also of concern, are you even aware which emails / other
             | people have been uploaded to their systems?
        
               | codingdave wrote:
               | Ah, gotcha. That makes sense.
               | 
               | As far as insider info, most larger companies I've been
               | at use a variety of confidentiality levels for their
               | data, the highest of which cannot be emailed or put in
               | the cloud. I believe that most corporate governance
               | professionals are well aware of the risks and options for
               | how to work with such things. But to be fair, your
               | average office worker is not, so compliance with such
               | policies becomes a cultural and education concern.
        
           | dewey wrote:
           | > Have you considered what Salesforce, HubSpot, and the like
           | have? They use the BCC to record entire email chains and
           | users...
           | 
           | But that's usually done to add "state" to emails so they can
           | be tied to one thread in the support system and people can
           | reply to either the email chain or via some web interface. I
           | don't think you necessarily want to interfere with that.
        
             | verdverm wrote:
             | There are privacy considerations on the unknown users side.
             | Have I consented to HubSpot (et al) having PII and my email
             | contents? (I don't know how this works today, with the
             | GDPR, any future privacy laws)
             | 
             | I have more experience with the sales product than the
             | support ones, where typically more sensitive information
             | can be discussed.
        
         | 02thoeva wrote:
         | ConvertKit is a front-end for Sendgrid, so possibly they use
         | the same format as them?
        
         | OrvalWintermute wrote:
         | Appreciated the blog post, I found it very handy in
         | understanding the tracking activities.
         | 
         | Looking forward to more!
        
       | blibble wrote:
       | if you were a large email service and you really wanted to mess
       | with this sort of tracking could you
       | 
       |   - fetch the images at the point the mail is accepted for
       |     delivery
       |   - cache the result
       |   - rewrite the URLs transparently in the UI to point to your
       |     cached copy
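The three steps above can be sketched as a delivery-time rewrite. This is a simplified illustration: `CACHE_BASE` is a hypothetical host, `fetch` is a caller-supplied download function, and a regular expression stands in for a real HTML parser.

```python
import hashlib
import re

# Hypothetical cache host, used only for illustration.
CACHE_BASE = "https://mailcache.example.net/img/"

# Simplification: matches <img ... src="http(s)://..."> with double quotes only.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def rewrite_images(html: str, fetch) -> tuple[str, dict[str, bytes]]:
    """Fetch each remote image once at delivery time and rewrite its URL."""
    cache: dict[str, bytes] = {}

    def replace(match: re.Match) -> str:
        original_url = match.group(2)
        key = hashlib.sha256(original_url.encode()).hexdigest()
        if key not in cache:
            cache[key] = fetch(original_url)  # happens before anyone reads the mail
        return f"{match.group(1)}{CACHE_BASE}{key}{match.group(3)}"

    return IMG_SRC.sub(replace, html), cache
```

Since the fetch happens for every message whether or not it is ever opened, the sender's logs would show an "open" for all of them, which makes the signal useless for tracking.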
        
         | snowwrestler wrote:
         | The majority of emails are never opened. So why would an email
         | service greatly increase their complexity and costs by
         | downloading images no one would otherwise ever see, storing
         | them indefinitely, and rewriting their customers' email
         | content? The risk/reward ratio is way off on that.
         | 
         | I wonder how many customers would welcome the feature
         | announcement "we are now programmatically altering the content
         | of emails you receive through us." Look how well everyone loved
         | it when ISPs injected content into unencrypted web pages they
         | delivered.
        
           | mike-cardwell wrote:
           | > The majority of emails are never opened. So why would an
           | email service greatly increase their complexity and costs by
           | downloading images no one would otherwise ever see
           | 
           | If gmail and some of the other large providers started doing
           | this, people would just stop using tracking pixels because
           | they would no longer work. So less stuff for gmail to proxy.
           | 
           | Then emails would only contain "legit" images, which would be
           | shared across many emails. e.g, you send 100,000 emails with
           | an image that has no tracking information, gmail only needs
           | to download it once. And why would a sender choose to serve
           | 100,000 copies of the same image from slightly different
           | URLs, when they can just serve it up once?
           | 
           | The gains are obvious and would be large if you ask me. The
           | scale of the costs, debatable, imo.
        
             | snowwrestler wrote:
             | > why would a sender choose to serve 100,000 copies of the
             | same image from slightly different URLs, when they can just
             | serve it up once?
             | 
             | To provide open tracking, which is a core metric that all
             | of their customers demand and rely on.
             | 
             | There is nothing special about a tracking pixel, it's just
             | a tiny image file with a personalized URL. Email marketing
             | platforms could easily personalize the URLs of other image
             | files or even all image files.
             | 
             | The costs are asymmetric. The sender only needs one copy of
             | the image file, and a tiny bit of code to map the
             | personalized URLs to that file. But the receiving platform
             | would have to cache every copy of the image separately
             | since they would all have different URLs. Or run some sort
             | of deduping scheme across all inboxes and emails, which would
             | also be expensive.
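One shape such a deduping scheme could take is a content-addressed cache, sketched below. The class and method names are invented for the example, and this is not a description of what any mail provider does.

```python
import hashlib


class DedupingImageCache:
    """Store each distinct image body once, however many URLs point at it."""

    def __init__(self) -> None:
        self.url_to_digest: dict[str, str] = {}
        self.blobs: dict[str, bytes] = {}

    def store(self, url: str, body: bytes) -> None:
        digest = hashlib.sha256(body).hexdigest()
        self.url_to_digest[url] = digest     # one small entry per personalized URL
        self.blobs.setdefault(digest, body)  # one stored body per distinct image

    def load(self, url: str) -> bytes:
        return self.blobs[self.url_to_digest[url]]
```

This saves storage, but the receiver still has to download every personalized URL once to learn that the bodies are identical, so the bandwidth cost remains.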
        
         | legitster wrote:
         | This is already how Gmail handles images.
        
           | mike-cardwell wrote:
           | No it's not. Gmail fetches images when you open an email to
           | read it. You can test this yourself using
           | https://www.emailprivacytester.com.
           | 
           | The only thing Gmail does is hide your IP when it fetches the
           | image. It doesn't hide the fact that you've opened the email.
           | Which frankly, is the most useful piece of information to the
           | tracker.
        
             | legitster wrote:
             | The email privacy detector is seeing Google fetch the
             | image. In the HTML of the email they sent me, the image URL
             | points to a Google proxy link.
             | 
             | On subsequent opens of the email, the detector is not
             | seeing the image being requested again.
             | 
             | Unless you were proposing the email server should download
             | and proxy ALL images, even before the email is delivered.
             | Some anti-spam clients already do a version of this,
             | although it should be noted that giving an email sender the
             | signal that you are eagerly reading all of their emails may
             | produce unintended consequences.
        
       | smbv wrote:
       | CyberChef helped me decode the URL:
       | 
       | It was a zlib deflate and a URL-safe Base64 code.
       | 
       | https://gchq.github.io/CyberChef/#recipe=From_Base64('A-Za-z...
       | 
       | Update: after finishing the article, I see someone beat me to this.
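The same decoding can be sketched in Python. This assumes the token is zlib-wrapped data (raw deflate without a header would need `zlib.decompress(data, wbits=-15)` instead) and that padding may have been stripped. The function names and the sample payload in the usage note are invented, not taken from the article.

```python
import base64
import zlib


def decode_token(token: str) -> bytes:
    """URL-safe Base64 decode, then zlib-decompress."""
    padded = token + "=" * (-len(token) % 4)  # restore stripped '=' padding
    return zlib.decompress(base64.urlsafe_b64decode(padded))


def encode_token(payload: bytes) -> str:
    """Inverse of decode_token, used here only to build a test token."""
    encoded = base64.urlsafe_b64encode(zlib.compress(payload))
    return encoded.decode("ascii").rstrip("=")
```

Round-tripping a made-up payload such as `b'{"u": "https://example.com/post"}'` through `encode_token` and `decode_token` returns the original bytes.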
        
       | SimeVidas wrote:
       | Those tracking links are so annoying. They make it hard to see
       | where the link is actually going. A newsletter could be linking
       | to Wikipedia, but if you open the message in Gmail, there could
       | be two or more layers of trackers in that URL.
       | 
       | Example: The Frontend Focus newsletter in Gmail
       | 
       | The link of the first news headline is something like
       | https://www.google.com/url?q=https%3A%2F%2Ffrontendfoc.us%2Flink%2F109272%2Fc0daad1d97&sa=D&sntz=1&usg=AFQgCNFEh5TaNZpHqsqyBGWEaq2iL9MwCg
       | 
       | The actual URL is
       | https://www.slashgear.com/safari-overhaul-includes-tab-groups-and-web-extensions-on-mobile-07676634/
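The outer Google layer can be peeled off locally, since the target sits in the `q` query parameter. A minimal sketch (the function name is made up for the example):

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_google_redirect(url: str) -> str:
    """Return the target of a www.google.com/url?q=... link, else the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc == "www.google.com" and parts.path == "/url":
        targets = parse_qs(parts.query).get("q")
        if targets:
            return targets[0]  # parse_qs has already percent-decoded the value
    return url
```

Applied to the link above, this yields the frontendfoc.us address. That inner link is itself a redirect resolved on the newsletter's server, so the final destination cannot be recovered without making a request.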
        
         | boneitis wrote:
         | Of course, this does nothing to subvert the tracking services
         | like mailchimp that bury the final destination behind their own
         | link, but...
         | 
         | Ctrl+Alt+T
         | 
         |     $ python3
         |     >>> from urllib import parse
         |     >>> parse.unquote('https%3A//www.google.com/url%3Fq%3Dhttps%3A//www.example.com/foo.php%3Fp0%3Darg0%26p1%3Darg1%26p2%3Darg2%26yoursoul%3DaXNtaW5l')
         |     'https://www.google.com/url?q=https://www.example.com/foo.php?p0=arg0&p1=arg1&p2=arg2&yoursoul=aXNtaW5l'
         |     >>> from base64 import b64decode as bd
         |     >>> bd(b'aXNtaW5l')
         |     b'ismine'
         |     >>>
         | 
         | Copy into clipboard: https://www.example.com/foo.php
         | 
         | Ctrl+D
         | 
         | Ctrl+D
         | 
         | Why do I even bother in 2021?
         | 
         | I don't know.
        
           | dredmorbius wrote:
           | URL expanders may also be useful here, where expanding
           | encoded URLs isn't sufficient.
           | 
           | I've found https://urlex.org/ useful (top DDG search result).
           | You end up with the disambiguated link in most cases
           | (Twitter, Bitly, and similar shorteners).
           | 
           | I've not looked to see how many levels of
           | redirection/misdirection it will resolve.
        
             | OJFord wrote:
             | What does that actually achieve though? You've still given
             | an 'opened' hit, even if urlex expands it on the server
             | instead of client-side (which would be _truly_ useless).
             | 
             | You're disguising location and device information, but
             | that's about it?
        
               | dredmorbius wrote:
               | Fair point, though there are some benefits:
               | 
               | - The "hit" comes from the resolver rather than your own
               | IP. So long as there's no referrer pass-through of
               | personal information, your location is minimised.
               | 
               | - Such links often come through other social media, in my
               | case, rather than email. _In the specific case of email
               | this practice is of little use in protecting privacy._
               | However if you're sanitising links pulled off social
               | media shares or the like, you're at least preventing
               | downstream contamination.
               | 
               | - Another practice is to randomly scramble any visible
               | identifiers. This presumes longer URLs, rather than
               | shortened ones.
               | 
               | - In practice, I scrub any "utm_medium" or similar URI
               | attributes as a matter of course. URLEX is helpful for
               | expanding shortened links ... which I've _not_
               | encountered so much in email, though truth be told, I've
               | largely abandoned email for numerous reasons, the present
               | topic included.
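Scrubbing campaign parameters can be sketched with the standard library. This assumes the parameters follow the usual `utm_` prefix convention (`utm_source`, `utm_medium`, and so on); the function name is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Drop every query parameter whose name starts with 'utm_'."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

This only removes identifiers that are visible in the query string. It does nothing for opaque tokens embedded in the path of a redirect link.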
        
               | boneitis wrote:
               | At the least, a log hit from a different IP, I suppose.
               | They're right, I totally forgot to mention the
               | unshortener services, which are what I actually mostly
               | use my Python routine (shared upthread) on. It's largely
               | for self-amusement, admittedly.
               | 
               | For the Google links, I actually use an extension to
               | automatically restore the original URL links.
        
         | professorsnep wrote:
         | I just had to deal with an annoying tracking link to
         | _unsubscribe_ from an unsolicited mailing list. uBlock even
         | blocked the link click, I had to temporarily allow the tracker
         | to unsubscribe.
        
           | legitster wrote:
           | Unsubscribe links have to have your identifier in them so you
           | know who to unsubscribe when you click the link.
           | 
           | We used to ask people for their email address to unsubscribe
           | them, but then they accused us of using a dark pattern to
           | keep them subscribed. So letting people unsubscribe easier
           | with fewer hoops to jump through seems like the lesser of two
           | evils.
        
         | twobitshifter wrote:
         | Yes and then if you use a content blocker the link will
         | inevitably end up blocked.
        
       | dheera wrote:
       | PSA: (a) Disable automatic loading of images in Gmail if you
       | don't want to be tracked. (b) Don't ever click links from
       | e-mails, Google for the content instead.
       | 
       | Settings -> General -> Images -> Ask before displaying external
       | images
       | 
       | (I've also been debating sending an auto-reply back to users of
       | such e-mail apps (e.g. Superhuman) with an autoresponse to the
       | effect of "Due to the use of tracking pixels your e-mail has been
       | de-prioritized. If you would like a faster response please send
       | me a plain text e-mail" to discourage people from using these
       | privacy invasions.)
        
       | reader_1000 wrote:
       | One interesting thing I noticed with Linkedin emails is that it
       | dynamically fetches unread notification count. For example, if
       | someone views your profile, there will be a notification in the
       | website. If you go to your mail and open an _old_ Linkedin email
       | before you check the notification in the website, you will see a
       | little red 1 on the corner of Linkedin logo. Later, if you go to
       | website, clear notification, and then open the same email, you
       | will see that notification counter is gone. I find it quite
       | interesting that Gmail allows this behaviour.
        
         | RussianCow wrote:
         | The image is dynamically generated at request time, so there
         | isn't much Gmail can do, aside from eagerly preloading all
         | images as soon as the email comes in.
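A minimal sketch of request-time image selection, with the HTTP layer left out. The class, the per-user token, and the two image bodies are all assumptions for illustration; nothing here is LinkedIn's actual implementation.

```python
class NotificationImageServer:
    """Pick the image body when it is requested, not when the mail was sent."""

    def __init__(self, plain_logo: bytes, badged_logo: bytes) -> None:
        self.plain_logo = plain_logo
        self.badged_logo = badged_logo
        self.unread: dict[str, int] = {}  # per-user token -> unread notifications

    def handle_image_request(self, user_token: str) -> bytes:
        # The same URL in an old email returns whatever is current right now.
        if self.unread.get(user_token, 0) > 0:
            return self.badged_logo
        return self.plain_logo
```

The URL in the email never changes. Only the server's answer does, which is why a proxy that fetches at open time passes the current state straight through.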
        
           | reader_1000 wrote:
           | As far as I remember, Gmail used to prefetch images to
           | prevent senders from learning if and when a recipient opens
           | an email, but if this behaviour changed, I didn't know that.
        
             | snowwrestler wrote:
             | All Gmail does (or ever did) is proxy the image file so the
             | server hosting it cannot do reverse IP lookup to collect
             | client metadata like geolocation. The server hosting the
             | image sees a Google IP address request the image, not (for
             | example) your phone's IP address.
             | 
             | But the image request still happens at the time you open
             | the email. Google does not prefetch the images in unopened
             | emails.
             | 
             | And if the image URL is personalized, it can still be
             | correlated with your email address by the sender to record
             | an open. Google does not try to guess which part of the URL
             | they can dump without breaking the image.
        
         | have_faith wrote:
         | >gmail let's this behaviour
         | 
         | I'm assuming the server is just responding with a different
         | image depending on a query param embedded in the image url? (an
         | old technique), what should google do? any remote image url
         | could respond with a new image in an old email it's just rare
         | that it happens.
        
           | reader_1000 wrote:
           | It used to prefetch external images [1]. Another option would
           | be asking whether to download external images. I think one
           | can enable this in settings, default is always display
           | external images.
           | 
           | [1] https://arstechnica.com/information-technology/2013/12/gmail...
           | 
           | [2] https://news.ycombinator.com/item?id=6896378
        
       | sergiotapia wrote:
       | I love my Hey email because of this. They block tracking with no
       | configuration. It's great!
        
         | mike-cardwell wrote:
         | "they block tracking"
         | 
         | They block _some_ tracking. What percentage of tracking they
         | block is anyone's guess. 99.999%, 90%, 50%, 10%? Who knows.
         | 
         | Also, they don't block targeted tracking, which would be used
         | by a stalker for example. They only block widespread well known
         | trackers.
         | 
         | Only way to be safe is to disable loading of all remote
         | resources and don't click links.
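Why list-based blocking misses targeted tracking can be sketched as follows. This assumes the blocking works by matching image hosts against a list of known trackers, as the comment above suggests; the host names and function name are invented, and this is not Hey's actual implementation.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of well-known tracker domains.
KNOWN_TRACKER_HOSTS = {"tracker.example", "pixels.example"}


def is_blocked(image_url: str) -> bool:
    """True if the image host is a listed tracker domain or one of its subdomains."""
    host = (urlsplit(image_url).hostname or "").lower()
    return any(
        host == tracker or host.endswith("." + tracker)
        for tracker in KNOWN_TRACKER_HOSTS
    )
```

A pixel served from `open.tracker.example` is caught, while one hosted on an individual's own server is on nobody's list and loads normally.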
        
         | msoad wrote:
         | Do they also block tracking of link clicks?
        
       | polyrand wrote:
       | Related to the post, I've enjoyed using the Trocker extension[0].
       | 
       | [0] https://trockerapp.github.io/
        
         | jerrygoyal wrote:
         | I wish something like this existed for Mobile also. Seems like
         | it's impossible to block trackers from gmail app.
        
           | polyrand wrote:
           | I guess your best chance in mobile is blocking automatic
           | image loading, at least to avoid tracking pixels.
        
       ___________________________________________________________________
       (page generated 2021-06-09 23:01 UTC)