[HN Gopher] What's in email tracking links and pixels?
___________________________________________________________________
What's in email tracking links and pixels?
Author : bengtan
Score : 101 points
Date : 2021-06-09 12:01 UTC (11 hours ago)
(HTM) web link (bengtan.com)
(TXT) w3m dump (bengtan.com)
| dynm wrote:
| Here's a question... Suppose I'd like to send emails that
| include images. The images are content, I don't care about
| tracking. Is there any way to do that in a way that's privacy
| friendly?
|
| The natural way of doing this would be embedded images. However,
| it seems that many mail clients don't support these.
| (https://www.emaillistvalidation.com/blog/embedded-image-supp...)
|
| Are there any other options? The only other option I can see
| would be to use SVG images and then sort of "compile" the SVG
| into the html source. However, given how email clients have
| limited html support, this doesn't seem workable either...
|
| It's frustrating that these tracking pixels have made genuine
| content images so unreliable.
| colechristensen wrote:
| Gmail proxies images, if you send everybody the same image you
| will get very little information about who is grabbing the
| image and when (i.e. you'll be able to tell when google
| (re)populates the cache which gives some small indication that
| your email is being opened).
| dynm wrote:
| This indeed prevents me from tracking. I should have been
| more clear that my "real" goal is that privacy-sensitive
| readers will be able to see images. I think these people
| won't know that the image isn't unique, and so won't load the
| images.
| legitster wrote:
| Tracking pixels and tracking links only work because there are
| unique identifiers in the URL. So if you just reference the
| image's direct link in the HTML of the email there's really no
| information to be gleaned outside of the normal email server
| handshake.
|
| However, when Google proxies the image in an email, there is no
| way for the user to know the original URL and see if it has a
| unique identifier or not.
| kayodelycaon wrote:
| The email validation page is incorrect (possibly due to being
| out of date). Apple Mail on iPhone can render embedded images
| just like Safari can. I use them in a few personal projects.
| crispyporkbites wrote:
| If it's just one or a handful of emails, or a small image,
| attach the image and use its Content ID to refer to it in the
| HTML of the email:
|
| <img src="cid:some-image-cid" alt="img" />
|
| pretty much all email clients support it
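|
| A minimal sketch of the Content-ID approach using Python's
| standard email package. The addresses, subject, and function
| name are placeholders, and the image bytes are supplied by the
| caller; this is one way to build such a message, not the only
| one.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_message_with_inline_image(image_bytes: bytes) -> EmailMessage:
    """Build a message whose HTML part references an attached image by Content-ID."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image example"   # placeholder subject
    msg["From"] = "sender@example.com"        # placeholder address
    msg["To"] = "recipient@example.com"       # placeholder address
    msg.set_content("This message contains an inline image.")

    # make_msgid() returns "<id@host>"; the cid: URL uses it without the brackets.
    image_cid = make_msgid(domain="example.com")
    msg.add_alternative(
        f'<html><body><img src="cid:{image_cid[1:-1]}" alt="img" /></body></html>',
        subtype="html",
    )

    # Attach the image to the HTML part as a related resource, so no
    # remote request is needed to display it.
    html_part = msg.get_payload()[1]
    html_part.add_related(image_bytes, "image", "png", cid=image_cid)
    return msg
```

| Because the image travels inside the message, the reader's
| client does not need to contact any server to show it.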
| dynm wrote:
| Thanks! Do you think this might increase the odds of the
| email going to spam? (This might be why you mentioned not
| having too many images.)
| austinkhale wrote:
| Per my most recent Substack email, they have 55k+ publications,
| 37M+ posts, and 19M+ users. Interesting.
| zzyzxd wrote:
| This is an interesting read, although there are more tracking
| mechanisms than pixels. Surely you can configure your email
| client not to load remote content automatically, but most
| clients will still leak information through various HTML/CSS
| elements.
|
| A while ago, I used https://www.emailprivacytester.com/ to test
| several famous iOS email clients, and most of them more or less
| leaked _something_, even without loading remote content. In the
| end, I found Fastmail and Apple's built-in iOS mail client to be
| top-notch in terms of privacy (Fastmail leaked nothing except
| their server-side DNS server via DNS prefetch[1][2], which
| has nothing to do with the client. Apple is slightly worse, but still
| far better than any other email clients like Outlook, Spark,
| Edison...)
|
| 1.
| https://www.emailprivacytester.com/testDescription?test=dnsL...
|
| 2.
| https://www.emailprivacytester.com/testDescription?test=dnsA...
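|
| As a rough illustration of where such leaks can hide, here is a
| sketch that lists remote URLs an HTML email could cause a client
| to fetch. The attribute list and the CSS pattern are assumptions
| and far from exhaustive (it ignores protocol-relative DNS
| prefetch hints, for instance); the class and function names are
| made up for this example.

```python
import re
from html.parser import HTMLParser

# Matches http(s) URLs inside CSS url(...) expressions.
CSS_URL = re.compile(r"url\(\s*['\"]?(https?://[^'\")\s]+)", re.IGNORECASE)
REMOTE = ("http://", "https://")


class RemoteResourceFinder(HTMLParser):
    """Collect URLs that a client might fetch without any click."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in ("src", "background", "poster") and value.startswith(REMOTE):
                self.urls.append(value)
            elif tag == "link" and name == "href" and value.startswith(REMOTE):
                self.urls.append(value)
            elif name == "style":
                # Inline styles can reference remote images too.
                self.urls.extend(CSS_URL.findall(value))
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.urls.extend(CSS_URL.findall(data))


def find_remote_resources(html: str) -> list:
    finder = RemoteResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```

| Ordinary <a href> links are deliberately skipped, since they
| are only requested when the reader clicks them.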
| ipaddr wrote:
| Thunderbird by default.
|
| Turning on HTML should be an option used only when really
| needed.
| zzyzxd wrote:
| I think it depends on the software's target user group.
| This is okay, and probably the preferred behavior if your
| users are all tech-savvy. But it is hard to explain to
| non-technical users why this ugly text email is better than that
| email with beautiful pictures, or even what HTML is.
| wizzwizz4 wrote:
| The pictures aren't in the email. The email contains
| instructions saying "phone Steve and ask for the images,
| then put them in this gap", but if your computer follows
| those instructions then Steve knows when you're reading
| your emails, and where.
|
| Who is Steve? Nobody knows, but he's in the "knowing who's
| reading emails and when" business. It's a shady business.
| Don't let your computer phone Steve.
| entropyie wrote:
| My email client / provider leaked only DNS prefetch... nothing
| else... Before I even opened the message! I reckon it was my
| provider, as the IP address reported was wrong for me.
| rectang wrote:
| > _Surely you can configure your email client to not to load
| remote content automatically_
|
| Last time I checked, although I could prevent image loading in
| Gmail for desktop web browser, I could _not_ do so in the Gmail
| iOS app.
| yosito wrote:
| If you're using Gmail, all hope is lost for not being tracked
| anyway.
| wizzwizz4 wrote:
| You can still _reduce_ your tracking... even if the
| companies can still get that information of yours, it's at
| a slightly higher cost to them.
| lprd wrote:
| > Surely you can configure your email client to not to load
| remote content automatically, but most of the clients will
| still leak information in various html/css elements.
|
| I believe MailMate does this by default? I've been using
| MailMate for a little over a year now and I've fallen
| completely in love with it.
|
| https://freron.com/
| defaultuser9 wrote:
| Long time user of MailMate and was just about to ask this! I
| love MailMate for this privacy feature and ability to compose
| in markdown (P.S. - this is also my first HN comment ever)
| OldGoodNewBad wrote:
| Do people load remote images in 2021?
| LeifCarrotson wrote:
| Those that do get counted and optimized for. The rest of us
| might as well not exist.
| jabroni_salad wrote:
| Yeah. Something I did not expect when I became a mail
| administrator was meeting a lot of people who actually read
| those marketing newsletters I spend so much time trying to
| avoid.
|
| I've got a Constant Contact sender (a local chamber of
| commerce) in my tickets right now who sends exclusively
| pictures of text.
| doc_gunthrop wrote:
| That's like asking "do people allow javascript when opening
| webpages in 2021?"
|
| It's common for browser-based email services (such as Gmail) to
| default to loading remote images.
| seedless-sensat wrote:
| My impression is that Gmail prefetches ALL email images, and
| then serves them to the reader via their CDN. (Checking a
| random email in my inbox demonstrates this,
| https://ci3.googleusercontent.com/proxy/...)
|
| As a result, I thought there was no signal for tracking
| pixels? I might be wrong though
| neolog wrote:
| They know when google loads the image, which is when you
| open the email.
| spicybright wrote:
| They only know when google fetches the image, which can
| be any time between you receiving it and opening it. I
| highly doubt it's on the fly right when you open it.
| snowwrestler wrote:
| It is in fact on the fly when you open it.
|
| All Gmail does is proxy the request to hide your IP from
| the server hosting the image file. Gmail does not change
| the timing of the request, the URL, or the image file.
| dheera wrote:
| The default setting in Gmail is to load remote images. You can
| disable it in Settings but 99% of people don't know that.
|
| I really don't think it should be the default setting, but it
| is.
| legitster wrote:
| There's also Litmus, which uses a really advanced set of multiple
| pixels to give data on how long a user is reading an email.
| Presumably, they insert delays into how long it takes to load
| each pixel, and if any of the requests get cancelled they can get
| an idea of how long the email was open for.
|
| The Litmus pixels are usually dropped into another ESP's
| template, so the data you get would be used to supplement the
| normal tracking pixel email.
| cmehdy wrote:
| Is it done with the "loading" attribute[1] for the img tag?
| (i.e. lazy loading)
|
| (In which case I assume it's only useful in some instances,
| since viewports might be of various sizes and there aren't that
| many emails that are long enough[2] to involve much scrolling,
| for example.)
|
| [1] https://developer.mozilla.org/en-US/docs/Web/Performance/Laz...
|
| [2] https://sleeknote.com/blog/ideal-email-length
| anonred wrote:
| Presumably the server just delays the response for x seconds,
| with the assumption that any in-flight network requests are
| cancelled by the email client when the user closes the window
| or app.
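|
| If that guess is right, the timing logic could be sketched as
| below. This is speculation that mirrors the thread, not a
| description of how Litmus actually works; the function name and
| the delay values are invented for illustration.

```python
def estimate_open_seconds(delays, completed):
    """Bound how long an email stayed open.

    delays:    seconds the server waited before answering each pixel request
    completed: matching flags, True if that response was fully delivered
    Returns (lower_bound, upper_bound); upper_bound is None when every
    delayed response completed.
    """
    finished = [d for d, ok in zip(delays, completed) if ok]
    cancelled = [d for d, ok in zip(delays, completed) if not ok]
    lower = max(finished, default=0)       # stayed open at least this long
    upper = min(cancelled, default=None)   # closed before this delay elapsed
    return lower, upper
```

| For example, with delays of 1, 5, 15 and 60 seconds, if only
| the first two responses complete, the email was open for
| somewhere between 5 and 15 seconds.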
| legitster wrote:
| In general, email clients are really, really, really dumb.
| Everything gets loaded at once. So unless it was an HTML
| attribute that was available in the 90s, it's better to
| assume the magic is happening server side.
| bengtan wrote:
| Hi,
|
| Author here.
|
| This investigation into email tracking attempts to deconstruct
| tracking links and pixels and highlight the data that is being
| collected. It covers Mailchimp, ConvertKit, Substack and other
| Mailgun retailers.
|
| There's also some attempted (albeit unsuccessful) reverse-
| engineering of an opaque token in the Substack section (If you
| like reading stuff about reverse-engineering).
|
| Happy to answer any questions.
|
| Thanks.
| verdverm wrote:
| Nice work!
|
| Have you considered what Salesforce, HubSpot, and the like
| have? They use the BCC to record entire email chains and
| users...
| codingdave wrote:
| How does that work? I mean, sure they can BCC an address when
| they send an email, but any replies that I send back won't
| include that BCC?
| verdverm wrote:
| If I reply to your reply, then they see the chain. I can
| only imagine how much insider info they could be holding on
| to.
|
| Also of concern, are you even aware which emails / other
| people have been uploaded to their systems?
| codingdave wrote:
| Ah, gotcha. That makes sense.
|
| As far as insider info, most larger companies I've been
| at use a variety of confidentiality levels for their
| data, the highest of which cannot be emailed or put in
| the cloud. I believe that most corporate governance
| professionals are well aware of the risks and options for
| how to work with such things. But to be fair, your
| average office worker is not, so compliance with such
| policies becomes a cultural and education concern.
| dewey wrote:
| > Have you considered what Salesforce, HubSpot, and the like
| have? They use the BCC to record entire email chains and
| users...
|
| But that's usually done to add "state" to emails so they can
| be tied to one thread in the support system and people can
| reply to either the email chain or via some web interface. I
| don't think you necessarily want to interfere with that.
| verdverm wrote:
| There are privacy considerations on the unknown users side.
| Have I consented to HubSpot (et al) having PII and my email
| contents? (I don't know how this works today, with the
| GDPR, any future privacy laws)
|
| I have more experience with the sales product than the
| support ones, where typically more sensitive information
| can be discussed.
| 02thoeva wrote:
| ConvertKit is a front-end for SendGrid, so possibly they use
| the same format as them?
| OrvalWintermute wrote:
| Appreciated the blog post, I found it very handy in
| understanding the tracking activities.
|
| Looking forward to more!
| blibble wrote:
| If you were a large email service and you really wanted to mess
| with this sort of tracking, could you:
|
| - fetch the images at the point the mail is accepted for delivery
| - cache the result
| - rewrite the URLs transparently in the UI to point to your
| cached copy?
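|
| The rewrite step of that idea might look roughly like this. It
| is a sketch only: the proxy host imgcache.example is made up,
| the dict stands in for a real image store, and a regex is a
| crude substitute for proper HTML parsing.

```python
import hashlib
import re

# Matches <img ... src="http(s)://..."> and captures the URL separately.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def rewrite_image_urls(html, cache, proxy_base="https://imgcache.example/"):
    """Point every remote <img> at a cached copy keyed by the original URL."""

    def replace(match):
        original_url = match.group(2)
        key = hashlib.sha256(original_url.encode("utf-8")).hexdigest()
        # A real service would fetch and store the image bytes here,
        # at delivery time; this sketch only remembers the URL.
        cache.setdefault(key, original_url)
        return f"{match.group(1)}{proxy_base}{key}{match.group(3)}"

    return IMG_SRC.sub(replace, html)
```

| Identical URLs share one cache entry, while per-recipient URLs
| each get their own, which matters for the cost discussion.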
| snowwrestler wrote:
| The majority of emails are never opened. So why would an email
| service greatly increase their complexity and costs by
| downloading images no one would otherwise ever see, storing
| them indefinitely, and rewriting their customers' email
| content. The risk/reward ratio is way off on that.
|
| I wonder how many customers would welcome the feature
| announcement "we are now programmatically altering the content
| of emails you receive through us." Look how well everyone loved
| it when ISPs injected content into unencrypted web pages they
| delivered.
| mike-cardwell wrote:
| > The majority of emails are never opened. So why would an
| email service greatly increase their complexity and costs by
| downloading images no one would otherwise ever see
|
| If gmail and some of the other large providers started doing
| this, people would just stop using tracking pixels because
| they would no longer work. So less stuff for gmail to proxy.
|
| Then emails would only contain "legit" images, which would be
| shared across many emails. e.g, you send 100,000 emails with
| an image that has no tracking information, gmail only needs
| to download it once. And why would a sender choose to serve
| 100,000 copies of the same image from slightly different
| URLs, when they can just serve it up once?
|
| The gains are obvious and would be large if you ask me. The
| scale of the costs, debatable, imo.
| snowwrestler wrote:
| > why would a sender choose to serve 100,000 copies of the
| same image from slightly different URLs, when they can just
| serve it up once?
|
| To provide open tracking, which is a core metric that all
| of their customers demand and rely on.
|
| There is nothing special about a tracking pixel, it's just
| a tiny image file with a personalized URL. Email marketing
| platforms could easily personalize the URLs of other image
| files or even all image files.
|
| The costs are asymmetric. The sender only needs one copy of
| the image file, and a tiny bit of code to map the
| personalized URLs to that file. But the receiving platform
| would have to cache every copy of the image separately
| since they would all have different URLs. Or run some sort of
| deduping scheme across all inboxes and emails, which would
| also be expensive.
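|
| The sender's side of that asymmetry can be sketched in a few
| lines. The /img/<token>/<filename> URL layout, the names, and
| the in-memory list of opens are all assumptions made for the
| example.

```python
from urllib.parse import urlsplit

LOGO_BYTES = b"one shared copy of the image"  # stand-in for the real file


def handle_image_request(url, opens):
    """Serve the same bytes for every personalized URL and record the open.

    Expects paths shaped like /img/<token>/<filename>; returns None otherwise.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "img":
        return None
    token = parts[1]       # identifies the recipient
    opens.append(token)    # the "open" event the sender wants
    return LOGO_BYTES
```

| Every recipient gets a distinct URL, yet the sender stores the
| file once; a receiving cache keyed by URL would store it once
| per recipient.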
| legitster wrote:
| This is already how Gmail handles images.
| mike-cardwell wrote:
| No it's not. Gmail fetches images when you open an email to
| read it. You can test this yourself using
| https://www.emailprivacytester.com.
|
| The only thing Gmail does is hide your IP when it fetches the
| image. It doesn't hide the fact that you've opened the email.
| Which frankly, is the most useful piece of information to the
| tracker.
| legitster wrote:
| The email privacy detector is seeing Google fetch the
| image. In the HTML of the email they sent me, the image URL
| points to a Google proxy link.
|
| On subsequent opens of the email, the detector is not
| seeing the image being requested again.
|
| Unless you were proposing the email server should download
| and proxy ALL images, even before the email is delivered.
| Some anti-spam clients already do a version of this,
| although it should be noted that giving an email sender the
| signal that you are eagerly reading all of their emails may
| produce unintended consequences.
| smbv wrote:
| CyberChef helped me decode the URL:
|
| It was a zlib deflate and a URL-safe Base64 code.
|
| https://gchq.github.io/CyberChef/#recipe=From_Base64('A-Za-z...
|
| Update: Finishing reading the article, someone beat me to this.
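|
| The same two steps in Python, as a sketch. It assumes the token
| is zlib-wrapped deflate data; if the payload turned out to be a
| raw deflate stream, zlib.decompress(data, wbits=-15) would be
| needed instead. The function name is made up.

```python
import base64
import zlib


def decode_token(token: str) -> bytes:
    """Undo URL-safe Base64, then inflate the zlib-compressed payload."""
    # Tokens in URLs often drop the trailing "=" padding; restore it.
    padded = token + "=" * (-len(token) % 4)
    compressed = base64.urlsafe_b64decode(padded)
    return zlib.decompress(compressed)
```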
| SimeVidas wrote:
| Those tracking links are so annoying. They make it hard to see
| where the link is actually going. A newsletter could be linking
| to Wikipedia, but if you open the message in Gmail, there could
| be two or more layers of trackers in that URL.
|
| Example: The Frontend Focus newsletter in Gmail
|
| The link of the first news headline is something like
| https://www.google.com/url?q=https%3A%2F%2Ffrontendfoc.us%2Flink%2F109272%2Fc0daad1d97&sa=D&sntz=1&usg=AFQgCNFEh5TaNZpHqsqyBGWEaq2iL9MwCg
|
| The actual URL is
| https://www.slashgear.com/safari-overhaul-includes-tab-groups-and-web-extensions-on-mobile-07676634/
| boneitis wrote:
| Of course, this does nothing to subvert the tracking services
| like mailchimp that bury the final destination behind their own
| link, but...
|
|     Ctrl+Alt+T
|     $ python3
|     >>> from urllib import parse
|     >>> parse.unquote('https%3A//www.google.com/url%3Fq%3Dhttps%3A//www.example.com/foo.php%3Fp0%3Darg0%26p1%3Darg1%26p2%3Darg2%26yoursoul%3DaXNtaW5l')
|     'https://www.google.com/url?q=https://www.example.com/foo.php?p0=arg0&p1=arg1&p2=arg2&yoursoul=aXNtaW5l'
|     >>> from base64 import b64decode as bd
|     >>> bd(b'aXNtaW5l')
|     b'ismine'
|     >>>
|
| Copy into clipboard: https://www.example.com/foo.php
|
| Ctrl+D
|
| Ctrl+D
|
| Why do I even bother in 2021?
|
| I don't know.
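|
| For Google-style wrappers, where the destination sits in a
| query parameter, a small helper can skip the manual copy and
| paste. A sketch, assuming the parameter is named q as in the
| examples in this thread; unwrap_redirect is a made-up name.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url: str, param: str = "q") -> str:
    """Return the destination carried in a redirect URL's query parameter.

    Falls back to the original URL when the parameter is absent.
    """
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    values = query.get(param)
    return values[0] if values else url
```

| Like the session above, this does nothing about services that
| hide the destination behind an opaque token.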
| dredmorbius wrote:
| URL expanders may also be useful here, where expanding
| encoded URLs isn't sufficient.
|
| I've found https://urlex.org/ useful (top DDG search result).
| You end up with the disambiguated link in most cases
| (Twitter, Bitly, and similar shorteners).
|
| I've not looked to see how many levels of
| redirection/misdirection it will resolve.
| OJFord wrote:
| What does that actually achieve though? You've still given
| an 'opened' hit, even if urlex expands it on the server
| instead of client-side (which would be _truly_ useless).
|
| You're disguising location and device information, but
| that's about it?
| dredmorbius wrote:
| Fair point, though there are some benefits:
|
| - The "hit" comes from the resolver rather than your own
| IP. So long as there's no referrer pass-through of
| personal information, your location is minimised.
|
| - Such links often come through other social media, in my
| case, rather than email. _In the specific case of email
| this practice is of little use in protecting privacy._
| However if you're sanitising links pulled off social
| media shares or the like, you're at least preventing
| downstream contamination.
|
| - Another practice is to randomly scramble any visible
| identifiers. This presumes longer URLs, rather than
| shortened ones.
|
| - In practice, I scrub any "utm-medium" or similar URI
| attributes as a matter of course. URLEX is helpful for
| expanding shortened links ... which I've _not_
| encountered so much in email, though truth be told, I've
| largely abandoned email for numerous reasons, the present
| topic included.
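|
| Scrubbing of that kind can be sketched as follows. It assumes
| the parameters to drop are the ones whose names start with
| "utm_" (utm_source, utm_medium and so on); the function name
| and the prefix list are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_params(url: str, prefixes=("utm_",)) -> str:
    """Drop query parameters whose names start with a tracking prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(prefixes)
    ]
    # Rebuild the URL with only the surviving parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```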
| boneitis wrote:
| At the least, a log hit from a different IP, I suppose.
| They're right, I totally forgot to mention the
| unshortener services, which are what I actually mostly
| use my Python routine (shared upthread) on. It's largely
| for self-amusement, admittedly.
|
| For the Google links, I actually use an extension to
| automatically restore the original URL links.
| professorsnep wrote:
| I just had to deal with an annoying tracking link to
| _unsubscribe_ from an unsolicited mailing list. uBlock even
| blocked the link click, I had to temporarily allow the tracker
| to unsubscribe.
| legitster wrote:
| Unsubscribe links have to have your identifier in them so you
| know who to unsubscribe when you click the link.
|
| We used to ask people for their email address to unsubscribe
| them, but then they accused us of using a dark pattern to
| keep them subscribed. So letting people unsubscribe easier
| with fewer hoops to jump through seems like the lesser of two
| evils.
| twobitshifter wrote:
| Yes and then if you use a content blocker the link will
| inevitably end up blocked.
| dheera wrote:
| PSA: (a) Disable automatic loading of images in Gmail if you
| don't want to be tracked. (b) Don't ever click links from
| e-mails, Google for the content instead.
|
| Settings -> General -> Images -> Ask before displaying external
| images
|
| (I've also been debating sending an auto-reply back to users of
| such e-mail apps (e.g. Superhuman) with an autoresponse to the
| effect of "Due to the use of tracking pixels your e-mail has been
| de-prioritized. If you would like a faster response please send
| me a plain text e-mail" to discourage people from using these
| privacy invasions.)
| reader_1000 wrote:
| One interesting thing I noticed with Linkedin emails is that it
| dynamically fetches unread notification count. For example, if
| someone views your profile, there will be a notification in the
| website. If you go to your mail and open an _old_ Linkedin email
| before you check the notification in the website, you will see a
| little red 1 on the corner of Linkedin logo. Later, if you go to
| website, clear notification, and then open the same email, you
| will see that notification counter is gone. I find it quite
| interesting that Gmail allows this behaviour.
| RussianCow wrote:
| The image is dynamically generated at request time, so there
| isn't much Gmail can do, aside from eagerly preloading all
| images as soon as the email comes in.
| reader_1000 wrote:
| As far as I remember, Gmail used to prefetch images to
| prevent senders learning if and when the recipient opens an
| email, but if this behaviour changed, I didn't know that.
| snowwrestler wrote:
| All Gmail does (or ever did) is proxy the image file so the
| server hosting it cannot do reverse IP lookup to collect
| client metadata like geolocation. The server hosting the
| image sees a Google IP address request the image, not (for
| example) your phone's IP address.
|
| But the image request still happens at the time you open
| the email. Google does not prefetch the images in unopened
| emails.
|
| And if the image URL is personalized, it can still be
| correlated with your email address by the sender to record
| an open. Google does not try to guess which part of the URL
| they can dump without breaking the image.
| have_faith wrote:
| >gmail let's this behaviour
|
| I'm assuming the server is just responding with a different
| image depending on a query param embedded in the image url? (an
| old technique), what should google do? any remote image url
| could respond with a new image in an old email it's just rare
| that it happens.
| reader_1000 wrote:
| It used to prefetch external images [1]. Another option would
| be asking whether to download external images. I think one
| can enable this in settings, default is always display
| external images.
|
| [1] https://arstechnica.com/information-technology/2013/12/gmail...
|
| [2] https://news.ycombinator.com/item?id=6896378
| sergiotapia wrote:
| I love my Hey email because of this. they block tracking with no
| configuration. It's great!
| mike-cardwell wrote:
| "they block tracking"
|
| They block _some_ tracking. What percentage of tracking they
| block is anyone's guess. 99.999%, 90%, 50%, 10%? Who knows.
|
| Also, they don't block targeted tracking, which would be used
| by a stalker for example. They only block widespread well known
| trackers.
|
| Only way to be safe is to disable loading of all remote
| resources and don't click links.
| msoad wrote:
| Do they also block tracking of link clicks?
| polyrand wrote:
| Related to the post, I've enjoyed using the Trocker extension[0].
|
| [0] https://trockerapp.github.io/
| jerrygoyal wrote:
| I wish something like this existed for Mobile also. Seems like
| it's impossible to block trackers from gmail app.
| polyrand wrote:
| I guess your best chance in mobile is blocking automatic
| image loading, at least to avoid tracking pixels.
___________________________________________________________________
(page generated 2021-06-09 23:01 UTC)