hngopher.com

       [HN Gopher] Protecting your email address via SVG instead of Jav...
       ___________________________________________________________________
        
       Protecting your email address via SVG instead of JavaScript
        
       Author : FrostKiwi
       Score  : 256 points
       Date   : 2024-05-13 07:33 UTC (15 hours ago)
        
 (HTM) web link (rouninmedia.github.io)
 (TXT) w3m dump (rouninmedia.github.io)
        
       | cwillu wrote:
       | Email is still plain-text within an xml document referenced in
       | the page source.
        
         | _joel wrote:
         | The idea being that spam bots don't parse SVGs looking for
         | email addresses, just the page HTML. I'm not sure how effective
         | this really is with modern spam protection, however.
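A minimal Python sketch of the naive harvester being described, which only regex-scans raw text. The page markup, the `contact.svg` filename and the address are hypothetical, and real harvesters vary. The point is only that an address living solely in a separately fetched SVG never appears in the HTML being scanned.

```python
import re

# Deliberately simple email pattern; real harvesters use many variants.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page: the address lives only in contact.svg, referenced via <object>.
PAGE_HTML = """
<p>Contact me:</p>
<object data="contact.svg" type="image/svg+xml" aria-label="Email us!"></object>
"""

# Hypothetical contents of contact.svg.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg">
  <a href="mailto:someone@example.com"><text x="0" y="15">someone@example.com</text></a>
</svg>"""

def harvest(text):
    """Return every email-like string found in raw text."""
    return set(EMAIL_RE.findall(text))

print(harvest(PAGE_HTML))  # set(): nothing to find in the HTML alone
print(harvest(SVG_TEXT))   # {'someone@example.com'} once the SVG is fetched
```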
        
           | turboturbo wrote:
           | The idea also seems to be that spam bots don't look for
           | `href="mailto:something"` in the DOM
        
             | rrr_oh_man wrote:
             | That seems surprising, tbh
        
             | edave64 wrote:
             | The mailto is inside the SVG, not the HTML document. So
             | that's not "also" it's the same idea of bots not looking at
             | the svg at all
        
         | majestic5762 wrote:
         | yeah, useless stuff portrayed as smart
        
         | shanehoban wrote:
         | Try to query it though via document.querySelectorAll('a') for
         | example. It's a good first line of defense as a lot of scraping
         | techniques do this approach.
         | 
         | However, if you have a headless browser setup for scraping, and
         | simply fetch the current URL while on the page[0], you can get
         | the plain text, and do a regex search for email addresses which
         | will get you the email address - albeit this is a strange
         | approach to take I admit.
         | 
         | [0]: fetch('./').then((res) => res.text()).then((text) =>
         | console.log(text))
        
           | nolok wrote:
           | > It's a good first line of defense as a lot of scraping
           | techniques do this approach.
           | 
           | Most basic scrapers (the ones that are not for your testing
           | or devtools or automation or ...) actually use basic text,
           | without any interpretation. They grep the source code; they
           | don't run a DOM and JavaScript engine, because that is a major
           | difference in computing needs and speed.
           | 
           | I am not saying there is no evil scraper doing DOM
           | evaluation, there are tons. I am reacting to your "FIRST line
           | of defense": the first line was scrambling the raw text, which
           | is why we got there.
           | 
           | What the parent is saying is that this tries to upgrade the
           | defense we built against the evolved threat, but it forgets
           | why we got there and thus makes itself vulnerable to the
           | original threat.
        
             | cqqxo4zV46cp wrote:
             | If they're saying it, I think that they're wrong. One of
             | those naively written scrapers won't pick up an email
             | address 'protected' in this way. It's simply continuing the
             | game of cat and mouse.
        
             | animuchan wrote:
             | Absolutely. The basic tools just fetch sites recursively
             | and use regular expressions. The advanced tools are
             | Chromium-based, so will render SVGs just fine (and then
             | potentially run OCR / AI to extract text even from JPEGs).
             | 
             | This technique protects from a "neither here nor there"
             | subset of programs; I wonder how large that set is in
             | practice.
        
           | nkozyra wrote:
           | You can just query for all the image elements and then read
           | any svg using the document model.
           | 
           | This is trivial to overcome for most basic scrapers and not
           | much harder even if you try to obfuscate with paths for more
           | sophisticated ones.
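A sketch of reading the SVG through its document model with Python's standard `xml.etree.ElementTree`, assuming the SVG text has already been fetched by following the page's `<object>` or `<img>` reference (the markup below is hypothetical). Whatever the visible label says, the link target is sitting in the `href`.

```python
import xml.etree.ElementTree as ET

SVG_A = "{http://www.w3.org/2000/svg}a"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def mailto_links(svg_text):
    """Return the addresses behind mailto: links in an SVG document."""
    root = ET.fromstring(svg_text)
    found = []
    for a in root.iter(SVG_A):
        # SVG 2 uses a plain href attribute; SVG 1.1 used xlink:href.
        href = a.get("href") or a.get(XLINK_HREF) or ""
        if href.startswith("mailto:"):
            found.append(href[len("mailto:"):])
    return found

svg = """<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <a xlink:href="mailto:someone@example.com"><text x="0" y="15">Email us!</text></a>
</svg>"""
print(mailto_links(svg))  # ['someone@example.com']
```

Drawing the visible text as `<path>` outlines would not change this, since a clickable link still needs its `mailto:` target in the markup.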
        
       | throwaway11460 wrote:
       | Don't have time to test myself right now - what about
       | accessibility, can a screen reader read it?
        
         | Operyl wrote:
         | Given the entire bottom section, it seems like accessibility
         | was taken into account here.
        
           | throwaway11460 wrote:
           | Unfortunately then I think it won't help at all - going
           | through the accessibility tree is a standard web crawling
           | play.
        
         | gostsamo wrote:
         | I tested it and it seems accessible on the live demo. I'm not
         | sure it is as protected as the author claims, though, but it
         | might throw some bots for a spin.
        
           | rrr_oh_man wrote:
           | Man, I've always wondered how to test apps with a (simulated)
           | screen reader, but never got too far.
        
             | throwaway11460 wrote:
             | I use this: https://chromewebstore.google.com/detail/aria-
             | devtools/dneem...
             | 
             | Not sure about desktop apps.
        
             | gostsamo wrote:
             | My secret is that I'm not simulating. Being blind forces
             | you into it. :D
             | 
             | For testing purposes, the NVDA screen reader is free and
             | open source. I'm not sure if there is a driver for it that
             | gives API access to what it would output, but it might be
             | a fun project to try for a11y testing purposes.
        
           | dylan604 wrote:
           | > but it might throw some bots for a spin.
           | 
           | Until some bot dev sees this, accepts the challenge, and
           | solves it with a function in their package that never needs
           | updating again, because it is now done. So live it up while
           | it is not solved. After that, just shrug your shoulders at
           | yet another idea that is no longer useful.
        
             | gostsamo wrote:
             | The key in this case is that this is not a problem for me
             | even if someone implements such a protection.
             | 
             | The rest is mice and traps.
        
       | SahAssar wrote:
       | If concealing it in an object tag works, then you could just have
       | the object tag show it as plain text or HTML, right? Not sure why
       | it's an SVG.
        
         | juped wrote:
         | probably because the scraper has "that's an image, skip it"
         | logic
        
       | okasaki wrote:
       | Is it still necessary to obfuscate email addresses? Mine isn't
       | and I get around 50 generic spam emails per month to gmail.
        
         | ale42 wrote:
         | I think that nowadays most spam lists come from data breaches
         | and address-collecting malware. It's cheaper than running a bot
         | to scan the web for addresses. We get spam on addresses that
         | were never published online.
        
           | RaoulP wrote:
           | I think so too. And I think the majority of data breaches
           | that have led to spam for me are from ages ago, from random
           | services I signed up for as a teenager.
           | 
           | For a few years after that I did the "+" Gmail alias thing,
           | to try to filter and catch companies. But I realised that's
           | easy and obvious to strip, so it wasn't worth the effort
           | (although I have caught PayPal leaking my email somehow).
        
             | ale42 wrote:
             | If you self-host your email, you can use "." as a delimiter
             | instead of the "+". People would already need to know they
             | can strip that part...
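To illustrate why a "+" tag is "easy and obvious to strip" while a custom delimiter is not, here is a minimal sketch of the kind of normalization a list cleaner could apply. The addresses are hypothetical, and how real list cleaners behave is not known here. "+" is a widely recognized subaddress separator, whereas a "." could just as well be part of an ordinary first.last name.

```python
def strip_plus_tag(address):
    """Remove a '+tag' subaddress from the local part, e.g. a+b@x -> a@x."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    base = local.split("+", 1)[0]  # keep everything before the first '+'
    return f"{base}@{domain}"

print(strip_plus_tag("me+paypal@gmail.com"))     # me@gmail.com
# A '.'-delimited tag looks like an ordinary first.last name,
# so a generic rule like this one leaves it untouched.
print(strip_plus_tag("me.paypal@example.com"))   # me.paypal@example.com
```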
        
               | RaoulP wrote:
               | Sounds good! I might go even further and just use a
               | custom address for each service, i.e. paypal@example.com
               | or something.
               | 
               | But self-hosting email is an adventure I'm nervous to
               | embark on.
        
               | nobody9999 wrote:
               | >Sounds good! I might go even further and just use a
               | custom address for each service, i.e. paypal@example.com
               | or something.
               | 
               | Which is exactly what I do. As soon as I see spam sent to
               | any particular email address, I know who it is that
               | leaked the address and I can block it without issue.
               | 
               | >But self-hosting email is an adventure I'm nervous to
               | embark on.
               | 
               | Why are you nervous about it? I've been doing so for
               | decades and haven't had many issues at all. There are a
               | bunch of all-in-one solutions like mailinabox[0] (I roll
               | my own, but as I said, I've been doing this for decades)
               | and others which would likely make things simpler for
               | you. Go for it! You won't be disappointed.
               | 
               | [0] https://en.wikipedia.org/wiki/Mail-in-a-Box
        
               | samatman wrote:
               | Anecdotally, sending mail to example.com from
               | example@mydomain.com can cause a whole host of human-
               | factors problems which can be eliminated with something
               | like RaoulPtoExample@mydomain.com.
        
         | martyvis wrote:
         | Is that all? I get around 70 genuine spam emails to my Gmail
         | account every day now (all detected correctly by Gmail).
        
           | tempestn wrote:
           | I get a similar volume, and gmail likely detects almost all
           | of them. Problem is, it also falsely detects the occasional
           | non-spam message, so I do need to periodically scan through
           | the spam box, which is a bit of a pain when it contains
           | hundreds or thousands of emails.
        
         | RaoulP wrote:
         | I think this is a valid question. I see lots of effort at
         | obfuscation but don't know if there's still a need.
         | 
         | I barely get spam and have a bigger issue with false positives
         | in my spam folder. On the other hand I don't think there are
         | many pages on the web that display my email address, so I'm
         | curious about others' experience.
        
         | sitzkrieg wrote:
         | it isn't, but people like to make a problem of it with elaborate
         | what-ifs
        
       | fp64 wrote:
       | I don't get it, I can just curl the svg and grep for mailto?
        
         | rany_ wrote:
           | Yes, but these scraper bots aren't that sophisticated.
        
           | winternewt wrote:
           | But they will be as soon as this sees widespread use.
        
             | _joel wrote:
             | it won't be widespread imho, not when you share your email
             | address with other parties that then lose/sell your
             | details. Fastmail-like 'temporal' email addresses could
             | help, however.
        
           | amsterdorn wrote:
           | Querying DOM nodes is inherently more complicated than a
           | regex on unparsed HTML.
        
           | fp64 wrote:
           | Crawling every link, now including SVGs, and grepping for
           | 'mailto:' does not sound super sophisticated?
           | 
           |     wget --recursive --quiet $BASE_URL && grep -roh 'mailto:\([^"]*\)'
           | 
           | That works on the example and just prints the email.
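The same crawl-and-grep idea as a hedged Python sketch: a harvester only has to treat `<object data>`, `<img src>` and `<embed src>` as URLs worth fetching. To stay self-contained it walks an in-memory dict standing in for a site (hypothetical URLs and contents) instead of making HTTP requests, and it has no depth limit or same-origin check.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

MAILTO_RE = re.compile(r"mailto:([^\"'\s>]+)")

# (tag, attribute) pairs whose values are treated as URLs to fetch.
LINK_ATTRS = {("a", "href"), ("object", "data"), ("img", "src"), ("embed", "src")}

class RefCollector(HTMLParser):
    """Collect URLs referenced by the tag/attribute pairs in LINK_ATTRS."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in LINK_ATTRS and value:
                self.refs.append(value)

def crawl(start, fetch):
    """Breadth-first crawl from `start`, grepping every fetched body for mailto:.

    `fetch(url)` returns the body as a string, or None if unavailable.
    """
    seen, queue, emails = set(), deque([start]), set()
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        emails.update(MAILTO_RE.findall(body))  # the "grep" step
        collector = RefCollector()
        collector.feed(body)
        queue.extend(urljoin(url, ref) for ref in collector.refs
                     if not ref.startswith("mailto:"))
    return emails

# Hypothetical two-file site held in memory.
SITE = {
    "https://example.org/":
        '<object data="contact.svg" type="image/svg+xml"></object>',
    "https://example.org/contact.svg":
        '<svg xmlns="http://www.w3.org/2000/svg">'
        '<a href="mailto:someone@example.com"><text>Email us!</text></a></svg>',
}

print(crawl("https://example.org/", SITE.get))  # {'someone@example.com'}
```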
        
             | planede wrote:
             | I think the idea is that email scraper bots typically don't
             | bother downloading images referenced by <img> tags.
        
       | magnat wrote:
       | > even when a human visitor has their JavaScript turned off, the
       | email address displayed on the page remains usable
       | 
       | NoScript on Firefox with default settings doesn't render <object>
       | tags (it replaces them with placeholders), so this technique
       | doesn't work here.
       | 
       | https://imgur.com/2tCAgAf
        
         | Laaas wrote:
         | uBlock Origin can block JS too FWIW. There's a convenient
         | button for it in the extended menu.
        
           | brettermeier wrote:
           | Thank you, didn't know that!
        
         | jaeh wrote:
         | it's the same in chromium.
        
         | yau8edq12i wrote:
         | That's a different thing, though. Not sure why you'd make this
         | point.
        
       | dannyobrien wrote:
       | I would like to push back on the idea that you should obfuscate
       | your email address at all.
       | 
       | My email address is danny@spesh.com. I get a lot of spam --
       | possibly, since I have been distributing that address
       | deliberately on the web and inadvertently in hacked datadumps, a
       | near maximum amount of spam.
       | 
       | But the benefits of having people easily find a way to contact me
       | directly has for me far outweighed the (largely solved) challenge
       | of discarding automated spam.
       | 
       | Publish your email address! It's okay! Very little bad will
       | happen, and people will be able to contact you without going through
       | some strange social media intermediary!
        
         | parasti wrote:
         | This is appropriate advice for the average HN reader. For
         | everyone else, probably not. I've seen first hand otherwise
         | intelligent people being unable to discern an obvious (to me)
         | online scam from a legitimate business. These are the people
         | spammers are targeting. These are the people that need to
         | obfuscate their email address.
        
           | _joel wrote:
           | So you're saying the same people unable to discern a spam
           | email know how to embed a mailto: link in an XML document
           | and write webpages. Ok.
        
             | parasti wrote:
             | Never said that. I'm a web developer. People ask me to add
             | their emails to web pages. Comment quality on here seems to
             | have taken a dive.
        
           | richrichardsson wrote:
           | Even sophisticated users can slip up in the right
           | circumstances.
           | 
           | Personal anecdote: one morning, whilst still quite sleepy, I
           | received a _very_ well-crafted Namecheap phishing expedition.
           | I half knew the product they were claiming had lapsed was
           | actually fine, but I had just recently renewed, so I thought
           | perhaps there had been a problem I missed, and it was
           | convincing enough that I clicked the link before doing the
           | normal sanity checks. Thankfully the address it went to didn't
           | resolve. Hopefully I would have noticed the obviously
           | incorrect URL before I entered any details, and I have 2FA
           | enabled, but still, I should and do know better; it was just
           | perfect timing for a well-crafted attack...
        
         | SushiHippie wrote:
         | > in hacked datadumps
         | 
         | https://haveibeenpwned.com/
         | 
         | 45 data breaches and 7 pastes
         | 
         | Wow, I don't know if I've ever seen a real address in so many
         | breaches haha
        
       | cyptus wrote:
       | there is a quite big stackoverflow discussion about ideas how to
       | protect your email on your website:
       | https://stackoverflow.com/q/163628/1216595
        
         | zigzag312 wrote:
         | Sadly, Stack Overflow closed the discussion, even though the
         | discussion is both interesting and valuable.
        
       | karol wrote:
       | Spam filters work in 2024.
       | 
       | Does the fact that someone independently discovers Gauss's method
       | to sum up all the numbers 1...100 today make it worth sharing?
       | 
       | My point is that this is a primitive and easy to break workaround
       | and better methods exist.
        
       | geuis wrote:
       | Why? What's the point?
       | 
       | All you're doing is making it slightly more difficult for the
       | people who want to contact you to do so.
       | 
       | OCR has been a thing for years.
       | 
       | Just put your email out there. That's what spam filters are for.
       | 
       | charles@geuis.com. There. Scrape it. Spam it. I don't care.
       | 
       | Edit:
       | 
       | Yes, thank you for signing me up for the DNC (already a member),
       | some random Trump org, something about Scientology, and another
       | random christian-based website. Honestly, I'm kind of sad at the
       | lack of originality given the otherwise extremely ingenious
       | community we have here.
        
         | Maxatar wrote:
         | But you just proved the point. You might not care to be signed
         | up for some random Trump org, Scientology, or whatever, but
         | other people do care and if you want to author a website that
         | responsibly uses people's emails without subjecting them to
         | unnecessary spam, then it's worth taking these techniques (not
         | necessarily this specific one) into consideration.
         | 
         | While OCR does exist it's incredibly expensive compared to text
         | scraping. The main way to combat spam is to make the cost of
         | spamming more expensive than the benefit.
        
       | ceving wrote:
       | It does not work if you change the font-size.
        
       | brap wrote:
       | I've been using the same gmail address for like 20 years.
       | 
       | I don't think I got a single spam email in the last 5-10 years.
       | 
       | SMS, on the other hand...
        
         | rvnx wrote:
         | A couple of modern spammers send you spam from Gmail and say "I
         | included my colleague in CC please hit 'reply all' if you are
         | interested"
        
       | Etheryte wrote:
       | While the specific claim made about copying is true (you can
       | right-click and select "copy email address"), simply selecting the
       | text and copying does not work. The same goes for select-all and
       | copy, so all in all, I wouldn't expect a regular user to be able
       | to successfully copy this.
        
       | miki123211 wrote:
       | While there's nothing stopping this technique from being
       | accessible in principle, the example given in the article is a
       | really bad one.
       | 
       | The article uses "Email us!" as the label on the svg and a
       | elements, which effectively hides the actual email address from
       | screen readers. Using aria labels in this way is a really bad
       | practice, a screen reader user should have the same experience as
       | anybody else unless there's a very good reason to do otherwise,
       | and if you think your reason is a good reason, you're probably
       | wrong.
       | 
       | The proper way to do this would be to put the actual email
       | address in the labels.
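One way that suggestion could look, as a sketch that generates the markup in Python. The structure (an SVG `<title>` plus an `aria-label` on the `<a>`) is an assumption for illustration, not the article's exact markup; the point is that the accessible name carries the address itself rather than "Email us!".

```python
from xml.sax.saxutils import escape, quoteattr

def email_svg(address):
    """Build a small SVG link whose title and label expose the real address."""
    text = escape(address)                  # escaped for element content
    href = quoteattr(f"mailto:{address}")   # quoted, escaped attribute value
    label = quoteattr(address)
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="240" height="24">'
        f"<title>{text}</title>"
        f"<a href={href} aria-label={label}>"
        f'<text x="0" y="18">{text}</text>'
        "</a></svg>"
    )

print(email_svg("someone@example.com"))
```

As the replies point out, this does not change what a DOM-reading scraper sees, since the `href` already contains the address.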
        
         | 47282847 wrote:
         | Isn't the whole point of the exercise to not have the document
         | contain the email address in a (machine-)readable format?
        
           | Doe-_ wrote:
           | The email address wouldn't be in the document directly, only
           | in the SVG. Whether the title of the SVG contains "Email us"
           | or the email address wouldn't affect how it works.
           | 
           | If the scraper is searching the DOM rather than simply
           | downloading the webpages, then the email will be found
           | regardless.
        
           | janosdebugs wrote:
           | The NVDA screen reader reads this text as: "This is my email
           | frame link email us." That is by no means equivalent to
           | actually seeing the email address. I found that HTML entity
           | encoding every single character of the link takes care of any
           | spam problem already and is much more accessible.
        
         | matteason wrote:
         | This can also affect voice dictation software like Dragon - if
         | a user says 'Click myemail@mydomain.tld' it won't activate the
         | link as Dragon is expecting 'Click email us', as that's now
         | what the browser exposes as the link text.
         | 
         | That point might be academic anyway, as I'm not sure Dragon
         | would activate a link inside an SVG.
        
       | nloomans wrote:
       | I tested the example using the TalkBack screenreader on Android.
       | With Firefox I was able to select and click on the link, but it
       | did not announce the email address. With Chromium it completely
       | ignored the existence of the SVG email. I was unable to select it
       | and it was like the email wasn't there at all.
       | 
       | So yeah, I wouldn't call this accessible.
        
       | yreg wrote:
       | > Email addresses published on webpages usually need to be
       | protected from email-harvesting spambots.
       | 
       | Do they though?
       | 
       | I have had my email address published on my website in a <a
       | href="mailto:... for like 20 years and I don't get spam that
       | would get through the spam filter.
       | 
       | I use both Gmail and (for some other addresses) a webmail hosted
       | by a local company which uses some other filter. Both work well,
       | so it's not something only Google can do.
        
         | xyst wrote:
         | this used to be a problem in the early 00s. I don't think spam
         | filtering was as good back then so protecting your public email
         | from spam was necessary.
         | 
         | Also, this was a time when mailboxes were often allocated 10-25
         | megabytes, so spam bots could easily flood your email.
        
           | WirelessGigabit wrote:
           | When I signed up for Hotmail it was 2MB.
           | 
           | Then, on April 1st, 2004, Google launched (and it wasn't an
           | April Fools' joke...) Gmail with 1GB! I remember getting a
           | beta invite and inviting others.
        
         | zufallsheld wrote:
         | I host my own Mailserver and all addresses that are publicly
         | visible get spam, e.g. my blog or my mail that was visible on
         | github.
        
         | nozzlegear wrote:
         | Same here, I've had my email plainly visible on my website in
         | mailto links and on Github, and I don't get any spam that
         | breaks through Fastmail's spam filters.
        
         | digging wrote:
         | My preference is to not have my email harvested at all when
         | possible, even if I don't personally see the spam emails. (I'm
         | not saying it's a critical privacy/security issue, but a
         | preference.)
        
           | jakubmazanec wrote:
           | So then you never use your email, right?
        
             | digging wrote:
             | What?
        
             | r-w wrote:
             | I think they're obliquely referring to the scanning
             | practices of major providers like Gmail, which most people
             | use to filter their spam.
        
         | adrianpike wrote:
         | I've also had my email posted in mailto's in a half dozen
         | places for... a long time. I remember in the early 00's when
         | I'd cargo cult the old "type the whole email out as adrian at
         | adrianpike dot com" thing on forums thinking it would work as
         | some mystical talisman, and it turns out considering emails to
         | be secret isn't worth the time.
        
         | a_random_canuck wrote:
         | They do. My wife lost her 10-year-old Instagram account to a
         | well crafted phishing attack against an email she had
         | published...
         | 
         | Instagram/Meta's customer support is absolutely atrocious and
         | disgraceful on this front. They basically treat my wife like
         | she's also a spammer and there's no way to recover the account
         | or undo any of the changes the spammers made.
         | 
         | It's hilarious how they ask you to "appeal" a ban by clicking a
         | single button without giving any chance to rectify what the
         | spammers did to her account. Of course their automated bots
         | just reject your appeal almost instantly. Shameful.
        
           | hoherd wrote:
           | This gave me "Press F to appeal ban" images.
        
           | crtasm wrote:
           | Does her email show up on any leaks on
           | https://haveibeenpwned.com/ ? I'm wondering if not publishing
           | it would have made any difference to receiving phishing
           | messages.
        
           | dgb23 wrote:
           | This could happen to anyone. You're tired or thinking of
           | something else, the attack weirdly aligns and you don't
           | notice it until it's too late.
        
           | chefandy wrote:
           | Would such an attacker be stymied by this? It seems like
           | automated email harvesting wouldn't be a big time saver for
           | any attack that required a well-crafted anything. I don't
           | know anything about that particular attack, though.
        
           | qingcharles wrote:
           | Clicking the appeal button is like a trap to permanently ban
           | your account.
           | 
           | You can get it back by paying off a Meta employee through a
           | site like Swapd. It's either that or get your comment to the
           | front page of HN. Those are the only two customer support
           | channels for Meta or Google.
        
         | qingcharles wrote:
         | I have two people I designed web sites for in the last year and
         | I put both their email addresses in the footer and neither one
         | of their accounts has received a single spam message in all of
         | that time (not even something dropped into the Spam folder).
         | Both sites are popular and have thousands of visitors and get
         | scraped by every search engine and AI bot you can think of.
        
           | r-w wrote:
           | Interesting. Maybe footer emails tend to be support contact
           | addresses rather than personal inboxes. Otherwise I'd find
           | that discrepancy very surprising.
        
         | paradox460 wrote:
         | The practice of email address "obfuscation" feels like a relic
         | of a bygone era, one that was never actually sound in its
         | methodology, but spread. A form of cargo-cultism has kept it
         | alive.
        
           | SoftTalker wrote:
           | Yeah just looking at this, it appears to add about 1K of
           | overhead and at least one additional http request for
           | something that ultimately boils down to a mailto: link, so it
           | can still be scraped, and just adds bloat to your web page.
        
         | crazygringo wrote:
         | Exactly.
         | 
         | I definitely recall in the early 2000's it absolutely _did_
         | lead to spam, and e-mail obfuscation techniques were a real
         | thing that genuinely helped.
         | 
         | But by 2015 or so it didn't matter at all anymore, in my
         | personal experience. It didn't even lead to spam that needed to
         | be filtered. Spammers just stopped looking for e-mails that way.
         | 
         | Which makes perfect sense -- most people don't have their
         | e-mail address listed anywhere online in the first place, but
         | you can _purchase_ gigantic lists of e-mail addresses, which
         | either originate from companies that sell their own user lists,
         | or from people who hacked the companies' servers.
         | 
         | These days if you want to send spam, trawling the web for
         | e-mails makes zero sense. It's practically the least efficient
         | thing you could do.
        
           | r-w wrote:
           | Unless you're the one trying to sell them, in which case
           | that's part of doing business :)
        
           | treflop wrote:
           | I've been having all my email addresses posted plain text
           | since like 2005 and I've signed up on like every website
           | imaginable (my password manager has over 2,000 entries) and
           | I've never had a spam problem, at least on Gmail.
        
         | 4u00u wrote:
         | Very recently, within a day of publishing an email in the footer
         | of a page, I got a phishing email that was not caught by the spam
         | filter and looked very genuine.
        
         | dhosek wrote:
         | My thoughts exactly. On the other hand, an email address I used
         | with Usenet ca 1999-2001 has had a consistent flood of spam. I
         | think most spammers are using the same 20+-year-old list of
         | emails.
         | 
         | The email address on my website doesn't even get stuff that
         | goes to the spam filter. Nothing, nada, zilch.
         | 
         | I do think that there are some mailing lists that get generated
         | by trying to guess emails, brute-forcing gmail addresses by
         | trying dictionary attacks of the FIRSTNAME.LASTNAME variety or
         | 1-10 letters. I get a tiny amount of spam sent to a
         | domain@domain.com address I have, but that's typically on the
         | order of one message a year.
         | 
         | And all else aside, the overall volume of spam email has
         | declined dramatically, even ignoring the effect of the gmail
         | spam filter. I'm guessing that email as a spam vector just
         | doesn't make sense anymore and most of what goes out is a mix
         | of 419 scammers trying to make their quotas and would-be
         | scammers who've been scammed into buying that 20-year-old list
         | of emails.
        
       | janmo wrote:
       | Here is what I do:
       | 
       |     <span class="contact-email">rea<span class="hidden">nospam</span>l@mai<span class="hidden">sjs</span>l.com</span>
       | 
       | I still receive "spam" tho, but it seems they manually collected
       | the email because what I receive are B2B proposals clearly
       | targeted at the topic of my website.
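A sketch of why the decoy spans work against text-level scrapers and why a rendering-aware one gets through. It assumes `.hidden` is styled `display: none` (the CSS isn't shown in the comment). Stripping tags yields the decoy string; a parser that skips hidden elements recovers the real address, much as a browser's CSS-aware `innerText` (unlike `textContent`) omits `display: none` content.

```python
import re
from html.parser import HTMLParser

# Void elements have no end tag, so they must not change the nesting depth.
VOID = {"area", "br", "hr", "img", "input", "link", "meta", "source", "wbr"}

class VisibleText(HTMLParser):
    """Collect text, skipping anything inside an element with class="hidden"."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.hidden_depth or "hidden" in classes:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

def visible_text(markup):
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)

SNIPPET = ('<span class="contact-email">rea<span class="hidden">nospam</span>'
           'l@mai<span class="hidden">sjs</span>l.com</span>')

naive = re.sub(r"<[^>]+>", "", SNIPPET)  # plain tag stripping
print(naive)                  # reanospaml@maisjsl.com (the decoy)
print(visible_text(SNIPPET))  # real@mail.com
```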
        
         | jszymborski wrote:
         | If the scraper uses a headless browser, I think that it might
         | defeat your method. That said, using a headless browser to
         | crawl for emails is relatively expensive so perhaps the spam is
         | not from your site.
        
       | dns_snek wrote:
       | Is there really a point to any of this? It's a fun exercise, but
       | also a complete waste of time if you're actually trying to hide
       | from spammers. You're making a piece of information public by
       | sharing it with the entire world, yet somehow expecting it to
       | only stay accessible to the "good guys".
       | 
       | Unless you change your email address at least monthly, all it
       | takes is for _one_ person or company to share your contact with
       | someone else or enter it into a database/CRM, or _one_ service
       | to get breached, then your email address is on a list that
       | eventually gets propagated to every spammer worldwide. If you use
       | that email with any regularity, the chance of those things
       | happening can be rounded up to 100%.
       | 
       | If hiding your email address from scrapers actually worked, spam
       | wouldn't exist. I never published my personal contact anywhere,
       | yet I get dozens of spam emails per week. They all get filtered
       | as spam, it's not a big deal.
        
       | muzster wrote:
       | A heavily guarded fortress would suggest something of value
       | inside, and the big crooks may spend a little more effort. In the
       | age of AI, this becomes even easier.
       |
       |     {
       |       "model" : "gpt-4-turbo",
       |       "messages" : [
       |         {
       |           "role" : "system",
       |           "content" : [ {
       |             "type" : "text",
       |             "text" : "return a json array of all valid emails found in the image."
       |           } ]
       |         },
       |         {
       |           "role" : "user",
       |           "content" : [ {
       |             "type" : "image_url",
       |             "image_url" : {
       |               "url" : "data:image/png;base64,{{ INSERT_BASE64_PNG_DATA }}"
       |             }
       |           } ]
       |         }
       |       ],
       |       "temperature" : 0.5,
       |       "max_tokens" : 2048,
       |       "top_p" : 1.0,
       |       "frequency_penalty" : 0.0,
       |       "presence_penalty" : 0.0
       |     }
       |
       | Edit: Converting a web page to an image is trivial.
        
         | zipping1549 wrote:
         | It won't make sense cost-wise, though.
        
           | omneity wrote:
           | Except the cost is only going down over time
        
         | internetter wrote:
         | We've had OCR for _decades_ before GPT. I suspect GPT might
         | perform _worse_ than OCR. What a waste.
        
           | muzster wrote:
           | Agreed - it's a waste. GPT is not too bad at reading text
           | from images, with the added bonus that you can reason with
           | it.
        
       | hhsectech wrote:
       | Interesting idea...but could a crawler not just incorporate some
       | AI like LLava2 or convert the SVG to a JPG and use OCR to get the
       | email addresses out?
       | 
       | It just seems like this adds a couple of steps to existing
       | crawler scripts.
        
       | mediumsmart wrote:
       | this works if you write it into the html on fullmoon tuesdays :
       | 
       | <a href="&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;&#x73;&#111;
       | &#x6d;&#101;&#x2e;&#100;&#x75;&#100;&#x65;&#64;&#x74;&#104;&#x65;
       | &#46;&#x6f;&#116;&#x68;&#101;&#x72;&#100;&#x75;&#100;&#x65;&#115;
       | &#x2e;&#115;&#x69;&#116;&#x65;">&#115;&#x6f;&#109;&#x65;&#46;
       | &#x64;&#117;&#x64;&#101;&#x40;&#116;&#x68;&#101;&#x2e;&#111;&#x74;
       | &#104;&#x65;&#114;&#x64;&#117;&#x64;&#101;&#x73;&#46;&#x73;&#105;
       | &#x74;&#101;</a>
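
How a string like mediumsmart's is produced: every character is written as a numeric character reference, alternating decimal (`&#109;`) and hex (`&#x61;`) forms. Below is a small sketch that generates the same pattern. The function names are made up for illustration.

```javascript
// Encode each character as a numeric HTML character reference, alternating
// decimal (even positions) and lowercase hex (odd positions).
function toEntities(text) {
  let out = '';
  let i = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0);
    out += i % 2 === 0 ? `&#${code};` : `&#x${code.toString(16)};`;
    i += 1;
  }
  return out;
}

// Hypothetical helper: a mailto link whose href and text are both encoded.
// The alternation restarts for the link text, as in the comment above.
function obfuscatedMailto(address) {
  return `<a href="${toEntities('mailto:' + address)}">${toEntities(address)}</a>`;
}

console.log(obfuscatedMailto('some.dude@the.otherdudes.site'));
```

With that address, the output should match the markup in the comment. Note that no literal `@` or `mailto` survives in the source, which is the whole point of the trick.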
        
         | kevin_thibedeau wrote:
         | That works for humans. There's no reason to believe bots aren't
         | handling entity parsing.
        
           | robszumski wrote:
           | In my experience they haven't been in the past, but LLMs
           | change the game by doing it by default.
        
         | rishikeshs wrote:
         | how does this work
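
To answer rishikeshs: the browser's HTML parser decodes each numeric character reference back into a character before the link is used. The visitor therefore sees and clicks a normal address. This is also why kevin_thibedeau's point holds: decoding takes a few lines. The sketch below handles only numeric references (not named ones like `&amp;`).

```javascript
// Decode numeric character references (&#109; or &#x6d;) into characters,
// roughly what an HTML parser does for these forms.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, ref) =>
    String.fromCodePoint(
      /^x/i.test(ref) ? parseInt(ref.slice(1), 16) : parseInt(ref, 10)
    )
  );
}

// The link text from mediumsmart's example.
const linkText =
  '&#115;&#x6f;&#109;&#x65;&#46;&#x64;&#117;&#x64;&#101;&#x40;' +
  '&#116;&#x68;&#101;&#x2e;&#111;&#x74;&#104;&#x65;&#114;&#x64;&#117;' +
  '&#x64;&#101;&#x73;&#46;&#x73;&#105;&#x74;&#101;';

console.log(decodeNumericEntities(linkText)); // some.dude@the.otherdudes.site
```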
        
       | xyst wrote:
       | Kind of neat but I would rather just have a "throwaway" email if
       | I was sharing globally.
       | 
       | In my case, I set up an email alias with a Sieve rule (if an email
       | is sent to the alias, move it to a "public inquiry" folder). Before
       | that rule runs, SpamAssassin takes care of the nontechnical folks
       | who couldn't be bothered to run their spam campaign through
       | SpamAssassin testers, or who wouldn't even know how to set up their
       | domain for sending email (SPF, DKIM, DMARC, ...).
        
       | throwaway598 wrote:
       | My domain: 24 years registered to me. A .com.
       | 
       | My email address: Listed at the top of the front page. In an H3
       | tag.
       | 
       | This email address's spam problem: Not a problem. 15ish per day
       | get to me including Junk folder. Thanks Purelymail.
       | 
       | What is a problem: Transactional email unrelated to transactions,
       | Promotional email which is newsletter junk spam, Social networks
       | complaining of not being used.
        
         | zufallsheld wrote:
         | 15 spam mails a day seems like a lot to me. I've blacklisted
         | addresses for less.
        
         | SoftTalker wrote:
         | > Social networks complaining of not being used
         | 
         | This is my biggest one. I get more spam from Facebook begging
         | me to log in than I do from almost anything else. I haven't
         | used the account in about 7 years, you'd think they'd figure it
         | out.
        
           | kevincox wrote:
           | > you'd think they'd figure it out.
           | 
           | Cost of sending spam: Effectively zero.
           | 
           | Cost of pissing off inactive user: Essentially zero.
           | 
           | Cost of convincing inactive user to come back: Positive.
           | 
           | Add in a bunch of other factors like some product manager
           | twisting stats to make it look like they are getting users
           | back even if they really aren't and you see why it happens.
        
       | emayljames wrote:
       | a much easier way is to convert the email address into html
       | entities. It then displays and can be copied, but the actual
       | source code doesn't contain the email address as plain text.
        
       | seanvelasco wrote:
       | i bought a premium .app domain a few months ago. not published
       | on any website yet. no history of previous owners. just the fact
       | that it's listed as a premium domain on registrars.
       | 
       | first emails I received after the gmail welcome email were b2b
       | sales from construction companies (i'm not in this field),
       | shopify optimizations (i don't run one), agencies suggesting how
       | i improve the ui/ux of my site (no website yet).
       | 
       | thankfully, they're all in the spam folder. i'm using google
       | workspace.
       | 
       | i believe these spammers get their leads on newly-registered
       | domains. so, how do we protect ourselves from that?
        
         | hu3 wrote:
         | I believe the only effective protection against these fresh
         | domain spammers is what you did: a pretty good anti-spam
         | mechanism such as Gmail's.
        
       | franky47 wrote:
       | Ironically, the only spam I receive these days comes from the
       | address I used here for the "Who wants to be hired" threads.
        
       | zaxomi wrote:
       | Cool.
       | 
       | 1 hour later.
       | 
       | Spam-scraper updated to support this.
        
         | mrbluecoat wrote:
         | Exactly
        
       | ChrisMarshallNY wrote:
       | That's a pretty cool trick.
       | 
       | I was not aware that we could embed CSS in SVG.
        
       | iforgotmysocks wrote:
       | I just have a simple contact page that sends messages to a
       | Discord webhook.
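
A rough sketch of the server side of such a contact page. It assumes a Discord webhook URL (the one below is a placeholder) and a runtime with the global `fetch`, such as modern browsers or Node 18+. Discord webhooks accept a JSON body with a `content` field, capped at 2000 characters. Setting `allowed_mentions: { parse: [] }` stops a visitor's text from pinging `@everyone`. Keep the real URL server-side, since anyone who has it can post to the channel.

```javascript
// Placeholder: the real webhook URL should stay on the server.
const WEBHOOK_URL = 'https://discord.com/api/webhooks/ID/TOKEN';

// Discord limits a message's "content" field to 2000 characters.
const MAX_CONTENT = 2000;

// Build the JSON body for the webhook from the form fields.
function buildPayload(fromEmail, message) {
  const text = `Contact form message from ${fromEmail}:\n${message}`;
  return {
    content: text.slice(0, MAX_CONTENT),
    // Don't let visitors trigger @everyone, role, or user pings.
    allowed_mentions: { parse: [] },
  };
}

// Send the message; throws if Discord rejects it.
async function relay(fromEmail, message) {
  const res = await fetch(WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildPayload(fromEmail, message)),
  });
  if (!res.ok) throw new Error(`Webhook returned ${res.status}`);
}
```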
        
       | dxs wrote:
       | This is fun [2008]:
       | https://web.archive.org/web/20180908103745/http://techblog.t...
       | 
       | "Nine ways to obfuscate e-mail addresses compared
       | 
       | "When displaying an e-mail address on a website you obviously
       | want to obfuscate it to avoid it getting harvested by spammers.
       | But which obfuscation method is the best one? I drove a test to
       | find out."
        
       | cantSpellSober wrote:
       | Can't be copied and pasted.
       | 
       | It's _your_ domain, why not just have  "contact@example.com" for
       | incoming mail instead?
       | 
       | (Novel approach, thanks for sharing!)
        
       | kees99 wrote:
       | Not only is "protecting your email" pointless, as others have
       | already pointed out, it's actively harmful.
       | 
       | There are a fair few sites, where most all content is perfectly
       | readable without JS, except things like "1920x1080@60Hz" are
       | displayed as literal "[email protected]" text.
        
         | digging wrote:
         | > There are a fair few sites, where most all content is
         | perfectly readable without JS, except things like
         | "1920x1080@60Hz" are displayed as literal "[email protected]"
         | text.
         | 
         | Do you have one on hand? That sounds absurd and I've never seen
         | it
        
           | tentacleuno wrote:
           | Mastodon instances fronted by Cloudflare (with Email
           | Protection on) are good examples.
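
For context on kees99's complaint: Cloudflare's Email Address Obfuscation replaces the address with the "[email protected]" placeholder. It also emits a hex string (in a `data-cfemail` attribute), which a Cloudflare script decodes. As commonly described, the first byte is a key, and every following byte is one character XORed with that key. That also shows how little the obfuscation costs a scraper. The sketch below follows that description; treat the format as an assumption rather than a documented API.

```javascript
// Decode a data-cfemail-style hex string: first byte = XOR key,
// each following byte = one character XORed with the key.
function cfDecode(hex) {
  const key = parseInt(hex.slice(0, 2), 16);
  let out = '';
  for (let i = 2; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ key);
  }
  return out;
}

// Inverse, for testing: encode an ASCII address with a chosen one-byte key.
function cfEncode(email, key) {
  const byte = (n) => n.toString(16).padStart(2, '0');
  let out = byte(key);
  for (const ch of email) out += byte(ch.charCodeAt(0) ^ key);
  return out;
}

console.log(cfDecode('42230220')); // a@b
```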
        
       | helsinkiandrew wrote:
       | Don't modern spam filters filter out most mail received this way?
       | And don't most spammers purchase lists for specific targets
       | (homeowners, porn users, dentists, etc.) rather than blindly
       | scraping the web?
        
       | dartos wrote:
       | Idk, LLM-powered scraping can pull the email out of this without
       | any issue.
        
         | stkdump wrote:
         | It even uses the exact same syntax as in html, so as long as
         | svg content isn't specifically excluded, normal web scraping
         | would just work without modification.
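
To illustrate stkdump's point: a regex that pulls `mailto:` targets out of HTML works unchanged on SVG, because the link markup is the same. Older SVG files may write `xlink:href`, which this pattern also matches. This is a minimal sketch; the function name and sample SVG are illustrative.

```javascript
// Collect unique mailto targets from any markup, HTML or SVG alike.
// Stops at a quote or at "?" so query parameters like ?subject= are dropped.
function extractMailtos(source) {
  const found = new Set();
  for (const m of source.matchAll(/href\s*=\s*["']mailto:([^"'?]+)/gi)) {
    found.add(m[1]);
  }
  return [...found];
}

const svg =
  '<svg xmlns="http://www.w3.org/2000/svg">' +
  '<a href="mailto:me@example.com"><text y="16">me@example.com</text></a></svg>';

console.log(extractMailtos(svg)); // [ 'me@example.com' ]
```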
        
         | judge2020 wrote:
         | Perhaps, but I think OCR is more likely.
        
       | portaouflop wrote:
       | Maybe I'm too stupid but I don't get why you would want to do
       | this at all. Had my email in plaintext on the website for ages
       | and never had an issue with spam...
        
       | robbyiq999 wrote:
       | How about posting two email addresses, a hidden one and the
       | actual one, and using the hidden one to filter mail to the actual
       | one?
        
         | JohnFen wrote:
         | This has been my approach since the mid '90s. It works very
         | well.
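
A minimal sketch of the decoy-address filter robbyiq999 describes, assuming the decoy is hidden from human visitors (e.g. with CSS) so only scrapers find it. Anyone who mails the decoy is presumed to be a harvester, so their mail to the real address is dropped too. Both addresses and the message shape are placeholders.

```javascript
// Placeholder addresses: the hidden decoy and the real, visible one.
const DECOY = 'decoy@example.com';
const REAL = 'me@example.com';

// messages: array of { from, to } objects.
function filterInbox(messages) {
  // Senders who fell into the trap.
  const trapped = new Set(
    messages.filter((m) => m.to === DECOY).map((m) => m.from)
  );
  // Keep mail to the real address from senders who never hit the trap.
  return messages.filter((m) => m.to === REAL && !trapped.has(m.from));
}
```

This batch version assumes you can see every message at once. A live filter would update the trapped set as decoy mail arrives. Sender addresses are also easy to forge, so keying on something like the sending IP or domain might work better.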
        
       | butz wrote:
       | I assume that nowadays emails are pulled directly from hacked
       | mailbox contacts list. Nobody has the time to go through each
       | individual website and collect emails one by one.
        
         | Closi wrote:
         | I assume that emails are pulled from every method available.
        
         | Tagbert wrote:
         | No body. Web crawler bots.
        
       | donatj wrote:
       | A friend of mine is an absolute wizard and has been building
       | essentially "responsive images" as SVGs with JS inside. They
       | adapt to their size programmatically. It's... interesting.
       | 
       | The fact that SVGs can even have JS embedded feels both untapped
       | and kind of dangerous.
        
         | soperj wrote:
         | SVGs are responsive out of the box? I'm confused about what the
         | Javascript would be doing to help that situation within the
         | svg.
        
           | asynchronous wrote:
           | I think they're talking about dynamically actually changing
           | the image itself, not just resizing
        
         | johnny99k wrote:
         | This has been known in the security community for quite some
         | time.
        
         | alemanek wrote:
         | That sounds super interesting. Does your friend have a GitHub
         | or site that shows what they're doing on that front? If so,
         | could you post a link?
         | 
         | This is super far out of my wheelhouse technically as a backend
         | engineer but it sounds really cool.
        
       | replete wrote:
       | <a href="{rewritten by js}">domain.com</a>
       |
       | a::before { content: "username@" }
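
Here is a sketch of the script half of replete's trick. It makes two assumptions: the link carries a hypothetical `contact` class, and the username lives in a `USER` constant. The HTML holds only the domain and the CSS only `username@`. The script joins them into the `href` after the page loads. One trade-off: text generated by `::before` usually can't be selected and copied, so visitors may end up copying just the domain.

```javascript
// Hypothetical: the username part, kept out of the HTML.
const USER = 'username';

// Join the two halves into a mailto URL.
function buildMailto(user, domain) {
  return `mailto:${user}@${domain}`;
}

// Browser-only: patch the placeholder href once the page is parsed.
// The guard lets the file load outside a browser, e.g. in Node for testing.
if (typeof document !== 'undefined') {
  document.querySelectorAll('a.contact').forEach((a) => {
    a.href = buildMailto(USER, a.textContent.trim());
  });
}
```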
        
       | CM30 wrote:
       | I think the main thing people forget with stuff like this is that
       | yes, all these setups are possible (or even trivial) to bypass,
       | but you're not really dealing with a dedicated adversary that's
       | targeting you in particular.
       | 
       | Spammers probably aren't going to update their tools to take into
       | account every possible way every site obfuscates their email
       | addresses, so the main trick to dealing with them would be to do
       | something other sites/services don't. If you or your company
       | become successful enough that people are actually targeting you
       | in particular, then congrats, you're probably in a good place
       | anyway.
        
         | cmiller1 wrote:
         | > Spammers probably aren't going to update their tools to take
         | into account every possible way every site obfuscates their
         | email addresses
         | 
         | But this is also sort of a security-through-obscurity approach;
         | if enough people adopt one of these methods of obfuscation, then
         | the spammers absolutely will change their tools.
        
       | sircastor wrote:
       | I think I get more unsolicited email from related businesses
       | trying to get a foot in the door with my company - I assume
       | they're connecting dots either from LinkedIn or Github (probably
       | both). This is an interesting solution to the problem, but I
       | don't genuinely think that anyone is scraping websites for email
       | addresses anymore. I don't think it's cost effective for the
       | modern spammer.
        
       | readmemyrights wrote:
       | Funny I'm seeing this now. I've finally taken the first tentative
       | steps into making a website, and noticed that pandoc has an
       | --email-obfuscation option, so the whole topic was on my mind. I
       | don't remember the last time I received an actual spam email (not
       | counting desperate marketers trying to remind me of that one
       | website I tried ages ago). Funnily enough, the new frontier seems
       | to be WhatsApp and SMS of all things. A month or two back I got
       | a job offer on WhatsApp from an Indonesian phone number, and
       | then something similar directly by SMS. I didn't publish my
       | phone number anywhere online; the closest thing to making it
       | public was joining my college's WhatsApp group and giving my
       | phone number to a bank for a student credit card, and honestly I
       | wouldn't put it past either of them to leak it to some spam agency.
       | 
       | I'm using VoiceOver with Chromium on macOS and I have the same
       | experience as the NVDA user, although if I interact with the
       | "link" I'll eventually find the email. If I weren't aware of the
       | obfuscation, however, I'd probably just think the webpage was
       | weird, saying "this is an email" but actually giving a mailto:
       | link. In general, if you're doing something special to improve
       | accessibility, odds are you're doing it wrong, and if it's
       | anything web related the odds are at least 90%. Most
       | accessibility issues on the internet come from developers trying
       | to be smart by using ARIA labels or such, which usually just make
       | it worse. The example I have to deal with most often is the
       | manpages on man.openbsd.org. All of their cross references to
       | other manpages say something like "openssl, section 1" instead of
       | "openssl(1)", which is what's displayed on the screen and what the
       | browser's find command sees while searching.
       | 
       | For completeness, I also tried the page with various terminal
       | browsers, specifically lynx, felinks, w3m, and edbrowse. None,
       | and I mean NONE, of them could display the SVG properly; they
       | couldn't even recognise it as an image.
        
       | _blk wrote:
       | Seems like a great solution, but I'd like to embed the data
       | directly rather than linking an external file. The issue I see
       | then is that dumb scrapers just look for the email address
       | anywhere in the page, including in an embedded SVG (which they
       | might not do for external <object> or <img> files). So for
       | direct embeds, if the string is not otherwise encoded, that
       | could leak the email address.
       | 
       | While this obviously (re)introduces JS into the mix, how would a
       | simple compressed string fare against base64 svg embedding?
       | 
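       | ```
       | // Placeholder: the SVG, gzip-compressed and then base64-encoded.
       | const compressedBase64Svg = '...';
       |
       | // Assumes gzip; DecompressionStream is built into modern browsers.
       | async function decompressAndInsertSVG(encodedData) {
       |   const bytes = Uint8Array.from(atob(encodedData), (c) => c.charCodeAt(0));
       |   const stream = new Blob([bytes]).stream()
       |     .pipeThrough(new DecompressionStream('gzip'));
       |   const svgText = await new Response(stream).text();
       |   document.getElementById('svgContainer').innerHTML = svgText;
       | }
       |
       | decompressAndInsertSVG(compressedBase64Svg);
       | ```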
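
On _blk's question, one way to produce a `compressedBase64Svg` string is a build step in Node.js using the built-in `zlib` module. This sketch assumes gzip, which a browser-side `DecompressionStream('gzip')` can undo. The SVG and address are placeholders. Plain base64 already hides the literal address from a plain-text regex, since the base64 alphabet has no `@`. Compression mainly changes the size, and for an SVG this small the gzip header and trailer may eat most of any savings. It's worth printing the lengths rather than assuming.

```javascript
// Build-time sketch (Node.js): gzip an SVG and base64-encode the result.
const zlib = require('zlib');

function encodeSvg(svgText) {
  return zlib.gzipSync(Buffer.from(svgText, 'utf8')).toString('base64');
}

// Inverse, useful for checking the output before shipping it.
function decodeSvg(encoded) {
  return zlib.gunzipSync(Buffer.from(encoded, 'base64')).toString('utf8');
}

// Placeholder SVG with a placeholder address.
const svg =
  '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">' +
  '<a href="mailto:me@example.com"><text x="0" y="16">me@example.com</text></a></svg>';

const gz = encodeSvg(svg);
const plain = Buffer.from(svg, 'utf8').toString('base64');

// Compare sizes: raw SVG, plain base64, gzip + base64.
console.log('raw:', svg.length, 'base64:', plain.length, 'gzip+base64:', gz.length);
```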
        
       | nojs wrote:
       | This is a cool trick. The email is in cleartext in the source,
       | meaning mailto works and copy-paste works. But most scrapers
       | probably skip the .svg file.
        
         | pdonis wrote:
         | _> most scrapers probably skip the .svg file_
         | 
         | But they won't as soon as they realize it's just easy-to-parse
         | text that contains the data they're looking for.
        
       | CodeWriter23 wrote:
       | Seems kind of easy to defeat: just read the SVG to extract the
       | email address from the mailto: link contained therein. Bonus: the
       | harvesting bots will now download all SVG files going forward.
        
       | kindawinda wrote:
       | google might start indexing your email
        
       | saint-loup wrote:
       | At that point, isn't adding a good old contact form a simpler
       | solution? You can link it with your email address or other
       | channels. It can even work with static websites; I hooked mine
       | up with Nextcloud Forms.
       |
       | I appreciate the hacker creativity on display here, but as others
       | have said, obfuscating an email address raises accessibility
       | issues. Hiding content from some programs and not others (spam
       | bots vs. assistive technologies) seems inherently a losing game,
       | for you or for your users.
        
       | kelnos wrote:
       | I gave up on this sort of thing. Spam filters are good enough
       | nowadays that I don't think I see an increase in spam by having
       | my email address publicly available without obfuscation. (That
       | is, an increase beyond other spam sources, like crappy companies
       | who have my email address for a legitimate purpose, but sell it
       | to third parties.) In general I see less than 1 spam email hit my
       | inbox per day, and that's fine.
       | 
       | Granted, this may depend on email provider and spam filter, so
       | YMMV, but it hasn't been an issue for me.
        
       | niutech wrote:
       | This requires loading an external SVG file; better to use an
       | inline version:
       |
       |     <object data="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2F
       |     www.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%20200%2024%22%3E
       |     %3Ca%20href%3D%22mailto%3Amyemail%40mydomain.tld%22%3E
       |     %3Ctext%20x%3D%2250%25%22%20y%3D%2250%25%22
       |     %20dominant-baseline%3D%22middle%22%20text-anchor%3D%22middle%22%3E
       |     myemail%40mydomain.tld%3C%2Ftext%3E%3C%2Fa%3E%3C%2Fsvg%3E"
       |     type="image/svg+xml"></object>
       | 
       | Also have a look at this:
       | https://spencermortensen.com/articles/email-obfuscation/
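
If you'd rather keep the SVG readable in your own source and generate niutech's encoded form from it, here is a sketch. The function names are illustrative. For this SVG, `encodeURIComponent` yields the same escaping as niutech's snippet; for example, `@` becomes `%40`. The address still sits in the page as `myemail%40mydomain.tld`, so a scraper that URL-decodes the page before matching would still find it.

```javascript
// Build an SVG data URI for a mailto link. encodeURIComponent escapes
// <, >, quotes, spaces, "=", ":", "/", "@" and "%" as percent-escapes.
function svgDataUri(email) {
  const svg =
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">' +
    `<a href="mailto:${email}">` +
    '<text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle">' +
    `${email}</text></a></svg>`;
  return 'data:image/svg+xml,' + encodeURIComponent(svg);
}

// Wrap the data URI in an <object> tag, as in niutech's comment.
function objectTag(email) {
  return `<object data="${svgDataUri(email)}" type="image/svg+xml"></object>`;
}

console.log(objectTag('myemail@mydomain.tld'));
```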
        
       ___________________________________________________________________
       (page generated 2024-05-13 23:01 UTC)