[HN Gopher] Your e-mail validation logic is wrong
       ___________________________________________________________________
        
       Your e-mail validation logic is wrong
        
       Author : Tomte
       Score  : 240 points
       Date   : 2021-05-24 11:27 UTC (11 hours ago)
        
 (HTM) web link (www.netmeister.org)
 (TXT) w3m dump (www.netmeister.org)
        
       | danrl wrote:
       | I have a very short email address in the format a@b.tld and for
       | my special friends that don't know how to validate correctly I
       | have created abc@ur-email-validation-is-broken.b.tld
       | 
       | I need to use the latter ~5% of the time. Most often I take my
       | business to someone else for the sake of principle.
        
       | anttisalmela wrote:
       | It doesn't really hurt if some more exotic email addresses are
       | not accepted, no one can really use them anyway.
        
         | asddubs wrote:
         | in fact, being stricter than the standard mandates probably
         | catches more mistakes than it blocks deliberately weird
         | addresses
        
         | jiofih wrote:
         | Well, you can't use them because of attitude like yours. You're
         | not only annoying those of us who want to use plus signs or
         | international domains, but ensuring the "exotic" half of the
         | world that doesn't use the Latin alphabet is kinda left out.
        
           | anttisalmela wrote:
           | Non-ascii alphabets need a lot more support than just
           | accepting them. But I wouldn't really consider plus signs
           | exotic anyway.
        
             | jiofih wrote:
             | What does that even mean? We shouldn't support them because
             | it requires more work?
        
         | abrowne wrote:
         | If you are consistent. A couple times I've successfully signed
         | up for something with a username+string@gmail.com address but
         | then have been unable to unsubscribe because my address is
         | "invalid".
        
           | NullPrefix wrote:
           | I kinda recall a lawsuit many years ago about an unsubscribe
           | confirmation email.
        
       | cratermoon wrote:
       | I've decided that the best way to validate email addresses is to
       | not validate them, but require that any signup be finalized by
       | the individual following a link emailed to them.
       | 
       | This allows a person to use any damn thing they want as their
       | email address, provided it works and they can get the email.
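A minimal sketch of that confirm-by-link flow, in Python with only the standard library. It assumes a stateless design in which the emailed link carries an HMAC-signed token holding the address and its issue time. The names make_token and verify_token, the key handling, and the 24-hour expiry are hypothetical choices made for illustration. The commenter did not describe any of them.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

def _b64e(data: bytes) -> str:
    # URL-safe base64 without padding, so the token can sit in a query string.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")

def _b64d(text: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def make_token(email: str, key: bytes, now: Optional[float] = None) -> str:
    """Sign 'address + issue time' so the link can't be forged or altered."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}\n{issued}".encode("utf-8")
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return f"{_b64e(payload)}.{_b64e(sig)}"

def verify_token(token: str, key: bytes, max_age: int = 86400,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None if forged, malformed or expired."""
    try:
        payload_part, sig_part = token.split(".")
        payload, sig = _b64d(payload_part), _b64d(sig_part)
    except ValueError:  # wrong shape, or bad base64 (binascii.Error)
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    email, _, issued = payload.decode("utf-8").rpartition("\n")
    current = time.time() if now is None else now
    if current - int(issued) > max_age:
        return None
    return email
```

The signup handler would email a link containing the token (for example https://example.com/confirm?t=..., a hypothetical URL) and create the account only when verify_token returns an address. The sketch never checks the address's syntax. If the mail arrives and the link is followed, the address works.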
        
         | manmal wrote:
         | My cheap-o approach to this is: Check there's an @, and that
         | there is a dot afterwards. This excludes local domains
         | obviously, but I don't want those anyway.
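The "an @ with a dot after it" check is about as lax as a check can get. Here is a Python sketch under one assumption of ours: it splits on the last @, because a domain can never contain one. The function name is hypothetical. As the comment intends, it rejects dotless local domains. It also places no limit on TLD length.

```python
def looks_like_email(address: str) -> bool:
    """Lax check: there is an '@', and a '.' somewhere after the last '@'.

    Splitting on the *last* '@' keeps a quoted local part such as
    "john@home"@example.com working, since the domain can't contain '@'.
    """
    local, at, domain = address.rpartition("@")
    return at == "@" and "." in domain
```

Everything else is left to the confirmation email. For example, looks_like_email("jane+news@example.co.uk") and looks_like_email("a@b.mobi") both return True, and looks_like_email("jane@localhost") returns False.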
        
         | Arubis wrote:
         | 100% agreed here. Accept a text field; maybe validate that it
         | has an @ in it and a . after the @.
         | 
         | Send that address a confirmation email. Now you've got
         | consensual opt-in and you've somewhat protected yourself from
         | adding a wrong address to your recurring mailing list.
         | 
         | Prevent abuse with long (seconds) delays between submissions
         | from the client. If the user thinks they did it right, they're
         | waiting on their email inbox anyway; if they immediately
         | realize they made a typo, it'll take 2-3s to fix.
         | 
         | The RFCs were written when manually (not from cron) sending
         | email to another user on your local system was a thing that
         | actually happened. I'm certain you actively want to avoid that
         | now.
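Arubis's per-client delay could look roughly like this Python sketch. It assumes the client is identified by something like an IP address or session ID (the comment doesn't say which) and keeps state in process memory. A deployment with several servers would need a shared store, and this dictionary is never pruned. The class name is hypothetical.

```python
import time
from typing import Callable, Dict, Hashable

class SubmissionThrottle:
    """Refuse a submission if the same client submitted less than
    min_interval seconds ago. In-memory, single-process only."""

    def __init__(self, min_interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock  # injectable so the behaviour is easy to test
        self._last: Dict[Hashable, float] = {}

    def allow(self, client_id: Hashable) -> bool:
        now = self.clock()
        last = self._last.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; a refused attempt does not reset the timer
        self._last[client_id] = now
        return True
```

A few seconds is long enough to slow scripted abuse, and short enough that a user who spots a typo barely notices.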
        
           | eli wrote:
           | Yup I've been working in email marketing for a long time and
           | this is what I do if I need a regex. I remember when .mobi
           | TLD came out and people with those addresses had a terrible
           | time signing up for things because a bunch of developers got
           | too cute and assumed a TLD could only be 2 or 3 characters.
           | You want to be really lax in what you validate.
        
         | serial_dev wrote:
         | This is also my preferred approach.
         | 
         | If I can send you an email and you can verify that you have
         | access to that email, your email is "valid enough" for me.
         | 
         | Then, the validation is basically "is there an @ and a dot
         | after it?". Beyond that, I find every hour spent on improving
         | the validation just causes more emails to be falsely flagged as
         | invalid and more support requests from people who couldn't sign
         | up with valid emails. It's also code we need to maintain, and
         | any edit to the validation logic risks breaking sign-ups
         | completely.
         | 
         | So with more "improvements" to the validation, you just cause
         | more problems. Then why do it?
         | 
         | I hear the reputation arguments, but in practice, it never
         | happened to any of the organizations I worked for.
         | 
         | What happens though very often is naive engineers trying to
         | solve problems the business doesn't have with knowledge they
         | lack...
        
           | mro_name wrote:
           | > naive engineers trying to solve problems the business
           | doesn't have with knowledge they lack.
           | 
           | premature implementation is the source of most evil. :-)
        
         | welder wrote:
         | Even if sending emails is 100% free, you still have to worry
         | about your sender reputation. [1] Sending a large amount of
         | mail to invalid emails will start getting your emails put in
         | people's spam folders. That's the reason email validation
         | services exist, to prevent sending to invalid emails. [2]
         | 
         | Also, humans make mistakes. You should detect spelling errors
         | and typos then suggest corrections. [3]
         | 
         | [1] https://www.mailjet.com/blog/news/3-factors-that-impact-
         | your...]
         | 
         | [2] https://www.mailgun.com/email-validation/
         | 
         | [3] https://www.npmjs.com/package/mailcheck
        
           | mro_name wrote:
           | Even if it's 0% free, you'll have to do the opt-in anyway, or how
           | on earth will you figure out if the recipient wants your
           | email?
           | 
           | It's hard to be smart with something like names.
        
           | Avamander wrote:
           | Oh don't worry about this at all because spammers are going
           | to sign up with legitimate e-mail addresses that are going to
           | get your reputation lowered. Very common tactic and you won't
           | be saved by some dumb regex that would just probably hurt a
           | few real users.
        
           | eli wrote:
           | Mickey@mouse.com is a perfectly valid address but it isn't my
           | address. If that matters for your application you need to
           | spend the capital to send an email. No way around it.
        
             | bobthecowboy wrote:
             | Even worse, I have commonfirstnamecommonlastname@gmail.com
             | and get several emails a day that I didn't sign up for. Now
             | the person who did sign up isn't getting them _and_ I have
             | to figure out how to opt out of them. Sometimes these
             | website accounts already have payment/personal details
             | associated with them, which I now have access to (and
             | indeed, sometimes _have to view_) in order to find the
             | "stop sending me email" button.
             | 
             | Always send the confirmation "did you sign up?" email.
             | Always.
        
           | jdhawk wrote:
           | So you use another 3rd party validation service, paying
           | $300-500/million addresses.
        
           | deckard1 wrote:
           | This is really just a problem for spammers going out and
           | either buying mailing lists that haven't been validated or
           | scraping the web for email addresses. In the case of the
           | spammer, they would probably care a lot more about their
           | bounce rate than their false negative rate (i.e. valid
           | addresses that fail some sort of validation regex). In fact,
           | they would probably tune their validation to actually throw
           | away addresses that didn't look correct just to be safe.
           | 
           | Obviously, this is a different scenario than your bank not
           | accepting your valid (per RFC) email address. Which is why
           | any sort of blanket advice is pretty dumb. Not that I care to
           | aid spammers...
           | 
           | The other scenario might be a site that puts up a "paywall"
           | type thing, where you are forced to enter an email address to
           | gain quick access to something, but doesn't want to bother
           | you with going and verifying an email (e.g. instant
           | discounts, downloading a PDF, etc.). Or in-person email
           | address collection when you buy something in a store. It's
           | never a good idea to collect email addresses of people that
           | have no desire to subscribe to your marketing.
        
       | radicalriddler wrote:
       | I have two things.
       | 
       | The number of times I've tried to sign up to a service with my
       | protonmail account and it doesn't pass validation, simply because
       | it's a protonmail account (not a gmail, outlook, hotmail or aol
       | one, apparently), makes me wish everyone did follow the RFC. I
       | actually emailed a service one time, and they responded that it's
       | due to protonmail usually being associated with shady stuff. wtf.
       | 
       | The second: I had to implement an email validator at one of my
       | previous jobs, and fell down the RFC rabbit hole. Not only did I
       | have to follow the RFC as per my boss's request, but I also had to
       | make sure that Amazon SES allowed it. Came out of the office
       | wanting to just walk out onto the road. It's wild what email
       | servers allow, and then there's the question of what clients allow.
        
       | saurik wrote:
       | This is all a massive misunderstanding. An email address is the
       | local name and the host; a host can't contain an @, so the only
       | thing you frankly need do is split on "last @" and demand the
       | user not escape anything. As for validation, go ahead and try to
       | resolve the domain to make sure it works (and, if you want to
       | verify the local part, do an online check with their server).
       | 
       | If this squicks you for some reason--as maybe that format is
       | non-obvious with respect to the lack of a need to escape @--give
       | the user _two_ boxes with a hardcoded @ between them and have
       | them type the two parts separately: pre-parsed input need not
       | ever be escaped, as you aren't going to parse it at all; no need
       | to implement quoted-string dequoting.
       | 
       | All of these escaping rules are then to support embedding this
       | identifier into SMTP. The rules for embedding the same identifier
       | into MIME are different... and even more complex! In MIME they
       | support random stuff like "comments" in the middle of the
       | string... is that part of the email address identifier? No.
       | 
       | An email address simply is not defined by the format you use to
       | send it as part of an SMTP command, nor is it defined by the
       | format you use to send it as part of a MIME message header :/.
       | It is an identifier that exists separately from either of those
       | two (different) protocols, and one would expect any number of
       | ways to escape that content.
       | 
       | To demonstrate how ridiculous this all is, imagine someone comes
       | up with a JSON protocol for mail submission and then documents
       | how email addresses now should use \u encoding and escape
       | quotation marks... does that mean users should type that into
       | your app? No.
       | 
       | Hell: your email address form is taking an email address and then
       | sending it over HTTP... the escaping rules for HTML form fields
       | are different still, yet no one is asking users to type HTML-
       | escaped strings into other applications, right?
       | 
       | The core thing wrong then with your email validation is that you
       | are simply validating the wrong thing: unless you are developing
       | an SMTP server, the rules for how to escape and parse _escaped_
       | email addresses in RFC5321 are irrelevant; and, likewise, unless
       | you are developing a MIME parser, the rules for how to escape and
       | parse _escaped_ email addresses in RFC5322 are also irrelevant.
       | 
       | The only thing that matters from either of these specifications
       | is the underlying basic rule for what semantically can exist in a
       | hostname and a localpart, and RFC5321 is _extremely_ lax: you can
       | use any "ASCII graphic or space", and so excludes only ASCII
       | control characters and 8-bit characters... and then, as
       | mentioned, another RFC removes the 7-bit limitation and opens up
       | the world of Unicode.
       | 
       | (To push on it even further: it isn't even clear to me that one
       | should consider the ASCII control character limitation to be
       | fundamental to the email address identifier or a weird limitation
       | of the current version of SMTP; and since none of those email
       | addresses are going to _work_, I think one may as well just
       | consider the local part to be any string of Unicode code points.)
       | 
       | Think about this: it is up to your SMTP library to correctly
       | escape the email address you give it for SMTP, and here's the fun
       | part: if you give it a _pre-escaped_ email address, then clearly
       | it is going to have to _double escape it_, right? So,
       | semantically, these extended discussions of quoted strings and
       | character limitations are always just so ridiculous :/... you
       | absolutely _should not_ be dealing in SMTP-escaped addresses or
       | asking your user to understand SMTP (and the same goes for MIME).
       | 
       | (BTW, if you want some "real hell", one of these two protocols--I
       | forgot which... I presume SMTP--seriously supports an _empty_
       | local part. If that doesn't tell you everything you need to know
       | about how un-opinionated these RFCs are with respect to "anything
       | goes", then I don't know what will ;P.)
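Here is a short Python sketch of the "escape only at the protocol boundary" idea. It keeps the raw address exactly as the user typed it, splits it on its last @, and produces RFC 5321 syntax only when it builds an SMTP path. It follows the plain-ASCII Dot-string / Quoted-string rules in RFC 5321 section 4.1.2. It deliberately does not handle the SMTPUTF8 extension (RFC 6531). The function names are hypothetical, and in practice your SMTP library should be doing this for you.

```python
import re
from typing import Tuple

# RFC 5321 "atext": letters, digits and these specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

def split_address(address: str) -> Tuple[str, str]:
    """Split a raw, unescaped address on its last '@'."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no '@' in address")
    return local, domain

def smtp_local_part(local: str) -> str:
    """Render a raw local part in RFC 5321 syntax for MAIL/RCPT commands."""
    if DOT_STRING.fullmatch(local):
        return local  # already a valid Dot-string, no quoting needed
    if any(not 32 <= ord(ch) <= 126 for ch in local):
        # Outside "ASCII graphic or space": would need SMTPUTF8 (not handled).
        raise ValueError("local part not representable in plain SMTP")
    # Quoted-string: only backslash and double quote need a quoted-pair.
    escaped = local.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def smtp_path(address: str) -> str:
    local, domain = split_address(address)
    return f"<{smtp_local_part(local)}@{domain}>"
```

The user never types any quotes. The input john@home@example.com comes out as the SMTP path <"john@home"@example.com>, and the quoting exists only there.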
        
       | grouphugs wrote:
       | it's 2021, i am too poor to make a yahoo email account
        
       | BenjiWiebe wrote:
       | Why not accept absolutely anything in the email address field,
       | and just require an emailed link to be clicked before marking the
       | email as validated?
        
         | rblatz wrote:
         | Because it causes conversion drop off.
        
       | zzo38computer wrote:
       | I cannot send a message to the email address they provide, not
       | because of anything wrong with the email address itself, but
       | because that email address is version 6 internet, and I have
       | version 4 internet.
        
       | jacobobryant wrote:
       | I just outsource this to Mailgun. User signs up, I send them a
       | confirmation email, account doesn't get created till they click
       | the link. If the email address is invalid, Mailgun returns an
       | error and I show a page that says "We couldn't send an email to
       | <address>. If you're sure that's a valid address, please try
       | again." (Also use recaptcha for bot detection).
        
       | biztos wrote:
       | This is a great run-down of the trouble with e-mail addresses.
       | 
       | I worked in e-mail security for quite a while. "Write an e-mail
       | address parser" was my go-to technical interview question.
       | 
       | It was pretty easy to see if the candidate had ever given any
       | real thought to e-mail (most had not); and you could also pick up
       | a lot of signals about engineering style, for instance if they
       | started with a regex (fewer did than I expected). And it was
       | trivial to adjust the difficulty: if someone thought the question
       | was easy and had a fast solution, you could just throw them a
       | test-case like the ones in this article.
       | 
       | (Note: the actual title is "Your E-Mail Validation Logic is
       | Wrong" -- and it's only about addresses, the author isn't
       | implying that e-mail systems can't validate messages nor for that
       | matter addresses.)
        
         | duxup wrote:
         | I'd sort of raise the issue that "Writing an email parser from
         | scratch is a bad idea due to the sheer complexity involved. If
         | you're looking for serious email address validation there may
         | be better options out there that have dealt with this
         | complexity rather than start from the ground up."
         | 
         | Not to say I wouldn't try just for the sake of working through
         | it as an example / 'where would you start' discussion.
         | 
         | But if we're pretending this is a real world task I'd probably
         | discuss how this is an endless / possibly ultimately futile
         | time sink and there might be better options than starting at
         | point A ;)
        
         | LambdaComplex wrote:
         | What if the answer they gave was "This is a very hard problem
         | that honestly isn't worth solving, just check against a .*@.*
         | regex and call it a day?"
        
           | CydeWeys wrote:
           | "Can you write me a parser that has a <1% false negative and
           | <1% false positive rate on real email addresses?"
           | 
           | A similar enough issue happens in coding interviews anyway.
           | Sometimes the interviewee is aware of a library that
           | essentially solves the problem for you. In those cases I give
           | them some credit for knowing of it and then ask them to
           | implement it anyway, as if the library didn't exist (because
           | there are a large number of problems out there for which a
           | solution doesn't yet exist, and when hiring a SWE you need to
           | find someone who can write new solutions from scratch for
           | those situations; whether a given toy interview problem is
           | such a situation doesn't matter for the purpose of evaluating
           | said skills).
        
             | biztos wrote:
             | I would usually structure the question a bit, give a couple
             | test cases with different formats and ask something like
             | "write a class..." if in Python, etc. I wasn't trying to
             | trap anyone who might actually think /\w+@\w+/ covers the
             | range of all possible addresses.
             | 
             | Digression: I do miss the days when you could assume a
             | candidate for a position at Aquatic Widgets Incorporated
             | would know something about water, or about widgets, or at
             | least would have looked up what an aquatic widget is before
             | bothering to come in for an interview, but those days have
             | long since departed the realm of Software Engineering as
             | far as I can tell. Which may be a good thing from the
             | engineers' point of view, I'm not sure.
        
           | biztos wrote:
           | That would demonstrate an understanding of email -- it _is_ a
           | very hard problem -- but probably also an unpleasant attitude
           | you might not want in a co-worker. Whether the problem is
           | worth solving is very often not your call as an engineer.
        
             | harg wrote:
             | > Whether the problem is worth solving is very often not
             | your call as an engineer.
             | 
             | IMO well functioning teams do consider the thoughts of
             | their technical members when deciding which problems to
             | solve.
             | 
             | The person "making the call" on whether having perfect
             | email validation is worth solving may not have an
             | appreciation of how difficult it actually is, so having a
             | discussion with engineers on how much work/time it would
             | take should play a big part in prioritising it.
             | 
             | Additionally, things like validating email on signup are
             | mostly solved (albeit imperfectly) so one can and should
             | use existing implementations and focus on building their
             | product.
        
               | serial_dev wrote:
               | Yes, and it's a technical question. You wouldn't let
               | business people decide which database to use, how to
               | store data in a database, how to send data from backend
               | to frontend, etc... those questions should be up to the
               | technical team to decide.
               | 
               | Password strength requirements and email validation are
               | just like the database examples, and if a company doesn't
               | let these technical questions be answered by the
               | technical people, that's a bad sign.
        
             | duxup wrote:
             | >Whether the problem is worth solving is very often not
             | your call as an engineer.
             | 
             | True, but as an engineer you do need to provide accurate
             | feedback regarding "Hey, this is gonna work much of the
             | time but email is hard, this is a complex problem. If we do
             | this from scratch we're going to miss a lot of things
             | potentially".
        
             | jjk166 wrote:
             | No offense but if an organization does not listen to
             | engineering in determining how to deal with a technical
             | problem, that is an enormous red flag.
             | 
             | While maybe the engineer won't actually make the call, the
             | engineer should inform management's understanding of the
             | costs of the approach and the efficacy of alternatives, and
             | management should go along with that recommendation unless
             | they have a good reason not to. Of course tone is
             | important, someone saying "fuck no, I ain't doing that"
             | likely indeed would be unpleasant to work with, but a
             | respectful "I would recommend against doing that" is the
             | sign of a confident and intelligent professional.
        
           | mrunkel wrote:
           | So only a@b?
           | 
           | I think you're missing some stuff in your regex.
        
             | AnIdiotOnTheNet wrote:
             | Eh, that's why you use a validation email. Only bother with
             | 'validation' at all to catch something obviously wrong.
        
             | gvx wrote:
             | HN markup strikes again (OP wrote .X@.X, where X is an
             | actual asterisk, which HN renders as .<i>@.</i>)!
        
       | eli wrote:
       | No the problem is developers confusing validation with
       | verification. You can't validate your way to a correct address
       | and it's wrong to try.
       | 
       | If your goal is to catch typos you're better off with very lax
       | validation plus a library that suggests corrections like
       | "gmail.com" for "gnail.com" (both of which are of course
       | technically valid domains)
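A "did you mean...?" suggester can be sketched in a few lines of Python. This is not the algorithm of the mailcheck package linked above. It swaps in the standard library's difflib.get_close_matches similarity, and the domain list and the 0.8 cutoff are hypothetical values to tune. The function only ever suggests and never rejects, because gnail.com is a valid domain too.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical list; a real one would be tuned to your own user base.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com",
                  "icloud.com", "protonmail.com"]

def suggest_correction(address: str,
                       known: Sequence[str] = COMMON_DOMAINS,
                       cutoff: float = 0.8) -> Optional[str]:
    """Return a 'did you mean ...?' suggestion, or None. Never rejects."""
    local, at, domain = address.rpartition("@")
    if not at:
        return None
    domain = domain.lower()
    if domain in known:
        return None  # exact match: nothing to suggest
    matches = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return f"{local}@{matches[0]}" if matches else None
```

For example, suggest_correction("jane@gnail.com") returns "jane@gmail.com", so the form can ask the question while still letting the user submit what they typed.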
        
       | ericcholis wrote:
       | In addition to a very simple regex, you can do some light
       | verification on DNS and SMTP
       | 
       | - nslookup -type=mx email.com
       | 
       | - _pick the highest-priority MX server (lowest preference value)_
       | 
       | - telnet mx1.email.com 25
       | 
       | - _validate SMTP handshake_
       | 
       | - _Start a connection:_ EHLO youremail.com
       | 
       | - mail from:<sender@youremail.com>
       | 
       | - rcpt to:<recipient@email.com>
       | 
       | Obviously, this might be outside the capabilities of some hosts
       | or users. There's a bunch of services that expose this workflow
       | for you as an api. (https://trumail.io/, for example)
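The steps above, sketched in Python. Assumptions: the MX lookup uses the third-party dnspython package (dns.resolver.resolve, dnspython 2.x), imported lazily so that the pure helper works without it. The SMTP part uses the standard smtplib. Per RFC 5321 section 5.1, when a domain has no MX records the domain itself is tried (the implicit MX). Treat the result as a hint only, as replies in this thread point out. Many networks block outbound port 25, servers may accept every RCPT or greylist, and a 4xx reply says nothing final about the address.

```python
import smtplib
from typing import List, Tuple

def pick_mx(records: List[Tuple[int, str]], domain: str) -> str:
    """Choose the MX host to try first: the lowest preference value wins.
    With no MX records, fall back to the domain itself (implicit MX)."""
    if not records:
        return domain
    _preference, host = min(records)
    return host.rstrip(".")

def lookup_mx(domain: str) -> List[Tuple[int, str]]:
    import dns.resolver  # third-party: dnspython 2.x assumed to be installed
    try:
        answer = dns.resolver.resolve(domain, "MX")
    except dns.resolver.NoAnswer:
        return []  # domain exists but publishes no MX records
    return [(r.preference, r.exchange.to_text()) for r in answer]

def probe_rcpt(domain: str, sender: str, recipient: str,
               helo_name: str, timeout: float = 10.0) -> int:
    """Run EHLO / MAIL FROM / RCPT TO and return the last reply code.
    2xx suggests acceptance, 5xx a rejection, 4xx means try again later."""
    host = pick_mx(lookup_mx(domain), domain)
    with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
        smtp.ehlo(helo_name)  # our own hostname, not the recipient's domain
        code, _message = smtp.mail(sender)
        if code != 250:
            return code
        code, _message = smtp.rcpt(recipient)
    return code  # leaving the with-block sends QUIT; DATA is never sent
```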
        
         | teh_klev wrote:
         | See point 10 in the article:
         | 
         |  _" The domain name does not need to resolve"_
         | 
         | Also the mail server may be temporarily offline or unreachable.
        
           | jusssi wrote:
           | Non-resolving or offline mail servers count to your bounce
           | rate if you have a 3rd party service handling your outgoing
           | mail. So for that purpose, it is an invalid address in the
           | sense that you should avoid sending anything to it.
        
             | teh_klev wrote:
             | Those are rules made up for the convenience of marketers
             | and have nothing to do with the technical aspects of mail
             | delivery as defined in the RFCs.
             | 
             | Edit just to clarify:
             | 
             | KPIs such as bounce rates etc. aren't a function of how
             | mail is delivered (RFC5321). These are KPIs collected and
             | collated by non-SMTP applications sitting on top of SMTP
             | infrastructure monitoring bounces.
             | 
             | Nowhere in RFC5321 does it mention that a mail server
             | should or must not deliver mail in respect of bounce
             | rates. These are operator-defined metrics outside of the
             | scope of RFC5321, which may be aided by additional software
             | or services such as spam detection.
        
               | johncolanduoni wrote:
               | On the contrary, those kind of rules are made up to try
               | and keep marketers in check to some degree. Why would a
               | marketer want to get dinged for sending an email to a
               | nonresponsive domain?
        
               | teh_klev wrote:
               | You're going to need to quote the RFC(s) that
               | specifically mention bounce tracking to keep marketers in
               | check.
               | 
               | My original reply arose because there are times when a
               | receiving domain or destination email address can be
               | _"temporarily"_ unavailable. I pointed this out to
               | demonstrate that services that pre-validate recipient
               | addresses upon submission of a form don't take into
               | account transient outages due to any number of valid
               | factors.
               | 
               | SMTP was designed with this in mind, i.e. try to re-
               | deliver up to some acceptable threshold and then at some
               | point give up (the hard bounce which is the thing that
               | should cause the "ding", especially if they keep retrying
               | beyond "soft bounces").
        
               | johncolanduoni wrote:
               | You're going to need to show me the RFC(s) that
               | specifically mention bounce tracking is for the
               | convenience of marketers. Or maybe give up on every
               | practical aspect of a technology defined in RFC(s) being
               | covered by those RFC(s). SMTP seems a particularly bad
               | example if you expect to be able to write a useful
               | program using only the RFC(s), since every MTA has a
               | whole host of workarounds for non-spec behavior.
        
               | teh_klev wrote:
               | > You're going to need to show me the RFC(s) that
               | specifically mention bounce tracking is for the
               | convenience of marketers.
               | 
               | Perhaps re-read "jusssi"'s comment then mine. I didn't
               | assert that bounce tracking was for the convenience of
               | marketers, or suggest it was mentioned in any way in the
               | RFCs; _they_ implicitly did, and I wanted to point out
               | the error in their understanding.
               | 
               | > SMTP seems a particularly bad example if you...etc
               | 
               | But the central theme of this whole HN discussion thread
               | is about SMTP.
               | 
               | If you're interested, section 6 of RFC5321[0] is where
               | bounce messages are mentioned (just three times in the
               | whole RFC - bouncing, bounced and bounce) with no
               | reference to marketers. See also 6.1:
               | 
               |  _Some delivery failures after the message is accepted by
               | SMTP will be unavoidable. For example, it may be
               | impossible for the receiving SMTP server to validate all
               | the delivery addresses in RCPT command(s) due to a "soft"
               | domain system error, because the target is a mailing list
               | (see earlier discussion of RCPT), or because the server
               | is acting as a relay and has no immediate access to the
               | delivering system._
               | 
               | Which brings us back to my original comment, far above:
               | services that check only once, at form submission,
               | whether an email address is "valid" using trumail.io or
               | whatever are flawed solutions.
               | 
               | [0]: https://datatracker.ietf.org/doc/html/rfc5321
        
           | icedchai wrote:
           | Ok, so the article is wrong. For an email to be valid right
           | now, yes, the domain part _has to resolve._ If you're
           | accepting any email address that might be an email in the
           | future, then they are correct, but for 99.9% of use cases:
           | yes, the domain has to resolve.
        
         | jeffbee wrote:
         | Close, but you also need to fall back to AAAA or A lookup of the
         | domain when the MX record doesn't exist. Also, do you really
         | want transient unavailability to stop your signup flow? The
         | whole point of the way mailers are written is the mail gets
         | delivered even in the face of transient unavailability.
        
       | wyldfire wrote:
       | > The local part is case-sensitive.
       | 
       | This seems more like a bug than a feature. Maybe in 1983 the
       | average email user knew what DNS was and could be expected to
       | know one part of the email address would be case sensitive and
       | the other not.
       | 
       | But email RFCs are probably like any other RFCs out there and
       | specify existing behavior for the sake of interoperability.
        
         | deckard1 wrote:
         | yeah, and that's a hill I'm willing to die on.
         | 
         | Imagine the average non-technical person talking to some
         | customer service agent on the phone and having to figure out if
         | her email is JANEWATSON@gmail.com, JaneWatson@gmail.com,
         | janewatson@gmail.com, or Janewatson@gmail.com. Could you
         | imagine the horror and complete security nightmare of multiple
         | people running around using the _same_ gmail address with
         | different case. Those Jane email addresses above would be _four
         | different people_. We'd be receiving mail intended for other
         | people all day long.
        
       | aeharding wrote:
       | I just want my .dev email address to not be rejected.
        
       | permo-w wrote:
       | I think the author is stretching the words "valid" and "invalid"
       | past their limits here for the sake of hooking you into the
       | article. Yeah, in some countries it's a valid social practice to
       | spit on the floor in public, but in most, it's not.
       | 
       | Let's say I'm a dev at Google, and I'm writing some aspect of
       | Gmail. Is !"PS$%@gmail.com a valid email? No. So the word valid
       | is clearly not being used correctly here.
       | 
       | At the core of SMTP these emails are allowed, but in practice
       | they almost never are, and so the opposite is true. All the cases
       | he described are, in practice, invalid.
        
       | duckfang wrote:
       | Email validation :
       | 
       | Accept email from user.
       | 
       | Send email to that address with a link to verify.
       | 
       | Go/no-go test if link is clicked.
       | 
       | (If you're doing some fever garbage or otherwise trying to parse
       | it, you're doing it wrong.)
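duckfang's flow can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the in-memory `_pending` dict stands in for a real database, the 24-hour expiry is an arbitrary choice, and building and sending the link email is left out.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 24 * 3600  # assumption: links expire after one day

# Hypothetical store: sha256(token) -> (email, expiry timestamp).
_pending: Dict[str, Tuple[str, float]] = {}


def _digest(token: str) -> str:
    # Store only a hash, so a leaked table doesn't contain usable tokens.
    return hashlib.sha256(token.encode()).hexdigest()


def start_verification(email: str, now: Optional[float] = None) -> str:
    """Create a one-time token to embed in the link emailed to `email`."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending[_digest(token)] = (email, now + TOKEN_TTL_SECONDS)
    return token


def confirm(token: str, now: Optional[float] = None) -> Optional[str]:
    """Go/no-go: return the verified address, or None if unknown or expired."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)  # pop: each token works once
    if entry is None:
        return None
    email, expires = entry
    return email if now <= expires else None
```

Because swiley reports mail scanners clicking links, a common mitigation is to have the link open a page with a confirm button that POSTs the token, rather than confirming on the GET request itself.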
        
         | swiley wrote:
         | This doesn't work with hotmail where sometimes a robot will
         | click the link but refuse to deliver the mail.
        
       | JoyfulPanda wrote:
       | There is an awesome talk about email by Ricardo Signes:
       | 
       | https://www.youtube.com/watch?v=JENdgiAPD6c
       | 
       | The first 5 minutes are Perl-specific, but the rest is email and
       | just hilarious.
        
       | sneak wrote:
       | This is a bunch of weird edge cases that nobody uses in real life
       | except maybe the plus trick.
       | 
       | American Express and Walgreens don't let you set a
       | whatever@whatever.email address because they check for a TLD
       | known at the time of their app's validation code, or something.
        
         | novok wrote:
         | I use an amex@whatever.com & walgreens@whatever.com email for
         | both amex and walgreens?
         | 
         | I've run into two old-ish institutions that didn't quite work
         | with my whatever@whatever.com and had to modify it slightly for
         | them.
        
           | sneak wrote:
           | .email is a new-ish TLD.
        
       | csours wrote:
       | This feels like a discussion for backend implementations/email
       | forwarders, not for email signups... but hey while this has some
       | attention - For god's sake, put a button that says "This ain't
       | me", at least for important stuff.
       | 
       | I'm sorry, but I just can't bring in Clyde's truck for the oil
       | change, cause Clyde ain't me!
       | 
       | I also cannot attend Cassidy's parent teacher conference,
       | apologies, I am not in Ohio.
        
         | cratermoon wrote:
         | >This feels like a discussion for backend implementations/email
         | forwarders, not for email signups...
         | 
         | And yet I've worked multiple places where product people asked
         | for "simple email validation" on user signup. If they insist, I
         | ask them to provide some actual test cases that they care
         | about. Sometimes the product folks can be convinced to drop the
         | validation requirement if they can be shown that anyone who
         | can't sign up because their email address doesn't validate will
         | simply move on and not sign up.
         | 
         | In the case where your product is B2B and all the employees of
         | your customers are users (say an HR product), then the first
         | time a VIP at an important customer complains, that's usually
         | enough to convince your stakeholders to disable the email
         | validation.
        
       | cphoover wrote:
       | If your validation function works for 99.99% of your users' email
       | addresses and it's a big unnecessary lift to get that other 0.01%,
       | your logic is not wrong.
        
         | LinAGKar wrote:
         | Just get rid of that pointless filtering altogether.
        
         | tgv wrote:
         | I don't think I've seen a bang path since 1990. The claim "Your
         | E-Mail Validation Logic is Wrong" is just pedantry.
        
           | icedchai wrote:
           | A little later for me. I last used bang paths in 1994, when I
           | had a UUCP feed.
        
           | strken wrote:
           | What you've seen since the 80s ended is unfortunately only a
           | subset of all the horrible edge cases your users will run
           | into.
        
             | ok123456 wrote:
             | What sendmail rules would you even use in 2021 to process a
             | bang path? Just deny.
        
       | mfbx9da4 wrote:
       | I once had a crack at building a sensible email validation
       | library.
       | 
       | * Validate the string contains "@" and a "." to the right of it.
       | 
       | * Validate common typos
       | 
       | * Validate disposable emails
       | 
       | * Validate MX records
       | 
       | * Validate SMTP server and mailbox
       | 
       | https://github.com/mfbx9da4/deep-email-validator
       | 
       | I don't have the time to keep it maintained but it works for the
       | most part!
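The first check in that list is simple enough to write inline. A sketch of the rule as stated (a non-empty local part, an "@", and a "." inside the domain), written fresh here rather than taken from the linked library; note that it deliberately rejects dotless domains, which are legal but rarely seen for real mailboxes.

```python
def looks_like_email(address: str) -> bool:
    """An '@' preceded by something, with a '.' inside the domain after it."""
    local, at, domain = address.rpartition("@")
    # strip('.') ignores leading/trailing dots, so "a@.com" and "a@com." fail.
    return bool(at) and bool(local) and "." in domain.strip(".")
```

This is deliberately loose: it says nothing about which TLDs exist or which characters the local part may contain.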
        
       | quercusa wrote:
       | These days, I think 80% of email validation is just catching
       | 'gmial.com'
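Catching 'gmial.com' works best as a "did you mean" hint rather than a rejection. A sketch using `difflib.get_close_matches` from Python's standard library; the `COMMON_DOMAINS` list and the 0.8 cutoff are assumptions to tune, not recommendations.

```python
import difflib
from typing import Optional

# Assumption: a short, hand-maintained list of popular mail domains.
COMMON_DOMAINS = ["gmail.com", "hotmail.com", "outlook.com", "yahoo.com", "icloud.com"]


def suggest_domain(address: str) -> Optional[str]:
    """Return a corrected address to offer the user, or None."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if not local or domain in COMMON_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    return f"{local}@{matches[0]}" if matches else None
```

Showing the suggestion and letting the user keep what they typed avoids blocking real domains that merely look similar to a popular one.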
        
       | Yaina wrote:
       | What this article really showed me is that this RFC is actually
       | pretty harmful.
       | 
       | Supporting all of the rules outlined in the spec is probably a
       | huge burden for maintainers of mail clients and servers.
       | Obviously some parts of the spec are going to be omitted. It's
       | hard to blame them for it, but the same person that rightfully
       | skipped over implementing the routing thingy might've also
       | wrongfully assumed there won't be a Japanese character in the
       | address. And that's what's so bad.
       | 
       | You might introduce more issues in your system, by taking the
       | full spec into consideration for your validation, instead of
       | using the whatwg regex someone posted here.
        
         | nradov wrote:
         | Well if there are problems with the RFC then you should work
         | with the IETF to correct those. They have an open standards
         | development process.
        
           | awestroke wrote:
           | Another option is to just ignore the RFC
        
             | forgetfulness wrote:
             | That does mean that there will only be an ad-hoc
             | undocumented standard for email addresses, rather than one
             | that's serviceable.
             | 
             | Web application validation forms add a different layer to
             | the standard and are sort of hard to tame; anyone can push
             | together a few lines of PHP or Javascript code and conjure
             | their own email address standard out of thin air.
        
               | gifnamething wrote:
               | Will be? There _is_ an ad-hoc standard.
               | 
               | If the standard fails to be used, the standard is
               | defective.
        
             | numpad0 wrote:
             | Isn't it just not very nice to ignore a Request for
             | Comments?
        
       | buro9 wrote:
       | I have a TLD that was recently created (2014) and I still cannot
       | use it in an email address reliably.
       | 
       | The domain in question being david.kitchen, so an email may be
       | email@david.kitchen
       | 
       | The issue I encounter more than any other is trivial: Most sites
       | still have a TLD validation that only accepts domains that end in
       | net|com|org and some other small list of accepted suffixes such
       | as co.uk
       | 
       | The list of TLDs is constantly expanding
       | https://newgtlds.icann.org/en/program-status/sunrise-claims-...
       | so even `[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+` would be better
       | than what I see in the wild.
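The difference buro9 describes can be reproduced with Python's `re`. In this sketch, `HARDCODED_TLDS` imitates the kind of suffix allowlist he runs into (the exact list is an assumption), and `ANY_TLD` is his looser pattern with the escape written as a single `\.`. Both share his local-part class, which still rejects plus addressing and, as znpy points out, any non-ASCII character.

```python
import re

# Imitation of a suffix allowlist (assumed list, for illustration only).
HARDCODED_TLDS = re.compile(
    r"[a-z0-9.-]+@[a-z0-9.-]+\.(?:com|net|org|co\.uk)", re.IGNORECASE
)

# buro9's suggestion: accept any alphanumeric final label.
ANY_TLD = re.compile(r"[a-z0-9.-]+@[a-z0-9.-]+\.[a-z0-9]+", re.IGNORECASE)


def accepts(pattern: re.Pattern, address: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return pattern.fullmatch(address) is not None
```

The allowlist version has to be edited every time a new TLD is delegated; the looser version never does.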
        
         | toyg wrote:
         | yeah, I routinely use .email and .cloud and it's so annoying
         | when the occasional site goes "THAT IS NOT AN EMAIL ADDRESS
         | !!111!1 YOU HAXXXOR!".
        
         | sparrc wrote:
         | I have the same issue. I use sparr.email and it fails
         | validation on a few critical websites, namely my online
         | utilities account (seattle public utilities) and payroll
         | processor (ADP).
        
         | tmk1108 wrote:
         | I managed to buy firstname.dev a while ago and this was one of
         | my fears of using it as my email address. I ended up switching
         | to a .com one just to avoid any issues. I certainly don't want
         | government services emails not to work just because maybe they
         | didn't account for .dev TLD
        
         | vidarh wrote:
         | I was involved in setting up .name back in 2001. We spent ages
         | contacting people with validation rules based on the old set of
         | TLDs. Given that was the first expansion of the gTLD space in a
         | long time, it wasn't so unreasonable _then_. But it's just
         | astounding that it's still an issue 20 years later.
        
           | duped wrote:
           | sometimes you get regressions too. Kaiser Permanente
           | invalidated my email address earlier this year.
        
             | arkitaip wrote:
             | Someone found a sleeper regexp on Stack Overflow...
        
         | znpy wrote:
         | And the regex you provide doesn't even account for Unicode...
        
       | flerchin wrote:
       | The one I see over and over is failing to trim the email before
       | doing validation. This is especially egregious at account
       | creation where you want no friction. Users enter their email with
       | their smartphone, and it may append a space at the end. More than
       | once, I've had a relative call me trying to figure out why
       | $website wouldn't accept their email as valid.
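flerchin's fix is about ordering: clean the input, then validate. A minimal sketch; beyond ordinary whitespace (which Python's Unicode-aware `\s` covers, including the non-breaking space), it also trims the zero-width space and byte-order mark, an assumption about what copy/paste can drag in. Interior characters are left alone, so a genuinely malformed address still fails validation.

```python
import re

# Leading or trailing whitespace, zero-width spaces, or BOMs.
_EDGE_JUNK = re.compile(r"^[\s\u200b\ufeff]+|[\s\u200b\ufeff]+$")


def clean_email_input(raw: str) -> str:
    """Trim invisible junk from both ends before any validation runs."""
    return _EDGE_JUNK.sub("", raw)
```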
        
       | schwinn140 wrote:
       | Link is just hanging.
       | 
       | Also, anyone notice the OP posts the same couple of posts
       | constantly?
        
       | hutrdvnj wrote:
       | There is no point in many cases. Even if you can verify that the
       | email address is syntactically valid, you'll still need to check
       | that it was not mistyped, and that it actually goes to the person
       | you think it does. The only way to do that is to send them an
       | email and have them click a link to verify.
       | 
       | However, if you still want to validate an email address then use
       | a library. All popular programming languages have email
       | validation libraries. Yes, it's an extra dependency if it's not
       | included in the std lib or the framework you use, but email
       | validation is wrong in 99% of the cases, if you wrote it
       | yourself.
        
         | yawaramin wrote:
         | Or use the browser. HTML form validation has <input
         | type="email"> which checks that the entry is a valid email
         | address.
        
       | u801e wrote:
       | I provide my email address with the +companyname suffix on the
       | local part as a way to filter my email into various folders based
       | on the To header contents.
       | 
       | Unfortunately, many websites are configured to reject email
       | addresses that contain a plus character. I've also encountered
       | websites in the past that did accept the + character when
       | creating the account where the email address serves as the user
       | name, but then could not log in because their login form
       | rejected the + character in the user name.
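On the receiving side, plus addressing is just a split of the local part at the first separator. A hedged sketch (the function name is hypothetical); the separator is a per-server setting, and ryandrake mentions using "-" instead. For the sites u801e describes, the takeaway is the reverse: the tag is the mailbox owner's business, so a signup or login form has no reason to reject "+".

```python
from typing import Optional, Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, Optional[str], str]:
    """Split 'user+tag@domain' into ('user', 'tag', 'domain'); tag is None if absent."""
    local, _, domain = address.rpartition("@")
    base, sep, tag = local.partition(separator)
    return base, (tag if sep else None), domain
```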
        
         | theshrike79 wrote:
         | Fastmail allows for companyname@youraccount.fastmail.com -style
         | addresses. Even for your own domains.
         | 
         | Much more reliable than the + -thing, which breaks in the
         | weirdest of places.
        
           | aidenn0 wrote:
           | I've been using fastmail for years and didn't know that.
           | Thanks!
        
         | theandrewbailey wrote:
         | I use Fastmail with my own domain name and unlimited email
         | inboxes, so I use companyname@mydomain.com to sort incoming
         | mail.
        
           | jetpackjoe wrote:
           | I do the same thing and believe it or not I've seen websites
           | reject emails with their own name in the email.
        
             | hateful wrote:
             | I had one do that. When I give the address in person I get
             | "do you work here?"
             | 
             | I had to switch my hosting provider at one point because
             | they stopped supporting catch-all. I have no idea how many
             | "addresses" I've used, since I don't create a specific
             | email for each, so I had to get new hosting (note: this was
             | over 10 years ago)
        
             | Semaphor wrote:
             | I recently got a letter from a company's legal department
             | and had to explain the whole thing :D
        
         | rpadovani wrote:
         | I use a catch-all to have a <website>@<mydomain>.com login for
         | every website.
         | 
         | Samsung doesn't accept emails with "samsung" as prefix, so I
         | have samsun@mydomain.com for them. I have no idea what the
         | logic behind that is.
        
         | SAI_Peregrinus wrote:
         | I got sick of companies rejecting email with "+", and bought a
         | domain to use for email (among other reasons). Now I've got a
         | wildcard entry in DNS, so any valid local part gets routed to
         | my inbox. So instead of "username+company@example.com" I can do
         | "company@example.com".
        
           | axaxs wrote:
           | Can you explain the DNS part? AFAIK the sender just looks for
           | MX on the domain itself, regardless of local part.
        
             | toomanybeersies wrote:
             | The actual address in the email header should still contain
             | the subdomain though.
        
               | hug wrote:
               | The address "company@example.com" doesn't point to a
               | subdomain, though, the only reference to the company is
               | the local part of the address, and so has nothing to do
               | with DNS.
               | 
               | If he said he used "joe@company.example.com", then it's
               | possible he has a wildcard MX record for *.example.com,
               | but that's not at all what he said, although perhaps it's
               | what he meant.
               | 
               | Regardless, the question remains unanswered.
        
           | ElFitz wrote:
           | I ended up giving up on that after one too many websites
           | rejecting my custom domain (which I'm the only one using) on
           | signup. These lazy / ignorant colleagues are _annoying_ -_-'
        
             | bcrosby95 wrote:
             | I've been using a similar scheme for about 7 years now and
             | have never had my email rejected by a website on signup.
        
               | 90minuteAPI wrote:
               | The American Kennel Club rejected mine because the domain
               | was "too similar" to their name. I guess just because it
               | had a "kc" in it? Completely bewildering.
        
               | psutor wrote:
               | I use this scheme (company@mydomain.com) and one that I
               | remember blocking for this reason is Aliexpress/Alibaba -
               | aliexpress@mydomain.com was rejected so I use
               | ali@mydomain.com.
               | 
               | No idea what sort of security this is supposed to
               | provide.
        
               | ElFitz wrote:
               | It happens rarely, but some only accept a very limited
               | number of domains (i.e. Gmail, Outlook, etc.).
               | 
               | They probably see it as some sort of security / anti-spam
               | mechanism.
        
             | toomanybeersies wrote:
             | I use a .xyz domain for my personal email, and I sort of
             | regret it.
             | 
             | My emails have a tendency to become spam filter bycatch, to
             | the point that when I was job hunting last year I'd have to
             | ring people after I sent them my resumes etc. to confirm
             | they actually received my email.
             | 
             | And when I give people my email address, I usually have to
             | assure them that steve@stevetech.xyz is a legitimate email
             | address and not a joke (it's not actually steve, but you
             | get the point).
        
               | psutor wrote:
               | I host my own email server, and .xyz is one of the 2 or 3
               | TLDs I went into the config files and manually blocked
               | since nothing but spam comes from it (and lots of it).
               | 
               | Definitely would not recommend using it for your personal
               | address.
        
           | richardwhiuk wrote:
           | That causes weird behaviour in places, where they assume the
           | bit before the @ is a "username".
        
             | 8ytecoder wrote:
             | I've been using an own domain with wildcard emails for many
             | years now. I'm yet to encounter a single scenario of
             | inferred names.
        
             | choward wrote:
             | I've been using this strategy for years and have not
             | encountered that issue before. That would mean the part
             | before the @ would have to be unique across all domains.
             | That doesn't make any sense. You couldn't have
             | webmaster@domain1.com and webmaster@domain2.com registered
             | for example.
        
               | CydeWeys wrote:
               | Or ben@gmail.com and ben@hotmail.com couldn't both be
               | registered. This scheme is so obviously flawed I can't
               | imagine it's widely implemented.
        
           | MivLives wrote:
           | What provider do you use for email? That does sound nice.
        
             | btmiller wrote:
             | My time to shine! https://btmiller.com/2019/12/12/regain-
             | control-over-your-inb...
        
             | xyst wrote:
             | The paid version of gmail (google workspace/gsuite) offers
             | this as well (they call it "aliases"). I haven't explored
             | the option myself, but I do recall seeing something like
             | this in the admin panel. Whether they charge for it or not
             | is probably something I should look into.
             | 
             | At some point, I need to migrate away from google and build
             | out my own personal mail server.
        
             | Angostura wrote:
             | In the UK, my domain name provider offers free e-mail
             | forwarding for (I think) 10 specific e-mail addresses, plus a
             | catch-all forwarder for anything else. Works quite well.
        
             | gpm wrote:
             | I use migadu for this.
             | 
             | I also use greg-*@domain instead of *@domain, since their
             | docs claim that setting up *@domain tends to attract more
             | spam.
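gpm's prefix rule is a pattern match on the recipient address. A sketch with Python's `fnmatch.fnmatchcase`; the routing table, mailbox names, and first-match-wins order are hypothetical, and hosted providers configure this in their own settings rather than in code. With the narrower `greg-*` pattern, dictionary-attack guesses like `info@` or `sales@` are rejected instead of delivered.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical routing table: (pattern, mailbox). First match wins.
ROUTES = [
    ("postmaster@example.com", "admin"),
    ("greg-*@example.com", "greg"),
]


def route(address: str) -> Optional[str]:
    """Return the mailbox for an address, or None to reject it."""
    local, _, domain = address.rpartition("@")
    # On your own server you may choose to treat local parts case-insensitively.
    normalized = f"{local.lower()}@{domain.lower()}"
    for pattern, mailbox in ROUTES:
        if fnmatchcase(normalized, pattern):
            return mailbox
    return None
```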
        
               | nullify88 wrote:
               | Another Migadu user here slowly degoogling myself. $19 a
               | year is a bargain for my usage and the features I get.
        
               | bluehatbrit wrote:
               | Also a migadu user, I'm a huge fan and can't speak highly
               | enough of them. Their pricing model is a perfect fit for
               | me and their support address is really quick to respond.
        
               | _rs wrote:
               | Huh, how did I not ever hear/find out about this when I
               | was choosing a provider... I think this is the first time
               | I've seen them mentioned on HN, despite searching through
               | quite a few de-googling threads. Will definitely take a
               | closer look!
        
             | fooey wrote:
             | https://forwardemail.net/ is fantastic if all you want is
             | to forward domains somewhere else.
             | 
             | It's a freemium model, but I've never needed anything in
             | the paid tier
        
             | fk33 wrote:
             | mailbox.org also provides the functionality to use your own
             | domain and have a wildcard entry, where all emails go
             | into your inbox.
        
             | sammorrowdrums wrote:
             | I use ProtonMail and sign up to everything with
             | <service>@<custom-domain> so I can track what they do with
             | my email.
             | 
             | It's not cheap from PM, and there are loads of hosting
             | providers that will provide catch-all email for free with
             | your hosting package (but with some usually pretty poor
             | webmail client) or if you use a mail client it should work
             | too.
             | 
             | I like having good webmail and mail app and other things so
             | I pay, but there are plenty of good options available.
             | Sadly self-hosting email server is not really an option for
             | a variety of reasons, but you should easily be able to use
             | catch-all e-mail addresses.
        
             | 8ytecoder wrote:
             | Fastmail supports it. The best part about fastmail is that
             | you can reply from the same address you got the email for.
             | This is useful in customer service scenarios that identify
             | your account based on email address.
        
             | tstrimple wrote:
             | I've tried something similar with Fastmail, and it works
             | out well for the most part. I have run into more than a
             | couple services which won't accept email addresses not on a
             | whitelisted domain for some reason and I had to use an
             | @gmail.com address which forwards to my domain.
        
               | tmk1108 wrote:
               | Out of curiosity, are those popular services? I'm in
               | process of setting up email on my own domain and it would
               | suck having to fallback to Gmail if some service uses an
               | accepted list of domains.
        
               | bluGill wrote:
               | fastmail is reasonably popular. Gmail is bigger, but
               | fastmail is big enough that they cannot be ignored,
               | unlike when I ran my own personal server and often found
               | myself in blacklists without any knowable way to get off.
        
             | SAI_Peregrinus wrote:
             | I'm on fastmail.
        
           | ryandrake wrote:
           | I just set up my mail server to use - rather than +, and
           | don't encounter this problem.
        
         | fullstop wrote:
         | Ages ago, back in MySpace days, their system would permit +
         | when creating an account, but could not handle this in their
         | forgot password / password reset system. I never was able to
         | delete my account because of this.
        
           | torstenvl wrote:
           | All social media accounts are deletable on a long enough
           | timescale.
        
             | fullstop wrote:
             | As it happens, it was eventually done for me:
             | 
             | https://mashable.com/article/myspace-data-loss/
        
           | inopinatus wrote:
           | Sony's SEN used to have an account creation page that would
           | permit +, but subsequent sign-in interpreted it as a URL-encoded
           | space. No login for you.
        
             | ezekg wrote:
             | lol you should have tried to enter a URL-encoded plus sign,
             | %2B.
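The Sony bug is easy to reproduce with Python's `urllib.parse`: in form encoding (`application/x-www-form-urlencoded`) a literal "+" decodes to a space, so an address pasted raw into a query string comes back altered. Letting the library encode the value produces the %2B that ezekg suggests.

```python
from urllib.parse import parse_qs, urlencode

email = "jane+shop@example.com"

# Buggy: concatenating the raw address into a query string.
# parse_qs decodes '+' as a space, so the server sees a different address.
broken = parse_qs("email=" + email)["email"][0]

# Correct: urlencode percent-encodes '+' as %2B (and '@' as %40).
query = urlencode({"email": email})
roundtrip = parse_qs(query)["email"][0]
```

Here `broken` is `"jane shop@example.com"`, while `query` is `"email=jane%2Bshop%40example.com"` and decodes back to the original address.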
        
         | innocenat wrote:
         | I find that a lot of websites don't allow the + sign precisely
         | because of Gmail usage.
        
         | caymanjim wrote:
         | I got sick of + not being accepted and switched to using - for
         | all my aliases, which works everywhere I've tried. It's
         | annoying, but practical (assuming you run your own mail server,
         | or have the ability to manage it client-side).
        
           | 8ytecoder wrote:
           | Plenty of hosted solutions support wildcard - including
           | GSuite and Fastmail.
        
         | vidarh wrote:
         | If you use Gmail here's a fallback option: Gmail ignores "." in
         | the local part. So foo.bar is the same as f.ooba.r to Gmail.
         | Obviously quite limited and more hassle to keep track of.
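vidarh's Gmail rule, combined with Gmail's "+tag" handling, is what makes duplicate detection possible (and what the spambots grey-area mentions exploit). A hedged sketch of a canonicalization key: it applies only to the gmail.com domain, since other domains, including ones hosted by Google, should not be assumed to behave the same way. Use the result only for comparison, never as the address you send mail to.

```python
def gmail_canonical(address: str) -> str:
    """Comparison key for gmail.com: dots, +tags and case don't matter."""
    local, _, domain = address.rpartition("@")
    if domain.lower() != "gmail.com":
        return address  # other providers have their own rules; leave alone
    local = local.split("+", 1)[0].replace(".", "").lower()
    return f"{local}@gmail.com"
```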
        
           | grey-area wrote:
           | This pattern is often abused by spambots trying to avoid dupe
           | detection, so using it excessively may lead to your login
           | being treated as spam.
        
           | caymanjim wrote:
           | One of my primary pet peeves with Gmail. It leads to a lot
           | more junk mail arriving in my inbox. My real Gmail address is
           | 'first.m.last', and almost all the spam I get is addressed to
           | 'firstmlast'. Gmail is great at filtering out spam so that I
           | don't see most of it, but if not for their unconventional
           | filtering of recipients, I'd get even less. I also get a lot
           | of email from idiots who don't know their own address and
           | provide mine instead, and literally all of that would bounce
           | without their . handling.
        
             | ben509 wrote:
             | Same here. I send everything that is firstlast@gmail
             | straight to junk.
             | 
             | > I also get a lot of email from idiots who don't know
             | their own address
             | 
             | Holy crap there are a lot of them. I've got one bank
             | sending me the dude's statements. He's also been on some
             | interesting trips, seen all his hotel stays, etc.
        
               | brewdad wrote:
               | Same. I don't have a very common name but there are at
               | least two other people who share it. One has used my
               | GMail address to apply for jobs and for his unemployment
               | benefits. I'm guessing he isn't having much luck with
               | either one.
               | 
               | The other finally figured it out but his wife still
               | hasn't after more than a decade. It gets really old
               | receiving reminders to service a vehicle I've never owned
               | from a dealership 2000 miles away among other similar
               | crap.
        
             | judge2020 wrote:
             | I thought it would be nice to have my name without numbers
             | as my Gmail address, but with all the stories I've heard, I think
             | I'm glad I have the numbers now.
        
         | ThalesX wrote:
         | I used this wonderful trick to sign up for my government issued
         | eID (it was something else but works for explaining). What they
         | decided to do was simply remove the + and not let me know
         | about it.
         | 
         | my_email+service@foo.bar thus became my_emailservice@foo.bar
         | 
         | I tried logging in, resetting passwords, nothing worked. I had
         | to go to the authorities and make a written request to allow
         | them to interrogate the database by the equivalent of my social
         | security number, and that's when we realized they just stripped
         | the +.
        
         | jbgreer wrote:
         | Ditto, with the same hassles mentioned by you and others, such
         | that I'm actively looking at email services that handle this
         | sort of thing better using approaches such as mentioned below -
         | domain@mydomain style registration addresses.
        
           | agustif wrote:
           | You can have unlimited handles with fastmail if you're
           | looking for that
        
         | moojd wrote:
         | I was unable to provide my email address for a retail rewards
         | program last week because the input field for the domain was a
         | dropdown in their POS. Not the TLD, the entire part of the
         | email after '@'!
        
           | StavrosK wrote:
           | Jeez, wow. How many domains were in that box?
        
             | bassdropvroom wrote:
             | "There are other emails besides gmail and hotmail? Woah!" -
             | the person who thought that was a good idea, probably.
        
           | skhr0680 wrote:
           | Until about a decade ago, this was extremely common in Japan.
           | RIP mobile email, another victim of smartphones in general
           | and the iPhone in particular
        
           | jacobkg wrote:
           | Yes this is terrible. On the other hand, if your goal is to
           | prevent people from signing up using disposable domains, the
           | blacklist approach (which I have tried before) is a never
           | ending game of whack a mole.
           | 
           | Sounds like this was in person at store though which is extra
           | weird because it seems unlikely that scammers would be trying to
           | sign up en masse at a physical location (unlike if the form
           | is connected to the internet)
        
       | chrismorgan wrote:
       | I think the best syntax validation technique for email addresses
       | now is found in the HTML spec:
       | https://html.spec.whatwg.org/multipage/input.html#valid-e-ma....
       | As they say, this is a wilful violation of RFC 5322, because
       | that's simultaneously too strict, too vague and too lax to be
       | useful. They give a grammar, and the following regular expression
       | implementing it:
       | 
       |     /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
       | 
       | Remember that the web is a platform that lives and breathes this
       | stuff. A lot of thought went into this grammar for valid email
       | addresses. This is a good way of filtering out obviously bad
       | stuff while allowing all realistic and sane inputs.
       | 
       | One part of all this whose current state I'm _not_ aware of
       | is "8. You can put emojis in the local part." The HTML spec's
       | validator is all ASCII. It does remind you to punycode the domain
       | labels, but makes no mention of internationalised local parts,
       | and I've never learned about non-ASCII local parts or how well
       | they're supported. I gather they may require the sender to be
       | capable as well as the receiver, whereas internationalised domain
       | names were made compatible with all systems via punycode.
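A minimal Python sketch of the HTML spec pattern quoted above, assuming you want exactly that grammar and nothing more. The pattern is copied from the comment. `HTML_EMAIL_RE` and `is_html_valid_email` are hypothetical names. The sample calls illustrate what this sub-thread discusses: plus signs and dotless domains pass, while IP literals, quoted local parts and non-ASCII local parts are rejected.

```python
import re

# The WHATWG HTML "valid e-mail address" pattern, minus the JS /.../ delimiters.
# The escaped slash (\/) is harmless inside a Python character class.
HTML_EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def is_html_valid_email(address: str) -> bool:
    """True if address matches the HTML spec's (ASCII-only) email grammar."""
    # fullmatch, so a trailing newline cannot slip past the "$" anchor
    return HTML_EMAIL_RE.fullmatch(address) is not None


print(is_html_valid_email("user+tag@example.com"))    # True
print(is_html_valid_email("user@localhost"))          # True (dotless domain)
print(is_html_valid_email("user@[192.0.2.1]"))        # False (IP literal)
print(is_html_valid_email('"john doe"@example.com'))  # False (quoted local part)
print(is_html_valid_email("dörte@example.com"))       # False (non-ASCII local part)
```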
        
         | yjftsjthsd-h wrote:
         | > while allowing all realistic and sane inputs.
         | 
         | Isn't that a way of saying "while disallowing perfectly valid
         | options"?
        
           | jcranmer wrote:
           | What's disallowed are a) IP literal addresses and b)
           | localparts that require quoting. These email addresses are
           | highly likely to break many processing steps anyways; I've
           | only ever seen category b in sendmail configs (it can be
           | useful for internal email rerouting purposes).
           | 
           | There's a distinction to be drawn between the requirements of
           | the actual MTA/MUA/MSA layers and user applications built on
           | top of them. For the latter, considering emails to be invalid
           | if they contain IP literals or quoted localparts is going to
           | be more helpful than harmful (there's less scope for
           | vulnerabilities in doing so). It's just like assuming email
           | addresses are case insensitive: it's inappropriate if you're
           | an MTA, but for everybody else, go ahead and assume they are.
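jcranmer's suggestion to treat addresses as case-insensitive outside the MTA layer could look something like this in Python. Comparisons use a lowercased key, but mail goes to the address as the user typed it. `comparison_key` and `Registry` are hypothetical in-memory stand-ins for whatever storage you actually use.

```python
def comparison_key(address: str) -> str:
    """Key for spotting duplicate sign-ups; never send mail to this value."""
    # Outside the MTA layer, treating the whole address as case-insensitive
    # is usually more helpful than harmful (per the comment above).
    return address.strip().lower()


class Registry:
    """Hypothetical in-memory user registry, for illustration only."""

    def __init__(self):
        self._by_key = {}  # comparison key -> address as the user typed it

    def register(self, address: str) -> bool:
        key = comparison_key(address)
        if key in self._by_key:
            return False  # same mailbox, different capitalisation or padding
        self._by_key[key] = address.strip()
        return True

    def delivery_address(self, address: str) -> str:
        # Send to the stored spelling, not to the lowercased key.
        return self._by_key[comparison_key(address)]
```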
        
             | Avamander wrote:
             | > What's disallowed are [...]
             | 
             | A-ha, but here you're wrong because you've excluded IDNs.
             | This is really why you should not try to be clever.
        
               | jcranmer wrote:
               | IDN A-labels would still be accepted. Using the U-label
               | is likely to require the same level of support as EAI,
               | because without EAI support, non-ASCII strings are likely
               | to horribly, horribly screw up the lower levels of the
               | stack, and I wouldn't recommend supporting EAI without
               | actually testing to make sure your stack can really
               | handle EAI. (Not to mention EAI localparts being their
               | own can of worms).
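Converting a domain's U-labels to A-labels before validation can be sketched with Python's built-in `idna` codec. This rests on two assumptions. First, that codec implements the older IDNA 2003 rules; the third-party `idna` package implements IDNA 2008. Second, the local part passes through untouched, because a non-ASCII local part needs EAI support rather than Punycode. `to_ascii_domain` is a hypothetical helper.

```python
def to_ascii_domain(address: str) -> str:
    """Rewrite the domain of address to its ASCII (A-label) form."""
    local, sep, domain = address.rpartition("@")  # split at the final @
    if not sep:
        raise ValueError("no @ in address")
    # Built-in codec: IDNA 2003 (RFC 3490). Raises UnicodeError on bad labels.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_domain("info@bücher.example"))  # info@xn--bcher-kva.example
```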
        
         | crazygringo wrote:
         | Huh. Interesting this doesn't support international email [1]
         | addresses, e.g. kvitochka@poshta.ukr or
         | Dorte@Sorensen.example.com.
         | 
         | Seeing as the web has _long_ supported Unicode, where are
         | e-mail addresses currently at in that evolution?
         | 
         | Are full Unicode e-mail addresses something that is decently
         | supported today, or still largely theoretical? Is this regex
         | sufficient? What kind of e-mail addresses do people in China
         | most commonly use, for instance?
         | 
         | [1] https://en.wikipedia.org/wiki/International_email
        
           | Avamander wrote:
           | > where are e-mail addresses currently at in that evolution?
           | 
           | Baby shoes because of anglosphere programmers that can't
           | fathom people wanting to use their own alphabets and thus
           | forget to support it.
        
             | xeromal wrote:
             | This is a pretty pessimistic take. The real answer lies
             | somewhere between budget and speed. If someone asked me to
             | support non-latin alphabet, I'd have no idea where to start
             | and the amount of people that would use that feature isn't
             | worth the consideration. It's not that I don't fathom it,
             | it's that I don't have time for that shit.
        
             | crazygringo wrote:
             | You don't need to be so accusatory or ungenerous about it.
             | 
             | Clearly "anglosphere programmers" fathom it every day when
             | they use UTF-8 almost universally in webpages. Also, you
             | know, things like emoji are pretty popular in the
             | "anglosphere" as well.
             | 
             | It's obvious that the real reason is an ancient e-mail RFC,
             | and that while upgrading webpages to UTF-8 was relatively
             | easy, in that it only needs 2 parties to support it -- the
             | browser and the server -- upgrading e-mail is almost
             | infinitely more complicated, because you have to wait for
             | virtually all email code in the world to be upgraded, since
             | an e-mail address is pretty useless if it doesn't work
             | everywhere.
             | 
             | In other words, it's a coordination problem. Not an
             | ignorance problem.
             | 
             | And unfortunately, Punycode [1] doesn't seem to be a
             | particularly viable stepping-stone/compatibility solution
             | here. E.g. if a user tries to use domeinMing Li
             | @example.com and it fails, you'd have to ask them to instead
             | type in a seemingly-gibberish eckwd4c7cu47r2wf@example.com,
             | which could also conflict with a real e-mail address of that
             | name.
             | 
             | [1] https://en.wikipedia.org/wiki/Punycode
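To see why Punycode is an awkward fallback for local parts, consider Python's built-in `punycode` codec. It applies raw RFC 3492 encoding, without the `xn--` prefix used for domain labels, and turns a readable name into an opaque ASCII string. This only illustrates the readability problem, since no mail standard defines Punycode for local parts. `punycode_local_part` is a hypothetical helper.

```python
def punycode_local_part(local: str) -> str:
    # Raw Punycode (RFC 3492) via Python's built-in codec. Purely illustrative:
    # mail standards do not define this encoding for local parts.
    return local.encode("punycode").decode("ascii")


print(punycode_local_part("bücher"))  # bcher-kva
```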
        
         | eli wrote:
         | How sure are you that the 61 character limit won't change in
         | some future DNS improvements? People used to think TLDs would
         | only ever be up to 3 characters long.
         | 
         | More importantly, what problem is this even trying to solve?
         | Someone accidentally typing a 300 character domain? If they are
         | intentionally feeding you gibberish they'll just give you more
         | realistic looking gibberish.
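On the `{0,61}` question: in the HTML spec pattern, that count sits between a mandatory first and last character. The pattern therefore caps each dot-separated label at 63 characters, which is DNS's own per-label limit from RFC 1035 rather than a guess about TLDs. Here is a small Python check of a single label, using the label sub-pattern from that regex (`label_ok` is a hypothetical name):

```python
import re

# One domain label as written in the HTML spec pattern:
# 1 leading char + up to 61 middle chars + 1 trailing char = at most 63.
LABEL_RE = re.compile(r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?")


def label_ok(label: str) -> bool:
    return LABEL_RE.fullmatch(label) is not None


print(label_ok("a" * 63))  # True
print(label_ok("a" * 64))  # False
```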
        
         | biztos wrote:
         | That regular expression fails to validate a bunch of the
         | examples from the article. And also single-word addresses,
         | which are pretty useful if you want to route email locally.
         | 
         | So what makes it the best?
         | 
         | [Edit: it also assumes you've already parsed out the "real"
         | address from the rest of the text field, which to me makes it a
         | half-validator at most.]
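If the field may contain a display name, one option in Python is to extract the bare address with the standard library's `email.utils.parseaddr` before running any regex. This sketch assumes header-style input such as `John Doe <john@example.com>`, and `extract_address` is a hypothetical wrapper. `parseaddr` is lenient and returns `('', '')` when parsing fails, so treat an empty result as a failure rather than as an address.

```python
from email.utils import parseaddr


def extract_address(field: str) -> str:
    """Return the bare addr-spec from a value like 'Name <a@b>' ('' on failure)."""
    _display_name, addr = parseaddr(field)
    return addr


print(extract_address("John Doe <john@example.com>"))  # john@example.com
```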
        
           | crazygringo wrote:
           | Yes, that's the explicit point of it.
           | 
           | But it would seem to be the best for general-purpose web use,
           | e.g. signing up for a newsletter with an e-mail address
           | that's pretty much guaranteed not to break anything.
           | 
           | Instead of being conservative in output, it's intentionally
           | being conservative in input.
        
         | judge2020 wrote:
         | Hope you're not in PHP, Perl, or Ruby!
         | 
         | http://emailregex.com/
        
           | jcranmer wrote:
           | That's a pretty bad page, given that it gives regexes that
           | match very different things for different languages, without
           | a) explanation what the differences are, and b) any rationale
           | for why you may or may not want to choose between the
           | different versions, let alone c) why different languages
           | "deserve" different versions of the regex.
           | 
           | This is already a field where there is a lot of
           | misinformation flying around, and a page that merely
           | regurgitates all of that misinformation without the
           | perspicacity to realize that its purported information is
           | internally incoherent is not helpful.
        
       | mk81 wrote:
       | Clicked thinking this might be useful information; was mostly
       | disappointed.
        
       | burke wrote:
       | I've never seen much point in trying to do better than .+@.+,
       | unless you're going to pull out the (gargantuan!) authoritative
       | version for some reason.
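Here is the `.+@.+` check above as a Python sketch. `re.fullmatch` anchors both ends, and `.` does not match a newline by default, so a stray line break fails. `looks_like_email` is a hypothetical name.

```python
import re


def looks_like_email(s: str) -> bool:
    # Something, an @, then something; nothing else is checked.
    return re.fullmatch(r".+@.+", s) is not None


print(looks_like_email("a@b"))  # True
print(looks_like_email("@b"))   # False
```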
        
         | kevinmchugh wrote:
         | Implementing the authoritative version is a waste since you'll
         | also need to keep an up-to-date list of TLDs, and more
         | importantly, you might have a typo in the input that gives a
         | valid-but-incorrect email.
         | 
         | After doing your simple regex, the best move is to just send a
         | verification email and wait for the user to click the link, if
         | you really need to be sure.
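A minimal sketch of the verify-by-link step in Python. It assumes an in-memory store, a one-day expiry and a placeholder `https://example.com/verify` URL, all hypothetical. Tokens come from `secrets.token_urlsafe`, so they are unguessable and safe to put in a URL, and each one can be used only once. Sending the actual email is out of scope.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 24 * 3600  # assumption: links stay valid for one day

# Hypothetical in-memory store: token -> (email, expiry timestamp).
_pending = {}


def start_verification(email, now=None):
    """Create a one-time token and return the link to email to the user."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # URL-safe alphabet, no encoding needed
    _pending[token] = (email, now + TOKEN_TTL_SECONDS)
    return f"https://example.com/verify?token={token}"  # placeholder URL


def confirm(token, now=None):
    """Return the verified email, or None if the token is unknown, used or expired."""
    now = time.time() if now is None else now
    entry = _pending.pop(token, None)  # pop makes every token single-use
    if entry is None:
        return None
    email, expires = entry
    return email if now <= expires else None
```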
        
         | kristaps wrote:
         | Yep, these arcane rules are maybe relevant to the 5 or so
         | people writing mailservers, but not to web developers.
        
         | wffurr wrote:
         | Yeah, do simple validation, and then just send an email. Even a
         | validated email can still be non-deliverable if there'sa typo
         | in the domain or the first portion.
        
           | dkersten wrote:
           | Or the user typos their address and it goes to the void or to
           | someone else. Even a valid address can't be assumed correct.
           | I get a ton of emails to my old gmail that aren't meant for
           | me because some people are too dumb to get their email
           | addresses correctly (even for someone's covid vaccination
           | appointment confirmation and details recently...) or just
           | make a mistake.
        
         | michaelt wrote:
         | Too many sites refuse to let me register as
         | "><script>alert("XSS");</script>@example.com
         | 
         | The oppression must end!
        
           | SAI_Peregrinus wrote:
           | I think the mismatched quotes actually do make that one
           | invalid.
        
           | Avamander wrote:
           | Honestly if that'd ever work, the website has bigger problems
           | anyways.
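Whatever validation runs on input, the markup problem is solved on output. Here is a Python sketch using the standard library's `html.escape`; `render_email` and the `<td>` wrapper are hypothetical.

```python
import html


def render_email(address: str) -> str:
    # Escape &, <, >, " and ' so the address is displayed, never interpreted.
    return "<td>" + html.escape(address, quote=True) + "</td>"


print(render_email('"><script>alert("XSS");</script>@example.com'))
```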
        
           | mro_name wrote:
           | shouldn't it read Bobby Tables rather than a mere XSS?
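The Bobby Tables version is handled the same way, by keeping data out of the query text. Here is a Python sketch using the standard library's `sqlite3` with a `?` placeholder; the `users` table and `save_email` are hypothetical.

```python
import sqlite3


def save_email(conn: sqlite3.Connection, address: str) -> None:
    # The "?" placeholder binds the address as data, so quotes and
    # semicolons inside it cannot change the SQL statement.
    conn.execute("INSERT INTO users (email) VALUES (?)", (address,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
save_email(conn, "bobby'); DROP TABLE users;--@example.com")
print(conn.execute("SELECT email FROM users").fetchall())
```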
        
       | StavrosK wrote:
       | I gave a presentation on this topic in FOSDEM a few years ago:
       | 
       | https://www.youtube.com/watch?v=xxX81WmXjPg
       | 
       | (Loudness warning)
        
       | tehwebguy wrote:
       | Can't seem to find it anymore but wasn't there a post about how
       | the letter "d" by itself was a valid email address at one point?
        
       | crystaln wrote:
       | So <any chars>@<any chars> seems beyond good enough. There is no
       | benefit to validating beyond that for almost all cases.
        
       | lilyball wrote:
       | I'm of the opinion that all you should validate is that there is
       | an @, the text to the right side (of the final @) is a well-
       | formed dotted DNS domain, and that there exists at least one
       | (non-whitespace) character to the left of the (final) @.
       | 
       | Yes, I can craft garbage emails that pass this quite easily, but
       | who cares? If I'm crafting fake emails I can make valid ones too.
       | This rule ensures I typed the @ and the dot in my domain (we
       | really don't need to support dotless email domains and it's
       | better to catch "foo@gmailcom") and it won't reject all the weird
       | random emails people might have.
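lilyball's rule can be sketched in Python under a few assumptions. The split happens at the final `@`, and the domain must contain at least one dot. Each label must consist of ASCII letters, digits and hyphens, and may neither start nor end with a hyphen. IDNs must therefore already be in A-label form, and label length isn't enforced. `passes_minimal_check` is a hypothetical name.

```python
import re

# A DNS-style label: letters, digits, hyphens, not starting or ending with "-".
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def passes_minimal_check(address: str) -> bool:
    local, sep, domain = address.rpartition("@")  # split at the *final* @
    if not sep:
        return False
    if not local.strip():  # at least one non-whitespace character on the left
        return False
    labels = domain.split(".")
    # Requiring a dot is what catches "foo@gmailcom".
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


print(passes_minimal_check("foo@gmail.com"))  # True
print(passes_minimal_check("foo@gmailcom"))   # False
```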
        
       | LogicX wrote:
       | Nice article.
       | 
       | My only technical nit would be the statement "if there was an MX
       | record".
       | 
       | Many systems will fall back to an A record to attempt delivery in
       | absence of an MX record.
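The fallback LogicX describes is the implicit MX rule in RFC 5321 section 5.1: when a domain has no MX records, the domain itself is tried. Here is a Python sketch of the host-selection step only, assuming some resolver has already fetched the MX records (the lookup itself is out of scope). It also honours the RFC 7505 null MX, a single record whose exchange is `.`, meaning the domain accepts no mail. `delivery_hosts` is a hypothetical helper.

```python
from typing import List, Tuple


def delivery_hosts(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Order the hosts to try, given (preference, exchange) pairs for domain.

    An empty list means "no MX records": RFC 5321 section 5.1 then treats the
    domain itself as an implicit MX, delivered to via its A/AAAA records.
    """
    if not mx_records:
        return [domain]
    # Null MX (RFC 7505): one record with exchange "." means "no mail here".
    if len(mx_records) == 1 and mx_records[0][1] == ".":
        return []
    # Lower preference values are tried first.
    return [host for _pref, host in sorted(mx_records)]
```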
        
       | pmontra wrote:
       | I just check that the string contains at least an @ character.
       | That ensures that we're not rejecting people with uncommon
       | patterns in their email address and takes very little time to
       | design, develop and test.
       | 
       | In a project we're doing something fancier: we check the result
       | of sending mail and store it in the database record for the
       | account (Mandrill notifies us on a webhook.) Then we might take
       | actions for bouncing addresses. The actual impact on the project
       | has been zero so far.
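Acting on bounce notifications can be as simple as maintaining a suppression set. This Python sketch uses a hypothetical, simplified event shape (`{"event": ..., "email": ...}`). Real providers, Mandrill included, use their own payload formats, so their fields would need mapping onto this first. Here only hard bounces suppress an address, and soft bounces are assumed to be temporary.

```python
def apply_bounce_events(events, suppressed):
    """Add hard-bounced addresses to the suppression set and return it.

    events: iterable of dicts shaped like {"event": "hard_bounce", "email": "..."}
    (a hypothetical, simplified shape, not any provider's real payload).
    """
    for event in events:
        if event.get("event") == "hard_bounce":
            suppressed.add(event["email"].strip().lower())
    return suppressed


def should_send(address, suppressed):
    return address.strip().lower() not in suppressed
```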
        
       | _wldu wrote:
       | Perfect is the enemy of good.
       | 
       | If it is a string that has an @ sign, a dot and is at least six
       | characters long, it's probably a valid email address.
       | 
       |  _a@b.cc_
       | 
       | No need to go further than this. It's not worth the time.
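_wldu's heuristic is a one-liner in Python (`probably_email` is a hypothetical name). As the replies point out, it rejects dotless domains such as `user@localhost`:

```python
def probably_email(s: str) -> bool:
    # Contains "@", contains ".", and is at least as long as "a@b.cc".
    return "@" in s and "." in s and len(s) >= 6


print(probably_email("a@b.cc"))          # True
print(probably_email("user@localhost"))  # False
```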
        
         | criddell wrote:
         | You don't need the dot. Item 12 in the list is "You can have
         | dotless domain names."
        
           | Biganon wrote:
           | Do you know many people who own a TLD?
        
             | criddell wrote:
             | Last time I checked, some of the TLDs did have an MX
             | record. Perhaps they use it for support or something? I
             | could imagine emailing info@tld or admin@tld or
             | support@tld.
        
           | harg wrote:
           | indeed, the point the parent comment was making is that
           | effective email validation need not perfectly implement the
           | RFC.
           | 
           | dotless domains are going to be so rare in practice (unless
           | your project has some niche use-case) that you can probably
           | ignore them and call them invalid for the sake of simplicity.
        
             | criddell wrote:
             | It's worth alerting the user that they likely made a
             | mistake. However, I think they should be allowed to
             | continue with a dotless address.
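criddell's middle ground, accepting a dotless domain but warning about it, could be sketched in Python like this. The messages are assumptions, as is the policy that only a missing `@` or an empty side counts as a hard error. `check_address` is a hypothetical helper.

```python
def check_address(address: str):
    """Return (accepted, warnings) under a hypothetical lenient policy."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False, ["That doesn't look like an email address."]
    warnings = []
    if "." not in domain:
        # Unusual but legal: let the user confirm instead of blocking them.
        warnings.append("The domain has no dot, which is unusual. Please double-check it.")
    return True, warnings
```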
        
       | k__ wrote:
       | My main email, which is only 8 characters long, got denied
       | sometimes based on length.
       | 
       | Doesn't happen very often, though.
        
       | slifin wrote:
       | There's a lot of busy work in computing in the name of preventing
       | mistakes
       | 
       | At a certain level of complexity it's easier to just let mistakes
       | happen and provide correction tools if & when required
       | 
       | I remember getting our IPs blacklisted trying to programmatically
       | ask email servers if the email address we were provided was real
        
       | eldenbishop wrote:
       | It's not even about crazy emails. My wife works for an Aerospace
       | company so her work email was blahblah@blah.aero and a huge
       | number of websites still don't recognize that as a valid top
       | level domain.
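A common cause is a pattern that assumes TLDs have two or three letters. The Python comparison below uses two hypothetical patterns. Dropping the upper bound accepts `.aero`, but even the looser pattern still rejects an A-label TLD such as `xn--p1ai` (.рф), which is an argument for not checking the TLD at all.

```python
import re

# Too strict: assumes every TLD has 2-3 letters.
NAIVE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,3}")
# Looser: any final label of two or more letters, with no upper bound.
LOOSER = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

print(NAIVE.fullmatch("blahblah@blah.aero") is not None)   # False
print(LOOSER.fullmatch("blahblah@blah.aero") is not None)  # True
```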
        
       | xyst wrote:
       | g-mail smtp is returning this error:
       | 
       | "Verify that you have addressed this message correctly. Check
       | your SMTP server settings in Mail preferences and verify any
       | advanced settings with your system administrator.
       | 
       | The server response was: The recipient address
       | <'*+-/=?^_`{|}~#$@[ipv6:2001:470:30:84:e276:63ff:fe72:3900]> is
       | not a valid RFC-5321 address. <...> - gsmtp"
       | 
       | Not even Google engineers can get it right. We are doomed.
        
       | ds wrote:
       | If your email is <RFC>fan 69(tm)@root I am not going to let you
       | signup. Sending emails cost money and bouncing emails affects
       | your sender reputation. Also, for every user out there using
       | <RFC>fan 69(tm)@root as their email address, there are going to be
       | thousands of people accidentally entering their email address
       | incorrectly and not getting an alert about it. Yes, you could do
       | fancy shit like checking MX records and whatnot, but come on, I'm
       | not going to maintain/build that infrastructure for the one out
       | of a million people who are trying to use that address.
       | 
       | Developer time is precious at a startup and supporting <RFC>fan
       | 69(tm)@root while still denying b ob@gmailcom is very, very far
       | down the list of things to do.
       | 
       | In summary: I don't suggest doing 'perfect' email validation to
       | RFC spec. You will save money/devtime and make more of your users
       | happy by not doing it.
        
         | toomanybeersies wrote:
         | I also came to the same conclusion some years ago. Or more
         | specifically, my manager brought me around after I tried
         | arguing that it was worth the time to make sure that users
         | could use an IPv6 address as their domain (the lack of periods
         | after the @ would cause
         | `user@2001:0db8:85a3:0000:0000:8a2e:0370:7334` to fail
         | validation)
         | 
         | He made a very convincing argument: while an IP address is
         | technically a valid domain, how many legitimate users were
         | seriously using an IP address as their email domain? (zero)
        
         | welder wrote:
         | Yes, in practice I've found the exact same thing. Either use an
         | email validation service or be more restrictive than the RFC.
         | [1] Also prompting "Did you mean bob@gmail.com?" when the user
         | types "bob@gmaail.com" helps a lot with human input errors. [2]
         | 
         | [1] https://www.mailgun.com/email-validation/
         | 
         | [2] https://www.npmjs.com/package/mailcheck
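Here is a rough Python sketch of the "Did you mean...?" prompt using the standard library's `difflib`. It is not mailcheck's algorithm, and the domain list and the 0.8 cutoff are assumptions. Given the `.co` replies, it only returns a suggestion and never blocks or rewrites the address.

```python
import difflib

# Assumption: a short, hand-picked list of popular mail domains.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest(address: str):
    """Return a suggested correction, or None. Advisory only."""
    local, sep, domain = address.rpartition("@")
    domain = domain.lower()
    if not sep or domain in COMMON_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    return f"{local}@{matches[0]}" if matches else None


print(suggest("bob@gmaail.com"))  # bob@gmail.com
```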
        
           | ivraatiems wrote:
           | Except that as someone with an email at a .co domain, I get
           | really irritated when it asks me "do you mean
           | [mydomain].com?"
           | 
           | I always have to tell people, in real life, "it's .co, not
           | .com," just in case - humans do this too.
        
             | cygned wrote:
             | Worse, I had services trying to be smart correcting .co to
             | .com
        
             | welder wrote:
             | Yep, it's solving for the majority case. As long as it
             | doesn't block signup you can just ignore it.
        
         | 3np wrote:
         | How about not doing any pre-validation (save for whitespace
         | stripping) and have a validation e-mail (which you should
         | require anyway) take care of any typos?
         | 
         | With precious dev time, you can do better by doing less.
        
           | edoceo wrote:
           | You risk sending a junk message tho, which affects your
           | sender-spam score with other providers.
           | 
           | I just make folks email me first.
        
             | jacobobryant wrote:
             | I do the validation email, works great. Just be sure to
             | protect the sign up form with some type of bot detection (I
             | use recaptcha, but simpler methods are fine for most
             | sites).
        
         | josephcsible wrote:
         | This logic is why so many Web sites today won't let you use a
         | plus sign in email addresses, which ruins a really nice Gmail
         | feature.
        
           | kyrra wrote:
           | As people said, true spammers know to just strip off the "+"
           | in the email address. This is actually a fun reason to set up
           | your own domain and set up email forwarding for *@example.com
           | to go to your main gmail or whatever account, then the
           | "username" part of the email I just set to the domain of the
           | account I'm signing up for. So I'll use amazon@example.com
           | when signing up at Amazon (or whatever site).
        
             | em-bee wrote:
             | well, you could turn it around and use + addresses
             | everywhere, so that any legitimate response must be to one
             | of your + addresses. then treat anything without + as spam.
        
             | frereubu wrote:
             | Didn't you find you got a deluge of spam to generic
             | addresses like admin@, info@, offers@ and so on? I tried
             | this, although it was probably about 15 years ago now, and
             | reverted it because I got about the same amount of spam as
             | genuine emails.
        
             | progforlyfe wrote:
             | Although hardly anyone uses yahoo mail anymore, they
             | actually have this feature built in. Basically email
             | aliases.
        
             | shard wrote:
             | That makes me want to use an email address of the form
             | +myname@mydomain.com, just to see how websites would handle
             | stripping out everything starting from the +.
        
             | zzo38computer wrote:
             | I do a similar thing, except that the email is actually
             | hosted at my domain rather than being forwarded, and that I
             | have a list of email addresses that I accept and reject all
             | others; if I receive too much spam at one address, I
             | disable receiving at that address.
             | 
             | I have found this to work; I hardly receive any spam at
             | all, and do not need any separate spam filter.
        
             | judge2020 wrote:
             | Note that you should only do this for maybe 6-18
             | characters, some sites will test send an email to [30-100
             | character random string]@example.com and see if it bounces
             | - if it doesn't, it'll suspect that domain to be some
             | spammer with a catch-all email inbox and block it.
        
               | alufers wrote:
               | Do you know what sites do that? I have my own domain and
               | I haven't seen anybody do that. The obvious solution is
               | to configure your mail server to only accept usernames
               | before the '@' that adhere to some rule which only you
               | know. Like checking if it is a palindrome or something
               | obscure like this.
        
               | bbarnett wrote:
               | I watch multiple corp's mail logs extensively, this is
               | not even remotely a common thing.
               | 
               | Worse, I know at least 5 or 6 people personally who use a
               | catch-all. It seems like a very poor method to reliably
               | catch spammers.
        
               | jasonjayr wrote:
               | That's a terrible approach, plenty of valid, legitimate
               | non-spamming domains use catchalls of arbitrary length
               | for all sorts of reasons.
               | 
               | Additionally, sending a test email like that might also
               | get the sender placed on a black list for triggering a
               | spam trap inadvertently.
        
               | pmontra wrote:
               | That's a worrying strategy because there are many reasons
               | for using a catchall. Example: one email per site to
               | track companies selling personal data, then maybe bounce
               | that single email address.
               | 
               | Do you know any site blocking domains with a catchall?
        
               | LorenPechtel wrote:
               | Yeah, if you have a domain of your own the sensible thing
               | is a catchall, use a different address everywhere and
               | block the ones that spam.
        
             | academia_hack wrote:
             | The + is also useful for knowing who sold your email
             | address on or was responsible for a data breach. If I start
             | getting spam to <my name>+hulu@gmail.com, then I know I
             | could chase down Hulu on Twitter for an explanation.
        
             | rootusrootus wrote:
             | > So I'll use amazon@example.com when signing up at Amazon
             | 
             | I go a little farther. I figure an attentive spammer might
             | figure out that if I use amazon@johnsmith.net to sign up
             | for Amazon, I may have exactly the scheme where
             | *@johnsmith.net will work, so they can just add that to the
             | spam list as a wildcard and pick a new address every time.
             | So instead, I use john101@johnsmith.net, john102, john103,
             | etc, to try and obscure my strategy and prolong the life of
             | the domain forwarding.
        
               | licebmi__at__ wrote:
               | I kinda imagine that spammers go for low-hanging fruit. So
               | spammers won't bother with defeating a catchall domain
               | forwarding, as it's unlikely to give them returns.
               | Although a motivated attacker might decide to try to send
               | interesting phishing.
        
               | willcipriano wrote:
               | I just have an entire domain for the purposes of spam.
               | Anything sent to there ends up in my bulk folder. I use
               | amazon@domain.com so I can tell who sells my email or
               | gets hacked. Never noticed someone trying to send a email
               | to any addresses I haven't previously used.
        
               | wastholm wrote:
               | > Never noticed someone trying to send a email to any
               | addresses I haven't previously used.
               | 
               | At least a few years ago, I noticed a lot of spam to
               | <random first name>@<my domain> -- i.e., completely made-
               | up addresses that I had never used. Since messages sent
               | to those addresses were guaranteed to be spam, I started
               | treating them as free training data for the spam filter.
               | 
               | I don't know if this still happens, though, because I
               | haven't looked.
        
               | Grollicus wrote:
               | This is currently happening to my email domain. Gets
               | rejected as it doesn't have a valid hash (recipient
               | name), but the logfiles are full of <3
               | letters>@mydomain.com and <english_word>@mydomain.com
               | rejections.
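Grollicus doesn't describe their exact scheme, so this Python sketch shows a hypothetical variant. Each handed-out local part is `tag.` followed by a truncated HMAC of the tag, and the server accepts a recipient only if the HMAC checks out. The secret, the 8-hex-character truncation and both function names are assumptions. Guessed addresses like `<3 letters>@` or `<english_word>@` fail the check.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical per-domain secret


def _tag_digest(tag: str) -> str:
    return hmac.new(SECRET, tag.encode(), hashlib.sha256).hexdigest()[:8]


def make_local_part(tag: str) -> str:
    """E.g. make_local_part("shop") -> "shop." followed by 8 hex characters."""
    return f"{tag}.{_tag_digest(tag)}"


def accept_recipient(local_part: str) -> bool:
    tag, sep, digest = local_part.rpartition(".")
    if not sep:
        return False
    # Constant-time comparison on bytes (works even for non-ASCII input).
    return hmac.compare_digest(digest.encode(), _tag_digest(tag).encode())
```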
        
               | klyrs wrote:
               | Yeah, this is an age-old issue -- in the early 00s, my
               | mom got a domain and used the email
               | <first_initial>@<domain>.com. She gave up battling the
               | deluge of spam after about a year. We looked through the
               | logs, and saw that her next choice of handle was also
               | getting tons of spam, too, because it was also short.
        
               | batch12 wrote:
               | I do the same thing. I use whatever@domain.email. The
               | addresses are temporary if I want them to be and I can
               | automatically lock the senders to a list that is either
               | automatically learned after x days or manually curated.
               | I've seen some 'marketing' mail get filtered but no hacks
               | yet.
        
               | cgriswald wrote:
               | I've got amazon@domain.com email for my domain and I've
               | never created such an account, much less given it out.
               | Without some uniqueness in the username, I'm not sure you
               | can tell a company sold or lost your data.
        
               | bcrosby95 wrote:
               | Spamming is a numbers game. I kinda doubt enough people
               | are using this scheme to make figuring this out
               | worthwhile for a spammer.
        
               | nerdponx wrote:
               | I've wondered about this with big companies like
               | Facebook, Google, Amazon, etc. as well as behind-the-
               | scenes spyware/ad firms who are all probably very
               | interested in linking my identity across user accounts,
               | email addresses, device fingerprints, etc. I've hoped
               | that there aren't enough people doing it (yet) for these
               | orgs to find it worth the effort.
        
               | rootusrootus wrote:
               | Given the sheer amount of money involved, I believe it is
               | likely that there are players in the market who are far
               | more capable than we give them credit for.
        
               | macNchz wrote:
               | There very much are companies doing this and selling it
               | as a service...here's an API that you can query with a
               | piece of contact information to retrieve all sorts of
               | additional information, including hashes of alternate
               | email addresses, mobile device ids, social media
               | profiles, and plenty of other stuff:
               | https://platform.fullcontact.com/docs/apis/enrich/person-
               | ins...
        
               | rootusrootus wrote:
               | At a certain point -- probably the moment it becomes a
               | business unto itself -- this kind of data collection
               | should be subject to all the same rules we've come up
               | with for credit bureaus. It should be a legal requirement
               | that I can get the entire profile they have built for me.
        
               | nerdponx wrote:
               | I was wondering specifically if they have special cases
               | to identify such "personal" email domains and use them
               | for record linkage.
               | 
               | It seems like an obvious thing to try, but maybe not
               | worth the effort of implementing it, given the high risk
               | of false positives and the low % of people who actually
               | do stuff like this (not to mention they're probably not
               | people who click on ads anyway).
        
               | typicalbender wrote:
               | Hard truth is you're not worth enough for a spammer to
               | look for that pattern, it's a numbers game and you're
               | just making it harder on yourself.
               | 
               | Also unless you're keeping a lookup table you're losing a
               | great benefit of the wildcard: you can tell when a company
               | sells your email (I have caught a few places). If I
               | get an email from company XYZ to my email abc@example.com
               | I know exactly who sold my email and to whom.
        
               | rootusrootus wrote:
               | I agree that I'm probably not worth the effort, but if
               | this kind of domain wildcard strategy were to become more
               | popular it is entirely feasible for a rudimentary machine
               | learning algorithm to detect its use.
               | 
               | > unless you're keeping a lookup table you're losing a
               | great benefit of the wildcard
               | 
               | That's true, I don't keep a lookup table per se, though I
               | do have a deleted items folder that I could look back in.
               | I'm not sure what I would do, though, if I knew what
               | particular company sold my email address? Send them a
               | nastygram they will just ignore? I just block the address
               | and move on.
        
           | j_wtf_all_taken wrote:
           | I don't really think that's the same. Forgetting the "+" in
           | the validation regular expression is something different from
           | refusing to implement all kinds of extra checks to support
           | very weird and very unused things.
        
           | [deleted]
        
           | ridaj wrote:
           | Why do so many people responding to this seem to assume the
           | plus sign is to fool spammers? Of course it's not useful for
           | antispam. It's mostly meant to make it easier to trace where
           | a (legit) email comes from, for example to set up filters.
           | https://gmail.googleblog.com/2008/03/2-hidden-ways-to-get-
           | mo...
        
           | jjav wrote:
           | > This logic is why so many Web sites today won't let you use
           | a plus sign in email addresses, which ruins a really nice
           | Gmail feature.
           | 
           | Contrary to popular belief, it is not a gmail feature.
           | 
           | I first heard of the + as destination filtering in the very
           | early 90s at CMU where it was broadly used. Every single
           | email address I've had since then has supported the same (and
           | notably, apart from a test account, I've never used gmail
           | much, so that's not including gmail).
        
           | tyoma wrote:
           | The best are sites that let you sign up with a '+' but not
           | log in. Zappos used to be the most prominent example.
        
             | tshaddox wrote:
             | I've seen sites send emails where the unsubscribe link
             | doesn't work because the URL contains the email address I
             | signed up with and that email address contains a character
             | that their web server doesn't play well with.
        
             | scubbo wrote:
             | I once had a site _silently strip_ the + from signup email.
             | So when I submitted `myname+yoursite@gmail.com` as my email
             | address, they started sending mail to
             | `mynameyoursite@gmail.com`. Madness.
        
               | not2b wrote:
               | This is common; spammers know the semantics of '+' for
               | gmail and will strip it. You need to assume that it will
               | happen.
        
               | cgriswald wrote:
               | GP said the site stripped the "+" only, essentially
               | sending his email to another address entirely. Spammers
               | strip the "+" and whatever follows it, so the spam ends
               | up at the same address.
        
             | dangoldin wrote:
             | Interesting. How did that work? Does that mean that they
             | would only create the user account under the + suffix? I
             | imagine they must have had two email fields - the canonical
             | email for login and then a separate notification email?
        
             | nybble41 wrote:
             | I've run across at least one _banking_ site which accepted
             | a password on the sign-up page which was later rejected by
             | the login page. The validation scripts on the login page
             | used a more limited set of permissible special characters
             | which didn't include parentheses. Fortunately it was only
             | a client-side check, so it was relatively simple to bypass
             | it once using developer tools and change the password.
        
               | wtetzner wrote:
               | Why would you ever validate the characters of a password
               | on the login page? What a weird thing to do.
        
               | marcod wrote:
               | American Express at one point let me set a password over
               | 8 characters, but logging in after only worked if I
               | provided only the first 8.
        
               | lozaning wrote:
               | At one point I know they also weren't case sensitive.
        
             | PebblesRox wrote:
             | Reminds me of a patio11 post (which I haven't been able to
             | track down) where he said he gets people signing up with a
             | '+' but then forgetting to include the extra part when they
             | log in later. His login code accepts both versions and
             | increments a counter to track how many people were too
             | smart for their own good.
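The patio11 post isn't linked, so this is only a hypothetical sketch of the behavior described: try the address exactly as typed, then fall back to an account whose signup address differs only by a `+tag`, and count how often that fallback is needed. `AccountStore`, `find_for_login` and `without_tag` are illustrative names, and `+` as the separator is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def without_tag(address: str, sep: str = "+") -> str:
    """Drop a '+tag' from the local part (used only for this site's lookup)."""
    local, _, domain = address.rpartition("@")
    return f"{local.split(sep, 1)[0]}@{domain}"


@dataclass
class AccountStore:
    # Hypothetical store keyed by the address exactly as given at signup.
    accounts: dict[str, str] = field(default_factory=dict)
    plus_forgotten: int = 0  # counts logins rescued by the fallback

    def find_for_login(self, typed: str) -> str | None:
        typed = typed.strip()
        if typed in self.accounts:
            return self.accounts[typed]
        # Fallback: signed up as "name+tag@domain", logging in as "name@domain".
        matches = [acct for addr, acct in self.accounts.items()
                   if "+" in addr and without_tag(addr) == typed]
        if len(matches) == 1:  # only accept an unambiguous match
            self.plus_forgotten += 1
            return matches[0]
        return None
```

Accepting only a single match avoids logging someone into the wrong account when several tagged addresses share the same base.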
        
           | raffijacobs wrote:
           | Can't you use "." anywhere in your email to use the same
           | address multiple times in Gmail?
        
             | josephcsible wrote:
             | Yes, assuming the websites following this logic don't block
             | that too, but then you have to keep track of a mapping of
             | dots to websites yourself instead of it being obvious from
             | what you put after the plus sign.
        
               | alfon wrote:
               | Or use a password manager.
        
           | gumby wrote:
           | I configured my mail server to use _ as a sub mailbox
           | identifier to stop creeps who block +. I assume they are
           | doing it to make sure their precious spam shows up in my
           | inbox.
        
           | toxik wrote:
           | OTOH, it being a standardized thing, a spammer would
           | absolutely just strip that plus part off. Better do it
           | secretly like a catch-all.
        
             | gxnxcxcx wrote:
             | I think that when using email aliases to identify spam
             | sources, the crucial part is that you can filter the
             | stripped address (as well as any unapproved alias) to be
             | directly identified as spam and then the +alias part
             | becomes a key to properly get into the inbox.
             | 
             | That whole setup for tidiness is broken the moment a
             | desired website does not accept an alias in your address,
             | of course.
        
             | nybble41 wrote:
             | It's not really all that standardized. The use of a '+'
             | character to indicate an alias or label is merely
             | convention--if you run your own server you can set the
             | separator to any character you wish, or disable the feature
             | altogether. As far as the RFCs are concerned the '+'
             | character is just part of the account name and there is no
             | reason why it cannot be a _mandatory_ part of the account
             | name on any particular server, such that stripping off the
             | '+' and any trailing characters results in an invalid
             | e-mail address, or even someone else's e-mail account. For
             | sending email or using an email address as an account
             | identifier it's definitely incorrect to treat
             | abc+xyz@example.com and abc@example.com as equivalent. The
             | same goes for account names which differ only in
             | capitalization or placement of periods: some servers are
             | case-insensitive and ignore periods in account names (e.g.
             | Google) but these are server-specific traits and compliant
             | email senders should not assume that every server will work
             | the same way.
             | 
             | The '+' alias feature is a fairly common configuration,
             | though, so for source labels it's better to either treat
             | all unlabeled messages as spam or else use a more opaque
             | labeling scheme (unique-hash@example.com) which doesn't
             | hint at an alternative untracked email address.
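One way to produce the opaque labels mentioned above is sketched here. It assumes a catch-all mailbox on a domain you control (`example.com` is a placeholder) and a secret key that only you hold. Each per-site address is derived from an HMAC of the site name, so the label is stable, can be recomputed, and gives no hint of a base mailbox. `opaque_alias`, `build_alias_table` and `label_for` are hypothetical helper names.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional


def opaque_alias(site: str, secret: bytes, domain: str = "example.com", length: int = 12) -> str:
    """Derive a stable, hard-to-guess local part for one site."""
    normalized = site.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return f"{digest[:length]}@{domain}"


def build_alias_table(sites: Iterable[str], secret: bytes, domain: str = "example.com") -> Dict[str, str]:
    """Map each issued alias back to the site it was given to."""
    return {opaque_alias(site, secret, domain): site for site in sites}


def label_for(recipient: str, table: Dict[str, str]) -> Optional[str]:
    """Return the site an incoming message was addressed through, or None.

    None means the recipient was never issued, so the message can be
    treated as spam, as suggested in the comment above.
    """
    return table.get(recipient.strip().lower())
```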
        
               | stonogo wrote:
               | Subaddressing is standardized in RFC 5233.
        
               | nybble41 wrote:
               | For the Sieve Email Filtering Language, yes. Which is not
               | actually part of SMTP. And even in RFC 5233 the specific
               | separator sequence is up to the server; the RFC only
               | specifies queries for ":user", ":detail", and
               | ":localpart" to filter on the different fields
               | independent of the choice of separator.
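Here is a sketch of the RFC 5233 user/detail split. It assumes the separator comes from whoever knows the server's configuration, and it splits at the first separator, which is a choice made for this sketch. The `user` part is exactly what a spammer-style strip keeps, and nothing guarantees that `user@domain` is deliverable or belongs to the same person. `split_subaddress` and `LocalPart` are hypothetical names.

```python
from typing import NamedTuple, Optional


class LocalPart(NamedTuple):
    user: str              # corresponds to RFC 5233 ":user"
    detail: Optional[str]  # RFC 5233 ":detail"; None if no separator present


def split_subaddress(localpart: str, separator: str = "+") -> LocalPart:
    """Split at the first separator; the separator is server configuration."""
    user, found, detail = localpart.partition(separator)
    return LocalPart(user, detail if found else None)
```

Passing `separator="_"` covers a setup like the one gumby describes above.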
        
           | alkonaut wrote:
           | Sites likely prefer your canonical/standard email address
           | over any plus version. It would be easy to trim anything
           | after the plus too, I guess, and just email you at your normal
           | address.
        
           | mderazon wrote:
           | Spammers aside, I'm interested to know what strategies
           | different SaaS companies use with regard to users creating an
           | account with a + alias. Do you let users create multiple
           | accounts with the same email but different + aliases? Or do
           | you recognize that it's an alias and say that the account
           | already exists?
           | 
           | Not all email providers support the + notion, so you'd have
           | to look the domain up in some hard-coded list.
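The following is one possible answer to the duplicate-account question. It assumes you only collapse aliases for providers whose rules you actually know. Gmail ignores dots and a `+tag` in the local part, and `googlemail.com` addresses reach the same mailboxes as `gmail.com` ones. The key is used only to detect "account already exists", while mail still goes to the address as entered. `GMAIL_DOMAINS` and `dedupe_key` are hypothetical names, and any other provider would need its own verified rules.

```python
# Hypothetical hand-maintained list. Only Gmail's own rules are encoded here;
# other domains are left untouched because their rules are server-specific.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def dedupe_key(address: str) -> str:
    """Key for spotting duplicate signups; never use it as a delivery address."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "").lower()
        domain = "gmail.com"
    return f"{local}@{domain}"
```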
        
         | markonen wrote:
         | You absolutely should check the MX records, though. It's easy
         | and catches tons of typos. I was floored by the difference when
         | I implemented this as pre-check before a Stripe Checkout form.
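This sketch of the MX pre-check passes the DNS lookups in as functions, so the decision logic can be tested offline. It follows two rules from the standards. First, with no MX records, RFC 5321 falls back to the domain's own address records (the "implicit MX"). Second, an RFC 7505 "null MX" (a single MX whose exchange is `.`) means the domain accepts no mail. `domain_can_receive_mail` and `dnspython_mx` are hypothetical names, and the default lookup assumes the third-party dnspython 2.x package is installed.

```python
from typing import Callable, Iterable, List


def domain_can_receive_mail(
    domain: str,
    lookup_mx: Callable[[str], Iterable[str]],
    has_address_record: Callable[[str], bool],
) -> bool:
    """Decide whether a domain can plausibly receive mail.

    lookup_mx returns the MX exchange host names (empty if there are none);
    has_address_record reports whether the domain has A/AAAA records.
    """
    exchanges = [name.rstrip(".") for name in lookup_mx(domain)]
    if exchanges:
        # RFC 7505 null MX: a lone exchange of "." means "no mail here".
        return exchanges != [""]
    # RFC 5321 section 5.1: no MX records, so fall back to the domain itself.
    return has_address_record(domain)


def dnspython_mx(domain: str) -> List[str]:
    """MX lookup via the third-party dnspython package (2.x API)."""
    import dns.resolver  # lazy import: the logic above has no hard dependency

    try:
        answer = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    return [str(record.exchange) for record in answer]
```

A resolver timeout probably means "unknown" rather than "invalid". The sketch lets such exceptions propagate so that the caller can decide whether to block the form or only warn.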
        
         | throwaway09223 wrote:
         | How do you reconcile your concern for the cost of sending
         | emails with your unwillingness to do super basic validation
         | like checking an MX record?
        
           | nawgz wrote:
           | From where I sit, both of those concerns sit on the same side
           | of the fence. GP argues against spending extensive developer
           | time on validating edge-case emails, and says they do so in
           | no small part to avoid having emails bounce. Doing MX or
           | other validation to confirm these edge-case emails within
           | your service does not imply that others have put in the same
           | costly and nearly superfluous support, which likely leads to
           | more bounced emails and accordingly degrades trust in the
           | business as a sender.
        
         | jcelerier wrote:
         | > Sending emails cost money and bouncing emails affects your
         | sender reputation.
         | 
         | that works as long as <RFC>fan 69(tm)@root does not write
         | articles for ZDNet
        
         | vorpalhex wrote:
         | My email address is valid and has been valid for a really long
         | time.. but about 5% of ecommerce shops refuse to accept it.. so
         | they don't get my money.
         | 
         | Don't get clever, just follow the spec.
        
           | skeeter2020 wrote:
           | >> Don't get clever, just follow the spec.
           | 
           | I'd suggest being clever is wasting countless hours to handle
           | your edge case. Or writing your own email validation in the
           | first place.
        
             | shard wrote:
             | > wasting countless hours
             | 
             | Isn't email validation a solved problem in that there are
             | services or ready software which provide RFC-compliant
             | validation? If some company is wasting countless hours to
             | do something because of Not Invented Here syndrome, isn't
             | that the same as some company deciding to write
             | cryptography algorithms on their own and reaping what they
             | sow?
        
           | alkonaut wrote:
           | Your money is likely a minuscule part of the revenue and
           | supporting your email would likely cost more. This was the
           | point, that it _is_ probably clever to choose a validation
           | that covers 99.99% of customer emails rather than cover the
           | whole spec.
        
           | jchw wrote:
           | If you can show that "just follow the spec" ends up opening
           | up more opportunity than it closes off, then you can convince
           | people. However, when gmail, outlook, etc. do not allow these
           | zany e-mail addresses, you're going to have a hella hard time
           | convincing me of this unless you are in the 1% of spenders.
        
             | BenjiWiebe wrote:
             | Do GMail et al actually prevent you from sending to and
             | receiving from these zany addresses? Or merely prevent you
             | from creating one @gmail.com?
        
               | jchw wrote:
               | Creating one. But when you consider just how many
               | customers are using gmail and outlook addresses, and not
               | to mention, GSuite/fastmail/etc. addresses under custom
               | domains, it makes more sense why rejecting
               | @gmail.com@gmail.com is worth more than allowing some
               | crazy e-mail feature that is effectively not used.
        
               | not2b wrote:
               | The routing features are obsolete; they go back to the
               | days when lots of email users weren't on the Internet
               | directly and had to use relays. They are still in the
               | spec, yes.
        
               | jchw wrote:
               | I assume it comes from similar lineage as UUCP paths.
               | Either way, email standards are a bit ridiculous. It
               | needs the kind of overhaul that occurred with HTML5 of
               | looking at what email implementations actually do and
               | pushing them in one direction. I suspect that is not
               | happening ever, so failing that there will probably
               | always be things in the spec that just simply don't work
               | across everything anymore.
        
           | lisper wrote:
           | > about 5% of ecommerce shops refuse to accept it
           | 
           | That's surprising to me because there is nothing particularly
           | weird about your email address. What exactly do they complain
           | about?
        
             | mixmastamyk wrote:
             | Quotes included or not?
        
               | lisper wrote:
               | Not. Obviously, or the rejection ratio would be a lot
               | higher than 5%.
        
             | brlcad wrote:
             | I would assume because it's only 2-chars (me) and they're
             | filtering anything <3 as invalid.
        
               | lisper wrote:
               | Yeah, that's what I would guess as well. But there's a
               | big difference between "follow the [ridiculously
               | complicated] spec to the letter" and "don't do obviously
               | stupid things like filter out email addresses with short
               | names". The latter is good advice, the former not so much
               | IMHO.
        
               | Domenic_S wrote:
               | For a couple glorious years I had a 2-letter email
               | address at a single-letter .com domain. It was rejected a
               | surprisingly small number of times.
        
               | isoskeles wrote:
               | Ah, that's a good assumption. My initial assumption was
               | some sites have a very dumb whitelist of valid email
               | domains. This seems more reasonable (although, also
               | dumb).
        
           | yupper32 wrote:
           | If 5% of ecommerce shops refuse to accept it, it's likely you
           | being clever.
           | 
           | My email is refused by 0% of ecommerce shops... because I
           | just have a normal email.
           | 
           | Don't be clever, pick a better email.
        
             | woah wrote:
             | If you aren't accepting very normal email addresses at
             | perfectly valid TLDs, then you are a bad programmer. At
             | least import a list of the new TLDs every ten years.
        
               | yupper32 wrote:
               | Of course they're a bad programmer. But we live in real
               | life, where bad programmers exist.
               | 
               | Get a big brand .com email and you'll never run into an
               | issue.
        
             | drdaeman wrote:
             | What's "normal", though?
             | "<8-10latinalphanumerics>@gmail.com?"
             | 
             | My email is just "me@<my-last-name>.al"[1] which is just a
             | tiny bit "unusual" - and over the years it got refused by a
             | couple of stores because of the TLD. And Albania is not the
             | Cocos Islands; .al is surely not popular with spammers.
             | 
             | If a store believes there's only ".com" gTLD and nothing
             | else (this had really happened to me, some galaxy-brain
             | made a form with a hardcoded ".com" suffix; not even ".net"
             | or ".org" were accepted, unfortunately I don't remember the
             | site) - well, fuck that store, their loss not mine. Worst
             | case, if I really want something they sell, I'll give them
             | a throwaway email - which will contribute to their mail
             | bounces after some time.
             | 
             | __________
             | 
             | [1] ".al" is a ccTLD for Albania which is not a country of
             | my citizenship or residence. I've picked the domain name as a
             | hack, because my first name is Aleksei and my first and
             | middle names form "A.L." initials as well. That, and
             | because all relevant .name domains were already taken.
        
               | yupper32 wrote:
               | Might sound strange but yes, me@<my-last-name>.al _is_
               | being clever. You found a nice short clean email by
               | buying a domain from Albania and setting up a me@
               | address. That's clever.
               | 
               | Think about it this way: either you can get some big
               | brand .com email with no special username and never have
               | an issue, or you can flail around 5% of the time and yell
               | at the clouds.
               | 
               | Should everyone accept your email? Of course! I'm just
               | saying you live in real life, and in real life people
               | suck at building email forms. The problems you run into
               | are on you.
        
               | oarsinsync wrote:
               | > Should everyone accept your email? Of course! I'm just
               | saying you live in real life, and in real life people
               | suck at building email forms. The problems you run into
               | are on you.
               | 
               | No, the problems they run into are caused by (at best)
               | mediocre developers. They're entirely to blame. We have
               | specs and standards for a reason.
        
               | yupper32 wrote:
               | I honestly don't understand what you're trying to say.
               | What's actionable about your view? You going to call up
               | every business that doesn't accept your email and tell
               | them their programmers suck? Businesses like this are
               | never going away. It's a losing battle.
               | 
               | Instead you can just get a big name .com email and call
               | it a day. Live your life without trying to make some
               | statement about email standards.
        
         | unoti wrote:
         | Totally agree with this. Trying to be perfect is a good road to
         | paralysis and not getting things done. Software is like people:
         | it's ok to not be perfect, especially if they're always trying
         | hard to be better and doing good things for society.
        
         | jcranmer wrote:
         | The basic rule of thumb I use is this: are you implementing email
         | at the MTA level (needing to build/parse RFC 5321 commands or
         | RFC 5322 blobs directly), or are you using email closer to a
         | "universal internet ID" purpose (i.e., application
         | perspective)?
         | 
         | If you are in the former category, then yes, follow the spec to
         | the letter. If you're in the latter, then screw the precise
         | guidelines of the spec and reject emails that are very unlikely
         | to be valid: no quoted localparts, no IP address literals. In
         | addition, go ahead and say that email is case-insensitive (more
         | precisely, case-preserving).
         | 
         | The hard part is if you're writing an email client, because
         | you're basically forced to have your hands in both pies.
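Here is a sketch of the application-level policy described above. It accepts anything with a non-empty local part and domain, but deliberately rejects quoted local parts and IP address literals. It compares addresses case-insensitively while the address is still stored as typed. This is a policy choice, not RFC 5322 parsing, and `acceptable_as_login_id` and `identity_key` are hypothetical names.

```python
def acceptable_as_login_id(address: str) -> bool:
    """Application-level policy, not a full RFC 5322 parser."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    if local.startswith('"'):    # quoted local parts: rejected on purpose
        return False
    if domain.startswith("["):   # IP literals such as [192.0.2.1]: rejected
        return False
    return True


def identity_key(address: str) -> str:
    """Compare case-insensitively; store and display the address as typed."""
    return address.casefold()
```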
        
         | WindyLakeReturn wrote:
         | It depends upon where you are validating email input at.
         | 
         | For the initial email input, your logic works fine. Once it is
         | applied downstream in a process, it begins to get messy.
         | Someone might do an incorrect email validation that happens to
         | block emails that you have already accepted or which you are
         | importing from a valid source. Someone has already given the
         | example of a login field not allowing them to use the email
         | they signed up with. If such upgrades occur later in a project's
         | life cycle, not only might you have to spend developer time,
         | you may also have a production outage.
         | 
         | Personally, I suggest using some, even if imperfect, validation
         | when gathering the email initially (for the reasons you point
         | out) and then not validating that information any further.
        
           | paulmd wrote:
           | I actually run into this all the time with passwords using a
           | password manager. Lots of places will accept the _creation_
           | of a password that's long/complex/etc but then when you
           | actually try to log in with it it won't accept a long
           | password, won't accept certain characters, will silently
           | truncate it and throw an invalid password error, etc.
           | 
           | Sometimes disabling Javascript will fix it, sometimes not. I
           | occasionally have to resort to using "I forgot my password"
           | until I figure out what the actual underlying requirements of
           | the passwords are.
        
             | CodeMage wrote:
             | As a user, I got burned by that several times. Now, when I
             | create a new account somewhere, the first thing I do is log
             | out and try to log back in.
        
             | zerd wrote:
             | Etrade lets you create a 32-character password, but if you
             | enable 2FA you suddenly can't log in because apparently they
             | concatenate them together and then check the length. So
             | make sure your password is at most 26 characters (they
             | might've fixed this but I haven't tried).
        
               | sbierwagen wrote:
               | Like GP mentioned, Etrade also does the thing where it
               | accepts the . character on password creation, but not
               | login. That was fun to figure out.
        
               | hsbauauvhabzb wrote:
               | Curious, can you login with 26 characters and your MFA
               | seed to bypass MFA entirely?
        
             | feanaro wrote:
             | I don't encounter this very often myself. So far the only
             | place I've seen this is Paypal. _facepalm_
        
             | lcuff wrote:
             | Yup! Same thing with the ridiculous verify-my-identity
             | questions. One I encounter all the time is the local
             | community college, which let me use spaces in my answers on
             | creation, but not at entry time. Grrrr.
        
           | novok wrote:
           | I've run into this with labcorp. Their desktop webapp takes
           | subdomain emails, but their mobile iOS health webpage login
           | thinks a subdomain email is invalid and disables the login
           | button. They also don't let you change your account email so
           | you can never really fix this issue properly.
        
         | scotu wrote:
         | I've found websites not allowing perfectly valid TLDs (e.g.
         | .email), so maybe they could start by not hardcoding .com in
         | their regex.
        
         | paulmd wrote:
         | This sounds great but what you think is "common" probably
         | isn't.
         | 
         | When I was validating myself for Amazon Prime Student, I
         | literally had Amazon refuse to accept my student email in the
         | form first.m.last@myschool.edu because there were two '.'s in
         | the mailbox portion. I had to send an email to support and it
         | was eventually dutifully fixed.
         | 
         | And that's not an uncommon format for, you know, _school
         | emails_. And that's an Amazon engineer who should have known.
         | 
         | I imagine there's developers who think "domain.tld" is the only
         | thing valid to put in the domain portion, and that's going to
         | fail with "domain.co.uk", or uncommon TLDs, or other perfectly
         | valid constructs. And sure "it's only x% of the users" but it's
         | a pain in the ass if you're that user. You need to be
         | reasonably permissive.
         | 
         | (but on the other hand "myname@..." is not valid either, and
         | that will fail and cost you money as well... hence leading us
         | back to 'just follow the spec')
        
         | the_arun wrote:
         | Instead of every developer implementing validation logic,
         | shouldn't we have validation libraries to take care of this?
        
         | NikolaNovak wrote:
         | I get your point, but it ends up pretty arbitrary who picks up
         | what part of spec to implement / which part of spec they deem
         | "common sense".
         | 
         | e.g. It drives me BONKERS how many systems absolutely reject my
         | single-letter email (~"N@domain.com"), which I created
         | specifically to make it easy and safe to type on mobile devices
         | etc. Others will reject the "+" sign, or underscore, or
         | dot/period, or (brilliantly) two periods or underscores, etc etc
         | etc :=/
        
           | Strom wrote:
           | There are also blacklists for names you can use. My real
           | e-mail is _admin@myname.com_ but Facebook doesn't allow me
           | to use that e-mail, warning me that only personal e-mails are
           | allowed. Paradoxically I ended up using my work e-mail to get
           | around the restriction.
        
           | hwbehrens wrote:
           | My email address ends with the .cc TLD, and the number of
           | websites which say "Did you mean to type .ca?" and then
           | _refuse to let me continue_ without changing it drives me
           | similarly batty.
        
         | [deleted]
        
         | forty wrote:
         | 100% agree. This is especially true if, for example, the email
         | address is going to be displayed somewhere. It's generally a
         | good idea to limit email addresses to a subset of what the RFC
         | allows.
         | 
         | To adapt from a famous quote: "all email validation logics are
         | wrong, but some of them are useful" ;)
        
         | goto11 wrote:
         | But why restrict the syntax arbitrarily in the first place? It
         | is not going to catch the common typos anyway. Most typos will
         | just result in a wrong but still syntactically valid email
         | address.
        
           | harryf wrote:
           | I've always wondered if it's possible to have a valid email
           | address which is also an SQL injection attack, XSS or
           | similar?
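It is possible. RFC 5322 allows a quoted local part containing spaces, quotes and `=`, so an address like `"' OR 1=1 --"@example.com` is syntactically valid. The sketch below uses Python's built-in `sqlite3` with an in-memory table to show why the defense is parameterized queries and output escaping rather than stricter validation. The table and addresses are made up for illustration.

```python
import html
import sqlite3

# A quoted local part may contain spaces, quotes and "=" (RFC 5322
# quoted-string), so this is a syntactically valid address.
address = "\"' OR 1=1 --\"@example.com"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('victim@example.com')")

# Vulnerable: the address is spliced into the SQL text, so its quote ends
# the string literal and "OR 1=1 --" becomes part of the query.
unsafe = conn.execute(
    f"SELECT email FROM users WHERE email = '{address}'"
).fetchall()

# Safe: the placeholder passes the address as data, never as SQL.
safe = conn.execute(
    "SELECT email FROM users WHERE email = ?", (address,)
).fetchall()

print(unsafe)  # [('victim@example.com',)]
print(safe)    # []
# For HTML output, escape when rendering rather than relying on validation.
print(html.escape(address))  # &quot;&#x27; OR 1=1 --&quot;@example.com
```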
        
         | novok wrote:
         | I've run into some places where a subdomain email is not ok,
         | which has been pretty annoying. All email validators should be
         | able to at least take first.last+company@subdomain.example.com
        
           | macksd wrote:
           | Especially when using a country TLD, suffixes like .co.za are
           | appended to the name of the actual ISP or email provider.
        
         | eli wrote:
         | There's also incredibly low stakes in allowing a technically-
         | invalid email address to pass validation. Just use a very
         | permissive pattern (e.g. contains an '@') and be done with it.
         | 
         | No matter what you will constantly be getting addresses that
         | conform to the spec but cannot actually receive mail.
        
       | fencepost wrote:
       | Hah, even beyond the question of address format variations there
       | are also commercial services that do some level of email address
       | validation - and one of them regards my business email address as
       | invalid (firstname@companyname.com) - adding another letter
       | works, adding punctuation works, it's just my specific first
       | name.
       | 
       | Unfortunately it's done via black box on their server(s), so it's
       | not like I can even dig through the code and figure out what's
       | going wrong.
        
       | devfatigue wrote:
       | What if we flipped email validation around and made users
       | email us a one-time code to validate their email?
        
       | roachpepe wrote:
       | Not really an email issue, so sorry if this is somewhat off
       | topic, but on the subject of validation I can't not bring this
       | up: remember the guy whose name was Null?
       | https://www.wired.com/2015/11/null/
        
       | mLuby wrote:
       | Validation errors are common, but warnings are not.
       | 
       | I'd like to see more of "Patterns like [what you entered] are
       | uncommon--are you sure?" instead of "Patterns like [what you
       | entered] are not allowed--change it to proceed."
        
         | richeyryan wrote:
         | I recently implemented this using the great Mailcheck library.
         | So if someone types "gnail.com" or "gmail.con" it detects it
         | and we can show "Did you mean gmail.com?". If someone ignores
         | the suggestion, fair enough. If someone purposely wants to give
         | us a junk email, fair enough. At least we're not frustrating
         | them needlessly.
         | 
         | https://github.com/mailcheck/mailcheck
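The same suggest-don't-block idea can be sketched roughly in Python. This is not a port of Mailcheck, which has its own distance function and domain lists. It uses the standard library's `difflib.get_close_matches` instead, and the domain list and the 0.8 cutoff are assumptions to tune. `suggest_domain` is a hypothetical name.

```python
import difflib
from typing import Optional

# Hypothetical short list of popular domains to compare against.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"]


def suggest_domain(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a 'Did you mean ...?' suggestion, or None. Never blocks submission."""
    local, at, domain = address.rpartition("@")
    if not at:
        return None
    domain = domain.lower()
    if domain in COMMON_DOMAINS:
        return None
    close = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=cutoff)
    return f"{local}@{close[0]}" if close else None
```

If the user dismisses the suggestion, the form should submit the address unchanged.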
        
         | zzo38computer wrote:
         | Mostly, yes. However, some things should probably still be
         | prohibited, such as:
         | 
         | - An email address ending with ".invalid", unless invalid email
         | addresses are supposed to be allowed (which in some cases is
         | useful, but you can then disable sending email to such an
         | address, using it only for identification). (I do use such an
         | email address for identification on NNTP.)
         | 
         | - Email addresses without at least one at sign.
         | 
         | - Email addresses containing control characters (at least ASCII
         | control characters).
         | 
         | - If the domain name does not resolve or resolves to a loopback
         | address or LAN address (except for some specialized cases where
         | such a thing is desirable). The same is true for literal IP
         | addresses; if it is a loopback or LAN address then it should be
         | disallowed, but otherwise it can be allowed.
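This sketch covers the first three rules plus the address-literal part of the fourth, using Python's `ipaddress` module. Checking whether a domain *name* resolves to a loopback or LAN address needs a DNS lookup and is left out. `reject_reason` and the `allow_invalid_tld` flag are hypothetical. Note that `ipaddress` also marks some reserved and documentation ranges as private, so its "private" check is somewhat broader than RFC 1918 LAN ranges.

```python
import ipaddress
import unicodedata
from typing import Optional


def reject_reason(address: str, allow_invalid_tld: bool = False) -> Optional[str]:
    """Return why an address should be refused, or None if it passes."""
    if "@" not in address:
        return "no @ sign"
    # Category "Cc" covers ASCII (and C1) control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in address):
        return "control character"
    domain = address.rpartition("@")[2]
    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            ip = ipaddress.ip_address(literal)
        except ValueError:
            return "malformed address literal"
        if ip.is_loopback or ip.is_private or ip.is_link_local:
            return "loopback or LAN address literal"
        return None
    if domain.rstrip(".").lower().split(".")[-1] == "invalid" and not allow_invalid_tld:
        return ".invalid domain"
    return None
```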
        
       | high_byte wrote:
       | "How to Hack Things with These 13 Simple Tricks"
        
       | mro_name wrote:
       | Actually email validation is simple: do the opt-in.
       | 
       | If confirmed, it's valid.
        
       | user3939382 wrote:
       | TFA won't load for me, but I'd like to make a short PSA: RFC
       | 5322.
       | 
       | Lookin' at you, Walgreens.
        
       | sparrc wrote:
       | I have my own custom email domain (sparr.email) that fails
       | validation surprisingly often.
        
       | Wronnay wrote:
       | https://web.archive.org/web/20210408080002/https://www.netme...
        
       | mooreds wrote:
       | Hah, just went over the email address validation logic in our app
       | last week because a client asked.
       | 
       | Turns out we do minimal validation (make sure there is a local
       | part and a domain, that there are not two periods next to each
       | other, and a few other things), but what we really rely on is
       | deliverability.
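The two rules spelled out above (a non-empty local part and domain, and no adjacent periods) fit in a few lines of Python. This is only a sketch of those two checks. The "few other things" aren't specified, so they are left out. Note that RFC 5322 does allow ".." inside a quoted local part, which this check would reject. The helper name is hypothetical.

```python
def passes_minimal_check(address: str) -> bool:
    """Hypothetical helper: local part and domain present, no '..' anywhere."""
    # Split on the last '@' so an '@' inside a quoted local part
    # doesn't end up in the domain.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return ".." not in address
```

Anything that passes still has to survive the deliverability check.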
       | 
       | In other words, if your email needs to be verified, we'll try to
       | send an email to the address you provide. If the link is clicked
       | (or the code entered), that's good enough for us.
       | 
       | Applications using our service (we're an auth provider) can
       | decide for themselves if they need email address validation. It's
       | a boolean flag on the user object. If they do, they can use the
       | functionality we provide to ensure it.
        
       | slavik81 wrote:
       | I created myusername@hotmail.com in 1999, then immediately lost
       | the password. I couldn't recover my account or use the same name
       | again, so I created myusername_@hotmail.com.
       | 
       | In the twenty-two years that have followed, the only website that
       | has had a problem with my email address is Chapters Indigo, which
       | explicitly rejects it as invalid.
       | 
       | For email validation, keeping it simple is best.
        
       | arkitaip wrote:
       | Most of these are just overcomplicating validation. What really
       | matters is account verification, i.e. sending an email to the
       | specified email address in order to verify its authenticity
       | before sending any kind of email (transactional, marketing) to
       | the account.
       | 
       | At this point, not doing email verification should be considered
       | a dark pattern because it causes so much trouble when people's
       | email addresses are used without their permission.
        
         | delecti wrote:
         | And "permission" isn't even the only issue. Months of Doordash
         | account emails were lost to the ether because I made a typo
         | (gmail.lcom) in my personal email, and it was basically
         | impossible to change the email on an account (their SMS
         | verification seems broken). It does explain why I never got
         | order confirmations, though; that had seemed odd.
        
           | jerf wrote:
           | It seems to be a common antipattern for somewhat smaller
           | sites to make the email address the primary key on the
           | account too, and then it's virtually impossible to ever
           | change it after that. As you scale up it becomes impossible
           | to ignore the fact that people change email addresses
           | sometimes, but I've lost track of the number of smaller sites
           | that assume it's a safe primary key.
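The usual fix is a surrogate key. Give each account an opaque ID, reference that ID everywhere, and treat the email address as a unique but mutable attribute. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(
        """
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,     -- stable surrogate key
            email TEXT NOT NULL UNIQUE     -- unique, but allowed to change
        );
        CREATE TABLE orders (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id)
        );
        """
    )
    return conn


def change_email(conn: sqlite3.Connection, user_id: int, new_email: str) -> None:
    # Only one row changes; nothing that references users.id has to be rewritten.
    with conn:
        conn.execute("UPDATE users SET email = ? WHERE id = ?", (new_email, user_id))
```

Because orders reference users.id rather than the address, changing the email is a single-row update.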
        
             | cperciva wrote:
             | _raises hand_
             | 
             | I can confirm that this is a very stupid mistake to make.
             | :-(
        
       | dokem wrote:
       | Sorry no, your email address is wrong. Make a new one.
        
       | bcrl wrote:
       | I think the best email address I ever knew of was n@ai .
       | Unfortunately, the .ai TLD eventually decided it wasn't a good
       | idea to have an MX record resolving on the TLD.
        
         | arkitaip wrote:
         | It's cute but doomed to fail because it goes against everything
         | most people understand about email addresses.
        
       | [deleted]
        
       | [deleted]
        
       | bilater wrote:
       | Unpopular opinion: Just ignore these edge cases and focus on the
       | 99.99% of the sane population that doesn't get off on having a
       | weird email address.
       | 
       | More generally: if an edge case exists and has nothing to do with
       | accessibility (i.e., it wasn't caused by a user having a different
       | workflow, like needing a screen reader, or being in a less
       | developed part of the world with slow internet), then you should
       | dismiss it and not make your code/life unnecessarily
       | complicated.
        
         | jillesvangurp wrote:
         | The only relevant email validation is verifying that a user can
         | click on a link or enter a code sent to the address they
         | specified. Without verified ownership, the email address is
         | worthless as an identifier, so making it conform to some syntax
         | is not that relevant, and you should not use unverified email
         | addresses as identifiers (identity theft is a thing).
         | 
         | Obsessing about regular expressions for these addresses is
         | generally a waste of time except for maybe preventing a lot of
         | failed attempts to send stuff to a clearly invalid email
         | address. A simple check that the string contains '@' is probably
         | good enough for that. In the worst case, the email address does
         | not work and you discard the entered information after some
         | reasonable time frame. The user has the option to try again and
         | do a better job of typing their email address.
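The following is a rough Python sketch of that flow. The only syntax check is the '@' test, and unverified sign-ups are thrown away after a time frame. The class name, the one-week default, and the in-memory dict are assumptions. verify() stands in for the user clicking the link or entering the code; generating that link or code isn't shown.

```python
from typing import Dict


class PendingSignups:
    """Unverified sign-ups, keyed by address, discarded after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 7 * 24 * 3600) -> None:
        self.ttl_seconds = ttl_seconds  # hypothetical "reasonable time frame"
        self._pending: Dict[str, float] = {}

    def register(self, email: str, now: float) -> bool:
        # The only syntax check: the string must contain an '@'.
        if "@" not in email:
            return False
        self._pending[email] = now  # re-registering restarts the clock
        return True

    def purge_expired(self, now: float) -> int:
        """Drop sign-ups older than the TTL; return how many were dropped."""
        expired = [e for e, t in self._pending.items() if now - t > self.ttl_seconds]
        for e in expired:
            del self._pending[e]
        return len(expired)

    def verify(self, email: str, now: float) -> bool:
        """Called when the user proves ownership; True if still pending."""
        self.purge_expired(now)
        return self._pending.pop(email, None) is not None
```

If the address was mistyped, the entry simply ages out and the user can register again.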
        
       ___________________________________________________________________
       (page generated 2021-05-24 23:01 UTC)