https://lwn.net/SubscriberLink/1000485/670ef0045e5e8a3e/

Debian opens a can of username worms

By Joe Brockmeier
December 5, 2024

It has long been said that naming things is one of the hard things to do in computer science. That may be so, but it pales in comparison to the challenge of handling usernames properly in applications. This is especially true when multiple applications are involved, and they are all supposed to agree on what characters are, and are not, allowed. The Debian project is facing that problem right now, as two user-creation utilities disagreed about which names are allowable. A plan is in place to sort this out before the release of Debian 13 ("trixie") sometime next year.

The useradd utility is part of the shadow-utils project, which includes programs for managing user and group accounts. The shadow-utils suite is included in Debian's passwd package. For historical reasons, and to avoid confusion with the upstream project, Debian's version of the shadow-utils sources is often referred to as "src:shadow".

Most Debian users don't work with useradd, or groupadd, directly. Instead, Debian has long supplied its own adduser (and addgroup) utilities, originally written by founder Ian Murdock.
These act as simpler front ends to useradd and use Debian-supplied system defaults for creating users' home directories and configurations. It should be noted that useradd, et al., have become much more full-featured since Debian's utilities were introduced, but the project continues to maintain them nonetheless.

Little Bobby Tables

In June, Debian developer and src:shadow maintainer Chris Hofstaedtler filed a bug against the adduser package. The src:shadow package had dropped a Debian-specific patch, originally introduced in 2003 by Karl Ramm, that allowed characters far beyond those permitted by the upstream shadow-utils project. In the patch, Ramm wrote:

    I can't come up with a good justification as to why characters other than ':'s and '\0's should be disallowed in group and usernames (other than '-' as the leading character). Thus, the maintenance tools don't anymore.

Hofstaedtler said that he had puzzled out some of the patch's purpose from old bug reports that had been "fixed" by the patch, and those asked for two things not allowed by the upstream shadow-utils: usernames that contain upper-case characters or that are purely numeric. Hofstaedtler said that upper-case names had been allowed in the upstream shadow-utils project "a long time ago", but it seemed like a bad idea to allow purely numeric usernames.

The patch enabled much more than upper-case and purely numeric names, though. With the patch dropped in version 1:4.15.2-2 of the shadow source package, one of adduser's tests--which explicitly allowed a username reminiscent of a famous xkcd comic ("bob;>/hacked")--had failed:

    For src:shadow, I would really like to not have a divergence from upstream in this regard. I think if we have clear requirements then we (I) can submit them upstream and I would expect upstream to accept patches. I do feel that making the case for "bob;>/hacked" would be very hard.
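To see why a name like "bob;>/hacked" passed under the old patch, it may help to read Ramm's comment literally as a rule. The sketch below is only that: a Python paraphrase of the comment, not the patch itself (which lives in the C sources of shadow). The function name is hypothetical, and rejecting the empty string is an added assumption.

```python
def permissive_name_ok(name: str) -> bool:
    """Literal reading of the 2003 patch comment (illustrative only).

    Everything is accepted except ':' and NUL anywhere in the name,
    and '-' as the leading character.
    """
    if not name:
        # Assumption: an empty name is never acceptable.
        return False
    if name.startswith("-"):
        return False
    return ":" not in name and "\0" not in name


# Names mentioned in the article, plus two that the rule rejects.
for candidate in ["bob;>/hacked", "12345", "Alice", "-bob", "a:b"]:
    print(f"{candidate!r}: {permissive_name_ok(candidate)}")
```

Under this reading, the upper-case and purely numeric names that the old bug reports asked for are accepted, but so is nearly everything else, shell metacharacters and slashes included.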
Hofstaedtler said that the patch had been reapplied for the time being (it was included again in version 1:4.15.2-3), but he asked if username requirements could be sorted out in time for the Debian "trixie" release. If the patch were dropped entirely, then useradd would restrict usernames to the POSIX standard, with the exception of allowing a "$" character at the end of a username.

Debian developer and adduser maintainer Marc Haber replied in late October that other tests were failing as well, and thought that "useradd upstream is being too picky here". Since adduser depends on useradd, it could not create users that useradd would reject; he said he would like to synchronize on what would be allowed or not. As part of the research into what should be allowed in usernames, Haber took over Debian's UserAccounts wiki page, which outlines Debian's username tools and policies, and started looking into whether the project should relax its requirements around usernames.

Limits on usernames

One of the questions that bubbles up when looking at usernames is not just allowable characters, but the allowable length of the username. The documentation for shadow-utils does not specify a length for usernames or what encoding is being used. However, the POSIX standard says that usernames should not include non-ASCII characters to be portable between systems. The standard says that usernames should be "composed of characters from the portable filename character set". That set consists of the digits 0 through 9, the upper-case and lower-case letters "a" through "z", the period (.), underscore (_), and hyphen (-). It also specifies that usernames should not begin with a hyphen.

It is, however, possible to assign characters outside that set with the tools at hand. But Linux distributions usually put up some guardrails in the adduser and useradd configurations to prevent administrators from creating usernames with non-ASCII characters unintentionally.
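As a rough illustration of the POSIX guidance described above, the following Python sketch checks a name against the portable filename character set and the leading-hyphen rule. It models only those two points. It is not the check that useradd or adduser actually performs (upstream, for instance, also permits a trailing "$" and has its own further restrictions), and the function name is made up for this example.

```python
import string

# Portable filename character set: A-Z, a-z, 0-9, period, underscore, hyphen.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_username(name: str) -> bool:
    """Return True if name uses only portable characters and has no leading hyphen."""
    if not name or name.startswith("-"):
        return False
    return all(ch in PORTABLE_CHARS for ch in name)


for candidate in ["firstname.lastname", "bob;>/hacked", "-bob", "\u00e9tienne", "420"]:
    print(f"{candidate!r}: {is_portable_username(candidate)}")
```

Note that a purely numeric name such as "420" passes this character-set test; whether such names should be allowed is a separate policy question, as the discussion of the dropped patch shows.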
These configurations can be overridden with adduser's --allow-bad-names option or useradd's --badname option.

In November, Haber posted a message on debian-devel saying that he had "opened an especially nasty can of worms" and was finding that things were more complicated than he had understood. He sought input and opinions on a number of questions about whether Debian should allow non-ASCII characters for usernames, how to do that if so, and if it was more appropriate to document username guidance in Debian's Policy Manual rather than its wiki. His suggestion was to allow UTF-8 for regular user accounts, but to restrict to ASCII for system accounts created by Debian packages.

Richard Lewis asked if enabling UTF-8 would open the door to "some of the abuse described" in a 2021 LWN article about flaws in Unicode handling that led to security exploits. He said that it seemed to be a bad idea to make the change, even if it would be nicer for users to have the option.

Haber said that he was not sure if it would be dangerous to allow UTF-8 usernames, "since we can expect other commands to gracefully handle a byte stream, can't we?" Additionally, local administrators already can loosen restrictions to allow UTF-8 usernames, but Debian does not test for such use cases. Debian would become "more robust" if it assumed UTF-8 characters would be used in usernames. "Vulnerabilities that could be exploited by having non-ascii user names are already here and present today, just not uncovered yet."

It would be reasonable, Timo Röhling said, to mitigate possible homograph attacks by disallowing mixed alphabets "such as cyrillic and latin letters in the same name". Haber said that was not going to help if a user could directly write to /etc/passwd, and he was unwilling to implement that himself in adduser. He would accept code and test cases written by others, though.

Keyboards

Security concerns aside, there are other practical problems with supporting non-ASCII usernames.
Étienne Mollier noted that he had "one weird enough" character in his first name that posed a problem if he had to log in using a keyboard layout that lacked the capability to transcribe the lower-case or upper-case 'e' acute characters ("é" or "É"). For that reason, he said, he felt better about keeping a full ASCII username and "wouldn't feel strongly if unicode support for login never happens". But it would be good if the gecos field of the passwd file had proper Unicode support to properly display users' real names.

Not only was it difficult to type "é" on some keyboards, it could also be encoded in multiple ways. Gioele Barabucci pointed out that it could be "e with acute" which is encoded in Unicode as U+00E9, or it could be "e, combined with an [acute] accent" which would be U+0065 plus U+0301:

    If a keyboard input system provides the former sequence of bytes, but the username is stored in the login infrastructure using the latter sequence of [bytes], then a naive comparison will not find the user "émollier" in the system. Unicode defines in Annex 15 a few normalization forms as a way to work around this problem. But a correct use of these normalization forms still requires coordination and standardization among all programs accessing the data.

He asked if POSIX or other standards provided a normalization form for UTF-8 encoded usernames. Peter Pentchev responded that POSIX said to stick to the portable filename character set to ensure portability. Haber argued that it should be up to local admins to decide whether they wanted their local user database to be portable. "I don't think that we should restrict local admins who don't need that kind of portability."

Simon McVittie recommended that Debian consider adopting systemd's user name syntax and concepts of "strict mode" and "relaxed mode". The systemd tooling adheres to a strict naming convention when creating usernames, but it has a relaxed convention for accepting usernames created by other tools.
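To make Barabucci's point about the two encodings concrete, here is a small Python sketch using the standard unicodedata module. It shows that the precomposed and decomposed forms are different strings with different UTF-8 bytes, and that they compare equal only after both are put through the same normalization form. The helper name same_name and the default choice of NFC are assumptions for illustration; nothing in the discussion settled on a particular form.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9, "e with acute" as a single code point
decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute accent)

# The two strings render identically but differ as code points and as bytes.
print(precomposed == decomposed)       # False
print(precomposed.encode("utf-8"))     # b'\xc3\xa9'
print(decomposed.encode("utf-8"))      # b'e\xcc\x81'


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_name(precomposed + "mollier", decomposed + "mollier"))  # True
```

The comparison works with either NFC or NFD, but only if every program that reads or writes the name uses the same form, which is the coordination problem Barabucci describes.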
McVittie said that seemed like a good principle for Debian to follow, even if its specific rules might differ from systemd's. Haber seemed to agree in part, but said systemd's strict mode was "even stricter than what we currently allow for system accounts", and he did not like that systemd's policies (especially with systemd-homed, which LWN covered recently) were not configurable.

This time it's personal

The discussion, perhaps not surprisingly, brought out some strong feelings about how names and usernames were represented, especially since, as Hofstaedtler noted, usernames can be important to some users:

    I see and type my username hundreds times a day, people use it to address me in written and spoken conversations with it, etc. If it were my uid, which I see maybe once a week and don't have to remember, I wouldn't care.

Indeed, it's not uncommon in open-source communities or within organizations to use a person's username rather than their given name--so it is unsurprising that some people feel strongly that usernames should be composed of a wider range of characters than POSIX recommends. Others dislike the practice of conflating usernames with real-world names, and see little reason to go to any trouble to go beyond ASCII.

Johannes Schauer Marin Rodrigues supported allowing more than ASCII in usernames. He said it would be good for Debian to put pressure on other projects to provide Unicode support. "We cannot find these kind of bugs if we accept translating everybody's given name to the American alphabet." Bálint Réczey, though, asked that Debian avoid opening that can of worms and imposing needless work on upstreams. "Keep what works reasonably well for decades."

A plan

Haber initially seemed bullish on allowing UTF-8 usernames in Debian "as a courtesy to those people who need non-ascii user names to write their name" and as an opportunity to find "bugs that are already here" in Debian's software.
He acknowledged that it was late in the development cycle for trixie. But, since it was currently possible to create usernames with UTF-8 characters, he did not want to tighten restrictions in trixie versus Debian 12, only to revisit those restrictions for Debian 14. In a reply to Mollier he wondered about what advice to give in Debian's documentation "once we have decided to officially allow UTF-8 login names".

On December 3, however, Haber said that he "finally understood" that UTF-8 support would require more than the ability to create a UTF-8 encoded username and write it to /etc/passwd. Homograph characters, such as U+00E9 (é) and U+0065 plus U+0301 (é), could be used with adduser to create two separate users with lookalike usernames:

    At the least, adduser should reject creating étienne if étienne already exists - those are different user names but look the same, and if you don't cut-and-paste user names instead of typing them you're bound to hit the wrong user depending on HOW you type and what input medium you use. Not good.

Haber said that he was the only active developer working on adduser and did not have time to implement a check against lookalike usernames in time for the trixie release. Worse, he said, the Perl module that he would use (Unicode::Precis) was not packaged for Debian and had not had a release in more than five years.

The next version of adduser, Haber said, would reject UTF-8 usernames by default. They would still be allowed when using the --allow-bad-names option, but he said he wanted to deprecate that option name in favor of something that doesn't use the word "bad". The --allow-all-names option will continue to pass everything verbatim to useradd. Mollier thanked Haber for his work on the problem, and suggested some alternatives to the bad names option.
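For a sense of what the missing check involves, here is a minimal Python sketch of refusing a new name when a canonically equivalent one already exists. It is not how adduser works (adduser is written in Perl, and Haber mentions Unicode::Precis as the module he would have used). The function name lookalike_of and the use of plain NFC normalization from Python's standard library are stand-ins chosen for this example.

```python
import unicodedata
from typing import Iterable, Optional


def lookalike_of(candidate: str, existing_names: Iterable[str]) -> Optional[str]:
    """Return an existing name that is canonically equivalent to candidate, if any."""
    key = unicodedata.normalize("NFC", candidate)
    for name in existing_names:
        if unicodedata.normalize("NFC", name) == key:
            return name
    return None


# Hypothetical user database: one name stored with the precomposed U+00E9.
existing = ["root", "\u00e9tienne"]

# A candidate typed as "e" + U+0301 collides with the stored name.
print(lookalike_of("e\u0301tienne", existing) is not None)   # True

# Limitation: a Cyrillic "е" (U+0435) is a different letter, not a different
# encoding of the same one, so normalization alone does not flag it.
print(lookalike_of("\u0435tienne", ["etienne"]))              # None
```

The second case hints at why this is more work than it first appears: catching cross-script lookalikes needs rules beyond normalization, such as the mixed-alphabet restriction suggested earlier in the thread.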
Barabucci also thanked Haber for taking the time to research the issue, to which Haber replied dryly, "I have learned many things."

Haber's current course of action for adduser seems the most prudent. There may be a day when it is more practical to expand the allowed characters for usernames, but the work required to do so right now is far greater than the benefits that users would gain in the process.

-----------------------------------------

Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 5, 2024 17:22 UTC (Thu) by isotopp (subscriber, #99763)

Login Names in Unix have always been very restricted, they are not just part of 'ls' output, but also are transported in some tar formats as owners, are automatically parts of mail adresses and have other, not cataloged use-cases.

If you allow utf-8 here, and relax length restrictions, it is unclear and unknowable what will happen downstream with other applications.

If you want to login as 'Kristian Köhntopp', it is probably useful to have an LDAP like name canonicalization mechanism that does a lookup to get a unix username and then tries the password with that. Anything else is very likely to break unexpected things. In my personal opinion, even a --badnames option is wrong.

Or you go, and actually perform the work to define a username format for Unix (not just Linux), catalog use-cases and make sure that they actually work with full UTF-8, and whatever relaxed length limit you define. And then be prepared to handle a login with kurisu (kurisu) instead of kris.
Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 6, 2024 9:33 UTC (Fri) by kleptog (subscriber, #1183)

I've gotten used to using --allow-bad-names all the time at my work because the default prohibits periods (".") and our company uses them in usernames as firstname.lastname. Though I've now checked the docs and apparently it's possible to change the allowed characters in the configuration file so maybe that's a better approach for ansible deployed machines.

Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 6, 2024 19:28 UTC (Fri) by raven667 (subscriber, #5198)

A place I used to work was purchased by a company which had standardized on "Firstname Lastname" in their AD for username/samAccountName so when we started joining the Unix (Mac/Linux) machines to their AD with winbind I was surprised with how few things actually broke, "/home/First Last/Documents" was OK in most GUI tools, but there were definitely things that broke. Eventually they did re-standardize on usernames without spaces, but I was surprised it worked at all.

Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 6, 2024 10:21 UTC (Fri) by smurf (subscriber, #17840)

Heh. I had to troubleshoot a system with a UTF8ified username last year. Let me tell you, the number of programs that format columns by counting bytes is, umm, truly staggering.

Anyway. A little bit of safety should be in everybody's interest, i.e. no mixed-charset names, and use some normal form to check for existing usernames.

Writing the above sentences is significantly easier than implementing them. While cyrillic vs. greek definitely is a problem, but latin vs. CJK? not so much IMHO. Normalize to exactly which normal form using which version of the Unicode standard?
What do I do on the console, type \U4E52\U4E53 instead of 乒乓 ? what if my username is "420"?

On the other hand ... I never type my username anyway. When logging in on the GUI I click on my avatar, when connecting to a remote system with SSH or whatever it's the default, and a fresh text-only console login is easy because there the username is "root".

Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 6, 2024 19:14 UTC (Fri) by rgb (subscriber, #57129)

> Not only was it difficult to type "é" on some keyboards, it could also be encoded in multiple ways.

I think that says it all. Unicode is made to display text, not to create IDs.

Anything but POSIX portable filename set with a conservative length restriction is dangerous
Posted Dec 6, 2024 19:35 UTC (Fri) by raven667 (subscriber, #5198)

I think you are right here and restricting the username to a limited subset of bytes that existing tools don't have any trouble interpreting and displaying makes sense, but the GECOS field definitely should be extended to support full UTF-8 encoded names as a courtesy and to be friendly to actual humans and their real written names they want to use. Having machine-readable username/uid/gid(s) distinct from a human display name makes sense and changes the requirements quite a bit where having two different encodings for "é" or "420" isn't really a problem that needs to be solved.

Doesn't the GECOS field already cover some of this use case?
Posted Dec 5, 2024 18:52 UTC (Thu) by NYKevin (subscriber, #129325)

Maybe this is my Anglophone chauvinism speaking, but you can already set an arbitrary human-readable display name in the GECOS field, and most login GUIs prefer to display that name over the username when both are available. Is it really critical to allow non-ASCII characters in the username itself?
How many people are trying to log into a command line environment *and* cannot type in ASCII?

Doesn't the GECOS field already cover some of this use case?
Posted Dec 5, 2024 20:57 UTC (Thu) by zeha (subscriber, #61580)

> Maybe this is my Anglophone chauvinism speaking

Yes.

Doesn't the GECOS field already cover some of this use case?
Posted Dec 5, 2024 22:00 UTC (Thu) by Cyberax (supporter, #52523)

As someone speaking several languages with non-Latin alphabets, sometimes it makes sense to stick to ASCII. Otherwise, you're just setting yourself for a world of pain. Imagine entering Chinese text on a terminal in text mode.

Doesn't the GECOS field already cover some of this use case?
Posted Dec 6, 2024 14:32 UTC (Fri) by khim (subscriber, #9252)

I would say it's "yes" and "no", simultaneously.

I have meet a lot of people who simply don't know English well enough to type name in ASCII.

Unfortunately the majority of them I have meet when they cried on various forums about how unfair it is that they "have only just used Cyrillic (Arabic, Farsi, etc) name" - and now have so many broken programs they couldn't even count them all.

Yes, it's deeply anglophonic, yes, it's unfair, true, people genuinely suffer if your force that on them... But the experience says that it's still better for them to lean 1 (one) English world (their account name) once then suffer through innumerable programs that don't support any other names properly.

Doesn't the GECOS field already cover some of this use case?
Posted Dec 6, 2024 21:46 UTC (Fri) by epk (guest, #174765)

I must sadly approve of this answer. And it's not as though a non-Latin-alphabet username would really help that much, since so much text - especially in path names and URLs - is in English.
There is, however, the full name of each user, and I'm guessing that should be much easier to have non-Latin UTF-8 in. And for non-computer-literate users who need a lot of hand-holding, they might actually see mostly/only their full names.

Once upon a time in the past ...
Posted Dec 5, 2024 19:11 UTC (Thu) by rweikusat2 (subscriber, #117920)

... people used a weird text messaging system called email to send, well, text messages to users supposed to be delivered into their mailboxes at certain hosts. The email address of a user user on host host would be user@host. IOW, a unix username is also what RFC5322 calls the local-part of an email address. And this means ASCII, even according to the newest specification.

https://datatracker.ietf.org/doc/html/rfc5322

Once upon a time in the past ...
Posted Dec 5, 2024 20:25 UTC (Thu) by storner (subscriber, #119)

I think you are mistaken about requiring ASCII for the name of the mailbox. The RFC you refer to says (section 3.4.1):

    The local-part portion is a domain-dependent string. In addresses, it is simply interpreted on the particular host as a name of a particular mailbox.

"Domain-dependent" means that there are really no rules as to which characters can be used. It can even be quoted to allow whitespace.

Once upon a time in the past ...
Posted Dec 5, 2024 21:02 UTC (Thu) by rweikusat2 (subscriber, #117920)

The grammar rule defining local-part,

    local-part = dot-atom / quoted-string / obs-local-part

is at the beginning of the RFC page which contains the statement

    The local-part portion is a domain-dependent string.

The claim that domain-dependent would mean "no requirements" is thus obviously wrong. dot-atom and quoted-string are defined in sections 3.2.3 ("Atom") and 3.2.4 ("Quoted Strings").
Drilling down to the actual character set specifications always ends with a subset of ASCII, the most liberal one being the one for quoted strings which includes whitespace and all printable characters, ie, codepoints 32 - 126.

Once upon a time in the past ...
Posted Dec 5, 2024 21:04 UTC (Thu) by rweikusat2 (subscriber, #117920)

Slight correction: The quoted-string character set excludes \ and ".

Once upon a time in the past ...
Posted Dec 6, 2024 4:38 UTC (Fri) by jheiss (subscriber, #62556)

RFC 6532 extends 5322 to allow UTF-8 in email addresses in message headers, and 6531 extends 5321 (SMTP) to allow UTF-8 in SMTP addressing. There are some draft documents in the IETF mailmaint working group which try to set some guidelines about mixed languages and other possibly confusing situations, but servers that implement 6531/6532 could allow any combination of UTF-8 characters in usernames.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 19:19 UTC (Thu) by rhowe (subscriber, #102862)

Thinking about usernames in the OS more broadly, winbind (I think by default, but if not it's certainly one of the main options) generates usernames of the form "DOMAIN\username" and I know of at least one deployment which uses this.

Now, these users do not exist in the passwd file and therefore aren't created via useradd or adduser so this isn't directly relevant to the issue being discussed here, but it is certainly legitimate for usernames to contain "funky" characters and indeed potentially problematic ones. For example, if something were to treat the backslash as an escape character then all sorts of fun could occur from injecting of newlines into logs to injection of null terminators. Inadequate quoting in shell scripts being a prime example.
Also, both the domain and username portions are determined by the records in Windows' Active Directory and therefore need to follow the rules for that system. For the 'sAMAccountName' field, it's documented at https://learn.microsoft.com/en-us/windows/win32/adschema/... where interestingly it's defined as a Unicode string but not containing any of: "/ \ [ ] : ; | = , + * ? < >

The more modern userPrincipalName attribute is defined as following RFC822 which is not very helpful given the broad nature of that RFC: https://learn.microsoft.com/en-us/windows/win32/adschema/...

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 19:31 UTC (Thu) by rweikusat2 (subscriber, #117920)

RFC822 is the historic internet email RFC. That's still the local-part of an email address which is a sequence of words separated by . characters, word being defined in 3.3 as either an atom or a quoted string. In the given context, this also means no UTF8 and some restrictions beyond that.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 19:40 UTC (Thu) by dskoll (subscriber, #1630)

The local-part of your email address doesn't have to be your UNIX user name, though. It often is for convenience, but while the local-part of my email address is dianne, that is not my UNIX login name.

So appealing to email as a reason to restrict UNIX login names is not a great argument. I think a better argument is simply to make life easier for programs that need to deal with login names and that don't want to worry about UTF-8 canonicalization, etc.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 19:51 UTC (Thu) by rweikusat2 (subscriber, #117920)

Regardless of what someone's public email address might be, username@hostname, hostname here both referring to the actual hostname and the host FQDN, is always also a valid email address.
The implication of this is mostly that "programs dealing with login names" include any MTA ever written for UNIX and very likely, all other programs ever written to handle email on UNIX, IOW, to name the (probably) most scary example, if you want to allow UTF8 in usernames, are prepared to patch sendmail and procmail to support that?

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 20:57 UTC (Thu) by zeha (subscriber, #61580)

It was already discovered that various MTAs and MUAs cannot deal with non-ascii in gecos, so clearly these programs no longer matter.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 21:09 UTC (Thu) by rweikusat2 (subscriber, #117920)

These were some technical remarks about usernames and not an invitation to an open-ended policy discussion about who dictates (or believe he should really get to dictate) what has to "matter" to other people.

Real-world non-alphanumeric usernames
Posted Dec 6, 2024 19:43 UTC (Fri) by raven667 (subscriber, #5198)

> It was already discovered that various MTAs and MUAs cannot deal with non-ascii in gecos, so clearly these programs no longer matter.

I think it's worth the effort to identify and fix those programs so people can use their real name for display in the way they prefer to see it regardless of what language they use. If there is no one maintaining a particular MTA or MUA or whatever that breaks because of this, then you've learned that unmaintained software eventually breaks when the world changes around it, but this kind of change could be eased into over several release cycles by making it optional while bug reports and testing are done, before accepting it as the default and a blocker.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 21:23 UTC (Thu) by dskoll (subscriber, #1630)

No, that would not be fun, but still...
appealing to email addresses as a reason to restrict usernames isn't a good argument. Some email systems store email in ways that don't necessarily depend on UNIX login names at all (for example, Cyrus IMAP.)

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 21:59 UTC (Thu) by rweikusat2 (subscriber, #117920)

Email systems which weren't based on UNIX have existed since before UNIX gained any networking capabilities (AFAICT, even before UUCP) but that's besides the point. A UNIX system is also an email system and this system is based on using the UNIX username as local-part of an internet email address. That's just a technical fact people considering to extend the username syntax to include octets outside of the range of printable ASCII characters might want to take into account. Or not, depending on what their priorities are.

Real-world non-alphanumeric usernames
Posted Dec 5, 2024 23:26 UTC (Thu) by Wol (subscriber, #4433)

> Email systems which weren't based on UNIX have existed since before UNIX gained any networking capabilities

I think the birth of email actually predates the birth of Unix?

Cheers,
Wol

UNIX and email
Posted Dec 5, 2024 23:47 UTC (Thu) by KJ7RRV (subscriber, #153595)

> A UNIX system is also an email system and this system is based on using the UNIX username as local-part of an internet email address.

I think I'm misunderstanding this part? It seems to mean that all UNIX systems are email servers; is that correct?

Real-world non-alphanumeric usernames
Posted Dec 6, 2024 0:15 UTC (Fri) by dvdeug (subscriber, #10998)

> A UNIX system is also an email system

A fully POSIX-compliant UNIX system has an email system, though in the modern world, very few UNIX systems are connected to Internet email. I wouldn't say it's not UNIX if it doesn't have an email system.
I removed mailutils, mailx, and mailcap from my Debian unstable system, and nothing depended on them. The concept of open access to email via Internet has been lost, and system-wide email isn't very useful on a single-user system.

Real-world non-alphanumeric usernames
Posted Dec 6, 2024 4:42 UTC (Fri) by KJ7RRV (subscriber, #153595)

Thank you! I didn't realize that POSIX requires email; that explains it.

usernames are a low-level implementation detail
Posted Dec 6, 2024 4:55 UTC (Fri) by marcH (subscriber, #57642)

> people use it to address me in written and spoken conversations with it, etc.

Just ask these people to stop. Then, do as many other people do and simply treat _both_ usernames and uids as low-level implementation details; that's what they are.

Asking all programs in the universe to agree on some UTF-8 subset for usernames is totally unrealistic. This discussion and article barely scratch that surface.

> I see and type my username hundreds times a day

Not sure what the problem is here. Surely, anyone can find something in ASCII that's not unpleasant to look at?

The simple and reliable way forward is to allow UTF-8 in some non-key, free-form, pure display field like "gecos" or similar and pressure applications to display that in User Interfaces and as many places as possible - while still relying on portable, unique and bug-free ASCII usernames in code and other implementation details. Isn't it what's happening already?

French people who believe É does not exist
Posted Dec 6, 2024 5:03 UTC (Fri) by marcH (subscriber, #57642)

> Étienne Mollier noted that he had "one weird enough" character in his first name that posed a problem if he had to log in using a keyboard layout that lacked the capability to transcribe the lower-case or upper-case 'e' acute characters ("é" or "É")

A French "fun fact" is that many French people wrongly believe that É, À, Ù etc. "do not exist" because... the default _Windows_ keyboard layout for France makes these incredibly hard to type! fr_FR layouts on Mac and Linux are not affected and neither are some other French-speaking countries. é/É is one of the most common characters in French. Note this is a pure software issue: there's no relevant, physical difference between Windows and Mac keyboards.

https://www.google.com/search?q=majuscules+accentu%C3%A9es

Even more fun: you can tell whether newspapers and other editors use Windows or not by simply looking at their front page. Examples:
https://www.lemonde.fr/ -> Economie
https://www.liberation.fr/ -> Economie

French people who believe É does not exist
Posted Dec 6, 2024 6:49 UTC (Fri) by victrid (subscriber, #163116)

I think that's what the GECOS field is all about.

In fact, you can type Japanese characters directly on the keyboard, but you cannot expect to see them in text mode. Supporting CJK characters included in UTF-8 is too complicated compared to supporting Latin-1. Imagine desperate ops logging in to rescue via the console and nothing except blank diamond symbols can be displayed.

French people who believe É does not exist
Posted Dec 6, 2024 11:30 UTC (Fri) by mbunkus (subscriber, #87248)

I'm kinda confused when you say "In fact, you can type Japanese characters directly on the keyboard, but you cannot expect to see them in text mode.".
There are tons of CLI programs out there with translations into languages that include more than ASCII characters, including but not limited to Chinese Traditional, Chinese Simplified, Japanese, and Korean. They can display their messages just fine.

French people who believe É does not exist
Posted Dec 6, 2024 19:59 UTC (Fri) by wahern (subscriber, #37304)

Maybe they had in mind VGA text mode console screens. But a little Googling suggests that many consoles these days do display at least some CJK characters. The UEFI specification explicitly references VT-UTF8, for example, but I would assume products targeted at East Asian customers had solutions (e.g. PC-98) long before these problems were addressed in common standards.

It's bad
Posted Dec 6, 2024 19:41 UTC (Fri) by rgb (subscriber, #57129)

As long as UTF-8 names are not fully supported, the "bad" in "--allow-bad-names" serves as a crucial hint to the unaware user that "bad" things can happen.

Sing it: https://genius.com/Michael-jackson-bad-lyrics

RFC 8265 defines how to normalize and compare Unicode usernames
Posted Dec 6, 2024 20:48 UTC (Fri) by gioele (subscriber, #61675)

> He asked if POSIX or other standards provided a normalization form for UTF-8 encoded usernames.

Later in the thread [1] Michal Politowski pointed out that RFC 8265 "Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords" and its sibling RFC 8264 "PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols" do in fact describe which normalization forms should be used when comparing Unicode usernames (as well as a number of other low-level details).

[1] https://lists.debian.org/debian-devel/2024/11/msg00507.html

Copyright (c) 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds