https://lwn.net/SubscriberLink/1000485/670ef0045e5e8a3e/

Debian opens a can of username worms

By Joe Brockmeier
December 5, 2024

It has long been said that naming things is one of the hard things to
do in computer science. That may be so, but it pales in comparison to
the challenge of handling usernames properly in applications. This is
especially true when multiple applications are involved, and they are
all supposed to agree on what characters are, and are not, allowed.
The Debian project is facing that problem right now, as two
user-creation utilities disagreed about which names are allowable. A
plan is in place to sort this out before the release of Debian 13
("trixie") sometime next year.

The useradd utility is part of the shadow-utils project, which
includes programs for managing user and group accounts. The
shadow-utils suite is included in Debian's passwd package. For
historical reasons, and to avoid confusion with the upstream project,
Debian's version of the shadow-utils sources is often referred to as
"src:shadow".

Most Debian users don't work with useradd, or groupadd, directly.
Instead, Debian has long supplied its own adduser (and addgroup)
utilities, originally written by founder Ian Murdock. These act as
simpler front ends to useradd and use Debian-supplied system defaults
for creating users' home directories and configurations. It should be
noted that useradd, et al., have become much more full-featured since
Debian's utilities were introduced, but the project continues to
maintain them nonetheless.

Little Bobby Tables

In June, Debian developer and src:shadow maintainer Chris
Hofstaedtler filed a bug against the adduser package. The src:shadow
package had dropped a Debian-specific patch, originally introduced in
2003 by Karl Ramm, to allow characters far beyond what were allowed
by the upstream shadow-utils project. In the patch, Ramm wrote:

    I can't come up with a good justification as to why characters
    other than ':'s and '\0's should be disallowed in group and
    usernames (other than '-' as the leading character). Thus, the
    maintenance tools don't anymore.

Hofstaedtler said that he had puzzled out some of the patch's purpose
from old bug reports that had been "fixed" by the patch, and those
asked for two things not allowed by the upstream shadow-utils:
usernames with upper-case characters or that are purely numeric.
Hofstaedtler said that upper-case names had been allowed in the
upstream shadow-utils project "a long time ago", but it seemed like
a bad idea to allow purely numeric usernames.

The patch enabled much more than upper-case and purely numeric names,
though. With the patch dropped in version 1:4.15.2-2 of the shadow
source package, one of adduser's tests--which explicitly allowed a
username reminiscent of a famous xkcd comic
("bob;>/hacked")--had failed:

    For src:shadow, I would really like to not have a divergence from
    upstream in this regard. I think if we have clear requirements
    then we (I) can submit them upstream and I would expect upstream
    to accept patches.

    I do feel that making the case for "bob;>/hacked" would be very
    hard.

Hofstaedtler said that the patch had been reapplied for the time
being (it was included again in version 1:4.15.2-3), but he asked if
username requirements could be sorted out in time for the Debian
"trixie" release. If the patch were dropped entirely, then useradd
would restrict usernames to the POSIX standard, with the exception of
allowing a "$" character at the end of a username.

Debian developer and adduser maintainer Marc Haber replied in late
October that other tests were failing as well, and thought that
"useradd upstream is being too picky here". Since adduser depends on
useradd, it cannot create users that useradd would reject; he said
he would like to synchronize on what would be allowed or not.

As part of the research into what should be allowed in usernames,
Haber took over Debian's UserAccounts wiki page, which outlines
Debian's username tools and policies, and started looking into
whether the project should relax its requirements around usernames.

Limits on usernames

One of the questions that bubbles up when looking at usernames is not
just allowable characters, but the allowable length of the username.
The documentation for shadow-utils does not specify a length for
usernames or what encoding is being used.

However, the POSIX standard says that usernames should not include
non-ASCII characters to be portable between systems. The standard
says that usernames should be "composed of characters from the
portable filename character set". That set consists of the digits 0
through 9, upper-case and lower-case "a" through "z", the period (.),
underscore (_), and hyphen (-). It also specifies that usernames
should not begin with a hyphen.
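
As an illustration only, the portable set described above can be
expressed as a short check. This is a sketch, not the code that
useradd or adduser actually runs; the function name
is_portable_username is hypothetical.

```python
import re

# Portable filename character set as described above: ASCII digits,
# ASCII letters, period, underscore, and hyphen, with no leading hyphen.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name: str) -> bool:
    """Return True if name uses only the portable set and has no leading '-'."""
    return PORTABLE_NAME.fullmatch(name) is not None
```

Note that a purely numeric name passes this check; rejecting such
names is an additional policy choice layered on top of the character
set.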

It is, however, possible to assign characters outside that set with
the tools at hand. But Linux distributions usually put up some
guardrails in the adduser and useradd configurations to prevent
administrators from creating usernames with non-ASCII characters
unintentionally. These configurations can be overridden with
adduser's --allow-bad-names option or useradd's --badname option.

In November, Haber posted a message on debian-devel that he had
"opened an especially nasty can of worms" and was finding that
things were more complicated than he had understood. He sought input
and opinions on a number of questions about whether Debian should
allow non-ASCII characters for usernames, how to do that if so, and
if it was more appropriate to document username guidance in Debian's
Policy Manual rather than its wiki. His suggestion was to allow UTF-8
for regular user accounts, but to restrict to ASCII for system
accounts created by Debian packages.

Richard Lewis asked if enabling UTF-8 would open the door to "some
of the abuse described" in a 2021 LWN article about flaws in Unicode
handling that led to security exploits. He said that it seemed to be
a bad idea to make the change, even if it would be nicer for users to
have the option.

Haber said that he was not sure if it would be dangerous to allow
UTF-8 usernames, "since we can expect other commands to gracefully
handle a byte stream, can't we?" Additionally, local administrators
already can loosen restrictions to allow UTF-8 usernames, but Debian
does not test for such use cases. Debian would become "more robust"
if it assumed UTF-8 characters would be used in usernames.
"Vulnerabilities that could be exploited by having non-ascii user
names are already here and present today, just not uncovered yet."

It would be reasonable, Timo Röhling said, to mitigate possible
homograph attacks by disallowing mixed alphabets "such as cyrillic
and latin letters in the same name". Haber said that was not going
to help if a user could directly write to /etc/passwd, and he was
unwilling to implement that himself in adduser. He would accept code
and test cases written by others, though.
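
To make the mixed-alphabet idea concrete, here is a rough sketch of
such a check. Python's standard library does not expose the Unicode
script property, so this approximates a character's script by the
first word of its Unicode name (for example "LATIN" or "CYRILLIC");
that shortcut, and the function names, are assumptions made for
illustration rather than anything proposed in the thread.

```python
import unicodedata


def scripts_used(name: str) -> set[str]:
    """Approximate the set of scripts appearing in the letters of name."""
    scripts = set()
    for ch in name:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixes_scripts(name: str) -> bool:
    """Return True if the letters of name come from more than one script."""
    return len(scripts_used(name)) > 1
```

Digits, punctuation, and combining marks are ignored here, so a name
such as "bob42" counts as a single script. A real implementation
would more likely rely on a library that provides proper script data.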

Keyboards

Security concerns aside, there are other practical problems with
supporting non-ASCII usernames. Étienne Mollier noted that he had
"one weird enough" character in his first name that posed a problem
if he had to log in using a keyboard layout that lacked the
capability to transcribe the lower-case or upper-case 'e' acute
characters ("é" or "É"). For that reason, he said, he felt better
about keeping a full ASCII username and "wouldn't feel strongly if
unicode support for login never happens". But it would be good if
the gecos field of the passwd file had proper Unicode support to
properly display users' real names.

Not only was it difficult to type "é" on some keyboards, it could
also be encoded in multiple ways. Gioele Barabucci pointed out that
it could be "e with acute", which is encoded in Unicode as U+00E9, or
it could be "e, combined with an [acute] accent", which would be
U+0065 plus U+0301:

    If a keyboard input system provides the former sequence of bytes,
    but the username is stored in the login infrastructure using the
    latter sequence of [bytes], then a naive comparison will not find
    the user "émollier" in the system. Unicode defines in Annex 15 a
    few normalization forms as a way to work around this problem. But
    a correct use of these normalization forms still requires
    coordination and standardization among all programs accessing the
    data.
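
The mismatch Barabucci describes is easy to reproduce. The following
sketch uses Python's unicodedata module to show that the two spellings
differ as strings and as UTF-8 bytes, yet compare equal after
normalization. The helper name same_name and the default choice of
NFC are assumptions for illustration; the thread did not settle on a
normalization form.

```python
import unicodedata

precomposed = "\u00e9mollier"   # U+00E9, "e with acute"
decomposed = "e\u0301mollier"   # U+0065 followed by U+0301


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

A naive comparison such as precomposed == decomposed is False, while
same_name(precomposed, decomposed) is True. The comparison only helps
if every program that touches the user database applies the same form.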

He asked if POSIX or other standards provided a normalization form
for UTF-8 encoded usernames. Peter Pentchev responded that POSIX said
to stick to the portable filename character set to ensure
portability. Haber argued that it should be up to local admins to
decide whether they wanted their local user database to be portable.
"I don't think that we should restrict local admins who don't need
that kind of portability."

Simon McVittie recommended that Debian consider adopting systemd's
user name syntax and concepts of "strict mode" and "relaxed mode".
The systemd tooling adheres to a strict naming convention when
creating usernames, but it has a relaxed convention for accepting
usernames created by other tools. McVittie said that seemed like a
good principle for Debian to follow, even if its specific rules might
differ from systemd's.

Haber seemed to agree in part, but said systemd's strict mode was
"even stricter than what we currently allow for system accounts",
and he did not like that systemd's policies (especially with
systemd-homed, which LWN covered recently) were not configurable.

This time it's personal

The discussion, perhaps not surprisingly, brought out some strong
feelings about how names and usernames were represented, especially
since, as Hofstaedtler noted, usernames can be important to some
users:

    I see and type my username hundreds times a day, people use it to
    address me in written and spoken conversations with it, etc.

    If it were my uid, which I see maybe once a week and don't have
    to remember, I wouldn't care.

Indeed, it's not uncommon in open-source communities or within
organizations to use a person's username rather than their given
name--so it is unsurprising that some people feel strongly that
usernames should be composed of a wider range of characters than
POSIX recommends. Others dislike the practice of conflating usernames
with real-world names, and see little reason to go to any trouble to
go beyond ASCII.

Johannes Schauer Marin Rodrigues supported allowing more than ASCII
in usernames. He said it would be good for Debian to put pressure on
other projects to provide Unicode support. "We cannot find these
kind of bugs if we accept translating everybody's given name to the
American alphabet." Bálint Réczey, though, asked that Debian avoid
opening that can of worms and imposing needless work on upstreams.
"Keep what works reasonably well for decades."

A plan

Haber initially seemed bullish on allowing UTF-8 usernames in Debian
"as a courtesy to those people who need non-ascii user names to
write their name" and as an opportunity to find "bugs that are
already here" in Debian's software. He acknowledged that it is late
in the development cycle for trixie. But, since it was currently
possible to create usernames with UTF-8 characters, he did not want
to tighten restrictions in trixie versus Debian 12, only to revisit
those restrictions for Debian 14. In a reply to Mollier he wondered
about what advice to give in Debian's documentation "once we have
decided to officially allow UTF-8 login names".

On December 3, however, Haber said that he "finally understood"
that UTF-8 support would require more than the ability to create a
UTF-8 encoded username and write it to /etc/passwd. Homograph
characters, such as U+00E9 (é) and U+0065 plus U+0301 (é), could be
used with adduser to create two separate users with lookalike
usernames:

    At the least, adduser should reject creating étienne if étienne
    already exists - those are different user names but look the
    same, and if you don't cut-and-paste user names instead of typing
    them you're bound to hit the wrong user depending on HOW you type
    and what input medium you use. Not good.
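
A creation-time check of the kind Haber describes might look roughly
like the sketch below, which compares a candidate name against the
names in passwd-format text after NFC normalization. It is not the
PRECIS-based approach he had in mind, adduser itself is written in
Perl, and the function names and sample data are hypothetical.

```python
import unicodedata


def existing_names(passwd_text: str) -> list[str]:
    """Extract login names (the first colon-separated field) from passwd text."""
    return [line.split(":", 1)[0]
            for line in passwd_text.splitlines() if line.strip()]


def lookalike_exists(candidate: str, passwd_text: str) -> bool:
    """Return True if candidate matches an existing name once both are NFC."""
    wanted = unicodedata.normalize("NFC", candidate)
    return any(unicodedata.normalize("NFC", name) == wanted
               for name in existing_names(passwd_text))


# Hypothetical sample data: one account stored with precomposed U+00E9.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "\u00e9tienne:x:1000:1000::/home/\u00e9tienne:/bin/bash\n"
)
```

With this data, lookalike_exists("e\u0301tienne", SAMPLE_PASSWD)
reports a conflict, while the plain ASCII "etienne" does not.
Normalization alone does not catch cross-script lookalikes, which is
part of why the full problem is larger than it first appears.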

Haber said that he was the only active developer working on adduser
and did not have time to implement a check against lookalike
usernames in time for the trixie release. Worse, he said, the Perl
module that he would use (Unicode::Precis) was not packaged for
Debian and had not had a release in more than five years.

The next version of adduser, Haber said, would reject UTF-8 usernames
by default. They would still be allowed when using the
--allow-bad-names option, but he said he wanted to deprecate that
option name in favor of something that doesn't use the word "bad".
The --allow-all-names option will continue to pass everything
verbatim to useradd.

Mollier thanked Haber for his work on the problem, and suggested some
alternatives to the bad names option. Barabucci also thanked Haber
for taking the time to research the issue, to which Haber replied
dryly, "I have learned many things."

Haber's current course of action for adduser seems the most prudent.
There may be a day when it is more practical to expand the allowed
characters for usernames, but the work required to do so right now is
far greater than the benefits that users would gain in the process.


-----------------------------------------

Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 5, 2024 17:22 UTC (Thu) by isotopp (subscriber, #99763) [
Link] (5 responses)

Login Names in Unix have always been very restricted, they are not
just part of 'ls' output, but also are transported in some tar
formats as owners, are automatically parts of mail addresses and have
other, not cataloged use-cases.

If you allow utf-8 here, and relax length restrictions, it is unclear
and unknowable what will happen downstream with other applications.

If you want to login as 'Kristian Köhntopp', it is probably useful to
have an LDAP like name canonicalization mechanism that does a lookup
to get a unix username and then tries the password with that.
Anything else is very likely to break unexpected things.

In my personal opinion, even a --badnames option is wrong.

Or you go, and actually perform the work to define a username format
for Unix (not just Linux), catalog use-cases and make sure that they
actually work with full UTF-8, and whatever relaxed length limit you
define. And then be prepared to handle a login with クリス (kurisu)
instead of kris.
[Reply to this comment]
Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 6, 2024 9:33 UTC (Fri) by kleptog (subscriber, #1183) [
Link] (1 responses)

I've gotten used to using --allow-bad-names all the time at my work
because the default prohibits periods (".") and our company uses them
in usernames as firstname.lastname.

Though I've now checked the docs and apparently it's possible to
change the allowed characters in the configuration file so maybe
that's a better approach for ansible deployed machines.
[Reply to this comment]
Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 6, 2024 19:28 UTC (Fri) by raven667 (subscriber, #5198) [
Link]

A place I used to work was purchased by a company which had
standardized on "Firstname Lastname" in their AD for username/
samAccountName so when we started joining the Unix (Mac/Linux)
machines to their AD with winbind I was surprised with how few things
actually broke, "/home/First Last/Documents" was OK in most GUI
tools, but there were definitely things that broke. Eventually they
did re-standardize on usernames without spaces, but I was surprised
it worked at all.
[Reply to this comment]
Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 6, 2024 10:21 UTC (Fri) by smurf (subscriber, #17840) [
Link]

Heh. I had to troubleshoot a system with a UTF8ified username last
year. Let me tell you, the number of programs that format columns by
counting bytes is, umm, truly staggering.

Anyway. A little bit of safety should be in everybody's interest,
i.e. no mixed-charset names, and use some normal form to check for
existing usernames.

Writing the above sentences is significantly easier than implementing
them. Cyrillic vs. Greek definitely is a problem, but Latin vs.
CJK? Not so much IMHO. Normalize to exactly which normal form using
which version of the Unicode standard? What do I do on the console,
type \U4E52\U4E53 instead of 乒乓? What if my username is "420"?

On the other hand ... I never type my username anyway. When logging in
on the GUI I click on my avatar, when connecting to a remote system
with SSH or whatever it's the default, and a fresh text-only console
login is easy because there the username is "root". 

[Reply to this comment]
Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 6, 2024 19:14 UTC (Fri) by rgb (subscriber, #57129) [Link]
(1 responses)

> Not only was it difficult to type "é" on some keyboards, it could
also be encoded in multiple ways.
I think that says it all. Unicode is made to display text, not to
create IDs.
[Reply to this comment]
Anything but POSIX portable filename set with a conservative length
restriction is dangerous

Posted Dec 6, 2024 19:35 UTC (Fri) by raven667 (subscriber, #5198) [
Link]

I think you are right here and restricting the username to a limited
subset of bytes that existing tools don't have any trouble
interpreting and displaying makes sense, but the GECOS field
definitely should be extended to support full UTF-8 encoded names as
a courtesy and to be friendly to actual humans and their real written
names they want to use. Having machine-readable username/uid/gid(s)
distinct from a human display name makes sense and changes the
requirements quite a bit where having two different encodings for "é"
or "420" isn't really a problem that needs to be solved.
[Reply to this comment]
Doesn't the GECOS field already cover some of this use case?

Posted Dec 5, 2024 18:52 UTC (Thu) by NYKevin (subscriber, #129325) [
Link] (4 responses)

Maybe this is my Anglophone chauvinism speaking, but you can already
set an arbitrary human-readable display name in the GECOS field, and
most login GUIs prefer to display that name over the username when
both are available. Is it really critical to allow non-ASCII
characters in the username itself? How many people are trying to log
into a command line environment *and* cannot type in ASCII?
[Reply to this comment]
Doesn't the GECOS field already cover some of this use case?

Posted Dec 5, 2024 20:57 UTC (Thu) by zeha (subscriber, #61580) [Link
] (3 responses)

> Maybe this is my Anglophone chauvinism speaking

Yes.
[Reply to this comment]
Doesn't the GECOS field already cover some of this use case?

Posted Dec 5, 2024 22:00 UTC (Thu) by Cyberax ( supporter , #52523)
[Link]

As someone speaking several languages with non-Latin alphabets,
sometimes it makes sense to stick to ASCII. Otherwise, you're just
setting yourself for a world of pain. Imagine entering Chinese text
on a terminal in text mode.
[Reply to this comment]
Doesn't the GECOS field already cover some of this use case?

Posted Dec 6, 2024 14:32 UTC (Fri) by khim (subscriber, #9252) [Link]
(1 responses)

I would say it's "yes" and "no", simultaneously.

I have met a lot of people who simply don't know English well enough
to type a name in ASCII.

Unfortunately the majority of them I have met when they cried on
various forums about how unfair it is that they "have only just used
Cyrillic (Arabic, Farsi, etc) name" - and now have so many broken
programs they couldn't even count them all.

Yes, it's deeply anglophonic, yes, it's unfair, true, people
genuinely suffer if you force that on them...

But the experience says that it's still better for them to learn 1
(one) English word (their account name) once than suffer through
innumerable programs that don't support any other names properly.

[Reply to this comment]
Doesn't the GECOS field already cover some of this use case?

Posted Dec 6, 2024 21:46 UTC (Fri) by epk (guest, #174765) [Link]

I must sadly approve of this answer.

And it's not as though a non-Latin-alphabet username would really
help that much, since so much text - especially in path names and
URLs - is in English. There is, however, the full name of each user,
and I'm guessing that should be much easier to have non-Latin UTF-8
in. And for non-computer-literate users who need a lot of
hand-holding, they might actually see mostly/only their full names.
[Reply to this comment]
Once upon a time in the past ...

Posted Dec 5, 2024 19:11 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link] (4 responses)

... people used a weird text messaging system called email to send,
well, text messages to users supposed to be delivered into their
mailboxes at certain hosts. The email address of a user user on host
host would be user@host. IOW, a unix username is also what RFC5322
calls the local-part of an email address. And this means ASCII, even
according to the newest specification.

https://datatracker.ietf.org/doc/html/rfc5322
[Reply to this comment]
Once upon a time in the past ...

Posted Dec 5, 2024 20:25 UTC (Thu) by storner (subscriber, #119) [
Link] (2 responses)

I think you are mistaken about requiring ASCII for the name of the
mailbox. The RFC you refer to says (section 3.4.1):

The local-part portion is a domain-dependent string. In addresses,
it is simply interpreted on the particular host as a name of a
particular mailbox.

"Domain-dependent" means that there are really no rules as to which
characters can be used. It can even be quoted to allow whitespace.

[Reply to this comment]
Once upon a time in the past ...

Posted Dec 5, 2024 21:02 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link] (1 responses)

The grammar rule defining local-part,

local-part = dot-atom / quoted-string / obs-local-part

is at the beginning of the RFC page which contains the statement

The local-part portion is a domain-dependent string.

The claim that domain-dependent would mean "no requirements" is thus
obviously wrong. dot-atom and quoted-string are defined in sections
3.2.3 ("Atom") and 3.2.4 ("Quoted Strings"). Drilling down to the
actual character set specifications always ends with a subset of
ASCII, the most liberal one being the one for quoted strings which
includes whitespace and all printable characters, ie, codepoints 32 -
126.
[Reply to this comment]
Once upon a time in the past ...

Posted Dec 5, 2024 21:04 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link]

Slight correction: The quoted-string character set excludes \ and ".
[Reply to this comment]
Once upon a time in the past ...

Posted Dec 6, 2024 4:38 UTC (Fri) by jheiss (subscriber, #62556) [
Link]

RFC 6532 extends 5322 to allow UTF-8 in email addresses in message
headers, and 6531 extends 5321 (SMTP) to allow UTF-8 in SMTP
addressing. There are some draft documents in the IETF mailmaint
working group which try to set some guidelines about mixed languages
and other possibly confusing situations, but servers that implement
6531/6532 could allow any combination of UTF-8 characters in
usernames.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 19:19 UTC (Thu) by rhowe (subscriber, #102862) [
Link] (12 responses)

Thinking about usernames in the OS more broadly, winbind (I think by
default, but if not it's certainly one of the main options) generates
usernames of the form "DOMAIN\username" and I know of at least one
deployment which uses this.

Now, these users do not exist in the passwd file and therefore aren't
created via useradd or adduser so this isn't directly relevant to the
issue being discussed here, but it is certainly legitimate for
usernames to contain "funky" characters and indeed potentially
problematic ones. For example, if something were to treat the
backslash as an escape character then all sorts of fun could occur
from injecting of newlines into logs to injection of null
terminators. Inadequate quoting in shell scripts being a prime
example.

Also, both the domain and username portions are determined by the
records in Windows' Active Directory and therefore need to follow the
rules for that system. For the 'sAMAccountName' field, it's
documented at https://learn.microsoft.com/en-us/windows/win32/
adschema/... where interestingly it's defined as a Unicode string but
not containing any of: "/ \ [ ] : ; | = , + * ? < >
The more modern userPrincipalName attribute is defined as following
RFC822 which is not very helpful given the broad nature of that RFC:
https://learn.microsoft.com/en-us/windows/win32/adschema/...
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 19:31 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link] (11 responses)

RFC822 is the historic internet email RFC. That's still the
local-part of an email address which is a sequence of words separated
by . characters, word being defined in 3.3 as either an atom or a
quoted string. In the given context, this also means no UTF8 and some
restrictions beyond that.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 19:40 UTC (Thu) by dskoll (subscriber, #1630) [
Link] (10 responses)

The local-part of your email address doesn't have to be your UNIX
user name, though. It often is for convenience, but while the
local-part of my email address is dianne, that is not my UNIX login
name.

So appealing to email as a reason to restrict UNIX login names is not
a great argument. I think a better argument is simply to make life
easier for programs that need to deal with login names and that don't
want to worry about UTF-8 canonicalization, etc.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 19:51 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link] (9 responses)

Regardless of what someone's public email address might be,
username@hostname, hostname here both referring to the actual
hostname and the host FQDN, is always also a valid email address. The
implication of this is mostly that "programs dealing with login
names" include any MTA ever written for UNIX and very likely, all
other programs ever written to handle email on UNIX, IOW, to name the
(probably) most scary example, if you want to allow UTF8 in
usernames, are prepared to patch sendmail and procmail to support
that?

[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 20:57 UTC (Thu) by zeha (subscriber, #61580) [Link
] (2 responses)

It was already discovered that various MTAs and MUAs cannot deal with
non-ascii in gecos, so clearly these programs no longer matter.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 21:09 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link]

These were some technical remarks about usernames and not an
invitation to an open-ended policy discussion about who dictates (or
believes he should really get to dictate) what has to "matter" to
other people.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 6, 2024 19:43 UTC (Fri) by raven667 (subscriber, #5198) [
Link]

> It was already discovered that various MTAs and MUAs cannot deal
with non-ascii in gecos, so clearly these programs no longer matter.

I think it's worth the effort to identify and fix those programs so
people can use their real name for display in the way they prefer to
see it regardless of what language they use. If there is no one
maintaining a particular MTA or MUA or whatever that breaks because
of this, then you've learned that unmaintained software eventually
breaks when the world changes around it, but this kind of change
could be eased into over several release cycles by making it optional
while bug reports and testing are done, before accepting it as the
default and a blocker.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 21:23 UTC (Thu) by dskoll (subscriber, #1630) [
Link] (5 responses)

No, that would not be fun, but still... appealing to email addresses
as a reason to restrict usernames isn't a good argument. Some email
systems store email in ways that don't necessarily depend on UNIX
login names at all (for example, Cyrus IMAP.)
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 21:59 UTC (Thu) by rweikusat2 (subscriber, #
117920) [Link] (4 responses)

Email systems which weren't based on UNIX have existed since before
UNIX gained any networking capabilities (AFAICT, even before UUCP)
but that's beside the point. A UNIX system is also an email system
and this system is based on using the UNIX username as local-part of
an internet email address. That's just a technical fact people
considering to extend the username syntax to include octets outside
of the range of printable ASCII characters might want to take into
account. Or not, depending on what their priorities are.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 5, 2024 23:26 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Email systems which weren't based on UNIX have existed since before
UNIX gained any networking capabilities

I think the birth of email actually predates the birth of Unix?

Cheers,
Wol
[Reply to this comment]
UNIX and email

Posted Dec 5, 2024 23:47 UTC (Thu) by KJ7RRV (subscriber, #153595) [
Link]

> A UNIX system is also an email system and this system is based on
using the UNIX username as local-part of an internet email address.

I think I'm misunderstanding this part? It seems to mean that all
UNIX systems are email servers; is that correct?
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 6, 2024 0:15 UTC (Fri) by dvdeug (subscriber, #10998) [
Link] (1 responses)

> A UNIX system is also an email system

A fully POSIX-compliant UNIX system has an email system, though in
the modern world, very few UNIX systems are connected to Internet
email. I wouldn't say it's not UNIX if it doesn't have an email
system. I removed mailutils, mailx, and mailcap from my Debian
unstable system, and nothing depended on them. The concept of open
access to email via the Internet has been lost, and system-wide email
isn't very useful on a single-user system.
[Reply to this comment]
Real-world non-alphanumeric usernames

Posted Dec 6, 2024 4:42 UTC (Fri) by KJ7RRV (subscriber, #153595)
[Link]

Thank you! I didn't realize that POSIX requires email; that explains
it.
[Reply to this comment]
usernames are a low-level implementation detail

Posted Dec 6, 2024 4:55 UTC (Fri) by marcH (subscriber, #57642)
[Link]

> people use it to address me in written and spoken conversations
with it, etc.

Just ask these people to stop. Then, do as many other people do and
simply treat _both_ usernames and uids as low-level implementation
details; that's what they are.

Asking all programs in the universe to agree on some UTF-8 subset for
usernames is totally unrealistic. This discussion and article barely
scratch that surface.

> I see and type my username hundreds times a day

Not sure what the problem is here. Surely, anyone can find something
in ASCII that's not unpleasant to look at?

The simple and reliable way forward is to allow UTF-8 in some
non-key, free-form, pure display field like "gecos" or similar and
pressure applications to display that in user interfaces and in as
many places as possible - while still relying on portable, unique and
bug-free ASCII usernames in code and other implementation details.
Isn't that what's happening already?
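
As a rough illustration of that split, here is a sketch using Python's
standard pwd module. It assumes the common convention that the GECOS
field is comma-separated with the full name first; that layout is a
convention rather than a guarantee, and both function names are made
up for this example.

```python
import pwd


def display_name(username: str, gecos: str) -> str:
    """Pick a name for display only; fall back to the login name."""
    # Assumption: the first comma-separated GECOS subfield is the full name.
    full_name = gecos.split(",", 1)[0].strip()
    return full_name or username


def display_name_for(username: str) -> str:
    """Look up an account by its (ASCII) login name and return a display name."""
    entry = pwd.getpwnam(username)  # raises KeyError if there is no such user
    return display_name(entry.pw_name, entry.pw_gecos)
```

The login name stays the lookup key everywhere; only the string shown
to humans comes from the free-form field.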

[Reply to this comment]
French people who believe É does not exist

Posted Dec 6, 2024 5:03 UTC (Fri) by marcH (subscriber, #57642)
[Link] (3 responses)

> Étienne Mollier noted that he had "one weird enough" character in
his first name that posed a problem if he had to log in using a
keyboard layout that lacked the capability to transcribe the
lower-case or upper-case 'e' acute characters ("é" or "É")

A French "fun fact" is that many French people wrongly believe that
É, À, Ù etc. "do not exist" because... the default _Windows_ keyboard
layout for France makes these incredibly hard to type! fr_FR layouts
on Mac and Linux are not affected and neither are some other
French-speaking countries.

é/É is one of the most common characters in French.

Note this is purely a software issue: there's no relevant, physical
difference between Windows and Mac keyboards.

https://www.google.com/search?q=majuscules+accentu%C3%A9es

Even more fun: you can tell whether newspapers and other editors use
Windows or not by simply looking at their front page. Examples:

https://www.lemonde.fr/ -> Economie

https://www.liberation.fr/ -> Economie

[Reply to this comment]
French people who believe É does not exist

Posted Dec 6, 2024 6:49 UTC (Fri) by victrid (subscriber, #163116)
[Link] (2 responses)

I think that's what the GECOS field is all about.

In fact, you can type Japanese characters directly on the keyboard,
but you cannot expect to see them in text mode. Supporting CJK
characters included in UTF-8 is too complicated compared to
supporting Latin-1.

Imagine desperate ops logging in to rescue via the console and
nothing except blank diamond symbols can be displayed.
[Reply to this comment]
French people who believe É does not exist

Posted Dec 6, 2024 11:30 UTC (Fri) by mbunkus (subscriber, #87248)
[Link] (1 responses)

I'm kinda confused when you say "In fact, you can type Japanese
characters directly on the keyboard, but you cannot expect to see
them in text mode.". There are tons of CLI programs out there with
translations into languages that include more than ASCII characters,
including but not limited to Chinese Traditional, Chinese Simplified,
Japanese, and Korean. They can display their messages just fine.
[Reply to this comment]
French people who believe É does not exist

Posted Dec 6, 2024 19:59 UTC (Fri) by wahern (subscriber, #37304)
[Link]

Maybe they had in mind VGA text mode console screens. But a little
Googling suggests that many consoles these days do display at least
some CJK characters. The UEFI specification explicitly references
VT-UTF8, for example, but I would assume products targeted at East
Asian customers had solutions (e.g. PC-98) long before these problems
were addressed in common standards.

[Reply to this comment]
It's bad

Posted Dec 6, 2024 19:41 UTC (Fri) by rgb (subscriber, #57129) [Link]

As long as UTF-8 names are not fully supported, the "bad" in
"--allow-bad-names" serves as a crucial hint to the unaware user that
"bad" things can happen.

Sing it: https://genius.com/Michael-jackson-bad-lyrics
[Reply to this comment]
RFC 8265 defines how to normalize and compare Unicode usernames

Posted Dec 6, 2024 20:48 UTC (Fri) by gioele (subscriber, #61675)
[Link]

> He asked if POSIX or other standards provided a normalization form
for UTF-8 encoded usernames.

Later in the thread [1] Michal Politowski pointed out that RFC 8265
"Preparation, Enforcement, and Comparison of Internationalized
Strings Representing Usernames and Passwords" and its sibling RFC
8264 "PRECIS Framework: Preparation, Enforcement, and Comparison of
Internationalized Strings in Application Protocols" do in fact
describe which normalization forms should be used when comparing
Unicode usernames (as well as a number of other low-level details).

[1] https://lists.debian.org/debian-devel/2024/11/msg00507.html
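
To show why a normalization form matters at all, here is a small
sketch using Python's standard unicodedata module. It only applies
NFC, which is the normalization form the RFC 8265 username profiles
call for; it is not a PRECIS implementation and skips the other
steps those profiles describe (width mapping, case mapping, rejecting
disallowed code points). The function name same_username is made up.

```python
import unicodedata


def same_username(a: str, b: str) -> bool:
    """Compare two usernames after NFC normalization only (not full PRECIS)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two spellings that render identically but are different code point
# sequences (and therefore different UTF-8 byte strings).
composed = "\u00e9tienne"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301tienne"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
```

Here composed == decomposed is False, while same_username(composed,
decomposed) is True. A tool that compares raw bytes would treat them
as two distinct accounts.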
[Reply to this comment]

                  Copyright (c) 2024, Eklektix, Inc.
   Comments and public postings are copyrighted by their creators.
          Linux is a registered trademark of Linus Torvalds