Subj : Re: Locale & C Library
To   : apam
From : tenser
Date : Fri Sep 26 2025 01:45:29

On 24 Sep 2025 at 02:47a, apam pondered and said...

 ap> I was wondering if you know about setting the locale in a c library? I'm
 ap> thinking maybe I should do that in crt0.o?
 ap>
 ap> At present on my OS, one must set the locale using setlocale in the
 ap> program, but if I want my programs to have a default locale, I am
 ap> thinking maybe I should set it before jumping to main() from an
 ap> environment variable.
 ap>
 ap> Does this sound reasonable? Locales and things have always been a bit of
 ap> a mystery to me :)

Whoo boy. This opens up a can of worms. But let me try to address your specific question first.

The short answer is no, you probably don't want to do that.

More specifically, I'm going by what POSIX, C, and existing libc implementations do. POSIX says there's a global default locale, called "POSIX" (aliased as "C"), that gives you sort of the minimum baseline for running C programs. The current version of POSIX, POSIX 2024, includes the 2018 revision of the ISO C standard as a normative reference, and defers to C for the specifics of what happens when. C, in turn, is quite clear about this; section 7.11.1.1 of C 2018 (the description of `setlocale`) covers the details, and para (4) of that section states:

"At program startup, the equivalent of `setlocale(LC_ALL, "C");` is executed."

This strongly implies that one wouldn't call `setlocale()` for a non-default locale in crt0 before calling `main`. Looking at a smattering of `libc` implementations, I don't see any that touch locales in the pre-main C runtime code.

So this suggests to me that your OS should arrange things so that, on entry to a program, the default "C" locale has been selected. It is up to individual programs to call `setlocale()` as appropriate, if they need to care.

Ok, so why is this stuff troublesome?
Bluntly, the C/POSIX locale stuff isn't very good. It was designed to solve a problem that was, and is, very real: how do we write a single program that can work with the myriad human languages and notations for similar concepts?

An obvious example is, "how do we write dates?" Here in North America, we often write the numeric month first, then the day of the month, and then the year. But in other parts of the world, folks write the day of the month first, then the month, then the year. ISO 8601 date-times write dates as 'year-month-day' (which has the considerable advantage of being trivially sortable, btw).

Or consider the formatting of large numbers: again, in the US, we tend to write these with a comma separating multiples of powers of a thousand (that is, a comma between factors of 10^(k*3) for k>0), and use a period to separate the integral part of a number from the fractional part, such as 10,000.02. But over in Europe, they often use '.' to separate powers of a thousand, and ',' to separate the integral and fractional parts. E.g., 10.000,02. To make things even more confusing, in India, they group digits beyond a thousand ("hazar") into "lakh" (a hundred thousand) and "crore" (ten million, or 100 lakh), so one hundred million (10 crore) might be written as "10,00,00,000". And we haven't even started to talk about currency....

C and early Unix systems were invented in the US, so C programs and Unix systems tended to use US-centric conventions for such things, and the vast bulk of documentation, comments, etc, were written in (American) English. That's not unreasonable given the history, but folks elsewhere in the world wanted to use their own conventions and languages; locales were introduced to solve this.
Except that they solve the wrong problems: in particular, they conflate things like the encoding and collating sequences of textual data (important for ensuring that things like `strcoll` give the expected results for a given locale) with how dates, times, and currency are formatted. But the former is now a solved problem: we should just use UTF-8 and Unicode everywhere. And the latter is a lot more general than what's in the locale stuff in C and POSIX, which is not flexible enough to accommodate all of that generality.

As a result, few people actually use it, preferring instead to use special-purpose libraries for handling these sorts of things. Sure, it's kind of neat that I can `export LC_ALL=fr_FR.UTF-8` and `ls -lh` will show me dates in French and use `,` for the decimal separator, but if I really want to do something in French, I'm not going to rely only on that support, Oui? Non. (For the record, I don't know French.)

Anyway, that's my 2c on it: don't call `setlocale()` from CSU, and only call it in programs that actually need to care for some reason. In general, it's a pretty bad interface.

--- Mystic BBS v1.12 A48 (Linux/64)
 * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)