[HN Gopher] GCC 13 Supports New C2x Features, Including Nullptr,...
___________________________________________________________________
GCC 13 Supports New C2x Features, Including Nullptr, Enhanced
Enumerations
Author : mikece
Score : 86 points
Date : 2023-05-15 15:06 UTC (7 hours ago)
(HTM) web link (www.infoq.com)
(TXT) w3m dump (www.infoq.com)
| zabzonk wrote:
| nullptr is since c++11
|
| https://en.cppreference.com/w/cpp/language/nullptr
|
| sorry - thought the post was re c++ - my bad
| mananaysiempre wrote:
| In C++. C only copied it in C23 (not yet ratified).
| zabzonk wrote:
| sorry, misread the post
| whatever1 wrote:
| I wonder whether LLMs could help with smarter code optimizations,
| especially, since they can be context aware.
| Gibbon1 wrote:
| We should have a ten year moratorium on optimizations and force
| the compiler maintainers to work on other things.
| cryptonector wrote:
| > This proposal also recommends adoption of Unicode normalization
| form C (NFC) for identifiers to ensure that when compared,
| identifiers intended to be the same will compare as equal. Legacy
| encodings are generally naturally in NFC when converted to
| Unicode. Most tools will, by default, produce NFC text.
|
| Er, a much better approach is to allow unnormalized Unicode in
| source code and use form-insensitive matching of symbol names so
| that all forms of a symbol are equivalent. This can be done by
| normalizing during the parse, or by implementing form-insensitive
| string comparison and hashing functions that normalize glyph by
| glyph as needed -- the latter can be very fast for all-ASCII and
| mostly-ASCII symbols!
|
| The reason this is a better way is that there are too many places
| that don't produce NFC. For example, HFS+ uses NFD, so if you
| cut-n-paste a file name from HFS+ into other contexts, you'll be
| pasting NFD unless the cut-n-paste system normalizes to NFC.
| Also, while it's true that input modes typically produce NFC,
| it's more that they produce NFC for a small subset of Unicode,
| not that they will normalize other forms seen on input. Using
| form-insensitive string comparison/hashing/matching yields a
| better user experience at not that much implementation cost:
| you're gonna need a Unicode library, and that library will need
| to have normalization support, so you can implement form-
| insensitivity.
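|
| A toy sketch of the form-insensitive comparison described above,
| with an ASCII fast path. Everything here is an assumption made for
| illustration: ident_equal, next_cp, next_composed and the two-entry
| toy_table are hypothetical names, input is assumed to be well-formed
| UTF-8, and the table stands in for the composition data a real
| Unicode library would supply. It is not full NFC (no canonical
| reordering, no multi-mark sequences).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy composition data: (base, combining mark) -> precomposed code point.
 * A real implementation would take this from a Unicode library. */
struct compose_pair { uint32_t base, mark, composed; };
static const struct compose_pair toy_table[] = {
    { 0x0065, 0x0301, 0x00E9 },  /* e + COMBINING ACUTE ACCENT */
    { 0x0061, 0x0308, 0x00E4 },  /* a + COMBINING DIAERESIS */
};

/* Decode one code point from (assumed well-formed) UTF-8; advances *s. */
static uint32_t next_cp(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra;
    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else                  { cp = p[0] & 0x07; extra = 3; }
    p++;
    while (extra-- > 0 && *p != 0)   /* stop early on truncated input */
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    *s = p;
    return cp;
}

/* Next code point, merged with a following mark if the toy table has it. */
static uint32_t next_composed(const unsigned char **s)
{
    uint32_t cp = next_cp(s);
    if (**s != 0) {
        const unsigned char *look = *s;
        uint32_t mark = next_cp(&look);
        for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
            if (toy_table[i].base == cp && toy_table[i].mark == mark) {
                *s = look;           /* consume the mark as well */
                return toy_table[i].composed;
            }
        }
    }
    return cp;
}

/* Returns 1 if the two identifiers match regardless of (toy) form. */
int ident_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    while (*pa != 0 && *pb != 0) {
        /* Fast path: this byte and the next are ASCII on both sides,
         * so no combining mark can follow and bytes compare directly. */
        if (pa[0] < 0x80 && pa[1] < 0x80 && pb[0] < 0x80 && pb[1] < 0x80) {
            if (*pa++ != *pb++)
                return 0;
        } else if (next_composed(&pa) != next_composed(&pb)) {
            return 0;
        }
    }
    return *pa == *pb;   /* equal only if both strings ended together */
}
```

| All-ASCII identifiers never leave the byte-compare branch, which is
| the "very fast for all-ASCII" property mentioned above; only
| characters near a non-ASCII byte pay for decoding.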
| wahern wrote:
| > Er, a much better approach is to allow unnormalized Unicode
| in source code and use form-insensitive matching of symbol
| names so that all forms of a symbol are equivalent.
|
| Linkers will often be blissfully unaware of Unicode or any form
| of localization. This was the impetus for UTF-8, so that the
| bulk of software which is 8-bit clean or which operates on
| opaque, NUL-terminated strings can continue working as-is. This
| can't be changed without breaking backwards ABI compatibility;
| therefore, it's very unlikely to change.
|
| There are countless half-measures that could be taken, but few
| if any are suitable for standardization. If the history of
| software localization is any guide, in the face of strict,
| forward-looking specifications various vendors and ecosystems
| will likely go their own way, with the one sure thing being a
| failure to fully adopt or properly implement the specification.
| cryptonector wrote:
| Yes, the compiler should normalize symbols before writing
| object files, no doubt. I'm talking about the inputs though
| -the source files- which should not have to be normalized.
| quesomaster9000 wrote:
| Nobody else is admiring typed enumerations?
|
| Particularly when using structs this removes a lot of ambiguity
| if you ignore the indirection to find out the underlying type of
| the enum (or encode it in the name hungarian style).
|     enum D : uint8_t { A = 0, B = 1, C = 2 };
|     typedef struct {
|         enum D f;
|     } __attribute__((packed)) E;
|     assert(sizeof(E) == 1);
|
| etc. could make grokking protocol declarations with enums less
| onerous and require one less level of indirection.
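|
| A hedged, self-contained sketch of the same idea, not the poster's
| code: msg_kind, msg_kind_field and msg_header are hypothetical names.
| The #if guard is an assumption added so the file also builds with
| pre-C23 compilers; it is deliberately conservative, so a compiler in
| a draft C2x mode may accept the typed-enum syntax and still take the
| fallback branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: the underlying type is stated at the declaration. */
enum msg_kind : uint8_t { MSG_PING = 0, MSG_PONG = 1, MSG_DATA = 2 };
typedef enum msg_kind msg_kind_field;
#else
/* Older standards: keep the enumerators, store them in a uint8_t field. */
enum msg_kind { MSG_PING = 0, MSG_PONG = 1, MSG_DATA = 2 };
typedef uint8_t msg_kind_field;
#endif

struct msg_header {
    msg_kind_field kind;   /* one byte wide in either branch */
};

/* Mirrors the size check from the comment above. */
void msg_header_check(void)
{
    assert(sizeof(struct msg_header) == 1);
}
```

| With the C23 branch the field keeps its enum type, so a reader of the
| struct sees both the width and the set of valid values in one place.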
| maccard wrote:
| I'm a c++ programmer and finding it hard to be excited about
| things we added to the language 12 years ago.
| quesomaster9000 wrote:
| As a D programmer, why haven't you caught up yet?
| maccard wrote:
| I write a reasonable amount of kotlin these days and it's
| night and day.
| ishvanl wrote:
| As a rust programmer... etc etc.
| loeg wrote:
| As a sneering C++ programmer, why are you even reading /
| commenting on a new C standard? This is basically a "if you
| don't have anything nice to say, don't say it" situation.
| tom_ wrote:
| The same goes for C programmers with a chip on their
| shoulder!
|
| There is a downvote button.
| david2ndaccount wrote:
| Clang has had it as an extension for a long time
| jwilk wrote:
| Related:
|
| https://news.ycombinator.com/item?id=35813821 ("New C features in
| GCC 13", 11 days ago, >260 comments)
| twic wrote:
| Who is the audience for new features in C? And who is driving
| stuff through the standardisation process? Is this stuff likely
| to make its way through to embedded toolchains? Or is this for
| people who are maintaining existing codebases?
| nitrix wrote:
| Changes to the Standard usually happen as a result of defect
| reports (confusing details that implementation writers want
| clarity on) or vast enough general adoption (unifying how
| implementations were differently achieving the same thing).
|
| You can read #13 of the Charter https://www.open-
| std.org/jtc1/sc22/wg14/www/docs/n2086.htm
|
| As for the audience, it's all the C developers, the open-source
| and commercial compiler implementations, vendors of libraries,
| tooling, services, learning material and everything else built
| in C; which is just innumerable.
|
| Each Standard version released supersedes and obsoletes the
| previous versions. Intentionally, the versions are meant to be
| as backwards compatible as possible so that one can mix and
| match C89/C99/C11 codebases with minimum effort.
|
| C has gained only a handful of features in the last 40 years,
| compared to the great many things that were improved w.r.t.
| undefined/implementation-specific/unspecified behaviors, or
| removed to keep up with modern times (e.g. trigraphs,
| non-two's-complement integer representations, etc).
|
| I'd say: (1) upgrading is not the spooky thing people make it
| out to be. Go, Rust, they all move much faster than this and
| have very ambitious big design ideas on their mind. (2) It's
| necessary to take good care of C as it, and the things built in
| it, will realistically outlive many of us.
| cozzyd wrote:
| There is plenty of new C code.
| pantalaimon wrote:
| Most embedded toolchains are ARM or RISC-V GCC these days, they
| get all the features.
| hgs3 wrote:
| > Who is the audience for new features in C?
|
| Folks like myself who use C to write system software.
|
| > Is this stuff likely to make its way through to embedded
| toolchains?
|
| Embedded toolchains based on GCC or Clang will presumably see
| these features one day.
| quesomaster9000 wrote:
| The early adopters are usually transpilers (or code
| generators) which can quickly take advantage of new features
| without the effort of rewriting an entire codebase.
|
| In the same way that Rust used underlying `const` attributes
| in LLVM (and found all the weird edge cases), and Nim used C
| as an intermediate as have many other lisp or object-ish
| languages.
| nrclark wrote:
| Yes, I'd expect they will. Most embedded toolchains these days
| are built around GCC. So as GCC grows new features, embedded
| toolchains will get them too.
| cjensen wrote:
| nullptr has been such a Godsend for C++. Good to see it coming to
| C.
|
| If you ever see the macro NULL in code, be afraid. There are two
| valid ways of defining the macro and that causes weird issues when
| porting code. For example, in the statement printf ("%p %s\n",
| NULL, "Hello world!"), one of the definitions leads to NULL being
| interpreted as the null pointer, and the other leads to NULL
| being interpreted as an integer. The latter may crash if integer
| and pointer are different sizes.
|
| It also causes problems with C++ overloading if one overload
| takes a pointer and another takes an integer.
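|
| A small sketch of the variadic hazard described above, using a
| hypothetical helper count_strings that walks its arguments until a
| null-pointer sentinel. The point is the call site: an explicit cast
| keeps the sentinel pointer-typed even on a platform whose NULL is a
| plain 0, where a bare NULL would be passed as an int.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts string arguments up to a null sentinel. */
int count_strings(const char *first, ...)
{
    va_list ap;
    int n = 0;
    va_start(ap, first);
    /* Each va_arg read assumes the caller really passed a char pointer. */
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

/* Call site: the cast guarantees a pointer is passed, whatever NULL is. */
void count_strings_demo(void)
{
    assert(count_strings("a", "b", (char *)NULL) == 2);
}
```

| In C23 the sentinel could presumably be written as nullptr instead of
| the cast; this sketch sticks to the older spelling so it builds
| everywhere.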
| LordShredda wrote:
| But does C need a nullptr keyword? If you're programming in C,
| you usually define 0 as an invalid value, or a null value. C
| doesn't have the insane type system C++ has and doesn't have a
| very strong need to make a distinction between a pointer or an
| integer, since they're all in the end numbers.
|
| The printf example you gave is an example of garbage in,
| garbage out. If NULL is a macro not defined as a pointer sized
| integer, then you're at fault here.
| mananaysiempre wrote:
| > If NULL is a macro not defined as a pointer sized integer,
| then you're at fault here.
|
| If it was you who wrote stdlib.h, sure; otherwise, if you're
| on a platform where NULL is traditionally defined as 0 and
| not (void *)0, you're stuck. A conformant implementation is
| free to use either definition.
|
| If you want to language-lawyer more heavily, C does not
| require there to be pointer-sized integers (uintptr_t is
| optional), does not require that all zero bytes represent a
| null pointer in memory (unlike for integers), does not
| require that the implementation choose to store an integer
| with value zero as all zero bytes (there may be other valid
| representations), and in any case does not require an
| implementation to do anything reasonable at all if the caller
| passes an integer but a vararg callee looks for a pointer
| (think separate integer and pointer registers).
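|
| To make the representation point concrete, here is a minimal sketch
| (function names are hypothetical) that separates what the language
| guarantees from what the platform happens to do. The first function
| inspects the stored bytes of a null pointer; the second relies only
| on the comparison rule for the constant 0.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if this platform stores a null pointer as all-zero bytes.
 * The standard does not promise this; mainstream platforms do it. */
int null_is_all_zero_bytes(void)
{
    void *p = 0;                          /* null pointer from the constant 0 */
    unsigned char zeros[sizeof p] = {0};  /* same size, every byte zero */
    assert(p == 0);                       /* holds on any implementation */
    return memcmp(&p, zeros, sizeof p) == 0;
}

/* Returns 1 on every conforming implementation: the comparison is
 * defined in terms of null pointers, not bit patterns. */
int constant_zero_compares_null(void)
{
    void *p = 0;
    return p == 0;
}
```

| Only the second result is portable; the first is an observation
| about one platform.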
|
| [I'm not entirely sure if (void *)(void *)0 is a null pointer
| constant (though it's certainly an expression that evaluates
| to a null pointer)--does it count as a zero-valued integer
| constant expression cast to a pointer to void? So you might
| not even be able to use (void *)NULL as a hedge against bad
| platform headers.]
| kazinator wrote:
| You're allowed to do:
|     #undef NULL
|     #define NULL ((void *) 0)
|
| Just don't do it prior to the inclusion of any standard
| header (C or POSIX).
| mananaysiempre wrote:
| I don't think you are? Redefining a reserved identifier
| is UB per ISO C (any version) 7.1.3p2, and per 7.1.3p1,
|
| > Each macro name in [the standard library] is reserved
| for use as specified if any of its associated headers is
| included; unless [you're #undef'ing a function also
| provided as a macro].
|
| The general idea seems to be that standard headers are
| allowed to use macros they define, _even in other macros
| they define_ , and because macro names are late-bound
| (ugh), even if the user only redefines the name
| afterwards, every macro that uses it will then be
| affected.
|
| As a silly example, a valid part of stdlib.h could be
|     #define NULL 0
|     #define EXIT_FAILURE (NULL)
|     #define EXIT_SUCCESS (EXIT_FAILURE+1)
|
| and now after your redefinitions EXIT_SUCCESS becomes a
| constraint violation.
|
| (For an implementor to actually do this would of course
| be dumb, but you did say "allowed", and that's what the
| standard says here.)
|
| Or did I misunderstand "use as specified"?
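|
| The late-binding behaviour mentioned above can be seen without
| touching any reserved name. In this sketch BASE and DERIVED are
| hypothetical stand-ins for two macros in a library header; DERIVED
| is never redefined, yet its value changes when BASE does.

```c
#include <assert.h>

#define BASE 0
#define DERIVED (BASE + 1)   /* BASE is looked up at each use, not here */

int derived_before(void) { return DERIVED; }   /* expands to (0 + 1) */

#undef BASE
#define BASE 10

int derived_after(void) { return DERIVED; }    /* now expands to (10 + 1) */

void late_binding_demo(void)
{
    assert(derived_before() == 1);
    assert(derived_after() == 11);
}
```

| The same mechanism is why redefining a standard macro after the
| headers are included can still change what other standard macros
| expand to.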
| kevin_thibedeau wrote:
| 0 never should have been overloaded in C to refer to the NULL
| pointer. With pointer assignment and comparison it transforms
| to the platform's encoding for NULL which isn't necessarily
| all zeros. No other literal has this sort of magic.
| kazinator wrote:
| The C language existed before there was a preprocessor
| which made it possible to define NULL.
| kevin_thibedeau wrote:
| This has nothing to do with the preprocessor. The concept
| of NULL existed before the macro was standardized.
| Literal zeros were the way to refer to it which was a
| design mistake.
| WalterBright wrote:
| This works in Standard C:
|     int *p = 3;
| Gibbon1 wrote:
| I like to point out that on AVR micros, reading and writing
| to address 0 is legit.
|
| On the ARM Cortex I think address 0 is the initial stack
| pointer value.
|
| My opinion is NULL being something special in the
| language is mathy CS academics trying to turn C into a
| mathy abstract language. Which it ain't.
| jcelerier wrote:
| C is officially a mathy abstract language since 1989, the
| compilers just took some time to catch up.
|
| > 2.1.2.3 Program execution
|
| > The semantic descriptions in this Standard describe the
| behavior of an abstract machine [...]
| kevin_thibedeau wrote:
| The address is still 3 which has valid applications. C is
| permissive enough to run on platforms that don't use
| address 0 for NULL. With pointer operations the compiler
| will change the encoding from 0 to that platform's NULL
| address.
|     int *p = 0;
|     intptr_t i = (intptr_t)p;
|     if (i == 0) ... // Isn't always true
| LegionMammal978 wrote:
| That's a compiler extension. In C17, 6.5.16.1 (Simple
| assignment) implies that the RHS of an assignment to a
| pointer must either have pointer type or be a null
| pointer constant (i.e., an integer constant equal to 0,
| or such a constant casted to pointer type), and 6.7.9
| (Initialization) states that "the same type constraints
| and conversions as for simple assignment apply" to
| expressions used as initializers.
| bluGill wrote:
| Assuming 0 is an invalid value is not always correct. 0 is a
| perfectly valid pointer, and making it impossible to refer to
| that location is bad. Of course if you are not writing an OS
| or embedded system you won't ever have a pointer value of 0
| anyway as the OS can put things elsewhere with no problem (if
| you are you need to see your CPU docs, some CPUs 0 is
| invalid, some it is not).
| kazinator wrote:
| Umm, no. 0 is the null pointer constant, same as nullptr.
| It is not a location, but an abstraction. If a platform's
| null pointer happens to be the address 0xFFFFFFFF, then 0
| will produce that.
|
| There is no difference between
|     char *p = nullptr;
|
| and
|     char *q = 0;
|
| other than the variable name; the two have to compare
| equal: (p == q).
|
| What's wrong with 0 is that when it's not in an expression
| where it's being converted to pointer type, it's just an
| integer.
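|
| One way to observe this, sketched with C11 _Generic: TYPE_NAME is a
| hypothetical helper macro that reports the type of an expression
| when no conversion to a pointer is being applied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: names the type of x as the compiler sees it. */
#define TYPE_NAME(x) _Generic((x), \
    int:     "int",                \
    void *:  "void *",             \
    default: "other")

const char *type_of_plain_zero(void) { return TYPE_NAME(0); }
const char *type_of_cast_zero(void)  { return TYPE_NAME((void *)0); }

void type_name_demo(void)
{
    /* A bare 0 is an int until some context converts it to a pointer. */
    assert(strcmp(type_of_plain_zero(), "int") == 0);
    assert(strcmp(type_of_cast_zero(), "void *") == 0);
}
```

| TYPE_NAME(NULL) is left out on purpose: its result depends on how the
| implementation defines NULL.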
| bluGill wrote:
| The problem is if 0 is a valid pointer and I write
|
|     volatile int* x=0; *x=0x1234;
|
| Did I just dereference the null pointer or make a valid
| write to that memory location? There is no way to know
| for sure, you can only apply heuristics to make a guess.
|
| Of course if the lines are that closely spaced you can
| guess, but in real code they can be in different
| translation units.
| roqi wrote:
| > But does C need a nullptr keyword?
|
| Yes, it does.
|
| > If you're programming in C, you usually define 0 as an
| invalid value, or a null value.
|
| That was also the usual pattern in C++ when there was no
| alternative. Once nullptr was introduced in C++, NULL or 0
| quickly became a code smell.
|
| > C doesn't have the insane type system C++ has and doesn't
| have a very strong need to make a distinction between a
| pointer or an integer, since they're all in the end numbers.
|
| C++'s type system is far from insane. It's actually one of
| its killer features.
|
| You're both entirely oblivious to the need to not conflate
| pointers with integers and failing to present any case in
| favour of the legacy and broken use of NULL, and in the
| process failing to address the whole family of known error patterns
| involving it.
|
| > The printf example you gave is an example of garbage in,
| garbage out. If NULL is a macro not defined as a pointer
| sized integer, then you're at fault here.
|
| Again, you seem to be completely oblivious to the problem
| domain. NULL is not a macro as far as C or C++ compilers are
| concerned. NULL is a magic constant that's resolved at
| preprocessing time. Replacing NULL with nullptr means a magic
| constant is replaced by a concrete type, and thus a whole
| family of errors can be avoided with compile time checks.
| Claiming that the developers who wrote in bugs are at fault
| for inadvertently adding bugs makes no sense at all because
| it does not solve any problem at all, and instead is just
| cynical finger pointing. I take compile-time checks over
| unhelpful finger pointing all day every day.
| colonwqbang wrote:
| NULL is a macro.
|
| The original mistake by the standards committee was
| allowing implicit conversions from integer to pointer. I.e.
| allowing NULL to be defined as simply 0.
|
| If NULL had been defined always as ((void *) 0) then I
| don't see that we would have had a problem.
|
| But that's all history now and in this situation I can see
| that adding nullptr becomes a reasonable way out.
|
| It's ironic though that the fix for the different ways to
| write null is to add yet another way.
| roqi wrote:
| > NULL is a macro.
|
| You're missing the whole point.
|
| As per the C standard, NULL is an implementation-defined
| null pointer constant.
|
| Macros are resolved in the preprocessing step. The
| compiler does not know what a macro is. What the compiler
| knows is whatever the preprocessor passes off in place of
| the macro. This means the compiler only sees a constant,
| and has no way to tell what that constant means.
|
| If instead of passing random pointer constants you pass
| an actual type, now the compiler can tell more things.
|
| > If NULL had been (...)
|
| Irrelevant. The whole point is that it wasn't; the
| committee looked at the problem, and it determined that
| using a dedicated type is safer, more powerful, and more
| elegant than passing magic numbers around.
| LegionMammal978 wrote:
| Well, the fault depends on who "you" are: the NULL macro
| generally comes from one's libc, and allegedly some libc
| maintainers have been very obstinately against changing their
| NULL macros to have pointer type.
| throwway120385 wrote:
| Aren't there platforms where pointers have additional type or
| space information encoded that is orthogonal to the numeric
| address? It's only by convention that NULL == 0 because on
| platforms like Intel & ARM you would typically not use the
| first page. But that's only a convention, and you could just
| as easily put a null page at the top of your address space,
| especially in systems with an MMU where mappings can be
| added, removed, or remapped as-needed.
| mananaysiempre wrote:
| > It's only by convention that NULL == 0 [...] and you
| could just as easily put a null page at the top of your
| address space [...].
|
| Technically NULL == 0 always because the standard special-
| cases zero-valued integer constant expressions;
| (uintptr_t)NULL == 0 or NULL == *(void **)calloc(1,
| sizeof(void *)) is another matter :)
|
| Language lawyering aside, a non-all-zeroes representation
| of NULL will probably blow up most C programs [e.g. static-
| storage-duration initialization is now not the same as
| calloc or memset(,0,) and is even type-specific]. Like
| CHAR_BIT, that's a joint that technically exists but has
| been rusted for decades (pun not intended).
| kazinator wrote:
| There is no problem with static initializations with a
| null pointer that is not all zero bits, or a floating-
| point 0.0 that is not all zero bits.
|
| Those values just cannot participate in the "BSS" trick,
| whereby everything that is zero-initialized is put into a
| special section that doesn't actually exist in the
| program image, and is only provided on startup.
|
| Those values would go into the initialized data section.
|
| The problem with 0.0 or null pointers not being all zero
| bits is all the code that uses calloc or memset zero.
|
| If this is on some specialized platform (e.g. DSP chip),
| it might not matter that vast quantities of C code are
| not portable.
|
| In general, compiler (and to a great extent instruction
| set architecture!) designers are quite hamstrung by the
| expectations of C programmers and programs; that has been
| the situation for some thirty years now.
|
| Today, you could not successfully introduce a system in
| which pointers to bytes (void *, char *) have a different
| representation from other pointers (let alone different
| size, lord forbid).
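|
| The calloc/memset concern above, as a short sketch. struct node and
| both function names are hypothetical. The initializer form asks the
| language for "null pointer and 0.0"; the memset form asks for zero
| bytes and is only equivalent where those happen to coincide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    struct node *next;
    double weight;
};

/* Portable: members get a null pointer and 0.0 whatever their bits are. */
struct node node_init_portable(void)
{
    struct node n = {0};
    assert(n.next == NULL && n.weight == 0.0);
    return n;
}

/* Common shortcut: right only where null and 0.0 are all-zero bits. */
struct node node_init_memset(void)
{
    struct node n;
    memset(&n, 0, sizeof n);
    return n;
}
```

| On mainstream hardware both functions return the same value, which
| is why so much existing code uses the second form.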
| pjmlp wrote:
| If we ignore ongoing efforts on hardware memory
| tagging since SPARC ADI.
| kazinator wrote:
| * * *
| LegionMammal978 wrote:
| In C, the integer 0 is explicitly defined to convert to a
| null pointer for all assignments, casts, comparisons, etc.,
| regardless of what the pointer's "actual" value is. The
| only time where you can see that a null pointer doesn't
| have numeric value 0 is when you manipulate its object
| representation with memset, memcpy, etc. The compiler is
| also at liberty to return whatever it wants when you
| convert a null pointer to an integer, except that
| converting it back must produce a null pointer (if it's at
| least as wide as intptr_t).
| kazinator wrote:
| The problems are that:
|
| - NULL is idiomatic: using NULL is entrenched in C programming
| and it is not going away.
|
| - In spite of nullptr existing now, NULL is _still_ (quite
| stupidly) not required to just expand to nullptr, but to an
| implementation-defined null pointer constant, rather than
| #define NULL nullptr. (According to the N2596 draft).
|
| - They had over 30 years to tighten the requirements on how
| NULL can be defined; what's the matter? C99 could already have
| required NULL to be ((void *) X) where X is an integer-typed
| constant expression evaluating to zero.
|
| I'm not going to start using nullptr. It's not idiomatic C. I'm
| going to hold out hope that NULL will be fixed so that it
| expands to nullptr.
|
| --
|
| Also, it's possible for a compiler to diagnose when a constant,
| zero-valued expression is used as the argument of a variadic
| function. The diagnostic can be confined to cases when such a
| constant expression is the result of macro expansion:
|     printf_like_function("fmt ...", ... 0, ...); // OK
|     #define FOO 0
|     printf_like_function("fmt ...", ... FOO, ...); // compiler diagnostic
|     printf_like_function("fmt ...", ... (int) FOO, ...); // OK
| cogman10 wrote:
| > In spite of nullptr existing now, NULL is still (quite
| stupidly) not required to just expand to nullptr, but to an
| implementation-defined null pointer constant, rather than
| #define NULL nullptr. (According to the N2596 draft).
|
| This is so silly. I sort of get why not (can't break the dork
| that decided to do
|     int i = NULL;
|     i++;
|
| )
|
| But, at the same time... I almost feel like this is a "you
| are being a dork, go fix your code." moment. This isn't the
| sort of break where someone would see it and go "Oh yeah,
| assuming NULL is anything other than nullptr is dumb!"
| kazinator wrote:
| > _can't break the dork that decided to_
|
| Why not? We've broken the dork who used undeclared
| functions, void main, gets ...
|
| (It's the same funking dork anyway. You know who you are,
| I'm looking at you!)
|
| Note that
|     int x = ((void *) 0);
|
| will actually work in GCC and get you a zero into x, just
| with a conversion warning. The dork is unaffected; their
| code works and they don't read warnings.
| bigbillheck wrote:
| > NULL is idiomatic
|
| It is today, but idioms are a human concept and who knows how
| things will be in 2033?
| kllrnohj wrote:
| Well hopefully at some point we'll stop writing C at all
| pjmlp wrote:
| Once upon a time K&R C function declarations were idiomatic,
| in C23 they are out.
| ori_b wrote:
| > _If you ever see the macro NULL in code, be afraid. There are
| two valid ways of defining the macro_
|
| Not on a Posix system, where the only valid definition of it is
| `(void*)0`. C could have adopted this definition.
|
| Nullptr is needed in C++ because `0` is the only definition of
| `NULL` that works with the type system, due to the lack of
| implicit `void*` conversions.
|
| C doesn't have this problem.
|
| Adopting the Posix definition of NULL in the standard would
| have been sufficient -- and unlike `nullptr`, would have solved
| bugs in existing programs.
| blackpill0w wrote:
| >C doesn't have this problem
|
| I don't think a strong type system is a `problem`, implicit
| conversions can lead to so many annoying and hard to find
| bugs.
| ori_b wrote:
| That's a fine general sentiment. However, in this context
| it's a problem if you want to assign NULL to a pointer
| without a cast, which is why C++ added the magically
| convertible nullptr in addition to the magically
| convertible `0` constant.
|     char *x = 0;        // ok in C and C++
|     char *y = (void*)0; // ok in C, error in C++
|     char *z = nullptr;  // ok in C++
|
| therefore:
|     #define NULL ((void*)0) // Required by Posix C, invalid C++
|     #define NULL 0          // Pre-nullptr, the only valid C++ definition
|
| C++ can't define NULL the safe way that Posix C does.
|
| I don't understand why it's more acceptable to allow magic
| `0` conversions than magic `(void*)0` conversions, given
| that the latter is far less likely to happen by accident --
| but here we are.
| rightbyte wrote:
| > I don't understand why it's more acceptable to allow
| magic `0` conversions than magic `(void*)0` conversions
|
| In the end you don't have to choose between '0' and
| 'nullptr' anyways.
|     char *x = (decltype (nullptr)) 0;
| kzrdude wrote:
| Case in point, integer arithmetic in C. Reasoning about
| types there is just tiring.
| Dylan16807 wrote:
| They didn't say the type system is a problem, they said it
| caused a problem.
| josefx wrote:
| > The latter may crash if integer and pointer are different
| sizes.
|
| Apparently some compilers specified a special __null extension
| to handle that case before nullptr was a thing.
| kazinator wrote:
| Since NULL expands to an implementation-defined null pointer
| constant, it is valid for those implementations to go as far
| as #define NULL __null.
| anonymousDan wrote:
| Are there any good resources on writing compiler
| optimization/instrumentation passes in gcc (as opposed to LLVM)?
___________________________________________________________________
(page generated 2023-05-15 23:00 UTC)