[HN Gopher] White space does matter in C23
       ___________________________________________________________________
        
       White space does matter in C23
        
       Author : ingve
       Score  : 62 points
       Date   : 2024-01-17 06:45 UTC (1 days ago)
        
 (HTM) web link (gustedt.wordpress.com)
 (TXT) w3m dump (gustedt.wordpress.com)
        
       | PaulHoule wrote:
       | ... puts the C in Cthulhu.
        
       | tmtvl wrote:
       | I think most programming languages have syntactically significant
       | whitespace. I believe Fortran doesn't (or didn't), which helped a
       | bug fly under the radar at NASA:
       | 
       |       DO 10 I=1.10
       | 
       | which got interpreted as the assignment:
       | 
       |       DO10I = 1.10
       | 
       | whereas the programmer wanted a DO loop:
       | 
       |       DO 10 I=1,10
       | 
       | Conversely, with syntactically significant whitespace (SSW), a
       | language will treat these two statements differently:
       | 
       |       inta = 10;
       |       int a = 10;
        
         | pdw wrote:
         | Algol 60 also allowed whitespace in variable names. But they
         | had a solution to avoid Fortran's confusion: keywords had to be
         | specially marked.
         | https://en.wikipedia.org/wiki/Stropping_(syntax)
        
           | jll29 wrote:
           | White space in variable names is a bad idea.
           | 
           | And not everything that is possible is worth doing; e.g. I
           | once designed a language ("Leazy") where keywords don't have
           | to be declared, and can be used as variable names, just to
           | show I could still write an LL(1) recursive descent parser
           | for it. You don't want that in anything for daily use, as it
           | introduces confusion.
        
             | nerdponx wrote:
             | Meanwhile, both SQL and (more recently) Python have tokens
             | that are keywords in certain contexts and regular
             | identifiers in others.
        
               | Findecanor wrote:
               | PL/I was infamous for allowing valid code that reads
               | "IF IF THEN THEN ELSE ELSE".
        
               | layer8 wrote:
               | C++ has some contextual keywords as well:
               | 
               |       final    (C++11)
               |       override (C++11)
               |       import   (C++20)
               |       module   (C++20)
        
             | avgcorrection wrote:
             | > White space in variable names is a bad idea.
             | 
             | Pff. You can have your cake and eat it too: disallow
             | whitespace in variable names except no-break space. ;)
        
               | enriquto wrote:
               | And the same thing for filenames!
               | 
               | Writing shell scripts under the assumption that filenames
               | do not contain spaces is a liberating experience. I want
               | more of that! And it is nearly possible, by tr ' '
               | 0x00A0'ing every call to fopen (probably as an option
               | for mount).
        
             | rogerbinns wrote:
             | Even more fun are zero-length names. In SQLite they didn't
             | require table and column names to be at least one
             | character long, so you can do this:
             | 
             |       CREATE TABLE []([] []);
             | 
             | This creates a table with a zero-length name containing
             | one column with a zero-length name and a zero-length type.
             | And yes, you can run all the regular SQL against them,
             | provided you quote the zero-length name.
        
             | Someone wrote:
             | > where keywords [...] can be used as variable names
             | 
             | PL/I has that, too, because the designers thought you
             | couldn't expect programmers to know all keywords. For PL/I,
             | that's a correct assumption. Implementations can have
             | hundreds of keywords, and some of them are single letters
             | (http://bitsavers.trailing-edge.com/pdf/ibm/series1/GC34-0084...
             | pages 19-25 mention A, B, E, F, P, R, S, V and X).
        
             | layer8 wrote:
             | We'd have long debates about spaces vs. tabs in
             | identifiers. ;)
        
           | formerly_proven wrote:
           | Pfft, just make your syntax a prefix-free code.
        
         | actionfromafar wrote:
         | FORTRAN also had (has?) significant columns:
         | 
         | https://web.stanford.edu/class/me200c/tutorial_77/03_basics....
        
         | pklausler wrote:
         | Fortran '90 and later has some requirements for blanks, but a
         | parser that also needs to be able to parse F'77 can't rely on
         | them -- so I have to go out of my way to detect missing blanks
         | and complain about them.
         | 
         | This feature makes some tokenization ambiguous without context
         | -- is MODULEPROCEDUREFOO to be interpreted as "MODULE
         | PROCEDUREFOO" or "MODULE PROCEDURE FOO"? But tokenization
         | without any reserved words is a tricky problem anyway.
        
       | o11c wrote:
       | Link gives me JSON, not HTML?
       | 
       | The JSON appears to mention that this is a regression affecting
       | `U"string"` where U is a macro (that expands to a string
       | literal).
       | 
       | Obviously there are numerous examples of where whitespace always
       | mattered even in prior versions.
        
       | omoikane wrote:
       | R"(x)" literals are neat not just because whitespace matters, but
       | also because they are tokenized before macro expansion. Thus you
       | can write a C23 detector like this:
       | 
       |       #include <stdio.h>
       | 
       |       #define r(R) R"()"
       | 
       |       int main() {
       |         puts(r()[0] ? "C99"  /* r() evaluates to "()" */
       |                     : "C23"  /* r() evaluates to "" */);
       |       }
       | 
       | Output: https://gcc.godbolt.org/z/Wj3s6KEGK
       | 
       | I have used that trick here:
       | 
       | https://www.ioccc.org/years.html#2015_yang
       | 
       | (C23 wasn't a thing back then, but the same trick can be used to
       | differentiate C++11 from C++98).
        
         | silasdavis wrote:
         | And that manages to be the most intelligible part of prog.c
        
         | rwbt wrote:
         | That's clever!
        
         | defen wrote:
         | This is checking for the presence of raw string literals (a GNU
         | C extension), not C23. If you compile with `-std=gnu99` instead
         | of `-std=c99` you'll get "C23" as output.
        
           | Sharlin wrote:
           | The context is different standard versions. Random extensions
           | don't count. C23 has raw string literals, C before 23
           | doesn't.
        
             | ksherlock wrote:
             | > C23 has raw string literals
             | 
             | Are you sure about that? I only see u, u8, U, and L defined
             | as encoding-prefixes.
        
             | defen wrote:
             | No it doesn't. If you don't specify a standard for GCC it
             | uses GNU extensions by default.
        
           | omoikane wrote:
           | My bad, I just saw "R()" in the linked blog and thought the
           | feature made it into C23, but it looks like it's not standard.
           | 
           | https://en.cppreference.com/w/c/23
           | 
           | On the plus side, I now have a GNU extension detector.
        
       | complianceowl wrote:
       | White Space Matters
        
       | Whitespace wrote:
       | You're damn straight I do!
        
         | downvotetruth wrote:
         | Not in else case.
        
       | jxy wrote:
       | > Generally, it is often assumed that in C spaces don't
       | contribute much to the interpretation of programming text
       | 
       | I can think of only one exception. In function-like macro
       | definitions, the opening parenthesis `(` must directly follow the
       | identifier. Though I guess the newline is significant in macro
       | definitions in general, too.
       | 
       | Are there other places where white space matters?
        
       ___________________________________________________________________
       (page generated 2024-01-18 23:00 UTC)