hngopher.com

       [HN Gopher] Embed is in C23
       ___________________________________________________________________
        
       Embed is in C23
        
       Author : aw1621107
       Score  : 370 points
       Date   : 2022-07-23 10:09 UTC (12 hours ago)
        
 (HTM) web link (thephd.dev)
        
       | pwdisswordfish9 wrote:
       | > "Touch grass", some people liked to tell me. "Go outside", they
       | said (like I would in the middle of Yet Another Gotdang COVID-19
       | Spike). My dude, I've literally gotten company snail mail, and it
       | wasn't a legal notice or some shenanigans like that! Holy cow,
       | real paper in a real envelope, shipped through German Speed
       | Mail!! This letter alone probably increases my Boomer Cred(tm) by
       | at least 50; who needs Outside anymore after something like this?
       | 
       | Touch grass indeed. Sure, #embed is a nice feature, but this
       | self-indulgent writing style I can't stand.
        
         | morelisp wrote:
         | Maidenless commenter.
        
       | mgaunard wrote:
       | It takes literally 5 minutes to write a python script that does
       | this.
       | 
       | It took a long time to get this adopted because people are most
       | likely busy with things that cannot be already solved trivially.
        
         | jjnoakes wrote:
         | > trivially
         | 
         | The article covers quite a few reasons why the way things are
         | done without #embed are not quite as trivial as they seem.
        
           | mgaunard wrote:
           | I've been doing it for 20 years without any single issue, on
           | fairly large files.
           | 
           | This proposal doesn't even allow compressing or encrypting
           | the data.
        
             | jjnoakes wrote:
             | And yet the issues in the article are real and non-trivial.
             | Perhaps you just never hit them, or you have a high
             | tolerance for non-trivial solutions.
        
         | jonathrg wrote:
         | I think it's nice that this will soon be possible to do without
         | adding Python as a dependency in your build system
        
       | [deleted]
        
       | yakubin wrote:
       | _> told me this form was non-ideal and it was worth voting
       | against (and that they'd want the pure, beautiful C++ version
       | only[1])_
       | 
       | I heard about #embed, but I didn't hear about std::embed before.
       | After looking at the proposal, to me it does look a lot better
       | than #embed, because reading binary data and converting it to
       | text, only to then convert it to binary again seems needlessly
       | complex and wasteful. I also don't like that it extends the
       | preprocessor, when IMHO the preprocessor should at worst be left
       | as is, and at best be slowly deprecated in favour of features
       | which compose well with C proper.
       | 
       | Going beyond the gut reaction and moving on to hard data, as you
       | can expect from this design, std::embed of course is faster
       | during compilation than #embed for bigger files (comparable for
       | moderately-sized files, and a bit slower for tiny files).
       | 
       | I'm not a huge fan of C++, but the fact that C++ removed
       | trigraphs in C++17 and that it's generally adding features
       | replacing the preprocessor scores a point with me.
       | 
       | [1]: <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p10...>
        
         | mgaunard wrote:
         | the preprocessor is a great tool to reduce duplication and
         | boilerplate.
         | 
         | People that don't like it generally just don't know how to use
         | it.
        
           | timhh wrote:
           | People don't dislike it because they are unaware how helpful
           | it can be. They dislike it because they are aware how hacky,
           | fragile and error-prone it is. They want something more
           | robust than text substitution.
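
A minimal sketch of the fragility being described, using a deliberately naive macro. The names `SQUARE_MACRO` and `square_fn` are made up for illustration; the point is only that the macro pastes tokens while the function evaluates its argument first.

```c
/* Deliberately unparenthesized: the argument is pasted in as raw tokens. */
#define SQUARE_MACRO(x) x * x

/* A real function evaluates its argument once, then multiplies. */
static inline int square_fn(int x) { return x * x; }

/* Expands to 1 + 2 * 1 + 2, which is 5 rather than 9. */
int macro_result(void) { return SQUARE_MACRO(1 + 2); }

/* Evaluates 1 + 2 first, then squares it: 9. */
int function_result(void) { return square_fn(1 + 2); }
```

Adding parentheses inside the macro fixes this particular case, but the caller still has to know that the macro needs them.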
        
           | LadyCailin wrote:
           | Or perhaps those people can think of better ways to get those
           | benefits that also don't allow things like
           |     #ifndef asdf
           |     }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
           |     #endif
           | 
           | which obliterate tooling such as IDEs. Of course, this is a
           | contrived example, but the preprocessor is just one big
           | footgun, which offers no benefits over other ways of solving
           | the problems you mentioned, such as constexpr and perhaps
           | additional, currently unimplemented solutions.
        
             | mgaunard wrote:
             | It being possible to misuse a tool does not mean the tool
             | is not very useful.
        
               | KerrAvon wrote:
               | A tool being very useful doesn't mean that it is a very
               | good tool.
               | 
               | There are better tools for the functionality the C
               | preprocessor attempts to provide. Other languages have
               | module inclusion systems and very powerful macros that
               | don't have the enormous footguns of the C preprocessor.
               | 
               | Edit: to be clear, I think #embed is a fine idea; I'd use
               | it and it would make my sourcebase cleaner in some
               | places.
        
               | makapuf wrote:
               | My carpenter has a lot of tools that can be dangerous if
               | misused. Of course better tools can be devised, but
               | useful things have been done with them (and he still has
               | all his fingers)
        
               | LadyCailin wrote:
               | Yes, but we're in a thread about ways to improve the
               | language, not about how to make the best with what's
               | there. This type of argument holds back improvement.
        
         | colonwqbang wrote:
         | Compilers follow the "as if" principle, they don't have to
         | literally follow the formal rules given by the standard. They
         | could implement #embed by doing as you say, pretty printing out
         | numbers and then parsing them back in again. But that would be
         | an extremely roundabout way to do it, so I doubt anyone will
         | actually do it that way. Unless you're running the compiler in
         | some kind of debugging mode like GCC's -E.
        
         | twoodfin wrote:
         | I don't think the implication is that the C compiler _must_
         | encode the binary file as a comma-separated integer list and
         | then re-parse it, only _act_ as if it did so.
        
           | yakubin wrote:
           | How would that work? It would need to depend on the grammar
           | of surrounding C code. This directive isn't limited to
           | variable initialisers. You can use it anywhere. So e.g. you
           | can use it inside structure declaration, or between "int
           | main()" and "{". etc. etc. Those will generate errors in
           | subsequent phases, but during preprocessing the compiler
           | doesn't know about it. Then there is also just that:
           |     int main () {
           |       return
           |     #embed "file.bin"
           |       ;
           |     }
           | 
           | There are plenty of cases, where it will all behave
           | differently. And if you're going to pretend even more that
           | the preprocessor understands C syntax, then why not just give
           | this job to compiler proper, which actually understands it?
        
             | defen wrote:
             | Preprocessing produces a series of tokens, so you would
             | implement it as a new type of token. If you're using
             | something like `-E` you would just pretty-print it as a
             | comma-delimited list of integers. If you're moving on to
             | translation phase 7, you'd have some sort of rules in your
             | parser about where those kinds of tokens can appear. Just
             | like you can't have a return token in an initializer, you
             | wouldn't be allowed to have an embed token outside of one
             | (or whatever the rules are). And you can directly
             | instantiate some kind of node that contains the binary
             | data.
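
The token idea can be sketched roughly as follows. This is not how any particular compiler does it: `tok_kind`, `struct token` and `render_token` are hypothetical names, and a real front end has far more token kinds. The sketch only shows that an "embed" token can carry the raw bytes, and that the comma-separated list is produced only when someone asks for text (as a `-E` style dump would).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token kinds; a real compiler has many more. */
enum tok_kind { TOK_INT, TOK_EMBED };

struct token {
    enum tok_kind kind;
    long value;                 /* used by TOK_INT */
    const unsigned char *data;  /* used by TOK_EMBED: raw file bytes */
    size_t len;                 /* used by TOK_EMBED: number of bytes */
};

/* Render one token as text. Returns the number of characters written
   (not counting the terminating NUL), or 0 if it does not fit. */
static size_t render_token(const struct token *t, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    if (t->kind == TOK_INT) {
        int n = snprintf(out, cap, "%ld", t->value);
        return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
    }

    /* TOK_EMBED: only here do the bytes become a list of integers. */
    for (size_t i = 0; i < t->len; i++) {
        int n = snprintf(out + used, cap - used, "%s%u",
                         i ? ", " : "", (unsigned)t->data[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;
        used += (size_t)n;
    }
    return used;
}
```

A parser that receives such a token inside an initializer could copy `data` straight into the object being built and never call `render_token` at all.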
        
               | yakubin wrote:
               | _> you would implement it as a new type of token_
               | 
               | That's a good point. I consider myself debunked.
        
             | [deleted]
        
             | [deleted]
        
       | Cloudef wrote:
       | I've always used xxd -i for embedding, doesn't have the mentioned
       | problems and works everywhere, as it simply outputs a header file
       | with byte array.
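
For readers who have not seen it, this is roughly the shape of the header that `xxd -i` generates, assuming a hypothetical six-byte file named hello.txt containing "hello" and a newline. The `#embed` form is shown only in a comment because it needs that file on disk and a compiler that supports the directive.

```c
/* What `xxd -i hello.txt` would emit for the six bytes "hello\n". */
unsigned char hello_txt[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
unsigned int hello_txt_len = 6;

/* The C23 spelling of the same thing, with no generated header:

   static const unsigned char hello_txt[] = {
   #embed "hello.txt"
   };
*/
```

Every byte of the input becomes about six characters of source text that the compiler then has to lex and parse again.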
        
         | jsnell wrote:
         | The article spends a fair bit of time discussing the build
         | speed and memory use problems with that approach. Like, the
         | benchmark results [0] linked to from this post literally have
         | xxd as one of the rows. It's not a viable option for embedding
         | megabytes of data.
         | 
         | [0] https://thephd.dev/embed-the-details#results
        
           | tempodox wrote:
           | And even if the data is small enough, not every C programmer
           | uses Unix or knows their way around it.
        
         | mikepurvis wrote:
         | But you have to have build system stuff for that and it's
         | obviously non portable.
        
           | Cloudef wrote:
           | True, I don't personally ever have a problem with this because
           | I always compile from a Unix system anyway. (Even for Windows)
        
         | pdw wrote:
         | Well, congratulations, you now have a build dependency on Vim.
         | (xxd is not a standard tool, it ships with Vim.)
         | 
         | It's also only suitable for tiny files: compile time and RAM
         | requirements will blow up once you go beyond a couple of
         | megabytes.
        
           | mariusor wrote:
           | > It's also only suitable for tiny files: compile time and
           | RAM requirements will blow up once you go beyond a couple of
           | megabytes.
           | 
           | Do you know what makes it so? Is there a technical argument
           | why the compiler could do better, except maybe for xxd not
           | being specifically optimized for this use case?
        
             | tialaramex wrote:
             | The compiler has an as-if rule on its side.
             | 
             | It's allowed to do whatever it wants so long as the results
             | are _as-if_ it did what the standard says. So even though
             | the standard says this is making a big list of integers
             | like your xxd command, the compiler won't do that, because
             | (as a C compiler) it knows perfectly well it would just
             | parse those integers into bytes again, just like the ones
             | it got out of the binary file. It knows the integers would
             | all be valid (it made them) and fit in a byte (duh) and so
             | it can skip the entire back-and-forth.
        
           | Cloudef wrote:
           | Yeah, these are reasonable arguments against it
        
             | nicoburns wrote:
             | The one in the original article is that it performs badly
             | for large files.
        
               | yakubin wrote:
               | That's putting it mildly:
               | <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...>
        
       | sylware wrote:
       | There is way too much in C already.
       | 
       | The first commandment of C is: 'writing a naive C compiler should
       | be "reasonable" for a small team or even one individual'. That's
       | getting harder and harder, longer and longer.
       | 
       | I did move from C being "the best compromise" to "the least bad
       | compromise".
       | 
       | I wish we had a "C-like" language, which would kind of be a high-
       | level assembler which: has no integer promotion or implicit
       | casts, has compile-time/runtime casts (without the horrible c++
       | syntax), has sized primitive types (u64/s64,f32/f64,etc) at its
       | core, has sized literals (42b,12w,123dw,2qw,etc), has no
       | typedef/generic/volatile/restrict/etc well that sort of horrible
       | things, has compile-time and runtime "const"s, and I am
       | forgetting a lot.
       | 
       | From the main issues: the kernel gcc C dialect (roughly speaking,
       | each linux release uses more gcc extensions). Aggressive
       | optimizations can break some code (while programming some hardware
       | for instance).
       | 
       | Maybe I should write assembly, expect RISC-V to be a success, and
       | forget about all of this.
        
         | armchairhacker wrote:
         | I wish we had something like typed Lua without Lua's weird
         | quirks (e.g. indexing by 1), designed with performance
         | enhancement and safety in mind, and with the features you
         | mention.
         | 
         | But like Lua, the base compiler is really small and simple and
         | can be embedded. And it's "pseudo-interpreted": ultimately it's
         | an ahead-of-time language to support things like function
         | declarations after references and proper type checking, but
         | compiling unoptimized is practically instant and you can load
         | new sources at runtime, start a REPL, and do everything else
         | you can with an interpreted language. Now having a simple
         | compiler with all these features may be impossible, so in the
         | worst case there is just a simple interpreter, a separate
         | type-checker, and a separate performance-optimized JIT compiler
         | (like Lua and LuaJIT).
         | 
         | Also like Lua and high-level assembly, debugging unoptimized is
         | also really simple and direct. By default, there aren't
         | optimizations which elide variables, move instructions around,
         | and otherwise clobber the data so the debugger loses
         | information, not even tail-call optimization. Execution is so
         | simple someone will create a reliable record-replay, time-
         | travel debugger which is fast enough you could run it in
         | production, and we can have true in-depth debugging.
         | 
         | Now that I've written all that I realize this is basically ML.
         | But OCaml still has weird quirks (the object system), SML too
         | honestly, and I doubt their compilers are small and simple
         | enough to be embedded. So maybe a modern ML dialect with a few
         | new features and none of the more confusing things which are in
         | standard ML.
        
           | elcritch wrote:
           | Check out Nim! It does much of what you describe and it's
           | great. The core language is fairly small (not quite lua
           | simple but probably ML comparable). It compiles fast enough
           | that a Nim repl like `inim` is useable to check features and
           | for basic maths, though it requires a C compiler, but TCC [4]
           | works perfectly. Essentially Nim + tcc is pretty close to
           | your description, IMHO. Though I'm not sure TCC supports
           | non-x86 targets.
           | 
           | I've never used it but Nim does support some hot reloading as
           | well [3]. It also has a real VM if you want to run user
           | scripts and has a nice library for it [1]. It's not quite Lua
           | flexible but for a generally compiled language it's
           | impressive.
           | 
           | Recently I made a wrapper to embed access to the Nim
           | compilers macros at runtime [2]. It took 3-4 hours probably
           | and still compiles in 10s of seconds despite building in a
           | fair bit of the compiler! It was useful for making a code
           | generator for a serializer format. Though I'm not sure it's
           | small enough to live on even beefy m4/m7 microcontrollers.
           | Though I'm tempted to try.
           | 
           | 1: https://github.com/beef331/nimscripter
           | 2: https://github.com/elcritch/cdecl/blob/main/src/cdecl/compil...
           | 3: https://nim-lang.org/docs/hcr.html
           | 4: https://bellard.org/tcc/
        
         | agluszak wrote:
         | > I wish we had a "C-like" language, which would...
         | 
         | How about https://ziglang.org/ ?
        
           | [deleted]
        
         | jcranmer wrote:
         | > I wish we had a "C-like" language, which would kind of be a
         | high-level assembler which: has no integer promotion or
         | implicit casts, has compile-time/runtime casts (without the
         | horrible c++ syntax), has sized primitive types
         | (u64/s64,f32/f64,etc) at its core, has sized literals
         | (42b,12w,123dw,2qw,etc), has no
         | typedef/generic/volatile/restrict/etc well that sort of
         | horrible things, has compile-time and runtime "const"s, and I
         | am forgetting a lot.
         | 
         | Unsafe Rust code I think fits this model better than C does: it
         | relies on sized primitive types, it has support for both
         | wrapping and non-wrapping arithmetic rather than C's quite
         | frankly odd rules here, it has no automatic implicit casts, it
         | has no strict aliasing rules.
        
         | ArrayBoundCheck wrote:
         | Are you a programmer? Embed is the easiest feature to implement
         | that I have ever heard of
        
         | AlexanderDhoore wrote:
         | GCC or Clang with all warnings turned on will give you almost
         | what you want. -Wconversion -Wdouble-promotion and 100s of
         | others. A good way to learn about warning flags (apart from
         | reading the docs) is Clang -Weverything, which will give you
         | many, many warnings.
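
A small sketch of the integer promotion that `-Wconversion` is meant to flag. The function names are invented for illustration; the arithmetic itself is standard C behaviour on any platform where `uint8_t` exists.

```c
#include <stdint.h>

/* Both operands are promoted to int before the addition,
   so 200 + 100 really is 300 here. */
static int sum_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Storing the int result back into a uint8_t reduces it modulo 256.
   Without the explicit cast, -Wconversion warns about this narrowing. */
static uint8_t sum_narrowed(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

With `a = 200` and `b = 100` the first function returns 300 and the second returns 44, which is the kind of silent difference the warning flags are there to surface.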
        
         | quelsolaar wrote:
         | I agree (with a lot of caveats), but a key value of C is that
         | we do not break people's code and that means that we can't easily
         | remove things. If we do, we create a lot of problems. This
         | makes it very difficult to keep the language as easy to
         | implement as we would like. As a member of the WG14, I intend
         | to propose that we do make this our prime priority going
         | forward.
        
         | jessermeyer wrote:
         | Not an exact match, but a close one: https://odin-lang.org/
        
       | trinovantes wrote:
       | Off the top of my head, I think there's some niche use in
       | embedding shaders so that they don't need to be stored as strings
       | (no IDE support) or read at runtime (slower performance).
        
         | matthews2 wrote:
         | Nice for binary shaders too, e.g. SPIR-V bytecode generated by
         | glslc.
        
         | phoboslab wrote:
         | You can get some IDE support with a simple preprocessor
         | macro[1].
         | 
         | It's a crutch, but at least you don't need to stuff the shader
         | into multiple "strings" or have string continuations (\\) at
         | the end of every line. Plus you get some syntax highlighting
         | from the embedding language. I.e. the shader is highlighted as
         | C code, which for the most part seems to be close enough.
         | 
         | [1]
         | https://github.com/phoboslab/pl_mpeg/blob/master/pl_mpeg_pla...
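
The kind of macro being described can be sketched with the preprocessor's stringizing operator. This is an assumption about the general technique, not a copy of the linked file; `SHADER_SOURCE` and the shader body are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Turn everything between the parentheses into one string literal.
   Runs of whitespace, including newlines, collapse to a single space. */
#define SHADER_SOURCE(...) #__VA_ARGS__

/* Written as bare tokens, so an editor highlights it like C code. */
static const char *const fragment_shader = SHADER_SOURCE(
    varying vec2 tex_coord;
    void main() {
        gl_FragColor = vec4(tex_coord, 0.0, 1.0);
    }
);

/* Length in bytes, e.g. for an API that wants an explicit size. */
size_t fragment_shader_len(void)
{
    return strlen(fragment_shader);
}
```

One limitation of this trick is that lines starting with `#` (such as a GLSL `#version` line) cannot appear inside the macro arguments, which is one more reason an `#embed` of the real shader file is attractive.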
        
         | avianes wrote:
         | Another typical use is embedding a public-key in an application
         | or firmware.
        
         | spicyjpeg wrote:
         | There are a lot of use cases for baking binary data directly
         | into the program, especially in embedded applications. For
         | instance, if you are writing a bootloader for a device that has
         | some kind of display you might want to include a splash screen,
         | or a font to be able to show error messages before a filesystem
         | or an external storage medium is initialized. Similarly, on a
         | microcontroller with no external storage at all you need to
         | embed all your assets into the binary; the current way to do
         | that is to either use whatever non-standard tools the
         | manufacturer's proprietary toolchain provides, or to use xxd to
         | (inefficiently) generate a huge C source file from the contents
         | of the binary file. Both require custom build steps and neither
         | is ideal.
        
       | eps wrote:
       | Re: #embed </dev/urandom>
       | 
       | Just a random thought, but I'd expect a compiler to do exactly
       | what's described if I tell it:
       | 
       |     static char foo[123] = {
       |     #embed </dev/urandom>
       |     };
       | 
       | This would address the most common case with infinity files, and
       | then just let the compiler error out if the array size is not
       | specified.
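
For comparison, C23 handles the infinite-file case inside the directive itself with a `limit` parameter rather than by looking at the array size. The sketch below embeds the first eight bytes of its own source file via `__FILE__` so that it needs no extra input; `DEMO_HAVE_EMBED`, `head` and `head_size` are illustrative names, and the fallback branch exists only so the file still compiles on toolchains without `#embed`.

```c
#include <stddef.h>

/* Detect #embed support without confusing older preprocessors:
   the inner #if is only evaluated when __has_embed exists. */
#if defined(__has_embed)
#  if __has_embed(__FILE__ limit(8)) == __STDC_EMBED_FOUND__
#    define DEMO_HAVE_EMBED 1
#  endif
#endif

static const unsigned char head[] = {
#ifdef DEMO_HAVE_EMBED
#  embed __FILE__ limit(8)   /* at most 8 bytes, however long the file is */
#else
    0, 0, 0, 0, 0, 0, 0, 0   /* fallback: placeholder bytes */
#endif
};

/* Number of bytes that ended up in the array. */
size_t head_size(void)
{
    return sizeof head;
}
```

With a resource such as `/dev/urandom`, the same `limit(123)` spelling is what bounds the read, so the preprocessor never needs to know the size of the array it is filling.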
        
         | ealexhudson wrote:
         | The preprocessor needs to run before the compiler, though, and
         | isn't complex enough to understand the context of the code that
         | it's in. That would be a substantially complex thing to
         | implement.
        
           | eps wrote:
           | This will indeed require delaying population of the array to
           | the compilation stage. However it's worth the convenience and
           | the succinctness of the syntax, and it's not _that_
           | substantially complex to implement.
        
         | tazjin wrote:
         | Reproducible builds crowd wailing in agony
        
           | huhtenberg wrote:
           | The context is that of infinite files, not of the urandom
           | specifically. Give the linked post a read for details.
        
           | astrange wrote:
           | They just need a reproducible urandom.
        
             | elcritch wrote:
             | Looks like you forgot to add your sponsor disclaimer
             | "message sponsored by NSA". ;)
        
         | kibwen wrote:
         | I would expect that to produce an error, though. If I had a
         | regular file that was not infinite in size, and I specified the
         | wrong length for the array, I would find it more useful to have
         | the compiler inform me as to the discrepancy rather than
         | truncate my file.
        
           | eps wrote:
           | A warning, not an error. Both under and over-population can
           | be valid use cases.
        
       | owalt wrote:
       | Honestly I'm usually very wary of additions to C, as one of its
       | greatest strengths (to me) is how rather straightforward it is as
       | a language in terms of conceptual simplicity. There just aren't
       | that many big concepts to understand in the language. (On the
       | other hand there's _many_ footguns but that's another issue.)
       | 
       | That said, to me this seems like a great addition to the
       | language. It's very single-purpose in its usage (so it doesn't
       | seem to add much conceptual complexity to the language) and it
       | replaces something genuinely painful (arcane linker hacks). I'm
       | very much looking forward to using this as I often make single-
       | executable programs in C. The only thing that's unfortunate is
       | I'm sure it'll take decades before proprietary embedded
       | toolchains add support for this.
        
         | pjmlp wrote:
         | C23 and C26 are basically heading into C++ without classes.
        
       | timhh wrote:
       | Ha I suggested this on the C++ proposals mailing list 7 years
       | ago:
       | 
       | https://groups.google.com/a/isocpp.org/g/std-proposals/c/b6n...
       | 
       | Enjoy the naysayers if you like! I'm glad someone spent the time
       | and effort to push past them. Bit too late for me - I have moved
       | on to Rust which had support for this from version 1.0.0.
       | 
       | > There's also the standard *nix/BSD utility "xxd".
       | 
       | > Seems like the niche is filled. Or, at least, if you want to
       | claim that
       | 
       | > (A) XPM
       | 
       | > (B) incbin
       | 
       | > (C) "xxd -i"
       | 
       | > (D) various ad-hoc scripts given in
       | http://stackoverflow.com/questions/8707183/script-tool-to-co...
       | 
       | >...do NOT completely fill this evolutionary niche
       | 
       | > This ultimately would encourage a weird sort of resource
       | management philosophy that I think might be damaging in the long
       | run.
       | 
       | > Speaking from experience, it is a tremendously bad idea to bake
       | any resource into a binary.
       | 
       | > I'll point out that this is a non-issue for Qt applications
       | that can simply use Qt's resources for this sort of business.
       | 
       | (Though credit to Matthew Woehlke, he did point out a solution
       | which is basically identical to #embed)
       | 
       | > I find this useless specially in embedded environments since
       | there should be some processing of the binary data anyway, either
       | before building the application
       | 
       | In fairness there was a decent amount of support. But given the
       | insane amount of negativity around an obviously useful feature I
       | gave up.
       | 
       | I wonder if there was a similar response to the proposal to
       | include `string::starts_with()`...
        
         | einpoklum wrote:
         | > > Speaking from experience, it is a tremendously bad idea to
         | bake any resource into a binary.
         | 
         | What a pompous douche whoever wrote that was.
         | 
         | > > This ultimately would encourage a weird sort of resource
         | management philosophy that I think might be damaging in the
         | long run.
         | 
         | So, this might be a valid point, although not enough to reject
         | the feature for. It's true that it's a feature that could
         | potentially see over-use and ab-use. But then, so did templates
         | :-P
        
         | boywitharupee wrote:
         | what is the Rust equivalent for #embed?
        
           | guipsp wrote:
           | https://doc.rust-lang.org/std/macro.include_bytes.html
        
             | dbrgn wrote:
             | And there's also
             | https://doc.rust-lang.org/std/macro.include_str.html for strings.
        
           | zRedShift wrote:
           | https://doc.rust-lang.org/std/macro.include_bytes.html
        
       | juunpp wrote:
       | Looks great. I've been writing cmake hacks to include assets in
       | executables for too long.
        
       | zzo38computer wrote:
       | It is something that I had wanted in C too, for a while, so I am
       | glad that they added this #embed command.
        
       | ascar wrote:
       | This is a cool feature, but the author doesn't do himself any
       | favors with his style of writing that greatly overestimates the
       | importance of his own feature in the great scheme of things.
       | Remarks like "an extension that should've existed 40-50 years
       | ago" make me wonder whether we should really have bothered all compiler
       | vendors with implementing this 40-50 years ago. After all, you
       | can already a) directly put your binary data in the source file
       | like shown after the preprocessor step and b) read a file at
       | runtime. I'm not saying this isn't useful, but it's more a
       | niche performance improvement than a core language feature.
        
       | someweirdperson wrote:
       | How to read unsigned data? Is there a standardized parameter, or
       | does this require a vendor extension?
        
         | jonathrg wrote:
         | You just make your array type uint8_t or whatever you need as
         | long as it supports integer literals. See section 4 in
         | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
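
A sketch of why no parameter is needed, using a macro as a stand-in for the directive. `BLOB_BYTES` plays the role of what `#embed "blob.bin"` would expand to for a hypothetical four-byte file; the array names are illustrative too.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the expansion of #embed: a comma-separated list of
   integer constants, one per byte of the (hypothetical) file. */
#define BLOB_BYTES 0xDE, 0xAD, 0xBE, 0xEF

/* The element type is chosen by the declaration, not by the list. */
static const uint8_t  as_u8[]  = { BLOB_BYTES };
static const uint32_t as_u32[] = { BLOB_BYTES };

/* Element counts: both are 4, because each byte becomes one element. */
size_t u8_count(void)  { return sizeof as_u8  / sizeof as_u8[0]; }
size_t u32_count(void) { return sizeof as_u32 / sizeof as_u32[0]; }

/* Accessors so the values can be inspected from outside. */
uint8_t  u8_at(size_t i)  { return as_u8[i]; }
uint32_t u32_at(size_t i) { return as_u32[i]; }
```

Note that the `uint32_t` array holds four elements with one byte value each; the bytes are not packed four to a word.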
        
           | someweirdperson wrote:
           | Scary, it's as if the preprocessor has become type-aware. I
             | guess I'd better not imagine the result of the preprocessing
           | to look similar to and following the same rules as something
           | I would have written by hand. This might make manual
           | inspection of the preprocessed file a bit painful.
        
             | elcritch wrote:
             | It's not really a pre-processor stage. Probably better to
             | think of it more like a pointer cast to some binary blob.
             | Though it'd be interesting to see what `gcc -E` would
             | produce.
        
       | buserror wrote:
       | Haven't the people making the standards other things to do, like,
       | integrating _useful features_ instead of duplicating incbin.h [0]
       | years after that feature worked?
       | 
       | https://github.com/graphitemaster/incbin/blob/main/incbin.h
        
         | Jasper_ wrote:
         | That doesn't work on MSVC without an external source-generating
         | tool.
        
       | iasay wrote:
       | This is wrong.
       | 
       | It belongs in the linker and you can pull the symbol it creates
       | in with extern. I've been doing this for about 25 years.
        
         | dafelst wrote:
         | ...and now your solution is non-portable and as a cross-
         | platform developer you need to implement N different build
         | scripts. This is far more elegant.
        
           | iasay wrote:
           | No C toolchain is portable!
           | 
           | If that's a problem, use Go or another higher level language.
        
             | xigoi wrote:
             | Isn't one of the supposed advantages of C that it works on
             | many platforms?
        
       | thraw_oway wrote:
       | Personally I feel the C committee should've disbanded after the
       | first standard (the C++ one, after the 2003 technical
       | corrigendum). I didn't mind C99 much, but it looks like
       | C(++)reeping featuritis is a nasty habit.
       | 
       | These gratuitous standards prompt newbies to use the new features
       | (it's "modern") and puzzled veterans to keep up and reinternalize
       | understanding of new variants of languages they've been using for
       | decades. There's no real improvement, just churn. Possibly it's
       | one of the instruments of ageism. More incompatibility with
       | existing software and existing programmers.
        
       | tempodox wrote:
       | > Even among people who control all the cards, they are in many
       | respects fundamentally incapable of imagining a better world or
       | seizing on that opportunity to try and create one, let alone
       | doing so in a timely fashion.
       | 
       | That does sound soul-crushing. Congrats on this achievement!
        
         | quelsolaar wrote:
         | This is simply wrong. We (the ISO wg14) don't hold the cards,
         | compilers are free to implement what ever they want, users are
         | free to use what ever tools or languages they want.
         | 
         | We exist only as long as we are trusted to be good stewards,
         | and only go forward with the consensus of the wider community.
        
           | unreal37 wrote:
           | You're both right.
           | 
           | It's amazing that you and the ISO team are good stewards of
           | the C standard. Thank you for being part of that.
           | 
           | And it can also be true that it was "hell" and "hardly worth
           | it" for the OP to get a new feature added to the language. I
           | believe it was a miserable experience that has him
           | questioning how he spends his time.
           | 
           | Both can be true. Thank you for your efforts. And thank the
           | OP for his efforts too.
        
           | gtirloni wrote:
           | _> (the ISO wg14) don't hold the cards_
           | 
           | That "standard" card seems to be a pretty huge one though.
        
             | quelsolaar wrote:
             | Yeah, but a document doesn't compile C code so best to stay
             | humble. :-)
        
           | morelisp wrote:
           | > > Even among people who control all the cards, they are in
           | many respects fundamentally incapable of imagining a better
           | world or seizing on that opportunity to try and create one,
           | let alone doing so in a timely fashion.
           | 
           | > This is simply wrong. We (the ISO wg14) don't hold the
           | cards, compilers are free to implement whatever they want,
           | users are free to use whatever tools or languages they want.
           | 
           | This is an incredibly oblivious realization of JeanHeyd's
           | point.
        
         | moffkalast wrote:
         | I think in our reality the prerequisite for holding all the
         | cards is the lack of competence in knowing how to improve the
         | world. We've gotten where we are now through sheer force of
         | will of those that are empty handed.
        
           | jeffreygoesto wrote:
           | The reasonable man adapts himself to the world: the
           | unreasonable one persists in trying to adapt the world to
           | himself. Therefore all progress depends on the unreasonable
           | man.
           | 
           | George Bernard Shaw
        
           | [deleted]
        
       | kibwen wrote:
       | This serves the same use as Rust's `include_bytes!` macro, right?
       | Presumably most people just use this feature as a way to avoid
       | having to stuff binary data into a massive array literal, but in
       | our case it's essential because we're actually using it to stuff
       | binaries from earlier in our build step into a binary built later
       | in the build step. Not something you often need, but very handy
       | when you do.
        
         | kzrdude wrote:
         | for both Rust and C, these features "just" make something you
         | could otherwise do with the build system and generated code
         | easier, I think.
        
           | masklinn wrote:
           | As the article quotes, in C the lack of standardisation makes
           | this tricky when you want to support more than one compiler,
           | or even when you want to support just one compiler (cf email
           | about the hacks to make it work on GCC with PIE).
        
         | tialaramex wrote:
         | This has different affordances than std::include_bytes! but I
         | agree that if you were writing Rust and had this problem you'd
         | reach for std::include_bytes! and probably not instead think
         | "We should have an equivalent of #embed".
         | 
         | include_bytes! gives you a &'static [u8; N] which for non-Rust
         | programmers means we're making a fixed size array (the size of
         | your file) full of unsigned 8-bit integers (ie bytes) which
         | lives for the life of the program, and we get an immutable
         | reference to it. Rust's arrays know how big they are (so we can
         | ask, now or later) but cannot grow.
         | 
         | #embed gets you a bunch of integers. The as-if rule means your
         | compiler is likely to notice if what you're actually doing is
         | putting those integers into an array of unsigned 8-bit integers
         | and just stick all the file bytes in the array, short cutting
         | what you wrote, but you could reasonably do other things,
         | especially with smaller files.
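To make the "bunch of integers" point concrete, here is a minimal sketch in C of the comma-separated list that an `#embed` directive conceptually stands for. The helper name `format_embed_list` is hypothetical and only for illustration; a real compiler is free to skip the textual list entirely under the as-if rule.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: renders `n` bytes as the comma-separated decimal
 * list that an #embed directive conceptually expands to.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the output buffer is too small. */
int format_embed_list(const unsigned char *data, size_t n,
                      char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* Every element after the first is preceded by ", ". */
        int w = snprintf(out + used, cap - used, "%s%u",
                         i ? ", " : "", (unsigned)data[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Pasting such a list between the braces of an `unsigned char` array initializer gives the familiar xxd-style result; used elsewhere (for example as function arguments) the same list means something different, which is the flexibility described above.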
        
       | quelsolaar wrote:
       | I represent Sweden in the ISO WG14, and I voted for the inclusion
       | of Embed into C23. It's a good feature. But it's not a necessary
       | feature and I think JeanHeyd is wrong in his criticism of the
       | pace of wg14 work. I have found everyone in wg14 to be very
       | hardworking and serious about their work.
       | 
       | C's main strength is its portability and simplicity. Therefore
       | we should be very conservative, and not add anything quickly.
       | There are plenty of languages to choose from if you want a
       | "modern" language with lots of conveniences. If you want a truly
       | portable language there is really only C. And when I say truly, I
       | mean for platforms without file systems, or operating systems, or
       | where bytes aren't 8 bits, that don't use ASCII or Unicode,
       | where NULL isn't on address 0 and so on.
       | 
       | We are the stewards of this, and the work we put in, while large,
       | is tiny compared to the impact we have. Any change we make
       | needs to be addressed by every compiler maintainer. There are
       | millions of lines of code that depend on every part of the
       | standard. A 1% performance loss is millions of tons of CO2
       | released, and billions in added hardware and energy costs.
       | 
       | In this privileged position, we have to be very mindful of the
       | concerns of our users, and take the time to look at every corner
       | case in detail before adding any new features. If we add
       | something, then people will depend on its behavior, no matter how
       | bad, and we therefore will have great difficulty in fixing it in
       | the future without breaking our users' work, so we have to get it
       | right the first time.
        
         | blippage wrote:
         | Seems like a nice addition. Much better than futzing around
         | with xxd and suchlike.
        
         | pif wrote:
         | Thank you for your post!
         | 
         | Thank you especially for reminding everybody that programming
         | is much more than web programming and information systems.
        
           | quelsolaar wrote:
           | Thank you,
           | 
           | It's also worth remembering that a lot of higher level
           | languages have runtimes / VMs that are implemented in C. Web
           | applications rely heavily on databases, JavaScript VMs,
           | network stacks, system calls and operating system features,
           | all of which are implemented in C.
           | 
           | If you are a software developer and want to do something
           | about climate change, consider becoming a compiler engineer.
           | If you manage to get a couple of tenths of a percent
           | performance increase in one of the big compilers during your
           | career, you will have materially impacted global warming.
           | Compiler engineers are the unsung heroes of software
           | engineering.
        
             | ErikCorry wrote:
             | No JavaScript VM is implemented in C. They are all written
             | in a language that's a bit like C++ but has no exceptions
             | and relies on lots of compiler behaviour that is not
             | defined by the C++ standard.
        
               | woodruffw wrote:
               | Hm? I can think of two pure C JS engines off the top of
               | my head: Duktape and Elk. I believe Samsung or another
               | vendor also has their own; they're all somewhat common in
               | the embedded space.
        
               | ErikCorry wrote:
               | Fair. I'm not familiar with the tiny JS VMs. But really
               | the main point stands: It's not possible to build a
               | decent GC without violating strict aliasing so C and C++
               | as standardized are not suitable for this task.
        
               | xyzzy_plugh wrote:
               | I guess this doesn't exist then:
               | https://bellard.org/quickjs/
        
               | astrange wrote:
               | This is not written in C if it doesn't pass
               | UBSan/ASan/Frama-C and co. It's written in a language
               | that just happens to look like C.
        
               | icedchai wrote:
               | Close enough. Will you claim the Linux kernel isn't C
               | because it's compiled with -fno-strict-overflow and -fno-
               | strict-aliasing ?
        
               | astrange wrote:
               | Yes, that's why it only supports specific C compilers.
               | 
               | Anything that includes its own memory allocator (that
               | doesn't call malloc()) is probably not implemented in
               | standardized C.
        
               | icedchai wrote:
               | It's still "C", even if it's a specific dialect. Vendor
               | specific C extensions have existed forever.
        
         | morelisp wrote:
         | This reasoning has always rung mostly hollow for compiler
         | features (#embed, typeof) rather than true language features
         | (VLAs, closures).
         | 
         | Modern toolchains must exist for marginal systems. It's
         | understandable to want to write code for a machine from 1975,
         | or a bespoke MCU, on a modern Thinkpad. It is not necessary to
         | support _a modern compiler running on the machine from 1975 /
         | bespoke MCU_. You might as well argue against readable
         | diagnostic messages because some system out there might not be
         | able to print them!
        
           | tialaramex wrote:
           | I could also see this, though perhaps it's a step too far for
           | C, applying to Unicode encoding of source files.
           | 
           | The 1970s mainframe this program will _run_ on has no idea
           | that Unicode exists. Fine. But, the compiler I'm using,
           | which must have been written in the future after this was
           | standardised, definitely _does_ know that Unicode exists. So
           | let's just agree that the program's source code is always
           | UTF-8 and have done with it.
           | 
           | Jason Turner has a talk where the big reveal is, the reason
           | the slides were all retro-looking was that they were rendered
           | in real time on a Commodore 64. The program to do that was
           | written in modern C++ and obviously can't be compiled on a
           | Commodore 64 but it doesn't need to be, the C64 just needs to
           | _run_ the program.
        
             | morelisp wrote:
             | This seems a step too far for me. Compatibility with
             | existing source files which may not be trivial to migrate
             | does also matter. (Well, except for `auto`, C23 was right
             | to fuck with that.) At the very least you'll need flags
             | that mean "do whatever you did before".
        
               | tialaramex wrote:
               | Sure, I don't seriously expect C to embrace that, even
               | though I think it'd be worth the effort I'm sure plenty
               | of their users don't.
               | 
               | For auto I think the argument is that if you poke around
               | in real software the storage specifier was basically
               | never used because it's redundant. That's the rationale
               | WG21 had to abolish its earlier meaning in C++ before
               | adding type deducing auto.
               | 
               | As I read it, N2368 (which I think is what they took?)
               | gives C something more similar to the type inference
               | found in many languages today (which gives you a
               | diagnostic if it can't infer a unique type from available
               | information) whereas C++ got deduction which will choose
               | a type when ambiguous, increasing the chance that a
               | maintenance programmer misunderstands the type of the
               | auto variable.
               | 
               | However it got inference from return, which I think is a
               | misfeature (although I think I can see why they took it,
               | to make generics nicer). With inference from return, to
               | figure out what foo(bar)'s type is, I need to _read the
               | implementation of foo_ because I have to find out what
               | the return statements look like. It's more common today
               | to decide we should know from the function's signature.
               | 
               | This is somewhat mitigated by the fact that N2368 says
               | auto won't work in extern context, so we can't just
               | blithely say "This object file totally has a function
               | which returns _something_ and you should figure out what
               | type that is" because that's clearly nonsense. You will
               | _have_ the source code with the return statements in it.
        
               | uecker wrote:
               | We took N3007 which does not have inference on return
               | etc.
               | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm
        
               | tialaramex wrote:
               | Ah, great. I don't write very much C any more, but the
               | auto described in N3007 (well, the skim of N3007 I just
               | did) feels very much like what I'd want from this feature
               | in C _and_ perhaps more importantly, what I'd assume
               | auto does if I see it in a snippet of somebody else's
               | code I'm trying to understand.
        
         | eps wrote:
         | > _where NULL isn't on address 0_
         | 
         | Isn't there literally a single GPU for which it is true?
         | 
         | Asking because every time this surfaces, someone inevitably asks
         | for an example, and the only example I've seen over the years
         | was of one specific (Nvidia?) GPU that uses NULL of 0xFFFFFFFA
         | (or something similar).
         | 
         | That is, do you know how common it is for NULL to _not_ be 0?
        
           | eslaught wrote:
           | It's true (in some memory spaces) in AMD GPU too:
           | 
           | https://llvm.org/docs/AMDGPUUsage.html#memory-spaces
        
             | eps wrote:
             | That's the one!
        
           | Bjartr wrote:
           | Here is an answer that includes a few example systems from
           | comp.lang.c
           | 
           | https://c-faq.com/null/machexamp.html
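For readers wondering what a nonzero null representation changes in practice, here is a minimal sketch in C. The names `struct node`, `node_init`, and `null_is_all_bits_zero` are hypothetical. The point it tries to show: a constant `0` or `NULL` in pointer context always yields a null pointer, whereas filling memory with zero bytes only does so when the platform's null pointer happens to be all-bits-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list node used only for illustration. */
struct node {
    struct node *next;
    int value;
};

/* Portable: assigning NULL produces a null pointer whatever bit
 * pattern the platform uses for it. memset(n, 0, sizeof *n) would
 * only be equivalent where null is all-bits-zero. */
void node_init(struct node *n, int value)
{
    n->next = NULL;
    n->value = value;
}

/* Reports whether this implementation represents a null object
 * pointer as all-bits-zero. True on mainstream platforms, but the
 * C standard does not promise it. */
bool null_is_all_bits_zero(void)
{
    void *p = NULL;
    unsigned char zeros[sizeof p] = {0};
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

Comparisons such as `n.next == 0` or `!n.next` stay correct on every conforming implementation, because the constant is converted to a null pointer before the comparison.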
        
         | xg15 wrote:
         | > _And when I say truly, I mean for platforms without file
         | systems_
         | 
         | Are we really talking about _compiling_ on such platforms?
         | And if that's the case, how would #include work but not
         | #embed?
        
           | quelsolaar wrote:
           | No, I'm mainly talking about targeting. My point is not so
           | much about embed, but rather that almost anything you
           | think you know about how computers work isn't necessarily
           | true, because C targets such a wide group of platforms.
           | Almost always when someone raises a question along the lines
           | of "No platform has ever done that, right?", someone knows of
           | a platform that has done that, and it turns out it has very
           | good reasons for doing that.
           | 
           | For this reason, everything is much more complicated than you
           | first think. For me joining the WG14 has been an amazing
           | opportunity to learn the depths of the language. C is not big
           | but it is incredibly deep. The answer to "Why does C not just
           | do X?" is almost always far more complicated and thought
           | through than one thinks.
           | 
           | Everyone in the wg14 who has been around for a while, knows
           | this, and therefore assumes that even the simplest addition
           | will cause problems, even if they can't come up with a reason
           | why.
        
             | rootbear wrote:
             | I was on X3J11, the ANSI committee that created the
             | original C standard and my experience was similar. It was a
             | great opportunity to learn C at depth and get an
             | understanding of many of the subtle details. We rejected a
             | great many suggestions because our mandate was to
             | standardize existing practice, address some problem areas,
             | and not get too creative. (We occasionally did get too
             | creative. The less said about noalias the better.)
        
               | uecker wrote:
               | We are still fixing bugs in restrict...
        
             | xg15 wrote:
             | Yeah, but then I have to side with the author - how could a
             | _compile time only_ feature which doesn't even introduce
             | new language semantics possibly be affected by the
             | multitude of build targets?
             | 
             | Unless "it's more complicated than you think" is the
             | catchall answer to any and all proposals for new language
             | features. In which case, how to make progress at all?
             | 
             | Also, I find the point about the language being "truly
             | portable" a bit ironic, considering the whole rationale of
             | #embed was that the _use case_ of "embed large chunks of
             | binary data in the executable" was completely non-portable
             | and required adding significant complexity to the build
             | scripts if you were targeting multiple platforms.
             | 
             | It's easy to make a language portable on paper if you
             | simply declare the non-portable parts to not be your
             | responsibility.
             | 
             | > _Everyone in the wg14 who has been around for a while,
             | knows this, and therefore assumes that even the simplest
             | addition will cause problems, even if they can't come up
             | with a reason why._
             | 
             | That's not something to be proud of.
        
               | quelsolaar wrote:
               | > That's not something to be proud of.
               | 
               | It's learning from old mistakes.
               | 
               | Look at embed as an example. Look how complex it is,
               | dealing with empty files, different ways of opening
               | files, files without lengths, null termination... the
               | list goes on. This is typical of a proposal for C, it
               | starts out simple "why can't I just embed a file into my
               | code?" and then it gets complicated because the world is
               | complicated.
               | 
               | I worry a lot about people loading in text files and
               | forgetting to add null termination to embeds. I would not
               | be surprised if in a few years that provides a big
               | headline on Hacker news, about how that shot someone in
               | the foot and how C isn't to be trusted. The details
               | matter.
        
               | duped wrote:
               | > I worry a lot about people loading in text files and
               | forgetting to add null termination to embeds. I would not
               | be surprised if in a few years that provides a big
               | headline on Hacker news, about how that shot someone in
               | the foot and how C isn't to be trusted. The details
               | matter.
               | 
               | The compiler should insert the null terminator if it's
               | not in the embedded file.
        
               | nyanpasu64 wrote:
               | I don't think adding a null terminator is useful for
               | binary files which are not null-terminated strings, and
               | may even have embedded 0 bytes in the middle.
        
               | duped wrote:
               | sure, but if it's a string that requires it to be null
               | terminated, there's no reason the compiler can't solve
               | that problem
        
               | quelsolaar wrote:
               | This is another issue here. If loads of compilers start
               | doing this then programs start relying on it and then it
               | becomes a de-facto undocumented feature. That means if
               | you move compilers/platforms you get new issues. A lot of
               | what the C standard does is mopping up these kinds of
               | issues.
        
               | duped wrote:
               | Then require compilers implement it in the standard. I
               | think it's really backwards to ignore the tool chain and
               | its ability to prevent bugs from entering software.
               | 
               | It's stuff like this that leaves us writing C to rely on
               | implementation defined behavior. Underspecification
               | leaves easy holes that will be filled by the compiler,
               | and we will rely on them. Just like type punning.
        
               | quelsolaar wrote:
               | This is the problem. Things get complicated fast. If we
               | mandate null termination, then it's impossible to have
               | multiple embeds in a row to concatenate files, or we
               | somehow need rules for when to add null termination
               | and when not to. These rules in turn are not going to be
               | read by all users, so some people will just assume that
               | embed always adds null termination when it doesn't, and
               | then we are back to square one. The more we add the more
               | corner cases there are.
        
               | elcritch wrote:
               | Why assume the data should be null terminated? It's an
               | array with a known compile time size. Binary data often
               | needs to include 0 / NULL.
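The null-termination trade-off in this subthread can be sketched in C without needing a C23 compiler. Below, the macro `GREETING_BYTES` is a hypothetical stand-in for the list that `#embed "greeting.txt"` would produce (the file name is also an assumption). The array size is always known at compile time; a terminator is present only if the programmer appends one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the expansion of `#embed "greeting.txt"`;
 * with a C23 compiler the directive itself would appear in the
 * initializers below instead of this macro. */
#define GREETING_BYTES 'h', 'e', 'l', 'l', 'o'

/* Raw bytes: size known at compile time, no terminator. Safe for
 * binary data, NOT safe to pass to strlen() or printf("%s"). */
static const char raw[] = { GREETING_BYTES };

/* Text use: the terminator is appended explicitly after the data. */
static const char text[] = { GREETING_BYTES, '\0' };

size_t raw_size(void)    { return sizeof raw; }   /* data bytes only */
size_t text_size(void)   { return sizeof text; }  /* data + terminator */
size_t text_length(void) { return strlen(text); } /* well-defined here */
```

Because the terminator is opt-in, several lists can still be concatenated in one initializer. C23 also defines a `suffix` embed parameter for appending tokens such as `, 0` when the file is non-empty, though the explicit form above shows the idea without it.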
        
         | pbohun wrote:
         | Thanks for your work on the C standard. Any changes that are
         | made will remain forever, so I'm glad the committee takes this
         | seriously.
        
         | AlexanderDhoore wrote:
         | """Codify existing practice to address evident deficiencies.
         | Only those concepts that have some prior art should be
         | accepted. (Prior art may come from implementations of languages
         | other than C.) Unless some proposed new feature addresses an
         | evident deficiency that is actually felt by more than a few C
         | programmers, no new inventions should be entertained."""
         | 
         | Source: Rationale for International Standard -- Programming
         | Languages -- C
         | https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.1...
         | 
         | I don't know if this rationale is still followed, but I think
         | it applies here. We need to be cautious when adding new
         | features to C.
        
         | oxff wrote:
         | People who call C simple have some weird definition of simple.
         | How many C programs contain UB or are pure UB? Probably over
         | 95%. The language is not simple at all.
        
           | bigdict wrote:
           | A straight razor is simple and that's why it's the easiest to
           | cut yourself with. An electric razor is much safer precisely
           | because much engineering went into its creation.
        
             | [deleted]
        
         | ErikCorry wrote:
         | > for platforms without file systems, or operating systems or
         | where bytes aren't 8 bits, that don't use ASCII or Unicode,
         | where NULL isn't on address 0 and so on.
         | 
         | This seems totally misconceived to me as a basis for
         | standardizing a language in 2022. You are optimizing for the
         | few at the expense of the many.
         | 
         | I get that these strange architectures need a language. Why
         | does it have to be C or C++? They can use a nonstandardized
         | variant of C, but why hobble the language that is 99% used on
         | normal hardware with misfeatures that are justified by truly
         | obscure platforms?
        
           | ryukoposting wrote:
           | > This seems totally misconceived to me as a basis for
           | standardizing a language in 2022. You are optimizing for the
           | few at the expense of the many.
           | 
           | Sure, but it's the same line of reasoning that made C
           | relevant in the first place, and keeps it relevant today -
           | some library your dad wrote for a PDP-whatever is still
           | usable today on your laptop running Windows 10.
           | 
           | Because it's antiquated, it's also extremely easy to support,
           | and to port to new and/or exotic platforms.
        
             | ErikCorry wrote:
             | The library my dad wrote (lol) for the PDP-11 is probably
             | full of undefined behaviour and won't work now that
             | optimizers are using any gap in the standard to miscompile
             | code.
        
               | flqn wrote:
               | What a useless and jaded assumption that code written in
               | the past is bad.
        
               | jolux wrote:
               | The assumption being made here is "any useful C program
               | relies on undefined behavior" which is pretty much true.
        
               | ErikCorry wrote:
               | Yes and I'm sure it's doubly true of code that was
               | written before the C standards were written.
        
               | ErikCorry wrote:
               | I certainly didn't say it was bad. Just that it went
               | outside the boundaries of a standard that was written 25
               | years later.
        
               | jolux wrote:
               | > using any gap in the standard to miscompile code
               | 
               | For code to be miscompiled, there has to be a definition
               | of what correctly compiling it would mean, and if there
               | were, it would not be undefined behavior.
        
               | ErikCorry wrote:
               | Instead of "miscompiled" you can read "Doesn't do what it
               | did on the PDP-11 with the compilers of the time".
        
               | temac wrote:
               | The standard doesn't do that often, but it does
               | sometimes. E.g. realloc with size zero, which was
               | previously defined, and is now UB :(
        
               | ErikCorry wrote:
               | We are talking about code written before the standard so
               | every bit of UB in the standard is in play here.
               | 
               | Eg the fact that overflowing a signed int can cause the
               | compiler to go amuck would certainly be a surprise to the
               | person who wrote code for the PDP-11.
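As a small illustration of the signed-overflow point, here is a sketch in C of an addition that checks the range before computing, so the overflowing expression is never evaluated. The function name `checked_add` is hypothetical; some compilers also provide builtins for this, which are not used here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the result fits in int.
 * The range test uses subtraction on the limits, which cannot itself
 * overflow, so no undefined behaviour is triggered on any input. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;   /* a + b would overflow: report, do not compute */

    *out = a + b;
    return true;
}
```

Code from the PDP-11 era typically wrote `a + b` and tested for wraparound afterwards; under the standard that post-hoc test may be optimized away, which is why the check has to come first.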
        
               | xg15 wrote:
               | Yeah, but if that definition is constantly shifting, you
               | cannot expect it to work with existing codebases.
        
               | jolux wrote:
               | Well yeah -- therein lies the problem with a language
               | with pervasive undefined behavior.
        
             | raverbashing wrote:
             | > PDP-whatever is still usable today on your laptop running
             | Windows 10
             | 
             | No, it isn't. Go on. Go ahead and try
             | 
             | See it break in a million weird ways. (Or, for a start, it
             | will have the K&R C format, which is a pain to maintain)
             | 
             | "If your computer doesn't have 8-bit bytes" at this day and
             | age? It belongs in a dumpster, sorry.
             | 
             | (I think the only "modern" arch that does this is PIC, and
             | even only for program data - where you're not running
             | anything "officially" C89 or later)
        
               | icedchai wrote:
               | When I first learned C, it was K&R, pre-ANSI with old
               | style function parameters. It is trivial to convert to
               | ANSI C. The truth is C has barely changed in decades.
        
           | quelsolaar wrote:
           | It doesn't have to be C, but as of today there is no other
           | option. No one is coming up with new languages with these
           | kinds of features so C it is. People should, but language
           | designers today are more interested in memory safety and
           | clever syntax, than portability.
           | 
           | I would like to caution you against thinking that these weird
           | platforms are old machines from the 60s that only run in
           | museums. For instance many DSPs have 32-bit bytes (smallest
           | memory unit that can be individually addressed), so if you
           | have a pair of new fancy noise canceling headphones, then it's
           | not unlikely you are wearing a platform like that on your
           | head every day.
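A minimal sketch in C of what "bytes aren't 8 bits" means for portable code: sizes are counted in C bytes of `CHAR_BIT` bits each, so octet counts need an explicit conversion. The helper names `bits_in_bytes` and `bytes_for_octets` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits occupied by `nbytes` C bytes on this platform.
 * CHAR_BIT is 8 on mainstream hardware and can be 16 or 32 on
 * some DSPs. */
size_t bits_in_bytes(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}

/* Number of C bytes needed to hold `octets` 8-bit units, rounding
 * up. On a CHAR_BIT == 32 target, four octets fit in one byte. */
size_t bytes_for_octets(size_t octets)
{
    return (octets * 8 + CHAR_BIT - 1) / CHAR_BIT;
}
```

Code that hard-codes `sizeof(x) * 8` works only where `CHAR_BIT` is 8; using the macro costs nothing on ordinary platforms and keeps the arithmetic correct on the unusual ones.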
        
             | chrisseaton wrote:
             | > It doesn't have to be C, but as of today there is no
             | other option
             | 
             | Isn't C99 an option? Why can't more advanced things go into
             | newer C and people who genuinely need something more basic
             | can use C99.
        
               | quelsolaar wrote:
               | We can! Many of us still use C89. (C99 has problems, like
               | variable length arrays.)
               | 
               | The reality however is that you can't escape newer
               | versions entirely. Not all code you interact with was
               | written in the subset you want, so when your favorite OS
               | or library starts using header files with newer features
               | you need to run that version of the language too.
               | 
               | Another less appreciated detail, is that a lot of WG14
               | work is not about adding new features but clarifying how
               | existing features are meant to work. When the text is
               | clarified this gets back-ported to all previous versions
               | of C in major compilers. An example of this is
               | "provenance". This is a concept that has implicitly been
               | standard since the first ISO standard, but only now is
               | becoming formalized. This means that if you want to
               | adhere to the C89 standard, you will find a lot of
               | clarifications about how things should work in the C23
               | standard.
        
               | kevin_thibedeau wrote:
               | VLAs are optional since C11. There is no reason why a
               | vendor can't support a modern language.
        
             | duped wrote:
             | If it were to focus on stability, it would probably be LLVM
             | IR. That said, there's plenty of C++ being written for
             | these applications. And Ada.
             | 
             | > so if you have a pair of new fancy noise canceling
             | headphones, then its not unlikely you are wearing a
             | platform like that on your head everyday.
             | 
             | Chip shortage aside, the likelihood of these devices using
             | obscure hardware like discrete DSPs is going down as
             | cheaper low power architectures are becoming commoditized.
        
               | astrange wrote:
               | LLVM IR isn't stable or even portable. It's just a
               | compiler IR, not a language.
        
               | duped wrote:
               | Hence the qualifier, if it focused on stability. And IRs
               | are languages. They look and quack like them and people
               | treat them as such.
        
             | ErikCorry wrote:
             | Perhaps Carbon is the first in a series of new low level
             | languages that free us from the impossible tensions of
             | C/C++ having to be all things to all (low level)
             | programmers.
             | 
             | I would love a new language for implementing high level
             | languages. I've worked on several of these projects and we
             | use mostly unstandardized dialects of C++ and it's really
             | not fit for purpose.
        
               | nine_k wrote:
               | While at it, I should mention Zig.
        
               | ErikCorry wrote:
               | Does zig have as a selling point that it has _more_ UB
               | than C?
        
             | mastax wrote:
             | Unusual platforms like DSPs usually have specific (usually
             | proprietary) toolchains. Why can't those platforms
             | implement extensions to support 32-bit bytes? Why must
             | everyone else support them? In practice ~no C code is
             | portable to machines with 32-bit bytes. That's okay! You
             | don't choose a DSP to run general purpose code. You choose
             | it to run DSP code, usually written for a specific purpose,
             | often in assembly.
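
For code that does assume 8-bit bytes, the assumption can at least be made explicit so the build fails loudly on a platform with wider bytes. A small sketch using `CHAR_BIT` from `<limits.h>`; the helper `bits_in` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Refuse to compile where a byte is not 8 bits wide,
   for example on a DSP with 32-bit bytes. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Number of bits in an object that occupies `nbytes` bytes. */
static size_t bits_in(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}
```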
        
               | quelsolaar wrote:
               | "Weird" platforms often do have their own tool-chains but
               | they do have the ability to leverage LLVM, MISRA, and an
               | array of common tools and analyses that exist for C. One
               | of the reasons we got new platforms like RISC-V is that
               | today it's possible to use existing OSS software to build
               | a platform with a working OS and development environment
               | that common basic libraries can be built for, because all
               | this software is written in C and can be targeted towards
               | a new platform.
        
               | ErikCorry wrote:
               | What is the relevance of RiscV here? Not weird at all. I
               | feel like you skipped part of the argument.
        
               | gumby wrote:
               | The point is that new exploration of the design space
               | only works when there's a familiar environment to build
               | on. The old days of each architecture being its own
               | hermetic environment are gone.
        
               | AdamH12113 wrote:
               | Because C already does this, and has from the beginning.
               | C was designed to be portable in an era where there were
               | significant differences in fundamental CPU design
               | decisions between platforms. C is widely used to write
               | software for all kinds of weird platforms. Changing that
               | would be far more work than just making a new language.
        
               | [deleted]
        
           | gumby wrote:
           | As the GP post comments, if you want those features there are
           | plenty of other languages to choose from.
           | 
           | I don't even like programming in C but I respect what the
           | committee is trying to do, and yes I do sometimes write C
           | code.
        
           | skrebbel wrote:
           | C is pretty much the only language in common use for
           | programming microcontrollers. Microcontrollers seldom have
           | filesystems. To break the language on systems without
           | filesystems or terminals means to break the software of
           | pretty much every electronics manufacturer out there.
        
             | varajelle wrote:
             | But you don't run the compiler on a computer without a file
             | system. How would #include work otherwise?
        
             | varajelle wrote:
             | Thinking of it, JavaScript is a language that mainly targets
             | browsers, which also don't have a filesystem.
        
             | ithkuil wrote:
             | It may have no filesystem but it's extremely likely it has
             | 8 bit bytes
        
           | nine_k wrote:
           | I would say that one should be pretty cautious when baking in
           | assumptions about such a fleeting thing as hardware into
           | such a lasting thing as a language.
           | 
           | C itself carries a lot of assumptions about computer
           | architecture from the PDP-9 / PDP-11 era, and this does hold
           | current hardware back a bit: see how well the cool
           | nonstandard and fast Cell CPU fared.
           | 
           | A language standard should assume as little about the
           | hardware as possible, while also, ideally, allowing to
           | describe properties of the hardware somehow. C tries hard,
           | but the problem is not easy at all.
        
             | uecker wrote:
             | Can you explain what aspect of C from PDP-11 was
             | problematic for Cell?
        
               | nine_k wrote:
               | All memory is uniform, for instance. There is one scalar
               | data processing unit that finishes a previous operation
               | and then issues the next: no way to naturally describe
               | SIMD, for instance. No way to speak about asynchronous
               | things that happen on a Cell CPU all the time, as much as
               | I can judge. (I never programmed it, but I remember that
               | people who did said they had to use assembly
               | extensively.)
               | 
               | OTOH you can write stuff like `*dst++ = *src++`, and it
               | would neatly compile into something like `movb (R1)+,
               | (R2)+`, a single opcode on a PDP-11.
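
The post-increment idiom mentioned above, written out as a complete function. This is only a sketch: `copy_bytes` is a made-up name, and whether a given compiler turns the loop body into a single auto-increment instruction depends on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst, one byte per iteration.
   Each pass reads through src, writes through dst, then advances both. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n-- > 0) {
        *dst++ = *src++;
    }
}
```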
        
       | thrwyoilarticle wrote:
       | Other stuff:
       | 
       | https://twitter.com/rcs/status/1550526425211584512
       | 
       | nullptr! auto! constexpr!
        
         | phkahler wrote:
         | Not sure about the value of nullptr! Also not sure about auto!
         | In C.
        
           | camel-cdr wrote:
           | nullptr since we have type detection now, and NULL needn't be
           | a pointer. auto, because otherwise everybody would create
           | their own hacky auto using the new typeof.
        
       | jesprenj wrote:
       | What about creating object files from raw binary files and then
       | linking against them? That's what I (and of course many others)
       | do for linking textures and shaders into the program. It's a bit
       | ugly though that with this approach you can't generate custom
       | symbol names, at least with the GNU linker.
       | 
       | This #embed feature might be a nice alternative for small files.
       | Well for large files you usually don't even want to store them
       | inside the binary, so the compilation overhead might be
       | minuscule, since the files are, by intention, small.
       | 
       | When I read the introduction of the article - about allowing us
       | to cram anything we want into the binary - I was hoping to see a
       | standard way to disable optimizations (When the compiler deletes
       | your code and you don't even notice).
        
         | mastax wrote:
         | You reminded me of Bethesda Softworks games, which always seem
         | to have 1GB+ executables for some reason. I hope it isn't all
         | code. Maybe they embed the most important assets that will
         | always need to be loaded.
        
         | dark-star wrote:
         | One reason against this is mentioned in the letter that is
         | quoted in the article
        
         | jonathrg wrote:
         | It depends on your definition of small files. A few hundred kB
         | to a few megabytes will make compilation speed and memory usage
         | explode if you embed it as text, see section 3.2 in
         | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
        
       | csmpltn wrote:
       | C89 is where C should've stayed at. If you need to convert a file
       | to a buffer and stick that somewhere in your translation unit,
       | use a build system. Don't fuck with C.
        
         | GuB-42 wrote:
         | Nothing stops you from sticking to C89 if that's what you want. Many
         | projects do, and the -std=c89 option will not disappear anytime
         | soon.
        
         | macintux wrote:
         | Did you read the snail mail letter from someone who does just
         | that?
        
           | csmpltn wrote:
           | > "Did you read the snail mail letter from someone who does
           | just that?"
           | 
           | I did. The author struggled embedding files into their
           | executables with makefiles. We don't know anything else
           | beyond that. So what?
           | 
           | People also struggle with memory management in C, an arguably
           | much more difficult and widespread problem. Should we
           | introduce a garbage collector into the C spec? How about we
           | just pull in libsodium into the C standard library because
           | people struggle with getting cryptography right?
           | 
           | OP mentions #embed was a multi-year long uphill battle, with
           | a lot of convincing needed at every turn. That in itself is
           | enough proof that people aren't in clear agreement over there
           | being a single "right" solution. Hence, leave this task to
           | bespoke build systems and be done with it. Let different
           | build systems offer different solutions. Allow for different
           | syntaxes, etc. Leave the core language lean.
        
       | cowtools wrote:
       | Interesting. I look forward to this. What I've been doing now to
       | embed a source.png file is something like this, where I generate
       | source code from a file's data:
       | 
       | in embed_dump.cpp:
       | 
       |     #include <fstream>
       |     #include <iostream>
       | 
       |     int main(){
       |       std::ifstream f;
       |       // binary mode, so bytes are not translated on any platform
       |       f.open("./source.png", std::ios::binary);
       |       std::cout
       |         << "//automatically generated by embed_dump from project files:"
       |         << std::endl
       |         << "const char embedded_tex[] = {";
       |       char a;
       |       // test the read itself, so no stale byte is printed at EOF
       |       while(f.read(&a,1)){
       |         std::cout << int(a) << ",";
       |       }
       |       std::cout << "};" << std::endl;
       |       f.close();
       |     }
       | 
       | Then I set up my makefile like this (main_stuff.cpp #includes
       | embedded_files.h):
       | 
       |     main_stuff: main_stuff.cpp embedded_files.h
       |         c++ main_stuff.cpp -o main_stuff
       |     embedded_files.h: embed_dump source.png
       |         ./embed_dump > embedded_files.h
       |     embed_dump: embed_dump.cpp
       |         c++ embed_dump.cpp -o embed_dump
        
       | zonovar wrote:
       | 5 years ago I wrote a small python script [1] to help me solve
       | "the same problem". It reads files in a folder and generates a
       | header file containing the files' data and filenames. It's very
       | simple and was meant to help me on a job. It has limitations,
       | don't be too hard on me :)
       | 
       | [1] https://github.com/daxliar/pyker
        
       | david2ndaccount wrote:
       | This is a really, really good feature and I am so glad it is
       | finally getting standardized. C23 is shaping up to be a very good
       | revision to the C standard. I'm hoping the proposal to allow
       | redeclaration of identical structs gets in as well, as you would
       | finally be able to write code using common types without having
       | to coordinate, which would allow interoperability between
       | independently written libraries.
        
       | oefrha wrote:
       | > vendor extensions ... were now a legal part of the syntax. If
       | your implementation does not support a thing, it can issue a
       | diagnostic for the parameters it does not understand. This was
       | _great_ stuff.
       | 
       | I can't be the only one who thinks magic comment is already an
       | ugly escape hatch, adding a mini DSL to it that can mean anything
       | to anyone just makes it ten times worse. It's neither beautiful
       | nor great.
       | 
       | > do extensions on #embed to support different file modes,
       | potentially _reading from the network_ (with a timeout), and
       | other shenanigans.
       | 
       | (Emphasis mine.) My god.
        
         | jcelerier wrote:
         | > (Emphasis mine.) My god.
         | 
         | yes, C finally catching up with what languages such as F# have
         | been able to do for years with great success
         | https://docs.microsoft.com/en-us/dotnet/fsharp/tutorials/typ...
         | ; wild isn't it to step into the 2010-era of programming ?
        
           | oefrha wrote:
           | I suppose you're of the opinion that every feature of every
           | language should be added to C, or maybe even assembly.
        
             | jcelerier wrote:
             | this isn't adding a new feature. This is replacing a
             | feature that everyone _already_ implemented independently
             | on every other project - some with xxd, some with special
             | embedders such as rcc or windres or whatever, some through
             | CMake directly (like this:
             | https://gist.github.com/sivachandran/3a0de157dccef822a230
             | for instance) - in a standard and more performant way.
             | Instead of being paid per-project this cost will now be
             | paid per-compiler implementation which is unambiguously
             | good as there are <<< compilers than C projects.
        
               | oefrha wrote:
               | You replied to my quote about pulling network resources
               | and "other shenanigans", which certainly isn't what
               | "everyone already implemented independently". Plus that's
               | a potential vendor extension, i.e., some may implement it
               | independently, some may not, implementations will likely
               | differ.
        
               | jcelerier wrote:
               | > You replied to my quote about pulling network resources
               | and "other shenanigans", which certainly isn't what
               | "everyone already implemented independently".
               | 
               | ?? pulling network resources works, today, with what
               | people are using. there's zero difference between "cat
               | foo.txt | xxd -i" and "curl https://my.api/foo.json | xxd
               | -i"
        
               | joshuamorton wrote:
               | Hell, I #include files mounted over FUSE regularly.
        
         | greatgib wrote:
         | I guess you don't know C.
         | 
         | "#" is not a symbol for a comment line but the one for a pre-
         | processor directive. Like #include <stdlib.h>
         | 
         | In c/c++ you use // and /* */ for comments.
        
           | oefrha wrote:
           | I've known C for close to two decades, thank you. I'm using
           | the not at all well defined term "magic comment" to loosely
           | refer to everything that's not strictly speaking code but has
           | special meaning, which includes pre-processor directives.
           | 
           | cpp is definitely a well-hated part of C.
        
             | Ensorceled wrote:
             | > I've known C for close to two decades, thank you. I'm
             | using the not at all well defined term "magic comment"
             | 
             | Please forgive those of us who've been using C since the
             | 80's, or earlier, for assuming you don't know C when you
             | invent your own terminology for preprocessor directives.
        
               | oefrha wrote:
               | This is not a preprocessor directive though, from reading
               | the post I don't think cpp is expanding the #embed into
               | an array initializer, otherwise there's no performance
               | benefit at all.
        
               | unreal37 wrote:
               | It's a preprocessor directive. The compiler injects the
               | array based on your instruction.
               | 
               | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#a...
        
               | tialaramex wrote:
               | The deliberate wording expands #embed to C's initializer-
               | list
               | 
               | The major performance benefit is from the as-if rule. The
               | compiler is entitled to do whatever it wants so long as
               | the result is _as-if_ it worked the way the standard
               | describes.
               | 
               | So a decent compiler is going to _notice_ that you're
               | #embed-ing this in a byte array, and conclude it should
               | just shovel the whole file into the array here. If it
               | actually made the bytes into integers, and then parsed
               | them, they would of course still fit in exactly one byte,
               | because they're bytes, so it needn't worry about
               | _actually doing_ that which is expensive.
               | 
               | Does it work if you try to #embed a small file as
               | parameters to a printf() ? Yeah, probably, go try it on
               | Godbolt (this is an option there) but for the small file
               | where that's viable we don't care about the performance
               | benefit. It's just a nice trick.
        
               | stkdump wrote:
               | The problem with the preprocessor often is that it is a
               | language inside another language and the preprocessor is
               | designed to be almost completely agnostic of the language
               | it is embedded in. So there might be subtle ways to use
               | the preprocessor so that implementing as-if becomes very
               | unintuitive. I don't have a good intuition about this
               | case, if this is 100% designed in a way that it can never
               | provoke such subtle side-effects. Basically what might
               | end up happening is that the preprocessor has to learn
               | some part of the C language to decide if such an as-if
               | transformation is possible and then branch to either do
               | it or not.
        
               | oefrha wrote:
               | Thanks for pointing out the difference, I stand
               | corrected.
        
           | [deleted]
        
         | rleigh wrote:
         | To be completely honest, I find the fact that this was raised
         | by the committee to be really obtuse and unnecessary. The same
         | "complaint" could be raised about #include as well.
         | 
         | If you want to include data from a continuous stream from a
         | device node, then you could just as easily have the data piped
         | into a temporary file of defined size and then #embed that. No
         | need to have the compiler cater for a problem of your own
         | making.
         | 
         | As for the custom data types. It's a byte array. Why not leave
         | any structure you wish to impose on the byte array up to the
         | user. They can cast it to whatever they like. Not sure why
         | that's anything to do with the #embed functionality.
         | 
         | Both these things seem to be massive overthinking on the part
         | of the committee members. I'm glad I'm not participating, and I
         | really do thank the author for their efforts there. We've
         | needed this for decades, and I'm glad it's got in even if those
         | ridiculous extensions were the compromise needed to get it
         | there.
        
       | orbifold wrote:
       | Can't you just add binary data into a custom section of your ELF
       | executable?
        
         | astrange wrote:
         | Then you don't get regular optimizations like deduping
         | identical declarations.
         | 
         | Or source line location debug info, though nobody tries to show
         | that for data at the moment.
        
         | dark-star wrote:
         | You probably don't realize that not every system is using ELF
         | binaries....
        
         | rleigh wrote:
         | Yes, but it's linker-specific and non-portable. It can also
         | come with some annoying limitations, like having to separately
         | provide the data size of each symbol. In some cases this might
         | be introspectable, but again comes at the expense of
         | portability.
         | 
         | ELF-based variants of the IAR toolchain, for example, provide a
         | means of directly embedding a file as an ELF symbol, but
         | without the size information being directly accessible.
         | 
         | GNU ld and LLVM lld do not provide any embedding functionality
         | at all (as far as I can see). You would have to generate a
         | custom object file with some generated C or ASM encoding the
         | binary content.
         | 
         | MSVC link.exe doesn't support this either, but there is the
         | "resource compiler" to embed binary bits and link them in so
         | they can be retrieved at runtime.
         | 
         | Having a universal and portable mechanism which works
         | everywhere will be a great benefit. I'll be using it for
         | compiled or text shaders, compiled or text lua scripts, small
         | graphics, fonts and all sorts.
        
           | acka wrote:
           | This article[1] shows how you can use the GCC toolchain along
           | with objcopy to create an object file from a binary blob,
           | link it, and use the data within in your own code.
           | 
           | [1] https://balau82.wordpress.com/2012/02/19/linking-a-binary-bl...
        
         | AdamH12113 wrote:
         | The less you have to mess with linker scripts, the better.
        
         | nicoburns wrote:
         | The article addresses this directly. If you're only targeting one
         | platform then this is reasonably easy (albeit still not as easy
         | as #embed), but if you need to be portable then it becomes a
         | nightmare of multiple proprietary methods.
        
         | ghoward wrote:
         | Sure, but to add binary data to _any_ executable on any
         | platform is more involved.
         | 
         | As an example, see [1]. That will turn any file into a C file
         | with a C array, and I use it to embed a math library ([2]) into
         | the executable so that the executable does not have to depend
         | on an external file.
         | 
         | [1]:
         | https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen....
         | 
         | [2]:
         | https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc
        
       | kevin_thibedeau wrote:
       | > The directive is well-specified, currently, in all cases to
       | generate a comma-delimited list of integers.
       | 
       | While a noble act, this is nearly as inefficient as using a code
       | generator tool to convert binary data into intermediate C source.
       | Other routes to embed binary data don't force the compiler to
       | churn through text bloat.
       | 
       | It would be much better if a new keyword were introduced that
       | could let the backend fill in the data at link time.
        
         | elcritch wrote:
         | You should read or re-read the article and references. There
         | are multiple benchmarks showing this _not_ to be the case.
         | Actually half the article is a (well deserved) rant about how
         | wrong compiler devs were in thinking that parsing intermediate
         | C sources could ever match the new directive. Compiler internal
         | representation of an array of integers also doesn't require a
         | big pile of integer ASTs.
         | 
         | According to the benchmarking data this extension is even 2x
         | faster than using the linker `objcopy` to insert a binary at
         | link time as you suggest.
        
           | [deleted]
        
       | nuc1e0n wrote:
       | This is a cool feature and I'll likely be using it in the years
       | to come. However, the posix standard command xxd and its -i
       | option can achieve this capability portably today.
       | 
       | It will be useful to achieve it directly in the preprocessor
       | however. I wonder how quickly can it be added to cpp?
        
         | ErikCorry wrote:
         | I'm pretty sure xxd is not part of POSIX
         | https://pubs.opengroup.org/onlinepubs/9699919799/idx/utiliti...
        
       | junon wrote:
       | This will simplify a lot of build pipelines for sure.
       | 
       | One thing that isn't clear from skimming the article, how do you
       | refer to the embedded data again?
        
         | jsnell wrote:
         | > The directive is well-specified, currently, in all cases to
         | generate a comma-delimited list of integers
         | 
         | I.e. you most likely use it to initialize a static variable,
         | and then refer to that variable.
        
           | junon wrote:
           | Ah so this basically?
           | 
           |     static const char d[]={
           |     #embed ...
           |     };
           | 
           | EDIT: ah it was showing up more like a comment which made it
           | hard to spot.
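
A slightly fuller sketch of that pattern, under stated assumptions: `USE_EMBED` and `"hello.bin"` are made-up names, and the fallback branch spells out by hand what the directive would expand to for a six-byte file containing "hello\n", so the example also builds on compilers without #embed.

```c
#include <assert.h>
#include <stddef.h>

static const unsigned char d[] = {
#if defined(USE_EMBED)
    /* C23 path: needs a compiler that implements #embed and a real
       hello.bin next to the source file (both are assumptions here). */
#embed "hello.bin"
#else
    /* Hand-written equivalent: the comma-delimited list of integers
       the directive would produce for a file containing "hello\n". */
    0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
#endif
};

static_assert(sizeof d > 0, "embedded data must not be empty");

/* The array length comes from the initializer, so no separate
   size symbol is needed. */
static size_t d_size(void)
{
    return sizeof d;
}
```

After that, the data is referred to like any other array: `d[i]` for the bytes and `sizeof d` for the length.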
        
             | jonathrg wrote:
             | Yes. There is an example in the article
        
       | throwaway38583 wrote:
       | Congratulations to the author. Things like this are why I hope
       | Carbon exists. Evolving c++ seems like a dumpster fire, despite
       | whatever compelling arguments about compatibility you are going
       | to drop on me.
        
       | einpoklum wrote:
       | Everlasting glory to JeanHeyd Meneide, @thephantomderp, for
       | getting this feature into C.
       | 
       | I am wondering, though - where does this stand in C++?
        
       | moffkalast wrote:
       | This reminds me, I'd argue that the explosion of JS frameworks
       | can be mainly blamed on one thing: the lack of an <include
       | src="somemodule.html"> tag. If you have that you basically have
       | 80% of vue.js already natively supported. No clue why this was
       | never added in any fashion. Change my mind.
        
         | lkschubert8 wrote:
         | Wouldn't the include still need some templating functionality?
         | Or are people using vue that heavily for just importing static
         | html?
        
           | ear7h wrote:
           | Not the parent comment, but my personal use case is for
           | rendering a selectable list. The server side would render a
           | static list with fragment links (ex. `#item-10`) and include
           | elements with corresponding IDs, and a `:target` css rule to
           | unhide the element. This would hopefully be paired with lazy
           | loading the include elements.
           | 
           | edit:
           | 
           | My goal is to avoid reloading the page for each selection and
           | rendering all items eagerly. JS frameworks are the only ones
           | that really allow this behavior.
        
         | TheAceOfHearts wrote:
         | HTML imports were part of the original concept of Web
         | Components, and I think they were supported in Chrome. If you
         | look up examples of things built with Polymer 1.x, it was used
         | extensively.
         | 
         | It was actually pretty neat, because you could have an HTML
         | file with a template, style, and script section.
         | 
         | Safari rejected the proposal, so it had to get dropped.
         | 
         | But ESM makes it a bit redundant anyway. The end-goal is to
         | allow you to import any kind of asset, not just JS. There have
         | been demos and examples of tools supporting this going back
         | over half a decade at this point.
        
           | elcritch wrote:
           | Firefox refused the proposal as well. ESM requires javascript
           | though. :/
        
         | polskibus wrote:
         | Is <script type="module" /> not sufficient for your needs? If
         | not then what is missing?
        
           | nkozyra wrote:
           | Seems to be arguing for modular layout/templating, which is
           | what virtual includes did (the cgi in the example would
           | hypothetically output html)
        
         | xigoi wrote:
         | How would <include> be useful for dynamically updating the DOM
         | based on data, which is the main point of Vue?
        
         | agumonkey wrote:
         | I wonder why there's never been a
         | 
         |     <calc> /* ... compute and return dom element */ </calc>
         | 
         | Basically what php does but with structure and objects instead
         | of a bytestream
         | 
         | in HTML. Or maybe it's been discussed but got left out
        
           | marwis wrote:
           | It's always been possible:
           | 
           | <script>document.write(`<p>foo</p>`)</script>
        
         | stevekemp wrote:
         | It's funny I read that and I remember Apache's virtual-include
         | facility:
         | 
         |     <!--#include virtual="/cgi-bin/example.cgi?argument=value" -->
         | 
         | I used that, back in the day, as an alternative to PHP.
        
         | tirpen wrote:
         | > https://caniuse.com/imports
         | 
         | It was a feature in Chrome 36-79 and there were working
         | polyfills to make it work on other browsers.
         | 
         | It was actually a great feature and I used it extensively on an
         | old project back then.
         | 
         | CanIUse: https://caniuse.com/imports
         | 
         | (Now obsolete) tutorial:
         | https://www.sitepoint.com/introduction-html-imports-tutorial...
        
         | andai wrote:
         | <?php require("somemodule.html"); ?>
        
       | aaaaaaaaaaab wrote:
       | C++ keeps kicking ass!
       | 
       | Feel sorry for crab people.
        
         | jonathrg wrote:
         | That's not really the right conclusion to draw from this
         | article
        
         | agluszak wrote:
         | "crab people" means Rust people?
        
           | tialaramex wrote:
           | Officially, Rust's R-cog logo is the symbol of Rust. It is a
           | registered trademark of the Foundation.
           | 
           | But it's a bit boring. Unofficially, Rust has a mascot, in
           | the form of a crab named "Ferris". The crab mascot appears in
           | lots of places, and the Unicode crab emoji U+1F980 is often
           | used by Rust programmers to indicate Rust in text. Unlike the
           | trademarked logo, you can have a bit of fun with such an
           | unofficial symbol, for example Jon Gjengset's "Rust for
           | Rustaceans" book cover has a stylised crab wearing glasses
           | with a laptop apparently addressing a large number of other
           | crabs.
        
             | agluszak wrote:
             | Yeah, I know that, that's why I asked if that's what parent
             | meant, because I can't see why Rust was mentioned there
        
         | orf wrote:
         | The article definitely isn't a glowing praise of C/C++. In
         | fact, including this simple, useful feature that Rust has had
         | for a decade now has taken an immense amount of effort and
         | received so much pushback from various parties, in part due to
         | the strangled mess of various compiler limitations and in part
         | because of design-by-committee stupidity.
         | 
         | C/C++ seems to be kicking its own ass.
        
           | gilnaa wrote:
           | Not to mention that it didn't even get into C++
        
           | msla wrote:
           | The article doesn't even mention C++/Java.
        
             | tialaramex wrote:
             | Second paragraph of the title article:
             | 
             | > Surprisingly, despite this journey starting with C++ and
             | WG21, the C Committee is the one that managed to get there
             | first
             | 
             | Later it mentions presenting their first formal attempt at
             | this in Belfast 2019. That's a C++ meeting; it was too late
             | for this to go into C++20 at that point, but it easily
             | could have been in C++23 (it is not).
        
       ___________________________________________________________________
       (page generated 2022-07-23 23:00 UTC)