[HN Gopher] Embed is in C23
___________________________________________________________________
Embed is in C23
Author : aw1621107
Score : 370 points
Date : 2022-07-23 10:09 UTC (12 hours ago)
(HTM) web link (thephd.dev)
(TXT) w3m dump (thephd.dev)
| pwdisswordfish9 wrote:
| > "Touch grass", some people liked to tell me. "Go outside", they
| said (like I would in the middle of Yet Another Gotdang COVID-19
| Spike). My dude, I've literally gotten company snail mail, and it
| wasn't a legal notice or some shenanigans like that! Holy cow,
| real paper in a real envelope, shipped through German Speed
| Mail!! This letter alone probably increases my Boomer Cred(tm) by
| at least 50; who needs Outside anymore after something like this?
|
| Touch grass indeed. Sure, #embed is a nice feature, but this
| self-indulgent writing style I can't stand.
| morelisp wrote:
| Maidenless commenter.
| mgaunard wrote:
| It takes literally 5 minutes to write a python script that does
| this.
|
| It took a long time to get this adopted because people are most
| likely busy with things that cannot be already solved trivially.
| jjnoakes wrote:
| > trivially
|
| The article covers quite a few reasons why the way things are
| done without #embed are not quite as trivial as they seem.
| mgaunard wrote:
| I've been doing it for 20 years without any single issue, on
| fairly large files.
|
| This proposal doesn't even allow to compress or encrypt the
| data.
| jjnoakes wrote:
| And yet the issues in the article are real and non-trivial.
| Perhaps you just never hit them, or you have a high
| tolerance for non-trivial solutions.
| jonathrg wrote:
| I think it's nice that this will soon be possible to do without
| adding Python as a dependency in your build system
| [deleted]
| yakubin wrote:
| _> told me this form was non-ideal and it was worth voting
| against (and that they'd want the pure, beautiful C++ version
| only[1])_
|
| I heard about #embed, but I didn't hear about std::embed before.
| After looking at the proposal, to me it does look a lot better
| than #embed, because reading binary data and converting it to
| text, only to then convert it to binary again seems needlessly
| complex and wasteful. I also don't like that it extends the
| preprocessor, when IMHO the preprocessor should at worst be left
| as is, and at best be slowly deprecated in favour of features
| which compose well with C proper.
|
| Going beyond the gut reaction and moving on to hard data, as you
| can expect from this design, std::embed of course is faster
| during compilation than #embed for bigger files (comparable for
| moderately-sized files, and a bit slower for tiny files).
|
| I'm not a huge fan of C++, but the fact that C++ removed
| trigraphs in C++17 and that it's generally adding features
| replacing the preprocessor scores a point with me.
|
| [1]: <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p10...>
| mgaunard wrote:
| the preprocessor is a great tool to reduce duplication and
| boilerplate.
|
| People that don't like it generally just don't know how to use
| it.
| timhh wrote:
| People don't dislike it because they are unaware how helpful
| it can be. They dislike it because they are aware how hacky,
| fragile and error-prone it is. They want something more
| robust than text substitution.
| LadyCailin wrote:
| Or perhaps those people can think of better ways to get those
| benefits that also don't allow things like
| #ifndef asdf }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
| #endif
|
| which obliterate tooling such as IDEs. Of course, this is a
| contrived example, but the preprocessor is just one big
| footgun, which offers no benefits over other ways of solving
| the problems you mentioned, such as constexpr and perhaps
| additional, currently unimplemented solutions.
| mgaunard wrote:
| It being possible to misuse a tool does not mean the tool
| is not very useful.
| KerrAvon wrote:
| A tool being very useful doesn't mean that it is a very
| good tool.
|
| There are better tools for the functionality the C
| preprocessor attempts to provide. Other languages have
| module inclusion systems and very powerful macros that
| don't have the enormous footguns of the C preprocessor.
|
| Edit: to be clear, I think #embed is a fine idea; I'd use
| it and it would make my sourcebase cleaner in some
| places.
| makapuf wrote:
| My carpenter has a lot of tools that can be dangerous if
| misused. Of course better tools can be devised, but
| useful things have been done with them (and he still has
| all his fingers)
| LadyCailin wrote:
| Yes, but we're in a thread about ways to improve the
| language, not about how to make the best with what's
| there. This type of argument holds back improvement.
| colonwqbang wrote:
| Compilers follow the "as if" principle, they don't have to
| literally follow the formal rules given by the standard. They
| could implement #embed by doing as you say, pretty printing out
| numbers and then parsing them back in again. But that would be
| an extremely roundabout way to do it, so I doubt anyone will
| actually do it that way. Unless you're running the compiler in
| some kind of debugging mode like GCC's -E.
| twoodfin wrote:
| I don't think the implication is that the C compiler _must_
| encode the binary file as a comma-separated integer list and
| then re-parse it, only _act_ as if it did so.
| yakubin wrote:
| How would that work? It would need to depend on the grammar
| of surrounding C code. This directive isn't limited to
| variable initialisers. You can use it anywhere. So e.g. you
| can use it inside structure declaration, or between "int
| main()" and "{". etc. etc. Those will generate errors in
| subsequent phases, but during preprocessing the compiler
| doesn't know about it. Then there is also just that:
| int main () {
| return
| #embed "file.bin"
| ; }
|
| There are plenty of cases, where it will all behave
| differently. And if you're going to pretend even more that
| the preprocessor understands C syntax, then why not just give
| this job to compiler proper, which actually understands it?
| defen wrote:
| Preprocessing produces a series of tokens, so you would
| implement it as a new type of token. If you're using
| something like `-E` you would just pretty-print it as a
| comma-delimited list of integers. If you're moving on to
| translation phase 7, you'd have some sort of rules in your
| parser about where those kinds of tokens can appear. Just
| like you can't have a return token in an initializer, you
| wouldn't be allowed to have an embed token outside of one
| (or whatever the rules are). And you can directly
| instantiate some kind of node that contains the binary
| data.
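The "new type of token" idea above can be sketched roughly as follows. This is only an illustration under assumptions: `struct token`, `TOK_EMBED`, `print_embed_token` and `init_from_embed_token` are hypothetical names, not taken from GCC, Clang, or the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; a real preprocessor has many more. */
enum token_kind { TOK_INT_LITERAL, TOK_COMMA, TOK_EMBED };

struct token {
    enum token_kind kind;
    const unsigned char *bytes; /* TOK_EMBED only: raw file contents */
    size_t len;                 /* TOK_EMBED only: number of bytes */
};

/* `-E`-style path: spell the embed token out as a comma-delimited list
 * of decimal integers. Returns 0 on success, -1 if `out` is too small. */
int print_embed_token(const struct token *t, char *out, size_t cap)
{
    size_t used = 0;

    assert(t->kind == TOK_EMBED);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < t->len; i++) {
        int n = snprintf(out + used, cap - used, "%s%u",
                         i ? ", " : "", (unsigned)t->bytes[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}

/* Compile path: fill an array straight from the token, with no text
 * round trip. A real compiler would diagnose excess initializers; this
 * sketch just stops at dst_len. Returns the number of bytes copied. */
size_t init_from_embed_token(const struct token *t,
                             unsigned char *dst, size_t dst_len)
{
    assert(t->kind == TOK_EMBED);
    size_t n = t->len < dst_len ? t->len : dst_len;
    if (n > 0)
        memcpy(dst, t->bytes, n);
    return n;
}
```

Both paths read the same stored bytes, so the printed list and the initialized array cannot disagree.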
| yakubin wrote:
| _> you would implement it as a new type of token_
|
| That's a good point. I consider myself debunked.
| [deleted]
| [deleted]
| Cloudef wrote:
| I've always used xxd -i for embedding, doesn't have the mentioned
| problems and works everywhere, as it simply outputs a header file
| with byte array.
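For readers who have not seen the generated-header approach, here is a minimal sketch of what such a generator does, written in C rather than as a shell or Python script. The function name `format_c_array` and the exact layout are assumptions in the spirit of `xxd -i`, not a byte-for-byte reimplementation of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats `data` as C source text of the form
 *   unsigned char NAME[] = { 0x.., ... };
 *   unsigned int NAME_len = N;
 * into `out`. Returns 0 on success, -1 if `out` is too small. */
int format_c_array(char *out, size_t cap, const char *name,
                   const unsigned char *data, size_t len)
{
    assert(out != NULL && name != NULL && (data != NULL || len == 0));

    size_t used = 0;
    int n = snprintf(out, cap, "unsigned char %s[] = {", name);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* Start a new line every 12 bytes; separate entries with commas. */
        n = snprintf(out + used, cap - used, "%s0x%02x%s",
                     (i % 12 == 0) ? "\n  " : " ",
                     (unsigned)data[i],
                     (i + 1 < len) ? "," : "");
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used,
                 "\n};\nunsigned int %s_len = %zu;\n", name, len);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return 0;
}
```

Each input byte becomes about six characters of source text that the compiler then has to lex and parse again, which is where the compile time and memory complaints elsewhere in the thread come from.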
| jsnell wrote:
| The article spends a fair bit of time discussing the build
| speed and memory use problems with that approach. Like, the
| benchmark results [0] linked to from this post literally have
| xxd as one of the rows. It's not a viable option for embedding
| megabytes of data.
|
| [0] https://thephd.dev/embed-the-details#results
| tempodox wrote:
| And even if the data is small enough, not every C programmer
| uses Unix or knows their way around it.
| mikepurvis wrote:
| But you have to have build system stuff for that and it's
| obviously non portable.
| Cloudef wrote:
| True, I don't personally ever have a problem with this because I
| always compile from a Unix system anyway. (Even for Windows)
| pdw wrote:
| Well, congratulations, you now have a build dependency on Vim.
| (xxd is not a standard tool, it ships with Vim.)
|
| It's also only suitable for tiny files: compile time and RAM
| requirements will blow up once you go beyond a couple of
| megabytes.
| mariusor wrote:
| > It's also only suitable for tiny files: compile time and
| RAM requirements will blow up once you go beyond a couple of
| megabytes.
|
| Do you know what makes it so? Is there a technical argument
| why the compiler could do better, except maybe for xxd not
| being specifically optimized for this use case?
| tialaramex wrote:
| The compiler has an as-if rule on its side.
|
| It's allowed to do whatever it wants so long as the results
| are _as-if_ it did what the standard says. So even though
| the standard says this is making a big list of integers
| like your xxd command, the compiler won't do that, because
| (as a C compiler) it knows perfectly well it would just
| parse those integers into bytes again, just like the ones
| it got out of the binary file. It knows the integers would
| all be valid (it made them) and fit in a byte (duh) and so
| it can skip the entire back-and-forth.
| Cloudef wrote:
| Yeah, these are reasonable arguments against it
| nicoburns wrote:
| The one in the original article is that it performs badly
| for large files.
| yakubin wrote:
| That's putting it mildly:
| <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...>
| sylware wrote:
| There is way too much in C already.
|
| The first commandment of C is: 'writing a naive C compiler should
| be "reasonable" for a small team or even one individual'. That's
| getting harder and harder, longer and longer.
|
| I did move from C being "the best compromise" to "the least bad
| compromise".
|
| I wish we had a "C-like" language, which would kind of be a high-
| level assembler which: has no integer promotion or implicit
| casts, has compile-time/runtime casts (without the horrible c++
| syntax), has sized primitive types (u64/s64,f32/f64,etc) at its
| core, has sized literals (42b,12w,123dw,2qw,etc), has no
| typedef/generic/volatile/restrict/etc well that sort of horrible
| things, has compile-time and runtime "const"s, and I am
| forgetting a lot.
|
| From the main issues: the kernel gcc C dialect (roughly speaking,
| each linux release uses more gcc extensions). Aggressive
| optimizations can break some code (while programming some hardware
| for instance).
|
| Maybe I should write assembly, expect RISC-V to be a success, and
| forget about all of this.
| armchairhacker wrote:
| I wish we had something like typed Lua without Lua's weird
| quirks (e.g. indexing by 1), designed with performance
| enhancement and safety in mind, and with the features you
| mention.
|
| But like Lua, the base compiler is really small and simple and
| can be embedded. And it's "pseudo-interpreted": ultimately it's
| an ahead-of-time language to support things like function
| declarations after references and proper type checking, but
| compiling unoptimized is practically instant and you can load
| new sources at runtime, start a REPL, and do everything else
| you can with an interpreted language. Now having a simple
| compiler with all these features may be impossible, so
| worst-case there is just a simple interpreter, a separate
| type-checker, and a separate performance-optimized JIT compiler
| (like Lua and LuaJIT).
|
| Also like Lua and high-level assembly, debugging unoptimized is
| also really simple and direct. By default, there aren't
| optimizations which elide variables, move instructions around,
| and otherwise clobber the data so the debugger loses
| information, not even tail-call optimization. Execution is so
| simple someone will create a reliable record-replay, time-
| travel debugger which is fast enough you could run it in
| production, and we can have true in-depth debugging.
|
| Now that I've written all that, I realize this is basically ML.
| But OCaml still has weird quirks (the object system), SML too
| honestly, and I doubt their compilers are small and simple
| enough to be embedded. So maybe a modern ML dialect with a few
| new features and none of the more confusing things which are in
| standard ML.
| elcritch wrote:
| Check out Nim! It does much of what you describe and it's
| great. The core language is fairly small (not quite lua
| simple but probably ML comparable). It compiles fast enough
| that a Nim repl like `inim` is useable to check features and
| for basic maths, though it requires a C compiler, but TCC [4]
| works perfectly. Essentially Nim + tcc is pretty close to
| your description, IMHO. Though I'm not sure TCC supports
| non-x86 targets.
|
| I've never used it but Nim does support some hot reloading as
| well [3]. It also has a real VM if you want to run user
| scripts and has a nice library for it [1]. It's not quite Lua
| flexible but for a generally compiled language it's
| impressive.
|
| Recently I made a wrapper to embed access to the Nim
| compiler's macros at runtime [2]. It took 3-4 hours probably
| and still compiles in 10s of seconds despite building in a
| fair bit of the compiler! It was useful for making a code
| generator for a serializer format. Though I'm not sure it's
| small enough to live on even beefy m4/m7 microcontrollers.
| Though I'm tempted to try.
|
| 1: https://github.com/beef331/nimscripter
| 2: https://github.com/elcritch/cdecl/blob/main/src/cdecl/compil...
| 3: https://nim-lang.org/docs/hcr.html
| 4: https://bellard.org/tcc/
| agluszak wrote:
| > I wish we had a "C-like" language, which would...
|
| How about https://ziglang.org/ ?
| [deleted]
| jcranmer wrote:
| > I wish we had a "C-like" language, which would kind of be a
| high-level assembler which: has no integer promotion or
| implicit casts, has compile-time/runtime casts (without the
| horrible c++ syntax), has sized primitive types
| (u64/s64,f32/f64,etc) at its core, has sized literals
| (42b,12w,123dw,2qw,etc), has no
| typedef/generic/volatile/restrict/etc well that sort of
| horrible things, has compile-time and runtime "const"s, and I
| am forgetting a lot.
|
| Unsafe Rust code I think fits this model better than C does: it
| relies on sized primitive types, it has support for both
| wrapping and non-wrapping arithmetic rather than C's quite
| frankly odd rules here, it has no automatic implicit casts, it
| has no strict aliasing rules.
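As one reading of the "odd rules" mentioned above, the sketch below shows how C treats small unsigned types, unsigned wraparound, and signed overflow differently. The function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Both operands are promoted to int before the addition, so the
 * result is not reduced modulo 256. */
int promoted_sum(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Converting the int result back to uint8_t reduces it modulo 256. */
uint8_t narrowed_sum(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned int arithmetic is defined to wrap modulo UINT_MAX + 1. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Signed overflow is undefined behavior, so the check has to happen
 * before the addition. Returns 1 and stores the sum on success,
 * 0 if the addition would overflow. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```

With these definitions, `promoted_sum(200, 100)` yields 300 while `narrowed_sum(200, 100)` yields 44, even though both start from the same `uint8_t` operands.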
| ArrayBoundCheck wrote:
| Are you a programmer? Embed is the easiest feature to implement
| that I have ever heard
| AlexanderDhoore wrote:
| GCC or Clang with all warnings turned on will give you almost
| what you want. -Wconversion -Wdouble-promotion and 100s of
| others. A good way to learn about warning flags (apart from
| reading the docs) is Clang -Weverything, which will give you
| many, many warnings.
| quelsolaar wrote:
| I agree (with a lot of caveats), but a key value of C is that
| we do not break people's code and that means that we can't easily
| remove things. If we do, we create a lot of problems. This
| makes it very difficult to keep the language as easy to
| implement as we would like. As a member of the WG14, I intend
| to propose that we do make this our prime priority going
| forward.
| jessermeyer wrote:
| Not an exact match, but a close one: https://odin-lang.org/
| trinovantes wrote:
| Off the top of my head, I think there's some niche use in
| embedding shaders so that they don't need to be stored as strings
| (no IDE support) or read at runtime (slower performance).
| matthews2 wrote:
| Nice for binary shaders too, e.g. SPIR-V bytecode generated by
| glslc.
| phoboslab wrote:
| You can get some IDE support with a simple preprocessor
| macro[1].
|
| It's a crutch, but at least you don't need to stuff the shader
| into multiple "strings" or have string continuations (\\) at
| the end of every line. Plus you get some syntax highlighting
| from the embedding language. I.e. the shader is highlighted as
| C code, which for the most part seems to be close enough.
|
| [1]
| https://github.com/phoboslab/pl_mpeg/blob/master/pl_mpeg_pla...
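A minimal sketch of that kind of macro, assuming it is built on the preprocessor's stringification operator. `SHADER_SOURCE` and the shader body are illustrative names and content, not copied from the linked file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro name: turns its arguments into one string literal. */
#define SHADER_SOURCE(...) #__VA_ARGS__

/* The editor sees the shader as ordinary C-like tokens, so it gets
 * highlighting; the compiler sees a single string literal. */
static const char *const fragment_shader = SHADER_SOURCE(
    void main() {
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
);

/* Accessor so other translation units can fetch the source text. */
const char *get_fragment_shader(void)
{
    return fragment_shader;
}
```

Stringification collapses each run of whitespace, including newlines, into a single space, so the resulting shader is one long line. Lines starting with `#`, such as a GLSL `#version` directive, cannot go inside the macro arguments and have to be prepended as a normal string.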
| avianes wrote:
| Another typical use is embedding a public-key in an application
| or firmware.
| spicyjpeg wrote:
| There are a lot of use cases for baking binary data directly
| into the program, especially in embedded applications. For
| instance, if you are writing a bootloader for a device that has
| some kind of display you might want to include a splash screen,
| or a font to be able to show error messages before a filesystem
| or an external storage medium is initialized. Similarly, on a
| microcontroller with no external storage at all you need to
| embed all your assets into the binary; the current way to do
| that is to either use whatever non-standard tools the
| manufacturer's proprietary toolchain provides, or to use xxd to
| (inefficiently) generate a huge C source file from the contents
| of the binary file. Both require custom build steps and neither
| is ideal.
| eps wrote:
| Re: #embed </dev/urandom>
|
| Just a random thought, but I'd expect a compiler to do exactly
| what's described if I tell it:
|
| static char foo[123] = {
| #embed </dev/urandom>
| };
|
| This would address the most common case with infinity files, and
| then just let the compiler error out if the array size is not
| specified.
| ealexhudson wrote:
| The preprocessor needs to run before the compiler, though, and
| isn't complex enough to understand the context of the code that
| it's in. That would be a substantially complex thing to
| implement.
| eps wrote:
| This will indeed require delaying population of the array to
| the compilation stage. However it's worth the convenience and
| the succinctness of the syntax, and it's not _that_
| substantially complex to implement.
| tazjin wrote:
| Reproducible builds crowd wailing in agony
| huhtenberg wrote:
| The context is that of infinite files, not of the urandom
| specifically. Give the linked post a read for details.
| astrange wrote:
| They just need a reproducible urandom.
| elcritch wrote:
| Looks like you forgot to add your sponsor disclaimer
| "message sponsored by NSA". ;)
| kibwen wrote:
| I would expect that to produce an error, though. If I had a
| regular file that was not infinite in size, and I specified the
| wrong length for the array, I would find it more useful to have
| the compiler inform me as to the discrepancy rather than
| truncate my file.
| eps wrote:
| A warning, not an error. Both under and over-population can
| be valid use cases.
| owalt wrote:
| Honestly I'm usually very wary of additions to C, as one of its
| greatest strengths (to me) is how rather straightforward it is as
| a language in terms of conceptual simplicity. There just aren't
| that many big concepts to understand in the language. (On the
| other hand there's _many_ footguns but that's another issue.)
|
| That said, to me this seems like a great addition to the
| language. It's very single-purpose in its usage (so it doesn't
| seem to add much conceptual complexity to the language) and it
| replaces something genuinely painful (arcane linker hacks). I'm
| very much looking forward to using this as I often make single-
| executable programs in C. The only thing that's unfortunate is
| I'm sure it'll take decades before proprietary embedded
| toolchains add support for this.
| pjmlp wrote:
| C23 and C26 are basically heading into C++ without classes.
| timhh wrote:
| Ha I suggested this on the C++ proposals mailing list 7 years
| ago:
|
| https://groups.google.com/a/isocpp.org/g/std-proposals/c/b6n...
|
| Enjoy the naysayers if you like! I'm glad someone spent the time
| and effort to push past them. Bit too late for me - I have moved
| on to Rust which had support for this from version 1.0.0.
|
| > There's also the standard *nix/BSD utility "xxd".
|
| > Seems like the niche is filled. Or, at least, if you want to
| claim that
|
| > (A) XPM
|
| > (B) incbin
|
| > (C) "xxd -i"
|
| > (D) various ad-hoc scripts given in
| http://stackoverflow.com/questions/8707183/script-tool-to-co...
|
| >...do NOT completely fill this evolutionary niche
|
| > This ultimately would encourage a weird sort of resource
| management philosophy that I think might be damaging in the long
| run.
|
| > Speaking from experience, it is a tremendously bad idea to bake
| any resource into a binary.
|
| > I'll point out that this is a non-issue for Qt applications
| that can simply use Qt's resources for this sort of business.
|
| (Though credit to Matthew Woehlke, he did point out a solution
| which is basically identical to #embed)
|
| > I find this useless specially in embedded environments since
| there should be some processing of the binary data anyway, either
| before building the application
|
| In fairness there was a decent amount of support. But given the
| insane amount of negativity around an obviously useful feature I
| gave up.
|
| I wonder if there was a similar response to the proposal to
| include `string::starts_with()`...
| einpoklum wrote:
| > > Speaking from experience, it is a tremendously bad idea to
| bake any resource into a binary.
|
| What a pompous douche whoever wrote that was.
|
| > > This ultimately would encourage a weird sort of resource
| management philosophy that I think might be damaging in the
| long run.
|
| So, this might be a valid point, although not enough to reject
| the feature for. It's true that it's a feature that could
| potentially see over-use and ab-use. But then, so did templates
| :-P
| boywitharupee wrote:
| what is the Rust equivalent for #embed?
| guipsp wrote:
| https://doc.rust-lang.org/std/macro.include_bytes.html
| dbrgn wrote:
| And there's also
| https://doc.rust-lang.org/std/macro.include_str.html for strings.
| zRedShift wrote:
| https://doc.rust-lang.org/std/macro.include_bytes.html
| juunpp wrote:
| Looks great. I've been writing cmake hacks to include assets in
| executables for too long.
| zzo38computer wrote:
| It is something that I had wanted in C too, for a while, so I am
| glad that they added this #embed command.
| ascar wrote:
| This is a cool feature, but the author doesn't do himself any
| favors with his style of writing that greatly overestimates the
| importance of his own feature in the great scheme of things.
| Remarks like "an extension that should've existed 40-50 years
| ago" make me think, if we should've really bothered all compiler
| vendors with implementing this 40-50 years ago. After all, you
| can already a) directly put your binary data in the source file
| like shown after the preprocessor step and b) read a file at
| runtime. I'm not saying this isn't useful, but it's a rather
| niche performance improvement than a core language feature.
| someweirdperson wrote:
| How to read unsigned data? Is there a standardized parameter, or
| does this require a vendor extension?
| jonathrg wrote:
| You just make your array type uint8_t or whatever you need as
| long as it supports integer literals. See section 4 in
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
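A hedged sketch of the point above: the same `#embed` directive can initialize arrays of different integer element types, with one element per byte of the file. The example embeds its own source via `__FILE__` so that it needs no extra file, and guards the directive with C23's `__has_embed` so that older compilers fall back to a one-element placeholder. The `EMBED_SELF` macro and the placeholder are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use #embed only when the compiler advertises it and can find this
 * very source file; otherwise fall back to a placeholder element. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
#    define EMBED_SELF 1
#  endif
#endif

/* One uint8_t element per byte of the embedded file. */
static const uint8_t as_bytes[] = {
#ifdef EMBED_SELF
#  embed __FILE__
#else
    0
#endif
};

/* Same directive, wider element type: still one element per byte. */
static const int as_ints[] = {
#ifdef EMBED_SELF
#  embed __FILE__
#else
    0
#endif
};

size_t byte_elements(void) { return sizeof as_bytes / sizeof as_bytes[0]; }
size_t int_elements(void)  { return sizeof as_ints / sizeof as_ints[0]; }
```

The element type comes from the declaration rather than from the directive, so no signedness parameter is needed for this case.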
| someweirdperson wrote:
| Scary, it's as if the preprocessor has become type-aware. I
| guess I better don't imagine the result of the preprocessing
| to look similar to and following the same rules as something
| I would have written by hand. This might make manual
| inspection of the preprocessed file a bit painful.
| elcritch wrote:
| It's not really a pre-processor stage. Probably better to
| think of it more like a pointer cast to some binary blob.
| Though it'd be interesting to see what `gcc -E` would
| produce.
| buserror wrote:
| Haven't the people making the standards other things to do, like,
| integrating _useful features_ instead of duplicating incbin.h [0]
| years after that feature worked?
|
| https://github.com/graphitemaster/incbin/blob/main/incbin.h
| Jasper_ wrote:
| That doesn't work on MSVC without an external source-generating
| tool.
| iasay wrote:
| This is wrong.
|
| It belongs in the linker and you can pull the symbol it creates
| in with extern. I've been doing this for about 25 years.
| dafelst wrote:
| ...and now your solution is non-portable and as a cross-
| platform developer you need to implement N different build
| scripts. This is far more elegant.
| iasay wrote:
| No C toolchain is portable!
|
| If that's a problem, use Go or another higher level language.
| xigoi wrote:
| Isn't one of the supposed advantages of C that it works on
| many platforms?
| thraw_oway wrote:
| Personally I feel the C committee should've disbanded after the
| first standard (the C++ one, after the 2003 technical
| corrigendum). I didn't mind C99 much, but it looks like
| C(++)reeping featuritis is a nasty habit.
|
| These gratuitous standards prompt newbies to use the new features
| (it's "modern") and puzzled veterans to keep up and reinternalize
| understanding of new variants of languages they've been using for
| decades. There's no real improvement, just churn. Possibly it's
| one of the instruments of ageism. More incompatibility with
| existing software and existing programmers.
| tempodox wrote:
| > Even among people who control all the cards, they are in many
| respects fundamentally incapable of imagining a better world or
| seizing on that opportunity to try and create one, let alone
| doing so in a timely fashion.
|
| That does sound soul-crushing. Congrats on this achievement!
| quelsolaar wrote:
| This is simply wrong. We (the ISO wg14) don't hold the cards,
| compilers are free to implement what ever they want, users are
| free to use what ever tools or languages they want.
|
| We exist only as long as we are trusted to be good stewards,
| and only go forward with the consensus of the wider community.
| unreal37 wrote:
| You're both right.
|
| It's amazing that you and the ISO team are good stewards of
| the C standard. Thank you for being part of that.
|
| And it can also be true that it was "hell" and "hardly worth
| it" for the OP to get a new feature added to the language. I
| believe it was a miserable experience that has him
| questioning how he spends his time.
|
| Both can be true. Thank you for your efforts. And thank the
| OP for his efforts too.
| gtirloni wrote:
| _> (the ISO wg14) don't hold the cards_
|
| That "standard" card seems to be a pretty huge one though.
| quelsolaar wrote:
| Yeah, but a document doesn't compile c code so best to stay
| humble. :-)
| morelisp wrote:
| > > Even among people who control all the cards, they are in
| many respects fundamentally incapable of imagining a better
| world or seizing on that opportunity to try and create one,
| let alone doing so in a timely fashion.
|
| > This is simply wrong. We (the ISO wg14) don't hold the
| cards, compilers are free to implement what ever they want,
| users are free to use what ever tools or languages they want.
|
| This is an incredibly oblivious realization of JeanHeyd's
| point.
| moffkalast wrote:
| I think in our reality the prerequisite for holding all the
| cards is the lack of competence in knowing how to improve the
| world. We've gotten where we are now through sheer force of
| will of those that are empty handed.
| jeffreygoesto wrote:
| The reasonable man adapts himself to the world: the
| unreasonable one persists in trying to adapt the world to
| himself. Therefore all progress depends on the unreasonable
| man.
|
| George Bernard Shaw
| [deleted]
| kibwen wrote:
| This serves the same use as Rust's `include_bytes!` macro, right?
| Presumably most people just use this feature as a way to avoid
| having to stuff binary data into a massive array literal, but in
| our case it's essential because we're actually using it to stuff
| binaries from earlier in our build step into a binary built later
| in the build step. Not something you often need, but very handy
| when you do.
| kzrdude wrote:
| for both Rust and C, these features "just" make something you
| could otherwise do with the build system and generated code
| easier, I think.
| masklinn wrote:
| As the article quotes, in C the lack of standardisation makes
| this tricky when you want to support more than one compiler,
| or even when you want to support just one compiler (cf email
| about the hacks to make it work on GCC with PIE).
| tialaramex wrote:
| This has different affordances than std::include_bytes! but I
| agree that if you were writing Rust and had this problem you'd
| reach for std::include_bytes! and probably not instead think
| "We should have an equivalent of #embed".
|
| include_bytes! gives you a &'static [u8; N] which for non-Rust
| programmers means we're making a fixed size array (the size of
| your file) full of unsigned 8-bit integers (ie bytes) which
| lives for the life of the program, and we get an immutable
| reference to it. Rust's arrays know how big they are (so we can
| ask, now or later) but cannot grow.
|
| #embed gets you a bunch of integers. The as-if rule means your
| compiler is likely to notice if what you're actually doing is
| putting those integers into an array of unsigned 8-bit integers
| and just stick all the file bytes in the array, short cutting
| what you wrote, but you could reasonably do other things,
| especially with smaller files.
| quelsolaar wrote:
| I represent Sweden in the ISO WG14, and I voted for the inclusion
| of Embed into C23. It's a good feature. But it's not a necessary
| feature and I think JeanHeyd is wrong in his criticism of the
| pace of wg14 work. I have found everyone in wg14 to be very
| hardworking and serious about their work.
|
| C's main strength is its portability and simplicity. Therefore
| we should be very conservative, and not add anything quickly.
| There are plenty of languages to choose from if you want a
| "modern" language with lots of conveniences. If you want a truly
| portable language there is really only C. And when I say truly, I
| mean for platforms without file systems, or operating systems or
| where bytes aren't 8 bits, that don't use ASCII or Unicode,
| where NULL isn't on address 0 and so on.
|
| We are the stewards of this, and the work we put in, while large,
| is tiny compared to the impact we have. Any change we make
| needs to be addressed by every compiler maintainer. There are
| millions of lines of code that depend on every part of the
| standard. A 1% performance loss is millions of tons of CO2
| released, and billions in added hardware and energy costs.
|
| In this privileged position, we have to be very mindful of the
| concerns of our users, and take the time to look at every corner
| case in detail before adding any new features. If we add
| something, then people will depend on its behavior, no matter how
| bad, and we therefore will have great difficulty in fixing it in
| the future without breaking our users work, so we have to get it
| right the first time.
| blippage wrote:
| Seems like a nice addition. Much better than futzing around
| with xxd and suchlike.
| pif wrote:
| Thank you for your post!
|
| Thank you especially for reminding everybody that programming
| is much more than web programming and information systems.
| quelsolaar wrote:
| Thank you,
|
| It's also worth remembering that a lot of higher level
| languages have runtimes / VMs that are implemented in C. Web
| applications rely heavily on databases, JavaScript VMs,
| network stacks, system calls and operating system features,
| all of which are implemented in C.
|
| If you are a software developer and want to do something
| about climate change, consider becoming a compiler engineer.
| If you manage to get a couple of tenths of a percent
| performance increase in one of the big compilers during your
| career, you will have materially impacted global warming.
| Compiler engineers are the unsung heroes of software
| engineering.
| ErikCorry wrote:
| No JavaScript VM is implemented in C. They are all written
| in a language that's a bit like C++ but has no exceptions
| and relies on lots of compiler behaviour that is not
| defined by the C++ standard.
| woodruffw wrote:
| Hm? I can think of two pure C JS engines off the top of
| my head: Duktape and Elk. I believe Samsung or another
| vendor also has their own; they're all somewhat common in
| the embedded space.
| ErikCorry wrote:
| Fair. I'm not familiar with the tiny JS VMs. But really
| the main point stands: It's not possible to build a
| decent GC without violating strict aliasing so C and C++
| as standardized are not suitable for this task.
| xyzzy_plugh wrote:
| I guess this doesn't exist then:
| https://bellard.org/quickjs/
| astrange wrote:
| This is not written in C if it doesn't pass
| UBSan/ASan/Frama-C and co. It's written in a language
| that just happens to look like C.
| icedchai wrote:
| Close enough. Will you claim the Linux kernel isn't C
| because it's compiled with -fno-strict-overflow and
| -fno-strict-aliasing?
| astrange wrote:
| Yes, that's why it only supports specific C compilers.
|
| Anything that includes its own memory allocator (that
| doesn't call malloc()) is probably not implemented in
| standardized C.
| icedchai wrote:
| It's still "C", even if it's a specific dialect. Vendor
| specific C extensions have existed forever.
| morelisp wrote:
| This reasoning has always rung mostly hollow for compiler
| features (#embed, typeof) rather than true language features
| (VLAs, closures).
|
| Modern toolchains must exist for marginal systems. It's
| understandable to want to write code for a machine from 1975,
| or a bespoke MCU, on a modern Thinkpad. It is not necessary to
| support _a modern compiler running on the machine from 1975 /
| bespoke MCU_. You might as well argue against readable
| diagnostic messages because some system out there might not be
| able to print them!
| tialaramex wrote:
| I could also see this, though perhaps it's a step too far for
| C, applying to Unicode encoding of source files.
|
| The 1970s mainframe this program will _run_ on has no idea
| that Unicode exists. Fine. But, the compiler I'm using,
| which must have been written in the future after this was
| standardised, definitely _does_ know that Unicode exists. So
| let's just agree that the program's source code is always
| UTF-8 and have done with it.
|
| Jason Turner has a talk where the big reveal is, the reason
| the slides were all retro-looking was that they were rendered
| in real time on a Commodore 64. The program to do that was
| written in modern C++ and obviously can't be compiled on a
| Commodore 64 but it doesn't need to be, the C64 just needs to
| _run_ the program.
| morelisp wrote:
| This seems a step too far for me. Compatibility with
| existing source files which may not be trivial to migrate
| does also matter. (Well, except for `auto`, C23 was right
| to fuck with that.) At the very least you'll need flags
| that mean "do whatever you did before".
| tialaramex wrote:
| Sure, I don't seriously expect C to embrace that, even
| though I think it'd be worth the effort I'm sure plenty
| of their users don't.
|
| For auto I think the argument is that if you poke around
| in real software the storage specifier was basically
| never used because it's redundant. That's the rationale
| WG21 had to abolish its earlier meaning in C++ before
| adding type deducing auto.
|
| As I read it, N2368 (which I think is what they took?)
| gives C something more similar to the type inference
| found in many languages today (which gives you a
| diagnostic if it can't infer a unique type from available
| information) whereas C++ got deduction which will choose
| a type when ambiguous, increasing the chance that a
| maintenance programmer misunderstands the type of the
| auto variable.
|
| However it got inference from return, which I think is a
| misfeature (although I think I can see why they took it,
| to make generics nicer). With inference from return, to
| figure out what foo(bar)'s type is, I need to _read the
| implementation of foo_ because I have to find out what
| the return statements look like. It's more common today
| to decide we should know from the function's signature.
|
| This is somewhat mitigated by the fact that N2368 says
| auto won't work in extern context, so we can't just
| blithely say "This object file totally has a function
| which returns _something_ and you should figure out what
| type that is" because that's clearly nonsense. You will
| _have_ the source code with the return statements in it.
| uecker wrote:
| We took N3007 which does not have inference on return
| etc. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm
| tialaramex wrote:
| Ah, great. I don't write very much C any more, but the
| auto described in N3007 (well, the skim of N3007 I just
| did) feels very much like what I'd want from this feature
| in C _and_ perhaps more importantly, what I'd assume
| auto does if I see it in a snippet of somebody else's
| code I'm trying to understand.
| eps wrote:
| > _where NULL isn't on address 0_
|
| Isn't there literally a single GPU for which it is true?
|
| Asking because every time this surfaces, someone inevitably asks
| for an example, and the only example I've seen over the years
| was of one specific (Nvidia?) GPU that uses NULL of 0xFFFFFFFA
| (or something similar).
|
| That is, do you know how common it is for NULL to _not_ be 0?
| eslaught wrote:
| It's true (in some memory spaces) in AMD GPU too:
|
| https://llvm.org/docs/AMDGPUUsage.html#memory-spaces
| eps wrote:
| That's the one!
| Bjartr wrote:
| Here is an answer that includes a few example systems from
| comp.lang.c
|
| https://c-faq.com/null/machexamp.html
| xg15 wrote:
| > _And when I say truly, I mean for platforms without file
| systems_
|
| Are we really talking about _compiling_ on such platforms?
| And if that's the case, how would #include work but not
| #embed?
| quelsolaar wrote:
| No, I'm mainly talking about targeting. My point is not so
| much about embed, but rather that almost anything you think
| you know about how computers work isn't necessarily true,
| because C targets such a wide group of platforms. Almost
| always when someone raises a question along the lines of
| "No platform has ever done that, right?", someone knows of
| a platform that has done that, and it turns out to have very
| good reasons for doing so.
|
| For this reason, everything is much more complicated than you
| first think. For me joining the WG14 has been an amazing
| opportunity to learn the depths of the language. C is not big
| but it is incredibly deep. The answer to "Why does C not just
| do X?" is almost always far more complicated and thought
| through than one thinks.
|
| Everyone in the wg14 who has been around for a while, knows
| this, and therefore assumes that even the simplest addition
| will cause problems, even if they can't come up with a reason
| why.
| rootbear wrote:
| I was on X3J11, the ANSI committee that created the
| original C standard and my experience was similar. It was a
| great opportunity to learn C at depth and get an
| understanding of many of the subtle details. We rejected a
| great many suggestions because our mandate was to
| standardize existing practice, address some problem areas,
| and not get too creative. (We occasionally did get too
| creative. The less said about noalias the better.)
| uecker wrote:
| We are still fixing bugs in restrict...
| xg15 wrote:
| Yeah, but then I have to side with the author - how could a
| _compile time only_ feature which doesn't even introduce
| new language semantics possibly be affected by the
| multitude of build targets?
|
| Unless "it's more complicated than you think" is the
| catchall answer to any and all proposals for new language
| features. In which case, how to make progress at all?
|
| Also, I find the point about the language being "truly
| portable" a bit ironic, considering the whole rationale of
| #embed was that the _use case_ of "embed large chunks of
| binary data in the executable" was completely non-portable
| and required adding significant complexity to the build
| scripts if you were targeting multiple platforms.
|
| It's easy to make a language portable on paper if you
| simply declare the non-portable parts to not be your
| responsibility.
|
| > _Everyone in the wg14 who has been around for a while,
| knows this, and therefore assumes that even the simplest
| addition will cause problems, even if they can't come up
| with a reason why._
|
| That's not something to be proud of.
| quelsolaar wrote:
| > That's not something to be proud of.
|
| It's learning from old mistakes.
|
| Look at embed as an example. Look how complex it is,
| dealing with empty files, different ways of opening
| files, files without lengths, null termination... the
| list goes on. This is typical of a proposal for C, it
| starts out simple "why can't I just embed a file into my
| code?" and then it gets complicated because the world is
| complicated.
|
| I worry a lot about people loading in text files and
| forgetting to add null termination to embeds. I would not
| be surprised if in a few years that provides a big
| headline on Hacker news, about how that shot someone in
| the foot and how C isn't to be trusted. The details
| matter.
| duped wrote:
| > I worry a lot about people loading in text files and
| forgetting to add null termination to embeds. I would not
| be surprised if in a few years that provides a big
| headline on Hacker news, about how that shot someone in
| the foot and how C isn't to be trusted. The details
| matter.
|
| The compiler should insert the null terminator if it's
| not in the embedded file.
| nyanpasu64 wrote:
| I don't think adding a null terminator is useful for
| binary files which are not null-terminated strings, and
| may even have embedded 0 bytes in the middle.
| duped wrote:
| sure, but if it's a string that requires it to be null
| terminated, there's no reason the compiler can't solve
| that problem
| quelsolaar wrote:
| This is another issue here. If loads of compilers start
| doing this then programs start relying on it and then it
| becomes a de-facto undocumented feature. That means if
| you move compilers/platforms you get new issues. A lot of
| what the C standard does is mopping up these kinds of
| issues.
| duped wrote:
| Then require compilers implement it in the standard. I
| think it's really backwards to ignore the tool chain and
| its ability to prevent bugs from entering software.
|
| It's stuff like this that leaves us writing C to rely on
| implementation defined behavior. Underspecification that
| leaves easy holes to fill means they will be filled by the
| compiler, and we will rely on them. Just like type punning.
| quelsolaar wrote:
| This is the problem. Things get complicated fast. If we
| mandate null termination, then it's impossible to have
| multiple embeds in a row to concatenate files, or we
| somehow need to have rules for when to add null termination
| and when not. These rules in turn are not going to be read by
| all users, so some people will just assume that embed
| always adds null termination when it doesn't, and then we
| are back to square one. The more we add the more corner
| cases there are.
| elcritch wrote:
| Why assume the data should be null terminated? It's an
| array with a known compile time size. Binary data often
| needs to include 0 / NULL.
| pbohun wrote:
| Thanks for your work on the C standard. Any changes that are
| made will remain forever, so I'm glad the committee takes this
| seriously.
| AlexanderDhoore wrote:
| """Codify existing practice to address evident deficiencies.
| Only those concepts that have some prior art should be
| accepted. (Prior art may come from implementations of languages
| other than C.) Unless some proposed new feature addresses an
| evident deficiency that is actually felt by more than a few C
| programmers, no new inventions should be entertained."""
|
| Source: Rationale for International Standard -- Programming
| Languages -- C https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.1...
|
| I don't know if this rationale is still followed, but I think
| it applies here. We need to be cautious when adding new
| features to C.
| oxff wrote:
| People who call C simple have some weird definition of simple.
| How many C programs contain UB or are pure UB? Probably over
| 95%+. Language's not simple at all.
| bigdict wrote:
| A straight razor is simple and that's why it's the easiest to
| cut yourself with. An electric razor is much safer precisely
| because much engineering went into its creation.
| [deleted]
| ErikCorry wrote:
| > for platforms without file systems, or operating systems or
| where bytes aren't 8 bits, that doesn't use ASCII or Unicode,
| where NULL isn't on address 0 and so on.
|
| This seems totally misconceived to me as a basis for
| standardizing a language in 2022. You are optimizing for the
| few at the expense of the many.
|
| I get that these strange architectures need a language. Why
| does it have to be C or C++? They can use a nonstandardized
| variant of C, but why hobble the language that is 99% used on
| normal hardware with misfeatures that are justified by truly
| obscure platforms.
| ryukoposting wrote:
| > This seems totally misconceived to me as a basis for
| standardizing a language in 2022. You are optimizing for the
| few at the expense of the many.
|
| Sure, but it's the same line of reasoning that made C
| relevant in the first place, and keeps it relevant today -
| some library your dad wrote for a PDP-whatever is still
| usable today on your laptop running Windows 10.
|
| Because it's antiquated, it's also extremely easy to support,
| and to port to new and/or exotic platforms.
| ErikCorry wrote:
| The library my dad wrote (lol) for the PDP-11 is probably
| full of undefined behaviour and won't work now that
| optimizers are using any gap in the standard to miscompile
| code.
| flqn wrote:
| What a useless and jaded assumption that code written in
| the past is bad.
| jolux wrote:
| The assumption being made here is "any useful C program
| relies on undefined behavior" which is pretty much true.
| ErikCorry wrote:
| Yes and I'm sure it's doubly true of code that was
| written before the C standards were written.
| ErikCorry wrote:
| I certainly didn't say it was bad. Just that it went
| outside the boundaries of a standard that was written 25
| years later.
| jolux wrote:
| > using any gap in the standard to miscompile code
|
| For code to be miscompiled, there has to be a definition
| of what correctly compiling it would mean, and if there
| were, it would not be undefined behavior.
| ErikCorry wrote:
| Instead of "miscompiled" you can read "Doesn't do what it
| did on the PDP-11 with the compilers of the time".
| temac wrote:
| The standard doesn't do that often, but it does
| sometimes. E.g. realloc to null which was previously
| defined, and is now UB :(
| ErikCorry wrote:
| We are talking about code written before the standard so
| every bit of UB in the standard is in play here.
|
| Eg the fact that overflowing a signed int can cause the
| compiler to go amuck would certainly be a surprise to the
| person who wrote code for the PDP-11.
| xg15 wrote:
| Yeah, but if that definition is constantly shifting, you
| cannot expect it to work with existing codebases.
| jolux wrote:
| Well yeah -- therein lies the problem with a language
| with pervasive undefined behavior.
| raverbashing wrote:
| > PDP-whatever is still usable today on your laptop running
| Windows 10
|
| No, it isn't. Go on. Go ahead and try
|
| See it break in a million weird ways. (Or, for a start, it
| will have the K&R C format, which is a pain to maintain)
|
| "If your computer doesn't have 8-bit bytes" at this day and
| age? It belongs in a dumpster, sorry.
|
| (I think the only "modern" arch that does this is PIC, and
| even only for program data - where you're not running
| anything "officially" C89 or later)
| icedchai wrote:
| When I first learned C, it was K&R, pre-ANSI with old
| style function parameters. It is trivial to convert to
| ANSI C. The truth is C has barely changed in decades.
| quelsolaar wrote:
| It doesn't have to be C, but as of today there is no other
| option. No one is coming up with new languages with these
| kinds of features so C it is. People should, but language
| designers today are more interested in memory safety and
| clever syntax, than portability.
|
| I would like to caution you against thinking that these weird
| platforms are old machines from the 60s that only run in
| museums. For instance many DSPs have 32bit bytes (smallest
| memory unit that can be individually addressed), so if you
| have a pair of new fancy noise canceling headphones, then it's
| not unlikely you are wearing a platform like that on your
| head every day.
| chrisseaton wrote:
| > It doesn't have to be C, but as of today there is no
| other option
|
| Isn't C99 an option? Why can't more advanced things go into
| newer C and people who genuinely need something more basic
| can use C99.
| quelsolaar wrote:
| We can! Many of us still use C89. (C99 has problems, like
| variable length arrays.)
|
| The reality however is that you can't escape newer
| versions entirely. Not all code you interact with was
| written in the subset you want, so when your favorite OS
| or library starts using header files with newer features
| you need to run that version of the language too.
|
| Another less appreciated detail, is that a lot of WG14
| work is not about adding new features but clarifying how
| existing features are meant to work. When the text is
| clarified this gets back-ported to all previous versions
| of C in major compilers. An example of this is
| "provenance". This is a concept that has implicitly been
| standard since the first ISO standard, but only now is
| becoming formalized. This means that if you want to
| adhere to the C89 standard, you will find a lot of
| clarifications about how things should work in the C23
| standard.
| kevin_thibedeau wrote:
| VLAs are optional since C11. There is no reason why a
| vendor can't support a modern language.
| duped wrote:
| If it were to focus on stability, it would probably be LLVM
| IR. That said, there's plenty of C++ being written for
| these applications. And Ada.
|
| > so if you have a pair of new fancy noise canceling
| headphones, then it's not unlikely you are wearing a
| platform like that on your head every day.
|
| Chip shortage aside, the likelihood of these devices using
| obscure hardware like discrete DSPs is going down as
| cheaper low power architectures are becoming commoditized.
| astrange wrote:
| LLVM IR isn't stable or even portable. It's just a
| compiler IR, not a language.
| duped wrote:
| Hence the qualifier, if it focused on stability. And IRs
| are languages. They look and quack like them and people
| treat them as such.
| ErikCorry wrote:
| Perhaps Carbon is the first in a series of new low level
| languages that free us from the impossible tensions of
| C/C++ having to be all things to all (low level)
| programmers.
|
| I would love a new language for implementing high level
| languages. I've worked on several of these projects and we
| use mostly unstandardized dialects of C++ and it's really
| not fit for purpose.
| nine_k wrote:
| While at it, I should mention Zig.
| ErikCorry wrote:
| Does zig have as a selling point that it has _more_ UB
| than C?
| mastax wrote:
| Unusual platforms like DSPs usually have specific (usually
| proprietary) toolchains. Why can't those platforms
| implement extensions to support 32-bit bytes? Why must
| everyone else support them? In practice ~no C code is
| portable to machines with 32-bit bytes. That's okay! You
| don't choose a DSP to run general purpose code. You choose
| it to run DSP code, usually written for a specific purpose,
| often in assembly.
| quelsolaar wrote:
| "Weird" platforms often do have their own tool-chains but
| they do have the ability to leverage LLVM, MISRA, and an
| array of common tools and analyses that exist for C. One
| of the reasons we got new platforms like RISC-V is that
| today it's possible to use existing OSS software to build
| a platform with a working OS and development environment
| that common basic libraries can be built for, because all
| this software is written in C and can be targeted towards
| a new platform.
| ErikCorry wrote:
| What is the relevance of RiscV here? Not weird at all. I
| feel like you skipped part of the argument.
| gumby wrote:
| The point is that new exploration of the design space
| only works when there's a familiar environment to build
| on. The old days of each architecture being its own
| hermetic environment are gone.
| AdamH12113 wrote:
| Because C already does this, and has from the beginning.
| C was designed to be portable in an era where there were
| significant differences in fundamental CPU design
| decisions between platforms. C is widely used to write
| software for all kinds of weird platforms. Changing that
| would be far more work than just making a new language.
| [deleted]
| gumby wrote:
| As the GP post comments, if you want those features there are
| plenty of other languages to choose from.
|
| I don't even like programming in C but I respect what the
| committee is trying to do, and yes I do sometimes write C
| code.
| skrebbel wrote:
| C is pretty much the only language in common use for
| programming microcontrollers. Microcontrollers seldom have
| filesystems. To break the language on systems without
| filesystems or terminals means to break the software of
| pretty much every electronics manufacturer out there.
| varajelle wrote:
| But you don't run the compiler on a computer without a file
| system. How would #include work otherwise?
| varajelle wrote:
| Thinking of it, JavaScript is a language that mainly targets
| browsers, which also don't have a filesystem.
| ithkuil wrote:
| It may have no filesystem but it's extremely likely it has
| 8 bit bytes
| nine_k wrote:
| I would say that one should be pretty cautious when baking in
| assumptions about such a fleeting thing as hardware into
| such a lasting thing as a language.
|
| C itself carries a lot of assumptions about computer
| architecture from the PDP-9 / PDP-11 era, and this does hold
| current hardware back a bit: see how well the cool
| nonstandard and fast Cell CPU fared.
|
| A language standard should assume as little about the
| hardware as possible, while also, ideally, allowing to
| describe properties of the hardware somehow. C tries hard,
| but the problem is not easy at all.
| uecker wrote:
| Can you explain what aspect of C from PDP-11 was
| problematic for Cell?
| nine_k wrote:
| All memory is uniform, for instance. There is one scalar
| data processing unit that finishes a previous operation
| and then issues the next: no way to naturally describe
| SIMD, for instance. No way to speak about asynchronous
| things that happen on a Cell CPU all the time, as much as
| I can judge. (I never programmed it, but I remember that
| people who did said they had to use assembly
| extensively.)
|
| OTOH you can write stuff like `*dst++ = *src++`, and it
| would neatly compile into something like `movb (R1)+,
| (R2)+`, a single opcode on a PDP-11.
| thrwyoilarticle wrote:
| Other stuff:
|
| https://twitter.com/rcs/status/1550526425211584512
|
| nullptr! auto! constexpr!
| phkahler wrote:
| Not sure about the value of nullptr! Also not sure about auto!
| In C.
| camel-cdr wrote:
| nullptr since we have type detection now, and NULL need not be
| a pointer. auto, because otherwise everybody would create
| their own hacky auto using the new typeof.
| jesprenj wrote:
| What about creating object files from raw binary files and then
| linking against them? That's what I (and of course many others)
| do for linking textures and shaders into the program. It's a bit
| ugly though that with this approach you can't generate custom
| symbol names, at least with the GNU linker.
|
| This #embed feature might be a nice alternative for small files.
| Well for large files you usually don't even want to store them
| inside the binary, so the compilation overhead might be
| minuscule, since the files are, by intention, small.
|
| When I read the introduction of the article - about allowing us
| to cram anything we want into the binary - I was hoping to see a
| standard way to disable optimizations (When the compiler deletes
| your code and you don't even notice).
| mastax wrote:
| You reminded me of Bethesda Softworks games, which always seem
| to have 1GB+ executables for some reason. I hope it isn't all
| code. Maybe they embed the most important assets that will
| always need to be loaded.
| dark-star wrote:
| One reason against this is mentioned in the letter that is
| quoted in the article
| jonathrg wrote:
| It depends on your definition of small files. A few hundred kB
| to a few megabytes will make compilation speed and memory usage
| explode if you embed it as text, see section 3.2 in
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
| csmpltn wrote:
| C89 is where C should've stayed at. If you need to convert a file
| to a buffer and stick that somewhere in your translation unit,
| use a build system. Don't fuck with C.
| GuB-42 wrote:
| Nothing stops you from sticking to C89 if that's what you want. Many
| projects do, and the -std=c89 option will not disappear anytime
| soon.
| macintux wrote:
| Did you read the snail mail letter from someone who does just
| that?
| csmpltn wrote:
| > "Did you read the snail mail letter from someone who does
| just that?"
|
| I did. The author struggled embedding files into their
| executables with makefiles. We don't know anything else
| beyond that. So what?
|
| People also struggle with memory management in C, an arguably
| much more difficult and widespread problem. Should we
| introduce a garbage collector into the C spec? How about we
| just pull in libsodium into the C standard library because
| people struggle with getting cryptography right?
|
| OP mentions #embed was a multi-year long uphill battle, with
| a lot of convincing needed at every turn. That in itself is
| enough proof that people aren't in clear agreement over there
| being a single "right" solution. Hence, leave this task to
| bespoke build systems and be done with it. Let different
| build systems offer different solutions. Allow for different
| syntaxes, etc. Leave the core language lean.
| cowtools wrote:
| Interesting. I look forward to this. What I've been doing now to
| embed a source.png file is something like this, where I generate
| source code from a file's data:
|
| in embed_dump.cpp:
|
|     #include <fstream>
|     #include <iostream>
|
|     int main(){
|         std::ifstream f;
|         // binary mode, so bytes are not translated on the way in
|         f.open("./source.png", std::ios::binary);
|         std::cout << "//automatically generated by embed_dump from project files:"
|                   << std::endl << "const char embedded_tex[] = {";
|         char a;
|         // test the read itself, so the last byte is not printed twice at EOF
|         while(f.read(&a,1)){
|             std::cout << int(a) << ",";
|         }
|         std::cout << "};" << std::endl;
|         f.close();
|     }
|
| Then I set up my makefile like this (main_stuff.cpp #includes
| embedded_files.h):
|
|     main_stuff: main_stuff.cpp embedded_files.h
|         c++ main_stuff.cpp -o main_stuff
|
|     embedded_files.h: embed_dump source.png
|         ./embed_dump > embedded_files.h
|
|     embed_dump: embed_dump.cpp
|         c++ embed_dump.cpp -o embed_dump
| zonovar wrote:
| 5 years ago I wrote a small python script [1] to help me solve
| "the same problem". It reads files in a folder and generates a
| header file containing the files' data and filenames. It is very
| simple and was meant to help me on a job. It has limitations, don't
| be too hard on me :)
|
| [1] https://github.com/daxliar/pyker
| david2ndaccount wrote:
| This is a really, really good feature and I am so glad it is
| finally getting standardized. C23 is shaping up to be a very good
| revision to the C standard. I'm hoping the proposal to allow
| redeclaration of identical structs gets in as well as you would
| finally be able to write code using common types without having
| to coordinate which would allow interoperability between
| independently written libraries.
| oefrha wrote:
| > vendor extensions ... were now a legal part of the syntax. If
| your implementation does not support a thing, it can issue a
| diagnostic for the parameters it does not understand. This was
| _great_ stuff.
|
| I can't be the only one who thinks magic comment is already an
| ugly escape hatch, adding a mini DSL to it that can mean anything
| to anyone just makes it ten times worse. It's neither beautiful
| nor great.
|
| > do extensions on #embed to support different file modes,
| potentially _reading from the network_ (with a timeout), and
| other shenanigans.
|
| (Emphasis mine.) My god.
| jcelerier wrote:
| > (Emphasis mine.) My god.
|
| yes, C finally catching up with what languages such as F# have
| been able to do for years with great success
| https://docs.microsoft.com/en-us/dotnet/fsharp/tutorials/typ...
| ; wild isn't it to step into the 2010-era of programming ?
| oefrha wrote:
| I suppose you're of the opinion that every feature of every
| language should be added to C, or maybe even assembly.
| jcelerier wrote:
| this isn't adding a new feature. This is replacing a
| feature that everyone _already_ implemented independently
| on every other project - some with xxd, some with special
| embedders such as rcc or windres or whatever, some through
| CMake directly (like this:
| https://gist.github.com/sivachandran/3a0de157dccef822a230
| for instance) - in a standard and more performant way.
| Instead of being paid per-project this cost will now be
| paid per-compiler implementation which is unambiguously
| good as there are <<< compilers than C projects.
| oefrha wrote:
| You replied to my quote about pulling network resources
| and "other shenanigans", which certainly isn't what
| "everyone already implemented independently". Plus that's
| a potential vendor extension, i.e., some may implement it
| independently, some may not, implementations will likely
| differ.
| jcelerier wrote:
| > You replied to my quote about pulling network resources
| and "other shenanigans", which certainly isn't what
| "everyone already implemented independently".
|
| ?? pulling network resources works, today, with what
| people are using. there's zero difference between "cat
| foo.txt | xxd -i" and "curl https://my.api/foo.json | xxd
| -i"
| joshuamorton wrote:
| Hell, I #include files mounted over FUSE regularly.
| greatgib wrote:
| I guess you don't know C.
|
| "#" is not a symbol for a comment line but the one for a pre-
| processor directive. Like #include <stdlib.h>
|
| In c/c++ you use // and /* */ for comments.
| oefrha wrote:
| I've known C for close to two decades, thank you. I'm using
| the not at all well defined term "magic comment" to loosely
| refer to everything that's not strictly speaking code but has
| special meaning, which include pre-processor directives.
|
| cpp is definitely a well-hated part of C.
| Ensorceled wrote:
| > I've known C for close to two decades, thank you. I'm
| using the not at all well defined term "magic comment"
|
| Please forgive those of us who've been using C since the
| 80's, or earlier, for assuming you don't know C when you
| invent your own terminology for preprocessor directives.
| oefrha wrote:
| This is not a preprocessor directive though, from reading
| the post I don't think cpp is expanding the #embed into
| an array initializer, otherwise there's no performance
| benefit at all.
| unreal37 wrote:
| It's a preprocessor directive. The compiler injects the
| array based on your instruction.
|
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#a...
| tialaramex wrote:
| The deliberate wording expands #embed to C's initializer-
| list
|
| The major performance benefit is from the as-if rule. The
| compiler is entitled to do whatever it wants so long as
| the result is _as-if_ it worked the way the standard
| describes.
|
| So a decent compiler is going to _notice_ that you're
| #embed-ing this in a byte array, and conclude it should
| just shovel the whole file into the array here. If it
| actually made the bytes into integers, and then parsed
| them, they would of course still fit in exactly one byte,
| because they're bytes, so it needn't worry about
| _actually doing_ that which is expensive.
|
| Does it work if you try to #embed a small file as
| parameters to a printf() ? Yeah, probably, go try it on
| Godbolt (this is an option there) but for the small file
| where that's viable we don't care about the performance
| benefit. It's just a nice trick.
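|
| A minimal sketch of the expansion described above, assuming a
| compiler that implements the C23 directive. To stay self-contained
| it embeds its own source file through __FILE__. The
| SKETCH_HAVE_EMBED macro, the placeholder bytes and the helper names
| are illustrative assumptions, not taken from the article or the
| proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Use #embed only if the compiler advertises it and can find this file;
 * otherwise fall back to a hand-written list so the sketch still builds. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
#    define SKETCH_HAVE_EMBED 1
#  endif
#endif

/* The directive expands, as-if, to a comma-delimited list of integers,
 * so it can sit wherever an initializer list is allowed. */
static const unsigned char source_bytes[] = {
#ifdef SKETCH_HAVE_EMBED
#embed __FILE__
#else
    0x2f, 0x2a, 0x20, 0x2a, 0x2f, 0x0a /* placeholder bytes */
#endif
};

/* The array length is deduced from the initializer, so sizeof gives the
 * number of embedded bytes. */
static size_t source_size(void)
{
    return sizeof source_bytes;
}

/* Count how often `byte` occurs in the embedded data. */
static size_t count_byte(unsigned char byte)
{
    size_t hits = 0;
    for (size_t i = 0; i < sizeof source_bytes; i++)
        hits += (source_bytes[i] == byte);
    assert(hits <= sizeof source_bytes);
    return hits;
}
```

| On a toolchain without the directive the fallback branch is taken,
| so the helpers still work but only see the placeholder bytes.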
| stkdump wrote:
| The problem with the preprocessor often is that it is a
| language inside another language and the preprocessor is
| designed to be almost completely agnostic of the language
| it is embedded in. So there might be subtle ways to use
| the preprocessor so that implementing as-if becomes very
| unintuitive. I don't have a good intuition about this
| case, if this is 100% designed in a way that it can never
| provoke such subtle side-effects. Basically what might
| end up happening is that the preprocessor has to learn
| some part of the C language to decide if such an as-if
| transformation is possible and then branch to either do
| it or not.
| oefrha wrote:
| Thanks for pointing out the difference, I stand
| corrected.
| [deleted]
| rleigh wrote:
| To be completely honest, I find the fact that this was raised
| by the committee to be really obtuse and unnecessary. The same
| "complaint" could be raised about #include as well.
|
| If you want to include data from a continuous stream from a
| device node, then you could just as easily have the data piped
| into a temporary file of defined size and then #embed that. No
| need to have the compiler cater for a problem of your own
| making.
|
| As for the custom data types. It's a byte array. Why not leave
| any structure you wish to impose on the byte array up to the
| user. They can cast it to whatever they like. Not sure why
| that's anything to do with the #embed functionality.
|
| Both these things seem to be massive overthinking on the part
| of the committee members. I'm glad I'm not participating, and I
| really do thank the author for their efforts there. We've
| needed this for decades, and I'm glad it's got in even if those
| ridiculous extensions were the compromise needed to get it
| there.
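|
| A rough sketch of "leave the structure up to the user", assuming
| the embedded bytes arrive as a plain unsigned char array. It decodes
| fields byte by byte instead of casting the pointer, which sidesteps
| alignment and endianness surprises. The blob contents, the header
| layout and every name here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the bytes an #embed directive would supply.
 * Hypothetical layout: 2-byte magic "BM", then a 32-bit
 * little-endian length. */
static const unsigned char blob[] = { 0x42, 0x4D, 0x10, 0x00, 0x00, 0x00 };

/* Hypothetical header type, not part of any real file-format library. */
struct blob_header {
    unsigned char magic[2];
    uint32_t length;
};

/* Decode a little-endian 32-bit value byte by byte, so the result does
 * not depend on the host's endianness or on the alignment of `p`. */
static uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short for the header. */
static int parse_header(const unsigned char *data, size_t size,
                        struct blob_header *out)
{
    assert(data != NULL && out != NULL);
    if (size < 6)
        return -1;
    out->magic[0] = data[0];
    out->magic[1] = data[1];
    out->length = read_le32(data + 2);
    return 0;
}
```

| Calling parse_header(blob, sizeof blob, &h) fills h with the magic
| bytes and the decoded length; a short buffer is rejected rather than
| read past its end.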
| orbifold wrote:
| Can't you just add binary data into a custom section of your ELF
| executable?
| astrange wrote:
| Then you don't get regular optimizations like deduping
| identical declarations.
|
| Or source line location debug info, though nobody tries to show
| that for data at the moment.
| dark-star wrote:
| You probably don't realize that not every system is using ELF
| binaries....
| rleigh wrote:
| Yes, but it's linker-specific and non-portable. It can also
| come with some annoying limitations, like having to separately
| provide the data size of each symbol. In some cases this might
| be introspectable, but again comes at the expense of
| portability.
|
| ELF-based variants of the IAR toolchain, for example, provide a
| means of directly embedding a file as an ELF symbol, but
| without the size information being directly accessible.
|
| GNU ld and LLVM lld do not provide any embedding functionality
| at all (as far as I can see). You would have to generate a
| custom object file with some generated C or ASM encoding the
| binary content.
|
| MSVC link.exe doesn't support this either, but there is the
| "resource compiler" to embed binary bits and link them in so
| they can be retrieved at runtime.
|
| Having a universal and portable mechanism which works
| everywhere will be a great benefit. I'll be using it for
| compiled or text shaders, compiled or text lua scripts, small
| graphics, fonts and all sorts.
| acka wrote:
| This article[1] shows how you can use the GCC toolchain along
| with objcopy to create an object file from a binary blob,
| link it, and use the data within in your own code.
|
| [1] https://balau82.wordpress.com/2012/02/19/linking-a-binary-bl...
| AdamH12113 wrote:
| The less you have to mess with linker scripts, the better.
| nicoburns wrote:
| The article addresses this directly. If you're only targeting one
| platform then this is reasonably easy (albeit still not as easy
| as #embed), but if you need to be portable then it becomes a
| nightmare of multiple proprietary methods.
| ghoward wrote:
| Sure, but to add binary data to _any_ executable on any
| platform is more involved.
|
| As an example, see [1]. That will turn any file into a C file
| with a C array, and I use it to embed a math library ([2]) into
| the executable so that the executable does not have to depend
| on an external file.
|
| [1]:
| https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen....
|
| [2]:
| https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc
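|
| For readers who have not seen the generator approach, here is a
| small sketch of the general idea (bytes in, C array text out). It is
| not the linked script; format_c_array and its parameters are
| hypothetical names, and a real tool would stream to a file rather
| than format into a fixed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format `len` bytes as a C array definition named `name` into `dst`
 * (capacity `cap`, including the terminating NUL).
 * Returns the number of characters written, or -1 if the input is
 * empty or the output does not fit. */
static int format_c_array(char *dst, size_t cap, const char *name,
                          const unsigned char *data, size_t len)
{
    assert(dst != NULL && name != NULL && data != NULL);
    assert(strlen(name) > 0);

    /* An empty initializer is not valid for an unsized array, so refuse. */
    if (len == 0)
        return -1;

    size_t used = 0;
    int n = snprintf(dst, cap, "const unsigned char %s[] = {", name);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes a hex literal, separated by commas. */
        n = snprintf(dst + used, cap - used, "%s0x%02x",
                     i == 0 ? " " : ", ", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(dst + used, cap - used, " };");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

| Given the three bytes 'h', 'i', 0 and the name "greeting", this
| writes: const unsigned char greeting[] = { 0x68, 0x69, 0x00 };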
| kevin_thibedeau wrote:
| > The directive is well-specified, currently, in all cases to
| generate a comma-delimited list of integers.
|
| While a noble act, this is nearly as inefficient as using a code
| generator tool to convert binary data into intermediate C source.
| Other routes to embed binary data don't force the compiler to
| churn through text bloat.
|
| It would be much better if a new keyword were introduced that
| could let the backend fill in the data at link time.
| elcritch wrote:
| You should read or re-read the article and references. There
| are multiple benchmarks showing this _not_ to be the case.
| Actually half the article is a (well deserved) rant about how
| wrong compiler devs were in thinking that parsing intermediate
| C sources could ever match the new directive. Compiler internal
| representation of an array of integers also doesn't require a
| big pile of integer ast's.
|
| According to the benchmarking data this extension is even 2x
| faster than using the linker `objcopy` to insert a binary at
| link time as you suggest.
| [deleted]
| nuc1e0n wrote:
| This is a cool feature and I'll likely be using it in the years
| to come. However, the posix standard command xxd and its -i
| option can achieve this capability portably today.
|
| It will be useful to achieve it directly in the preprocessor
| however. I wonder how quickly it can be added to cpp?
| ErikCorry wrote:
| I'm pretty sure xxd is not part of POSIX
| https://pubs.opengroup.org/onlinepubs/9699919799/idx/utiliti...
| junon wrote:
| This will simplify a lot of build pipelines for sure.
|
| One thing that isn't clear from skimming the article, how do you
| refer to the embedded data again?
| jsnell wrote:
| > The directive is well-specified, currently, in all cases to
| generate a comma-delimited list of integers
|
| I.e. you most likely use it to initialize a static variable,
| and then refer to that variable.
| junon wrote:
| Ah so this basically?
|
|     static const char d[]={
|     #embed ...
|     };
|
| EDIT: ah it was showing up more like a comment which made it
| hard to spot.
| jonathrg wrote:
| Yes. There is an example in the article
| throwaway38583 wrote:
| Congratulations to the author. Things like this are why I hope
| Carbon exists. Evolving c++ seems like a dumpster fire, despite
| whatever compelling arguments about compatibility you are going
| to drop on me.
| einpoklum wrote:
| Everlasting glory to JeanHeyd Meneide, @thephantomderp, for
| getting this feature into C.
|
| I am wondering, though - where does this stand in C++?
| moffkalast wrote:
| This reminds me, I'd argue that the explosion of JS frameworks
| can be mainly blamed on one thing: the lack of an <include
| src="somemodule.html"> tag. If you have that you basically have
| 80% of vue.js already natively supported. No clue why this was
| never added in any fashion. Change my mind.
| lkschubert8 wrote:
| Wouldn't the include still need some templating functionality?
| Or are people using vue that heavily for just importing static
| html?
| ear7h wrote:
| Not the parent comment, but my personal use case is for
| rendering a selectable list. The server side would render a
| static list with fragment links (ex. `#item-10`) and include
| elements with corresponding IDs, and a `:target` css rule to
| unhide the element. This would hopefully be paired with lazy
| loading the include elements.
|
| edit:
|
| My goal is to avoid reloading the page for each selection and
| rendering all items eagerly. JS frameworks are the only ones
| that really allow this behavior.
| TheAceOfHearts wrote:
| HTML imports were part of the original concept of Web
| Components, and I think they were supported in Chrome. If you
| look up examples of things built with Polymer 1.x, it was used
| extensively.
|
| It was actually pretty neat, because you could have an HTML
| file with a template, style, and script section.
|
| Safari rejected the proposal, so it had to get dropped.
|
| But ESM makes it a bit redundant anyway. The end-goal is to
| allow you to import any kind of asset, not just JS. There have
| been demos and examples of tools supporting this going back
| over half a decade at this point.
| elcritch wrote:
| Firefox refused the proposal as well. ESM requires javascript
| though. :/
| polskibus wrote:
| Is <script type="module" /> not sufficient for your needs? If
| not then what is missing?
| nkozyra wrote:
| Seems to be arguing for modular layout/templating, which is
| what virtual includes did (the cgi in the example would
| hypothetically output html)
| xigoi wrote:
| How would <include> be useful for dynamically updating the DOM
| based on data, which is the main point of Vue?
| agumonkey wrote:
| I wonder why there's never been a <calc> /*
| ... compute and return dom element */ </calc>
|
| Basically what php does but with structure and objects instead
| of a bytestream
|
| in HTML. Or maybe it's been discussed but got left out
| marwis wrote:
| It's always been possible:
|
| <script>document.write(`<p>foo</p>`)</script>
| stevekemp wrote:
| It's funny I read that and I remember Apache's virtual-include
| facility:
|
|     <!--#include virtual="/cgi-bin/example.cgi?argument=value" -->
|
| I used that, back in the day, as an alternative to PHP.
| tirpen wrote:
| > https://caniuse.com/imports
|
| It was a feature in Chrome 36-79 and there were working
| polyfills to make it work on other browsers.
|
| It was actually a great feature and I used it extensively on an
| old project back then.
|
| CanIUse: https://caniuse.com/imports
|
| (Now obsolete) tutorial:
| https://www.sitepoint.com/introduction-html-imports-tutorial...
| andai wrote:
| <?php require("somemodule.html"); ?>
| aaaaaaaaaaab wrote:
| C++ keeps kicking ass!
|
| Feel sorry for crab people.
| jonathrg wrote:
| That's not really the right conclusion to draw from this
| article
| agluszak wrote:
| "crab people" means Rust people?
| tialaramex wrote:
| Officially, Rust's R-cog logo is the symbol of Rust. It is a
| registered trademark of the Foundation.
|
| But it's a bit boring. Unofficially, Rust has a mascot, in
| the form of a crab named "Ferris". The crab mascot appears in
| lots of places, and the Unicode crab emoji U+1F980 is often
| used by Rust programmers to indicate Rust in text. Unlike the
| trademarked logo, you can have a bit of fun with such an
| unofficial symbol, for example Jon Gjengset's "Rust for
| Rustaceans" book cover has a stylised crab wearing glasses
| with a laptop apparently addressing a large number of other
| crabs.
| agluszak wrote:
| Yeah, I know that, that's why I asked if that's what parent
| meant, because I can't see why Rust was mentioned there
| orf wrote:
| The article definitely isn't a glowing praise of C/C++. In
| fact, including this simple, useful feature that rust has had
| for a decade now has taken an immense amount of effort and
| received so much pushback from various parties, in part due to
| the strangled mess of various compiler limitations and in part
| because of design-by-committee stupidity.
|
| C/C++ seems to be kicking its own ass.
| gilnaa wrote:
| Not to mention that it didn't even get into C++
| msla wrote:
| The article doesn't even mention C++/Java.
| tialaramex wrote:
| Second paragraph of the title article:
|
| > Surprisingly, despite this journey starting with C++ and
| WG21, the C Committee is the one that managed to get there
| first
|
| Later it mentions presenting their first formal attempt at
| this to Belfast 2019, that's a C++ meeting, it's too late
| for this to go into C++ 20 at that point, but it easily
| could have been in C++ 23 (it is not).
___________________________________________________________________
(page generated 2022-07-23 23:00 UTC)