[HN Gopher] What's the Most Portable Way to Include Binary Blobs...
___________________________________________________________________
What's the Most Portable Way to Include Binary Blobs in an
Executable?
Author : Tomte
Score : 33 points
Date : 2022-07-25 09:15 UTC (1 days ago)
(HTM) web link (tratt.net)
(TXT) w3m dump (tratt.net)
| DethNinja wrote:
| Assuming binary blob is relatively small:
|
| Just generate the data from a template and store it as a byte
| array in the language of your choice.
|
| For example, if you are using C/C++ you can zip everything then
| use a small python script to generate a C/C++ header where this
| data is available as a uint8_t array.
|
| Keep in mind that all this data will be loaded into memory, so I
| don't recommend this approach for anything north of 10 MB.
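
A minimal sketch of the kind of generator script described above,
assuming the blob fits comfortably in memory. The function name
`blob_to_c_header` and the parameters `name` and `per_line` are
made up for illustration; the output is a plain C header with a
uint8_t array and its length.

```python
def blob_to_c_header(data: bytes, name: str = "blob", per_line: int = 12) -> str:
    """Render `data` as a C header defining a uint8_t array and its length."""
    if not data:
        # An empty initializer list is not valid before C23.
        raise ValueError("empty blob: C arrays need at least one element")
    guard = f"{name.upper()}_H"
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "",
        "#include <stddef.h>",
        "#include <stdint.h>",
        "",
        f"static const uint8_t {name}[] = {{",
    ]
    # Emit the bytes as hex literals, `per_line` values per source line.
    for i in range(0, len(data), per_line):
        row = ", ".join(f"0x{b:02x}" for b in data[i:i + per_line])
        lines.append(f"    {row},")
    lines += [
        "};",
        f"static const size_t {name}_len = sizeof {name};",
        "",
        f"#endif /* {guard} */",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as blob.h (a
hypothetical name) and including it from C gives access to the
array. Compressing the data first, as suggested above, would be a
separate step before calling the function.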
| kazinator wrote:
| On a modern VM system, the static initialized data will be
| mapped to memory, not loaded. So you have to worry about its
| virtual footprint, not physical memory use.
| kelseyfrog wrote:
| https://thephd.dev/finally-embed-in-c23
| jll29 wrote:
| Here's a standalone (and Rust-implemented) version similar to xxd
| (if you don't like the vim dependency):
| https://github.com/jochenleidner/ltools/blob/main/src/bin/bi...
|
| What I found is that many compilers don't like to compile very
| large source files; so if the binaries you'd like to integrate
| are big, it might be better to integrate their constituent
| objects one by one (if applicable).
| tomn wrote:
| My colleague wrote this solution for C++ and cmake:
|
| https://github.com/ebu/libear/commit/40a4000296190c3f91eba79...
|
| This is a cmake function which generates C++ files using no
| external tools. It's probably not very fast, but if you don't
| need to handle big files and are already using cmake this is easy
| to integrate, adds no dependencies and works on all platforms.
| jreese wrote:
| Make a ZIP file containing the blob, and catenate it to the end
| of the executable binary. The ZIP format specifically puts all of
| the key metadata at the back of the file, so pretty much any ZIP
| tool can correctly read/list/extract data from the ZIP portion of
| the file. Anything that needs to be linked at runtime can just be
| extracted to a temp dir, and then cleaned up on exit. Bonus
| points for getting "free" compression on text data blobs.
|
| We do this for Python applications, by combining a ZIP containing
| the "link tree" of sources/packages/modules, with a shell
| bootstrap script that automatically sets up the environment,
| import path, etc, and Python itself has built in support for
| importing pure-python modules from a ZIP file. All that's needed
| for native modules is a simple import hook that extracts the
| native objects into temp space and then loads them appropriately.
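
The append-a-ZIP idea can be tried out in a few lines of Python.
This is only a sketch, not the commenter's actual tooling:
`append_zip` and `read_embedded` are hypothetical helpers, and the
"executable" is just a byte string standing in for a real binary.
It relies on the standard zipfile module finding the archive from
the metadata at the end of the file, so leading non-ZIP bytes are
tolerated.

```python
import io
import zipfile


def append_zip(stub: bytes, files: dict) -> bytes:
    """Return `stub` followed by a ZIP archive holding `files` (name -> bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return stub + buf.getvalue()


def read_embedded(image: bytes, name: str) -> bytes:
    """Read one member from the ZIP archive at the tail of `image`."""
    # zipfile locates the end-of-central-directory record by scanning
    # backwards from the end, then accounts for the prepended bytes.
    with zipfile.ZipFile(io.BytesIO(image)) as zf:
        return zf.read(name)
```

A real program would open its own executable file instead of an
in-memory byte string.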
| deivid wrote:
| Just in case you are unaware, take a look at shiv:
| https://github.com/linkedin/shiv which does this quite neatly
| mmastrac wrote:
| One missing approach is just appending the binary data to the end
| of the file, and then reading the resource from /proc/self/exe on
| Linux (or the equivalents on Mac and Windows).
|
| It's not "portable" per se, but all modern platforms [1] have a
| way to interrogate the binary contents of the currently-running
| executable.
|
| [1] _NSGetExecutablePath, GetModuleFileName(), getexecname() etc
|
| EDIT: Apparently https://github.com/gpakosz/whereami will manage
| a lot of this complexity for you
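
One way to make appended data findable is a small fixed-size
trailer at the very end of the file. The sketch below assumes a
made-up layout (payload, then an 8-byte little-endian length, then
an 8-byte marker); `MAGIC`, `append_blob`, and `read_blob` are
illustrative names, not part of any existing tool.

```python
import os
import struct

MAGIC = b"BLOB_END"               # hypothetical 8-byte marker
TRAILER = struct.Struct("<Q8s")   # payload length, then the marker


def append_blob(exe_path: str, blob: bytes) -> None:
    """Append `blob` and a trailer describing it to the file at `exe_path`."""
    with open(exe_path, "ab") as f:
        f.write(blob)
        f.write(TRAILER.pack(len(blob), MAGIC))


def read_blob(exe_path: str):
    """Return the appended payload, or None if no valid trailer is found."""
    with open(exe_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < TRAILER.size:
            return None
        f.seek(size - TRAILER.size)
        length, magic = TRAILER.unpack(f.read(TRAILER.size))
        # Reject files without the marker or with an impossible length.
        if magic != MAGIC or length > size - TRAILER.size:
            return None
        f.seek(size - TRAILER.size - length)
        return f.read(length)
```

In a real program the path passed to `read_blob` would come from
one of the platform mechanisms listed in [1], for example
/proc/self/exe on Linux.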
| anyfoo wrote:
| Don't do this, it's ugly and relies on assumptions that aren't
| true. I haven't checked each spec, but it is very unlikely that
| your ELF/mach-O/PE/... is still valid with added junk at the
| end. You may try it out and it may work, but that is true for
| many things that may come back to bite you (or others) in
| spectacular ways.
| dmitrygr wrote:
| > it is very unlikely that your ELF/mach-O/PE/... is still
| valid with added junk at the end.
|
| I've written loaders for all of the executable formats you
| mentioned, and maybe a dozen more. I know of none where this
| would violate the strict interpretation of the word of the
| spec.
|
| That being said, valid file != happy OS
| anyfoo wrote:
| Agreed. As above: It may for example run, but not be
| accepted by other parts of the OS (as evidenced).
| fabian2k wrote:
| I'd be interested in any example where this approach would
| produce an invalid executable. I have used this without
| issues, but of course I have certainly not tried this in
| every possible environment.
| anyfoo wrote:
| Computing history is chock full of examples where something
| "seems to work" but is actually invalid (and a mach-O
| treated that way would be invalid [EDIT: or just "not
| accepted" by some parts of the system, see below], whether
| it runs or not), and then Raymond Chen has to write a blog
| post about it decades later. Here's just one out of many as
| a random example:
| https://devblogs.microsoft.com/oldnewthing/20041026-00/?p=37...
|
| Back to this particular case, the binary will fail strict
| code signing validation on macOS. It may still _run_
| because the kernel does not access the binary past the
| coverage of the code signature (and all the bits there are
| still intact), similar to how multiarch binaries work, but
| you will at least be severely hampered in distributing your
| binary, since Gatekeeper won't be happy either.
| naasking wrote:
| And on microcontrollers where embedded binaries are essential?
| duskwuff wrote:
| Most microcontrollers run code directly from flash memory --
| there's no "executable file" (or, indeed, any files) involved
| at all.
| kazinator wrote:
| In TXR Lisp, I did this:
|
| https://www.nongnu.org/txr/txr-manpage.html#N-0389D15E
|
| There is a 128-byte area prefixed by the character sequence
| @(txr):. It normally contains all zeros (empty null-terminated
| string). If you put a non-empty UTF-8 string there, it gets
| executed.
|
| Of course, the problem of including a binary blob is trivial if
| it can just be declared as an array; the interesting problem is
| doing it to the executable, without doing any compiling or
| linking.
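
Patching a reserved area in an already-built executable can be
sketched as follows. This is not TXR's own tooling; `patch_area`
is a hypothetical helper, and it assumes the 128 bytes directly
follow the marker and that the marker occurs exactly once in the
file.

```python
MARKER = b"@(txr):"
AREA_SIZE = 128


def patch_area(image: bytes, text: str) -> bytes:
    """Return a copy of `image` with the area after MARKER set to `text`."""
    payload = text.encode("utf-8")
    if len(payload) >= AREA_SIZE:
        # Keep at least one trailing zero byte as the string terminator.
        raise ValueError("text does not fit in the reserved area")
    pos = image.find(MARKER)
    if pos < 0:
        raise ValueError("marker not found")
    if image.find(MARKER, pos + 1) >= 0:
        raise ValueError("marker is ambiguous")
    start = pos + len(MARKER)
    if start + AREA_SIZE > len(image):
        raise ValueError("reserved area is truncated")
    # Overwrite in place: the file size and all other offsets stay the same.
    area = payload.ljust(AREA_SIZE, b"\0")
    return image[:start] + area + image[start + AREA_SIZE:]
```

Because the area is overwritten rather than inserted, nothing
else in the executable moves, which is what lets this work
without compiling or linking.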
| branon wrote:
| Something like https://justine.lol/ape.html perhaps?
| mrlonglong wrote:
| C23 will soon have the #embed directive to include such blobs.
| This will ease portability concerns.
| avrionov wrote:
| This was discussed a few days ago "Embed is in C23":
|
| https://news.ycombinator.com/item?id=32201951
|
| C++ has a "std::embed" proposal:
| https://open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1040r6...
| ghoward wrote:
| The answer to this is easy. At least it was for me; I didn't know
| it was such a problem.
|
| My solution is [1]. It generates a C file with a specific array
| name passed in through the command-line. It also has a few other
| niceties that I need.
|
| It works on Windows, Mac OS X, Linux, and the BSDs, no matter the
| compiler or linker.
|
| I use it to generate the arrays for the help texts ([2] and [3]),
| as well as two math libraries ([4] and [5]).
|
| People are welcome to adopt and adapt it. Just follow the
| license, as per usual. I've even adapted it for my other
| software. [6]
|
| [1]:
| https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen....
|
| [2]:
| https://git.yzena.com/gavin/bc/src/branch/master/gen/bc_help...
|
| [3]:
| https://git.yzena.com/gavin/bc/src/branch/master/gen/dc_help...
|
| [4]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc
|
| [5]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib2.bc
|
| [6]:
| https://git.yzena.com/Yzena/Yc/src/branch/master/tests/strge...
| ufo wrote:
| How do you feel about that problem the parent blog post
| mentioned, of this being slow for large blobs particularly when
| compiling with Clang?
| hikarudo wrote:
| You can split it up into several files, then concatenate the
| arrays at runtime.
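
A rough sketch of the splitting step, under the assumption that
the chunks are emitted as separate C files so they can be compiled
independently. `split_blob_sources` and the generated symbol names
are invented for illustration, and the generated code is C only
(in C++ a namespace-scope const array has internal linkage).

```python
def split_blob_sources(data: bytes, chunk_size: int, name: str = "blob") -> dict:
    """Map file name -> C source: one array per chunk plus an index table."""
    if not data or chunk_size <= 0:
        raise ValueError("need non-empty data and a positive chunk size")
    sources = {}
    externs = []
    entries = []
    for n, off in enumerate(range(0, len(data), chunk_size)):
        chunk = data[off:off + chunk_size]
        sym = f"{name}_part{n}"
        body = ", ".join(str(b) for b in chunk)
        sources[f"{sym}.c"] = (
            f"const unsigned char {sym}[{len(chunk)}] = {{ {body} }};\n"
        )
        externs.append(f"extern const unsigned char {sym}[{len(chunk)}];")
        entries.append(f"    {{ {sym}, {len(chunk)} }},")
    # The index lets runtime code walk the parts in order.
    index = [
        "#include <stddef.h>",
        *externs,
        f"struct {name}_part {{ const unsigned char *data; size_t len; }};",
        f"const struct {name}_part {name}_parts[] = {{",
        *entries,
        "};",
        f"const size_t {name}_part_count = {len(entries)};",
    ]
    sources[f"{name}_index.c"] = "\n".join(index) + "\n"
    return sources
```

At runtime the program would loop over the index table and copy
each part into one buffer, or simply process the parts in order
without copying.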
| vgel wrote:
| If you're already using Clang (and thus LLVM & its platform
| constraints), I wonder if the best way would be to link in a tiny
| Rust / Zig `.o` using `include_bytes!` / `@embedFile`...
| kevin_thibedeau wrote:
| This is broken:
|
|     .incbin "string_blob.txt"
|     ...
|     printf("%s\n", string_blob);
|
| Text files don't have to be NUL-terminated. The proper way
| to embed data with the .incbin directive is to add a label after
| the file and use that directly for pointer arithmetic or compute
| the size with another assembly directive.
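
The label-after-the-file approach can be illustrated with a small
generator for the assembly stub. This sketch assumes GNU as
syntax on an ELF target; `incbin_stub` and the `_end` / `_size`
symbol suffixes are hypothetical choices, and other assemblers or
object formats would need different section and symbol spellings.

```python
def incbin_stub(path: str, symbol: str = "string_blob") -> str:
    """Return GNU as source that embeds `path` with start/end/size symbols."""
    if '"' in path or "\\" in path:
        raise ValueError("path would need escaping inside the .incbin string")
    return "\n".join([
        "    .section .rodata",
        f"    .global {symbol}",
        f"    .global {symbol}_end",
        f"    .global {symbol}_size",
        f"{symbol}:",
        f'    .incbin "{path}"',
        f"{symbol}_end:",          # label directly after the file contents
        "    .byte 0",             # terminator, so %s is safe for text files
        "    .balign 8",
        f"{symbol}_size:",
        f"    .quad {symbol}_end - {symbol}",  # size computed by the assembler
        "",
    ])
```

On the C side, declaring `extern const char string_blob[]` and
`extern const char string_blob_end[]` and subtracting the two
gives the length without relying on a terminator. The extra zero
byte sits after the end label, so it is not counted in the size.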
| kelnos wrote:
| In this case it works because the author explicitly put a NUL
| at the end of the string in the text file. I don't think the
| author was trying to suggest that you can do this with
| arbitrary data.
| jandrese wrote:
| Couldn't you follow up the .incbin statement with a .const 0 or
| something similar?
| ncmncm wrote:
| ELF provides for any number of different kinds of "section" that
| you can have automatically mapped into your address space at
| startup. You just need a way for your program to know where it
| is. There are lots of different ways to get that.
| titzer wrote:
| Yes, but the article was mostly about which tools you use to
| get that section into the ELF.
| jviotti wrote:
| My team is working on this problem in the context of creating
| Node.js single-executable applications. While the naive approach
| of just appending data at the end of the binary works, it does
| not play well with code signing on macOS and Windows, given that
| signing operates on PE and Mach-O sections.
|
| We have recently open-sourced a small tool called Postject
| (https://github.com/postmanlabs/postject), which is able to
| inject arbitrary data as proper ELF/Mach-O/PE sections for all
| major operating systems (with AIX support coming). The tool also
| provides C/C++ cross-platform headers for easily traversing the
| final binary and introspecting whether the segment is present or
| not.
|
| The tool is based on the LIEF project
| (https://github.com/lief-project/LIEF).
|
| At Postman, we are making use of this on our custom Node.js
| single-executable applications and soon on our custom Electron.js
| builds too.
___________________________________________________________________
(page generated 2022-07-26 23:00 UTC)