[HN Gopher] Every bug/quirk of the Windows resource compiler (rc...
       ___________________________________________________________________
        
       Every bug/quirk of the Windows resource compiler (rc.exe), probably
        
       Author : nektro
       Score  : 209 points
       Date   : 2024-10-11 22:47 UTC (1 day ago)
        
 (HTM) web link (www.ryanliptak.com)
 (TXT) w3m dump (www.ryanliptak.com)
        
       | squeek502 wrote:
       | I'm the author if anyone has questions
        
         | 1f60c wrote:
         | Resinator's error messages look amazing! I also feel like I've
         | gained a lot of cursed but useless (to me) knowledge, so thanks
         | for that. :-)
         | 
         | I don't have a horse in this race, but regarding FONT
         | resources, I would like to humbly suggest not supporting them
         | at all. Radical, but from what you wrote, they do seem pretty
         | weird and ripe for accidental misuse. Plus, they are obsolete
         | and it seems like Resinator already intentionally diverges from
         | rc.exe in a few cases anyway.
        
           | squeek502 wrote:
           | Thanks!
           | 
           | I'm actually pretty okay with where I've landed with FONT
           | resources. The legwork has already been done in figuring
           | things out, and with the strategy I've chosen, resinator
           | doesn't even need to parse .fnt files at all, so the
           | implementation is pretty simple (I wrote a .fnt parser, but
           | it's now unused[1]).
           | 
           | [1] https://github.com/squeek502/resinator/blob/master/src/fnt.z...
        
         | kcbanner wrote:
         | Thanks again for resinator! I recently used it to ship a small
         | win32 utility (https://github.com/kcbanner/multi-mouse) and it
         | worked perfectly.
        
         | InvisibleUp wrote:
         | The quote escaping seems to be identical to that of Visual
         | Basic 6 (and likely QBASIC as well, although I haven't tested
         | that.)
        
         | bramhaag wrote:
         | Fun article, thank you! One nitpick: some of the side-by-side
         | code blocks overflow on mobile (Pixel 8 Pro, Firefox)
        
       | Arnavion wrote:
       | I was thinking `NOT (1|2)` and `NOT () 2` could make sense if the
       | parser just has a `not_in_effect` flag that gets set to true when
       | a `NOT` is encountered and then applies to the next integer as
       | soon as one is parsed. So `NOT (1|2)` sets the flag, then starts
       | parsing `(1|2)`. Once it's parsed the `1`, it notices a NOT is in
       | effect so it applies it to the 1 as if it had just parsed `NOT 1`
       | (which leaves 0 unchanged), then parses `| 2`, so the result is
       | 2.
       | 
       | `NOT () 2` would be the same logic. `)` signifies the end of an
       | expression and thus evaluates to the current integral result,
       | which is 0 (for the same reason that unary - is zero), and a NOT
       | is in effect so it's treated as `NOT 0` which is a no-op ("unset
       | no bits"). Then the next `2` makes the result `2`. This assumes
       | that `x y` is parsed the same as `x | y` (maybe only if a `NOT`
       | has been parsed at any point first) or as `y` (the same stack-
       | like "the last number that was parsed becomes the result"
       | behavior described in other items).
       | 
       | This doesn't explain the `7 NOT NOT 4 NOT 2 NOT NOT 1 = 2` case
       | though. If the parser just *sets* the `not_in_effect` flag when
       | it encounters a `NOT` (instead of *toggling* it), then this would
       | be `7 | NOT 4 | NOT 2 | NOT 1` which would be 0. If the parser
       | does toggle the flag, this would be `7 | 4 | NOT 2 | 1` which
       | would be 1 or 5. If the parser treats a `NOT` as ending the
       | previous expression (if any), this would be `7 | NOT 0 | NOT 4 |
       | NOT 2 | NOT 0 | NOT 1` which would be 0.
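A minimal sketch of the arithmetic in the comment above, assuming (as the comment does) that a plain term ORs its bits into a running result and a `NOT` term clears them. The `evaluate` helper and the three term lists are hypothetical and only restate the comment's three readings; they say nothing about how rc.exe is actually implemented.

```python
def evaluate(terms):
    """Apply (negated, value) terms left to right to a running result.

    Assumption: a plain term ORs its bits in, a NOT term clears them.
    """
    result = 0
    for negated, value in terms:
        if negated:
            result &= ~value  # "NOT n": unset the bits of n
        else:
            result |= value   # plain n: set the bits of n
    return result


# `NOT (1|2)` read as `NOT 1 | 2`
not_paren = [(True, 1), (False, 2)]

# `7 NOT NOT 4 NOT 2 NOT NOT 1` under the three readings from the comment
set_flag = [(False, 7), (True, 4), (True, 2), (True, 1)]
toggle_flag = [(False, 7), (False, 4), (True, 2), (False, 1)]
not_ends_expr = [(False, 7), (True, 0), (True, 4), (True, 2), (True, 0), (True, 1)]

for name, terms in [("NOT (1|2)", not_paren),
                    ("set", set_flag),
                    ("toggle", toggle_flag),
                    ("NOT ends expression", not_ends_expr)]:
    print(name, "=>", evaluate(terms))
```

Under these assumptions the `NOT (1|2)` reading gives 2, while none of the three readings of the longer expression gives 2, which is the mismatch the comment points out.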
        
       | immibis wrote:
       | That's a crazy amount of work and a crazy amount of quirks
       | indeed. Very much illustrates a mindset where the user is at
       | fault if they provide bad input - and development effort for
       | everything was multiplied compared to today. In 1985, of course,
       | nobody cared about things like security from untrusted inputs,
       | and reproducible builds.
       | 
       | My favourite bug from this list is that the compiler expands tabs
       | to spaces in string literals and puts them at tab stops based on
       | the string literal's horizontal position in the source file.
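To make the tab-stop behaviour described above concrete, here is a rough sketch of column-dependent tab expansion. The function name, the 0-based `start_column` parameter, and the default tab width of 8 are assumptions for illustration, not details confirmed in this thread.

```python
def expand_tabs(literal, start_column, tab_width=8):
    """Replace each tab with spaces up to the next tab stop.

    start_column is the (0-based) source column of the literal's first
    character; tab_width=8 is an assumed default.
    """
    out = []
    col = start_column
    for ch in literal:
        if ch == "\t":
            pad = tab_width - (col % tab_width)  # distance to next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# The same literal produces different output depending on where it starts.
print(repr(expand_tabs("a\tb", 0)))
print(repr(expand_tabs("a\tb", 4)))
```

The point of the sketch is that the compiled string depends on the literal's indentation in the source file, so reformatting the .rc file can change the output.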
       | 
       | I think that being able to directly define resource type 6 is not
       | a bug. You got exactly what you asked for - an invalid resource.
       | Crashing when loading it isn't a bug, either.
       | 
       | I suppose that style flag arguments are parsed as |-separated
       | lists of numeric or NOT expressions, rather than single
       | expressions where | serves as bitwise-or.
       | 
       | > If the truncated value is >= 0x80, add 0xFF00 and write the
       | result as a little-endian u32. If the truncated value is < 0x80
       | but not zero, write the value as a little-endian u32.
       | 
       | This is sign-extension: s8 -> s16 -> u16 -> u32. The examples
       | below this also seem to have reversed the order of the input byte
       | and the FF.
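A quick check of the sign-extension claim, as a sketch: one function follows the quoted rule literally, the other does s8 -> s16 -> u16 -> u32, and the two should agree for every nonzero byte. Both function names are made up for illustration, and the zero case is left out because the quoted passage does not cover it.

```python
import struct


def encode_quoted_rule(value):
    """Follow the quoted rule: truncate to a byte, add 0xFF00 if >= 0x80."""
    b = value & 0xFF
    if b >= 0x80:
        b += 0xFF00
    return struct.pack("<I", b)  # little-endian u32


def encode_sign_extended(value):
    """Same result via sign extension: s8 -> s16 -> u16 -> u32."""
    b = value & 0xFF
    s8 = b - 0x100 if b >= 0x80 else b  # reinterpret the byte as signed
    u16 = s8 & 0xFFFF                   # widen to 16 bits, view as unsigned
    return struct.pack("<I", u16)       # zero-extend to u32, little-endian


# The two formulations agree for every nonzero byte value.
assert all(encode_quoted_rule(v) == encode_sign_extended(v) for v in range(1, 256))
print(encode_quoted_rule(0x80).hex(" "))
```

For an input byte of 0x80 this writes the bytes 80 ff 00 00, i.e. the input byte comes first and the FF second.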
       | 
       | Visual C++ 6, at least, includes a toolbar resource editor. IIRC
       | it shows the toolbar metadata and the bitmap together in one
       | editor, and you edit each button's image individually even though
       | they are concatenated into one bitmap in the resource file.
       | 
       | "GROUPBOX can only be used in DIALOGEX" might refer to some
       | limitation other than the resource compiler. For example, perhaps
       | Windows versions that don't support DIALOGEX also don't support
       | GROUPBOX.
       | 
       | A lot of them could be caused by memory safety errors. For
       | example the fact that "1 ICON {" treats "ICON" as the filename is
       | probably because the tokenizer doesn't set the Microsoft
       | equivalent of yytext for tokens where it's not supposed to be
       | relevant. Maybe it would even crash (null pointer) if { could be
       | the first token (which it can't).
        
         | squeek502 wrote:
         | Appreciate the added context!
         | 
         | > |-separated lists of numeric or NOT
         | 
         | Note that | is not the only operator that can be used in style
         | parameters, & + and - are all allowed too.
         | 
         | > perhaps Windows versions that don't support DIALOGEX also
         | don't support GROUPBOX
         | 
         | Seems possible for sure. From [1]:
         | 
         | > The 16-bit extended dialog template is purely historical. The
         | only operating systems to support it were the Windows 95/98/Me
         | series.
         | 
         | [1]
         | https://devblogs.microsoft.com/oldnewthing/20040622-00/?p=38...
         | 
         | > The examples below this also seem to have reversed the order
         | of the input byte and the FF.
         | 
         | Good catch, fixed
        
       | pilif wrote:
       | What an amazing article. And what an amazing analysis of this
       | 30-year-old blob. This was super enjoyable to read.
       | 
       | My only tiny gripe is with the first quirk: the author, who
       | insists that his implementation is bug-for-bug compatible, lists
       | the other implementations' behavior and explains that it is
       | very different from the original.
       | 
       | And then they proceed with additional quirks where the supposedly
       | bug-for-bug compatible implementation _also_ is different in the
       | same way as the first example in that it produces an error
       | message rather than the quirky output.
       | 
       | Don't get me wrong: errors rather than quirks is much better
       | behavior, but then don't claim to be bug-for-bug compatible, nor
       | roast other implementations for doing the same thing.
        
         | squeek502 wrote:
         | Apologies if I got the tone wrong there, I definitely wasn't
         | trying to roast the other projects.
         | 
         | In terms of what I prioritized bug-for-bug compatibility on, I
         | tailored it to getting
         | https://github.com/squeek502/win32-samples-rc-tests passing
         | 100%, and then also tried to take into account how likely a
         | bug/quirk was to be used in a real .rc file (ultimately this is
         | just a judgement call, though). The results of that test suite
         | (provided in the readme) are also a better indication of how
         | rc.exe-compatible the various resource compilers are in
         | practice (i.e. on 'real' .rc files).
        
       | urbandw311er wrote:
       | I liked this article. I would suggest having a new category of
       | "validation" for some of these. It's not particularly fair to
       | call something a bug, for example, when it's just that rc.exe
       | doesn't play nicely with things it never expected to receive,
       | like non-numeric characters etc.
        
       | rwmj wrote:
       | Very brave author. I have contributed a few patches to WINDRES to
       | fix some bugs and it's a strange tool / concept.
       | 
       | I'm going to guess that Microsoft won't wish to fix any bugs in
       | RC.EXE, since that would break some existing resource scripts,
       | and at this point backwards compatibility is much more important
       | than dealing with quirks.
       | 
       | Edit: Reminds me a bit of my adventures with the Registry,
       | another ill-conceived part of Windows:
       | https://rwmj.wordpress.com/2010/02/18/why-the-windows-regist...
        
         | squeek502 wrote:
         | I think everything labeled 'miscompilation' could be fixed
         | without breaking backwards compatibility, since triggering them
         | always leads to an unusable/broken .res file. No clue how
         | likely it is they'll be fixed, though.
        
           | o11c wrote:
           | You're assuming that anything will _notice_ if the .res file
           | is broken, though. It might add a .res because "that's what
           | Windows programs are supposed to do", or "only old versions
           | of the program actually used that", or "everybody knows it
           | crashes if you use that menu option, so don't use it."
           | 
           | But the build still depends on the _compilation_ of resources
           | succeeding.
        
         | layer8 wrote:
         | The concept is actually great, having declarative GUI
         | components compiled into efficient binary representations, and
         | allowing named values to be shared in a single-point-of-
         | definition style with the code that will be handling those
         | components at runtime.
        
       | oefrha wrote:
       | > My resource compiler implementation, resinator, has now reached
       | relative maturity and has been merged into the Zig compiler (but
       | is also maintained as a standalone project),
       | 
       | I was going to say scope creep, but then I remembered I've
       | replaced the cross toolchain with `zig cc` in a few small cgo
       | projects. Does zig intend to become the busybox of compilers?
        
         | squeek502 wrote:
         | See
         | https://www.ryanliptak.com/blog/zig-is-a-windows-resource-co...
         | for an example of using the Zig build system to cross-compile
         | an existing Windows GUI program written in C from any
         | supported host system.
        
           | oefrha wrote:
           | Yeah I have used windres for that purpose in the past, open
           | to replacing that with a zig invocation.
        
       | mananaysiempre wrote:
       | > Somehow, the filename { causes rc.exe to think the filename
       | token is actually the preceding token, so it's trying to
       | interpret ICON as both the resource type and the file path of the
       | resource. Who knows what's going on there.
       | 
       | > [...]
       | 
       | > Strangely, rc.exe will treat FOO [in place of the resource
       | type, followed by EOF] as both the type of the resource and as a
       | filename (similar to what we saw earlier in "BEGIN or { as
       | filename").
       | 
       | Having written a stupid lexer recently, I'm almost certain I know
       | what's going on there. The lexer has a lexeme type global that it
       | always sets, and a separate lexeme text global that it only sets
       | for text-like tokens (numbers, identifiers, strings, and bare
       | filenames, which it does not interpret other than to tell where
       | they end, that being the easiest way to deal with ANSI C's
       | _pp-number_s) but not for punctuation or EOF. Now have the code that
       | looks for the resource filename blindly reach into the text
       | global without first checking the type global (not even to check
       | if it's EOF), and you get exactly the behaviour above.
       | 
       | (Alternatively, the type could instead be returned by the next-
       | lexeme function--that's what my stupid lexer currently does,
       | anyway, though I'm considering changing it. The result is the
       | same.)
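The hypothesis above can be sketched as a toy lexer. Everything here (`ToyLexer`, `parse_resource`, the token regex) is hypothetical and only illustrates the suspected failure mode of a stale text variable; it is not a reconstruction of rc.exe's actual code.

```python
import re

# Text-like tokens (identifiers, numbers, bare filenames) or a single brace.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z0-9_.]+)|([{}]))")


class ToyLexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.kind = None  # always updated by next()
        self.text = None  # only updated for text-like tokens

    def next(self):
        m = TOKEN_RE.match(self.source, self.pos)
        if m is None:
            self.kind = "eof"    # text is left stale
            return
        self.pos = m.end()
        if m.group(1) is not None:
            self.kind = "text"
            self.text = m.group(1)
        else:
            self.kind = "punct"  # text is left stale


def parse_resource(source):
    lex = ToyLexer(source)
    lex.next()
    res_id = lex.text
    lex.next()
    res_type = lex.text
    lex.next()
    filename = lex.text  # bug: lex.kind is never checked here
    return res_id, res_type, filename


print(parse_resource("1 ICON icon.ico"))
print(parse_resource("1 ICON {"))
print(parse_resource("1 FOO"))
```

In this toy, both `1 ICON {` and `1 FOO` followed by EOF end up with the resource type reused as the filename, which matches the two behaviours quoted from the article.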
       | 
       | > For whatever reason, rc.exe will just take the last number
       | literal in the expression and try to read from a file with that
       | name [...].
       | 
       | I think I'm skilled enough to fuck that up too: the code to read
       | the filename calls the expression parser (which is either not
       | supposed to be called at EOF, or returns EOF that you're supposed
       | to check for in case that happens) and then blindly reaches into
       | the lexeme text variable.
        
       | phaedrus wrote:
       | I recently wrote an ANTLR4 parser for RC files, as part of
       | software archeology on a legacy codebase I support. Considering
       | how many programs over the last 30 (35?) years exist that use the
       | Windows resource compiler, it's surprising how little in-depth
       | information and how few open source alternative tools exist for
       | it. So I'm really glad to see both the information and the
       | project in this post.
        
       | layer8 wrote:
       | It's impressive that with all those shenanigans going on, not
       | more crashing scenarios were uncovered.
        
       ___________________________________________________________________
       (page generated 2024-10-13 22:01 UTC)