https://www.ryanliptak.com/blog/every-rc-exe-bug-quirk-probably/ * Home * Blog RSS --------------------------------------------------------------------- Every bug/quirk of the Windows resource compiler (rc.exe), probably 2024-10-11 - Programming - Fuzzing * 7 NOT NOT 4 NOT 2 NOT NOT 1 is a valid expression * 000 is a number that gets parsed into the decimal value 65130 * A < 1 MiB icon file can get compiled into 127 TiB of data The above is just a small sampling of a few of the strange behaviors of the Windows RC compiler (rc.exe). All of the above bugs/quirks, and many, many more, will be detailed and explained (to the best of my ability) in this post. Note: If you have no familiarity with rc.exe, .rc files, or even Windows development at all, no need to worry--I have tried to organize this post such that it will get you up to speed as-you-read. Context Inspired by an accepted proposal for Zig to include support for compiling Windows resource script (.rc) files, I set out on what I thought at the time would be a somewhat straightforward side-project of writing a Windows resource compiler in Zig. Microsoft's RC compiler (rc.exe) is closed source, but alternative implementations are nothing new--there are multiple existing projects that tackle the same goal of an open source and cross-platform Windows resource compiler (in particular, windres and llvm-rc). I figured that I could use them as a reference, and that the syntax of .rc files didn't look too complicated. I was wrong on both counts. While the .rc syntax in theory is not complicated, there are edge cases hiding around every corner, and each of the existing alternative Windows resource compilers handle each edge case very differently from the canonical Microsoft implementation. With a goal of byte-for-byte-identical-outputs (and possible bug-for-bug compatibility) for my implementation, I had to effectively start from scratch, as even the Windows documentation couldn't be fully trusted to be accurate. Ultimately, I went with fuzz testing (with rc.exe as the source of truth/oracle) as my method of choice for deciphering the behavior of the Windows resource compiler (this approach is similar to something I did with Lua a while back). This process led to a few things: * A completely clean-room implementation of a Windows resource compiler (not even any decompilation of rc.exe involved in the process) * A high degree of compatibility with the rc.exe implementation, including byte-for-byte identical outputs for a sizable corpus of Microsoft-provided sample .rc files (~500 files) * A large list of strange/interesting/baffling behaviors of the Windows resource compiler My resource compiler implementation, resinator, has now reached relative maturity and has been merged into the Zig compiler (but is also maintained as a standalone project), so I thought it might be interesting to write about all the weird stuff I found along the way. Note: While this list is thorough, it is only indicative of my current understanding of rc.exe, which can always be incorrect. Even in the process of writing this article, I found new edge cases and had to correct my implementation of certain aspects of the compiler. Who is this article for? * If you work at Microsoft, consider this a large list of bug reports (of particular note, see everything labeled 'miscompilation') + If you're Raymond Chen, then consider this an extension of/ homage to all the (fantastic, very helpful) blog posts about Windows resources in The Old New Thing * If you are a contributor to llvm-rc, windres, or wrc, consider this a long list of behaviors to test for (if strict compatibility is a goal) * If you are someone that managed to endure the bad audio of this talk I gave about my resource compiler and wanted more, consider this an extension of that talk * If you are none of the above, consider this an entertaining list of bizarre bugs/edge cases + If you'd like to skip around and check out the strangest bugs /quirks, Ctrl+F for 'utterly baffling' A brief intro to resource compilers .rc files (resource definition-script files) are scripts that contain both C/C++ preprocessor commands and resource definitions. We'll ignore the preprocessor for now and focus on resource definitions. One possible resource definition might look like this: id1 typeFOO { data"bar" } The 1 is the ID of the resource, which can be a number (ordinal) or literal (name). The FOO is the type of the resource, and in this case it's a user-defined type with the name FOO. The { "bar" } is a block that contains the data of the resource, which in this case is the string literal "bar". Not all resource definitions look exactly like this, but the part is fairly common. Resource compilers take .rc files and compile them into binary .res files: 1 RCDATA { "abc" } 00 00 00 00 20 00 00 00 .... ... FF FF 00 00 FF FF 00 00 ........ 00 00 00 00 00 00 00 00 ........ 00 00 00 00 00 00 00 00 ........ 03 00 00 00 20 00 00 00 .... ... FF FF 0A 00The predefined RCDATA resource type has ID 0x0A FF FF 01 00 ........ 00 00 00 00 30 00 09 04 ....0... 00 00 00 00 00 00 00 00 ........ 61 62 63 00 abc. A simple .rc file and a hexdump of the relevant part of the resulting .res file The .res file can then be handed off to the linker in order to include the resources in the resource table of a PE/COFF binary (.exe /.dll). The resources in the PE/COFF binary can be used for various things, like: * Executable icons that show up in Explorer * Version information that integrates with the Properties window * Defining dialogs/menus that can be loaded at runtime * Localization strings * Embedding arbitrary data * etc. [zig-ico] Both the executable's icon and the version information in the Properties window come from a compiled .rc file So, in general, a resource is a blob of data that can be referenced by an ID, plus a type that determines how that data should be interpreted. The resource(s) are embedded into compiled binaries (.exe/.dll) and can then be loaded at runtime, and/or can be loaded by the operating system for certain Windows-specific integrations. Note: rc.exe has been around since the earliest versions of Windows (the old 16-bit rc.exe has the copyright years 1985-1992). The RC compiler was updated for the 32-bit Windows NT in the early 90's, and (as far as I can tell) has been mostly untouched since. So, it's worth keeping in mind that rc.exe is very likely made up, in large part, of code written 30+ years ago. An additional bit of context worth knowing is that .rc files were/are very often generated by Visual Studio rather than manually written-by-hand, which could explain why many of the bugs/quirks detailed here have gone undetected/unfixed for so long (i.e. the Visual Studio generator just so happened not to trigger these edge cases). With that out of the way, we're ready to get into it. The list of bugs/quirks tokenizer quirk Special tokenization rules for names/IDs Here's a resource definition with a user-defined type of FOO ("user-defined" means that it's not one of the predefined resource types): 1 FOO { "bar" } For user-defined types, the (uppercased) resource type name is written as UTF-16 into the resulting .res file, so in this case FOO is written as the type of the resource, and the bytes of the string bar are written as the resource's data. So, following from this, let's try wrapping the resource type name in double quotes: 1 "FOO" { "bar" } Intuitively, you might expect that this doesn't change anything (i.e. it'll still get parsed into FOO), but in fact the Windows RC compiler will now include the quotes in the user-defined type name. That is, "FOO" will be written as the resource type name in the .res file, not FOO. This is because both resource IDs and resource types use special tokenization rules--they are basically only terminated by whitespace and nothing else (well, not exactly whitespace, it's actually any ASCII character from 0x05 to 0x20 [inclusive]). As an example: L"\r\n"123abc error{OutOfMemory}!?u8 { "bar" } Reminder: Resource IDs don't have to be integers In this case, the ID would be L"\R\N"123ABC (uppercased) and the resource type would be ERROR{OUTOFMEMORY}!?U8 (again, uppercased). --------------------------------------------------------------------- I've started with this particular quirk because it is actually demonstrative of the level of rc.exe-compatibility of the existing cross-platform resource compiler projects: * windres parses the "FOO" resource type as a regular string literal and the resource type name ends up as FOO (without the quotes) * llvm-rc errors with expected int or identifier, got "FOO" * wrc also errors with syntax error Note: This is the last time I'll be mentioning the behaviors of windres/llvm-rc/wrc, as this simple example is indicative of how much their implementations diverge from rc.exe for edge cases. See win32-samples-rc-tests for a rough approximation of the (strict) compatibility of the different Windows resource compilers on a more-or-less real-world set of .rc files. From here on out, I'll only be mentioning the behavior of resinator, my resource compiler implementation that was the impetus for the findings in this article. resinator's behavior resinator matches the resource ID/type tokenization behavior of rc.exe in all known cases. parser bug/quirk Non-ASCII digits in number literals The Windows RC compiler allows non-ASCII digit codepoints within number literals, but the resulting numeric value is arbitrary. For ASCII digit characters, the standard procedure for calculating the numeric value of an integer literal is the following: * For each digit, subtract the ASCII value of the zero character ('0') from the ASCII value of the digit to get the numeric value of the digit * Multiply the numeric value of the digit by the relevant multiple of 10, depending on the place value of the digit * Sum the result of all the digits For example, for the integer literal 123: 123 '1' - '0' = 1 '2' - '0' = 2 '3' - '0' = 3 1 * 100 = 100 2 * 10 = 20 3 * 1 = 3 [?][?][?][?][?][?][?][?][?][?][?][?] 123 integer literal numeric value of each digit numeric value of the integer literal So, how about the integer literal 123? The Windows RC compiler accepts it, but the resulting numeric value ends up being 1403. The problem is that the exact same procedure outlined above is erroneously followed for all allowed digits, so things go haywire for non-ASCII digits since the relationship between the non-ASCII digit's codepoint value and the ASCII value of '0' is arbitrary: 123 '2' - '0' = 130 1 * 100 = 100 130 * 10 = 1300 3 * 1 = 3 [?][?][?][?][?][?][?][?][?][?][?][?][?] 1403 integer literal numeric value of the 2 "digit" numeric value of the integer literal In other words, the 2 is treated as a base-10 "digit" with the value 130 (and 3 would be a base-10 "digit" with the value 131, 5 (U+1045) would be a base-10 "digit" with the value 4117, etc). This particular bug/quirk is (presumably) due to the use of the iswdigit function, and the same sort of bug/quirk exists with special COM[1-9] device names. resinator's behavior test.rc:2:3: error: non-ASCII digit characters are not allowed in number literals 123 ^~ parser bug/quirk BEGIN or { as filename Many resource types can get their data from a file, in which case their resource definition will look something like: 1 ICON "file.ico" Additionally, some resource types (like ICON) must get their data from a file. When attempting to define an ICON resource with a raw data block like so: 1 ICON BEGIN "foo" END Note: BEGIN and END are equivalent to { and } for opening/closing blocks and then trying to compile that ICON, rc.exe has a confusing error: test.rc(1) : error RC2135 : file not found: BEGIN test.rc(2) : error RC2135 : file not found: END That is, the Windows RC compiler will try to interpret BEGIN as a filename, which is extremely likely to fail and (if it succeeds) is almost certainly not what the user intended. It will then move on and continue trying to parse the file as if the first resource definition is 1 ICON BEGIN and almost certainly hit more errors, since everything afterwards will be misinterpreted just as badly. This is even worse when using { and } to open/close the block, as it triggers a separate bug: 1 ICON { "foo" } test.rc(1) : error RC2135 : file not found: ICON test.rc(2) : error RC2135 : file not found: } Somehow, the filename { causes rc.exe to think the filename token is actually the preceding token, so it's trying to interpret ICON as both the resource type and the file path of the resource. Who knows what's going on there. resinator's behavior In resinator, trying to use a raw data block with resource types that don't support raw data is an error, noting that if { or BEGIN is intended as a filename, it should use a quoted string literal. test.rc:1:8: error: expected '', found 'BEGIN' (resource type 'icon' can't use raw data) 1 ICON BEGIN ^~~~~ test.rc:1:8: note: if 'BEGIN' is intended to be a filename, it must be specified as a quoted string literal Note: windres takes a different approach and removes this restriction around raw data blocks entirely (i.e. it allows them to be used with any resource type). This is something resinator may support in the future, since it doesn't necessarily break compatibility and seems like it opens up a few potential use-cases (for example, windres supports .res -> .rc conversion where the binary data of each resource is written to the .rc file via raw data blocks). parser bug/quirk Number expressions as filenames There are multiple valid ways to specify the filename of a resource: // Quoted string, reads from the file: bar.txt 1 FOO "bar.txt" // Unquoted literal, reads from the file: bar.txt 2 FOO bar.txt // Number literal, reads from the file: 123 3 FOO 123 But that's not all, as you can also specify the filename as an arbitrarily complex number expression, like so: 1 FOO (1 | 2)+(2-1 & 0xFF) The entire (1 | 2)+(2-1 & 0xFF) expression, spaces and all, is interpreted as the filename of the resource. Want to take a guess as to which file path it tries to read the data from? Yes, that's right, 0xFF! For whatever reason, rc.exe will just take the last number literal in the expression and try to read from a file with that name, e.g. (1+2) will try to read from the path 2, and 1+-1 will try to read from the path -1 (the - sign is part of the number literal token, this will be detailed later in "Unary operators are an illusion"). resinator's behavior In resinator, trying to use a number expression as a filename is an error, noting that a quoted string literal should be used instead. Singular number literals are allowed, though (e.g. -1). test.rc:1:7: error: filename cannot be specified using a number expression, consider using a quoted string instead 1 FOO (1 | 2)+(2-1 & 0xFF) ^~~~~~~~~~~~~~~~~~~~ test.rc:1:7: note: the Win32 RC compiler would evaluate this number expression as the filename '0xFF' parser bug/quirk Incomplete resource at EOF The incomplete resource definition in the following example is an error: // A complete resource definition 1 FOO { "bar" } // An incomplete resource definition 2 FOO But it's not the error you might be expecting: test.rc(6) : error RC2135 : file not found: FOO Strangely, rc.exe will treat FOO as both the type of the resource and as a filename (similar to what we saw earlier in "BEGIN or { as filename"). If you create a file with the name FOO it will then successfully compile, and the .res will have a resource with type FOO and its data will be that of the file FOO. resinator's behavior resinator does not match the rc.exe behavior and instead always errors on this type of incomplete resource definition at the end of a file: test.rc:5:6: error: expected quoted string literal or unquoted literal; got '' 2 FOO ^ However... parser bug/quirk Dangling literal at EOF If we change the previous example to only have one dangling literal for its incomplete resource definition like so: // A complete resource definition 1 FOO { "bar" } // An incomplete resource definition FOO Then rc.exe will always successfully compile it, and it won't try to read from the file FOO. That is, a single dangling literal at the end of a file is fully allowed, and it is just treated as if it doesn't exist (there's no corresponding resource in the resulting .res file). Note: There are a few particular dangling literals that do cause an error: LANGUAGE, VERSION, CHARACTERISTICS, and STRINGTABLE. This is because those keywords can be the beginning of a top-level declaration (e.g. valid usage of CHARACTERISTICS is CHARACTERISTICS , so CHARACTERISTICS with no number afterwards triggers error RC2140 : CHARACTERISTICS not a number) It also turns out that there are three .rc files in Windows-classic-samples that (accidentally, presumably) rely on this behavior (1, 2, 3), so in order to fully pass win32-samples-rc-tests, it is necessary to allow a dangling literal at the end of a file. resinator's behavior resinator allows a single dangling literal at the end of a file, but emits a warning: test.rc:5:1: warning: dangling literal at end-of-file; this is not a problem, but it is likely a mistake FOO ^~~ parser bug/quirk, miscompilation Yes, that MENU over there (vague gesturing) As established in the intro, resource definitions typically have an id, like so: id1 FOO { "bar" } The id can be either a number ("ordinal") or a string ("name"), and the type of the id is inferred by its contents. This mostly works as you'd expect: * If the id is all digits, then it's a number/ordinal * If the id is all letters, then it's a string/name * If the id is a mix of digits and letters, then it's a string/name Here's a few examples: 123 ---> Ordinal: 123 ABC ---> Name: ABC 123ABC ---> Name: 123ABC This is relevant, because when defining DIALOG/DIALOGEX resources, there is an optional MENU statement that can specify the id of a separately defined MENU/MENUEX resource to use. From the DIALOGEX docs: Statement Description MENU Menu to be used. This value is either the name of the menuname menu or its integer identifier. Here's an example of that in action, where the DIALOGEX is attempting to specify that the MENUEX with the id of 1ABC should be used: 1ABC MENUEX <--------------+ { | // ... | } | | 1 DIALOGEX 0, 0, 640, 480 | MENU 1ABC ---------------+ { // ... } However, this is not what actually occurs, as for some reason, the MENU statement has different rules around inferring the type of the id. For the MENU statement, whenever the first character is a number, then the whole id is interpreted as a number no matter what. The value of this "number" is determined using the same bogus methodology detailed in "Non-ASCII digits in number literals", so in the case of 1ABC, the value works out to 2899: 1ABC '1' - '0' = 1 'A' - '0' = 17 'B' - '0' = 18 'C' - '0' = 19 1 * 1000 = 1000 17 * 100 = 1700 18 * 10 = 180 19 * 1 = 19 [?][?][?][?][?][?][?][?][?][?][?][?][?][?] 2899 "numeric" id numeric value of each "digit" numeric value of the id Unlike "Non-ASCII digits in number literals", though, it's now also possible to include characters in a "number" literal that have a lower ASCII value than the '0' character, meaning that attempting to get the numeric value for such a 'digit' will induce wrapping u16 overflow: 1! '1' - '0' = 1 '!' - '0' = -15 -15 = 65521 1 * 10 = 10 65521 * 1 = 65521 [?][?][?][?][?][?][?][?][?][?][?][?][?][?][?][?] 65531 "numeric" id numeric value of each "digit" numeric value of the id This is always a miscompilation In the following example using the same 1ABC ID as above: // In foo.rc 1ABC MENU BEGIN POPUP "Menu from .rc" BEGIN MENUITEM "Open File", 1 END END 1 DIALOGEX 0, 0, 275, 280 CAPTION "Dialog from .rc" MENU 1ABC BEGIN END // In main.c // ... HWND result = CreateDialogParamW(g_hInst, MAKEINTRESOURCE(1), hwnd, DialogProc, (LPARAM)NULL); // ... This CreateDialogParamW call will fail with The specified resource name cannot be found in the image file because, when loading the dialog, it will attempt to look for a menu resource with an integer ID of 2899. If we add such a MENU to the .rc file: 2899 MENU BEGIN POPUP "Wrong menu from .rc" BEGIN MENUITEM "Destroy File", 1 END END then the dialog will successfully load with this new menu, but it's pretty obvious this is not what was intended: [rc-wrong-m] The misinterpretation of the ID can (at best) lead to an unexpected menu being loaded A related, but inconsequential, inconsistency As mentioned in "Special tokenization rules for names/IDs", when the id of a resource is a string/name, it is uppercased before being written to the .res file. This uppercasing is not done for the MENU statement of a DIALOG/DIALOGEX resource, so in this example: abc MENUEX { // ... } 1 DIALOGEX 0, 0, 640, 480 MENU abc { // ... } The id of the MENUEX resource would be compiled as ABC, but the DIALOGEX would write the id of its menu as abc. This ends up not mattering, though, because it appears that LoadMenu uses a case-insensitive lookup. resinator's behavior resinator avoids the miscompilation and treats the id parameter of MENU statements in DIALOG/DIALOGEX resources exactly the same as the id of MENU resources. test.rc:3:8: warning: the id of this menu would be miscompiled by the Win32 RC compiler MENU 1ABC ^~~~ test.rc:3:8: note: the Win32 RC compiler would evaluate the id as the ordinal/number value 2899 test.rc:3:8: note: to avoid the potential miscompilation, the first character of the id should not be a digit parser bug/quirk If you're not last, you're irrelevant Many resource types have optional statements that can be specified between the resource type and the beginning of its body, e.g. 1 ACCELERATORS LANGUAGE 0x09, 0x01 CHARACTERISTICS 0x1234 VERSION 1 { // ... } Specifying multiple statements of the same type within a single resource definition is allowed, and the last occurrence of each statement type is the one that takes precedence, so the following would compile to the exact same .res as the example above: 1 ACCELERATORS CHARACTERISTICS 1 LANGUAGE 0xFF, 0xFF LANGUAGE 0x09, 0x01 CHARACTERISTICS 999 CHARACTERISTICS 0x1234 VERSION 999 VERSION 1 { // ... } This is not necessarily a problem on its own (although I think it should at least be a warning), but it can inadvertently lead to some bizarre behavior, as we'll see in the next bug/quirk. resinator's behavior resinator matches the Windows RC compiler behavior, but emits a warning for each ignored statement: test.rc:2:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence CHARACTERISTICS 1 ^~~~~~~~~~~~~~~~~ test.rc:3:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence LANGUAGE 0xFF, 0xFF ^~~~~~~~~~~~~~~~~~~ test.rc:5:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence CHARACTERISTICS 999 ^~~~~~~~~~~~~~~~~~~ test.rc:7:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence VERSION 999 ^~~~~~~~~~~ parser bug/quirk, miscompilation Once a number, always a number The behavior described in "Yes, that MENU over there (vague gesturing)" can also be induced in both CLASS and MENU statements of DIALOG/DIALOGEX resources via redundant statements. As seen in "If you're not last, you're irrelevant", multiple statements of the same type are allowed to be specified without much issue, but in the case of CLASS and MENU, if any of the duplicate statements are interpreted as a number, then the value of last statement of its type (the only one that matters) is always interpreted as a number no matter what it contains. 1 DIALOGEX 0, 0, 640, 480 MENU 123 // ignored, but causes the string below to be evaluated as a number MENU IM_A_STRING_I_SWEAR ----> 8360 CLASS 123 // ignored, but causes the string below to be evaluated as a number CLASS "Seriously, I'm a string" ----> 55127 { // ... } The algorithm for coercing the strings to a number is the same as the one outlined in "Yes, that MENU over there (vague gesturing)", and, for the same reasons discussed there, this too is always a miscompilation. resinator's behavior resinator avoids the miscompilation and emits warnings: test.rc:2:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence MENU 123 ^~~~~~~~ test.rc:4:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence CLASS 123 ^~~~~~~~~ test.rc:5:9: warning: this class would be miscompiled by the Win32 RC compiler CLASS "Seriously, I'm a string" ^~~~~~~~~~~~~~~~~~~~~~~~~ test.rc:5:9: note: the Win32 RC compiler would evaluate it as the ordinal/number value 55127 test.rc:5:9: note: to avoid the potential miscompilation, only specify one class per dialog resource test.rc:3:8: warning: the id of this menu would be miscompiled by the Win32 RC compiler MENU IM_A_STRING_I_SWEAR ^~~~~~~~~~~~~~~~~~~ test.rc:3:8: note: the Win32 RC compiler would evaluate the id as the ordinal/number value 8360 test.rc:3:8: note: to avoid the potential miscompilation, only specify one menu per dialog resource parser bug/quirk L is not allowed there Like in C, an integer literal can be suffixed with L to signify that it is a 'long' integer literal. In the case of the Windows RC compiler, integer literals are typically 16 bits wide, and suffixing an integer literal with L will instead make it 32 bits wide. 1 RCDATA { 1, 2L } 01 00 02 00 00 00 An RCDATA resource definition and a hexdump of the resulting data in the .res file However, outside of raw data blocks like the RCDATA example above, the L suffix is typically meaningless, as it has no bearing on the size of the integer used. For example, DIALOG resources have x, y, width, and height parameters, and they are each encoded in the data as a u16 regardless of the integer literal used. If the value would overflow a u16, then the value is truncated back down to a u16, meaning in the following example all 4 parameters after DIALOG get compiled down to 1 as a u16: 1 DIALOG 1, 1L, 65537, 65537L {} The maximum value of a u16 is 65535 A few particular parameters, though, fully disallow integer literals with the L suffix from being used: * Any of the four parameters of the FILEVERSION statement of a VERSIONINFO resource * Any of the four parameters of the PRODUCTVERSION statement of a VERSIONINFO resource * Any of the two parameters of a LANGUAGE statement LANGUAGE 1L, 2 test.rc(1) : error RC2145 : PRIMARY LANGUAGE ID too large 1 VERSIONINFO FILEVERSION 1L, 2, 3, 4 BEGIN // ... END test.rc(2) : error RC2127 : version WORDs separated by commas expected It is true that these parameters are limited to u16, so using an L suffix is likely a mistake, but that is also true of many other parameters for which the Windows RC compiler happily allows L suffixed numbers for. It's unclear why these particular parameters are singled out, and even more unclear given the fact that specifying these parameters using an integer literal that would overflow a u16 does not actually trigger an error (and instead it truncates the values to a u16): 1 VERSIONINFO FILEVERSION 65537, 65538, 65539, 65540 BEGIN END The compiled FILEVERSION in this case will be 1, 2, 3, 4: 65537 = 0x10001; truncated to u16 = 0x0001 65538 = 0x10002; truncated to u16 = 0x0002 65539 = 0x10003; truncated to u16 = 0x0003 65540 = 0x10004; truncated to u16 = 0x0004 resinator's behavior resinator allows L suffixed integer literals everywhere and truncates the value down to the appropriate number of bits when necessary. test.rc:1:10: warning: this language parameter would be an error in the Win32 RC compiler LANGUAGE 1L, 2 ^~ test.rc:1:10: note: to avoid the error, remove any L suffixes from numbers within the parameter parser bug/quirk Unary operators are an illusion Typically, unary +, -, etc. operators are just that--operators; they are separate tokens that act on other tokens (number literals, variables, etc). However, in the Windows RC compiler, they are not real operators. Unary - The unary - is included as part of a number literal, not as a distinct operator. This behavior can be confirmed in a rather strange way, taking advantage of a separate quirk described in "Number expressions as filenames". When a resource's filename is specified as a number expression, the file path it ultimately looks for is the last number literal in the expression, so for example: 1 FOO (567 + 123) test.rc(1) : error RC2135 : file not found: 123 And if we throw in a unary - like so, then it gets included as part of the filename: 1 FOO (567 + -123) test.rc(1) : error RC2135 : file not found: -123 This quirk leads to a few unexpected valid patterns, since - on its own is also considered a valid number literal (and it resolves to 0), so: 1 FOO { 1-- } evaluates to 1-0 and results in 1 being written to the resource's data, while: 1 FOO { "str" - 1 } looks like a string literal minus 1, but it's actually interpreted as 3 separate raw data values (str, - [which evaluates to 0], and 1), since commas between data values in a raw data block are optional. Additionally, it means that otherwise valid looking expressions may not actually be considered valid: 1 FOO { (-(123)) } test.rc(1) : error RC1013 : mismatched parentheses Unary ~ The unary NOT (~) works exactly the same as the unary - and has all the same quirks. For example, a ~ on its own is also a valid number literal: 1 FOO { ~ } Data is a u16 with the value 0xFFFF And ~L (to turn the integer into a u32) is valid in the same way that -L would be valid: 1 FOO { ~L } Data is a u32 with the value 0xFFFFFFFF Unary + The unary + is almost entirely a hallucination; it can be used in some places, but not others, without any discernible rhyme or reason. This is valid (and the parameters evaluate to 1, 2, 3, 4 as expected): 1 DIALOG +1, +2, +3, +4 {} but this is an error: 1 FOO { +123 } test.rc(1) : error RC2164 : unexpected value in RCDATA and so is this: 1 DIALOG (+1), 2, 3, 4 {} test.rc(1) : error RC2237 : numeric value expected at DIALOG Because the rules around the unary + are so opaque, I am unsure if it shares many of the same properties as the unary -. I do know, though, that + on its own does not seem to be an accepted number literal in any case I've seen so far. resinator's behavior resinator matches the Windows RC compiler's behavior around unary -/ ~, but disallows unary + entirely: test.rc:1:10: error: expected number or number expression; got '+' 1 DIALOG +1, +2, +3, +4 {} ^ test.rc:1:10: note: the Win32 RC compiler may accept '+' as a unary operator here, but it is not supported in this implementation; consider omitting the unary + miscompilation Your fate will be determined by a comma Version information is specified using key/value pairs within VERSIONINFO resources. In the compiled .res file, the value data should always start at a 4-byte boundary, so after the key data is written, a variable number of padding bytes are written to get back to 4-byte alignment: 1 VERSIONINFO { VALUE "key", "value" } ......k.e.y..... v.a.l.u.e....... Two padding bytes are inserted after the key to get back to 4-byte alignment However, if the comma between the key and value is omitted, then for whatever reason the padding bytes are also omitted: 1 VERSIONINFO { VALUE "key" "value" } ......k.e.y...v. a.l.u.e......... Without the comma between "key" and "value", the padding bytes are not written The problem here is that consumers of the VERSIONINFO resource (e.g. VerQueryValue) will expect the padding bytes, so it will try to read the value as if the padding bytes were there. For example, with the simple "key" "value" example: VerQueryValueW(verbuf, L"\\key", &querybuf, &querysize); wprintf(L"%s\n", querybuf); Which will print: alue Plus, depending on the length of the key string, it can end up being even worse, since the value could end up being written over the top of the null terminator of the key. Here's an example: 1 VERSIONINFO { VALUE "ke" "value" } ......k.e.v.a.l. u.e............. And the problems don't end there--VERSIONINFO is compiled into a tree structure, meaning the misreading of one node affects the reading of future nodes. Here's a (simplified) real-world VERSIONINFO resource definition from a random .rc file in Windows-classic-samples: VS_VERSION_INFO VERSIONINFO BEGIN BLOCK "StringFileInfo" BEGIN BLOCK "040904e4" BEGIN VALUE "CompanyName", "Microsoft" VALUE "FileDescription", "AmbientLightAware" VALUE "FileVersion", "1.0.0.1" VALUE "InternalName", "AmbientLightAware.exe" VALUE "LegalCopyright", "(c) Microsoft. All rights reserved." VALUE "OriginalFilename", "AmbientLightAware.exe" VALUE "ProductName", "AmbientLightAware" VALUE "ProductVersion", "1.0.0.1" END END BLOCK "VarFileInfo" BEGIN VALUE "Translation", 0x409, 1252 END END and here's the Properties window of an .exe compiled with and without commas between all the key/value pairs: [versioninf] Correct version information with commas included... [versioninf] ...but completely broken if the commas are omitted Note: This miscompilation will only occur when: * the comma between the key and the first value in a VALUE statement is omitted, and * the first value is a quoted string That is, VALUE "key" "value" will miscompile but VALUE "key", "value" or VALUE "key" 1 won't. resinator's behavior resinator avoids the miscompilation (always inserts the necessary padding bytes) and emits a warning. test.rc:2:15: warning: the padding before this quoted string value would be miscompiled by the Win32 RC compiler VALUE "key" "value" ^~~~~~~ test.rc:2:15: note: to avoid the potential miscompilation, consider adding a comma between the key and the quoted string miscompilation Mismatch in length units in VERSIONINFO nodes A VALUE within a VERSIONINFO resource is specified using this syntax: VALUE , The value(s) can be specified as either number literals or quoted string literals, like so: 1 VERSIONINFO { VALUE "numbers", 123, 456 VALUE "strings", "foo", "bar" } Each VALUE is compiled into a structure that contains the length of its value data, but the unit used for the length varies: * For strings, the string data is written as UTF-16, and the length is given in UTF-16 code units (2 bytes per code unit) * For numbers, the numbers are written either as u16 or u32 (depending on the presence of an L suffix), and the length is given in bytes So, for the above example, the "numbers" value would be compiled into a node with: * "Binary" data, meaning the length is given in bytes * A length of 4, since each number literal is compiled as a u16 * Data bytes of 7B 00 C8 01, where 7B 00 is 123 and C8 01 is 456 (as little-endian u16) and the "strings" value would be compiled into a node with: * "String" data, meaning the length is given in UTF-16 code units * A length of 8, since each string is 3 UTF-16 code units plus a NUL-terminator * Data bytes of 66 00 6F 00 6F 00 00 00 62 00 61 00 72 00 00 00, where 66 00 6F 00 6F 00 00 00 is "foo" and 62 00 61 00 72 00 00 00 is "bar" (both as NUL-terminated little-endian UTF-16) This is a bit bizarre, but when separated out like this it works fine. The problem is that there is nothing stopping you from mixing strings and numbers in one value, in which case the Windows RC compiler freaks out and writes the type as "binary" (meaning the length should be interpreted as a byte count), but the length as a mixture of byte count and UTF-16 code unit count. For example, with this resource: 1 VERSIONINFO { VALUE "something", "foo", 123 } Its value's data will get compiled into these bytes: 66 00 6F 00 6F 00 00 00 7B 00, where 66 00 6F 00 6F 00 00 00 is "foo" (as NUL-terminated little-endian UTF-16) and 7B 00 is 123 (as a little-endian u16). This makes for a total of 10 bytes (8 for "foo", 2 for 123), but the Windows RC compiler erroneously reports the value's data length as 6 (4 for "foo" [counted as UTF-16 code units], and 2 for 123 [counted as bytes]). This miscompilation has similar results as those detailed in "Your fate will be determined by a comma": * The full data of the value will not be read by a parser * Due to the tree structure of VERSIONINFO resource data, this has knock-on effects on all following nodes, meaning the entire resource will be mangled Note: There is a The Old New Thing post that mentions this bug/quirk, check it out if you want some additional details The return of the meaningful comma Before, I said that string values were compiled as NUL-terminated UTF-16 strings, but this is only the case when either: * It is the last data element of a VALUE, or * There is a comma separating it from the element after it So, this: 1 VERSIONINFO { VALUE "strings", "foo", "bar" } will be compiled with a NUL terminator after both foo and bar, but this: 1 VERSIONINFO { VALUE "strings", "foo" "bar" } will be compiled only with a NUL terminator after bar. This is also similar to "Your fate will be determined by a comma", but unlike that comma quirk, I don't consider this one a miscompilation because the result is not invalid/mangled, and there is a possible use-case for this behavior (concatenating two or more string literals together). However, this behavior is not mentioned in the documentation, so it's unclear if it's actually intended. resinator's behavior resinator avoids the length-related miscompilation and emits a warning: test.rc:2:22: warning: the byte count of this value would be miscompiled by the Win32 RC compiler VALUE "something", "foo", 123 ^~~~~~~~~~ test.rc:2:22: note: to avoid the potential miscompilation, do not mix numbers and strings within a value but matches the "meaningful comma" behavior of the Windows RC compiler. fundamental concept Turning off flags with NOT expressions Let's say you wanted to define a dialog resource with a button, but you wanted the button to start invisible. You'd do this with a NOT expression in the "style" parameter of the button like so: 1 DIALOGEX 0, 0, 282, 239 { PUSHBUTTON "Cancel",1,129,212,50,14, NOT WS_VISIBLE } Since WS_VISIBLE is set by default, this will unset it and make the button invisible. If there are any other flags that should be applied, they can be bitwise OR'd like so: 1 DIALOGEX 0, 0, 282, 239 { PUSHBUTTON "Cancel",1,129,212,50,14, NOT WS_VISIBLE | BS_VCENTER } WS_VISIBLE and BS_VCENTER are just numbers under-the-hood. For simplicity's sake, let's pretend their values are 0x1 for WS_VISIBLE and 0x2 for BS_VCENTER and then focus on this simplified NOT expression: NOT 0x1 | 0x2 Since WS_VISIBLE is on by default, the default value of these flags is 0x1, and so the resulting value is evaluated like this: operation binary representation of the result hex representation of the result Default value: 0x1 0000 0001 0x1 NOT 0x1 0000 0000 0x0 | 0x2 0000 0010 0x2 Ordering matters as well. If we switch the expression to: NOT 0x1 | 0x1 then we end up with 0x1 as the result: operation binary representation of the result hex representation of the result Default value: 0x1 0000 0001 0x1 NOT 0x1 0000 0000 0x0 | 0x1 0000 0001 0x1 If, instead, the ordering was reversed like so: 0x1 | NOT 0x1 then the value at the end would be 0x0: operation binary representation of the result hex representation of the result Default value: 0x1 0000 0001 0x1 0x1 0000 0001 0x1 | NOT 0x1 0000 0000 0x0 With these basic examples, NOT seems pretty straightforward, however... utterly baffling NOT is incomprehensible Practically any deviation outside the simple examples outlined in Turning off flags with NOT expressions leads to bizarre and inexplicable results. For example, these expressions are all accepted by the Windows RC compiler: * NOT (1 | 2) * NOT () 2 * 7 NOT NOT 4 NOT 2 NOT NOT 1 The first one looks like it makes sense, as intuitively the (1 | 2) would be evaluated first so in theory it should be equivalent to NOT 3. However, if the default value of the flags is 0, then the expression NOT (1 | 2) (somehow) evaluates to 2, whereas NOT 3 would evaluate to 0. NOT () 2 seems like it should obviously be a syntax error, but for whatever reason it's accepted by the Windows RC compiler and also evaluates to 2. 7 NOT NOT 4 NOT 2 NOT NOT 1 is entirely incomprehensible, and just as incomprehensibly, it also results in 2 (if the default value is 0). This behavior is so bizarre and obviously incorrect that I didn't even try to understand what's going on here, so your guess is as good as mine on this one. resinator's behavior resinator only accepts NOT , anything else is an error: test.rc:2:13: error: expected '', got '(' STYLE NOT () 2 ^ All 3 of the above examples lead to compile errors in resinator. parser bug/quirk NOT can be used in places it makes no sense The strangeness of NOT doesn't end there, as the Windows RC compiler also allows it to be used in many (but not all) places that a number expression can be used. As an example, here are NOT expressions used in the x, y, width, and height arguments of a DIALOGEX resource: 1 DIALOGEX NOT 1, NOT 2, NOT 3, NOT 4 { // ... } This doesn't necessarily cause problems, but since NOT is only useful in the context of turning off enabled-by-default flags of a bit flag parameter, there's no reason to allow NOT expressions outside of that context. However, there is an extra bit of weirdness involved here, since certain NOT expressions cause errors in some places but not others. For example, the expression 1 | NOT 2 is an error if it's used in the type parameter of a MENUEX's MENUITEM, but NOT 2 | 1 is totally accepted. 1 MENUEX { // Error: numeric value expected at NOT MENUITEM "bar", 101, 1 | NOT 2 // No error if the NOT is moved to the left of the bitwise OR MENUITEM "foo", 100, NOT 2 | 1 } Note: 1 | NOT 2 is legal in all bit flag parameters (where the use of NOT actually makes sense). resinator's behavior resinator errors if NOT expressions are attempted to be used outside of bit flag parameters: test.rc:1:12: error: expected number or number expression; got 'NOT' 1 DIALOGEX NOT 1, NOT 2, NOT 3, NOT 4 ^~~ miscompilation, crash No one has thought about FONT resources for decades As far as I can tell, the FONT resource has exactly one purpose: creating .fon files, which are resource-only .dlls (i.e. a .dll with resources, but no entry point) renamed to have a .fon extension. Such .fon files contain a collection of fonts in the obsolete .fnt font format. The .fon format is mostly obsolete, but is still supported in modern Windows, and Windows still ships with some .fon files included: [windows-10] The Terminal font included in Windows 10 is a .fon file This .fon-related purpose for the FONT resource, however, has been irrelevant for decades, and, as far as I can tell, has not worked fully correctly since the 16-bit version of the Windows RC compiler. To understand why, though, we have to understand a little bit about the .fnt format. In version 1 of the .fnt format, specified by the Windows 1.03 SDK from 1986, the total size of all the static fields in the header was 117 bytes, with a few fields containing offsets to variable-length data elsewhere in the file. Here's a (truncated) visualization, with some relevant 'offset' fields expanded: ....version.... ......size..... ...copyright... ......type..... . . . etc . . . . . . etc . . . .device_offset. ---> NUL-terminated device name. ..face_offset.. ---> NUL-terminated font face name. ....bits_ptr... ..bits_offset.. In version 3 of the .fnt format (and presumably version 2, but I can't find much info about version 2), all of the fields up to and including bits_offset are the same, but there are an additional 31 bytes of new fields, making for a total size of 148 bytes: ....version.... . . . etc . . . . . . etc . . . .device_offset. ..face_offset.. ....bits_ptr... ..bits_offset.. ....reserved... <-+ .....flags..... <-+ .....aspace.... <-+ .....bspace.... <-+-- new fields .....cspace.... <-+ ...color_ptr... <-+ ...reserved1... | ............... <-+ ............... Getting back to resource compilation, FONT resources within .rc files are collected and compiled into the following resources: * A RT_FONT resource for each FONT, where the data is the verbatim file contents of the .fnt file * A FONTDIR resource that contains data about each font, in the format specified by FONTGROUPHDR + side note: the string FONTDIR is the type of this resource, it doesn't have an associated integer ID like most other Windows-defined resources do Within the FONTDIR resource, there is a FONTDIRENTRY for each font, containing much of the information in the .fnt header. In fact, the data actually matches the version 1 .fnt header almost exactly, with only a few differences at the end: .fnt version 1 FONTDIRENTRY ....version.... == ...dfVersion... ......size..... == .....dfSize.... ...copyright... == ..dfCopyright.. ......type..... == .....dfType.... . . . etc . . . == . . . etc . . . . . . etc . . . == . . . etc . . . .device_offset. == ....dfDevice... ..face_offset.. == .....dfFace.... ....bits_ptr... =? ...dfReserved.. ..bits_offset.. NUL-terminated device name. NUL-terminated font face name. The formats match, except FONTDIRENTRY does not include bits_offset and instead it has trailing variable-length strings This documented FONTDIRENTRY is what the obsolete 16-bit version of rc.exe outputs: 113 bytes plus two variable-length NUL-terminated strings at the end. However, starting with the 32-bit resource compiler, contrary to the documentation, rc.exe now outputs FONTDIRENTRY as 148 bytes plus the two variable-length NUL-terminated strings. You might notice that this 148 number has come up before; it's the size of the .fnt version 3 header. So, starting with the 32-bit rc.exe, FONTDIRENTRY as-written-by-the-resource-compiler is effectively the first 148 bytes of the .fnt file, plus the two strings located at the positions given by the device_offset and face_offset fields. Or, at least, that's clearly the intention, but this is labeled 'miscompilation' for a reason. Let's take this example .fnt file for instance: ....version.... . . . etc . . . . . . etc . . . .device_offset. ---> some device. ..face_offset.. ---> some font face. . . . etc . . . . . . etc . . . ...reserved1... ............... ............... When compiled with the old 16-bit Windows RC compiler, some device and some font face are written as trailing strings in the FONTDIRENTRY (as expected), but when compiled with the modern rc.exe, both strings get written as 0-length (only a NUL terminator). The reason why is rather silly, so let's go through it. Here's the documented FONTDIRENTRY format again, this time with some annotations: FONTDIRENTRY -113 ...dfVersion... (2 bytes) -111 .....dfSize.... (4 bytes) -107 ..dfCopyright.. (60 bytes) -47 .....dfType.... (2 bytes) . . . etc . . . . . . etc . . . -12 ....dfDevice... (4 bytes) -8 .....dfFace.... (4 bytes) -4 ...dfReserved.. (4 bytes) The numbers on the left represent the offset from the end of the FONTDIRENTRY data to the start of the field It turns out that the Windows RC compiler uses the offset from the end of FONTDIRENTRY to get the values of the dfDevice and dfFace fields. This works fine when those offsets are unchanging, but, as we've seen, the Windows RC compiler now uses an undocumented FONTDIRENTRY definition that is is 35 bytes longer, but these hardcoded offsets were never updated accordingly. This means that the Windows RC compiler is actually attempting to read the dfDevice and dfFace fields from this part of the .fnt version 3 header: ....version.... . . . etc . . . . . . etc . . . .device_offset. ..face_offset.. . . . etc . . . . . . etc . . . -12 ...reserved1... ---> ??? -8 ............... ---> ??? -4 ............... The Windows RC compiler reads data from the reserved1 field and interprets it as dfDevice and dfFace Because this bug happens to end up reading data from a reserved field, it's very likely for that data to just contain zeroes, which means it will try to read the NUL-terminated strings starting at offset 0 from the start of the file. As a second coincidence, the first field of a .fnt file is a u16 containing the version, and the only versions I'm aware of are: * Version 1, 0x0100 encoded as little-endian, so the bytes at offset 0 are 00 01 * Version 2, 0x0200 encoded as little-endian, so the bytes at offset 0 are 00 02 * Version 3, 0x0300 encoded as little-endian, so the bytes at offset 0 are 00 03 In all three cases, the first byte is 0x00, meaning attempting to read a NUL terminated string from offset 0 always ends up with a 0-length string for all known/valid .fnt versions. So, in practice, the Windows RC compiler almost always writes the trailing szDeviceName and szFaceName strings as 0-length strings. As a final coincidence, the offset here is -12 and -8 only because the original FONTDIRENTRY definition chose to omit the bits_offset field from the .fnt version 1 format. If it contained the full .fnt version 1 header data, the dfDevice offset would have been -16, meaning it would have ended up interpreting the color_ptr field of the .fnt version 3 format as an offset instead, likely leading to more bogus (> 0-length) strings being written to the trailing data (and therefore possibly leading to this bug being found/fixed earlier). This behavior can be confirmed by crafting a .fnt file with actual offsets to NUL-terminated strings within the reserved data field that the Windows RC compiler erroneously reads from: ....version.... . . . etc . . . . . . etc . . . .device_offset. ---> some device. ..face_offset.. ---> some font face. . . . etc . . . . . . etc . . . ...reserved1... ---> i dare you to read me. ............... ---> you wouldn't. ............... Compiling such a FONT resource, we do indeed see that the strings i dare you to read me and you wouldn't are written to the FONTDIRENTRY for this FONT rather than some device and some font face. Does any of this even matter? Well, no, not really. The whole concept of the FONTDIR containing information about all the RT_FONT resources is something of a historical relic, likely only relevant when resources were constrained enough that having an overview of the font data all in once place allowed for optimization opportunities that made a difference. From what I can tell, though, on modern Windows, the FONTDIR resource is ignored entirely: * Linker implementations will happily link .res files that contain RT_FONT resources with no FONTDIR resource * Windows will happily load/install .fon files that contain RT_FONT resources with no FONTDIR resource However, there are a few caveats... Misuse of the FONT resource for non-.fnt fonts I'm not sure how prevalent this is, but it can be forgiven that someone might not realize that FONT is only intended to be used with a font format that has been obsolete for multiple decades, and try to use the FONT resource with a modern font format. In fact, there is one Microsoft-provided Windows-classic-samples example program that uses FONT resources with .ttf files to include custom fonts in a program: Win7Samples/multimedia/DirectWrite/ CustomFont. This is meant to be an example of using the DirectWrite APIs described here, but this is almost certainly a misuse of the FONT resource. Other examples, however, use user-defined resource types for including .ttf font files, which seems like the correct choice. When using non-.fnt files with the FONT resource, the resulting FONTDIRENTRY will be made up of garbage, since it effectively just takes the first 148 bytes of the file and stuffs it into the FONTDIRENTRY format. An additional complication with this is that the Windows RC compiler will still try to read NUL-terminated strings using the offsets from the dfDevice and dfFace fields (or at least, where it thinks they are). These offset values, in turn, will have much more variance since the format of .fnt and .ttf are so different. This means that using FONT with .ttf files may lead to errors, since... "Negative" offsets lead to errors For who knows what reason, the dfDevice and dfFace values are seemingly treated as signed integers, even though they ostensibly contain an offset from the beginning of the .fnt file, so a negative value makes no sense. When the sign bit is set in either of these fields, the Windows RC compiler will error with: fatal error RW1023: I/O error seeking in file This means that, for some subset of valid .ttf files (or other non-.fnt font formats), the Windows RC compiler will fail with this error. Other oddities and crashes * If the font file is 140 bytes or fewer, the Windows RC compiler seems to default to a dfFace of 0 (as the [incorrect] location of the dfFace field is past the end of the file). * If the file is 75 bytes or smaller with no 0x00 bytes, the FONTDIR data for it will be 149 bytes (the first n being the bytes from the file, then the rest are 0x00 padding bytes). After that, there will be n bytes from the file again, and then a final 0x00. * If the file is between 76 and 140 bytes long with no 0x00 bytes, the Windows RC compiler will crash. resinator's behavior I'm still not quite sure what the best course of action is here. I've written up what I see as the possibilities here, and for now I've gone with what I'm calling the "semi-compatibility while avoiding the sharp edges" approach: Do something similar enough to the Win32 compiler in the common case, but avoid emulating the buggy behavior where it makes sense. That would look like a FONTDIRENTRY with the following format: + The first 148 bytes from the file verbatim, with no interpretation whatsoever, followed by two NUL bytes (corresponding to 'device name' and 'face name' both being zero length strings) This would allow the FONTDIR to match byte-for-byte with the Win32 RC compiler in the common case (since very often the misinterpreted dfDevice/dfFace will be 0 or point somewhere outside the bounds of the file and therefore will be written as a zero-length string anyway), and only differ in the case where the Win32 RC compiler writes some bogus string(s) to the szDeviceName /szFaceName. This also enables the use-case of non-.FNT files without any loose ends. In short: write the new/undocumented FONTDIRENTRY format, but avoid the crashes, avoid the negative integer-related errors, and always write szDeviceName and szFaceName as 0-length. fundamental concept The involvement of a C/C++ preprocessor In the intro, I said: .rc files are scripts that contain both C/C++ preprocessor commands and resource definitions. So far, I've only focused on resource definitions, but the involvement of the C/C++ preprocessor cannot be ignored. From the About Resource Files documentation: The syntax and semantics for the RC preprocessor are similar to those of the Microsoft C/C++ compiler. However, RC supports a subset of the preprocessor directives, defines, and pragmas in a script. The primary use-case for this is two-fold: * Inclusion of C/C++ headers within a .rc file to pull in constants, e.g. #include to allow usage of window style constants like WS_VISIBLE, WS_BORDER, etc. * Being able to share a .h file between your .rc file and your C/ C++ source files, where the .h file contains things like the IDs of various resources. Here's some snippets that demonstrate both use-cases: // in resource.h #define DIALOG_ID 123 #define BUTTON_ID 234 // in resource.rc #include #include "resource.h" // DIALOG_ID comes from resource.h DIALOG_ID DIALOGEX 0, 0, 282, 239 // These style constants come from windows.h STYLE DS_SETFONT | DS_MODALFRAME | DS_CENTER | WS_POPUP | WS_CAPTION | WS_SYSMENU CAPTION "Dialog" { // BUTTON_ID comes from resource.h PUSHBUTTON "Button", BUTTON_ID, 129, 182, 50, 14 } // in main.c #include #include "resource.h" // ... // DIALOG_ID comes from resource.h HWND result = CreateDialogParamW(hInst, MAKEINTRESOURCEW(DIALOG_ID), hwnd, DialogProc, (LPARAM)NULL); // ... // ... // BUTTON_ID comes from resource.h HWND button = GetDlgItem(hwnd, BUTTON_ID); // ... With this setup, changing DIALOG_ID/BUTTON_ID in resource.h affects both resource.rc and main.c, so they are always kept in sync. Note: Knowing all of the particularities of the resource-compiler-flavored-preprocessor isn't important for now. If you're curious, see the docs for Predefined Macros, Preprocessor Directives, Preprocessor Operators, and Pragma Directives for the documented capabilties of the Windows RC compiler's preprocessor. preprocessor bug/quirk, parser bug/quirk Multiline strings don't behave as expected/documented Within the STRINGTABLE resource documentation we see this statement: The string [...] must occupy a single line in the source file (unless a '\' is used as a line continuation). Note: While this documentation is for STRINGTABLE, I believe it actually applies to string literals in .rc files generally. In other words, the bugs/quirks described below apply to all resource types. This is similar to the rules around C strings: char *my_string = "Line 1 Line 2"; multilinestring.c:1:19: error: missing terminating '"' character char *my_string = "Line 1 ^ Splitting a string across multiple lines without using \ is an error in C char *my_string = "Line 1 \ Line 2"; printf("%s\n", my_string); results in: Line 1 Line 2 And yet, contrary to the documentation, splitting a string across multiple lines without \ continuations is not an error in the Windows RC compiler. Here's an example: 1 RCDATA { "foo bar" } This will successfully compile, and the data of the RCDATA resource will end up as 66 6F 6F 20 0A 62 61 72 foo space.\nbar I'm not sure why this is allowed, and I also don't have an explanation for why a space character sneaks into the resulting data out of nowhere. It's also worth noting that whitespace is collapsed in these should-be-invalid multiline strings. For example, this: "foo bar" will get compiled into exactly the same data as above (with only a space and a newline between foo and bar). But, this on its own is only a minor nuisance from the perspective of implementing a resource compiler--it is undocumented behavior, but it's pretty easy to account for. The real problems start when someone actually uses \ as intended. The collapse of whitespace is imminent C pop quiz: what will get printed in this example (i.e. what will my_string evaluate to)? char *my_string = "Line 1 \ Line 2"; #include int main() { printf("%s\n", my_string); return 0; } Let's compile it with a few different compilers to find out: > zig run multilinestring.c -lc Line 1 Line 2 > clang multilinestring.c > a.exe Line 1 Line 2 > cl.exe multilinestring.c > multilinestring.exe Line 1 Line 2 That is, the whitespace preceding "Line 2" is included in the string literal. Note: In most C compiler implementations, the \ is processed and removed during the preprocessing step. For example, here's the result when running the input through the clang preprocessor: > clang -E -xc multilinestring.c # 1 "multilinestring.c" 2 char *my_string = "Line 1 Line 2"; Something odd, though, is that this is not how the MSVC compiler works. When running its preprocessor, we end up with this: > cl.exe /E multilinestring.c #line 1 "multilinestring.c" char *my_string = "Line 1 \ Line 2"; so in the MSVC implementation, it's up to the C parser to handle the \. This is not fully relevant to the Windows RC compiler, but it was surprising to me. However, the Windows RC compiler behaves differently here. If we pass the same example through its preprocessor, we end up with: #line 1 "multilinestring.c" char *my_string = "Line 1 \ Line 2"; 1. The \ remains (similar to the MSVC compiler, see the note above) 2. The whitespace before "Line 2" is removed So the value of my_string would be Line 1 Line 2 (well, not really, since char *my_string = doesn't have a meaning in .rc files, but you get the idea). This divergence in behavior from C has practical consequences: in this .rc file from one of the Windows-classic-samples example programs, we see the following, which takes advantage of the rc.exe-preprocessor-specific-whitespace-collapsing behavior: STRINGTABLE BEGIN // ... IDS_MESSAGETEMPLATEFS "The drop target is %s.\n\ %d files/directories in HDROP\n\ The path to the first object is\n\ \t%s." // ... END Plus, in certain circumstances, this difference between rc.exe and C (like other differences to C) can lead to bugs. This is a rather contrived example, but here's one way things could go wrong: // In foo.h #define FOO_TEXT "foo \ bar" #define IDC_BUTTON_FOO 1001 // In foo.rc #include "foo.h" 1 DIALOGEX 0, 0, 275, 280 BEGIN PUSHBUTTON FOO_TEXT, IDC_BUTTON_FOO, 7, 73, 93, 14 END // In main.c #include "foo.h" // ... HWND hFooBtn = GetDlgItem(hDlg, IDC_BUTTON_FOO); // Let's say the button text was changed while it was hovered // and now we want to set it back to the default SendMessage(hFooBtn, WM_SETTEXT, 0, (LPARAM) _T(FOO_TEXT)); // ... In this example, the button defined in the DIALOGEX would start with the text foo bar, since that is the value that the Windows RC compiler resolves FOO_TEXT to be, but the SendMessage call would then set the text to foo bar, since that's what the C compiler resolves FOO_TEXT to be. resinator's behavior resinator uses the Aro preprocessor, which means it acts like a C compiler. In the future, resinator will likely fork Aro (mostly to support UTF-16 encoded files), which could allow matching the behavior of rc.exe in this case as well. parser bug/quirk, utterly baffling Escaping quotes is fraught Again from the STRINGTABLE resource docs: To embed quotes in the string, use the following sequence: "". For example, """Line three""" defines a string that is displayed as follows: "Line three" This is different from C, where \" is used to escape quotes within a string literal, so in C to get "Line three" you'd do "\"Line three\ "". Note: I have no idea why the method for quote escaping would differ from C. The only thing I can think of is that it would allow for the use of \ within string literals in order to make specifying paths in string literals more convenient, but that can't be true because in e.g. "some\new\path" the \n gets parsed into 0x0A. This difference, though, can lead to some really bizarre results, since the preprocessor still uses the C escaping rules. Take this simple example: "\""BLAH" Here's how that is seen from the perspective of the preprocessor: string"\""identifierBLAHstring (unfinished)" And from the perspective of the compiler: string"\""BLAH" So, following from this, say you had this .rc file: #define BLAH "hello" 1 RCDATA { "\""BLAH" } Since we know the preprocessor sees BLAH as an identifier and we've done #define BLAH "hello", it will replace BLAH with "hello", leading to this result: 1 RCDATA { "\"""hello"" } which would now be parsed by the compiler as: string"\"""identifierhellostring"" and lead to a compile error: test.rc(3) : error RC2104 : undefined keyword or key name: hello This is just one example, but the general disagreement around escaped quotes between the preprocessor and the compiler can lead to some really unexpected error messages. Wait, but what actually happens to the backslash? Backing up a bit, I said that the compiler sees "\""BLAH" as one string literal token, so: 1 RCDATA { string"\""BLAH" } If we compile this, then the data of this RCDATA resource ends up as: "BLAH That is, the \ fully drops out and the "" is treated as an escaped quote. This seems to some sort of special case, as this behavior is not present for other unrecognized escape sequences, e.g. "\k" will end up as \k when compiled, and "\" will end up as \. resinator's behavior Using \" within string literals is always an error, since (as mentioned) it can lead to things like unexpected macro expansions and hard-to-understand errors when the preprocessor and the compiler disagree. test.rc:1:13: error: escaping quotes with \" is not allowed (use "" instead) 1 RCDATA { "\""BLAH" } ^~ This may change if it turns out \" is commonly used in the wild, but that seems unlikely to be the case. parser bug/quirk The column of a tab character matters Literal tab characters (U+009) within an .rc file get transformed by the preprocessor into a variable number of spaces (1-8), depending on the column of the tab character in the source file. This means that whitespace can affect the output of the compiler. Here's a few examples, where ---- denotes a tab character: 1 RCDATA { "----" } the tab gets compiled to 7 spaces: ******* 1 RCDATA { "----" } the tab gets compiled to 4 spaces: **** 1 RCDATA { "----" } the tab gets compiled to 1 space: * resinator's behavior resinator matches the Win32 RC compiler behavior, but emits a warning test.rc:2:4: warning: the tab character(s) in this string will be converted into a variable number of spaces (determined by the column of the tab character in the .rc file) " " ^~~ test.rc:2:4: note: to include the tab character itself in a string, the escape sequence \t should be used fundamental concept The Windows RC compiler 'speaks' UTF-16 As mentioned before, .rc files are compiled in two distinct steps: 1. First, they are run through a C/C++ preprocessor (rc.exe has a preprocessor implementation built-in) 2. The result of the preprocessing step is then compiled into a .res file In addition to a subset of the normal C/C++ preprocessor directives, there is one resource-compiler-specific #pragma code_page directive that allows changing which code page is active mid-file. This means that .rc files can have a mixture of encodings within a single file: #pragma code_page(1252) // 1252 = Windows-1252 1 RCDATA { "This is interpreted as Windows-1252: EUR" } #pragma code_page(65001) // 65001 = UTF-8 2 RCDATA { "This is interpreted as UTF-8: EUR" } If the above example file is saved as Windows-1252, each EUR is encoded as the byte 0x80, meaning: * The EUR (0x80) in the RCDATA with ID 1 will be interpreted as a EUR * The EUR (0x80) in the RCDATA with ID 2 will attempt to be interpreted as UTF-8, but 0x80 is an invalid start byte for a UTF-8 sequence, so it will be replaced during preprocessing with the Unicode replacement character ( or U+FFFD) So, if we run the Windows-1252-encoded file through only the rc.exe preprocessor (using the undocumented rc.exe /p option), the result is a file with the following contents: #pragma code_page 1252 1 RCDATA { "This is interpreted as Windows-1252: EUR" } #pragma code_page 65001 2 RCDATA { "This is interpreted as UTF-8: " } Note: In reality, the preprocessor also adds #line directives, but those have been omitted for clarity. If, instead, the example file is saved as UTF-8, each EUR is encoded as the byte sequence 0xE2 0x82 0xAC, meaning: * The EUR (0xE2 0x82 0xAC) in the RCDATA with ID 1 will be interpreted as a,! * The EUR (0xE2 0x82 0xAC) in the RCDATA with ID 2 will be interpreted as EUR So, if we run the UTF-8-encoded version through the rc.exe preprocessor, the result looks like this: #pragma code_page 1252 1 RCDATA { "This is interpreted as Windows-1252: a,!" } #pragma code_page 65001 2 RCDATA { "This is interpreted as UTF-8: EUR" } In both of these examples, the result of the rc.exe preprocessor is encoded as UTF-16. This is because, in the Windows RC compiler, the relevant code page interpretation is done during preprocessing, and the output of the preprocessor is always UTF-16. This, in turn, means that the parser/compiler of the Windows RC compiler always ingests UTF-16, as there's no option to skip the preprocessing step. This will be relevant for future bugs/quirks, so just file this knowledge away for now. preprocessor bug/quirk Extreme #pragma code_page values As seen above, the resource-compiler-specific preprocessor directive #pragma code_page can be used to alter the current code page mid-file. It's used like so: #pragma code_page(1252) // Windows-1252 // ... bytes from now on are interpreted as Windows-1252 ... #pragma code_page(65001) // UTF-8 // ... bytes from now on are interpreted as UTF-8 ... The list of possible code pages can be found here. If you try to use one that is not valid, rc.exe will error with: fatal error RC4214: Codepage not valid: ignored But what happens if you try to use an extremely large code page value (greater or equal to the max of a u32)? Most of the time it errors in the same way as above, but occasionally there's a strange / inexplicable error. Here's a selection of a few: #pragma code_page(4294967296) error RC4212: Codepage not integer: ) fatal error RC1116: RC terminating after preprocessor errors #pragma code_page(4295032296) fatal error RC22105: MultiByteToWideChar failed. #pragma code_page(4295032297) test.rc(2) : error RC2177: constant too big test.rc(2) : error RC4212: Codepage not integer: 4 fatal error RC1116: RC terminating after preprocessor errors I don't have an explanation for this behavior, especially with regards to why only certian extreme values induce an error at all. resinator's behavior resinator treats code pages exceeding the max of a u32 as a fatal error. test.rc:1:1: error: code page too large in #pragma code_page #pragma code_page ( 4294967296 ) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a separate error from the one caused by invalid/unsupported code pages: test.rc:1:1: error: invalid or unknown code page in #pragma code_page #pragma code_page ( 64999 ) ^~~~~~~~~~~~~~~~~~~~~~~~~~~ test.rc:1:1: error: unsupported code page 'utf7 (id=65000)' in #pragma code_page #pragma code_page ( 65000 ) ^~~~~~~~~~~~~~~~~~~~~~~~~~~ preprocessor/parser bug/quirk Escaping in wide string literals In regular string literals, invalid escape sequences get compiled into their literal characters. For example: 1 RCDATA { "abc\k" ----> abc\k } However, for reasons unknown, invalid escape characters within wide string literals disappear from the compiled result entirely: 1 RCDATA { L"abc\k" ----> a.b.c. } On its own, this is just an inexplicable quirk, but when combined with other quirks, it gets elevated to the level of a (potential) bug. In combination with tab characters As detailed in "The column of a tab character matters", an embedded tab character gets converted to a variable number of spaces depending on which column it's at in the file. This happens during preprocesing, which means that by the time a string literal is parsed, the tab character will have been replaced with space character(s). This, in turn, means that "escaping" an embedded tab character will actually end up escaping a space character. Here's an example where the tab character (denoted by ----) will get converted to 6 space characters: 1 RCDATA { L"\----" } And here's what that example looks like after preprocessing (note that the escape sequence now applies to a single space character). 1 RCDATA { L"\******" } With the quirk around invalid escape sequences in wide string literals, this means that the "escaped space" gets skipped over/ ignored when parsing the string, meaning that the compiled data in this case will have 5 space characters instead of 6. In combination with codepoints represented by a surrogate pair As detailed in "The Windows RC compiler 'speaks' UTF-16", the output of the Windows RC preprocessor is always encoded as UTF-16. In UTF-16, codepoints >= U+10000 are encoded as a surrogate pair (two u16 code units). For example, the codepoint for (U+10437) is encoded in UTF-16 as <0xD801><0xDC37>. So, let's say we have this .rc file: #pragma code_page(65001) 1 RCDATA { L"\" } The file is encoded as UTF-8, meaning the is encoded as 4 bytes like so: #pragma code_page(65001) 1 RCDATA { L"\<0xF0><0x90><0x90><0xB7>" } When run through the Windows RC preprocessor, it parses the file successfully and outputs the correct UTF-16 encoding of the codepoint (remember that the Windows RC preprocessor always outputs UTF-16): 1 RCDATA { L"\" } However, the Windows RC parser does not seem to be aware of surrogate pairs, and therefore treats the escape sequence as only pertaining to the first u16 surrogate code unit (the "high surrogate"): 1 RCDATA { L"\<0xD801><0xDC37>" } This means that the \<0xD801> is treated as an invalid escape sequence and skipped, and only <0xDC37> makes it into the compiled resource data. This will essentially always end up being invalid UTF-16, since an unpaired surrogate code unit is ill-formed (the only way it wouldn't end up as ill-formed is if an intentionally unpaired high surrogate code unit was included before the escape sequence, e.g. L"\xD801\"). Note: This lack of surrogate-pair-awareness is actually quite common in Windows, since much of Windows' Unicode support predates UTF-16. Here's one resource that provides some context. resinator's behavior resinator currently attempts to match the Windows RC compiler's behavior exactly, and emulates the interaction between the preprocessor and wide string escape sequences in its string parser. The reasoning for emulating the Windows RC compiler for escaped tabs/ escaped surrogate pairs seems rather dubious, though, so this may change in the future. miscompilation STRINGTABLE semantics bypass The STRINGTABLE resource is intended for embedding string data, which can then be loaded at runtime with LoadString. A STRINGTABLE resource definition looks something like this: STRINGTABLE { 0, "Hello" 1, "Goodbye" } Notice that there is no id before the STRINGTABLE resource type. This is because all strings within STRINGTABLE resources are bundled together in groups of 16 based on their ID and language (we can ignore the language part for now, though). So, if we have this example .rc file: STRINGTABLE { 1, "Goodbye" } STRINGTABLE { 0, "Hello" 23, "Hm" } The "Hello" and "Goodbye" strings will be grouped together into one resource, and the "Hm" will be put into another. Each group is written as a series of 16 length integers (one for each string within the group), and each length is immediately followed by a UTF-16 encoded string of that length (if the length is non-zero). So, for example, the first group contains the strings with IDs 0-15, meaning, for the .rc file above, the first group would be compiled as: 05 00 48 00 65 00 6C 00 ..H.e.l. 6C 00 6F 00 07 00 47 00 l.o...G. 6F 00 6F 00 64 00 62 00 o.o.d.b. 79 00 65 00 00 00 00 00 y.e..... 00 00 00 00 00 00 00 00 ........ 00 00 00 00 00 00 00 00 ........ 00 00 00 00 00 00 00 00 ........ Internally, STRINGTABLE resources get compiled as the integer resource type RT_STRING, which is 6. The ID of the resource is based on the grouping, so strings with IDs 0-15 go into a RT_STRING resource with ID 1, 16-31 go into a resource with ID 2, etc. The above is all well and good, but what happens if you manually define a resource with the RT_STRING type of 6? The Windows RC compiler has no qualms with that at all, and compiles it similarly to a user-defined resource, so the data of the resource below will be 3 bytes long, containing foo: 1 6 { "foo" } In the compiled resource, though, the resource type and ID are indistinguishable from a properly defined STRINGTABLE. This means that compiling the above resource and then trying to use LoadString will succeed, even though the resource's data does not conform at all to the intended structure of a RT_STRING resource: UINT string_id = 0; WCHAR buf[1024]; int len = LoadStringW(NULL, string_id, buf, 1024); if (len != 0) { printf("len: %d\n", len); wprintf(L"%s\n", buf); } That code will output: len: 1023 o Let's think about what's going on here. We compiled a resource with three bytes of data: foo. We have no real control over what follows that data in the compiled binary, so we can think about how this resource is interpreted by LoadString like this: 66 6F 6F ?? ?? ?? ?? ?? foo????? ?? ?? ?? ?? ?? ?? ?? ?? ???????? ... ... The first two bytes, 66 6F, are treated as a little-endian u16 containing the length of the string that follows it. 66 6F as a little-endian u16 is 28518, so LoadString thinks that the string with ID 0 is 28 thousand UTF-16 code units long. All of the ?? bytes are those that happen to follow the resource data--they could in theory be anything. So, LoadString will erroneously attempt to read this gargantuan string into buf, but since we only provided a buffer of 1024, it only fills up to that size and stops. In the actual compiled binary of my test program, the bytes following foo happen to look like this: 66 6F 6F 00 00 00 00 00 foo..... 3C 3F 78 6D 6C 20 76 65 stringtabletest.exe!main(...) This is because, in order to load a string with ID 1, the bytes of the string with ID 0 need to be skipped past. That is, LoadString will determine that the string with ID 0 has a length of 28 thousand, and then try to skip ahead in the file 56 thousand bytes (since the length is in UTF-16 code units), which in our case is well past the end of the file. Note: My first impression is that this crash is a bug in LoadString, in that it probably should be doing some bounds checking to avoid attempting to read past the end of the resource data. However, resources get compiled into a different (tree) structure when linked into a PE/COFF binary, and that format is not something I'm familiar with (yet), so there may be something I'm missing about why bounds checking is seemingly not occuring. resinator's behavior test.rc:1:3: error: the number 6 (RT_STRING) cannot be used as a resource type 1 6 { ^ test.rc:1:3: note: using RT_STRING directly likely results in an invalid .res file, use a STRINGTABLE instead parser bug/quirk, utterly baffling CONTROL: "I'm just going to pretend I didn't see that" Within DIALOG/DIALOGEX resources, there are predefined controls like PUSHBUTTON, CHECKBOX, etc, which are actually just syntactic sugar for generic CONTROL statements with particular default values for the "class name" and "style" parameters. For example, these two statements are equivalent: classCHECKBOX, text"foo", id1, x2, y3, w4, h5 classCONTROL, "foo", 1, class nameBUTTON, styleBS_CHECKBOX | WS_TABSTOP, 2, 3, 4, 5 There is something bizarre about the "style" parameter of a generic control statement, though. For whatever reason, it allows an extra token within it and will act as if it doesn't exist. CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP "why is this allowed"style, 2, 3, 4, 5 The "why is this allowed" string is completely ignored, and this CONTROL will be compiled exactly the same as the previous CONTROL statement shown above. * This bug/quirk requires there to be no comma before the extra token. In the above example, if there is a comma between the BS_CHECKBOX | WS_TABSTOP and the "why is this allowed", then it will (properly) error with expected numerical dialog constant * This bug/quirk is specific to the style parameter of CONTROL statements. In non-generic controls, the style parameter is optional and comes after the h parameter, but it does not exhibit this behavior The extra token can be many things (string, number, =, etc), but not anything. For example, if the extra token is ;, then it will error with expected numerical dialog constant. CONTROL: "Okay, I see that expression, but I don't understand it" Instead of a single extra token in the style parameter of a CONTROL, it's also possible to sneak an extra number expression in there like so: CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP (7+8)style, 2, 3, 4, 5 In this case, the Windows RC compiler no longer ignores the expression, but still behaves strangely. Instead of the entire (7+8) expression being treated as the x parameter like one might expect, in this case only the 8 in the expression is treated as the x parameter, so it ends up interpreted like this: CONTROL, "text", 1, BUTTON, styleBS_CHECKBOX | WS_TABSTOP (7+x8), y2, w3, h4, exstyle5 My guess is that the similarity between this number-expression-related-behavior and "Number expressions as filenames" is not a coincidence, but beyond that I couldn't tell you what's going on here. resinator's behavior Such extra tokens/expressions are never ignored by resinator; they are always treated as the x parameter, and a warning is emitted if there is no comma between the style and x parameters. test.rc:4:57: warning: this token could be erroneously skipped over by the Win32 RC compiler CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5 ^~~~~~~~~~~~~~~~~~~~~ test.rc:4:57: note: this line originated from line 4 of file 'test.rc' CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP "why is this allowed", 2, 3, 4, 5 test.rc:4:31: note: to avoid the potential miscompilation, consider adding a comma after the style parameter CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5 ^~~~~~~~~~~~~~~~~~~~~~~~~ test.rc:4:57: error: expected number or number expression; got '"why is this allowed"' CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5 ^~~~~~~~~~~~~~~~~~~~~ miscompilation That's odd, I thought you needed more padding In DIALOGEX resources, a control statement is documented to have the following syntax: control [[text,]] id, x, y, width, height[[, style[[, extended-style]]]][, helpId] [{ data-element-1 [, data-element-2 [, . . . ]]}] For now, we can ignore everything except the [{ data-element-1 [, data-element-2 [, . . . ]]}] part, which is documented like so: controlData Control-specific data for the control. When a dialog is created, and a control in that dialog which has control-specific data is created, a pointer to that data is passed into the control's window procedure through the lParam of the WM_CREATE message for that control. Here's an example, where the string "foo" is the control data: 1 DIALOGEX 0, 0, 282, 239 { PUSHBUTTON "Cancel",1,129,212,50,14 { "foo" } } After a very long time of having no idea how to retrieve this data from a Win32 program, I finally figured it out while writing this article. As far as I know, the WM_CREATE event can only be received for custom controls or by superclassing a predefined control. Note: I'm going to gloss over exactly what that means. See here for details and a complete example. So, let's say in our program we register a class named CustomControl. We can then use it in a DIALOGEX resource like this: 1 DIALOGEX 0, 0, 282, 239 { CONTROL "text", 901, "CustomControl", 0, 129,212,50,14 { "foo" } } The control data ("foo") will get compiled as 03 00 66 6F 6F, where 03 00 is the length of the control data in bytes (3 as a little-endian u16) and 66 6F 6F are the bytes of foo. If we load this dialog, then our custom control's WNDPROC callback will receive a WM_CREATE event where the LPARAM parameter is a pointer to a CREATESTRUCT and ((CREATESTRUCT*)lParam)->lpCreateParams will be a pointer to the control data (if any exists). So, in our case, the lpCreateParams pointer points to memory that looks the same as the bytes shown above: a u16 length first, and the specified number of bytes following it. If we handle the event like this: // ... case WM_CREATE: if (lParam) { CREATESTRUCT* create_params = (CREATESTRUCT*)lParam; const BYTE* data = create_params->lpCreateParams; if (data) { WORD len = *((WORD*)data); printf("control data len: %d\n", len); for (WORD i = 0; i < len; i++) { printf("%02X ", data[2 + i]); } printf("\n"); } } break; // ... then we get this output (with some additional printing of the callback parameters): CustomProc hwnd: 00000000022C0A8A msg: WM_CREATE wParam: 0000000000000000 lParam: 000000D7624FE730 control data len: 3 66 6F 6F Nice! Now let's try to add a second CONTROL: 1 DIALOGEX 0, 0, 282, 239 { CONTROL "text", 901, "CustomControl", 0, 129,212,50,14 { "foo" } CONTROL "text", 902, "CustomControl", 0, 189,212,50,14 { "bar" } } With this, the CreateDialogParamW call starts failing with: Cannot find window class. Why would that be? Well, it turns out that the Windows RC compiler miscompiles the padding bytes following a control if its control data has an odd number of bytes. This is similar to what's described in " Your fate will be determined by a comma", but in the opposite direction: instead of adding too few padding bytes, the Windows RC compiler in this case will add too many. Each control within a dialog resource is expected to be 4-byte aligned (meaning its memory starts at an offset that is a multiple of 4). So, if the bytes at end of one control looks like this, where the dotted boxes represent 4-byte boundaries: ........foo then we only need one byte of padding after foo to ensure the next control is 4-byte aligned: ........foo......... However, the Windows RC compiler erroneously inserts two additional padding bytes in this case, meaning the control afterwards is misaligned by two bytes: ........foo......... This causes every field of the misaligned control to be misread, leading to a malformed dialog that can't be loaded. As mentioned, this is only the case with odd control data byte counts; if we add or remove a byte from the control data, then this miscompilation does not happen and the correct amount of padding is written. Here's what it looks like if "foo" is changed to "fo": ........fo.......... Note: This miscompilation only occurs for the padding between controls within a dialog. For the last control within a dialog, its control data can have an odd number of bytes with no ill-effects. This is a miscompilation that seems very easy to accidentally hit, but it has gone undetected/unfixed for so long presumably because this 'control data' syntax is very seldom used. For example, there's not a single usage of this feature anywhere within Windows-classic-samples. resinator's behavior resinator will avoid the miscompilation and will emit a warning when it detects that the Windows RC compiler would miscompile: test.rc:3:3: warning: the padding before this control would be miscompiled by the Win32 RC compiler (it would insert 2 extra bytes of padding) CONTROL "text", 902, "CustomControl", 1, 189,212,50,14,2,3 { "bar" } ^~~~~~~ test.rc:3:3: note: to avoid the potential miscompilation, consider adding one more byte to the control data of the control preceding this one miscompilation, utterly baffling CONTROL class specified as a number A generic CONTROL within a DIALOG/DIALOGEX resource is specified like this: classCONTROL, "foo", 1, class nameBUTTON, 1, 2, 3, 4, 5 The class name can be a string literal ("CustomControlClass") or one of BUTTON, EDIT, STATIC, LISTBOX, SCROLLBAR, or COMBOBOX. Internally, those unquoted literals are just predefined values that compile down to numeric integers: BUTTON --> 0x80 EDIT --> 0x81 STATIC --> 0x82 LISTBOX --> 0x83 SCROLLBAR --> 0x84 COMBOBOX --> 0x85 There's plenty of precedence within the Windows RC compiler that you can swap out a predefined type for its underlying integer and get the same result, and indeed the Windows RC compiler does not complain if you try to do so in this case: CONTROL, "foo", 1, class name0x80, 1, 2, 3, 4, 5 Before we look at what happens, though, we need to understand how values that can be either a string or a number get compiled. For such values, if it is a string, it is always compiled as NUL-terminated UTF-16: 66 00 6F 00 6F 00 00 00 f.o.o... If such a value is a number, then it's compiled as a pair of u16 values: 0xFFFF and then the actual number value following that, where the 0xFFFF acts as a indicator that the ambiguous string/number value is a number. So, if the number is 0x80, it would get compiled into: FF FF 80 00 .... The above (FF FF 80 00) is what BUTTON gets compiled into, since BUTTON gets translated to the integer 0x80 under-the-hood. However, getting back to this example: CONTROL, "foo", 1, class name0x80, 1, 2, 3, 4, 5 We should expect the 0x80 also gets compiled into FF FF 80 00, but instead the Windows RC compiler compiles it into: 80 FF 00 00 As far as I can tell, the behavior here is to: * Truncate the value to a u8 * If the truncated value is >= 0x80, add 0xFF00 and write the result as a little-endian u32 * If the truncated value is < 0x80 but not zero, write the value as a little-endian u32 * If the truncated value is zero, write zero as a u16 Some examples: 0x00 --> 00 00 0x01 --> 01 00 00 00 0x7F --> 7F 00 00 00 0x80 --> FF 80 00 00 0xFF --> FF FF 00 00 0x100 --> 00 00 0x101 --> 01 00 00 00 0x17F --> 7F 00 00 00 0x180 --> FF 80 00 00 0x1FF --> FF FF 00 00 etc I only have the faintest idea of what could be going on here. My guess is that this is some sort of half-baked leftover behavior from the 16-bit resource compiler that never got properly updated in the move to the 32-bit compiler, since in the 16-bit version of rc.exe, numbers were compiled as FF instead of FF FF . However, the results we see don't fully match what we'd expect if that were the case--instead of FF 80, we get 80 FF, so I don't think this explanation holds up. Note also that the 0x80 cutoff is also the cutoff for to the ASCII range, so that might also be relevant (but I don't think there's any legitimate reason for it to be). resinator's behavior resinator will avoid the miscompilation and will emit a warning: test.rc:2:22: warning: the control class of this CONTROL would be miscompiled by the Win32 RC compiler CONTROL, "foo", 1, 0x80, 1, 2, 3, 4, 5 ^~~~ test.rc:2:22: note: to avoid the potential miscompilation, consider specifying the control class using a string (BUTTON, EDIT, etc) instead of a number compiler bug/quirk CONTROL class specified as a string literal I said in "CONTROL class specified as a number" that class name can be specified as a particular set of unquoted identifiers (BUTTON, EDIT, STATIC, etc). I left out that it's also possible to specify them as quoted string literals--these are equivalent to the unquoted BUTTON class name: CONTROL, "foo", 1, "BUTTON", 1, 2, 3, 4, 5 CONTROL, "foo", 1, L"BUTTON", 1, 2, 3, 4, 5 Additionally, this equivalence is determined after parsing, so these are also equivalent, since \x42 parses to the ASCII character B: CONTROL, "foo", 1, "\x42UTTON", 1, 2, 3, 4, 5 CONTROL, "foo", 1, L"\x42UTTON", 1, 2, 3, 4, 5 All of the above examples get treated the same as the unquoted literal BUTTON, which gets compiled to FF FF 80 00 as mentioned in the previous section. A string masquerading as a number For class name strings that do not parse into one of the predefined classes (BUTTON, EDIT, STATIC, etc), the class name typically gets written as NUL-terminated UTF-16. For example: "abc" gets compiled to: 61 00 62 00 63 00 00 00 a.b.c... However, if you use an L prefixed string that starts with a \xFFFF escape, then the value is written as if it were a number (i.e. the value is always 32-bits long and has the format FF FF ). Here's an example: L"\xFFFFzzzzzzzz" gets compiled to: FF FF 7A 00 ..z. All but the first z drop out, as seemingly the first character value after the \xFFFF escape is written as a u16. Here's another example using a 4-digit hex escape after the \xFFFF: L"\xFFFF\xABCD" gets compiled to: FF FF CD AB .... Note: Remember that numbers are compiled as little-endian, so the CD AB sequence reflects that, and is not another bug/quirk of its own. So, with this bug/quirk, this: L"\xFFFF\x80" gets compiled to: FF FF 80 00 .... which is indistinguisable from the compiled form of the class name specified as either an unquoted literal (BUTTON) or quoted string ("BUTTON"). I want to say that this edge case is so specific that it has to have been intentional, but I'm not sure I can rule out the idea that some very strange confluence of quirks is coming together to produce this behavior unintentionally. resinator's behavior resinator matches the behavior of the Windows RC compiler for the "BUTTON"/"\x42UTTON" examples, but the L"\xFFFF..." edge case has not yet been decided on as of now. missing error, miscompilation Cursor posing as an icon and vice versa The ICON and CURSOR resource types expect a .ico file and a .cur file, respectively. The format of .ico and .cur is identical, but there is an 'image type' field that denotes the type of the file (1 for icon, 2 for cursor). The Windows RC compiler does not discriminate on what type is used for which resource. If we have foo.ico with the 'icon' type, and foo.cur with the 'cursor' type, then the Windows RC compiler will happily accept all of the following resources: 1 ICON "foo.ico" 2 ICON "foo.cur" 3 CURSOR "foo.ico" 4 CURSOR "foo.cur" However, the resources with the mismatched types becomes a problem in the resulting .res file because ICON and CURSOR have different formats for their resource data. When the type is 'cursor', a LOCALHEADER consisting of two cursor-specific u16 fields is written at the start of the resource data. This means that: * An ICON resource with a .cur file will write those extra cursor-specific fields, but still 'advertise' itself as an ICON resource * A CURSOR resource with an .ico file will not write those cursor-specific fields, but still 'advertise' itself as a CURSOR resource * In both of these cases, attempting to load the resource will always end up with an incorrect/invalid result because the parser will be assuming that those fields exist/don't exist based on the resource type So, such a mismatch always leads to incorrect/invalid resources in the .res file. resinator's behavior resinator errors if the resource type (ICON/CURSOR) doesn't match the type specified in the .ico/.cur file: test.rc:1:10: error: resource type 'cursor' does not match type 'icon' specified in the file 1 CURSOR "foo.ico" ^~~~~~~~~ unnecessary limitation PNG encoded cursors are erroneously rejected .ico/.cur files are a 'directory' of multiple icons/cursors, used for different resolutions. Historically, each image was a device-independent bitmap (DIB), but nowadays they can also be encoded as PNG. The Windows RC compiler is fine with .ico files that have PNG encoded images, but for whatever reason rejects .cur files with PNG encoded images. // No error, compiles and loads just fine 1 ICON "png.ico" // error RC2176 : old DIB in png.cur; pass it through SDKPAINT 2 CURSOR "png.cur" This limitation is provably artificial, though. If a .res file contains a CURSOR resource with PNG encoded image(s), then LoadCursor works correctly and the cursor displays correctly. resinator's behavior resinator allows PNG encoded cursor images, and warns about the Windows RC compiler behavior: test.rc:2:10: warning: the resource at index 0 of this cursor has the format 'png'; this would be an error in the Win32 RC compiler 2 CURSOR png.cur ^~~~~~~ miscompilation, utterly baffling Adversarial icons/cursors can lead to arbitrarily large .res files Each image in a .ico/.cur file has a corresponding header entry which contains (a) the size of the image in bytes, and (b) the offset of the image's data within the file. The Windows RC file fully trusts that this information is accurate; it will never error regardless of how malformed these two pieces of information are. If the reported size of an image is larger than the size of the .ico /.cur file itself, the Windows RC compiler will: * Write however many bytes there are before the end of the file * Write zeroes for any bytes that are past the end of the file, except * Once it has written 0x4000 bytes total, it will repeat these steps again and again until it reaches the full reported size Because a .ico/.cur can contain up to 65535 images, and each image within can report its size as up to 2 GiB (more on this in the next bug/quirk), this means that a small (< 1 MiB) maliciously constructed .ico/.cur could cause the Windows RC compiler to attempt to write up to 127 TiB of data to the .res file. resinator's behavior resinator errors if the reported file size of an image is larger than the size of the .ico/.cur file: test.rc:1:8: error: unable to read icon file 'test.ico': ImpossibleDataSize 1 ICON test.ico ^~~~~~~~ miscompilation, utterly baffling Adversarial icons/cursors can lead to infinitely large .res files As mentioned in Adversarial icons/cursors can lead to arbitrarily large .res files, each image within an icon/cursor can report its size as up to 2 GiB. However, the field for the image size is actually 4 bytes wide, meaning the maximum should technically be 4 GiB. The 2 GiB limit comes from the fact that the Windows RC compiler actually interprets this field as a signed integer, so if you try to define an image with a size larger than 2 GiB, it'll get interpreted as negative. We can somewhat confirm this by compiling with the verbose flag (/v): Writing ICON:1, lang:0x409, size -6000000 When this happens, the Windows RC compiler seemingly enters into an infinite loop when writing the icon data to the .res file, meaning it will continue trying to write garbage until (presumably) all the space of the hard drive has been used up. resinator's behavior resinator avoids misinterpreting the image size as signed, and allows images of up to 4 GiB to be specified if the .ico/.cur file actually is large enough to contain them. miscompilation Icon/cursor images with impossibly small sizes lead to bogus .res files Similar to Adversarial icons/cursors can lead to arbitrarily large .res files, it's also possible for images to specify their size as impossibly small: * If the size of an image is reported as zero, then the Windows RC compiler will: + Write an arbitrary size for the resource's data + Not actually write any bytes to the data section of the resource * If the size of an image is smaller than the header of the image format, then the Windows RC compiler will: + Read the full header for the image, even if it goes past the reported end of the image data + Write the reported number of bytes to the .res file, which can never be a valid image since it is smaller than the header size of the image format resinator's behavior resinator errors if the reported size of an image within a .ico/.cur is too small to contain a valid image header: test.rc:1:8: error: unable to read icon file 'test.ico': ImpossibleDataSize 1 ICON test.ico ^~~~~~~~ miscompilation Bitmaps with missing bytes in their color table BITMAP resources expect .bmp files, which are roughly structured something like this: ..BITMAPFILEHEADER.. ..BITMAPINFOHEADER.. .................... ....color table..... .................... ....pixel data...... .................... .................... The color table has a variable number of entries, dictated by either the biClrUsed field of the BITMAPINFOHEADER, or, if biClrUsed is zero, 2^n where n is the number of bits per pixel (biBitCount). When the number of bits per pixel is 8 or fewer, this color table is used as a color palette for the pixels in the image: 0 179 127 46 - 1 44 96 167 - 2 154 60 177 - color index color rgb color Example color table (above) and some pixel data that references the color table (below) ... 1 0 2 0 1 ... This is relevant because the Windows resource compiler does not just write the bitmap data to the .res verbatim. Instead, it strips the BITMAPFILEHEADER and will always write the expected number of color table bytes, even if the number of color table bytes in the file doesn't match expectations. ..BITMAPFILEHEADER.. ..BITMAPINFOHEADER.. .................... ....pixel data...... .................... .................... ..BITMAPINFOHEADER.. .................... ....color table..... .................... ....pixel data...... .................... .................... A bitmap file that omits the color table even though a color table is expected, and the data written to the .res for that bitmap Typically, a bitmap with a shorter-than-expected color table is considered invalid (or, at least, Windows and Firefox fail to render them), but the Windows RC compiler does not error on such files. Instead, it will completely ignore the bounds of the color table and just read into the following pixel data if necessary, treating it as color data. ..BITMAPFILEHEADER.. ..BITMAPINFOHEADER.. .................... ....pixel data...... .................... .................... ..BITMAPINFOHEADER.. .................... ..."color table".... .................... ....pixel data...... .................... .................... When compiled with the Windows RC compiler, the bytes of the color table in the .res will consist of the bytes in the outlined region of the pixel data in the original bitmap file. Further, if it runs out of pixel data to read (i.e. the inferred size of the color table extends beyond the end of the file), it will start filling in the remaining missing color table bytes with zeroes. My guess is that this behavior is largely unintentional, and is a byproduct of two things: * By stripping the BITMAPFILEHEADER from the bitmap during resource compilation, the bfOffBits field (which contains the offset from the beginning of the file to the pixel data) is not present in the compiled resource * A bitmap with more than the expected number of color table bytes is probably valid (see q/pal8offs.bmp of bmpsuite) The first point means that the size of the color table must always match the size given in BITMAPINFOHEADER, since the start of the pixel data must be calculated using the size of the color table. The second point means that there is some reason not to error out completely when the color table does not match the expected size. Together, it means that there is some justification for forcing the color table to match the expected size when there is a mismatch. From invalid to valid Interestingly, the behavior with regards to smaller-than-expected color tables means that an invalid bitmap compiled as a resource can end up becoming a valid bitmap. For example, if you have a bitmap with 12 actual entries in the color table, but BITMAPFILEHEADER.biClrUsed says there are 13, Windows considers that an invalid bitmap and won't render it. If you take that bitmap and compile it as a resource, though: 1 BITMAP "invalid.bmp" The resulting .res will pad the color table of the bitmap to get up to the expected number of entries (13 in this case), and therefore the resulting resource will render fine when using LoadBitmap to load it. Maliciously constructed bitmaps The dark side of this bug/quirk is that the Windows RC compiler does not have any limit as to how many missing color palette bytes it allows, and this is even the case when there are possible hard limits available (e.g. a bitmap with 4-bits-per-pixel can only have 2^4 (16) colors, but the Windows RC compiler doesn't mind if a bitmap says it has more than that). The biClrUsed field (which contains the number of color table entries) is a u32, meaning a bitmap can specify it contains up to 4.29 billion entries in its color table, where each color entry is 4 bytes long (or 3 bytes for old Windows 2.0 bitmaps). This means that a maliciously constructed bitmap can induce the Windows RC compiler to write up to 16 GiB of color table data when writing its resource, even if the file itself doesn't contain any color table at all. resinator's behavior resinator errors if there are any missing palette bytes: test.rc:1:10: error: bitmap has 16 missing color palette bytes 1 BITMAP missing_palette_bytes.bmp ^~~~~~~~~~~~~~~~~~~~~~~~~ test.rc:1:10: note: the Win32 RC compiler would erroneously pad out the missing bytes (and the added padding bytes would include 6 bytes of the pixel data) For a maliciously constructed bitmap, that error might look like: test.rc:1:10: error: bitmap has 17179869180 missing color palette bytes 1 BITMAP trust_me.bmp ^~~~~~~~~~~~ test.rc:1:10: note: the Win32 RC compiler would erroneously pad out the missing bytes There's also a warning for extra bytes between the color table and the pixel data: test.rc:2:10: warning: bitmap has 4 extra bytes preceding the pixel data which will be ignored 2 BITMAP extra_palette_bytes.bmp ^~~~~~~~~~~~~~~~~~~~~~~ miscompilation Bitmaps with BITFIELDS and a color palette When testing things using the bitmaps from bmpsuite, there is one well-formed .bmp file that rc.exe and resinator handle differently: g/rgb16-565pal.bmp: A 16-bit image with both a BITFIELDS segment and a palette. The details aren't too important here, so just know that the file is structured like this: ..BITMAPFILEHEADER.. ..BITMAPINFOHEADER.. .................... .....bitfields...... ....color table..... .................... ....pixel data...... .................... .................... As mentioned earlier, the BITMAPFILEHEADER is dropped when compiling a BITMAP resource, but for whatever reason, rc.exe also drops the color table when compiling this .bmp, so it ends up like this in the compiled .res: ..BITMAPINFOHEADER.. .................... .....bitfields...... ....pixel data...... .................... .................... Note, though, that within the BITMAPINFOHEADER, it still says that there is a color table present (specifically, that there are 256 entries in the color table), so this is likely a miscompilation. One possibility here is that it's not intended to be valid for a .bmp to contain both color masks and a color table, but that seems dubious because Windows renders the original .bmp file just fine in Explorer/ Photos. Note: This particular bitmap structure is potentially unlikely to encounter in the wild, since bitmaps with >= 16-bit depth and a color table seem largely useless (for bit depths >= 16, the color table is only used for "optimizing colors used on palette-based devices") The available documentation regarding this (BITMAPINFOHEADER, BITMAPV5HEADER) I found to be unclear at best, though. resinator's behavior resinator does not drop the color table, so in the compiled .res the bitmap resource data looks like this: ..BITMAPINFOHEADER.. .................... .....bitfields...... ....color table..... .................... ....pixel data...... .................... .................... and while I think this is correct, it turns out that... LoadBitmap mangles both versions anyway When the compiled resources are loaded with LoadBitmap and drawn using BitBlt, neither the rc.exe-compiled version, nor the resinator-compiled version are drawn correctly: [bmp-intend] [bmp-rc] [bmp-resina] intended image bitmap resource from rc.exe bitmap resource from resinator My guess/hope is that this a bug in LoadBitmap, as I believe the resinator-compiled resource should be correct/valid. parser bug/quirk, utterly baffling The strange power of the lonely close parenthesis Likely due to some number expression parsing code gone haywire, a single close parenthesis ) is occasionally treated as a 'valid' expression, with bizarre consequences. Similar to what was detailed in "BEGIN or { as filename", using ) as a filename has the same interaction as { where the preceding token is treated as both the resource type and the filename. 1 RCDATA ) test.rc(2) : error RC2135 : file not found: RCDATA But that's not all; take this, for example, where we define an RCDATA resource using a raw data block: 1 RCDATA { 1, ), ), ), 2 } This should very clearly be a syntax error, but it's actually accepted by the Windows RC compiler. What does the RC compiler do, you ask? Well, it just skips right over all the ), of course, and the data of this resource ends up as: the 1 (u16 little endian) - 01 00 02 00 - the 2 (u16 little endian) I said 'skip' because that's truly what seems to happen. For example, for resource definitions that take positional parameters like so: 1 DIALOGEX 1, 2, 3, 4 { //