https://wheybags.com/blog/emperor.html


Tom Mason



Resurrecting a dead Dune RTS game

- 13th July 2024

TL;DR: I created a patch called EmperorLauncher, which modifies
Emperor: Battle for Dune to run well on modern systems, with:

  * High resolution support
  * Working online multiplayer with direct IP connection
  * Coop Campaign mode


You can download the patch here, and the source code is available on
github.


Me and a friend playing Coop Campaign in 4K

The rest of this blog post is a fairly technical explanation of how
and why I made this thing.




What is Emperor: Battle for Dune?

Emperor: Battle for Dune is a 2001 realtime strategy game made by
Westwood Studios, arguably the inventors of the RTS genre. It is a
sequel to Dune 2000, itself a remake of sorts of Dune II, considered
by many to be the original RTS. The Dune RTSes hold a special place
in my heart, with Dune 2000 being the first PC game I ever bought,
and my introduction to the Dune universe. Emperor followed up on Dune
2000, bringing 3D graphics, vastly improved UX, and an absolutely
bangin' soundtrack. All that said, it isn't that well known these
days. I suspect this is because it was outshone by Westwood's other
RTS series, the juggernaut Command & Conquer.

[cnc_vs_dun]

Google trends, 2004-2010

What's wrong with it?

A lot. The years have not been kind to Emperor:

  * The game can't run at higher resolutions afforded by modern
    screens
  * Game simulation speed is uncapped in multiplayer, rendering it
    unplayably fast
  * Westwood Online (WOL) doesn't work anymore, so you can't play
    multiplayer except through LAN
  * You can't play the campaign in coop mode at all, because that was
    an online-only feature not supported over LAN
  * The installer included on the disc is broken
  * And finally, many visual effects are broken by the high
    framerates of modern PCs


Main menu in seizure mode

I love coop games, and I always wished more RTSes in particular had a
coop mode. When I found out all these years later that Emperor had a
coop mode that I'd never known about, and it was no longer playable -
I knew in that moment, deep in my heart, that this was a cosmic
injustice that must be put to rights. I also just had a hankerin' to
do some reverse engineering, so...

How do we fix this mess?

I started off with a modest initial goal. The "main" exe of the game
is a misdirection: Emperor.exe is a thin wrapper that runs the real
game executable, Game.exe. But if we run Game.exe directly, nothing
happens. So my initial goal was to make a replacement for
Emperor.exe, so I could control the launch of Game.exe. Later on,
once I have this control, I can use it to inject a DLL containing my
patches. More details on that later.

At this point I hopped into IDA to see what else Emperor.exe was
doing other than calling CreateProcess.

What is IDA?

IDA is the industry standard reverse engineering tool. It is an
incredibly powerful tool that operates on a database of knowledge
about your executable. At its core it is a disassembler (IDA -
Interactive DisAssembler), which turns machine code into slightly
more readable textual assembly language.

[ida_runCom]

Part of a disassembled function from Emperor.exe

It is also able to go one step further than that, and decompile the
assembly into mostly-compilable C code. It will need some help from
you, though. At the beginning, it will not know the types of
anything. All the datastructures and typedefs are completely gone,
everything is just an integer or int*, functions have been inlined,
or optimised out entirely, and everything has names like sub_402E80
or a2.

[ida_runCom]

The same code, decompiled into C

As you browse through the code, you annotate things that you know.
For example, in the function above I was able to look at the pointer
that was being passed in, and see which offsets were being used.
Since CreateProcessA is a known, documented Windows API function I
was able to infer what was contained in some of those offsets. Using
that knowledge, I was able to create a custom structure definition
(ProcessRunData) with some fields filled in. Now I have a type for
the parameter, so I can search for callers and annotate the type of
the variable being passed to runCommand. And oh look, I'm using some
of the fields in ProcessRunData with some other functions and
variables, so now I know their type too. In this fashion, you could
slowly flood-fill information through the whole binary until you had
a full understanding of the whole program if you wanted to. That
would of course be incredibly laborious for a game this size.

This is probably a good time to mention that I am not a skilled
reverse engineer. I'm definitely not a professional, and this is my
first real foray into reverse engineering a binary like this. Cosmic
justice aside, the main reason for this project was learning about
reverse engineering for fun. So all that said, I spent the next
couple of evenings reverse engineering a curious set of string
manipulation functions that turned out to just be std::ostringstream.

[ida_badnam]

Behold, function names

Eventually I dug myself out of that rabbit-hole, and found out what
Emperor.exe was doing that was so special. Before running Game.exe,
it creates a mutex, and an anonymous file mapping handle -
essentially just a chunk of allocated memory with a handle associated
to it. It doesn't do much with the mutex, as far as I can tell just
using it to make sure there is only one instance of the game running,
but it does something odd with the file mapping. It launches Game.exe
with bInheritHandles set to 1. This means the child process
"inherits" our handles - it is able to use the same numeric value of
the open file mapping handle, as though it had opened it itself.
Emperor.exe then loads some data from the file Emperor.dat in the
install directory, and does a bunch of manipulation to that data,
presumably some sort of decryption. It then maps the mapping handle,
and copies the decrypted data into the mapping.

So now we have some decrypted data in a mapping. The mapping handle
is valid in both processes, so the child process can use the handle
to retrieve the data that the launcher stored. But the child process
doesn't know what the handle's value is. So the parent needs to send
it somehow. Windows has an IPC message passing system where threads
have Message Queues. This is used as the basis of window
functionality - windows (HWNDs, actual window objects) use this queue
to send and receive the messages that make them work.

The parent process tells the child the handle value by sending a
message to the main thread of the child, using 0xBEEF as the custom
message ID, because why not. This works because CreateProcessA will
actually tell the parent process the thread ID of the main thread of
the newly created child process. I didn't really want to reverse
engineer the decryption code, so I created a dumping tool that I
could substitute for Game.exe, which reads the data sent to it and
dumps it to disk.

[dumptruck]

Not that kind of dumping tool

Turns out the data was "UIDATA,3DDATA,MAPS", three strings that get
passed to a bunch of asset loading code, making sure the game cannot
work without them. From that point it was simple enough to write some
code to perform the sequence myself, and I was able to start Game.exe
successfully.

Patch injection

Now we get to the real meat and potatoes of this project - injecting
our custom patches into Game.exe. We can force a process to load and
run our code by using the ol' CreateRemoteThread & LoadLibrary trick.

Excuse me, what trick is that?

CreateRemoteThread is a Windows API function that takes a
"ThreadProc" function pointer, argument pointer, and process handle
as arguments. The process handle is normally a handle representing
the calling process, but not this time. When CreateRemoteThread is
called, it creates a thread in the target process, which runs the
passed function with the passed argument. LoadLibrary is a Windows
API function that takes a string path, and loads the DLL located at
that path into the process. DLLs on Windows can have a DllMain
function which will be called when the DLL is loaded.

A function suitable to be passed to CreateRemoteThread must take a
single pointer argument. LoadLibrary is a function that takes a
single pointer argument. Do you see where this horror show is going?

So the only remaining problem is that we need to get the path to our
DLL into the memory space of our target process. Not a problem, we
can use our friends VirtualAllocEx (which takes a process handle
parameter), and WriteProcessMemory to allocate a buffer in the target
process' memory, and copy our DLL path into it. We then grab the
address of LoadLibrary and pass it, along with our newly created DLL
path buffer, to CreateRemoteThread. A new thread gets created in the
target process, and loads our DLL. The DllMain function present in
that DLL runs, and boom! We have code execution. From here we are
free to do whatever nefarious things we like.

Wait a minute, how do we know the address of LoadLibrary in the
target process? Windows does have ASLR, so DLLs are loaded at
random addresses for security reasons. You would think this would
make it impossible for us to know where LoadLibrary is located in
another process. Well, bizarrely, we can just use the address of
LoadLibrary in the current process. Windows does randomise DLL
base addresses, but it will randomise on first load, then try to use
the same address in every process after that. Documentation on this
is hard to find (if you do find something concrete, please let me
know), and Raymond Chen only says the kernel will "try" to use the
same address, but in practice there seems to be a special rule for
some basic DLLs, like Kernel32.dll (which contains LoadLibrary), so
they are always loaded at the same address on a given boot, to enable
procedures like this one. Even if it isn't documented, this is a well
known technique, which is used in all sorts of things out in the
wild, including, I'm sure, some Microsoft products - so by now it's
not something they could afford to break.

So, if we start the process in a suspended state, then inject a DLL,
we get code execution in the target process before main is run. But
what we really need is to modify existing functions in the Game.exe
binary. How do we do that? In short, we use the detours library[1].

What is detours?

Detours is the real black magic of this world. It is the arcane
invocation that dispenses with arbitrary social constructs, and cuts
straight to the fundamental truth of the machine. Types are an
illusion. Data structures don't exist. There is only functions. And
detours patches functions.

Let's say we want to wrap a library function with some logging. In
this example we'll use sendto. We want to replace all calls to
sendto with our modified function, which does some logging, and then
calls the real sendto function. Something like this:

[sendto_wra]


We can grab a pointer to the original sendto function pretty easily:

[sendto_loa]


Then comes the real magic. We can just edit the bytes of the original
function, replacing the first instruction with the opcode for an
unconditional jump to our patched function. It's a little bit more
involved than just memcpy(&sendto, &jumpCode, sizeof(jumpCode));
though. Memory pages containing functions are not writable for
security reasons, so we need to set it writable with VirtualProtect,
edit the function, then set it back. We then need to call
FlushInstructionCache, because otherwise we could end up with
different instructions in cache vs memory which could cause a
multitude of problems. And of course, we need to be sure that while
we're doing all this, there isn't some other thread executing the
code we're modifying. In our case we're executing our hooks before
main runs, so we know there are no other threads to worry about.
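
To make the "unconditional jump" concrete, here is a minimal sketch of
how the five bytes of a 32-bit x86 jmp rel32 instruction can be
computed. It assumes a 32-bit target (which fits a 2001 game, but is
an assumption here), and encodeJmpRel32 is a hypothetical helper name,
not something taken from detours or EmperorLauncher. It only builds
the bytes; writing them into a live function still needs the
VirtualProtect / FlushInstructionCache dance described above.

```cpp
#include <array>
#include <cstdint>

// Encode a 5-byte x86 `jmp rel32` that sits at address `from` and lands on `to`.
// The displacement is measured from the address of the *next* instruction
// (from + 5). Unsigned wrap-around yields the correct two's-complement value
// for backward jumps.
std::array<std::uint8_t, 5> encodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5u);
    return {{
        0xE9,                                           // opcode: jmp rel32
        static_cast<std::uint8_t>(rel & 0xFF),          // displacement, little-endian
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF),
    }};
}
```

For example, a jump placed at 0x00401000 that should land on
0x00402000 needs a displacement of 0x00000FFB, because the CPU counts
from the end of the 5-byte instruction.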

Ok, so now we've redirected the original function to our replacement,
but there's one thing missing. If we want to wrap the original
function, not just replace it, then we need a pointer we can call to
run the original, unpatched code. But we've just stomped all over it
by replacing random instructions with an unconditional jump. This is
where the real genius of detours comes in. The library will take note
of the instructions it ruined, copy them to a newly allocated chunk
of memory somewhere, and set up a little wrapper function. That
function contains the copied instructions, followed by a jump
instruction that jumps back into the original code, just after the
last broken instruction. As you can imagine, there is a lot of
subtlety here, especially when dealing with an instruction set like
x86, where not all opcodes are the same length. The end result is you
can do something like this:

[sendto_det]


sendto_orig now points at the magic generated chunk, so it behaves
exactly like the original function, while the original sendto address
is now set up to jump straight into our wrapper, leaving us free to
call the original function from inside our wrapper.
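
The trampoline idea can be sketched in a few lines. This is a heavily
simplified illustration under stated assumptions, not how detours is
actually implemented: it assumes 32-bit x86, assumes the caller
already knows how many whole instructions were overwritten, and
ignores the relocation that real detours performs when the copied
instructions contain relative addresses. buildTrampoline and
appendJmpRel32 are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Append a 5-byte `jmp rel32` located at address `from` that lands on `to`.
static void appendJmpRel32(std::vector<std::uint8_t>& out,
                           std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5u);  // relative to the next instruction
    out.push_back(0xE9);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((rel >> shift) & 0xFF));
}

// Build the bytes of a trampoline: the overwritten ("stolen") instructions,
// followed by a jump back to the first intact instruction of the original.
// `stolen` must end on an instruction boundary and must not contain
// relative-addressed instructions (this sketch does not relocate them).
std::vector<std::uint8_t> buildTrampoline(const std::vector<std::uint8_t>& stolen,
                                          std::uint32_t originalAddr,
                                          std::uint32_t trampolineAddr) {
    std::vector<std::uint8_t> code(stolen);
    const std::uint32_t n = static_cast<std::uint32_t>(stolen.size());
    appendJmpRel32(code, trampolineAddr + n, originalAddr + n);
    return code;
}
```

Calling the trampoline address then runs the stolen prologue and
continues in the original function just past the patched bytes, which
is what makes a pointer like sendto_orig behave like the unpatched
function.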

Now I can take the addresses of functions that I found and figured
out in IDA, and wrap / replace them with detours. During my
miscellaneous poking around in IDA, I noticed several calls that
looked like some sort of debug logging function. When I looked at the
function being called though, it was empty, just a return
instruction. I hooked up a detour which forwarded the log message on
to vprintf, but no luck, the process just crashes - a segfault inside
vprintf.

The problem with this function is that it's not one function, it's
many functions. My guess is that in the original source code, there
were a bunch of debugging functions that were behind #ifdef DEBUG,
and in release builds they were replaced with empty functions. This,
combined with things like empty virtual functions to satisfy
inheritance, means the final binaries end up with a lot of empty
functions. They all compile to nothing but a single ret (return)
instruction, and then the linker says "oh look, I have duplicated
functions" and merges them into one location. Unfortunately, one of
these empty functions is the debug logger.

My first attempt at solving this was to use a heuristic to detect if
the parameters I've got look like a log string. The heuristic is
pretty simple: if we interpret the first argument as a pointer to
char, then we iterate a few bytes and check if those bytes are ASCII
printable characters. If so - great, we have a log call, we can
forward our parameters to vprintf. If not, then do nothing. There's
also the obvious problem here that we can get all sorts of random
values passed in here as a first parameter, and they're not all valid
pointers at all, let alone pointers to null terminated strings. So,
there's a pretty high chance we're going to segfault. But that's ok.
Windows handles segfaults by throwing an SEH exception. We can catch
and ignore the exception, then presume the call is not a logging
call. Surprisingly, this actually worked pretty well.
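
A minimal sketch of that kind of heuristic, for illustration only: the
function name, the probe length of 8 bytes and the exact set of
accepted characters are assumptions, not the values used in the real
patch. The sketch also assumes the pointer is readable. In the
situation described above it often isn't, which is why the real thing
has to be wrapped in an SEH handler (MSVC's __try / __except) that
treats a fault as "not a log call".

```cpp
#include <cstddef>

// Returns true if the first `probe` bytes at `p` look like the start of a
// printable, non-empty C string. Assumes `p` is either null or readable.
bool looksLikeLogString(const char* p, std::size_t probe = 8) {
    if (p == nullptr) return false;
    for (std::size_t i = 0; i < probe; ++i) {
        const unsigned char c = static_cast<unsigned char>(p[i]);
        if (c == '\0') return i > 0;  // short string: fine as long as it isn't empty
        const bool printable =
            (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
        if (!printable) return false;
    }
    return true;  // every probed byte was printable
}
```

A check this loose will happily accept any pointer that happens to
point at text, so false positives are expected.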

[debug_log]

I wonder how long it's been since someone last read these log lines

That was cool, but not really enough. There were still false
positives and false negatives, and false positives could still cause
the game to crash. If I wanted to fix this for real, then I'd need to
get in there and annotate every. single. call. I could give myself
a bit of a head start though. I tightened up my heuristic to give no
false positives (at the expense of some false negatives), and logged
call sites by using the _AddressOfReturnAddress intrinsic inside my
detour.

What I really wanted was to separate the logging calls into a
different function from the other empty function calls. Actually, two
functions, because there were two variants of the logging function.
IDA has a patching mode that allows you to make binary changes to the
executable under analysis, without applying them to the real file on
disk. It then uses those patches during disassembly and
decompilation, so you get a virtualised view of how your patches will
affect the binary. You can also dump the patches to a file, or apply
them to the original binary.

The original empty function had a bunch of zeroed out padding bytes
immediately following it, so I could patch in a few empty functions
in that padding space by inserting ret instructions. Then I patch the
actual logging calls to call one of the new empty functions, and
later on I can detour that function to reenable logging. With an
incomplete list of addresses from my heuristic dumper, I could use
IDA's built-in Python scripting to patch known good call sites.

[ida_patch_]

IDAPython script to patch call sites with an offset
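
The script itself is IDAPython, but the underlying byte edit is easy
to sketch in C++. This assumes the call sites are ordinary 32-bit x86
call rel32 instructions (opcode E8), which is an assumption about the
binary rather than something confirmed above. retargetCall and its
parameters are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrite the `call rel32` at virtual address `callSite` so that it calls
// `newTarget` instead. `image` holds bytes that are loaded at `imageBase`.
// Returns false if the site is out of range or is not an E8 call.
bool retargetCall(std::vector<std::uint8_t>& image, std::uint32_t imageBase,
                  std::uint32_t callSite, std::uint32_t newTarget) {
    if (callSite < imageBase) return false;
    const std::size_t off = callSite - imageBase;
    if (off + 5 > image.size()) return false;
    if (image[off] != 0xE8) return false;             // not a call rel32
    const std::uint32_t rel = newTarget - (callSite + 5u);
    for (int i = 0; i < 4; ++i)                        // little-endian displacement
        image[off + 1 + i] = static_cast<std::uint8_t>((rel >> (8 * i)) & 0xFF);
    return true;
}
```

Only the four displacement bytes change, so the patch never alters
instruction lengths and the surrounding code is left untouched.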

My heuristic worked pretty well for most of the call sites, but it
was not all of them by a long shot. I wrote another Python script to
find unpatched call sites and apply some more heuristics, like
looking for patterns of pushing a string constant before the call
instruction. These got me another bunch of calls, but in the end I
still had several hundred call sites that I just had to manually
annotate. It took a few excruciatingly boring hours, but I got it
done. I then exported the patch data, reformatted it as a C++ array,
and applied the patch at runtime.

The reason I spent so long making sure the debug log was working so
well, was that I had an intuition that it would end up being
enormously helpful during the rest of the project. Actually, that's a
lie, it was just a fun technical challenge and made me feel like a
digital archaeologist. But hey - I had you fooled with that excuse
right? Jokes aside, it really did help to have the original logs. For
example, when I was working on getting multiplayer over WOL (Westwood
Online) working, the client was hanging some time after receiving the
game start command. By looking at the debug log, I could see a failed
assert message: "MyId == INVALID_ID". I then searched IDA's strings
view, jumped into the function where the assert was failing, and
realised that it was the function which handles receipt of an
SC_MESSAGE_YOUR_DETAILS message. A few lines above I saw that we had
logged successfully receiving an SC_MESSAGE_YOUR_DETAILS message
already. This tipped me off to look at my Wireshark dumps, and I
noticed that I was incorrectly sending a GAMEOPT command containing
an SC_MESSAGE_YOUR_DETAILS message to all connected players instead
of just the target. Without the debug log prompting me, I have no
idea how long I would have taken to notice.

Patching graphics

[dx_overvie]

Contemporary graphic design from the D3D7 era

Emperor was built with the then-current graphics API Direct3D 7.
Broadly speaking, DirectX 8/9 is around when graphics APIs started to
vaguely approximate the kind of feature set we expect today, moving
away from a fixed-function pipeline and towards a modern shader-based
world. DirectX 9 in particular was still in occasional use until
quite recently, including by some still-popular games (for example
CS:GO, which was only retired in 2023). I won't get into the weeds on
graphics APIs here, but suffice it to say that Direct3D 7 support in
modern Windows is... patchy at best.

High resolution windows

One of the major issues is the resolution limit. From what I can
tell, Direct3D 7 on modern systems is implemented as some sort of
wrapper layer that redirects to a more modern version of DirectX,
like 9 or 11. Somewhere in that wrapper layer, the maximum texture
size is being limited to 2048. Luckily, I was able to pull in some
code from UCyborg's LegacyD3DResolutionHack patch, which solved that
problem for me[2].

There was also another problem. The game cannot handle running at
anything but a 4:3 ratio. Rendering works, but the UI is completely
broken, like it's zoomed in way too far. Also the in-game mouse
rendering has a broken offset, which varies depending on your
distance from the centre of the screen[3].

Mouse offset. The system cursor is where clicks really register.

So, we need some 4:3 letterboxing. This was actually surprisingly
difficult. I tried just telling the game to render fullscreen at
2880x2160 (a 4:3 slice out of 4K), but it didn't like using arbitrary
resolutions in fullscreen mode like that. The game does have a
windowed mode though, accessible by passing -w in the command line
args, and arbitrary resolutions are not a problem when windowed. But
then we get an ugly floating window with a border, that doesn't cover
up the taskbar. I tried a bunch of bad ideas, but what worked in the
end was a patch to remove the border style, and reparent the game
window on top of a fullscreen black window. I also added mouse
capture, so edge scrolling isn't broken if you have multiple
monitors, or want to play in windowed mode.

Limiting framerate

Next I patched the IDirect3DDevice7::EndScene method, to limit the
FPS to 60.

Soooo much better

To limit the framerate to 60 FPS, we need to detect the current
framerate, calculate the amount we need to delay, and then sleep the
thread for that amount of time. To do that, we need to patch a
function that is called once per frame, preferably at the very end of
the frame. IDirect3DDevice7::EndScene fits the bill perfectly. But
this was a little tricky, because IDirect3DDevice7::EndScene is not
exported directly - it's a member function of a "lightweight COM"
object created through a chain of COM method calls.
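
The delay calculation can be sketched with standard C++ alone. This is
an illustration of the idea, not the code from the patch: FrameLimiter
and frameDelay are hypothetical names, and a real hook would call
endFrame() from the patched EndScene before or after forwarding to the
original. targetFps is assumed to be positive.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// How long to wait so that a frame which took `elapsed` lasts 1/targetFps
// seconds in total. Returns zero if the frame is already over budget.
std::chrono::nanoseconds frameDelay(std::chrono::nanoseconds elapsed, int targetFps) {
    const std::chrono::nanoseconds budget(1000000000LL / targetFps);
    return elapsed >= budget ? std::chrono::nanoseconds(0) : budget - elapsed;
}

// Call endFrame() once per frame, at the very end of the frame.
class FrameLimiter {
public:
    explicit FrameLimiter(int targetFps) : fps_(targetFps), last_(Clock::now()) {}

    void endFrame() {
        const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
            Clock::now() - last_);
        std::this_thread::sleep_for(frameDelay(elapsed, fps_));
        last_ = Clock::now();  // the next frame's budget starts after the sleep
    }

private:
    int fps_;
    Clock::time_point last_;
};
```

Sleeping is only as precise as the OS scheduler, so a limiter like
this caps the framerate at roughly the target rather than exactly at
it.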

What is lightweight COM?

COM is many things, but among other things it is a method of
representing objects that allows us to have ABI stability.
Lightweight COM (also known as nano COM) is a subset of COM concepts
that focuses on that representation and ignores the rest.

What is ABI stability?

To understand ABI stability, let's take a step back and discuss API
stability. API stability means that you provide a source-code level
API that does not change, so consumers of that API can upgrade to new
versions without having to change their code. ABI stability is taking
this concept one step further for native code, and providing
guarantees that consumers can upgrade your library without
recompiling.

For example, if I add a new field to a struct that is part of my
public interface, the API has not changed. Any code written against
the old version can be successfully recompiled against the new
version, and all is well. But an old binary, compiled against the old
version will not work anymore. The compiler used information about
the size of the struct, and the offsets of its members during
compilation, and that information is now wrong. This binary interface
is known as the ABI (Application Binary Interface). ABI stability is
the practice of crafting updates to native code libraries without
causing this kind of breakage.
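
A tiny sketch of that breakage, using a made-up struct: UnitInfoV1,
UnitInfoV2 and readHealthWithOldLayout are all hypothetical names
invented for this example. The function stands in for an old binary,
which has the version 1 offset of health baked in.

```cpp
#include <cstddef>
#include <cstring>

// Version 1 of a hypothetical public struct.
struct UnitInfoV1 {
    int id;
    int health;
};

// Version 2 adds a field. Source code that says `unit.health` still compiles,
// but the field now lives at a different offset.
struct UnitInfoV2 {
    int id;
    int armour;  // new in version 2
    int health;
};

// What an old binary effectively does: read `health` at the offset that was
// fixed when it was compiled against version 1.
int readHealthWithOldLayout(const void* unit) {
    const char* bytes = static_cast<const char*>(unit);
    int value;
    std::memcpy(&value, bytes + offsetof(UnitInfoV1, health), sizeof value);
    return value;
}
```

Hand that function a UnitInfoV2 and it returns the armour value
instead of the health, with no compiler error or warning anywhere.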

Microsoft doesn't want to force everyone to recompile their DirectX
apps every time they add a field to a class, so the COM object
representation takes care of that. In essence, a COM object is a
struct that contains, as its first member, a pointer to a table of
functions (known as a vtable). Something like this:

[com_lite]

When you're actually implementing a COM object in C++, Microsoft's
MSVC compiler gives you a bit of help, but under the hood this is
what you get. You could also implement a COM object in plain C, and
in that case you would need to handle your layout manually. Under
COM, you never access the members of a class directly, the consumer
of your class only ever sees your vtable and a pointer to your class.
This way, consumers never depend on the exact size or layout of the
underlying struct, so you're free to change it.
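
Here is a portable sketch of that layout, written by hand the way you
would in plain C. It is a toy: Device, DeviceVtbl and the two methods
are hypothetical stand-ins, and a real COM interface such as
IDirect3DDevice7 starts its vtable with the three IUnknown methods
(QueryInterface, AddRef, Release) and uses the stdcall calling
convention, both of which are left out here.

```cpp
struct Device;  // consumers only ever hold a Device*

// Table of function pointers. The slot order is the binary contract.
struct DeviceVtbl {
    int (*BeginScene)(Device* self);
    int (*EndScene)(Device* self);
};

// The object itself: the vtable pointer comes first, and everything after it
// is private to the implementation and free to change between versions.
struct Device {
    const DeviceVtbl* vtbl;
    int framesRendered;
};

static int deviceBeginScene(Device*) { return 0; }
static int deviceEndScene(Device* self) { return ++self->framesRendered; }
static const DeviceVtbl kDeviceVtbl = {deviceBeginScene, deviceEndScene};

Device makeDevice() { return Device{&kDeviceVtbl, 0}; }

// Grab a method's address from a live object, as a hooking tool would.
using EndSceneFn = int (*)(Device*);
EndSceneFn getEndScene(Device* dev) { return dev->vtbl->EndScene; }
```

Because the method address is only reachable through a live object's
vtable, there is nothing to look up by name, which is why hooking a
method like this needs an instance to start from.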

COM allows a single object to provide multiple interfaces, think of
this like normal class inheritance, but with some extra
possibilities. Each interface has its own base pointer, with its own
vtable. You switch interface not by a simple cast, but by using a
special QueryInterface function. There is also some reference
counting going on, but that's not important to this discussion.

To get a pointer to the IDirect3DDevice7::EndScene function, the
application has to:

  * Call DirectDrawCreateEx to get an IDirect3D7*
  * Call IDirect3D7::CreateDevice to get an IDirect3DDevice7*
  * Grab the function pointer from the vtable on the IDirect3DDevice7


So I'm confronted with a dilemma. I want to get the function pointer,
but in order to do so I need to call a bunch of Direct3D functions
that would create complicated objects, potentially interfering with
the application. But maybe...

[we_could_l]

... maybe we could let the game create the objects

So in the end, what we do is the following:

  * Patch DirectDrawCreateEx on app start
  * When DirectDrawCreateEx is called for the first time, before
    returning we use the IDirect3D7 we just created to grab a
    pointer to IDirect3D7::CreateDevice, and patch that
  * When IDirect3D7::CreateDevice is called for the first time,
    before returning we use the IDirect3DDevice7 we just created to
    grab a pointer to IDirect3DDevice7::EndScene and patch it to
    insert a frame rate limit.


Patching networking

[WOL]

It's got a 28.8 BPS modem

The next job is to get multiplayer working. What we want is a very
basic version - no fancy lobbies, hosted servers or clan systems.
Just forward ports and then copy/paste your friend's IP to connect.
Classic style, and most importantly it requires no infrastructure, so
it won't break if I lose interest in maintaining it.

There are two roads we can go down here: patching LAN mode, or WOL
(Westwood Online - the game's defunct online service). LAN mode
actually works, but it is not ideal for modern players. It
depends on sending UDP broadcast packets to announce servers, a
commonly used trick in old LAN games. The idea is, when you host a
game, you broadcast a special packet to all hosts on your LAN subnet,
essentially announcing "I am running a game server". That lets the
game show a list of lobbies in the LAN menu, so players can connect
without having to open a cmd window and call out their IP address to
each other.

The downside is that, for obvious reasons, broadcast does not work
online. And as the game has no way to specify an IP to connect to
manually, there is no way to play with a friend online using this
mode. I did initially work on allowing online play through LAN mode
by patching the LAN chat to allow you to specify an IP to connect to,
but once I realised there was a WOL-exclusive coop campaign mode I
abandoned that approach.

There are two components to getting WOL working. We need to stand up a
fake WOL master server so the game can figure out where to connect
and what kind of game to start, and then we need to do some proxying
of the game packets to make it work over a direct IP connection.
Surprisingly enough, there is a master server running at xwis.net.
From what I can tell, it's fan run, and at time of writing they have
access to the original DNS entries used by the game
(servserv.westwood.com). Emperor is not really working out of the box
on xwis though, so despite them mentioning Emperor on their homepage,
I think it's really only used for other Westwood games in the Command
& Conquer series. Still, it's enough to make and join a lobby so
we'll start with the packet proxying and come back to the master
server issue later.

The packets must flow

[packets_mu]

You're probably wondering "why the hell is he talking about proxying
packets, I thought he wanted a direct connection". Well, I do. But
there's a problem. Emperor uses a peer-to-peer networking model.
Pairs (every pair? not sure) of players open direct connections to
each other. Critically, they open two connections for each pair, one
from A->B and one from B->A. This means every client in the game
needs to have open ports so they can accept connections. Worse still,
the game randomly chooses which ports to listen on. That's really not
gonna fly, unfortunately.

Originally, the WOL master server was accompanied by a "mangler"
server. "Mangler" here seems to refer to mangling packets for NAT
punching, a technique for allowing connections between two clients,
both of whom are behind a NAT so direct connections are not normally
available. This is unreliable at best, but more to the point, it
needs an accessible server to coordinate the connection and the
original is long gone. The game connection was just hanging at this
stage, waiting forever for the mangler server to respond, so I
patched out the mangler call entirely.

Now we come to the real problem: the server will assign port ranges
to each of the clients, send the IPs and port ranges of each client
to all the other clients, and they will attempt to open connections
to one another. The meta communication about client list and port
ranges takes place over IRC (yes, actual IRC, more on that later)
hosted on the master server.

I solved this by completely co-opting all of Winsock (Winsock is the
Windows implementation of network sockets). I intercept all the
required functions, and tunnel all connections through a single
client->server network connection. So when a client wants to send a
message to the server, or even to another client, it is intercepted,
wrapped in a header and pushed through the single network connection.
There is a thread running on the server dedicated to receiving
messages, and dispatching them to the appropriate destination. There
are some subtleties around determining the appropriate destination
based on port range, faking the correct sender IP, etc, but in the
end it works quite well. The game still believes it's operating in a
peer to peer fashion, but we get the benefit of being able to direct
connect with only the server host needing to worry about their
network config.
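
The "wrapped in a header" step might look something like the sketch
below. The header layout here is invented for illustration (source
port, destination port, payload length, all big-endian 16-bit values).
The real tunnel's wire format isn't described above, so TunnelHeader,
wrapPacket and unwrapPacket are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 6-byte header prepended to every tunnelled packet.
struct TunnelHeader {
    std::uint16_t srcPort;
    std::uint16_t dstPort;
    std::uint16_t length;  // number of payload bytes that follow
};

static void putU16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));    // big-endian
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

static std::uint16_t getU16(const std::vector<std::uint8_t>& in, std::size_t at) {
    return static_cast<std::uint16_t>((in[at] << 8) | in[at + 1]);
}

// Client side: wrap a payload the game tried to send to `dstPort`.
// Returns an empty frame if the payload doesn't fit the 16-bit length field.
std::vector<std::uint8_t> wrapPacket(std::uint16_t srcPort, std::uint16_t dstPort,
                                     const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> frame;
    if (payload.size() > 0xFFFF) return frame;
    putU16(frame, srcPort);
    putU16(frame, dstPort);
    putU16(frame, static_cast<std::uint16_t>(payload.size()));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Server side: recover the header and payload so the packet can be routed.
bool unwrapPacket(const std::vector<std::uint8_t>& frame, TunnelHeader& header,
                  std::vector<std::uint8_t>& payload) {
    if (frame.size() < 6) return false;
    header.srcPort = getU16(frame, 0);
    header.dstPort = getU16(frame, 2);
    header.length = getU16(frame, 4);
    if (frame.size() != 6u + header.length) return false;
    payload.assign(frame.begin() + 6, frame.end());
    return true;
}
```

With the destination port in the header, the dispatching thread can
map it back to whichever client was assigned that port range.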

With all of that set up, I was able to start and join probably the
first Coop Campaign game in over a decade.

[first_coop]

Atreides tutorial mission, immediately followed by a victory dance
around my apartment

The part where I write an IRC server for some reason

As mentioned before, the WOL master server is really just a glorified
IRC server. There is some customisation though, so luckily the xwis
server that's still running served as a great example to work from.
There is even an open source WOL server implementation for reference
as well.

But wait a second, why write a server at all if there's one available
online already? Well, because it won't be forever. My goal with this
project is partly about just being able to play coop with my friend,
yes, but it's also about cultural preservation. I want to craft a
collection of bytes that ensures anyone who wants to play this game
can do so, forever. If I rely on some server being up, I can't do
that. As for why I didn't just use the PvPGN open source code - that
project has very different goals from mine. I want to provide the bare
minimum needed to get a multiplayer game running, not run a
competitive play community.

WOL is a weird mix of standard IRC stuff and custom bits. Game
lobbies are special channels that use standard IRC topics for game
lobby info, but then use PAGE instead of PRIVMSG for sending messages
in lobby chat, and synchronise game settings with GAMEINFO messages
whose content isn't even ASCII. In the end, implementing a basic WOL
server wasn't too complicated. It took some trial and error, and it
definitely isn't robust if you wander off the happy path, but it
works.
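
Since everything rides on ordinary IRC framing, the core of a server
like this is a line parser plus a dispatch on the command name.
Here's a minimal sketch of the parsing half in Python. It follows
the classic IRC line grammar; the function name parse_irc_line is
made up for this example, and I'm not claiming this is how my server
(or WOL itself) is structured.

```python
def parse_irc_line(line: str):
    """Split one IRC line into (prefix, command, params).

    Classic IRC grammar: an optional ":prefix", a command, space-separated
    parameters, and an optional trailing parameter introduced by " :"
    which may itself contain spaces.
    """
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        return prefix, "", []  # empty or malformed line
    command, params = parts[0].upper(), parts[1:]
    if sep:
        params.append(trailing)
    return prefix, command, params
```

With this in place, custom commands like PAGE and GAMEINFO are just
more command names to dispatch on. Because some payloads aren't
ASCII, decoding incoming bytes as latin-1 is one way to make sure
every byte value survives the round trip through a Python string.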

Packaging


[packaging]

Ready for UPS

Replacing the installer

As I mentioned at the beginning of this post, the original installer
for Emperor is broken. Westwood did release a workaround installer
that you can use by copying the contents of the install CD to your
hard drive and overwriting the setup exe, but honestly that's a mess.
I'd prefer to provide a nice simple tool that can handle it for the
user, and also deal with patching the game up to v1.09, the last
official patch.

The basic install is pretty straightforward, just copying some files
off the CD, and extracting some others from a .cab file on the CD.
cab files are just a basic archive format, like zip, and there is an
interface built into windows for extracting them.

The hard part was patching to v1.09. I was hoping I could just grab a
copy of the patch file, EM109EN.EXE, right click -> extract with 7zip
and be presented with some shiny new up to date binaries, but alas
no. The next thing I tried was checking for windows resources.
Windows resources are a standard method of embedding binary data into
an executable file on windows, supported by the compiler and build
system. Here, I actually got a hit. I didn't have any filenames, so I
couldn't tell their types, but there were a few, mostly tiny files on
the order of a few hundred bytes. One of the files was a few hundred
kilobytes though. I opened it up in a hex editor and saw something I
didn't expect.

[dosmode]

This program cannot be run in DOS mode

That's a windows executable file header![4] So we have an executable
file embedded in the patch exe. I tried to start analysing the new
binary, but none of my tools worked. "file", the unix magic number
file identification tool, couldn't tell what it was either. Clearly
something wasn't right about this file. If I'd paid more attention
and done some research on the windows PE format, I would have seen
that the "MZ" bytes that should be at the very beginning of a PE
file were offset by four bytes, and that was all that was wrong. But
for now I decided to go back to the original patch binary and see if
I could figure out what it was doing with the executable.

[rtpatch]

Loading & running the DLL

I went searching for uses of LoadResource, a standard windows API
that a program can use to get a pointer to a resource it has
embedded, and I found this code. It is extracting the binary resource
we found to a temporary file, then loading the temp file as a .DLL
and running a function from inside it. We can also see here why we
weren't able to load the executable before - the first four bytes of
the resource are storing the size of the file, then the rest is the
file itself. I'm not quite sure why they did this, as they could have
just used the SizeofResource function instead.
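
Once you know about the prefix, recovering the real DLL from a dump
of the resource is trivial. Here's a small Python sketch; the
function name extract_embedded_pe is made up, and I'm assuming the
size is stored as an unsigned 32-bit little-endian integer.

```python
import struct


def extract_embedded_pe(resource: bytes) -> bytes:
    """Strip a 4-byte length prefix and return the executable that follows."""
    if len(resource) < 4:
        raise ValueError("resource too short to hold a size prefix")
    # Assumption: the size is an unsigned 32-bit little-endian integer.
    (size,) = struct.unpack_from("<I", resource, 0)
    payload = resource[4:4 + size]
    if len(payload) != size:
        raise ValueError("size prefix does not match the resource length")
    if payload[:2] != b"MZ":
        raise ValueError("payload does not start with the MZ signature")
    return payload
```

Writing the returned bytes to disk should give a file that normal
tools recognise, since the "MZ" signature is back at offset zero.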

I also noticed when running the patch program in a debugger, that all
the actual file changes in the install directory were happening in
the function pulled out of the embedded DLL. I had a look at the
functions exported by the DLL to check out their names - the function
that EM109EN.EXE calls is named "RTPatch32@12". RTPatch sounded like
it might be some kind of patching tool, so I went searching and sure
enough, it is.

I then went searching for any third party tools for working with
RTPatch, and found the myRTP tool by Luigi Auriemma. Checking his
source code, he straight up just loads and runs the DLL, with some
arguments to tell it where to find the game. I tried the same, but
noticed that it was just ignoring the directory I was passing in and
fetching the Emperor install directory straight from the registry. I
was able to fake the registry keys that it expected, pass in the same
command string that EM109EN.EXE did, and that actually worked. This
is kind of funny, as the embedded DLL is 168 KiB, while EM109EN.EXE
is 5.5 MiB, meaning the patching code is only about 3% of the file
size and the rest is presumably patch data. If they were willing to
ship a patch that large, it makes me wonder if it was worth all the
effort of using a binary diffing tool rather than just shipping the
changed files whole.

Westwood Online Shared Internet Components

[Loading_an]

Loading the shared components was trickier than it seemed

I knew we would get back to COM eventually. When you use the original
installer, there are two components - Emperor, and something called
the "Westwood Online Shared Internet Components". Without this
optional component installed, WOL doesn't work. Of course, I want to
package this into my installer so the user doesn't need to worry
about getting it set up.

The problem is that this component seems to live up to its name - it
really is shared. I suppose the same installation of the shared
internet components could be used by multiple Westwood games with WOL
support. It's not installed in the Emperor directory, but in its own
folder somewhere, and at first I didn't know how Emperor was finding
it. I thought it might be through something saved in the registry?
Well, it turns out that I was correct, but not in the way I expected.

The main thing installed is WOLAPI.DLL, and it is a COM class
library. Emperor loads code from the library by using
CoCreateInstance to instantiate the various COM objects contained in
WOLAPI.DLL, which were registered at install time.

Ok for real this time, WTF is COM?

If you didn't read the previous section "What is lightweight COM?", I
suggest you go back and read it now.

Again, I'm not going to give you a full explanation of all the
subtleties of COM, mostly because I'm certain I'm not qualified to do
so. But I can explain the process of registering a COM class library
DLL.

Let's say we have a chunk of functionality implemented as COM
objects, and we want to make that code available as a library to be
used by other processes. How do they find our DLL? How do they
identify the objects they want to create? COM registration is a
method for solving this problem.

In COM, in addition to its human readable name, each class is
identified by a unique identifier known as a GUID (or more
specifically, a class ID or CLSID). There is a standard COM function,
CoCreateInstance, which is used to create instances of classes. You
identify the class you want to create by passing in the CLSID of the
class, and it gives you back a pointer to a new instance of that
class. But where does it store the list of available classes?

In the windows registry, under the key
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID. So COM registration is
essentially taking a DLL full of COM classes, and saving the CLSIDs
of those classes, along with a path to the DLL implementing them into
the windows registry at a system wide level.
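
Here's a toy model of that lookup in Python, with a plain dict
standing in for the registry. For in-process servers, the DLL path
is stored as the default value of an InprocServer32 subkey under the
class's CLSID key. The CLSID and DLL path in the usage below are
invented, and the function names are made up for this sketch.

```python
CLSID_ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID"


def _server_key(clsid: str) -> str:
    """Build the registry key that holds the DLL path for a class."""
    return "\\".join([CLSID_ROOT, clsid.upper(), "InprocServer32"])


def register_class(registry: dict, clsid: str, dll_path: str) -> None:
    """Record which DLL implements a class, as registration would."""
    registry[_server_key(clsid)] = dll_path


def find_server_dll(registry: dict, clsid: str) -> str:
    """Look up the DLL for a CLSID, as class creation has to do first."""
    try:
        return registry[_server_key(clsid)]
    except KeyError:
        raise LookupError(f"class {clsid} is not registered") from None


# Usage with an invented CLSID and an invented install path.
fake_registry = {}
register_class(fake_registry,
               "{00000000-0000-0000-0000-000000000001}",
               r"C:\Example\WOLAPI.DLL")
dll = find_server_dll(fake_registry, "{00000000-0000-0000-0000-000000000001}")
```

The real thing then loads that DLL and asks it for a class factory,
but the registry lookup is the part that registration exists to
serve.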

I've glossed over a lot of complexity here - for example COM has "out
of process" servers, where the class is provided by an implementation
that lives in a separate process, with automatic marshalling across
process boundaries. But the part that matters here is just about
telling CoCreateInstance where to find the WOL interfaces.

Ok, so can we just copy WOLAPI.DLL into the install folder and do
normal COM registration? Well, we could but it's not ideal. COM
registration writes to the HKEY_LOCAL_MACHINE key, which requires
admin access. Also, if at all possible I would like to keep things
scoped to just affect our game process. Registering COM objects
system wide seems unnecessarily messy.

As far as I can tell, it's not possible to register a class library
for just one process[5], but it is possible to register a class for
just one user by redirecting the registry, and then using the
OaEnablePerUserTLibRegistration function, so that's what I did.

Launcher UI

The last thing left to do was to make a basic launcher UI where
players could type an ip to connect to, and tweak some basic
settings.

[launcher]

UI design is my passion

In keeping with the very microsofty and quite retro theme of all the
tech presented so far, I decided to use plain old win32. It was my
first time making a UI with raw win32 controls, and I gotta say -
it's kinda rough. I can see why people generally don't use it
anymore. But for something so simple and static like this, it was
fine.

Conclusion

So with that, my goals were pretty much achieved. This blog post is
getting long so I won't go into details about the last few bits of
polish. If you made it this far, thanks for listening to my
ramblings, and if you try playing it, I hope you enjoy the game.


***


1: Detours actually takes care of the CreateRemoteThread &
LoadLibrary trick for us also.

2: It's actually a bit of a crazy hack, and probably not very
resilient to change. What they're doing is searching for the bytes of
a comparison to 2048, like this:

[SetRenderT]

Example from DIRECT3DDEVICEI::SetRenderTarget

They just search the whole binary for the pattern B8 00 08 00 00 39,
and replace the 00 08 00 00 part (2048 as a little-endian 32-bit
integer) with FF FF FF FF, which makes the check effectively an
unconditional pass. B8 at the start is the mov opcode, and the 39 at
the end is the start of
the cmp opcode that follows. I had a scan through d3dim700.dll on my
system, and it seemed like there weren't any false positives. Still
though, this is not a super nice solution, and could be broken if
Microsoft does ever decide to ship an update to Direct3D 7 for some
reason. I would actually like to use a complete reimplementation of
the D3D7 API on top of a more modern D3D, like dxwrapper, but I tried
a few, and none of them really worked well with Emperor.
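
For the curious, the search and replace itself boils down to
something like the following Python sketch. The byte patterns are
the ones described above; the function name patch_resolution_limit
is made up, and this is my own illustration rather than the code
the existing patch uses.

```python
PATTERN = bytes.fromhex("B8 00 08 00 00 39")  # mov eax, 2048 ; cmp ...
PATCHED = bytes.fromhex("B8 FF FF FF FF 39")  # mov eax, 0xFFFFFFFF ; cmp ...


def patch_resolution_limit(binary: bytes):
    """Replace every occurrence of the 2048 comparison constant.

    Returns the patched bytes and the number of replacements made.
    """
    count = binary.count(PATTERN)
    return binary.replace(PATTERN, PATCHED), count
```

Checking the returned count against the number of matches you
expect is a cheap guard against the false positives mentioned
above.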

3: Actually, it seems like the offset is always there, and always a
little broken, even at 4:3. It's just small enough that it doesn't
really matter.

4: Every windows executable actually starts with a DOS executable.
Normally, this is just a small program that prints the message "This
program cannot be run in DOS mode". This was a backwards
compatibility thing in the early days of windows. If a user tried to
run a windows EXE from DOS, they would get a nice error explaining to
them that they needed to use windows to run this file. Source here.

5: I did try using the DllGetClassObject function directly from the
DLL, but it didn't work.
