mthsdj

Binary patching A2 Racer IV to fix a crash

A short exercise in reversing, identifying and patching a major issue in an old videogame.


Davilex Games was a game developer based in Houten, NL. The company pioneered the casual PC game category in their country, selling games in the late ’90s to early ’00s that were developed within short timespans and sold at significantly lower prices than AAA titles. Their games enjoyed great popularity within their home country, with their racing games likely being the most well-known: A2 Racer, Grachtenracer, Autobahn Raser, USA Racer, and the list goes on.

A2 Racer IV: The Cop’s Revenge (Dutch title: ‘A2 Racer: de Politie slaat terug’) from 2000 is the 4th installment in the A2 Racer series, named after the A2 motorway running from Amsterdam to the Belgian border. The name did not make Davilex limit the setting, though: two-thirds of the race courses in this edition take place in either Germany or Austria.

As with many old games, running this one on newer software and/or hardware can be challenging. Graphics APIs change, physics tied to frame rates cause issues, drivers lose compatibility, and so on. However, the issue preventing this game from launching is different, and affects even era-appropriate systems!

The issue

Upon launching A2Racer4.exe, the game enters the menu just fine. Curiously, the menu runs as a separate process: entering a race launches ‘spel.dat’, which is just another regular executable in PE format. Race parameters (i.e., which track, car, upgrades, etc. have been selected in the menu) are communicated through ini files in the root game directory.

Right after spel.dat is launched, the menu shows itself again. Either the error box is obscured (alt+tab brings it up), or the exception is caught and ignored, throwing the player back to the menu. Fiddling with settings does not help. The game can’t be played.

Now the interesting part: going back to early/mid 00’s hardware or software does not necessarily fix this! An AMD Athlon with a GeForce FX 5700 shows the same behavior [1]. In the past, attempts to run the game on an Athlon XP combined with GeForce4 Ti4800, FX 5700 or a 7600 GT failed in the same fashion.

The cause

Time to investigate. Conveniently, the game can be started directly since the previous race configuration has been stored in ini files. Launching the game (spel.dat /Default) with a debugger attached reveals the true error:

[Image: Access violation]

To find the cause, some deciphering around this address was needed to work out what the code is trying to do. Within the crashing function, there is a function call with parameters 256 and 256, which hints at something texture related. Sure enough, some registers still pointed into ddraw.dll after the crash. Starting at the imports (DirectDrawCreate, DirectDrawEnumerate), one can keep decompiling towards the crashing function. Importing the DirectDraw virtual tables from the Wine headers into IDA proved to be very helpful. The binary also contains quite a lot of debug strings, which helps as well. The pseudocode of the crashing function looks something like this:

static IDirectDraw *lpDD;

IDirectDrawSurface *make_surf(int width, int height) {
    IDirectDrawSurface *surf;
    HRESULT res = lpDD->CreateSurface(.., &surf, ..);
    return res == 0 ? surf : 0; /* null pointer on failure */
}

int crashes() {
    IDirectDrawSurface *surfs[1024], **p;
    int size = 0, counter = 0;
    for (p = surfs; ; p++) {
        IDirectDrawSurface *surf = make_surf(256, 256);
        *p = surf; /* <--- Access violation after X iterations */
        if (!surf)
            break;
        size += bpp/8 * 256 * 256;
        counter++;
    }
    if (counter > 0) {
        p = surfs;
        do {
            (*p)->Release();
            p++;
            counter--;
        } while (counter);
    }
    return size;
}

Unsurprisingly, as 256 x 256 x 16bpp texture surfaces keep getting created, the write *p = surf in crashes() eventually lands out of bounds and causes a crash. As long as there is sufficient video memory, the CreateSurface() call in make_surf() returns zero (i.e., DD_OK). Also, notice how the surfaces are released right afterwards! Clearly, the allocated surfaces are not actually used during gameplay.

The system does not run out of video memory. When running with 16-bit color depth, each surface takes $256 \times 256 \times 2$ bytes of memory, which amounts to 128 KB per surface. However, note that space for only 1024 surface pointers has been allocated on the stack. This means that if there is >= 128 MB of free video memory, the program will eventually write out of bounds. If the game runs at 8-bit color depth, then this happens when VRAM >= 64 MB. Graphics cards with these amounts of VRAM were certainly not commonplace during the development of the game, and did not become so until maybe a couple of years later.
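
The arithmetic above can be double-checked with a small C sketch. The function names are made up for illustration and do not come from the game binary. The sketch also assumes tightly packed surfaces without any pitch padding or driver overhead, which a real graphics driver may not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes taken by one width x height surface at the given color depth.
 * Assumes tightly packed pixels (no pitch padding): a simplification. */
uint64_t surface_bytes(uint32_t width, uint32_t height, uint32_t bpp) {
    assert(bpp % 8 == 0);
    return (uint64_t)width * height * (bpp / 8);
}

/* Amount of free VRAM at which a probing loop can fill all `slots`
 * entries of its pointer array with 256 x 256 surfaces before
 * CreateSurface() fails, i.e. the point where overflow becomes possible. */
uint64_t overflow_threshold_bytes(uint32_t slots, uint32_t bpp) {
    return (uint64_t)slots * surface_bytes(256, 256, bpp);
}
```

With 1024 slots this gives 128 MiB at 16-bit and 64 MiB at 8-bit color depth, in line with the figures above.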

[Image: Stack]

The exact number of iterations before the crash is not 1025, as one would expect, but depends on the stack size. On 32-bit Windows, there consistently seem to be about 411 bytes after the surfs array that surface pointers can be written to without triggering a crash. For 64-bit Windows, this is 491 bytes. This amounts to 102 and 122 extra (out-of-bounds) iterations, respectively, before crashing out with an access violation. Further digging into Windows and MSVC internals would be required to figure out why the stack is specifically this size.

If this is the true reason for the crash, a computer with less video memory should not hit the end of the stack and should run the game correctly. To test our hypothesis, we can use 86Box to emulate a 3dfx Voodoo 3 3000 with 16 MB of VRAM (incidentally, this is what I ran the game on as a kid). Full emulation of x86 hardware is still an arduous task for modern CPUs, and the minimum requirement of the game is a Pentium II. However, actual gameplay is not necessary, let alone smooth gameplay: we just want to check our hypothesis. Hence, a Pentium II downclocked to 166 MHz was chosen.

And indeed, after 103 iterations (and just shy of 13 MB of pointless texture surfaces allocated), CreateSurface() returns a non-zero code: 0x8876017C. Let’s look at the return codes of CreateSurface() in the DirectX documentation. One piques our interest: DDERR_OUTOFVIDEOMEMORY. The docs do not list the numeric values of these codes, so a quick look at the DirectDraw headers in Wine reveals the following:

// in ddraw.h
#define DDERR_OUTOFVIDEOMEMORY            MAKE_DDHRESULT( 380 )
...
#define _FACDD        0x876
...
#define MAKE_DDHRESULT( code )  MAKE_HRESULT( 1, _FACDD, code )
...
// in winerror.h
#define MAKE_HRESULT(sev,fac,code) \
    ((HRESULT) (((unsigned int)(sev)<<31) | ((unsigned int)(fac)<<16) | ((unsigned int)(code))) )

Throwing the values into MAKE_HRESULT indeed arrives at 0x8876017C (DDERR_OUTOFVIDEOMEMORY).
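
The macro expansion can be reproduced with a few lines of C. This is a minimal sketch that mirrors the bit layout of the MAKE_HRESULT definition quoted above; the helper names are hypothetical, and the 0x1FFF facility mask is my own choice for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the MAKE_HRESULT macro quoted above: severity in
 * bit 31, facility shifted left by 16, code in the low 16 bits. */
uint32_t make_hresult(uint32_t sev, uint32_t fac, uint32_t code) {
    assert(sev <= 1 && code <= 0xFFFF);
    return (sev << 31) | (fac << 16) | code;
}

/* Hypothetical helpers to take a value apart again. */
uint32_t hresult_facility(uint32_t hr) { return (hr >> 16) & 0x1FFF; }
uint32_t hresult_code(uint32_t hr)     { return hr & 0xFFFF; }
```

Calling make_hresult(1, 0x876, 380) yields 0x8876017C, and taking that value apart again gives facility 0x876 and code 380 (0x17C).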

[Image: Stack]

There is yet another way to verify. The excellent dgVoodoo2 can wrap ddraw.dll and present custom graphics devices to the game [2]. Its debug build also prints useful information. Setting the VRAM to 128 MB (or anything lower) shows the following debug output (full list of calls omitted for brevity):

...
INFO: DirectDraw (03935468)::CreateSurface: Texture is created, head DirectDrawSurface (06ACBFD8).
INFO: DirectDraw (03935468)::CreateSurface: Texture is created, head DirectDrawSurface (06AC9F80).
INFO: DirectDraw (03935468)::CreateSurface: Texture is created, head DirectDrawSurface (06ACCAA0).
INFO: DirectDraw (03935468)::CreateSurface: Texture is created, head DirectDrawSurface (06ACA3D0).
INFO: DirectDraw (03935468)::CreateSurface: Texture is created, head DirectDrawSurface (06ACC200).
ERROR: DirectDraw (03935468)::CreateSurface: Out of video memory for surface creation. Needed: 131072 bytes, available: 110592 bytes
ERROR: DirectDraw (03935468)::CreateSurface: creating surface has failed, HRESULT: DDERR_OUTOFVIDEOMEMORY
INFO: DirectDrawSurface (053E0370) Texture is released because this head elem is released.
INFO: DirectDrawSurface (053E0598) Texture is released because this head elem is released.
INFO: DirectDrawSurface (053E23C8) Texture is released because this head elem is released.
INFO: DirectDrawSurface (053E30B8) Texture is released because this head elem is released.
INFO: DirectDrawSurface (053E1900) Texture is released because this head elem is released.
...

CreateSurface returns out of video memory, and the loop is safely exited. Unsurprisingly, setting the VRAM to 256 MB leads to:

...
INFO: DirectDraw (038B17C0)::CreateSurface: Texture is created, head DirectDrawSurface (07B487B8).
INFO: DirectDraw (038B17C0)::CreateSurface: Texture is created, head DirectDrawSurface (07B4B728).
INFO: DirectDraw (038B17C0)::CreateSurface: Texture is created, head DirectDrawSurface (07B489E0).
INFO: DirectDraw (038B17C0)::CreateSurface: Texture is created, head DirectDrawSurface (07B48590).
INFO: DirectDraw (038B17C0)::CreateSurface: Texture is created, head DirectDrawSurface (07B4B950).
<process started at 14:13:45.880 has terminated with 0xc0000005 (EXCEPTION_ACCESS_VIOLATION)>

Why not use dgVoodoo to play the game? Well, it introduces a new error, possibly caused by stubbed out functions in the wrapper. Crucially, this error occurs after the issue that was discussed until now so it did not hinder the experiments. This is a problem to be investigated some other time.

[Image: CKeyboardDevice]

Back to the problem at hand. What could be the original intention of this weird surface allocation function? Most likely, it is used to determine how many texture surfaces the system can handle. However, elsewhere in the code, more conventional ways are used to determine this as well. Anyway, there is no need to continue until CreateSurface() starts failing, so let’s do something about it.

The fix

Essentially, we want to change the loop to something like this:

    ...
    for (p = surfs; ; p++) {
        IDirectDrawSurface *surf = make_surf(256, 256);
        *p = surf;
        if (!surf)
            break;
        if (counter == N)   // Added: also quit when counter reaches a certain threshold
            break;
        size += bpp/8 * 256 * 256;
        counter++;
    }
    ...
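
For comparison, here is what a bounds-checked probing loop could look like at source level. This is a self-contained sketch, not the byte patch itself: FakeVram, fake_make_surf and fake_release are hypothetical stand-ins for the real DirectDraw calls, and unlike the patch this version never allocates a surface beyond the cap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SURFS 1024

/* Hypothetical stand-in for video memory and make_surf(). */
typedef struct { size_t free_bytes; size_t live; } FakeVram;

static void *fake_make_surf(FakeVram *v, size_t bytes) {
    if (v->free_bytes < bytes) return NULL;   /* "out of video memory" */
    v->free_bytes -= bytes;
    v->live++;
    return v;   /* any non-NULL pointer serves as a handle here */
}

static void fake_release(FakeVram *v, size_t bytes) {
    v->free_bytes += bytes;
    v->live--;
}

/* Probe how much surface memory can be allocated, creating at most
 * `cap` surfaces so the fixed-size array can never be overrun. */
size_t probe_vram(FakeVram *v, size_t surf_bytes, size_t cap) {
    void *surfs[MAX_SURFS];
    size_t count = 0, size = 0;
    assert(cap <= MAX_SURFS);
    while (count < cap) {
        void *s = fake_make_surf(v, surf_bytes);
        if (!s) break;                  /* allocation failed: stop probing */
        surfs[count++] = s;
        size += surf_bytes;
    }
    while (count > 0) {                 /* release everything again */
        surfs[--count] = NULL;
        fake_release(v, surf_bytes);
    }
    return size;
}
```

With a cap of 64 and 128 KB surfaces, a simulated card with plenty of memory reports 64 x 128 KB, while a card with only 1 MB free stops after 8 surfaces. In both cases every surface is released again.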

Currently, the code breaks out of the loop via a simple test eax, eax and jz. Adding extra code to a PE format exe is a no-go, and hooking via DLLs is a bit too much for such a simple patch. Let’s check for code caves with pycave [3] to find some space for extra code:

[+] Minimum code cave size: 10
[+] Image Base:  0x00400000
[+] Loading "spel.dat"...

[+] Looking for code caves...
[+] Code cave found in .text    Size: 13 bytes  RA: 0x000E2737  VA: 0x004E2737
[+] Code cave found in .text    Size: 13 bytes  RA: 0x000E5583  VA: 0x004E5583
[+] Code cave found in .text    Size: 13 bytes  RA: 0x000E560F  VA: 0x004E560F
[+] Code cave found in .rdata   Size: 13 bytes  RA: 0x00176253  VA: 0x00576253
...

There are larger caves in the .data and .rdata segments, but then we’d have to mess with the segment permissions and make them executable like .text. Besides, there are 34 bytes of dead code after the third cave, so we have 46 bytes to work with in total, which should be plenty. Note that the actual start of the cave is 0x4e5610 instead of 0x4e560f, since the first identified zero byte belongs to some vtable function definition. The cave is in the same code segment, so there’s no need to mess with call or reconstruct function stacks: things can be kept as simple as possible.

Below are the relevant parts of the code snippet from earlier in assembly, with the changes annotated in the comments: the arrow marks the instruction that replaces the original, and the comments note which instructions were overwritten or moved into the cave. Note that the code cave is too far away for a jmp short (which only reaches offsets from -128 to +127), so the longer jmp instruction has been used instead.
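
The basic idea behind a cave search can be sketched in C as a scan for runs of zero bytes. This is a simplified illustration under my own assumptions, not pycave’s actual implementation: it works on a raw byte buffer, knows nothing about PE sections, and only returns the first match. As the off-by-one above shows, a zero byte may still belong to real data, so any result needs a manual check.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t offset; size_t length; } Cave;

/* Find the first run of at least `min_len` zero bytes in buf.
 * Returns 1 and fills *out on success, 0 if no such run exists. */
int find_cave(const unsigned char *buf, size_t len, size_t min_len, Cave *out) {
    size_t run_start = 0, run_len = 0;
    assert(min_len > 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            if (run_len == 0) run_start = i;   /* a new run begins here */
            run_len++;
        } else {
            if (run_len >= min_len) break;     /* run ended and is long enough */
            run_len = 0;
        }
    }
    if (run_len < min_len) return 0;
    out->offset = run_start;
    out->length = run_len;
    return 1;
}
```

The returned offset is relative to the start of the buffer; turning it into a virtual address would require the section’s load address, which this sketch leaves out.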

...
4bc108 push 0x100
4bc10d push 0x100
4bc112 call make_surf
4bc117 add esp, 8
4bc11a mov [esi], eax
4bc11c test eax, eax          -> jmp 0x4e5610 ; Jump to code cave!
4bc11e jz short 0x4bc133      ; overwritten by above, but not needed anymore
4bc120 mov eax, var_bpp       ; partly overwritten, but is needed! need to reintroduce later
4bc125 shr eax, 3
4bc128 shl eax, 0x10
4bc12b add ebx, eax           ; size += bpp/8 * 256 * 256;
4bc12d inc edi                ; counter++;
4bc12e add esi, 4             ; point to next surface pointer
4bc131 jmp short 0x4bc108     ; next iteration
...
...
...                           ; start of code cave:
4e5610 test eax, eax          ; if (!surf)
4e5612 jz 0x4bc133            ;     break;
4e5618 cmp edi, N             ; if (counter == N)
4e561b jz short 0x4e5627      ;     break; (but also free the surface!)
4e561d mov eax, var_bpp       ; no break? perform overwritten instruction
4e5622 jmp 0x4bc125           ; jump back, continue to regular programming
4e5627 add esi, 4             ; p++;
4e562a inc edi                ; counter++;
4e562b jmp 0x4bc133           ; jump out of cave, proceed with releasing surfs
...

Note that p and counter are incremented once more, since the make_surf() call did not actually fail to allocate for this iteration and thus there is one extra surface to call Release() on. For the case of a failed allocation (i.e., CreateSurface() returning a non-zero code such as DDERR_OUTOFVIDEOMEMORY), the regular exit clause is repeated in the cave. Omitting this clause could lead to leaking surfaces on systems hitting maximum video memory usage.

What value to pick for N? A hardcoded 64 works (amounting to 8 MB of VRAM at 16-bit color, going by the 128 KB per surface computed above). Technically, a higher value would be possible, but the game will not be using this much VRAM anyway. A lower N would return a lower size than is available in reality, which could possibly cause the game to limit texture quality options. No attempt has been made to find code paths that actually use the size variable; doing so could give some more hints.

After patching, the game now runs on Windows 9x to Windows 11 with cards having large amounts of VRAM. A testament to the backwards compatibility provided by Microsoft (“Wine is the most stable ABI on Linux”, etc, etc.), as long as a game is not doing anything too weird. There are some more bugs in the game pertaining to the menu not rendering correctly with multi-monitor setups, but this could be a puzzle for some other time.
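
The jump displacements used in the listing can be computed by hand or with a small helper. The following C sketch uses hypothetical function names and relies on the standard x86 rule that a relative jump is measured from the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for a near jump with a 32-bit offset, relative to the
 * instruction following the jump. insn_len is 5 for jmp rel32 (E9)
 * and 6 for jz rel32 (0F 84). Backward jumps wrap modulo 2^32. */
uint32_t rel32(uint32_t insn_addr, uint32_t insn_len, uint32_t target) {
    assert(insn_len == 5 || insn_len == 6);
    return target - (insn_addr + insn_len);
}

/* A short jump (2 bytes) only carries a signed 8-bit displacement. */
int fits_rel8(uint32_t insn_addr, uint32_t target) {
    int64_t d = (int64_t)target - ((int64_t)insn_addr + 2);
    return d >= -128 && d <= 127;
}
```

For the jump into the cave, rel32(0x4bc11c, 5, 0x4e5610) gives 0x000294EF, stored little-endian as ef 94 02 00. The jz back out of the cave, rel32(0x4e5612, 6, 0x4bc133), gives 0xFFFD6B1B, stored as 1b 6b fd ff. fits_rel8 confirms that the cave is out of reach for a short jump from 0x4bc11c, while the jz short inside the cave (0x4e561b to 0x4e5627) fits.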

Patch

Patcher can be found here: a2racer4patch.zip (23.8 KiB)

Fetches game location from registry, makes backup, and applies patch.

Alternatively, get out the hex editor. In spel.dat:

Offset   Find                     Replace
0xbc11c  85 c0 74 13 a1           e9 ef 94 02 00
0xe5610  00 00 00 00 00 00 00 00  85 c0 0f 84 1b 6b fd ff
         00 00 00 00 49 55 4e 00  83 ff 40 74 0a b8 74 09
         6f 55 4e 00 27 55 4e 00  74 00 e9 fe 6a fd ff 83
         90 90 90 90 90 90 90 90  c6 04 47 e9 03 6b fd ff
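
For those who prefer scripting the edit over a hex editor, the find/replace step could look roughly like the C sketch below. This is not the code of the patcher linked above, which I am not reproducing here; apply_patch is a hypothetical helper that works on a file already loaded into memory, and reading, backing up and writing spel.dat are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace n bytes at `offset` in buf, but only if the bytes currently
 * there match `find`. Returns 1 on success, 0 if the range is out of
 * bounds or the bytes differ (for instance a different game version,
 * or a file that has already been patched). */
int apply_patch(unsigned char *buf, size_t len, size_t offset,
                const unsigned char *find, const unsigned char *repl, size_t n) {
    assert(buf != NULL && find != NULL && repl != NULL);
    if (offset > len || n > len - offset) return 0;
    if (memcmp(buf + offset, find, n) != 0) return 0;
    memcpy(buf + offset, repl, n);
    return 1;
}
```

Checking the Find column before writing keeps the operation safe to run twice: a second attempt on the same buffer fails the comparison and changes nothing.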

References

[1] A2 Racer 4 gaat de race niet in (Windows 98) : Davilex
[2] Directx - Dege’s stuffs
[3] GitHub - axcheron/pycave: Simple tool to find code caves in Portable Executable (PE) files.