IndieCity Developer Challenge

Sunday, April 24, 2011

YDAB! - The Vista Crisis

It's probably a tradition that even with software that has worked without problems for a long time during development and testing, a serious bug shows up just as the release date gets close (or sometimes just after release).

[Warning: This is about to get quite technical!]

This also happened with Your Doodles Are Bugged!. With release scheduled for Monday, on the previous Thursday I got a message from Neil at Blitz 1UP that the Steam version of the game crashes! But for me it had always worked - what could this be? After some more testing by Neil (a lot more, actually - thanks again for that!) the details became clearer. The Steam version only crashed on Windows Vista. It worked fine on Windows XP and Windows 7. And on Vista, the standard non-Steam version of the game worked fine too. It was only the combination of Steam version + Vista where the game crashed. Which is also the reason why we didn't find this earlier. The game had been tested on Vista, but only the non-Steam version. And the Steam version had been tested too, but only on Windows 7.

The kind of error messages that Neil reported led me to believe that this crash was related to calls into the Steam API. But to find out what was going on, I would have to test this myself.

Which meant that I needed access to a Vista machine. Being only a small indie developer, I do not have a large collection of computers with various OSes. And of the computers I have, none is running Vista (thank god!).
So I ended up ripping out the system disk from my development workstation and temporarily replacing it with a spare disk I had lying around, on which I then installed Vista, VS C# Express and XNA Game Studio 3.1. This alone was a huge undertaking. Vista installs are so slow! Especially if you then need SP1 too (for GS 3.1). I think Vista downloaded and installed about 125 updates from Microsoft before it was ready to install SP1. And this takes a lot of time!

So a few hours later, I finally had a computer with Vista, and yes, I was able to reproduce the problem. I got the same crash and error messages that Neil had reported. Which was actually good, because reproducing a problem is the first step in fixing it.

As I said, my suspicion was that the crash was related to a Steam API call. And a quick run with the debugger attached confirmed this: The game crashed when it called the achievement init function in my Steam wrapper DLL. In that init function, there was a call into the Steam API that returned NULL when it should not have, so I ended up trying to dereference a NULL pointer - with the expected disastrous results.

But why did this call return NULL on Vista, when it worked fine (returning a proper non-NULL object) on Windows 7 and Windows XP?

To be honest, I still don't know. I only know what I changed so that this doesn't happen any longer.

To explain this, I have to explain some background:

The Steam API comes in the form of a DLL that is written in unmanaged C++, plus some C++ header files. It can therefore not be used directly in a program written in managed C# (the language of XNA, which I also used for YDAB!). Instead you have to do PInvoke calls, where you correctly marshal the arguments and return value of each function call. For complex C++ objects, this marshaling is very difficult (if possible at all), so the usual solution, which I also chose, is to write a wrapper DLL in C++. In this wrapper, I have code that creates and uses the more complex C++ objects, and that exposes external functions in such a way that they are easy to call with PInvoke. So on the C# side, instead of having to deal with the complexities of the C++ objects (and how to marshal them), I only have to deal with the "nice" functions that I exposed in the wrapper DLL.
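
To make the wrapper idea concrete, here is a minimal sketch of what such a DLL's source might look like. It is not the actual YDAB! wrapper: the AchievementStore class and the Wrapper_ function names are made up for illustration, and the class merely stands in for a C++ object that would be awkward to marshal from C#.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a "complex" C++ object that would be painful
// to marshal from C#. Not part of any real SDK.
class AchievementStore {
public:
    // Returns true only the first time a given achievement is unlocked.
    bool unlock(const std::string& name) {
        return unlocked_.insert(name).second;
    }
    std::int32_t count() const {
        return static_cast<std::int32_t>(unlocked_.size());
    }
private:
    std::unordered_set<std::string> unlocked_;
};

// The wrapper owns the object; the C# side never sees it.
static AchievementStore g_store;

// In a real Windows DLL these functions would typically also be marked
// __declspec(dllexport); that is omitted so the sketch compiles anywhere.
extern "C" {

// Flat signature: a C string in, an int out. Returns 1 if newly unlocked.
std::int32_t Wrapper_UnlockAchievement(const char* name) {
    if (name == nullptr) return 0;
    return g_store.unlock(name) ? 1 : 0;
}

// Number of achievements unlocked so far.
std::int32_t Wrapper_UnlockedCount() {
    return g_store.count();
}

}  // extern "C"
```

On the C# side, each of these functions would then be declared as a static extern method with a DllImport attribute and a matching signature, with no knowledge of the C++ class behind it.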

Some of the Steam API functions, however, have such a trivial signature that it is easy to call them with PInvoke directly, without the need for such a "nice" wrapper. For example there is a steam_init function that does not take any arguments and returns bool. Totally trivial to call with PInvoke.

So for a handful of these trivial functions, I did not create stubs in my wrapper, but instead I called them directly from my C# code via PInvoke.

And this, for some reason that I do not understand, created the problem. Or more precisely: There was no problem on Windows 7, which is why I was never aware of this during development. And there doesn't seem to be a problem on XP either, as Neil's testing showed. But on Vista, it just didn't work.

The NULL pointer I mentioned was returned from a call into the Steam API that would normally return a pointer to an object. But only if you had previously called the steam_init function in the Steam API. Of course I had called it (otherwise it wouldn't have worked on Windows 7 either), but I had called it like this:

  • First my C# code called the steam_init function in the steam_api.dll directly, via PInvoke.
  • After steam_init returned, my C# code then called the achievement init function in my SteamWrapper.dll, via PInvoke.
    • In this SteamWrapper, my C++ code then called a function in the Steam API to retrieve a certain object. This call returns NULL if steam_init has not been called before.

This last call should normally return the object, but it returned NULL - at least on Vista - as if I had never called steam_init (which I however had, see the first bullet).

A bit more testing showed me that there was no problem if I added another call to steam_init directly in the achievement init function of my SteamWrapper code. Which put me on the right track: It turns out that if the call to steam_init happens via a direct PInvoke call into the steam_api.dll, but the later call to another Steam API function happens from code in my SteamWrapper.dll, then on Vista this latter call behaves as if steam_init had never been called.

So my solution was to never call any function in steam_api.dll directly. Instead, even for the trivial functions like steam_init, I made stubs in my wrapper DLL, and all my calls would now go through the wrapper DLL. So I now have the following, slightly different flow:

  • First my C# code calls the steam_init stub in my SteamWrapper.dll, via PInvoke.
    • In the SteamWrapper, my C++ code then calls the real steam_init function in the steam_api.dll.
  • After the steam_init stub returns, my C# code then calls the achievement init function in my SteamWrapper.dll, via PInvoke.
    • In this SteamWrapper, my C++ code then calls a function in the Steam API to retrieve a certain object. And now this call no longer returns NULL!

And this works on Vista too. Crisis averted! :-)
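
The routing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the fake_steam namespace is a made-up stand-in for the real steam_api.dll (which cannot be reproduced here), the Wrapper_ names are hypothetical, and the sketch does not reproduce or explain the Vista behaviour. It only shows every call, including the trivial init, going through the wrapper, plus a NULL check so that a missing object is reported instead of crashing.

```cpp
#include <cstdint>

// --- Hypothetical stand-in for the Steam API -----------------------------
namespace fake_steam {

struct Stats {
    std::int32_t achievements_loaded = 0;
};

static bool g_initialized = false;
static Stats g_stats;

// Plays the role of steam_init.
bool init() {
    g_initialized = true;
    return true;
}

// Mirrors the behaviour described in the post: NULL until init() was called.
Stats* get_stats() {
    return g_initialized ? &g_stats : nullptr;
}

}  // namespace fake_steam

// --- Wrapper DLL surface: the only functions the C# code calls -----------
extern "C" {

// Stub for the trivial init call, so that it originates from the wrapper.
std::int32_t Wrapper_SteamInit() {
    return fake_steam::init() ? 1 : 0;
}

// Returns 1 on success, 0 if the stats object is not available.
std::int32_t Wrapper_AchievementInit() {
    fake_steam::Stats* stats = fake_steam::get_stats();
    if (stats == nullptr) return 0;  // report the failure, don't dereference
    stats->achievements_loaded = 1;
    return 1;
}

}  // extern "C"
```

With the NULL check in place, calling the achievement init before the init stub yields an error code that the C# side can log, rather than an access violation inside the wrapper.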

1 comment:

  1. So now you know: It is not your doodles but Your Vista Is Bugged!

    I tend to use one of the test virtual PC images from Microsoft whenever I run into something that seems OS related. You would need a decent download rate for that though (they are large images, but soooo very useful!)

    Grtz,
    Niels
