
πŸ“ Posted:
πŸ’° Funded by:
Root, Ember2528
🏷️ Tags:

So, one push to make up for 📝 Shuusou Gyoku screenshots being closer to 4 pushes than 3, a second push to make up for 📝 the big PC-98 Touhou portability subproject being closer to 12 pushes than 11… and a third push because the planned Shuusou Gyoku maintenance turned out to actually involve significant work? I did not expect that implementing my vision would involve sending four pull requests to SDL that fixed three bugs and added one small feature.
On the flip side, it's great to see how my contributions were reasonable and well-explained enough for Sam Lantinga to merge them pretty much instantly. It's things like these (along with the merged support for ancient MSVC versions, or the ongoing DOS port that will probably be merged as well) that make SDL feel more like a community-owned project than a tightly controlled one. We should definitely try upstreaming our Windows 98 port too, once it's done.

Most of the changes in this build concern aspects I've explained at length in earlier blog posts:

Unfortunately, SDL 3 turned out not to be as perfect as it seemed in April after all:

  1. Most of SDL's C standard library function implementations do not deliver on the promise of a consistent implementation across platforms because they fall back on the compiler's C runtime by default. This may make sense for all the floating-point functions, which behave in largely unsurprising ways. In that case, you might as well use a compiler's own optimized implementation, sure. But doing the same for all the string functions, where we do want consistent behavior that isn't forced to implement the locale braindeath mandated by the C standard? Or maybe this is preferable after all if you consider the subtle limitations in SDL's rather lazily implemented replacements, like how the printf() family prints an undefined value if the integer portion of a float value exceeds the range of an unsigned long long? This probably hurts more applications than the rare actual effects of locale braindeath ever could. :thonk:
    Sure, we could configure our Windows builds to opt into these replacements, but we don't have that control on Linux, where we'd rather use a distribution's SDL package. If I do start using them one day, it will be purely because doing so removes the statically linked C runtime implementations from the game binary.

  2. The Windows implementation of SDL's file I/O functions uses buffered I/O for reading, but not for writing. This isn't much of a problem for Shuusou Gyoku itself because all of the file formats written by game logic just consist of a rather small number of buffers.
    But it does become a problem in conjunction with SDL_SaveBMP(), which we'd like to use for .BMP screenshots due to its support for any possible pixel format. My previous implementation for pbg's unpublished .BMP saving code evolved out of the debug code I quickly cobbled together while I was 📝 reverse-engineering all the porting-relevant surface management details of DirectDraw. This code was (and still is) limited to the pixel formats that .BMP most naturally supports, which was fine since those formats exactly matched the ones used by DirectDraw's framebuffer at all relevant bit depths, at least on my machine. Thanks to my objectively correct solution for handling endianness at the type level, this code even has well-defined byte order for the header fields, and thus works just as well on big-endian systems. After naturally filling in two structures, the code can then simply write out the entire .BMP file within four write calls. Even with my previously equally unbuffered file I/O functions, it doesn't get much faster than that.
    SDL's .BMP writer, on the other hand, was implemented with the exact opposite set of priorities: With a pixel format converter as part of the library, it can always convert any image into a .BMP-compatible format. But then, it decided to shift byte order handling to the I/O subsystem, using one write call for each individual field within the .BMP header. And if those write calls aren't buffered and get directly translated into Win32 WriteFile() syscalls, well…

    Bit depth, writer  Surface conversion  Header    Pixels    Total      Notes
    8-bit, fast        –                   –         –         0.008 ms
    16-bit, fast       –                   –         –         0.095 ms   only supports XRGB1555
    32-bit, fast       –                   –         –         0.164 ms
    8-bit, SDL         –                   8.196 ms  1.972 ms  10.169 ms
    16-bit, SDL        2.154 ms            0.299 ms  3.166 ms  5.619 ms   gets converted to 24-bit
    32-bit, SDL        0.753 ms            0.279 ms  3.162 ms  4.194 ms   gets converted to 24-bit

    Durations of saving an already retrieved 640×480-pixel buffer on the same Windows system.

    It's quite hilarious to see SDL getting slower as the bit depth decreases. If SDL ends up calling WriteFile() 1024 times to save the palette of an 8-bit image one byte at a time (256 entries of 4 bytes each), it's no wonder that writing the header takes 4× as long as writing the pixel data itself. With that much of a performance difference, removing my previous fast path would be an unacceptable downgrade. This is why the P0326 build only uses SDL's .BMP writer if it absolutely has to.

    That said, we could definitely improve SDL's .BMP writer to get the best of both worlds. Aside from adding general write buffering for Windows, I could add some of my fast paths, or even cover 16-bit RGB565 using .BMP's obscure BI_BITFIELDS feature rather than upconverting such images to 24-bit RGB888. If you like ReC98 being used as a means to get me to make much more globally valuable contributions to SDL, this is the issue you want to fund. If you do primarily care about the games though, it might still be worth it – then, we could save 32-bit screenshots as 24-bit .BMPs, making them 25% smaller with not that much of a reduction in saving performance.
    Then again, having access to SDL_ConvertPixels() in logic code means that we now even support arbitrary pixel formats for WebP, and computers are only going to get faster… :thonk:
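The printf() limitation from the first point comes down to a rule in the C standard: converting a floating-point value to an integer type is undefined behavior if its integer part doesn't fit into that type (C11 6.3.1.4). A formatter that extracts the integer part of a float by casting to unsigned long long therefore has to check the range first. Here is a minimal sketch of such a guard; split_integer_part() is a hypothetical helper, not SDL's actual implementation.

```c
#include <stdbool.h>

/* 2^64 as a double, which is exactly representable. Comparing against
   (double)ULLONG_MAX instead would be wrong for a 64-bit unsigned long long,
   because that conversion typically rounds up to 2^64. Since unsigned long long
   is at least 64 bits wide, this bound is always safe (if conservative). */
#define TWO_POW_64 18446744073709551616.0

/* Hypothetical helper: stores the integer part of a non-negative value and
   returns true, or returns false instead of invoking undefined behavior when
   that integer part cannot be represented. */
static bool split_integer_part(double value, unsigned long long *int_part)
{
    /* Written as a negated range check so that NaN is rejected as well. */
    if (!(value >= 0.0 && value < TWO_POW_64)) {
        return false;
    }
    *int_part = (unsigned long long)value; /* well-defined: value is in range */
    return true;
}
```

With this guard, split_integer_part(1e19, &x) succeeds while 1e20 is rejected, so a formatter could switch to a slower wide-integer path instead of printing an undefined value.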
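The general write buffering suggested in the second point can be sketched in a few lines. The sink below is a hypothetical stand-in for an unbuffered syscall like WriteFile() that merely counts calls and bytes; the buffer then coalesces many tiny writes, such as a 1024-byte palette written one byte at a time, into a single sink call. This is an illustration under those assumptions, not SDL's I/O code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an unbuffered syscall such as WriteFile(). */
typedef struct {
    size_t calls;
    size_t bytes;
} SinkStats;

static void sink_write(SinkStats *sink, const void *data, size_t size)
{
    (void)data; /* a real sink would hand the bytes to the OS here */
    sink->calls++;
    sink->bytes += size;
}

#define WRITE_BUFFER_SIZE 4096

/* Minimal write buffer that coalesces small writes into few sink calls. */
typedef struct {
    SinkStats *sink;
    size_t used;
    unsigned char data[WRITE_BUFFER_SIZE];
} BufferedWriter;

static void bw_init(BufferedWriter *bw, SinkStats *sink)
{
    bw->sink = sink;
    bw->used = 0;
}

static void bw_flush(BufferedWriter *bw)
{
    if (bw->used > 0) {
        sink_write(bw->sink, bw->data, bw->used);
        bw->used = 0;
    }
}

static void bw_write(BufferedWriter *bw, const void *data, size_t size)
{
    if (size >= WRITE_BUFFER_SIZE) {
        /* Large writes bypass the buffer once pending bytes are flushed. */
        bw_flush(bw);
        sink_write(bw->sink, data, size);
        return;
    }
    if (bw->used + size > WRITE_BUFFER_SIZE) {
        bw_flush(bw);
    }
    memcpy(bw->data + bw->used, data, size);
    bw->used += size;
}
```

Writing the 1024 palette bytes individually through sink_write() costs 1024 calls, while going through bw_write() and a final bw_flush() costs one. A real implementation would also have to flush before seeking, reading, or closing the handle.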
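The fast-path approach from the second point (fill in the header structures with explicit byte order, then write them out in one go) could be combined with the BI_BITFIELDS idea for 16-bit RGB565 images. The following is a hedged sketch that serializes a 14-byte BITMAPFILEHEADER, a 40-byte BITMAPINFOHEADER, and the three color masks that follow it into one little-endian buffer; all function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BMP_FILE_HEADER_SIZE 14
#define BMP_INFO_HEADER_SIZE 40
#define BMP_BITFIELDS_SIZE 12 /* three DWORD color masks */
#define BMP_HEADER_SIZE_565 \
    (BMP_FILE_HEADER_SIZE + BMP_INFO_HEADER_SIZE + BMP_BITFIELDS_SIZE)
#define BI_BITFIELDS_VALUE 3 /* value of BI_BITFIELDS in the Windows headers */

/* Byte order is fixed by construction, independent of the host CPU. */
static void put_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* .BMP rows are padded to a multiple of 4 bytes. */
static uint32_t bmp_stride(uint32_t width, uint32_t bpp)
{
    return ((width * bpp + 31) / 32) * 4;
}

/* Fills the complete header of a bottom-up RGB565 .BMP in one buffer. */
static void bmp_header_rgb565(uint8_t out[BMP_HEADER_SIZE_565],
                              uint32_t width, uint32_t height)
{
    const uint32_t image_size = bmp_stride(width, 16) * height;
    uint8_t *info = out + BMP_FILE_HEADER_SIZE;
    uint8_t *masks = info + BMP_INFO_HEADER_SIZE;

    /* BITMAPFILEHEADER */
    out[0] = 'B';
    out[1] = 'M';
    put_u32le(out + 2, BMP_HEADER_SIZE_565 + image_size); /* bfSize */
    put_u16le(out + 6, 0);                                /* bfReserved1 */
    put_u16le(out + 8, 0);                                /* bfReserved2 */
    put_u32le(out + 10, BMP_HEADER_SIZE_565);             /* bfOffBits */

    /* BITMAPINFOHEADER */
    put_u32le(info + 0, BMP_INFO_HEADER_SIZE); /* biSize */
    put_u32le(info + 4, width);                /* biWidth */
    put_u32le(info + 8, height);               /* biHeight, > 0: bottom-up */
    put_u16le(info + 12, 1);                   /* biPlanes */
    put_u16le(info + 14, 16);                  /* biBitCount */
    put_u32le(info + 16, BI_BITFIELDS_VALUE);  /* biCompression */
    put_u32le(info + 20, image_size);          /* biSizeImage */
    put_u32le(info + 24, 0);                   /* biXPelsPerMeter */
    put_u32le(info + 28, 0);                   /* biYPelsPerMeter */
    put_u32le(info + 32, 0);                   /* biClrUsed */
    put_u32le(info + 36, 0);                   /* biClrImportant */

    /* RGB565 channel masks, in red, green, blue order */
    put_u32le(masks + 0, 0xF800);
    put_u32le(masks + 4, 0x07E0);
    put_u32le(masks + 8, 0x001F);
}
```

The stride formula also shows where the 25% saving for 24-bit screenshots comes from: a 640-pixel row takes bmp_stride(640, 24) = 1920 bytes instead of 2560 bytes at 32-bit, so the pixel data of a 640×480 image shrinks from 1,228,800 to 921,600 bytes.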

These pushes only scratched the surface, and there's still a bit to do in terms of fully embracing SDL and removing redundant glue code. For example, we now load 📝 Shuusou Gyoku's packed sound effect files using SDL's .WAV loader, which removed miniaudio's integrated dr_wav along with the C runtime's fopen() implementation it forcibly depends on, but we still use miniaudio for both sound mixing and output. Swapping out these libraries takes more testing effort than you might think, and I had to stop somewhere. For now, I got everything out of this that I wanted, and it's time to go back to working on actual features.


In a final bit of SDL-unrelated and more wholesome news, the Windows 98 port now makes sure to actually pick MS Gothic on non-Japanese systems instead of potentially falling back on the different and possibly Mincho-styled font you might have seen in 📝 the screenshots for my first Windows 98 release:

Screenshot of Shuusou Gyoku's main menu as rendered by the P0310-2 build on Windows 98, falsely using MS Mincho rather than MS Gothic for all its text
Screenshot of Shuusou Gyoku's main menu as rendered by the P0326 build on Windows 98, now using the correct MS Gothic font

And that's it for now!

Next up: No more delays, no more excuses, it's finally time for the long-expected big look at TH03's MAIN.EXE! I've long dreaded this moment because every time I've looked at that binary, I saw highly intertwined foundational gameplay features that made it hard to focus on just a single thing. But now that netplay hype has accumulated plenty of budget, I can take a more extended look at all of these aspects, or even cover all of them in one big delivery if need be.
The goal of netplay also guides my RE efforts into two more specific directions:

  1. Identifying πŸ“ the two remaining difficulty-controlling variables is crucial before we can even start working on netplay.
  2. Identifying data at the top and bottom edges of the currently un-RE'd portion of the data segment can help with optimizing rollback. If that data is constant, we can reduce the amount of per-frame data saved in the rollback buffer, increasing performance and rollback times.
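The rollback optimization in the second point boils down to snapshotting only the state that can actually change. As a generic illustration, and not TH03's actual memory layout, the sketch below keeps a ring buffer of per-frame snapshots of a hypothetical MutableState, while data proven to be constant (the stand-in ConstantState) never enters the buffer at all. The window size of 8 frames is an arbitrary assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical game state, split by whether it can change during a match. */
typedef struct {
    int16_t player_x[2];
    int16_t player_y[2];
    uint8_t lives[2];
    uint32_t rng_state;
} MutableState;

typedef struct {
    uint8_t tables[4096]; /* stands in for data proven to be constant */
} ConstantState;

#define ROLLBACK_FRAMES 8 /* assumed maximum rollback window */

/* Only MutableState is copied per frame; ConstantState is never saved. */
typedef struct {
    uint32_t frame[ROLLBACK_FRAMES];
    MutableState snapshot[ROLLBACK_FRAMES];
} RollbackBuffer;

static void rollback_init(RollbackBuffer *rb)
{
    for (size_t i = 0; i < ROLLBACK_FRAMES; i++) {
        rb->frame[i] = UINT32_MAX; /* marks the slot as empty */
    }
}

static void rollback_save(RollbackBuffer *rb, uint32_t frame,
                          const MutableState *state)
{
    size_t slot = frame % ROLLBACK_FRAMES;
    rb->frame[slot] = frame;
    memcpy(&rb->snapshot[slot], state, sizeof(*state));
}

/* Returns 0 on success, or -1 if that frame's slot was already reused. */
static int rollback_load(const RollbackBuffer *rb, uint32_t frame,
                         MutableState *state)
{
    size_t slot = frame % ROLLBACK_FRAMES;
    if (rb->frame[slot] != frame) {
        return -1;
    }
    memcpy(state, &rb->snapshot[slot], sizeof(*state));
    return 0;
}
```

Every byte that can be moved from the mutable side to the constant side shrinks each snapshot, and with it the amount of memory copied on every frame.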

Or maybe it makes more sense to just go for the AI, enemy, or pattern code commissioned by LeyDud instead? We'll see.