- Posted:
- Funded by: Root, Ember2528
- Tags:
So, one push to make up for Shuusou Gyoku screenshots being closer to 4 pushes than 3, a second push to make up for the big PC-98 Touhou portability subproject being closer to 12 pushes than 11… and a third push because the planned Shuusou Gyoku maintenance turned out to actually involve significant work? I did not expect that implementing my vision would involve sending four pull requests to SDL that fixed three bugs and added one small feature.
On the flipside, it's great to see how my contributions were reasonable and well-explained enough for Sam Lantinga to merge them pretty much instantly. It's things like these, the merged support for ancient MSVC versions, or the ongoing DOS port that will probably be merged as well, that give SDL a sense of being more of a community-owned project as opposed to a more tightly controlled one. We should definitely try upstreaming our Windows 98 port too, once it's done.
Most of the changes in this build concern aspects I've explained at length in earlier blog posts:
As planned in January, the Linux build now compiles on GCC ≥15! As usual for C++ compilers, this switch once again required a nonzero amount of changes to make this codebase compile without errors or warnings, but that set of changes was far smaller this time around than it was when I added Clang support back in December.
Seeing how GCC lacks support and certain overloads for a different set of C++ range algorithms is so tragic that it's almost hilarious again at this point. As for actual annoyances though, GCC still struggles with import-then-#include, the apparent prime challenge of implementing C++ modules that both MSVC and Clang have largely solved by now. The resulting redefinition errors pretty much force us to move all `#include`s of third-party C library headers to above the first `import`. In `.cpp` files, this is no problem, but what if we need any of those third-party data types in our headers? After all, we can `#include` our headers in any order and can thus no longer guarantee that the third-party `#include`s will come before the `import std;` statement we need in our headers. Further C++ modularization of our logic code is way beyond the scope of these three pushes, so we have no choice but to completely remove any third-party header `#include`s from our headers. If we need their declarations, we now have to resort to pre-declared struct types and even worse `#ifdef` hacks for enum types… Oh well, this does speed up the build ever so slightly in the end.
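As an illustration of that pre-declaration workaround, here's a hedged sketch; the `ma_engine` name merely stands in for any third-party C struct, and the class is made up for illustration:

```cpp
#include <cassert>

// our_sound.h — instead of #including the third-party C header (which
// would have to come before `import std;` and thus can no longer live
// in one of our headers), we pre-declare the struct as an opaque type.
struct ma_engine;  // hypothetical third-party type

class Sound {
	ma_engine *engine_ = nullptr;  // pointers to incomplete types are fine

public:
	void Init(ma_engine *engine) { engine_ = engine; }
	ma_engine *Engine() const { return engine_; }
};

// The matching our_sound.cpp would then #include the real third-party
// header above its first `import`, where the complete type definition
// is actually needed.
```

This only works as long as headers treat the third-party type as a black box; anything that needs its size or members has to move into a `.cpp` file.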
On a more positive note, GCC brought another set of highly useful warnings to the table, especially in conjunction with the `CFLAGS` that Arch Linux sets by default for every package built with `makepkg`. Special shoutout to `-Wformat-overflow`, which comes in via the `-Wformat` that these flags enable (alongside `-Werror=format-security`). This warning brings `sprintf()` buffer size validation to its logical conclusion: It doesn't just consider the format string and arguments in isolation, but factors in all statically available information and even control flow to precisely determine the exact required size of the output buffer, and then warns if the given buffer is too small.

```c
void format_stage_number(uint8_t stage)
{
	// This is a classic shmup, we only ever have 6 stages, so 1 digit
	// plus terminating 0 is enough, right?
	char buffer[1 + 1];

	// Wrong: A `uint8_t` can range from 0 to 255 inclusive, and the
	// compiler can't statically prove that [stage] will only ever have
	// a single-digit value. By comparing the (statically known) value
	// range of [stage] to the (statically known) [buffer] size, GCC's
	// `-Wformat-overflow` can then precisely warn that [buffer] must
	// be at least four characters large to safely avoid buffer
	// overflows in all possible circumstances.
	sprintf(buffer, "%d", stage);

	// This, on the other hand, causes no warning because the string
	// can't possibly take up more than two bytes.
	const uint8_t known_stage = 1;
	sprintf(buffer, "%d", known_stage);

	// This also passes with no warning! Compilers are awesome.
	if(stage >= 10) {
		return;
	}
	sprintf(buffer, "%d", stage);
}
```

Here's the full warning.

Now that Shuusou Gyoku can compile with both GCC and Clang, it makes sense to support both of them in the build system without requiring users to edit the Lua files. Defining a generic *nix toolchain that just uses the `cc` and `c++` symlinks might look like a good idea due to the general compatibility of GCC's and Clang's command-line flags, but wouldn't work for us due to the completely different command-line flags needed for C++ modules. The build system must know in advance what's behind these symlinks in order to generate the correct set of build rules for Tup.
While CMake solves this compiler detection issue by compiling a test program that accesses a compiler's predefined macros, I went for the much simpler solution of parsing the string returned by `cc --version`. If you want to use a different compiler after all, you can always override `cc` with the `CC` environment variable, as you'd expect.

As planned in April, the config file now stores the selected graphics API in terms of SDL's driver identifier string rather than using the more volatile index into SDL's build-specific driver list.
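A minimal sketch of what such version-string parsing could look like; the detection logic here is my own illustration (and in C++ rather than the build system's actual Lua), not the real code:

```cpp
#include <cassert>
#include <string>

enum class Toolchain { GCC, Clang, Unknown };

// Classifies a toolchain from the first line of `cc --version` output.
// Clang identifies itself even behind a `cc` symlink (e.g.
// "Ubuntu clang version 18.1.3"), whereas GCC prints something like
// "cc (GCC) 15.1.1".
Toolchain DetectToolchain(const std::string& version_line)
{
	if(version_line.find("clang") != std::string::npos) {
		return Toolchain::Clang;
	}
	if(version_line.find("GCC") != std::string::npos) {
		return Toolchain::GCC;
	}
	return Toolchain::Unknown;
}
```

Checking for "clang" first matters: a Clang packaged by a distribution may mention GCC compatibility elsewhere in its output, but GCC never mentions Clang.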
As also planned later in April, I now went all in on SDL 3, made it a hard dependency, and removed the SDL 2 code path, which paved the way for a much simpler and more featureful overall architecture. And yes, this means that even the Windows 98 build runs on SDL 3 now! The Windows 98 port of SDL 3 reuses most of the small changes we needed for SDL 2, but required a few more on top to compile SDL 3's expanded feature set without warnings.
Then, I went straight to replacing my makeshift locale-independent file I/O abstraction with SDL's always-UTF-8 counterpart. And oh boy did this reveal how terrible my code actually was, particularly due to its aim of naturally supporting C++ data structures.
Switching to a more traditional stream-based API not only allowed me to delete all these abstractions, but surprisingly also simplified most file I/O call sites. Sure, a C API means no more well-defined lifetimes and forces me to manually close streams again, but Shuusou Gyoku doesn't do enough file I/O for that to be even just a slight annoyance. I sure didn't feel the need for a wrapper class, but I did feel the need for `char8_t *` wrappers that made SDL's file functions work more naturally with the strong type-level distinction between UTF-8 and packfile-originating Shift-JIS I applied to the rest of the code.
In the end, timestamp preservation is the only remaining file-related feature that still requires custom platform-specific code, since it's most certainly outside of SDL's scope. It originally seemed as if I also needed to keep exclusive file opening for screenshots as SDL had no way of specifying `CREATE_NEW` on Windows, but since `fopen(…, "wx")` did work in an undocumented way on all of SDL's other automatically tested platforms, it made sense to just turn this into an officially supported feature and provide the missing Windows implementation.
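For reference, a small sketch of those exclusive-creation semantics at the C level: the `x` modifier (standardized in C11) makes `fopen()` fail instead of truncating an existing file, mirroring `CREATE_NEW` on Windows. The function name is made up for illustration:

```cpp
#include <cassert>
#include <cstdio>

// Opens [path] for writing only if it doesn't exist yet.
// Returns nullptr if the file is already there.
FILE *OpenScreenshotExclusively(const char *path)
{
	return std::fopen(path, "wbx");
}
```

The second call for the same path fails, which is exactly what you want for numbered screenshot files that must never overwrite each other.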
Unfortunately, SDL 3 turned out to be not as perfect as it seemed in April after all:
Most of SDL's C standard library function implementations do not deliver on the promise of a consistent implementation across platforms because they fall back on the compiler's C runtime by default. This may make sense for all the floating-point functions, which behave in largely unsurprising ways. In that case, you might as well use a compiler's own optimized implementation, sure. But doing the same for all the string functions, where we do want consistent behavior that isn't forced to implement the locale braindeath mandated by the C standard? Or maybe this is preferable after all if you consider the subtle limitations in SDL's rather lazily implemented replacements, like how the `printf()` family prints an undefined value if the integer portion of a `float` value exceeds the range of an `unsigned long long`? This probably hurts more applications than the rare actual effects of locale braindeath ever could.
Sure, we could configure our Windows builds to opt into these replacements, but we don't have that control on Linux, where we'd better use a distribution's SDL package. If I do start using them one day, it's purely because it removes the statically linked implementations from the game binary.

The Windows implementation of SDL's file I/O functions uses buffered I/O for reading, but not for writing. This isn't much of a problem for Shuusou Gyoku itself because all of the file formats written by game logic just consist of a rather small number of buffers.
But it does become a problem in conjunction with `SDL_SaveBMP()`, which we'd like to use for .BMP screenshots due to its support for any possible pixel format. My previous implementation for pbg's unpublished .BMP saving code evolved out of the debug code I quickly cobbled together while I was reverse-engineering all the porting-relevant surface management details of DirectDraw. This code was (and still is) limited to the pixel formats that .BMP most naturally supports, which was fine since those formats exactly matched the ones used by DirectDraw's framebuffer at all relevant bit depths, at least on my machine. Thanks to my objectively correct solution for handling endianness at the type level, this code even has well-defined byte order for the header fields, and thus works just as well on big-endian systems. After naturally filling in two structures, the code can then simply write out the entire .BMP file within four write calls. Even with my previously equally unbuffered file I/O functions, it doesn't get much faster than that.
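A condensed sketch of this approach, assuming a little-endian host and collapsing the two header structures into one (the real code additionally wraps every field in an endianness-aware type and handles palettes, hence its four write calls rather than the two shown here):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

#pragma pack(push, 1)
struct BMPHeaders {
	// BITMAPFILEHEADER
	uint16_t bfType;  // "BM"
	uint32_t bfSize;
	uint16_t bfReserved1, bfReserved2;
	uint32_t bfOffBits;
	// BITMAPINFOHEADER
	uint32_t biSize;
	int32_t  biWidth, biHeight;
	uint16_t biPlanes, biBitCount;
	uint32_t biCompression, biSizeImage;
	int32_t  biXPelsPerMeter, biYPelsPerMeter;
	uint32_t biClrUsed, biClrImportant;
};
#pragma pack(pop)

// Saves a 32-bit top-down pixel buffer with the minimum possible
// number of write calls. (Function name made up for illustration.)
bool SaveBMP32(const char *path, const void *pixels, int32_t w, int32_t h)
{
	const uint32_t pixel_size = (uint32_t)(w * h * 4);
	BMPHeaders hdr = {};
	std::memcpy(&hdr.bfType, "BM", 2);
	hdr.bfOffBits = sizeof(BMPHeaders);  // 54 bytes, no palette
	hdr.bfSize = (hdr.bfOffBits + pixel_size);
	hdr.biSize = 40;  // sizeof(BITMAPINFOHEADER)
	hdr.biWidth = w;
	hdr.biHeight = -h;  // negative height = top-down row order
	hdr.biPlanes = 1;
	hdr.biBitCount = 32;
	hdr.biSizeImage = pixel_size;

	FILE *f = std::fopen(path, "wb");
	if(!f) {
		return false;
	}
	// One write call for all header fields, one for all pixels –
	// instead of one syscall per header field.
	const bool ok = (
		(std::fwrite(&hdr, sizeof(hdr), 1, f) == 1) &&
		(std::fwrite(pixels, pixel_size, 1, f) == 1)
	);
	return ((std::fclose(f) == 0) && ok);
}
```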
SDL's .BMP writer, on the other hand, was implemented with the exact opposite set of priorities: With a pixel format converter as part of the library, it can always convert any image into a .BMP-compatible format. But then, it decided to shift byte order handling to the I/O subsystem, using one write call for each individual field within the .BMP header. And if those write calls aren't buffered and get directly translated into Win32 `WriteFile()` syscalls, well…

| | Surface conversion | Header | Pixels | Total | Notes |
|---|---|---|---|---|---|
| 8-bit, fast | | | | 0.008 ms | |
| 16-bit, fast | | | | 0.095 ms | only supports XRGB1555 |
| 32-bit, fast | | | | 0.164 ms | |
| 8-bit, SDL | | 8.196 ms | 1.972 ms | 10.169 ms | |
| 16-bit, SDL | 2.154 ms | 0.299 ms | 3.166 ms | 5.619 ms | gets converted to 24-bit |
| 32-bit, SDL | 0.753 ms | 0.279 ms | 3.162 ms | 4.194 ms | gets converted to 24-bit |

Durations of saving an already retrieved 640×480-pixel buffer on the same Windows system.

It's quite hilarious to see SDL getting slower as the bit depth decreases. If SDL ends up calling `WriteFile()` 1024 times to save the palette for an 8-bit image one byte at a time, it's no wonder that writing the header takes 4× as long as writing the pixel data itself. With that much of a performance difference, removing my previous fast path would be an unacceptable downgrade. This is why the P0326 build only uses SDL's .BMP writer if it absolutely has to.

That said, we could definitely improve SDL's .BMP writer to get the best of both worlds. Aside from adding general write buffering for Windows, I could add some of my fast paths, or even cover 16-bit RGB565 using .BMP's obscure `BI_BITFIELDS` feature rather than upconverting such images to 24-bit RGB888. If you like ReC98 being used as a means to get me to make much more globally valuable contributions to SDL, this is the issue you want to fund. If you do primarily care about the games though, it might still be worth it: then, we could save 32-bit screenshots as 24-bit .BMPs, making them 25% smaller with not that much of a reduction in saving performance.
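For reference, a short sketch of the channel masks such a `BI_BITFIELDS` header would carry for RGB565, plus the kind of bit arithmetic a reader needs to unpack them (helper names are my own illustration, not SDL or game code):

```cpp
#include <cassert>
#include <cstdint>

// With biCompression = BI_BITFIELDS (3), the .BMP header describes
// each channel with an explicit bit mask. For 16-bit RGB565:
constexpr uint16_t RGB565_R = 0xF800;  // upper 5 bits
constexpr uint16_t RGB565_G = 0x07E0;  // middle 6 bits
constexpr uint16_t RGB565_B = 0x001F;  // lower 5 bits

// Expanding a packed RGB565 pixel back into 8-bit channels:
constexpr uint8_t R5To8(uint16_t px) {
	return (uint8_t)((((px & RGB565_R) >> 11) * 255) / 31);
}
constexpr uint8_t G6To8(uint16_t px) {
	return (uint8_t)((((px & RGB565_G) >> 5) * 255) / 63);
}
constexpr uint8_t B5To8(uint16_t px) {
	return (uint8_t)((((px & RGB565_B) >> 0) * 255) / 31);
}
```

Storing the three masks costs 12 extra header bytes, which is a far better deal than growing every pixel from 2 to 3 bytes through a 24-bit upconversion.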
Then again, having access to `SDL_ConvertPixels()` in logic code means that we now even support arbitrary pixel formats for WebP, and computers are only going to get faster…
These pushes only scratched the surface, and there's still a bit to do in terms of fully embracing SDL and removing redundant glue code. For example, we now load Shuusou Gyoku's packed sound effect files using SDL's .WAV loader, which removed miniaudio's integrated dr_wav along with the C runtime's `fopen()` implementation it forcibly depends on, but we still use miniaudio for both sound mixing and output. Swapping out these libraries takes more testing effort than you might think, and I had to stop somewhere. For now, I got everything out of this that I wanted, and it's time to go back to working on actual features.
In a final bit of SDL-unrelated and more wholesome news, the Windows 98 port now makes sure to actually pick MS Gothic on non-Japanese systems instead of potentially falling back on the different and possibly Mincho-styled font you might have seen in the screenshots for my first Windows 98 release:


And that's it for now!
Next up: No more delays, no more excuses, it's finally time for the long-expected big look at TH03's MAIN.EXE! I've long dreaded this moment because every time I've looked at that binary, I saw highly intertwined foundational gameplay features that made it hard to focus on just a single thing. But now that netplay hype has accumulated plenty of budget, I can take a more extended look at all of these aspects, or even cover all of them in one big delivery if need be.
The goal of netplay also guides my RE efforts into two more specific directions:
- Identifying the two remaining difficulty-controlling variables is crucial before we can even start working on netplay.
- Identifying data at the top and bottom edges of the currently un-RE'd portion of the data segment can help with optimizing rollback. If that data is constant, we can reduce the amount of per-frame data saved in the rollback buffer, which improves performance and shortens rollback times.
Or maybe it makes more sense to just go for the AI, enemy, or pattern code commissioned by LeyDud instead? We'll see.