💰 Funded by:
Ember2528

Well, that fell apart surprisingly quickly. The release of Shuusou Gyoku's Linux port just happened to be surrounded by the unluckiest sequence of events in Arch Linux land:

After I fixed a silly mistake on my part, Shuusou Gyoku was still playable on sdl2-compat as it was only affected by rather minor bugs, but these bugs still undermined the effort I put into the port. That left us with three options:

  1. Let the more involved SDL community sort out sdl2-compat on their own. After all, why should we bother if rogue distros randomly mess with our dependencies?
  2. Become part of that community and help fix the issues in either sdl2-compat or SDL 3.
  3. Properly update Shuusou Gyoku to SDL 3 right now, while keeping SDL 2 support for the Flatpak, more conservative Linux distributions, and the upcoming Windows 98 backport.

I really would have preferred to delay this migration for a few years until the dust has settled. For this project, I already picked C++ as the dependency I want to be on the bleeding edge of, and SDL 2 was supposed to balance this out by being the conservative and stable choice. Oh well, if we've got to update at some point, we might as well do it now. The ReC98 development schedule at least gave me another month of waiting for the community to sort out SDL 3's growing pains…

  1. Forced onto an unstable SDL 2 compatibility layer
  2. Updating to SDL 3
  3. Picking a screenshot format
  4. Letting players pick an effort level
  5. Future performance improvements
  6. Rendering the build ID with some unused glyphs

So, why does something like sdl2-compat even exist if it only causes problems? And why are distros rolling it out so soon after SDL 3 if SDL 2 has been working fine all the time? In a nutshell, sdl2-compat is the second pillar in SDL's forward compatibility strategy. While the 📝 dynamic API mechanism ensures compatibility with future minor versions by integrating dynamic linking so deeply that static linking is made entirely useless, sdlN-compat ensures compatibility with one future major version by implementing version N's API in terms of SDL version N+1. This allows the SDL team to very quickly stop updating version N while still allowing programs linked against that version to run well on modern systems by using all the actively maintained backends of version N+1. This worked out well with sdl12-compat, which nowadays seems to do a great job at preserving abandoned SDL 1 games – especially if we consider that you'd be running sdl12-compat on top of sdl2-compat on top of SDL 3 from now on. :tannedcirno:

So it only makes sense that the SDL developers would want to repeat this success story with the transition from SDL 2 to 3. The problem is that they're already selling sdl2-compat as a perfect drop-in replacement for proper SDL 2, and wanted to push it onto people even before SDL 3 was officially released. The sales pitch follows their usual "trust me bro" rhetoric:

If you absolutely must have the real SDL2 ("SDL 2 Classic"), please use the SDL2 branch at https://github.com/libsdl-org/SDL, which occasionally gets bug fixes (and eventually, no new formal releases). But we strongly encourage you not to do that.

Followed by zero arguments to back up this audacious suggestion. So they not only imply that sdl2-compat is already perfectly compatible and works without bugs for every SDL 2 program ever, but also that the underlying SDL 3 implementation doesn't introduce any bugs on top – and it only takes a single look into either project's issue tracker to disprove that notion. There is no technical reason why a distro couldn't ship SDL 3 and 2 in parallel. The continued existence of the SDL 2 AUR package proves that, and as of mid-March, it was still receiving comments from upset users that justified its existence.
There was absolutely no reason to push sdl2-compat on everyone by default other than forcefully turning users into beta testers. SDL 2 was still stable, maintained, and working well. People who needed SDL 3 before its release for whatever feature already used SDL 3. People who want to use the SDL 3 backends to solve some obscure backend-related issue in an SDL 2 program can use sdl2-compat without needing it to be the only option available. And with a package size of 1.2 MiB, you can't convince me that SDL 2 is somehow a burden on the packaging front either – especially if your distro has separate packages for every commonly used fiddly Python and Haskell library.
I can't help but imagine the reaction if Microsoft pushed an enforced update of this magnitude. They're already getting regularly lambasted by the press for much smaller and ultimately inconsequential offenses…

For all the 📝 criticism I had about Flatpak and Flathub last time, they made the right choice of not treating their base package as a rolling and bleeding-edge distribution. The Freedesktop platform will only ship SDL 3 in its next version releasing in August, which will probably leave enough time for the SDL developers to address all but the rarest remaining issues in sdl2-compat. Although I'm not sure how I should interpret this commit being made at that specific time: This is either very considerate (because they've chosen to take up the job of early-adopting SDL 3 as part of developing the new SDK version, and thus will be helping out with reporting bugs), or very inconsiderate because they bought the whole sdl2-compat story just like Arch did. If Freedesktop SDK updates shipped in February rather than August and the release tag were on this branch, they would have screwed over their users just as much. Also, there's still not much point in force-updating everyone onto a compatibility layer in freaking 2025.

Then again, I can empathize with the SDL developers to a degree. Lots of developers have been asking the "when is SDL 3 ready and stable enough for regular use?" question while picturing SDL as this highly important and central library that surely has a big team of testers who could ensure its stability at one point. But if there just isn't enough Valve money to form such a team, what else should you do as a developer other than turn your personal hype into an "it's ready now, go use it and please leave feedback" reply? Maybe, turning your users into beta testers is the only realistic way to ever approach stability in this economy. And sure, they call it 3.2.0 for… reasons, but they're not fooling anyone.

The big irony, however, is this: At one point in the future, sdl2-compat will be that perfect solution for running abandoned SDL 2 (and SDL 1) programs on top of SDL 3. But it's the exact opposite of what you'd want during active development: You want to update to SDL 3 and use the new APIs and function names to be ready for the future, but also retain the option to run on the stable SDL 2 foundation for at least a little longer until every distribution has caught up. Or, in other words, you want to run SDL 3 on top of SDL 2.
You could totally have a library that implements this alternate kind of compatibility layer. It would still be prone to bugs just like sdl2-compat, but unlike that one, the chance for new bugs is halved since you'd be running on top of the proven and stable SDL 2. But of course, such a library would restrict your codebase to SDL 2's feature set, which is probably why something like this doesn't exist. So instead, our SDL platform layer now contains 64 conditional branches and a bunch of function renaming macros and generic helper code to support compiling against both SDL 3 and SDL 2. At least I wrote it all in a way that allows us to quickly rip out SDL 2 support once we no longer need it…


Oh well, enough ranting. Because once it works, there are plenty of things to like about SDL 3. Limited to, of course, everything notable that applies to Shuusou Gyoku:

A few changes have good and bad elements:

Thankfully, the list of entirely bad changes is quite short:

Still, the constant stumbling over bugs and deliberate instabilities made this take way longer than it had any right to. For three of these bugs, I was the first one to report them, and I could have even reported a fourth one if I actually cared about Vulkan and didn't happen to find a workaround right before I pushed out the release.
With the additional API unbricking feature, we've ended up well into a second push. Replays were too big of a feature for now, but screenshot compression sounded like a nice task for the rest of that push. Really, how hard can it be? Add reference C library of our encoder of choice, call API with pixel buffer we get from SDL, write compressed pixel buffer to file. Easy, right? Well…


For starters, which format do we choose? Ember2528 had a clear preference, but it makes sense to compare it against other contenders first. There will be a complete benchmark further below, but let's get the seemingly most obvious candidate out of the way first:

QOI

Because who doesn't want a fast encoder for a simple format with steadily growing adoption? Sure, part of the adoption might be hype-driven, but as far as hype goes, there are definitely worse targets than a codec that fits in less than 300 lines of C. The low-color images we want to compress are rather simple from a modern point of view as well, so you'd expect QOI to be a perfect match…
…until you actually try encoding a few representative images and are greeted with file sizes that are way further removed from PNG than you'd expect after seeing the official benchmarks. Since the specification is short enough, we can easily explain these results:

So while reduced complexity and blazingly fast encoding speed are good arguments, they don't cut it if decent compression of our source images relies on all the complexity found in PNG. But shouldn't this deficiency have stuck out in the official benchmark in some way? After all, 43% of the images in QOI's test suite have ≤256 colors, with most of them coming from Philip K's Ancient Collection in the textures_pk directory, where they make up 80%. For this directory, the official numbers claim average compressed sizes of 80 KiB for PNG and 75 KiB for QOI, and running the benchmark myself confirms these numbers…
…but wait, the input PNG files in the test suite package are actually half that size?! Yup – this benchmark merely tests the fixed, untunable QOI format against two specific PNG encoders, libpng and stb_image, at their default compression level and filter settings. It does not claim anything about QOI's relation to the known limits of PNG as a format, despite what the hype drivers would lead you to conclude all too easily. In any case, it paints a much different picture of QOI's 256-color capabilities:

| Encoder | Average file size (bytes) |
|---|---:|
| stb_image | 110,337 |
| libpng | 82,136 |
| QOI | 77,404 |
| PNG source files | 43,437 |
| oxipng -o max -Z | 41,032 |
We will later see why comparing the slowest PNG encoders against the consistently fast QOI is, in fact, not unfair.

The final nail in QOI's coffin is this concession at the end of its release announcement:

SIMD acceleration for QOI would also be cool but (from my very limited knowledge about some SIMD instructions on ARM), the format doesn't seem to be well suited for it. Maybe someone with a bit more experience can shed some light?

I'd rather take a new image format that's designed around modern SIMD instructions from the start. Then, it can invest these performance gains into more complex filters to end up with better compression at a roughly similar encoding performance. Heck, it can even be slightly slower for all I care. SIMD-first design worked great for non-cryptographic hashes, and we'll see in a minute that it works just as well for image formats.
But Ember2528 had a different codec in mind anyway. Let's jump right to the polar opposite of the complexity spectrum:

Lossless JPEG XL

Because why wouldn't you use the currently best and most popular image format according to actual professionals who know a couple of things about image compression? It's winning benchmarks left and right, and blog posts like these make it appear as if even version 0.10 of its reference encoder already beats out every other widely used codec. And after it unfairly got removed from Chromium in 2022, you can't help but root for it. Time to do my small part in bringing its adoption to a level that Google can no longer deny!

Too bad that the enthusiasm immediately drops after cloning the libjxl repo and running a CMake test build. What are all these library dependencies, and why can't I just reduce the build to the lossless encoder? The resulting binaries are way larger than what I'd consider appropriate in relation to game code. 😩
Looking through the repo more thoroughly, however, reveals a very welcome little surprise: If a few basic requirements are met, the fastest lossless speed tier actually uses an entirely separate encoder that's implemented in a single source file and can be used independently from the rest of libjxl. Nice to see that someone thought about simple integration after all! That's exactly what I had hoped to find. Sadly, Linux distributions don't have a separate standalone package for this encoder, but it wouldn't be the only library we'd statically link on Linux.
Having a single function as an easy entry point is always a good sign, too. Those parameters, though… :thonk:

As the FJXL abbreviation implies, this encoder actually started as an independent project that, coincidentally, was a direct response to the hype surrounding QOI. By using AVX2 instructions within the confines of an existing format, it managed to beat QOI in both encoded file sizes and compression speed for every type of image its developer tested. But it's this competitive focus that brings us to its most questionable implementation decision.
The good news is that FJXL acknowledges that low-color images exist, are a prime use case for lossless compression, and are best dealt with using JPEG XL's palette features. However, detecting and optimizing that palette takes up a lot of time relative to QOI. If the input image uses more colors than a palette would make sense for, you'd want to fail as early as possible. Slide 11 explains the solution FJXL came up with:

  • Hash table with 65k possible entries
  • Any collision -> no palette
  • […]

On non-palette-friendly images, this fails quickly (birthday paradox says after ~256 distinct pixels).

On palette images, encoding 1 channel rather than 4 more than compensates the cost of detection.

With 10 additional bits and a widely renowned multiplier, the hash function looks leaps and bounds ahead of the one in QOI:

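#include <stdint.h>

// has to map 0 to 0
uint16_t pixel_hash(uint32_t p) {
	// Knuth's multiplicative hash: keep the top 16 bits of the 32-bit product
	return (uint16_t)((uint32_t)(p * 2654435761u) >> 16);
}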
Adapted from the original code.
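For comparison, here is the hash that the QOI specification uses to pick one of the 64 slots in its array of previously seen pixels, wrapped in a function whose name is my own. With only 6 output bits, the pigeonhole principle alone guarantees shared slots in any image with more than 64 colors, such as the 191-color main menu. QOI tolerates this because its index is merely a cache that falls back on other chunk types, whereas a palette detector has to treat every shared slot as a problem.

```c
#include <stdint.h>

// QOI's index hash as given in the QOI specification: selects one of the
// 64 slots in the encoder's array of previously seen pixels.
unsigned qoi_index_position(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
	return ((r * 3u + g * 5u + b * 7u + a * 11u) % 64u);
}
```

Opaque black and #400000FF, for example, both land in slot 53.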

But since we're still hashing 32-bit RGBA pixels to 16 bits, we're bound to run into a collision sooner or later. You can certainly think of this hash function as mapping color values to uniformly distributed random numbers and then reason about its efficacy using probability theory, as we saw in the slide above. However, the conclusion drawn in that slide is rather abbreviated and ultimately misleading: The birthday paradox does not return a binary success/failure result, but a probability. For 256 distinct colors, it amounts to:

$$1 - \frac{65536!}{(65536 - 256)! \cdot 65536^{256}} \approx 39.27\,\%$$

Let's plug in 191, for no reason whatsoever:

$$1 - \frac{65536!}{(65536 - 191)! \cdot 65536^{191}} \approx 24.21\,\%$$

That's a smaller probability, but a 1/4 failure rate would still be way too high for our use case. And sure enough, it actually happens in the main menu, where a single #583732FF pixel (or 0xFF323758 in its little-endian representation) collides with #FFFFFFFF:

The `main_menu` benchmark image, and a 16× zoomed view of it highlighting the single #583732FF pixel that causes the hash collision in FJXL's palette detection code.
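To see the failure in isolation, here is a minimal sketch of the detection rule summarized on the slide: insert every pixel into a 65,536-entry table and give up on the first collision between two different colors. It assumes 32-bit pixels with alpha in the top byte (as in the little-endian 0xFF323758 above), ignores whatever the slide's `[…]` elides, and `palette_friendly()` is a hypothetical name, not FJXL's actual code. Feeding it just white and the one menu pixel is enough to trigger the bail-out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_BITS 16
#define HASH_SIZE (1u << HASH_BITS)

// Adapted FJXL hash. It maps 0 to 0, which lets 0 double as the
// "empty slot" marker in the table below.
uint16_t pixel_hash(uint32_t p) {
	return (uint16_t)((uint32_t)(p * 2654435761u) >> (32 - HASH_BITS));
}

// Returns false as soon as two different colors share a slot (or if the
// table can't be allocated), true if every color got a slot of its own.
bool palette_friendly(const uint32_t *pixels, size_t count) {
	uint32_t *table = calloc(HASH_SIZE, sizeof(*table));
	if(!table) {
		return false;
	}
	bool friendly = true;
	for(size_t i = 0; i < count; i++) {
		const uint32_t p = pixels[i];
		const uint16_t h = pixel_hash(p);
		if(table[h] == p) {
			continue; // color already seen (color 0 always "matches" slot 0)
		}
		if(table[h] != 0) {
			friendly = false; // any collision -> no palette
			break;
		}
		table[h] = p;
	}
	free(table);
	return friendly;
}
```

Both 0xFFFFFFFF and 0xFF323758 hash to 0x61C8, while swapping the menu pixel for opaque black (0xFF000000, hash 0x4F00) lets the same two-color input pass.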

The resulting 143 KiB file immediately shows how failing to palettize such images completely ruins the compression ratio. If this one pixel had any other, non-colliding color, FJXL would have compressed the image into a still decent 52 KiB. The slides would therefore have been better off with a graph of the failure probability and a statement like:

Not perfect, and likely to misdetect even low-color images with <256 distinct colors as not palette-friendly according to the birthday paradox.
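Such a graph is easy to produce. This small C function (its name is my own) evaluates the same birthday-problem product as the formulas above, one factor at a time instead of through huge factorials, and reproduces the 39.27% and 24.21% figures for 256 and 191 colors.

```c
// Probability that `n` distinct colors, hashed uniformly into `buckets`
// slots, produce at least one collision (birthday problem):
//   1 - prod_{i=0}^{n-1} (buckets - i) / buckets
double collision_probability(unsigned n, unsigned buckets) {
	if(n > buckets) {
		return 1.0; // pigeonhole principle: a collision is certain
	}
	double no_collision = 1.0;
	for(unsigned i = 0; i < n; i++) {
		no_collision *= ((double)(buckets - i) / buckets);
	}
	return (1.0 - no_collision);
}
```

Calling it for every color count from 1 to 256 with `buckets = 65536` yields the data points for the failure-probability graph.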

For our use case of screenshots without an alpha channel, we could work around this whole issue by having a separate non-alpha code path. Detecting the potential palette of an RGBA image within a worst-case time complexity of 𝑂(𝑛) without using hashes requires a 2³² bit = (2³² / 8) byte = 512 MiB bit array to cover the entire RGBA color space, which is probably too steep of a memory requirement. Removing the alpha channel, however, would shrink this array to a definitely appropriate 2²⁴ bit = 2 MiB.
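A sketch of that hash-free alternative, assuming 32-bit pixels whose top byte holds alpha: one bit per possible 24-bit RGB color makes detection exact, and the scan stops as soon as the number of distinct colors exceeds the palette limit. `count_rgb_colors()` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Exact distinct-color count over the RGB part of each pixel, using a
// 2^24-bit (2 MiB) bitset. Returns SIZE_MAX if there are more than
// `max_colors` colors or if the bitset can't be allocated.
size_t count_rgb_colors(const uint32_t *pixels, size_t count, size_t max_colors) {
	uint8_t *seen = calloc(((1u << 24) / 8), 1);
	if(!seen) {
		return SIZE_MAX;
	}
	size_t distinct = 0;
	for(size_t i = 0; i < count; i++) {
		const uint32_t rgb = (pixels[i] & 0xFFFFFFu); // ignore the alpha byte
		const uint8_t mask = (uint8_t)(1u << (rgb & 7));
		if(seen[rgb >> 3] & mask) {
			continue;
		}
		seen[rgb >> 3] |= mask;
		if(++distinct > max_colors) {
			distinct = SIZE_MAX; // too many colors for a palette
			break;
		}
	}
	free(seen);
	return distinct;
}
```

Unlike the hash table, this can never misdetect a low-color image; its only costs are the 2 MiB allocation and clearing it for every screenshot.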

Ultimately though, we decided against doing any of that because FJXL by itself is as untunable from the outside as the codec it was inspired by. Ember2528 preferred the opposite: an encoder with multiple effort levels that offer different trade-offs between encoding speed and file size, which would allow faster CPUs to produce the smallest files at still reasonable speeds. So let's look past the bloat, link in the complete libjxl reference encoder, and see how it performs on higher effort levels…

…um, what is this API? Adapting the example code gave me encoding times that are at least 1.5× slower than the cjxl command-line encoder, and already hit the 100 ms mark at -e 2. Even -e 1 is suddenly much slower than using FJXL in isolation while yielding the same compressed sizes. Also, pushing speculative allocation onto the caller? 🤨 📝 stb_vorbis is a bad joke, not a model to be emulated.
The compressed file sizes are pretty underwhelming as well. Most of the test cases don't even get close to oxipng at -e ≤6 while still taking absurdly long to encode within the game. Even at peak effort, it's a mixed bag at best, with both oxipng and JPEG XL -e 10 massively beating the other in 3 out of 7 cases. And if that's the best we can say about this format…

All this is echoed by this recent issue that points out JPEG XL's inadequacy with an even more retro 16-color example. In the end, the documentation said it all along:

They are about 60-75% of size of PNG, and smaller than WebP lossless for photos.

But there is one widely-used image codec that both perfectly fits Ember2528's priorities and compresses well on lower effort levels. Let's finally look at the complete benchmark numbers:

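All file sizes in bytes.

| main_menu / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 146,352 | 51,851 | 59,453 | 45,329 | 37,864 | 37,276 | 36,130 | 35,222 | 33,793 | 31,724 |
| WebP | 54,116 | 32,194 | 28,112 | 27,860 | 27,712 | 28,272 | 28,178 | 28,120 | 28,684 | 27,816 |
| AVIF | 272,604 | 272,604 | 136,220 | 131,235 | 119,398 | 117,525 | 111,380 | 110,684 | 110,543 | 109,601 |

| main_menu | Size |
|---|---:|
| BMP (8 bpp) | 308,278 |
| BMP/RLE | 92,034 |
| QOI | 93,884 |
| oxipng -o max -Z | 30,702 |

| ingame / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 123,606 | 102,949 | 130,689 | 102,944 | 84,916 | 72,590 | 68,302 | 49,618 | 45,865 | 46,997 |
| WebP | 50,678 | 49,030 | 43,620 | 41,760 | 40,724 | 40,854 | 38,608 | 37,940 | 37,842 | 37,138 |
| AVIF | 462,703 | 462,703 | 197,818 | 156,007 | 141,043 | 139,689 | 133,399 | 132,573 | 126,270 | 125,379 |

| ingame | Size |
|---|---:|
| BMP (8 bpp) | 308,278 |
| BMP/RLE | 185,842 |
| QOI | 175,949 |
| oxipng -o max -Z | 38,409 |
| BMP, cropped | 185,398 |
| BMP/RLE, cropped | 177,456 |
| QOI, cropped | 165,620 |

| stage6 / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 32,204 | 24,146 | 35,053 | 24,599 | 19,936 | 19,560 | 19,336 | 18,444 | 17,423 | 16,183 |
| WebP | 20,856 | 19,916 | 17,070 | 16,524 | 16,380 | 16,562 | 15,488 | 15,386 | 15,404 | 15,124 |
| AVIF | 185,676 | 185,676 | 84,437 | 62,354 | 57,791 | 56,524 | 52,956 | 52,611 | 51,969 | 51,795 |

| stage6 | Size |
|---|---:|
| BMP (8 bpp) | 308,278 |
| BMP/RLE | 55,838 |
| QOI | 52,302 |
| oxipng -o max -Z | 18,741 |
| BMP, cropped | 185,398 |
| BMP/RLE, cropped | 48,954 |
| QOI, cropped | 45,874 |

| laser / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 345,199 | 287,279 | 301,608 | 248,852 | 92,463 | 85,529 | 81,206 | 66,811 | 61,445 | 47,173 |
| WebP | 85,318 | 56,724 | 51,558 | 53,964 | 53,492 | 53,492 | 51,860 | 51,460 | 51,460 | 41,726 |
| AVIF | 218,858 | 218,858 | 122,100 | 88,490 | 82,675 | 81,245 | 75,866 | 75,395 | 75,462 | 75,138 |

| laser | Size |
|---|---:|
| BMP (24 bpp) | 921,654 |
| QOI | 290,088 |
| oxipng -o max -Z | 61,595 |
| BMP, cropped | 553,014 |
| QOI, cropped | 280,462 |

| laserbomb / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 332,706 | 125,197 | 150,436 | 128,755 | 110,357 | 102,891 | 99,718 | 68,968 | 66,975 | 64,484 |
| WebP | 129,472 | 94,564 | 86,538 | 64,990 | 64,062 | 64,062 | 60,776 | 60,318 | 60,318 | 59,198 |
| AVIF | 313,731 | 313,731 | 168,388 | 114,111 | 109,239 | 107,121 | 104,109 | 102,054 | 99,106 | 99,103 |

| laserbomb | Size |
|---|---:|
| BMP (24 bpp) | 921,654 |
| QOI | 210,496 |
| oxipng -o max -Z | 87,286 |
| BMP, cropped | 553,014 |
| QOI, cropped | 200,002 |

| gates / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 208,293 | 185,662 | 212,615 | 172,008 | 124,466 | 117,509 | 113,563 | 110,992 | 97,454 | 91,146 |
| WebP | 124,308 | 125,070 | 113,896 | 102,656 | 102,482 | 102,482 | 95,536 | 94,768 | 94,768 | 57,850 |
| AVIF | 306,742 | 306,742 | 293,874 | 293,276 | 254,073 | 243,953 | 243,947 | 242,188 | 241,943 | 241,359 |

| gates | Size |
|---|---:|
| BMP (24 bpp) | 921,654 |
| QOI | 157,705 |
| oxipng -o max -Z | 90,545 |
| BMP, cropped | 553,014 |
| QOI, cropped | 147,670 |

| seihou / Effort | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| JPEG XL | 6,124 | 5,088 | 4,732 | 4,468 | 4,427 | 4,416 | 4,377 | 4,112 | 4,016 | 4,040 |
| WebP | 39,518 | 5,904 | 5,642 | 5,574 | 5,500 | 5,518 | 5,518 | 5,504 | 5,486 | 5,490 |
| AVIF | 26,984 | 26,984 | 25,085 | 24,927 | 22,582 | 21,698 | 21,697 | 21,627 | 21,631 | 21,505 |

| seihou | Size |
|---|---:|
| BMP (8 bpp) | 308,278 |
| BMP/RLE | 17,654 |
| QOI | 18,047 |
| oxipng -o max -Z | 5,383 |
| BMP, cropped | 23,798 |
| BMP/RLE, cropped | 14,144 |
| QOI, cropped | 13,371 |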
The effort value directly corresponds to cwebp's -z parameter. Add 1 to get cjxl's -e parameter, and subtract from 10 for avifenc's -s parameter.
I definitely could have surveyed the landscape of PNG encoders more thoroughly, but since Ember2528 prioritized compression ratio over compression speed, there was no need to. oxipng is as good as it gets, but in six of the seven test cases, even its strongest and most sluggish setting is still outperformed by regular WebP at some level, and often as early as -z 2.
`main_menu`: 191 colors. The large areas in black and #DDE4FA are a great test case for an encoder's RLE capabilities. The menu's half-transparent background is slightly nasty, but should still keep this image well within the range of potential palette-based compression. (Unless you're QOI, of course.)
FJXL palette detection collision chance: 24.21%.
`ingame`: 92 colors. Lots of repeated bullet sprites to appropriately represent gameplay, plus a small transparency effect in the Evade gauge that shouldn't complicate compression all too much.
FJXL palette detection collision chance: 6.20%.
`stage6`: 96 colors. The wavy clock animation makes Stage 6 look complex, but we expect encoders to actually have a much easier time on the last three stages due to their backgrounds being mostly black.
FJXL palette detection collision chance: 6.72%.
`laser`: 1219 colors. A simple repeated tile in the background, with a big gradient that is likely to push the color count beyond palette-based algorithms.
`laserbomb`: 831 colors. Similar to enemy-fired lasers, but with multiple smaller gradients rather than a single big one.
`gates`: 2326 colors. With a comparatively complex background, bullets, and a big laser, this is probably the most intense test case for lossless compression that this game has to offer.
`seihou`: 40 colors. A small consolation prize for JPEG XL, as the smoothly feathered and blurred colors match the photo-like characteristics this codec was meant to target. Even oxipng gets to barely outperform WebP on this one. Then again, at every effort level except 0, the difference between JPEG XL and WebP is still less than 1.5 KiB, for an image that doesn't represent the rest of the game.
FJXL palette detection collision chance: 1.18%.

Lossless WebP

Yup, it's 📝 ZMBV beating AV1 all over again. For these kinds of retro game screenshots, JPEG XL is vastly outperformed by its counterpart from the previous generation of widely-used image formats. And not just in terms of compressed file sizes, but also in every single other aspect that matters to us:

That's not to say that libwebp is perfect. Its code makes it very obvious that lossless WebP was designed for 2010-era hardware as the encoder never got optimized for modern CPUs. There was an attempt at optimizing at least the lossy encoder for AVX2, but it was ultimately abandoned because it never got fast enough. Surprisingly, the codebase did receive new AVX2 code one week before I released this build, but it only covers the lossless decoder so far.
As for concurrency, libwebp does come with support for multi-threaded encoding, and I did activate it for the Shuusou Gyoku integration, but it's only used at effort levels 8 and 9. Also, why is argb in this structure interpreted as native-endian (and therefore in BGRA memory order on little-endian CPUs), but these are interpreted as big-endian?

But the main criticism is the same that also applies to JPEG XL: The lossless and lossy modes are lumped into the same repository despite having virtually no code in common, and are selected via a structure field rather than having unrelated API entry points. This once again makes it very difficult for static linkers to remove all the code on the lossy branches that I never asked for in the first place.
And I sure never want to run the lossy encoder under any circumstance. Lossy WebP deserves all its bad reputation for basically being VP8's intra-frame coding applied to still images. VP8, 📝 if you remember, is that bad video codec from two generations ago that I'm only serving on this website due to sheer inertia. Applying its enforced YCbCr 4:2:0 chroma subsampling to images not only makes it utterly unsuitable for pixel art, but also even worse than well-compressed JPEG, which isn't limited to a single subsampling scheme. If anything in the GIAN07 process accidentally flips the "I want lossless" flag, I'd rather have the WebP encoder error out and the screenshot frontend fall back on BMP than save an image with mutilated colors.

But while JPEG XL is a lost cause as far as I'm concerned, I've grown to like lossless WebP too much to leave it trapped within the unfortunate organization of its codebase. Also, there seems to be a lot of untapped potential in the format – really, why does PNG get all the attention of people writing alternative encoders when lossless WebP is the demonstrably much more capable format?
So I've decided to fork libwebp and surgically remove all code related to the lossy encoder. The statically linked result now only takes up ~100 KiB in the Windows build while still being API- and ABI-compatible. Of course, Linux users will still use their distribution's libwebp package with the lossy encoder included, but let's hope that the aforementioned possibility of accidents stays purely theoretical.

Really though, why have people started to bundle lossless and lossy image codecs under the same format in the first place if their algorithms have nothing in common? It might make sense for Opus where SILK and CELT are different kinds of lossy, but lossless and lossy are two completely different paradigms. The bloat and usability confusion far outweigh any situational tricks this might offer.


Alright, we found a good format with configurable effort levels, and we're only missing a way for players to pick an effort level. Depending on how they want to use this rapid-fire screenshot feature, almost all of the options make sense in some context:

  1. You'd like to screenshot a whole section of a stage as fast as possible with the help of the disabled frame rate limiter, and you got plenty of free disk space? You probably want to stick with BMP and compress the screenshots outside of the game, just like how you would have done it without this feature.
  2. A slight slowdown is OK or maybe even welcome for providing additional feedback that you're actually taking screenshots? Pick one of WebP's higher effort values that certainly take longer than 16 ms to encode, but are still reasonably fast and won't turn the game into a <2-FPS slideshow.
  3. Want the lowest file size that your system can encode while staying at 62.5 FPS? Well, how fast is your system? And not just the CPU – maybe your system is actually bottlenecked by I/O and writing a large uncompressed BMP file takes much longer than encoding it into WebP and writing the resulting smaller file.

The latter two use cases would be covered by automatic detection of the maximum effort value that encodes within a given number of frames. The problem, however, is that encoding times are always relative to the complexity of the image. Once we're in-game and have lots of bullets and lasers, any choice that might have been appropriate for the main menu might suddenly start dropping frames after all. Thus, we can't solve this with an upfront benchmark, but have to dynamically adapt to the complexity of the current game scene. But then the whole idea falls apart as we can't possibly treat the configurable allowed screenshot time as a hard limit. To figure out whether it's safe to raise the effort level again, there's no way around periodically exceeding that limit and thus dropping more frames after all.
The ideal solution would involve deep hooks into the WebP encoder that could dynamically adjust the compression algorithms depending on the remaining time in the current frame. An image compressor with real-time guarantees… sure sounds like an interesting research project.
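To illustrate why a configurable screenshot time can't be a hard limit, here is a deliberately naive controller sketch; its names, the probing interval, and the linear cost model in the test are all hypothetical. It lowers the effort after every overrun and periodically probes one level higher, which is the only way it can learn whether the current scene allows more effort.

```c
#include <stdbool.h>

typedef struct {
	int effort;    // effort level for the next screenshot, 0-9
	int frames_ok; // consecutive screenshots that stayed within the budget
} effort_controller;

// Updates the effort level from the time the previous screenshot took.
// Returns true if that screenshot exceeded the budget (= dropped frames).
bool effort_update(effort_controller *c, double last_ms, double budget_ms, int probe_after) {
	if(last_ms > budget_ms) {
		if(c->effort > 0) {
			c->effort--;
		}
		c->frames_ok = 0;
		return true;
	}
	if((++c->frames_ok >= probe_after) && (c->effort < 9)) {
		c->effort++; // the only way to find out is to try, and possibly fail
		c->frames_ok = 0;
	}
	return false;
}
```

With a made-up cost of 4 ms plus 4 ms per effort level and a 16 ms budget, it settles at level 3, but every probe to level 4 overshoots the budget once, six times within 600 screenshots.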

In the end, letting players choose a fixed format and effort level remains the best option. However, they can only make an informed choice if they know the performance of all options relative to each other. And that's how we arrive at this new submenu:

These specific numbers I got on my now almost 7-year-old Intel Core i5 8400T are very peculiar. -z 0 gets quite close to the 16 ms we have per frame, but would still be too slow to reliably compress every gameplay situation without dropping frames. A 64-bit build would speed up -z 0 by 10%, -z 2 through -z 7 by 25%, -z 8 by 210% (!), and -z 9 by 60%. Linux users already enjoy these higher speeds, and the Windows build is just a few compiler settings away from matching them. 📝 Last time, the bitness argument was a lot more balanced, but WebP encoding performance presents the first compelling reason for going 64-bit.
Or we could always go multi-threaded, which already is a much more popular idea within the Seihou development Discord group.
Or I could investigate PNG after all to find out how exactly its encoding speed compares to WebP… :thonk:

But then, Ember2528 posted the encoding times he got on his new Ryzen 9 9950X3D:


Finally, you probably already noticed another small change in this build: The ReC98 push ID is now shown in the bottom-right corner of the title screen image, just below the original game version number. This was the one part of replay preparations that I wanted to get in sooner rather than later. Since the game binary and the data files can be updated or modded independently from each other, I'm going to tag future replays with both of their respective versions to guarantee reproducibility. Of course, newer builds should never introduce bugs that affect gameplay and desynchronize existing replays. But if they ever do, the included push ID allows hosting sites to remove any replays recorded on such a broken build from the official competition tier associated with a specific data file version.
As for rendering the push ID, it should obviously look similar to the VERSION 1.005 text above. We can find these glyphs in GRAPH.DAT file #0, but this particular text is actually baked into the main menu's background image, which explains why the decimal point glyph isn't part of that data file. The glyphs for 0-9 are also used in-game for the score popups, but the A-Z glyphs remain unused – so unused, in fact, that pbg didn't even leave any reference to them in the source code:

The unused 5×5 uppercase gradient font in GRAPH.DAT file #0

This means that the game provides us with all the glyphs we would need to display the ReC98 push ID. However:

So, all the glyphs next to the BUILD label actually come from the TrueType text renderer. The non-slashed zeroes immediately give this away, but exactly emulating the color gradient of the 0-9 glyphs makes MS Gothic blend in very well regardless:

Screenshot of the bottom-right corner of Shuusou Gyoku's title screen in the P0309 build, showing the new ReC98 build tag below the version number baked into the original title screen image

And that's all I've got for these very packed three pushes! In exchange, I'll reserve the next Shuusou Gyoku push for another round of maintenance and forward compatibility.
The new builds:

Next up: The long-awaited Windows 98 backport of our Shuusou Gyoku build! This has been in development for quite a while, so this should now be a matter of days rather than weeks.