And so, the year unfortunately ended with yet another slow month. During the
MediaWiki upgrade, I was slowly decompiling the TH05 Sara fight on the side,
but stumbled over one interesting but high-maintenance detail there that
would really enhance her blog post. TH02 would need a lot of attention for
the basic rendering calls as well…
…so let's end the year with Shuusou Gyoku instead, looking at its most
critical issue in particular. As if that were the easy option here…
The game does not run properly on modern Windows systems due to its usage of
the ancient DirectDraw APIs, with issues ranging from unbearable slowdown to
glitched colors to the game not even starting at all. Thankfully, Shuusou
Gyoku is not the only ancient Windows game affected by these issues, and
people have developed a variety of generic DirectDraw wrappers and patches
for playing such games on modern systems. Out of all these, DDrawCompat is one of the
simpler solutions for Shuusou Gyoku in particular: Just drop its
ddraw proxy DLL into the game directory, and the game will run
as it's supposed to.
So let's just bundle that DLL with all my future Shuusou Gyoku releases
then? That would have been the quick and dirty option, coming with
Linux users might be annoyed by the potential need to configure a native
DLL override for ddraw.dll. It's not too much of an issue as we
could simply rename the DLL and replace the import with the new name.
However, doing that reproducibly would already involve changes to either the
DDrawCompat or Shuusou Gyoku build process.
Win32 API hooking is another potential point of failure in general,
requiring continual maintenance for new Windows versions. This is not even a
hypothetical concern: DDrawCompat does rely on particularly volatile Win32
API details, to the point that the recent Windows 11 22H2 update completely
broke it, causing a hang at startup that required a workaround.
But sure, it's still just a single third-party component. Keeping it up to
date doesn't sound too bad by itself…
…if DDrawCompat weren't evolving way beyond what we need to keep Shuusou
Gyoku running. Being a typical DirectDraw wrapper, it has always aimed to
solve all sorts of issues in old DirectDraw games. However, the latest
version, 0.4.0, has gone above and beyond in this regard, adding lots of
configuration options with default settings that actually
break Shuusou Gyoku.
To get a glimpse of how this is likely to play out, we only have to look at
the more mature DxWnd
project. In its expert mode, DxWnd features three rows of tabs, each packed
with checkboxes that toggle individual hacks, and most of these are
related to something that Shuusou Gyoku could be affected by. Imagine
checking a precise permutation of a three-digit number of checkboxes just to
keep an old game running at full speed on modern systems…
Finally, aesthetic and bloat considerations. If
📝 C++ fstreams were already too embarrassing
with the ~100 KB of bloat they add to the binary, a 565 KiB DLL is
even worse. And that's the old version 0.3.2 – version 0.4.0 comes in
at 2.43 MiB.
Fortunately, I had the budget to dig a bit deeper and figure out what
exactly DDrawCompat does to make Shuusou Gyoku work properly. Turns
out that among all the hooks and patches, the game only needs the most
central one: Enforcing a 32-bit display mode regardless of whatever lower
bit depth the game requests natively, combined with converting the game's
pixel buffer to 32-bit on the fly.
So does this mean that adding 32-bit to the game's list of supported bit
depths is everything we have to do?
Well, almost everything. Initially, this surprised me as well: With
all the if statements checking for precise bit depths, you
would think that supporting one more bit depth would be way harder in this
code base. As it turned out though, these conditional branches are not
really about 8-bit or 16-bit color for the most part, but instead
differentiate between two very distinct rendering approaches:
"8-bit" is a pure 2D mode with palettized colors,
while "16-bit" is a hybrid 2D/3D mode that uses Direct3D 2 on top of DirectDraw, with
3-channel RGB colors.
Consequently, most of these branches deal with differences between these two
approaches that couldn't be nicely abstracted away in pbg's renderer
interface: Specific palette changes that are exclusive to "8-bit" mode, or
certain entities and effects whose Direct3D draw calls in "16-bit" mode
require tailor-made approximations for the "8-bit" mode. Since our new
32-bit mode is equivalent to the 16-bit mode in all of these branches, I
only needed to replace the raw number comparisons with more meaningful
That only left a very small number of 2D raster effects that directly write
to or read from DirectDraw surface memory, and therefore do need to know the
bit size of each pixel. Thanks to std::variant and
std::visit(), adding 32-bit support becomes trivial here: By
rewriting the code in a generic manner that derives all offsets from the
template type, you only have to say hey,
I'd like to have 32-bit as well, and C++ will automatically
instantiate correct 32-bit variants of all bit depth-dependent code
There are only three features in the entire game that access pixel buffers
this way: a color key retrieval function, the lens ball animation on the
logo screen, and… the ending staff roll? Sure, the text sprites fade in and
out, but so does the picture next to it, using Direct3D alpha blending or
palette color ramping depending on the current rendering mode. Instead, the
only reason why these sprites directly access their pixel buffer is… an
unused and pretty wild spiral effect. 😮 It's still part of the code, and
only doesn't show up because the
parameters that control its timing were commented out before release:
Alright, 32-bit mode complete, let's set it as the default if possible… and
break compatibility to the original 秋霜CFG.DAT format in the
process? When validating this file, the original game only allows the
originally supported 8-bit or 16-bit modes. Setting the
BitDepth field to any other value causes the entire file
to be reset to its defaults, re-locking the Extra Stage in the process.
Introducing a backward-compatible version
system for 秋霜CFG.DAT was beyond the scope of this push.
Changing the validation to a per-field approach was a good small first step
to take though. The new build no longer validates the BitDepth
field against a fixed list, but against the actually supported bit depths on
your system, picking a different supported one if necessary. With the
original approach, this would have caused your entire configuration to fail
the validation check. Instead, you can now safely update to the new build
without losing your option settings, or your previously unlocked access to
the Extra Stage.
Side note: The validation limit for starting bombs is off by one, and the
one for starting lives check is off by two. By modifying
秋霜CFG.DAT, you could theoretically get new games to start with
7 lives and 3 bombs… if you then calculate a correct checksum for your
hacked config file, that is. 🧑💻
Interestingly, DirectDraw doesn't even indicate support for 8-bit or 16-bit
color on systems that are affected by the initially mentioned issues.
Therefore, these issues are not the fault of DirectDraw, but of
Shuusou Gyoku, as the original release requested a bit depth that it has
even verified to be unsupported. Unfortunately, Windows sides with
Sim City Shuusou Gyoku here: If you previously experimented with the
Windows app compatibility settings, you might have ended up with the
DWM8And16BitMitigation flag assigned to the full file path of
your Shuusou Gyoku executable in either
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers, or
As the term mitigation suggests, these modes are (poorly) emulated,
which is exactly what causes the issues with this game in the first place.
Sure, this might be the lesser evil from the point of view of an operating
system: If you don't have the budget for a full-blown DDrawCompat-style
DirectDraw wrapper, you might consider it better for users to have the game
run poorly than have it fail at startup due to incorrect API usage.
Controlling this with a flag that sticks around for future runs of a binary
is definitely suboptimal though, especially given how hard it
is to programmatically remove this flag within the binary itself. It
only adds additional complexity to the ideal clean upgrade path.
So, make sure to check your registry and manually remove these flags for the
time being. Without them, the new Config → Graphic menu will
correctly prevent you from selecting anything else but 32-bit on modern
After all that, there was just enough time left in this push to implement
basic locale independence, as requested by the Seihou development
Discord group, without looking into automatic fixes for previous mojibake
filenames yet. Combining std::filesystem::path with the native
Win32 API should be straightforward and bloat-free, especially with all the
abstractions I've been building, right?
Well, turns out that std::filesystem::path does not
actually meet my expectations. At least as long as it's not
constexpr-enabled, because you still get the unfortunate
conversion from narrow to wide encoding at runtime, even for globals with
static storage duration. That brings us back to writing our path abstraction
in terms of the regular std::string and
std::wstring containers, which at least allow us to enforce the
respective encoding at compile time. Even std::string_view only
adds to the complexity here, as its strings are never inherently
null-terminated, which is required by both the POSIX and Win32 APIs. Not to
mention dynamic filenames: C++20's std::format() would be the
obvious idiomatic choice here, but using it almost doubles the size
of the compiled binary… 🤮
In the end, the most bloat-free way of implementing C++ file I/O in 2023 is
still the same as it was 30 years ago: Call system APIs, roll a custom
abstraction that conditionally uses the L prefix, and pass
around raw pointers. And if you need a dynamic filename, just write the
dynamic characters into arrays at fixed positions. Just as PC-98 Touhou used
Oh, and the game's window also uses a Unicode title bar now.
And that's it for this push! Make sure to rename your configuration
(秋霜CFG.DAT), score (秋霜SC.DAT), and replay
(秋霜りぷ*.DAT) filenames if you were previously running the
game on a non-Japanese locale, and then grab the new build:
Next up: Starting the new year with all my plans hopefully working out for
once. TH05 Sara very soon, ZMBV code review afterward, low-hanging fruit of
the TH01 Anniversary Edition after that, and then kicking off TH02 with a
bunch of low-level blitting code.
Thanks to handlerug for
implementing and PR'ing the feature in a very clean way. That makes at least
two people I know who wanted to see feed support, so there are probably
a few more out there.
So, Shuusou Gyoku. pbg released the original source code for the first two
Seihou games back in February 2019, but notably removed the crucial
decompression code for the original packfiles due to… various unspecified
reasons, considerations, and implications. This vague
language and subsequent rejection of a pull request
to add these features back in were probably the main reasons why no one
has publicly done anything with this codebase since.
The only other fork I know about is Priw8's private fork from 2020, but only
informed me about it shortly after this push was funded. Both of them
might also contribute some features to my fork in the future if their time
In this fork, Priw8 replaced packfile decompression with raw reads from
directories with the pre-extracted contents of all the .DAT files. This
works for playing the game, but there are actually two more things that
require the original packfile code:
High scores are stored as a bitstream with every variable separated by
an alternating 0 or 1 bit, using the same bit-level access functions as the
packfile reader. That's a quite… unique form of obfuscation: It requires way
too much code to read and write the format, and doesn't even obfuscate the
data that well because you can still see clear patterns when opening
these scorefiles in a hex editor.
Replays are 2-"file" archives compressed using the same algorithm as the
packfile. The first "file" contains metadata like the shot type, stage, and
RNG seed, and the second one contains the input state for every frame.
We can surely implement our own simple and uncompressed formats for these
things, but it's not the idea to build all future Shuusou Gyoku features on
top of a replay-incompatible fork. So, what do we do? On the one hand, pbg
expressed the clear wish to not include data reverse-engineered from the
original binary. On the other hand, he released the code under the MIT
license, which allows us to modify the code and distribute the results in
any way we wish.
So, let's meet in the middle, and go for a clean-room implementation of the
missing features as indicated by their usage, without looking at either the
original binary or wangqr's reverse-engineered code.
With incremental rebuilds being broken in the latest Visual Studio project
files as well, it made sense to start from scratch on pbg's last commit. Of
course, I can't pass up a chance to use
📝 Tup, my favorite build system for every
project I'm the main developer of. It might not fit Shuusou Gyoku as well as
it fits ReC98, but let's see whether it would be reasonable at all…
… and it's actually not too bad! Modern Visual Studio makes this a bit
harder than it should be with all the intermediate build artifacts you have
to keep track of. In the end though, it's still only 70
lines of Lua to have a nice abstraction for both Debug and Release
builds. With this layer underneath, the actual
Shuusou Gyoku-specific part can be expressed as succinctly as in any
other modern build system, while still making every compiler flag explicit.
It might be slightly slower than a traditional .vcxproj build
due to launching
one cl.exe process per translation unit, but the result is
way more reliable and trustworthy compared to anything that involves Visual
Studio project files. This simplicity paves the way for expanding the build
process to multiple steps, and doing all the static checking on translation
strings that I never got to do for thcrap-based patches. Heck, I might even
compile all future translations directly into the binary…
Every C++ build system will invariably be hated by someone, so I'd
say that your goal should always be to simplify the actually important parts
of your build enough to allow everyone else to easily adapt it to their
favorite system. This Tupfile definitely does a better job there than your
average .vcxproj file – but if you still want such a thing (or,
gasp, 🤮 CMake project files 🤮) for better Visual Studio IDE
integration, you should have no problem generating them for yourself.
There might still be a point in doing that because that's the one part that
unfortunately sucks about this approach. Visual Studio is horribly broken
for any nonstandard C++ project even in 2022:
Makefile projects can be nicely integrated with Debug and Release
configurations, but setting a later C++ language standard requires dumb
.vcxproj hacks that don't even work properly anymore.
Folder projects are ridiculously ugly: The Build toolbar is permanently
grayed out even if you configured a build task. For some reason,
configuring these tasks merely adds one additional element to a 9-element
context menu in the Solution Explorer. Also, why does the big IDE use a
different JSON schema than the perfectly functional and adequate one from
Visual Studio Code?
In both cases, IntelliSense doesn't work properly at all even if it
appears to be configured correctly, and Tup's dependency tracking appeared
to be weirdly cut off for the very final .PDB file. Interestingly though,
using the big Visual Studio IDE for just debugging a binary via
devenv bin/GIAN07.exe suddenly eliminates all the IntelliSense
issues. Looks like there's a lot of essential information stored in the .PDB
files that Visual Studio just refuses to read in any other context.
But now compare that to Visual Studio Code: Open it from the x64_x86
Cross Tools Command Prompt via code ., launch a build or
debug task, or browse the code with perfect IntelliSense. Three small
configuration files and everything just works – heck, you even get the Tup
progress bar in the terminal. It might be Electron bloatware and horribly
slow at times, but Visual Studio Code has long outperformed regular Visual
Studio in terms of non-debug functionality.
On to the compression algorithm then… and it's just textbook LZSS,
with 13 bits for the offset of a back-reference and 4 bits for its length?
Hardly a trade secret there. The hard parts there all come from unexpected
inefficiencies in the bitstream format:
Encoding back-references as offsets into an 8 KiB ring buffer dictionary
means that the most straightforward implementation actually needs an 8 KiB
array for the LZSS sliding window. This could have easily been done with
zero additional memory if the offset was encoded as the difference to the
current byte instead.
The packfile format stores the uncompressed size of every file in its
header, which is a good thing because you want to know in advance how much
heap memory to allocate for a specific file. Nevertheless, the original game
only stops reading bits from the packfile once it encountered a
back-reference with an offset of 0. This means that the compressor not only
has to write this technically unneeded back-reference to the end of the
compressed bitstream, but also ignore any potential other longest
back-reference with an offset of 0 within the file. The latter can
easily happen with a ring buffer dictionary.
The original game used a single BIT_DEVICE class with mode
flags for every combination of reading and writing memory buffers and
on-disk files. Since that would have necessitated a lot of error checking
for all (pseudo-)methods of this class, I wrote one dedicated small class
for each one of these permutations instead. To further emphasize the
clean-room property of this code, these use modern C++ memory ownership
features: std::unique_ptr for the fixed-size read-only buffers
we get from packfiles, std::vector for the newly compressed
buffers where we don't know the size in advance, and std::span
for a borrowed reference to an immutable region of memory that we want to
treat as a bitstream. Definitely better than using the native Win32
LocalAlloc() and LocalFree() allocator, especially
if we want to port the game away from Windows one day.
One feature I didn't use though: C++ fstreams, because those are trash.
These days, they would seem to be the natural
choice with the new std::filesystem::path type from C++17:
Correctly constructed, you can pass that type to an fstream constructor and
gain both locale independence on Windows and portability to
everything else, without writing any Windows-specific UTF-16 code. But even
in a Release build, fstreams add ~100 KB of locale-related bloat to the .EXE
which adds no value for just reading binary files. That's just too
embarrassing if you look at how much space the rest of the game takes up.
Writing your own platform layer that calls the Win32
CreateFileW(), ReadFile(), and
WriteFile() API functions is apparently still the way to go
even in 2022. And with std::filesystem::path still being a
welcome addition to C++, it's not too much code to write either.
This gets us file format compatibility with the original release… and a
crash as soon as the ending starts, but only in Release mode? As it turns
out, this crash is caused by an
access bug that was present even in the original game, and only turned
into a crash now because the optimizer in modern Visual Studio versions
reorders static data. As a result, the 6-element pFontInfo
array got placed in front of an ECL-related counter variable that then got
corrupted by the write to the 7th element, which subsequently
crashed the game with a read access to previously deallocated danmaku script
data. That just goes to show that these technical bugs are important
and worth fixing even if they don't cause issues in the original game. Who
knows how many of these will turn into crashes once we get to porting PC-98
So here we go, a new build of Shuusou Gyoku, compiled with Visual Studio
2022, and compatible with all original data formats:
Inside the regular Shuusou Gyoku installation directory, this binary works
as a full-fledged drop-in replacement for the original
秋霜玉.exe. It still has all of the original binary's problems
Separate Japanese locale emulation is still needed to correctly refer to
the original names of the configuration (秋霜CFG.DAT), score
(秋霜SC.DAT), and replay (秋霜りぷ*.DAT) files.
It's also required for the ending text to not render as mojibake.
Running the game at full speed and without graphical glitches on modern
Windows still requires a separate DirectDraw patch such as DDrawCompat. To
eliminate any remaining flickering, configure the game to use 16-bit
graphics in the Config → Graphic menu.
As well as some of its own:
The original screenshot feature is still missing, as it also wasn't part
of pbg's released source code.
So all in all, it's a strict downgrade at this point in time.
And more of a symbol that we can now start
doing actual work on this game. Seihou has been a fun change of pace, and I
hope that I get to do more work on the series. There is quite a lot to be
done with Shuusou Gyoku alone, and the 21 GitHub issues I've opened
are probably only scratching the surface.
However, all the required research for this one consumed more like 1⅔
pushes. Despite just one push being funded, it wouldn't have made sense to
release the commits or this binary in any earlier state. To repay this debt,
I'm going to put the next for Seihou towards the
small code maintenance and performance tasks that I usually do for free,
before doing any more feature and bugfix work. Next up: Improving video
playback on the blog, and maybe delivering some microtransaction work on the