Technical debt, part 5… and we only got TH05's stupidly optimized
.PI functions this time?
As far as actual progress is concerned, that is. In maintenance news
though, I was really hyped for the #include improvements I've
mentioned in 📝 the last post. The result: A
new x86real.h file, bundling all the declarations specific to
the 16-bit x86 Real Mode in a smaller file than Turbo C++'s own
DOS.H. After all, DOS is something else than the underlying
CPU. And while it didn't speed up build times quite as much as I had hoped,
it now clearly indicates the x86-specific parts of PC-98 Touhou code to
future port authors.
After another couple of improvements to parameter declaration in ASM land,
we get to TH05's .PI functions… and really, why did ZUN write all of
them in ASM? Why (re)declare all the necessary structures and data in
ASM land, when all these functions are merely one layer of abstraction
above master.lib, which does all the actual work?
I get that ZUN might have wanted masked blitting to be faster, which is
used for the fade-in effect seen during TH05's main menu animation and the
ending artwork. But, uh… he knew how to modify master.lib. In fact, he
did already modify the graph_pack_put_8() function
used for rendering a single .PI image row, to ignore master.lib's VRAM
clipping region. For this effect though, he first blits each row regularly
to the invisible 400th row of VRAM, and then does an EGC-accelerated
VRAM-to-VRAM blit of that row to its actual target position with the mask
enabled. It would have been way more efficient to add another version of
this function that takes a mask pattern. No amount of REP
MOVSW is going to change the fact that two VRAM writes per line are
slower than a single one. Not to mention that it doesn't justify writing
every other .PI function in ASM to go along with it…
This is where we also find the most hilarious aspect about this: For most
of ZUN's pointless micro-optimizations, you could have maybe made the
argument that they do save some CPU cycles here and there, and
therefore did something positive to the final, PC-98-exclusive result. But
some of the hand-written ASM here doesn't even constitute a
micro-optimization, because it's worse than what you would have got
out of even Turbo C++ 4.0J with its 80386 optimization flags!
At least it was possible to "decompile" 6 out of the 10 functions
here, making them easy to clean up for future modders and port authors.
Could have been 7 functions if I also decided to "decompile"
pi_free(), but all the C++ code is already surrounded by ASM,
resulting in 2 ASM translation units and 2 C++ translation units.
pi_free() would have needed a single translation unit by
itself, which wasn't worth it, given that I would have had to spell out
every single ASM instruction anyway.
There you go. What about this needed to be written in ASM?!?
The function calls between these small translation units even seemed to
glitch out TASM and the linker in the end, leading to one CALL
offset being weirdly shifted by 32 bytes. Usually, TLINK reports a fixup
overflow error when this happens, but this time it didn't, for some reason?
Mirroring the segment grouping in the affected translation unit did solve
the problem, and I already knew this, but only thought of it after spending
quite some RTFM time… during which I discovered the -lE
switch, which enables TLINK to use the expanded dictionaries in
Borland's .OBJ and .LIB files to speed up linking. That shaved off roughly
another second from the build time of the complete ReC98 repository. The
more you know… Binary blobs compiled with non-Borland tools would be the
only reason not to use this flag.
So, even more slowdown with this 5th dedicated push, since we've still only
repaid 41% of the technical debt in the SHARED segment so far.
Next up: Part 6, which hopefully manages to decompile the FM and SSG
channel animations in TH05's Music Room, and hopefully ends up being the
final one of the slow ones.
As expected, we've now got the TH04 and TH05 stage enemy structure,
finishing position independence for all big entity types. This one was
quite straightfoward, as the .STD scripting system is pretty simple.
Its most interesting aspect can be found in the way timing is handled. In
Windows Touhou, all .ECL script instructions come with a frame field that
defines when they are executed. In TH04's and TH05's .STD scripts, on the
other hand, it's up to each individual instruction to add a frame time
parameter, anywhere in its parameter list. This frame time defines for how
long this instruction should be repeatedly executed, before it manually
advances the instruction pointer to the next one. From what I've seen so
far, these instruction typically apply their effect on the first frame
they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy
structure only stores one single counter for the current loop iteration.
Just from the structure, the only innovation introduced by TH05 seems to
have been enemy subtypes. These can be used to parametrize scripts via
conditional jumps based on this value, as a first attempt at cutting down
the need to duplicate entire scripts for similar enemy behavior. And
thanks to TH05's favorable segment layout, this game's version of the
.STD enemy script interpreter is even immediately ready for decompilation,
in one single future push.
As far as I can tell, that now only leaves
.MPN file loading
player bomb animations
some structures specific to the Shinki and EX-Alice battles
plus some smaller things I've missed over the years
until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front
page. And with that many false positives, it's quite easy to get lost with
immediately reverse-engineering everything around them. This time, the
rendering of the text dissolve circles, used for the stage and BGM title
popups, caught my eye… and since the high-level code to handle all of
that was near the end of a segment in both TH04 and TH05, I just decided
to immediately decompile it all. Like, how hard could it possibly be?
Sure, it needed another segment split, which was a bit harder due
to all the existing ASM referencing code in that segment, but certainly
Oh wait, this code depends on 9 other sets of identifiers that haven't
been declared in C land before, some of which require vast reorganizations
to bring them up to current consistency standards. Whoops! Good thing that
this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(),
which marks the stage background tiles within a rectangular area to be
redrawn this frame. And this one must have had the hardest function
signature to figure out in all of PC-98 Touhou, because it actually
seems impossible. Looking at all the ways the game passes the center
coordinate to this function, we have
X and Y as 16-bit integer literals, merged into a single
PUSH of a 32-bit immediate
X and Y calculated and pushed independently from each other
by-value copies of entire Point instances
Any single declaration would only lead to at most two of the three cases
generating the original instructions. No way around separately declaring
the function in every translation unit then, with the correct parameter
list for the respective calls. That's how ZUN must have also written it.
Oh well, we would have needed to do all of this some time. At least
there were quite a bit of insights to be gained from the actual
decompilation, where using const references actually made it
possible to turn quite a number of potentially ugly macros into wholesome
But still, TH04 and TH05 will come out of ReC98's decompilation as one big
mess. A lot of further manual decompilation and refactoring, beyond the
limits of the original binary, would be needed to make these games
portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a
number of people see as the obvious choice for a first system to port
PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort
of decompiling all the remaining original ASM. But even with
master.lib's MASTER_DOSV setting, these games still very much
rely on PC-98 hardware, with corresponding assumptions all over ZUN's
code. You will need to provide abstractions for the PC-98's
superimposed text mode, the gaiji, and planar 4-bit color access in
general, exchanging the use of the PC-98's GRCG and EGC blitter chips with
something else. At that point, you might as well port the game to one
generic 640×400 framebuffer and away from the constraints of DOS,
resulting in that Doom source code-like situation which made that
game easily portable to every architecture to begin with. But ZUN just
wasn't a John Carmack, sorry.
Or what do I know. I've never programmed for IBM-compatible DOS, but maybe
ReC98's audience does include someone who is intimately familiar
with IBM-compatible DOS so that the constraints aren't much of an issue
for them? But even then, 16-bit Windows would make much more sense
as a first porting target if you don't want to bother with that
At least I won't have to look at TH04 and TH05 for quite a while now.
The delivery delays have made it obvious that
my life has become pretty busy again, probably until September. With a
total of 9 TH01 pushes from monthly subscriptions now waiting in the
backlog, the shop will stay closed until I've caught up with most of
these. Which I'm quite hyped for!