ReC98 would benefit greatly from a build server – both in order to
immediately spot issues like this one, and as a service for modders.
Even more so than the usual open-source project of its size, I would say.
But that might be exactly
because it doesn't seem like something you can trivially outsource
to one of the big CI providers for open-source projects and quickly set
up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular
64-bit Windows 10 and DOSBox running the exact build tools from the DevKit.
Ideally, though, such a server should really run the optimal configuration
of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step
to run natively, which already is something that no popular CI service out
there offers. Then, we'd optimally expand to Linux, every other Windows
version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd
be a lot. An experimental project all on its own, with additional hosting
costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest
there is once the store reopens (which will be at the beginning of May, at
the latest). That aside, it would also be
a great project for outside contributors!
So, technical debt, part 8… and right away, we're faced with TH03's
low-level input function, which
once again insists on being word-aligned in a way we
can't fake without duplicating translation units.
Being undecompilable isn't exactly the best property for a function that
has been interesting to modders in the past: In 2018,
spaztron64 created an
ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human
multiplayer mode: 2021-04-04-TH03-WASD-2player.zip
However, this remapping attempt remained quite limited, since we hadn't
(and still haven't) reached full position independence for TH03 yet.
There's quite some potential for size optimizations in this function, which
would allow more BIOS key groups to already be used right now, but it's not
all that obvious to modders who aren't intimately familiar with x86 ASM.
Therefore, I really wouldn't want to keep such a long and important
function in ASM if we don't absolutely have to…
… and apparently, that's all the motivation I needed? So I took the risk,
and spent the first half of this push on reverse-engineering
TCC.EXE, to hopefully find a way to get word-aligned code
segments out of Turbo C++ after all.
And there is! The -WX option, used for creating
applications, messes up all sorts of code generation aspects in weird
ways, but does in fact mark the code segment as word-aligned. We can
consider ourselves quite lucky that we get to use Turbo C++ 4.0, because
this feature isn't available in any previous version of Borland's C++
compilers.
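For readers less familiar with segment alignment, the following is a minimal sketch of the arithmetic involved, written in modern C++ rather than the Turbo C++ dialect used by ReC98. It only shows how many padding bytes a given alignment requires at a given offset; it does not model what TCC.EXE or TLINK actually do internally, and the function names are made up for this example.

```cpp
#include <cstddef>

// Number of padding bytes needed so that `offset` becomes a multiple of
// `alignment`. An alignment of 1 means byte alignment (never any padding),
// 2 means word alignment, and 16 means paragraph alignment.
constexpr std::size_t padding_for(std::size_t offset, std::size_t alignment)
{
    return (alignment - (offset % alignment)) % alignment;
}

// First offset at or after `offset` that satisfies `alignment`.
constexpr std::size_t aligned_offset(std::size_t offset, std::size_t alignment)
{
    return offset + padding_for(offset, alignment);
}
```

A word-aligned segment that would otherwise start at an odd offset therefore gets exactly one padding byte in front of it, while a byte-aligned segment at the same offset gets none. That single byte is the difference that has to match the original binary.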
That allowed us to restore all the decompilations I previously threw away…
well, two of the three, that lookup table generator was too much of a mess
in C. But what an abuse this is. The
subtly different code generation has basically required one creative
workaround per usage of -WX. For example, enabling that option
causes the regular PUSH BP and POP BP prolog and
epilog instructions to be wrapped with INC BP and
DEC BP, for some reason:
inc bp ; ???
push bp
mov bp, sp
; [… function code …]
pop bp
dec bp ; ???
Luckily again, all the functions that currently require -WX
don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty
close: snd_se_reset(void) is one of the functions that require
word alignment. Previously, it shared a translation unit with the
immediately following snd_se_play(int new_se), which does take
a parameter, and therefore would have had its prolog and epilog code messed
up by -WX.
Since the latter function has a consistent (and thus, fakeable) alignment,
I simply split that code segment into two, with a new -WX
translation unit for just snd_se_reset(void). Problem solved –
after all, two C++ translation units are still better than one ASM
translation unit. Especially with all the
previous #include improvements.
The rest was more of the usual, getting us 74% done with repaying the
technical debt in the SHARED segment. A lot of the remaining
26% is TH04 needing to catch up with TH03 and TH05, which takes
comparatively little time. With some good luck, we might get this
done within the next push… that is, if we aren't confronted with all too
many more disgusting decompilations, like the two functions that ended this
push. If we are, we might need 10 pushes to complete this after all, but
that piece of research was definitely worth the delay. Next up: One more of
Alright, tooling and technical debt. Shouldn't be really much to talk
about… oh, wait, this is still ReC98
For the tooling part, I finished up the remaining ergonomics and error
handling for the
sprite converter that Jonathan Campbell contributed two months ago.
While I familiarized myself with the tool, I actually ran into some
unreported errors myself, so this was sort of important to me. Still got
no command-line help in there, but the error messages can now do that job
probably even better, since we would have had to write them anyway.
So, what's up with the technical debt then? Well, by now we've accumulated
quite a number of ASM code slices that
need to be either decompiled or clearly marked as undecompilable. Since we
define those slices as "already reverse-engineered", that decision won't
affect the numbers on the front page at all. But for a complete
decompilation, we'd still have to do this someday. So, rather than
incorporating this work into pushes that were purchased with the
expectation of measurable progress in a certain area, let's take the
"anything goes" pushes, and dedicate those entirely to this work.
The second code segment seemed like the best place to start with this,
since it affects the largest number of games simultaneously. Starting with
TH02, this segment contains a set of random "core" functions needed by the
binary. Image formats, sounds, input, math, it's all there in some
capacity. You could maybe call it all "libzun" or something like
that? But for the time being, I simply went with the obvious name,
seg2. Maybe I'll come up with something more convincing in
Oh, but wait, why were we assembling all the previous undecompilable ASM
translation units in the 16-bit build part? By moving those to the 32-bit
part, we don't even need a 16-bit TASM in our list of dependencies, as
long as our build process is not fully 16-bit.
And with that, ReC98 now also builds on Windows 95, and thus, every 32-bit
Windows version. 🎉 Which is certainly the most user-visible improvement
in all of these two pushes.
Back in 2015, I already decompiled all of TH02's seg2
functions. As suggested by the Borland compiler, I tried to follow a "one
translation unit per segment" layout, bundling the binary-specific
contents via #include. In the end, it required two
translation units – and that was even after manually inserting the
original padding bytes via #pragma codestring… yuck. But it
worked, compiled, and kept the linker's job (and, by extension,
segmentation worries) to a minimum. And as long as it all matched the
original binaries, it still counted as a valid reconstruction of ZUN's
However, that idea ultimately falls apart once TH03 starts mixing
undecompilable ASM code in between C functions. Now, we officially have no
choice but to use multiple C and ASM translation units, with maybe only
just one or two #includes in them…
…or we finally start reconstructing the actual seg2 library,
turning every sequence of related functions into its own translation unit.
This way, we can simply reuse the once-compiled .OBJ files for all the
binaries those functions appear in, without requiring that additional
layer of translation units mirroring the original segmentation.
The best example for this is the
almost undecompilable function that generates a lookup table for
horizontally flipping 8 1bpp pixels. It's part of every binary since
TH03, but only used in that game. With the previous approach, we would
have had to add 9 C translation units, which would all have just
#included that one file. Now, we simply put the .OBJ file
into the correct place on the linker command line, as soon as we can.
💡 And suddenly, the linker just inserts the correct padding bytes itself.
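As a rough illustration of what such a lookup table contains, here is a minimal sketch in modern C++, not in the Turbo C++ dialect ReC98 has to target. It is neither the original function nor its decompilation; the names `flip_8_pixels`, `FlipTable`, and `FLIP_TABLE` are made up for this example.

```cpp
#include <cstdint>

// Reverses the bit order of one byte, which is equivalent to horizontally
// flipping a row of 8 pixels stored at 1 bit per pixel.
constexpr std::uint8_t flip_8_pixels(std::uint8_t pixels)
{
    std::uint8_t flipped = 0;
    for (int bit = 0; bit < 8; bit++) {
        // Move the current lowest source bit into the result, which shifts
        // earlier bits towards the top.
        flipped = static_cast<std::uint8_t>((flipped << 1) | (pixels & 1));
        pixels = static_cast<std::uint8_t>(pixels >> 1);
    }
    return flipped;
}

struct FlipTable {
    std::uint8_t entries[256];
};

// Precomputes the flipped version of every possible byte, so that flipping
// at runtime becomes a single array read.
constexpr FlipTable make_flip_table()
{
    FlipTable table{};
    for (int i = 0; i < 256; i++) {
        table.entries[i] = flip_8_pixels(static_cast<std::uint8_t>(i));
    }
    return table;
}

constexpr FlipTable FLIP_TABLE = make_flip_table();
```

With the table in place, flipping a byte of pixels is just `FLIP_TABLE.entries[pixels]`. The sketch generates the table at compile time, whereas the function described above generates it at runtime.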
The most immediate gains there also happened to come from TH03. Which is
also where we did get some tiny RE% and PI% gains out of this after
all, by reverse-engineering some of its sprite blitting setup code. Sure,
I should have done even more RE here, to also cover those 5 functions at
the end of code segment #2 in TH03's MAIN.EXE that were in
front of a number of library functions I already covered in this push. But
let's leave that to an actual RE push 😛
All in all though, I was just getting started with this; the real
gains in terms of removed ASM files are still to come. But in the
meantime, the funding situation has become even better in terms of
allowing me to focus on things nobody asked for. 🙂 So here's a slightly
better idea: Instead of spending two more pushes on this, let's shoot for
TH05 MAINE.EXE position independence next. If I manage to get
it done, we'll have a 100% position-independent TH05 by the time
-Tom- finishes his MAIN.EXE PI demo, rather
than the 94% we'd get from just MAIN.EXE. That's bound to
make a much better impression on all the people who will then
(re-)discover the project.
(tl;dr: ReC98 has switched to Tup for
the 32-bit build. You probably want to get
💾 this build of Tup, and put it somewhere in your
PATH. It's optional, and always will be, but highly
recommended.)
P0001! Reserved for the delivery of the very first financial contribution
I've ever received for ReC98, back in January 2018. GhostPhanom
requested the exact opposite of immediate results, which motivated me to
go on quite a passionate quest for the perfect ReC98 build system. A quest
that went way beyond the crowdfunding…
Makefiles are a decent idea in theory: Specify the targets to generate,
the source files these targets depend on and are generated from, and the
rules to do the generating, with some helpful shorthand syntax. Then, you
have a build dependency graph, and your make tool of choice
can provide minimal rebuilds of only the targets whose sources changed
since the last make call. But, uh… wait, this is C/C++ we're
talking about, and doesn't pretty much every source file come with a
second set of dependent source files, namely, every single
#include in the source file itself? Do we really
have to duplicate all these inside the Makefile, and keep it in sync with the source file? 🙄
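To make the problem concrete, here is a small sketch of the staleness check a timestamp-based build tool performs, written in modern C++. The file names, timestamps, and the `needs_rebuild` helper are all hypothetical and do not come from ReC98's actual Makefile; the point is only that the check is as good as the dependency list it is given.

```cpp
#include <cstddef>

struct Dependency {
    const char *path;
    long modified; // modification time, in any monotonically increasing unit
};

// A target is stale if any of its known dependencies is newer than it.
template <std::size_t N>
constexpr bool needs_rebuild(long target_modified, const Dependency (&deps)[N])
{
    for (std::size_t i = 0; i < N; i++) {
        if (deps[i].modified > target_modified) {
            return true;
        }
    }
    return false;
}

// Hypothetical example: the object file was built at time 200, and a header
// that the source file #includes was edited afterwards, at time 250.
constexpr long OBJ_MODIFIED = 200;

// What a Makefile rule knows if the #include was never duplicated into it.
constexpr Dependency LISTED_IN_MAKEFILE[] = {
    { "input.cpp", 100 },
};

// What the compiler actually reads when building the object file.
constexpr Dependency ACTUALLY_READ[] = {
    { "input.cpp", 100 },
    { "input.hpp", 250 },
};
```

Given only `LISTED_IN_MAKEFILE`, the check reports that nothing needs to be rebuilt, even though the header changed. Only the full list in `ACTUALLY_READ` gives the right answer, and keeping that list in sync by hand is exactly the duplication complained about above.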
This fact alone means that Makefiles are inherently unsuited for
any language with an #include feature… that is, pretty
much every language out there. Not to mention other aspects like changes
to the compilation command lines, or the build rules themselves, all of
which require metadata of the previous build to be persistently stored in
some way. I have no idea why such a trash technology is even touted as a
viable build tool for code.
So, I decided to just
write my own build system, tailor-made for the needs of ReC98's 16-bit
build process, and combining a number of experimental ideas. Which is
still not quite bug-free and ready for public use, given that the
entire past year has kept me busy with actual tangible RE and PI progress.
What did finally become ready, however, is the improvement for the
32-bit build part, and that's what we've got here.
💭 Now, if only there was a build system that would perfectly track
dependencies of any compiler it calls, by injecting code and
hooking file opening syscalls. It'd be completely unrealistic for it to
also run on DOS (and we probably don't want to traverse a graph database
in a cycle-limited DOSBox), but it would be perfect for our 32-bit build
part, as long as that one still exists.
Sure, it might seem really minor to worry about not unconditionally
rebuilding all 32-bit .asm files, which just takes a couple
of seconds anyway. But minimal rebuilds in the 32-bit part also provide
the foundation for minimal rebuilds in the 16-bit part – and those
TLINK invocations do take quite some time after all.
Using Tup for ReC98 was an idea that dated back to January 2017. Back
then, I already opened
the pull request with a fix to allow Tup to work together with 32-bit
TASM. As much as I love Tup though, the fact that it only worked on
64-bit Windows ≥Vista would have meant that we had to exchange perfect
dependency tracking for the ability to build on 32-bit and older Windows
versions at all. For a project that relies on DOS compilers, this
would have been exactly the wrong trade-off to make.
What's worse though: TLINK fails to run on modern 32-bit
Windows with Loader error (0000) : Unrecognized Error.
Therefore, the set of systems that Tup runs on, and the set of systems
that can actually compile ReC98's 16-bit build part natively, would have
been exactly disjoint, with no OS getting to use both at the same time.
So I've kept using Tup for only my own development, but indefinitely
shelved the idea of making it the official build system, due to those
drawbacks. Recently though, it all came together:
The tup generate sub-command can generate a
.bat file that does a full dumb rebuild of everything, which
can serve as a fallback option for systems that can't run Tup. All we have
to do is to commit that .bat file to the ReC98 Git repository
as well, and tell build32b.bat to fall back on that if Tup
can't be run. That alone would have given us the benefits of Tup without
being worse than the current dumb build process.
In the meantime, other contributors improved Tup's own build process to
the point where 32-bit builds were simple enough to accomplish from the
comfort of a WSL terminal.
Two commits of mine
later, and 32-bit Windows Tup was fully functional. Another one later,
and 32-bit Windows Tup even gained one potential advantage over its 64-bit
counterpart. Since it only has to support DLL injection into 32-bit
programs, it doesn't need a separate 32-bit binary for retrieving function
pointers to the 32-bit version of Windows' DLL loading syscalls. Weirdly
enough, Windows Defender on current Windows 10 falsely flags that binary as
malware, despite it doing nothing but printing those pointer values to
I've also added it to the DevKit, for any newcomers to ReC98.
After the switch to Tup and the fallback option, I extensively tested
building ReC98 on all operating systems I had lying around. And holy cow,
so much in that build was broken beyond belief. In the end, the solution
involved just fully rebuilding the entire 16-bit part by default.
Which, of course, nullifies any of the
advantages we might have gotten from a Makefile in the first place, due to
just how unreliable they are. If you had problems building ReC98 in the
past, try again now!
And sure, it would certainly be possible to also get Tup working on
Windows ≤XP, or 9x even. But I leave that to all those tinkerers out there
who are actually motivated to keep those OSes alive. My work here is
done – we now have a build process that is optimal on 32-bit
Windows ≥Vista, and still functional and reliable on 64-bit
Windows, Linux, and everything down to Windows 98 SE, and therefore also
real PC-98 hardware. Pretty good, I'd say.
(If it weren't for that weird crash of the 16-bit TASM.EXE in
that Windows 95 command prompt I've tried it in, it would also work on
that OS. Probably just a misconfiguration on my part?)
Now, it might look like a waste of time to improve a 32-bit build part
that won't even exist anymore once this project is done. However, a fully
16-bit DOS build will only make sense after
master.lib has been turned into a proper library, linked in by
TLINK rather than #included in the big .ASM files.
This affects all games. If master.lib's data was consistently placed at
the beginning or end of each data segment, this would be no big deal, but
it's placed somewhere else in every binary.
So, this will only make sense sometime around 90% overall PI, and maybe
~50% RE in each game. Which is not the same as 50% overall –
especially since it includes TH02, the objectively worst Touhou game,
which hasn't received any dedicated funding ever.
Then, it will probably still require a couple of dedicated pushes to
move all the remaining data to C land.
Oh, and my 16-bit build system project also needs to be done before,
because, again, Makefiles are trash and we shouldn't rely on them even
And who knows whether this project will get funded for that long. So yeah,
the 32-bit build part will stay with us for quite some more time, and for
all upcoming PI milestones. And with the current build process, it's
pretty much the most minor among all the minor issues I can think of.
Let's all enjoy the performance of a 32-bit build while we can 🙂
Next up: Paying some technical debt while keeping the RE% and PI% in place.
TH01 pellets are coming up next, and for the first time, we'll have the
chance to move hardcoded sprite data from ASM land to C land. As it would
turn out, bad luck with the 2-byte alignment at the end of
REIIDEN.EXE's data segment pretty much forces us to declare
TH01's pellet sprites in C if we want to decompile the final few pellet
functions without ugly workarounds for the float literals there. And while
I could have just converted them into a C array and called it a day, it
did raise the question of when we are going to do this The Right And
Moddable Way, by auto-converting actual image files into ASM or C arrays
during the build process. These arrays are even more annoying to edit in
C, after all – unlike TASM, the old C++ we have to work with doesn't
support binary number literals, only hexadecimal or, gasp, octal.
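The following sketch shows the kind of conversion involved, in modern C++. It is not the converter that ended up in ReC98: it reads pixels from string literals instead of image files, the sprite and all names (`pack_row`, `EXAMPLE_SPRITE`, `print_as_c_array`) are made up, and it assumes that the leftmost pixel of each row is stored in the most significant bit.

```cpp
#include <cstdint>
#include <cstdio>

// Packs a row of 8 pixels into one 1bpp byte. '#' is a set pixel, any other
// character is a clear one. Assumption: leftmost pixel = most significant bit.
constexpr std::uint8_t pack_row(const char (&row)[9])
{
    std::uint8_t packed = 0;
    for (int x = 0; x < 8; x++) {
        packed = static_cast<std::uint8_t>(
            (packed << 1) | (row[x] == '#' ? 1 : 0)
        );
    }
    return packed;
}

// Hypothetical 8×8 sprite, written in a form that is easy to edit by hand.
constexpr std::uint8_t EXAMPLE_SPRITE[8] = {
    pack_row("..####.."),
    pack_row(".######."),
    pack_row("########"),
    pack_row("########"),
    pack_row("########"),
    pack_row("########"),
    pack_row(".######."),
    pack_row("..####.."),
};

// Prints the rows as a C array of hexadecimal literals, which is the form
// an old compiler without binary literals can consume.
void print_as_c_array(const char *name, const std::uint8_t *rows, int count)
{
    std::printf("const unsigned char %s[%d] = {\n", name, count);
    for (int i = 0; i < count; i++) {
        std::printf("\t0x%02X,\n", static_cast<unsigned>(rows[i]));
    }
    std::printf("};\n");
}
```

Calling `print_as_c_array("example_sprite", EXAMPLE_SPRITE, 8)` writes the array definition to standard output. A converter run during the build would do the same with pixels read from an image file, so that nobody has to translate hexadecimal digits back into pixels by hand.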
Without the explicit funding for such a converter,
I reached out to
GitHub, asking backers and outside contributors whether they'd be in
favor of it. As something that requires no RE skills and collides with
nothing else, it would be a perfect task for C/C++ coders who want to
support ReC98 with something other than money.
And surprisingly, those still exist!
Jonathan Campbell
went ahead and implemented all the required functionality, within just a
few days. Thanks again! The result is probably a lot more portable than it
would have been if I had written it. Which is pretty relevant for future
port authors – any additional tooling we write ourselves should not
add to the list of problems they'll have to worry about.
Right now, all of the sprites are #included from the big ASM
dump files, which means that they have to be converted before those files
are assembled during the 32-bit build part. We could have introduced a
third distinct build step there, perhaps even a 16-bit one so that we can
use Turbo C++ 4.0J to also compile the converter… However, the more
reasonable option was to do this at the beginning of the 32-bit build
step, and add a 32-bit Windows C++ compiler to the list of tools required
for ReC98's build process.
And the best choice for ReC98 is, in fact… 🥁… the 20-year-old Borland C++
5.5 freeware release.
See the README for a lengthy justification, as well as
So yes, all sprites mentioned in the GitHub issue can now be modded by
simply editing .BMP files, using an image editor of your choice. 🖌
And now that that's dealt with, it's finally time for more actual
progress! TH01 pellets coming tomorrow.