Whoops, the build was broken again? Since
P0127 from
mid-November 2020, on TASM32 version 5.3, which also happens to be the
one in the DevKit… That version changed the alignment for the default
segments of certain memory models when requesting .386
support. And since redefining segment alignment apparently is highly
illegal and absolutely has to be a build error, some of the stand-alone
.ASM translation units didn't assemble anymore on this version. I've only
spotted this on my own because I casually compiled ReC98 somewhere else –
on my development system, I happened to have TASM32 version 5.0 in the
PATH during all this time.
At least this was a good occasion to
get rid of some
weird segment alignment workarounds from 2015, and replace them with the
superior convention of using the USE16 modifier for the
.MODEL directive.
ReC98 would highly benefit from a build server – both in order to
immediately spot issues like this one, and as a service for modders.
Even more so than the usual open-source project of its size, I would say.
But that might be exactly
because it doesn't seem like something you can trivially outsource
to one of the big CI providers for open-source projects, and quickly set
it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular
64-bit Windows 10 and DOSBox running the exact build tools from the DevKit.
Ideally, though, such a server should really run the optimal configuration
of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step
to run natively, which already is something that no popular CI service out
there offers. Then, we'd optimally expand to Linux, every other Windows
version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd
be a lot. An experimental project all on its own, with additional hosting
costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest
there is once the store reopens (which will be at the beginning of May, at
the latest). That aside, it would 📝 also be
a great project for outside contributors!
So, technical debt, part 8… and right away, we're faced with TH03's
low-level input function, which
📝 once📝 again📝 insists on being word-aligned in a way we
can't fake without duplicating translation units.
Being undecompilable isn't exactly the best property for a function that
has been interesting to modders in the past: In 2018,
spaztron64 created an
ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human
multiplayer mode: 2021-04-04-TH03-WASD-2player.zip
However, this remapping attempt remained quite limited, since we hadn't
(and still haven't) reached full position independence for TH03 yet.
There's quite some potential for size optimizations in this function, which
would allow more BIOS key groups to already be used right now, but it's not
all that obvious to modders who aren't intimately familiar with x86 ASM.
Therefore, I really wouldn't want to keep such a long and important
function in ASM if we don't absolutely have to…
… and apparently, that's all the motivation I needed? So I took the risk,
and spent the first half of this push on reverse-engineering
TCC.EXE, to hopefully find a way to get word-aligned code
segments out of Turbo C++ after all.
And there is! The -WX option, used for creating
DPMI
applications, messes up all sorts of code generation aspects in weird
ways, but does in fact mark the code segment as word-aligned. We can
consider ourselves quite lucky that we get to use Turbo C++ 4.0, because
this feature isn't available in any previous version of Borland's C++
compilers.
That allowed us to restore all the decompilations I previously threw away…
well, two of the three, that lookup table generator was too much of a mess
in C. But what an abuse this is. The
subtly different code generation has basically required one creative
workaround per usage of -WX. For example, enabling that option
causes the regular PUSH BP and POP BP prolog and
epilog instructions to be wrapped with INC BP and
DEC BP, for some reason:
a_function_compiled_with_wx proc
inc bp ; ???
push bp
mov bp, sp
; [… function code …]
pop bp
dec bp ; ???
ret
a_function_compiled_with_wx endp
Luckily again, all the functions that currently require -WX
don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty
close: snd_se_reset(void) is one of the functions that require
word alignment. Previously, it shared a translation unit with the
immediately following snd_se_play(int new_se), which does take
a parameter, and therefore would have had its prolog and epilog code messed
up by -WX.
Since the latter function has a consistent (and thus, fakeable) alignment,
I simply split that code segment into two, with a new -WX
translation unit for just snd_se_reset(void). Problem solved –
after all, two C++ translation units are still better than one ASM
translation unit. Especially with all the
previous #include improvements.
The rest was more of the usual, getting us 74% done with repaying the
technical debt in the SHARED segment. A lot of the remaining
26% is TH04 needing to catch up with TH03 and TH05, which takes
comparatively little time. With some good luck, we might get this
done within the next push… that is, if we aren't confronted with all too
many more disgusting decompilations, like the two functions that ended this
push.
If we are, we might be needing 10 pushes to complete this after all, but
that piece of research was definitely worth the delay. Next up: One more of
these.
(tl;dr: ReC98 has switched to Tup for
the 32-bit build. You probably want to get
💾 this build of Tup, and put it somewhere in your
PATH. It's optional, and always will be, but highly
recommended.)
P0001! Reserved for the delivery of the very first financial contribution
I've ever received for ReC98, back in January 2018. GhostPhanom
requested the exact opposite of immediate results, which motivated me to
go on quite a passionate quest for the perfect ReC98 build system. A quest
that went way beyond the crowdfunding…
Makefiles are a decent idea in theory: Specify the targets to generate,
the source files these targets depend on and are generated from, and the
rules to do the generating, with some helpful shorthand syntax. Then, you
have a build dependency graph, and your make tool of choice
can provide minimal rebuilds of only the targets whose sources changed
since the last make call. But, uh… wait, this is C/C++ we're
talking about, and doesn't pretty much every source file come with a
second set of dependent source files, namely, every single
#include in the source file itself? Do we really
have to duplicate all these inside the Makefile, and keep it in sync with the source file? 🙄
This fact alone means that Makefiles are inherently unsuited for
any language with an #include feature… that is, pretty
much every language out there. Not to mention other aspects like changes
to the compilation command lines, or the build rules themselves, all of
which require metadata of the previous build to be persistently stored in
some way. I have no idea why such a trash technology is even touted as a
viable build tool for code.
So, I decided to just
write my own build system, tailor-made for the needs of ReC98's 16-bit
build process, and combining a number of experimental ideas. Which is
still not quite bug-free and ready for public use, given that the
entire past year has kept me busy with actual tangible RE and PI progress.
What did finally become ready, however, is the improvement for the
32-bit build part, and that's what we've got here.
💭 Now, if only there was a build system that would perfectly track
dependencies of any compiler it calls, by injecting code and
hooking file opening syscalls. It'd be completely unrealistic for it to
also run on DOS (and we probably don't want to traverse a graph database
in a cycle-limited DOSBox), but it would be perfect for our 32-bit build
part, as long as that one still exists.
Sure, it might seem really minor to worry about not unconditionally
rebuilding all 32-bit .asm files, which just takes a couple
of seconds anyway. But minimal rebuilds in the 32-bit part also provide
the foundation for minimal rebuilds in the 16-bit part – and those
TLINK invocations do take quite some time after all.
Using Tup for ReC98 was an idea that dated back to January 2017. Back
then, I already opened
the pull request with a fix to allow Tup to work together with 32-bit
TASM. As much as I love Tup though, the fact that it only worked on
64-bit Windows ≥Vista would have meant that we had to exchange perfect
dependency tracking for the ability to build on 32-bit and older Windows
versions at all. For a project that relies on DOS compilers, this
would have been exactly the wrong trade-off to make.
What's worse though: TLINK fails to run on modern 32-bit
Windows with Loader error (0000) : Unrecognized Error.
Therefore, the set of systems that Tup runs on, and the set of systems
that can actually compile ReC98's 16-bit build part natively, would have
been exactly disjoint, with no OS getting to use both at the same time.
So I've kept using Tup for only my own development, but indefinitely
shelved the idea of making it the official build system, due to those
drawbacks. Recently though, it all came together:
The tup generate sub-command can generate a
.bat file that does a full dumb rebuild of everything, which
can serve as a fallback option for systems that can't run Tup. All we have
to do is to commit that .bat file to the ReC98 Git repository
as well, and tell build32b.bat to fall back on that if Tup
can't be run. That alone would have given us the benefits of Tup without
being worse than the current dumb build process.
In the meantime, other contributors improved Tup's own build process to
the point where 32-bit builds were simple enough to accomplish from the
comfort of a WSL terminal.
Two commits of mine
later, and 32-bit Windows Tup was fully functional. Another one later,
and 32-bit Windows Tup even gained one potential advantage over its 64-bit
counterpart. Since it only has to support DLL injection into 32-bit
programs, it doesn't need a separate 32-bit binary for retrieving function
pointers to the 32-bit version of Windows' DLL loading syscalls. Weirdly
enough, Windows Defender on current Windows 10 falsely flags that binary as
malware, despite it doing nothing but printing those pointer values to
stdout. 🤷
I've also added it to the DevKit, for any newcomers to ReC98.
After the switch to Tup and the fallback option, I extensively tested
building ReC98 on all operating systems I had lying around. And holy cow,
so much in that build was broken beyond belief. In the end, the solution
involved just fully rebuilding the entire 16-bit part by default.
Which, of course, nullifies any of the
advantages we might have gotten from a Makefile in the first place, due to
just how unreliable they are. If you had problems building ReC98 in the
past, try again now!
And sure, it would certainly be possible to also get Tup working on
Windows ≤XP, or 9x even. But I leave that to all those tinkerers out there
who are actually motivated to keep those OSes alive. My work here is
done – we now have a build process that is optimal on 32-bit
Windows ≧Vista, and still functional and reliable on 64-bit
Windows, Linux, and everything down to Windows 98 SE, and therefore also
real PC-98 hardware. Pretty good, I'd say.
(If it weren't for that weird crash of the 16-bit TASM.EXE in
that Windows 95 command prompt I've tried it in, it would also work on
that OS. Probably just a misconfiguration on my part?)
Now, it might look like a waste of time to improve a 32-bit build part
that won't even exist anymore once this project is done. However, a fully
16-bit DOS build will only make sense after
master.lib has been turned into a proper library, linked in by
TLINK rather than #included in the big .ASM
files.
This affects all games. If master.lib's data was consistently placed at
the beginning or end of each data segment, this would be no big deal, but
it's placed somewhere else in every binary.
So, this will only make sense sometime around 90% overall PI, and maybe
~50% RE in each game. Which is something else than 50% overall –
especially since it includes TH02, the objectively worst Touhou game,
which hasn't received any dedicated funding ever.
Then, it will probably still require a couple of dedicated pushes to
move all the remaining data to C land.
Oh, and my 16-bit build system project also needs to be done before,
because, again, Makefiles are trash and we shouldn't rely on them even
more.
And who knows whether this project will get funded for that long. So yeah,
the 32-bit build part will stay with us for quite some more time, and for
all upcoming PI milestones. And with the current build process, it's
pretty much the most minor among all the minor issues I can think of.
Let's all enjoy the performance of a 32-bit build while we can 🙂
Next up: Paying some technical debt while keeping the RE% and PI% in place.