Showing all posts tagged build-process.

📝 Posted:
🚚 Summary of:
P0217
Commits:
(Seihou) pbg...P0217
💰 Funded by:
Arandui
🏷 Tags:
seihou sh01 build-process

First of all: This blog is now available as a web feed, in three different formats!

Thanks to handlerug for implementing and PR'ing the feature in a very clean way. That makes at least two people I know who wanted to see feed support, so there are probably a few more out there.


So, Shuusou Gyoku. pbg released the original source code for the first two Seihou games back in February 2019, but notably removed the crucial decompression code for the original packfiles due to… various unspecified reasons, considerations, and implications. :thonk: This vague language and subsequent rejection of a pull request to add these features back in were probably the main reasons why no one has publicly done anything with this codebase since.

The only other fork I know about is Priw8's private fork from 2020, but only because WishMakers informed me about it shortly after this push was funded. Both of them might also contribute some features to my fork in the future if their time allows it.
In this fork, Priw8 replaced packfile decompression with raw reads from directories with the pre-extracted contents of all the .DAT files. This works for playing the game, but there are actually two more things that require the original packfile code:

We can surely implement our own simple and uncompressed formats for these things, but it's not the best idea to build all future Shuusou Gyoku features on top of a replay-incompatible fork. So, what do we do? On the one hand, pbg expressed the clear wish to not include data reverse-engineered from the original binary. On the other hand, he released the code under the MIT license, which allows us to modify the code and distribute the results in any way we wish.
So, let's meet in the middle, and go for a clean-room implementation of the missing features as indicated by their usage, without looking at either the original binary or wangqr's reverse-engineered code.


With incremental rebuilds being broken in the latest Visual Studio project files as well, it made sense to start from scratch on pbg's last commit. Of course, I can't pass up a chance to use 📝 Tup, my favorite build system for every project I'm the main developer of. It might not fit Shuusou Gyoku as well as it fits ReC98, but let's see whether it would be reasonable at all…

… and it's actually not too bad! Modern Visual Studio makes this a bit harder than it should be with all the intermediate build artifacts you have to keep track of. In the end though, it's still only 70 lines of Lua to have a nice abstraction for both Debug and Release builds. With this layer underneath, the actual Shuusou Gyoku-specific part can be expressed as succinctly as in any other modern build system, while still making every compiler flag explicit. It might be slightly slower than a traditional .vcxproj build due to launching one cl.exe process per translation unit, but the result is way more reliable and trustworthy compared to anything that involves Visual Studio project files. This simplicity paves the way for expanding the build process to multiple steps, and doing all the static checking on translation strings that I never got to do for thcrap-based patches. Heck, I might even compile all future translations directly into the binary…

Every C++ build system will invariably be hated by someone, so I'd say that your goal should always be to simplify the actually important parts of your build enough to allow everyone else to easily adapt it to their favorite system. This Tupfile definitely does a better job there than your average .vcxproj file – but if you still want such a thing (or, gasp, 🤮 CMake project files 🤮) for better Visual Studio IDE integration, you should have no problem generating them for yourself.
There might still be a point in doing that, because IDE integration is the one part that unfortunately sucks about this approach. Visual Studio is horribly broken for any nonstandard C++ project, even in 2022:

In both cases, IntelliSense doesn't work properly at all even if it appears to be configured correctly, and Tup's dependency tracking appeared to be weirdly cut off for the very final .PDB file. Interestingly though, using the big Visual Studio IDE for just debugging a binary via devenv bin/GIAN07.exe suddenly eliminates all the IntelliSense issues. Looks like there's a lot of essential information stored in the .PDB files that Visual Studio just refuses to read in any other context. :thonk:

But now compare that to Visual Studio Code: Open it from the x64_x86 Cross Tools Command Prompt via code ., launch a build or debug task, or browse the code with perfect IntelliSense. Three small configuration files and everything just works – heck, you even get the Tup progress bar in the terminal. It might be Electron bloatware and horribly slow at times, but Visual Studio Code has long outperformed regular Visual Studio in terms of non-debug functionality.


On to the compression algorithm then… and it's just textbook LZSS, with 13 bits for the offset of a back-reference and 4 bits for its length? Hardly a trade secret. The hard parts all come from unexpected inefficiencies in the bitstream format (see the sketch after this list):

  1. Encoding back-references as offsets into an 8 KiB ring buffer dictionary means that the most straightforward implementation actually needs an 8 KiB array for the LZSS sliding window. This could have easily been done with zero additional memory if the offset was encoded as the difference to the current byte instead.
  2. The packfile format stores the uncompressed size of every file in its header, which is a good thing because you want to know in advance how much heap memory to allocate for a specific file. Nevertheless, the original game only stops reading bits from the packfile once it encounters a back-reference with an offset of 0. This means that the compressor not only has to write this technically unneeded back-reference to the end of the compressed bitstream, but also has to ignore any potential longest match that happens to sit at offset 0 within the dictionary. The latter can easily happen with a ring buffer dictionary.
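If you've never seen LZSS before, here's a minimal decompressor sketch covering the parameters mentioned above – 13-bit ring buffer offsets, 4-bit lengths, and an offset of 0 terminating the stream. The flag bit polarity, the length bias, and the initial dictionary write position are assumptions for illustration, not necessarily what the original .DAT bitstream uses:

#include <cstdint>
#include <optional>
#include <span>
#include <vector>

// MSB-first bit reader over a borrowed, immutable buffer.
class BitReader {
	std::span<const uint8_t> buf;
	size_t bit = 0;

public:
	explicit BitReader(std::span<const uint8_t> buf) : buf(buf) {
	}

	std::optional<uint32_t> Read(unsigned bits) {
		uint32_t ret = 0;
		for(unsigned i = 0; i < bits; i++, bit++) {
			if((bit / 8) >= buf.size()) {
				return std::nullopt;
			}
			ret = ((ret << 1) | ((buf[bit / 8] >> (7 - (bit % 8))) & 1));
		}
		return ret;
	}
};

std::vector<uint8_t> Decompress(std::span<const uint8_t> in, size_t size_uncompressed)
{
	constexpr size_t DICT_SIZE = 8192; // 13-bit offsets
	constexpr unsigned MIN_MATCH = 3;  // assumed length bias

	std::vector<uint8_t> out;
	out.reserve(size_uncompressed);
	std::vector<uint8_t> dict(DICT_SIZE, 0); // the 8 KiB ring buffer
	size_t dict_pos = 1; // assumed initial write position

	BitReader bits(in);
	while(out.size() < size_uncompressed) {
		const auto is_literal = bits.Read(1);
		if(!is_literal) {
			break;
		}
		if(*is_literal) { // assumed: 1 = literal byte follows
			const auto literal = bits.Read(8);
			if(!literal) {
				break;
			}
			out.push_back(static_cast<uint8_t>(*literal));
			dict[dict_pos] = static_cast<uint8_t>(*literal);
			dict_pos = ((dict_pos + 1) % DICT_SIZE);
		} else { // 0 = back-reference into the ring buffer
			const auto offset = bits.Read(13);
			const auto length = bits.Read(4);
			if(!offset || !length || (*offset == 0)) {
				break; // offset 0 = end of stream
			}
			for(unsigned i = 0; i < (*length + MIN_MATCH); i++) {
				const uint8_t byte = dict[((*offset + i) % DICT_SIZE)];
				out.push_back(byte);
				dict[dict_pos] = byte;
				dict_pos = ((dict_pos + 1) % DICT_SIZE);
			}
		}
	}
	return out;
}

Note how the ring buffer really is needed here, exactly as described in point 1: the offsets are absolute positions within it, not distances from the current output byte.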

The original game used a single BIT_DEVICE class with mode flags for every combination of reading and writing memory buffers and on-disk files. Since that would have necessitated a lot of error checking for all (pseudo-)methods of this class, I wrote one dedicated small class for each one of these permutations instead. To further emphasize the clean-room property of this code, these use modern C++ memory ownership features: std::unique_ptr for the fixed-size read-only buffers we get from packfiles, std::vector for the newly compressed buffers where we don't know the size in advance, and std::span for a borrowed reference to an immutable region of memory that we want to treat as a bitstream. Definitely better than using the native Win32 LocalAlloc() and LocalFree() allocator, especially if we want to port the game away from Windows one day.
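As a rough sketch of that ownership split – with hypothetical names, not the actual classes in the fork: fixed-size packfile reads own their memory via std::unique_ptr, freshly compressed data grows inside a std::vector, and anything that merely parses bytes borrows them through a std::span:

#include <cstdint>
#include <memory>
#include <span>
#include <vector>

// A file pulled out of a packfile. The header told us its exact size,
// so a fixed, non-growing allocation is all we need.
struct LoadedFile {
	std::unique_ptr<uint8_t[]> buffer;
	size_t size = 0;

	// Borrowed, immutable view for code that only reads the bytes.
	std::span<const uint8_t> View() const {
		return { buffer.get(), size };
	}
};

// Newly compressed data, whose final size is only known once we're done.
// (Placeholder body, shown only for the std::vector usage – a real encoder
// would append the LZSS bitstream here.)
std::vector<uint8_t> Compress(std::span<const uint8_t> input)
{
	std::vector<uint8_t> out;
	out.reserve(input.size());
	out.assign(input.begin(), input.end());
	return out;
}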

One feature I didn't use though: C++ fstreams, because those are trash. :tannedcirno: These days, they would seem to be the natural choice with the new std::filesystem::path type from C++17: Correctly constructed, you can pass that type to an fstream constructor and gain both locale independence on Windows and portability to everything else, without writing any Windows-specific UTF-16 code. But even in a Release build, fstreams add ~100 KB of locale-related bloat to the .EXE which adds no value for just reading binary files. That's just too embarrassing if you look at how much space the rest of the game takes up. Writing your own platform layer that calls the Win32 CreateFileW(), ReadFile(), and WriteFile() API functions is apparently still the way to go even in 2022. And with std::filesystem::path still being a welcome addition to C++, it's not too much code to write either.
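For reference, such a platform layer really doesn't amount to much code. Here's a sketch of a whole-file read using those Win32 calls, with a hypothetical helper name and error handling reduced to the bare minimum – not the fork's actual platform layer:

#include <cstdint>
#include <filesystem>
#include <optional>
#include <vector>
#include <windows.h>

std::optional<std::vector<uint8_t>> ReadWholeFile(const std::filesystem::path& path)
{
	// path::c_str() is a wide string on Windows, so this stays
	// locale-independent without any manual UTF-16 conversion.
	HANDLE handle = CreateFileW(
		path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
		OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr
	);
	if(handle == INVALID_HANDLE_VALUE) {
		return std::nullopt;
	}
	LARGE_INTEGER size;
	if(!GetFileSizeEx(handle, &size)) {
		CloseHandle(handle);
		return std::nullopt;
	}
	// Everything this game reads is far below 4 GiB, so a single
	// ReadFile() call is fine.
	std::vector<uint8_t> buf(static_cast<size_t>(size.QuadPart));
	DWORD bytes_read = 0;
	const bool ok = ReadFile(
		handle, buf.data(), static_cast<DWORD>(buf.size()), &bytes_read, nullptr
	);
	CloseHandle(handle);
	if(!ok || (bytes_read != buf.size())) {
		return std::nullopt;
	}
	return buf;
}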

This gets us file format compatibility with the original release… and a crash as soon as the ending starts, but only in Release mode? As it turns out, this crash is caused by an out-of-bounds array access bug that was present even in the original game, and only turned into a crash now because the optimizer in modern Visual Studio versions reorders static data. As a result, the 6-element pFontInfo array got placed in front of an ECL-related counter variable that then got corrupted by the write to the 7th element, which subsequently crashed the game with a read access to previously deallocated danmaku script data. That just goes to show that these technical bugs are important and worth fixing even if they don't cause issues in the original game. Who knows how many of these will turn into crashes once we get to porting PC-98 Touhou?


So here we go, a new build of Shuusou Gyoku, compiled with Visual Studio 2022, and compatible with all original data formats:

:sh01: Shuusou Gyoku P0217

Inside the regular Shuusou Gyoku installation directory, this binary works as a full-fledged drop-in replacement for the original 秋霜玉.exe. It still has all of the original binary's problems though:

As well as some of its own:

So all in all, it's a strict downgrade at this point in time. :onricdennat: And more of a symbol that we can now start doing actual work on this game. Seihou has been a fun change of pace, and I hope that I get to do more work on the series. There is quite a lot to be done with Shuusou Gyoku alone, and the 21 GitHub issues I've opened are probably only scratching the surface.
However, all the required research for this one consumed more like 1⅔ pushes. Despite just one push being funded, it wouldn't have made sense to release the commits or this binary in any earlier state. To repay this debt, I'm going to put what's next for Seihou towards the small code maintenance and performance tasks that I usually do for free, before doing any more feature and bugfix work. Next up: Improving video playback on the blog, and maybe delivering some microtransaction work on the side?

📝 Posted:
🚚 Summary of:
P0137
Commits:
07bfcf2...8d953dc
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98 th02 th03 th04 th05 build-process meta contribution-ideas mod tasm tcc

Whoops, the build was broken again? Since P0127 from mid-November 2020, on TASM32 version 5.3, which also happens to be the one in the DevKit… That version changed the alignment for the default segments of certain memory models when requesting .386 support. And since redefining segment alignment apparently is highly illegal and absolutely has to be a build error, some of the stand-alone .ASM translation units didn't assemble anymore on this version. I only spotted this myself because I casually compiled ReC98 somewhere else – on my development system, I happened to have TASM32 version 5.0 in the PATH during all this time.
At least this was a good occasion to get rid of some weird segment alignment workarounds from 2015, and replace them with the superior convention of using the USE16 modifier for the .MODEL directive.

ReC98 would highly benefit from a build server – both in order to immediately spot issues like this one, and as a service for modders. Even more so than the usual open-source project of its size, I would say. But that might be exactly because it doesn't seem like something you can trivially outsource to one of the big CI providers for open-source projects, and quickly set it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular 64-bit Windows 10 and DOSBox running the exact build tools from the DevKit. Ideally, though, such a server should really run the optimal configuration of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step to run natively, which already is something that no popular CI service out there offers. Then, we'd optimally expand to Linux, every other Windows version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd be a lot. An experimental project all on its own, with additional hosting costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form – let's see how much interest there is once the store reopens (which will be at the beginning of May, at the latest). That aside, it would 📝 also be a great project for outside contributors!


So, technical debt, part 8… and right away, we're faced with TH03's low-level input function, which 📝 once 📝 again 📝 insists on being word-aligned in a way we can't fake without duplicating translation units. Being undecompilable isn't exactly the best property for a function that has been interesting to modders in the past: In 2018, spaztron64 created an ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human multiplayer mode: 2021-04-04-TH03-WASD-2player.zip However, this remapping attempt remained quite limited, since we hadn't (and still haven't) reached full position independence for TH03 yet. There's quite some potential for size optimizations in this function, which would allow more BIOS key groups to already be used right now, but it's not all that obvious to modders who aren't intimately familiar with x86 ASM. Therefore, I really wouldn't want to keep such a long and important function in ASM if we don't absolutely have to…

… and apparently, that's all the motivation I needed? So I took the risk, and spent the first half of this push on reverse-engineering TCC.EXE, to hopefully find a way to get word-aligned code segments out of Turbo C++ after all.

And there is! The -WX option, used for creating DPMI applications, messes up all sorts of code generation aspects in weird ways, but does in fact mark the code segment as word-aligned. We can consider ourselves quite lucky that we get to use Turbo C++ 4.0, because this feature isn't available in any previous version of Borland's C++ compilers.
That allowed us to restore all the decompilations I previously threw away… well, two of the three – that lookup table generator was too much of a mess in C. :tannedcirno: But what an abuse this is. The subtly different code generation has basically required one creative workaround per usage of -WX. For example, enabling that option causes the regular PUSH BP and POP BP prolog and epilog instructions to be wrapped with INC BP and DEC BP, for some reason:

a_function_compiled_with_wx proc
	inc 	bp    	; ???
	push	bp
	mov 	bp, sp
	    	      	; [… function code …]
	pop 	bp
	dec 	bp    	; ???
	ret
a_function_compiled_with_wx endp

Luckily again, all the functions that currently require -WX don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty close: snd_se_reset(void) is one of the functions that require word alignment. Previously, it shared a translation unit with the immediately following snd_se_play(int new_se), which does take a parameter, and therefore would have had its prolog and epilog code messed up by -WX. Since the latter function has a consistent (and thus, fakeable) alignment, I simply split that code segment into two, with a new -WX translation unit for just snd_se_reset(void). Problem solved – after all, two C++ translation units are still better than one ASM translation unit. :onricdennat: Especially with all the previous #include improvements.

The rest was more of the usual, getting us 74% done with repaying the technical debt in the SHARED segment. A lot of the remaining 26% is TH04 needing to catch up with TH03 and TH05, which takes comparatively little time. With some good luck, we might get this done within the next push… that is, if we aren't confronted with all too many more disgusting decompilations, like the two functions that ended this push. If we are, we might be needing 10 pushes to complete this after all, but that piece of research was definitely worth the delay. Next up: One more of these.

📝 Posted:
🚚 Summary of:
P0113, P0114
Commits:
150d2c6...6204fdd, 6204fdd...967bb8b
💰 Funded by:
Lmocinemod
🏷 Tags:
rec98 th02 th03 th04 build-process pipeline tasm blitting file-format input

Alright, tooling and technical debt. Shouldn't really be much to talk about… oh, wait, this is still ReC98. :tannedcirno:

For the tooling part, I finished up the remaining ergonomics and error handling for the 📝 sprite converter that Jonathan Campbell contributed two months ago. While familiarizing myself with the tool, I actually ran into some unreported errors myself, so this was sort of important to me. Still got no command-line help in there, but the error messages can probably do that job even better now, since we would have had to write them anyway.

So, what's up with the technical debt then? Well, by now we've accumulated quite a number of 📝 ASM code slices that need to be either decompiled or clearly marked as undecompilable. Since we define those slices as "already reverse-engineered", that decision won't affect the numbers on the front page at all. But for a complete decompilation, we'd still have to do this someday. So, rather than incorporating this work into pushes that were purchased with the expectation of measurable progress in a certain area, let's take the "anything goes" pushes, and focus entirely on that during them.

The second code segment seemed like the best place to start with this, since it affects the largest number of games simultaneously. Starting with TH02, this segment contains a set of random "core" functions needed by the binary. Image formats, sounds, input, math, it's all there in some capacity. You could maybe call it all "libzun" or something like that? But for the time being, I simply went with the obvious name, seg2. Maybe I'll come up with something more convincing in the future.


Oh, but wait, why were we assembling all the previous undecompilable ASM translation units in the 16-bit build part? By moving those to the 32-bit part, we don't even need a 16-bit TASM in our list of dependencies, as long as our build process is not fully 16-bit.
And with that, ReC98 now also builds on Windows 95, and thus, every 32-bit Windows version. 🎉 Which is certainly the most user-visible improvement of these two pushes. :onricdennat:


Back in 2015, I already decompiled all of TH02's seg2 functions. As suggested by the Borland compiler, I tried to follow a "one translation unit per segment" layout, bundling the binary-specific contents via #include. In the end, it required two translation units – and that was even after manually inserting the original padding bytes via #pragma codestring… yuck. But it worked, compiled, and kept the linker's job (and, by extension, segmentation worries) to a minimum. And as long as it all matched the original binaries, it still counted as a valid reconstruction of ZUN's code. :zunpet:

However, that idea ultimately falls apart once TH03 starts mixing undecompilable ASM code in between C functions. Now, we officially have no choice but to use multiple C and ASM translation units, with maybe just one or two #includes in them…

…or we finally start reconstructing the actual seg2 library, turning every sequence of related functions into its own translation unit. This way, we can simply reuse the once-compiled .OBJ files for all the binaries those functions appear in, without requiring that additional layer of translation units mirroring the original segmentation.
The best example for this is TH03's almost undecompilable function that generates a lookup table for horizontally flipping 8 1bpp pixels. It's part of every binary since TH03, but only used in that game. With the previous approach, we would have had to add 9 C translation units, which would all have just #included that one file. Now, we simply put the .OBJ file into the correct place on the linker command line, as soon as we can.
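For the curious: conceptually, such a table just maps every possible byte of eight 1bpp pixels to its mirrored counterpart. A straightforward (and decidedly not original) way of generating it, as a sketch of what the table encodes rather than of ZUN's convoluted generator:

#include <array>
#include <cstdint>

std::array<uint8_t, 256> MakeHFlipTable(void)
{
	std::array<uint8_t, 256> ret;
	for(unsigned src = 0; src < 256; src++) {
		uint8_t flipped = 0;
		for(unsigned bit = 0; bit < 8; bit++) {
			// The pixel at bit [bit] moves to bit [7 - bit].
			flipped |= (((src >> bit) & 1) << (7 - bit));
		}
		ret[src] = flipped;
	}
	return ret;
}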

💡 And suddenly, the linker just inserts the correct padding bytes itself.

The most immediate gains there also happened to come from TH03. Which is also where we did get some tiny RE% and PI% gains out of this after all, by reverse-engineering some of its sprite blitting setup code. Sure, I should have done even more RE here, to also cover those 5 functions at the end of code segment #2 in TH03's MAIN.EXE that were in front of a number of library functions I already covered in this push. But let's leave that to an actual RE push 😛


All in all though, I was just getting started with this; the real gains in terms of removed ASM files are still to come. But in the meantime, the funding situation has become even better in terms of allowing me to focus on things nobody asked for. 🙂 So here's a slightly better idea: Instead of spending two more pushes on this, let's shoot for TH05 MAINE.EXE position independence next. If I manage to get it done, we'll have a 100% position-independent TH05 by the time -Tom- finishes his MAIN.EXE PI demo, rather than the 94% we'd get from just MAIN.EXE. That's bound to make a much better impression on all the people who will then (re-)discover the project.

📝 Posted:
🚚 Summary of:
P0001
Commits:
e447a2d...150d2c6
💰 Funded by:
GhostPhanom
🏷 Tags:
rec98 build-process tcc tasm

(tl;dr: ReC98 has switched to Tup for the 32-bit build. You probably want to get 💾 this build of Tup, and put it somewhere in your PATH. It's optional, and always will be, but highly recommended.)


P0001! Reserved for the delivery of the very first financial contribution I've ever received for ReC98, back in January 2018. GhostPhanom requested the exact opposite of immediate results, which motivated me to go on quite a passionate quest for the perfect ReC98 build system. A quest that went way beyond the crowdfunding…

Makefiles are a decent idea in theory: Specify the targets to generate, the source files these targets depend on and are generated from, and the rules to do the generating, with some helpful shorthand syntax. Then, you have a build dependency graph, and your make tool of choice can provide minimal rebuilds of only the targets whose sources changed since the last make call. But, uh… wait, this is C/C++ we're talking about, and doesn't pretty much every source file come with a second set of dependent source files, namely, every single #include in the source file itself? Do we really have to duplicate all these inside the Makefile, and keep it in sync with the source file? 🙄

This fact alone means that Makefiles are inherently unsuited for any language with an #include feature… that is, pretty much every language out there. Not to mention other aspects like changes to the compilation command lines, or the build rules themselves, all of which require metadata of the previous build to be persistently stored in some way. I have no idea why such a trash technology is even touted as a viable build tool for code.

But wait! Most make implementations, including Borland's, do support the notion of auto-dependency information, emitted by the compiler in a specific format, to provide make with the additional list of #includes. Sure, this should be a basic feature of any self-respecting build tool, and not something you have to add as an extension, but let's just set our idealism aside for a moment. Well, too bad that Borland's implementation only works if you spell out both the source➜object and the object➜binary rules, which loses the performance gained from compiling multiple translation units in a single BCC or TCC process. And even then, it tends to break in that DOS VM you're probably using. Not to mention, again, all the other aspects that still remain unsolved.

So, I decided to just write my own build system, tailor-made for the needs of ReC98's 16-bit build process, and combining a number of experimental ideas. Which is still not quite bug-free and ready for public use, given that the entire past year has kept me busy with actual tangible RE and PI progress. What did finally become ready, however, is the improvement for the 32-bit build part, and that's what we've got here.


💭 Now, if only there was a build system that would perfectly track dependencies of any compiler it calls, by injecting code and hooking file opening syscalls. It'd be completely unrealistic for it to also run on DOS (and we probably don't want to traverse a graph database in a cycle-limited DOSBox), but it would be perfect for our 32-bit build part, as long as that one still exists.

Turns out Tup is exactly that system. In practice, its low-level nature as a make replacement does limit its general usefulness, which is why you probably haven't seen it used in a lot of projects. But for something like ReC98, with its reliance on outdated compilers that aren't supported by any decent high-level tool, it's exactly the right tool for the job. Also, it's completely beyond me how Ninja, the most popular make replacement these days, was inspired by Tup, yet went a step back to parsing the specific dependency information produced by gcc, Clang, and Visual Studio – and only those.

Sure, it might seem really minor to worry about not unconditionally rebuilding all 32-bit .asm files, which just takes a couple of seconds anyway. But minimal rebuilds in the 32-bit part also provide the foundation for minimal rebuilds in the 16-bit part – and those TLINK invocations do take quite some time after all.

Using Tup for ReC98 was an idea that dated back to January 2017. Back then, I already opened the pull request with a fix to allow Tup to work together with 32-bit TASM. As much as I love Tup though, the fact that it only worked on 64-bit Windows ≥Vista would have meant exchanging the ability to build on 32-bit and older Windows versions at all for perfect dependency tracking. For a project that relies on DOS compilers, this would have been exactly the wrong trade-off to make.

What's worse though: TLINK fails to run on modern 32-bit Windows with Loader error (0000) : Unrecognized Error. Therefore, the set of systems that Tup runs on, and the set of systems that can actually compile ReC98's 16-bit build part natively, would have been exactly disjoint, with no OS getting to use both at the same time.
So I've kept using Tup for only my own development, but indefinitely shelved the idea of making it the official build system, due to those drawbacks. Recently though, it all came together:

As I'm writing this post, the pull request has unfortunately not yet been merged. So, here's my own custom build instead:

💾 Download Tup for 32-bit Windows (optimized build at this commit)

I've also added it to the DevKit, for any newcomers to ReC98.


After the switch to Tup and the fallback option, I extensively tested building ReC98 on all operating systems I had lying around. And holy cow, so much in that build was broken beyond belief. In the end, the solution involved just fully rebuilding the entire 16-bit part by default. :tannedcirno: Which, of course, nullifies any of the advantages we might have gotten from a Makefile in the first place, due to just how unreliable they are. If you had problems building ReC98 in the past, try again now!

And sure, it would certainly be possible to also get Tup working on Windows ≤XP, or 9x even. But I leave that to all those tinkerers out there who are actually motivated to keep those OSes alive. My work here is done – we now have a build process that is optimal on 32-bit Windows ≧Vista, and still functional and reliable on 64-bit Windows, Linux, and everything down to Windows 98 SE, and therefore also real PC-98 hardware. Pretty good, I'd say.

(If it weren't for that weird crash of the 16-bit TASM.EXE in that Windows 95 command prompt I've tried it in, it would also work on that OS. Probably just a misconfiguration on my part?)

Now, it might look like a waste of time to improve a 32-bit build part that won't even exist anymore once this project is done. However, a fully 16-bit DOS build will only make sense after

And who knows whether this project will get funded for that long. So yeah, the 32-bit build part will stay with us for quite some more time, and for all upcoming PI milestones. And with the current build process, it's pretty much the most minor among all the minor issues I can think of. Let's all enjoy the performance of a 32-bit build while we can 🙂

Next up: Paying some technical debt while keeping the RE% and PI% in place.

📝 Posted:
🏷 Tags:
rec98 meta build-process pipeline

TH01 pellets are coming up next, and for the first time, we'll have the chance to move hardcoded sprite data from ASM land to C land. As it would turn out, bad luck with the 2-byte alignment at the end of REIIDEN.EXE's data segment pretty much forces us to declare TH01's pellet sprites in C if we want to decompile the final few pellet functions without ugly workarounds for the float literals there. And while I could have just converted them into a C array and called it a day, it did raise the question of when we are going to do this The Right And Moddable Way, by auto-converting actual image files into ASM or C arrays during the build process. These arrays are even more annoying to edit in C, after all – unlike TASM, the old C++ we have to work with doesn't support binary number literals, only hexadecimal or, gasp, octal.
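To illustrate the point with a hypothetical 8×8 sprite (not actual game data): every row of 8 pixels has to be spelled out as a hexadecimal byte, which completely obscures the image you're editing:

// Hypothetical 1bpp sprite, 8×8 pixels, one byte per row.
static const unsigned char sEXAMPLE[8] = {
	0x18, // would be 0b00011000 with binary literals
	0x3C, //          0b00111100
	0x7E, //          0b01111110
	0xFF, //          0b11111111
	0xFF, //          0b11111111
	0x7E, //          0b01111110
	0x3C, //          0b00111100
	0x18, //          0b00011000
};

Hence the appeal of a converter: draw the sprite as an image file, and let the build process worry about turning it into exactly this kind of array.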
Without the explicit funding for such a converter, I reached out on GitHub, asking backers and outside contributors whether they'd be in favor of it. As something that requires no RE skills and collides with nothing else, it would be a perfect task for C/C++ coders who want to support ReC98 with something other than money.

And surprisingly, those still exist! Jonathan Campbell, of DOSBox-X fame, went ahead and implemented all the required functionality, within just a few days. Thanks again! The result is probably a lot more portable than it would have been if I had written it. Which is pretty relevant for future port authors – any additional tooling we write ourselves should not add to the list of problems they'll have to worry about.

Right now, all of the sprites are #included from the big ASM dump files, which means that they have to be converted before those files are assembled during the 32-bit build part. We could have introduced a third distinct build step there, perhaps even a 16-bit one so that we can use Turbo C++ 4.0J to also compile the converter… However, the more reasonable option was to do this at the beginning of the 32-bit build step, and add a 32-bit Windows C++ compiler to the list of tools required for ReC98's build process.
And the best choice for ReC98 is, in fact… 🥁… the 20-year-old Borland C++ 5.5 freeware release. See the README for a lengthy justification, as well as download links.

So yes, all sprites mentioned in the GitHub issue can now be modded by simply editing .BMP files, using an image editor of your choice. 🖌
And now that that's dealt with, it's finally time for more actual progress! TH01 pellets coming tomorrow.