⮜ Blog

⮜ List of tags

Showing all posts tagged meta- and th05-

📝 Posted:
🚚 Summary of:
P0137
Commits:
07bfcf2...8d953dc
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th02+ th03+ th04+ th05- build-process+ meta- contribution-ideas+ mod+ tasm+ tcc+

Whoops, the build was broken again? Since P0127 from mid-November 2020, on TASM32 version 5.3, which also happens to be the one in the DevKit… That version changed the alignment for the default segments of certain memory models when requesting .386 support. And since redefining segment alignment apparently is highly illegal and absolutely has to be a build error, some of the stand-alone .ASM translation units didn't assemble anymore on this version. I've only spotted this on my own because I casually compiled ReC98 somewhere else – on my development system, I happened to have TASM32 version 5.0 in the PATH during all this time.
At least this was a good occasion to get rid of some weird segment alignment workarounds from 2015, and replace them with the superior convention of using the USE16 modifier for the .MODEL directive.

ReC98 would highly benefit from a build server – both in order to immediately spot issues like this one, and as a service for modders. Even more so than the usual open-source project of its size, I would say. But that might be exactly because it doesn't seem like something you can trivially outsource to one of the big CI providers for open-source projects, and quickly set it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular 64-bit Windows 10 and DOSBox running the exact build tools from the DevKit. Ideally, though, such a server should really run the optimal configuration of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step to run natively, which already is something that no popular CI service out there offers. Then, we'd optimally expand to Linux, every other Windows version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd be a lot. An experimental project all on its own, with additional hosting costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest there is once the store reopens (which will be at the beginning of May, at the latest). That aside, it would 📝 also be a great project for outside contributors!


So, technical debt, part 8… and right away, we're faced with TH03's low-level input function, which 📝 once 📝 again 📝 insists on being word-aligned in a way we can't fake without duplicating translation units. Being undecompilable isn't exactly the best property for a function that has been interesting to modders in the past: In 2018, spaztron64 created an ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human multiplayer mode: 2021-04-04-TH03-WASD-2player.zip However, this remapping attempt remained quite limited, since we hadn't (and still haven't) reached full position independence for TH03 yet. There's quite some potential for size optimizations in this function, which would allow more BIOS key groups to already be used right now, but it's not all that obvious to modders who aren't intimately familiar with x86 ASM. Therefore, I really wouldn't want to keep such a long and important function in ASM if we don't absolutely have to…

… and apparently, that's all the motivation I needed? So I took the risk, and spent the first half of this push on reverse-engineering TCC.EXE, to hopefully find a way to get word-aligned code segments out of Turbo C++ after all.

And there is! The -WX option, used for creating DPMI applications, messes up all sorts of code generation aspects in weird ways, but does in fact mark the code segment as word-aligned. We can consider ourselves quite lucky that we get to use Turbo C++ 4.0, because this feature isn't available in any previous version of Borland's C++ compilers.
That allowed us to restore all the decompilations I previously threw away… well, two of the three, that lookup table generator was too much of a mess in C. :tannedcirno: But what an abuse this is. The subtly different code generation has basically required one creative workaround per usage of -WX. For example, enabling that option causes the regular PUSH BP and POP BP prolog and epilog instructions to be wrapped with INC BP and DEC BP, for some reason:

a_function_compiled_with_wx proc
	inc 	bp    	; ???
	push	bp
	mov 	bp, sp
	    	      	; [… function code …]
	pop 	bp
	dec 	bp    	; ???
	ret
a_function_compiled_with_wx endp

Luckily again, all the functions that currently require -WX don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty close: snd_se_reset(void) is one of the functions that require word alignment. Previously, it shared a translation unit with the immediately following snd_se_play(int new_se), which does take a parameter, and therefore would have had its prolog and epilog code messed up by -WX. Since the latter function has a consistent (and thus, fakeable) alignment, I simply split that code segment into two, with a new -WX translation unit for just snd_se_reset(void). Problem solved – after all, two C++ translation units are still better than one ASM translation unit. :onricdennat: Especially with all the previous #include improvements.

The rest was more of the usual, getting us 74% done with repaying the technical debt in the SHARED segment. A lot of the remaining 26% is TH04 needing to catch up with TH03 and TH05, which takes comparatively little time. With some good luck, we might get this done within the next push… that is, if we aren't confronted with all too many more disgusting decompilations, like the two functions that ended this push. If we are, we might be needing 10 pushes to complete this after all, but that piece of research was definitely worth the delay. Next up: One more of these.

📝 Posted:
🚚 Summary of:
P0126, P0127
Commits:
6c22af7...8b01657, 8b01657...dc65b59
💰 Funded by:
Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th03+ th04+ th05- pc98+ micro-optimization+ tcc+ tasm+ meta-

Alright, back to continuing the master.hpp transition started in P0124, and repaying technical debt. The last blog post already announced some ridiculous decompilations… and in fact, not a single one of the functions in these two pushes was decompilable into idiomatic C/C++ code.

As usual, that didn't keep me from trying though. The TH04 and TH05 version of the infamous 16-pixel-aligned, EGC-accelerated rectangle blitting function from page 1 to page 0 was fairly average as far as unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be the .MRS functions, used to render the gauge attack portraits and bomb backgrounds. The blitting code there uses the additional FS and GS segment registers provided by the Intel 386… which

  1. are not supported by Turbo C++'s inline assembler, and
  2. can't be turned into pointers, due to a compiler bug in Turbo C++ that generates wrong segment prefix opcodes for the _FS and _GS pseudo-registers.

Apparently I'm the first one to even try doing that with this compiler? I haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around this bug and generate the correct instructions. But that would incur yet another dependency on a 16-bit TASM, for something honestly quite insignificant.

What we can always do, however, is using __emit__() to simply output x86 opcodes anywhere in a function. Unlike spelled-out inline assembly, that can even be used in helper functions that are supposed to inline… which does in fact allow us to fully abstract away this compiler bug. Regular if() comparisons with pseudo-registers wouldn't inline, but "converting" them into C++ template function specializations does. All that's left is some C preprocessor abuse to turn the pseudo-registers into types, and then we do retain a normal-looking poke() call in the blitting functions in the end. 🤯

Yeah… the result is batshit insane. I may have gone too far in a few places…


One might certainly argue that all these ridiculous decompilations actually hurt the preservation angle of this project. "Clearly, ZUN couldn't have possibly written such unreasonable C++ code. So why pretend he did, and not just keep it all in its more natural ASM form?" Well, there are several reasons:

Unfortunately, these pushes also demonstrated a second disadvantage in trying to decompile everything possible: Since Turbo C++ lacks TASM's fine-grained ability to enforce code alignment on certain multiples of bytes, it might actually be unfeasible to link in a C-compiled object file at its intended original position in some of the .EXE files it's used in. Which… you're only going to notice once you encounter such a case. Due to the slightly jumbled order of functions in the 📝 second, shared code segment, that might be long after you decompiled and successfully linked in the function everywhere else.

And then you'll have to throw away that decompilation after all 😕 Oh well. In this specific case (the lookup table generator for horizontally flipping images), that decompilation was a mess anyway, and probably helped nobody. I could have added a dummy .OBJ that does nothing but enforce the needed 2-byte alignment before the function if I really insisted on keeping the C version, but it really wasn't worth it.


Now that I've also described yet another meta-issue, maybe there'll really be nothing to say about the next technical debt pushes? :onricdennat: Next up though: Back to actual progress again, with TH01. Which maybe even ends up pushing that game over the 50% RE mark?

📝 Posted:
🚚 Summary of:
P0059
Commits:
01de290...8b62780
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05- pc98+ position-independence+ uth05win+ meta-

With no feedback to 📝 last week's blog post, I assume you all are fine with how things are going? Alright then, another one towards position independence, with the same approach as before…

Since -Tom- wanted to learn something about how the PC-98 EGC is used in TH04 and TH05, I took a look at master.lib's egc_shift_*() functions. These simply do a hardware-accelerated memmove() of any VRAM region, and are used for screen shaking effects. Hover over the image below for the raw effect:

Then, I finally wanted to take a look at the bullet structures, but it required way too much reverse-engineering to even start within ¾ of a position independence push. Even with the help of uth05win – bullet handling was changed quite a bit from TH04 to TH05.

What I ultimately settled on was more raw, "boring" PI work based around an already known set of functions. For this one, I looked at vector construction… and this time, that actually made the games a little bit more position-independent, and wasn't just all about removing false positives from the calculation. This was one of the few sets of functions that would also apply to TH01, and it revealed just how chaotically that game was coded. This one commit shows three ways how ZUN stored regular 2D points in TH01:

… yeah. But in more productive news, this did actually lay the groundwork for TH04 and TH05 bullet structures. Which might even be coming up within the next big, 5-push order from Touhou Patch Center? These are the priorities I got from them, let's see how close I can get!

📝 Posted:
🚚 Summary of:
P0057, P0058
Commits:
1cb9731...ac7540d, ac7540d...fef0299
💰 Funded by:
[Anonymous], -Tom-
🏷 Tags:
rec98+ th04+ th05- gameplay+ item+ animation+ uth05win+ meta-

So, here we have the first two pushes with an explicit focus on position independence… and they start out looking barely different from regular reverse-engineering? They even already deduplicate a bunch of item-related code, which was simple enough that it required little additional work? Because the actual work, once again, was in comparing uth05win's interpretations and naming choices with the original PC-98 code? So that we only ended up removing a handful of memory references there?

(Oh well, you can mod item drops now!)

So, continuing to interpret PI as a mere by-product of reverse-engineering might ultimately drive up the total PI cost quite a bit. But alright then, let's systematically clear out some false positives by looking at master.lib function calls instead… and suddenly we get the PI progress we were looking for, nicely spread out over all games since TH02. That kinda makes it sound like useless work, only done because it's dictated by some counting algorithm on a website. But decompilation will want to convert all of these values to decimal anyway. We're merely doing that right now, across all games.

Then again, it doesn't actually make any game more position-independent, and only proves how position-independent it already was. So I'm really wondering right now whether I should just rush actual position independence by simply identifying structures and their sizes, and not bother with members or false positives until that's done. That would certainly get the job done for TH04 and TH05 in just a few more pushes, but then leave all the proving work (and the road to 100% PI on the front page) to reverse-engineering.

I don't know. Would it be worth it to have a game that's „maybe fully position-independent“, only for there to maybe be rare edge cases where it isn't?

Or maybe, continuing to strike a balance between identifying false positives (fast) and reverse-engineering structures (slow) will continue to work out like it did now, and make us end up close to the current estimate, which was attractive enough to sell out the crowdfunding for the first time… 🤔

Please give feedback! If possible, by Friday evening UTC+1, before I start working on the next PI push, this time with a focus on TH04.