⮜ Blog

⮜ List of tags

Showing all posts tagged tasm- and th04-

📝 Posted:
🚚 Summary of:
P0137
Commits:
07bfcf2...8d953dc
💰 Funded by:
[Anonymous]
🏷 Tags:
rec98+ th02+ th03+ th04- th05+ build-process+ meta+ contribution-ideas+ mod+ tasm- tcc+

Whoops, the build was broken again? Since P0127 from mid-November 2020, on TASM32 version 5.3, which also happens to be the one in the DevKit… That version changed the alignment for the default segments of certain memory models when requesting .386 support. And since redefining segment alignment apparently is highly illegal and absolutely has to be a build error, some of the stand-alone .ASM translation units didn't assemble anymore on this version. I've only spotted this on my own because I casually compiled ReC98 somewhere else – on my development system, I happened to have TASM32 version 5.0 in the PATH during all this time.
At least this was a good occasion to get rid of some weird segment alignment workarounds from 2015, and replace them with the superior convention of using the USE16 modifier for the .MODEL directive.

ReC98 would highly benefit from a build server – both in order to immediately spot issues like this one, and as a service for modders. Even more so than the usual open-source project of its size, I would say. But that might be exactly because it doesn't seem like something you can trivially outsource to one of the big CI providers for open-source projects, and quickly set it up with a few lines of YAML.
That might still work in the beginning, and we might get by with a regular 64-bit Windows 10 and DOSBox running the exact build tools from the DevKit. Ideally, though, such a server should really run the optimal configuration of a 32-bit Windows 10, allowing both the 32-bit and the 16-bit build step to run natively, which already is something that no popular CI service out there offers. Then, we'd optimally expand to Linux, every other Windows version down to 95, emulated PC-98 systems, other TASM versions… yeah, it'd be a lot. An experimental project all on its own, with additional hosting costs and probably diminishing returns, the more it expands…
I've added it as a category to the order form, let's see how much interest there is once the store reopens (which will be at the beginning of May, at the latest). That aside, it would 📝 also be a great project for outside contributors!


So, technical debt, part 8… and right away, we're faced with TH03's low-level input function, which 📝 once 📝 again 📝 insists on being word-aligned in a way we can't fake without duplicating translation units. Being undecompilable isn't exactly the best property for a function that has been interesting to modders in the past: In 2018, spaztron64 created an ASM-level mod that hardcoded more ergonomic key bindings for human-vs-human multiplayer mode: 2021-04-04-TH03-WASD-2player.zip However, this remapping attempt remained quite limited, since we hadn't (and still haven't) reached full position independence for TH03 yet. There's quite some potential for size optimizations in this function, which would allow more BIOS key groups to already be used right now, but it's not all that obvious to modders who aren't intimately familiar with x86 ASM. Therefore, I really wouldn't want to keep such a long and important function in ASM if we don't absolutely have to…

… and apparently, that's all the motivation I needed? So I took the risk, and spent the first half of this push on reverse-engineering TCC.EXE, to hopefully find a way to get word-aligned code segments out of Turbo C++ after all.

And there is! The -WX option, used for creating DPMI applications, messes up all sorts of code generation aspects in weird ways, but does in fact mark the code segment as word-aligned. We can consider ourselves quite lucky that we get to use Turbo C++ 4.0, because this feature isn't available in any previous version of Borland's C++ compilers.
That allowed us to restore all the decompilations I previously threw away… well, two of the three, that lookup table generator was too much of a mess in C. :tannedcirno: But what an abuse this is. The subtly different code generation has basically required one creative workaround per usage of -WX. For example, enabling that option causes the regular PUSH BP and POP BP prolog and epilog instructions to be wrapped with INC BP and DEC BP, for some reason:

a_function_compiled_with_wx proc
	inc 	bp    	; ???
	push	bp
	mov 	bp, sp
	    	      	; [… function code …]
	pop 	bp
	dec 	bp    	; ???
	ret
a_function_compiled_with_wx endp

Luckily again, all the functions that currently require -WX don't set up a stack frame and don't take any parameters.
While this hasn't directly been an issue so far, it's been pretty close: snd_se_reset(void) is one of the functions that require word alignment. Previously, it shared a translation unit with the immediately following snd_se_play(int new_se), which does take a parameter, and therefore would have had its prolog and epilog code messed up by -WX. Since the latter function has a consistent (and thus, fakeable) alignment, I simply split that code segment into two, with a new -WX translation unit for just snd_se_reset(void). Problem solved – after all, two C++ translation units are still better than one ASM translation unit. :onricdennat: Especially with all the previous #include improvements.

The rest was more of the usual, getting us 74% done with repaying the technical debt in the SHARED segment. A lot of the remaining 26% is TH04 needing to catch up with TH03 and TH05, which takes comparatively little time. With some good luck, we might get this done within the next push… that is, if we aren't confronted with all too many more disgusting decompilations, like the two functions that ended this push. If we are, we might be needing 10 pushes to complete this after all, but that piece of research was definitely worth the delay. Next up: One more of these.

📝 Posted:
🚚 Summary of:
P0126, P0127
Commits:
6c22af7...8b01657, 8b01657...dc65b59
💰 Funded by:
Blue Bolt, [Anonymous]
🏷 Tags:
rec98+ th03+ th04- th05+ pc98+ micro-optimization+ tcc+ tasm- meta+

Alright, back to continuing the master.hpp transition started in P0124, and repaying technical debt. The last blog post already announced some ridiculous decompilations… and in fact, not a single one of the functions in these two pushes was decompilable into idiomatic C/C++ code.

As usual, that didn't keep me from trying though. The TH04 and TH05 version of the infamous 16-pixel-aligned, EGC-accelerated rectangle blitting function from page 1 to page 0 was fairly average as far as unreasonable decompilations are concerned.
The big blocker in TH03's MAIN.EXE, however, turned out to be the .MRS functions, used to render the gauge attack portraits and bomb backgrounds. The blitting code there uses the additional FS and GS segment registers provided by the Intel 386… which

  1. are not supported by Turbo C++'s inline assembler, and
  2. can't be turned into pointers, due to a compiler bug in Turbo C++ that generates wrong segment prefix opcodes for the _FS and _GS pseudo-registers.

Apparently I'm the first one to even try doing that with this compiler? I haven't found any other mention of this bug…
Compiling via assembly (#pragma inline) would work around this bug and generate the correct instructions. But that would incur yet another dependency on a 16-bit TASM, for something honestly quite insignificant.

What we can always do, however, is using __emit__() to simply output x86 opcodes anywhere in a function. Unlike spelled-out inline assembly, that can even be used in helper functions that are supposed to inline… which does in fact allow us to fully abstract away this compiler bug. Regular if() comparisons with pseudo-registers wouldn't inline, but "converting" them into C++ template function specializations does. All that's left is some C preprocessor abuse to turn the pseudo-registers into types, and then we do retain a normal-looking poke() call in the blitting functions in the end. 🤯

Yeah… the result is batshit insane. I may have gone too far in a few places…


One might certainly argue that all these ridiculous decompilations actually hurt the preservation angle of this project. "Clearly, ZUN couldn't have possibly written such unreasonable C++ code. So why pretend he did, and not just keep it all in its more natural ASM form?" Well, there are several reasons:

Unfortunately, these pushes also demonstrated a second disadvantage in trying to decompile everything possible: Since Turbo C++ lacks TASM's fine-grained ability to enforce code alignment on certain multiples of bytes, it might actually be unfeasible to link in a C-compiled object file at its intended original position in some of the .EXE files it's used in. Which… you're only going to notice once you encounter such a case. Due to the slightly jumbled order of functions in the 📝 second, shared code segment, that might be long after you decompiled and successfully linked in the function everywhere else.

And then you'll have to throw away that decompilation after all 😕 Oh well. In this specific case (the lookup table generator for horizontally flipping images), that decompilation was a mess anyway, and probably helped nobody. I could have added a dummy .OBJ that does nothing but enforce the needed 2-byte alignment before the function if I really insisted on keeping the C version, but it really wasn't worth it.


Now that I've also described yet another meta-issue, maybe there'll really be nothing to say about the next technical debt pushes? :onricdennat: Next up though: Back to actual progress again, with TH01. Which maybe even ends up pushing that game over the 50% RE mark?

📝 Posted:
🚚 Summary of:
P0113, P0114
Commits:
150d2c6...6204fdd, 6204fdd...967bb8b
💰 Funded by:
Lmocinemod
🏷 Tags:
rec98+ th02+ th03+ th04- build-process+ pipeline+ tasm- blitting+ file-format+ input+

Alright, tooling and technical debt. Shouldn't be really much to talk about… oh, wait, this is still ReC98 :tannedcirno:

For the tooling part, I finished up the remaining ergonomics and error handling for the 📝 sprite converter that Jonathan Campbell contributed two months ago. While I familiarized myself with the tool, I've actually ran into some unreported errors myself, so this was sort of important to me. Still got no command-line help in there, but the error messages can now do that job probably even better, since we would have had to write them anyway.

So, what's up with the technical debt then? Well, by now we've accumulated quite a number of 📝 ASM code slices that need to be either decompiled or clearly marked as undecompilable. Since we define those slices as "already reverse-engineered", that decision won't affect the numbers on the front page at all. But for a complete decompilation, we'd still have to do this someday. So, rather than incorporating this work into pushes that were purchased with the expectation of measurable progress in a certain area, let's take the "anything goes" pushes, and focus entirely on that during them.

The second code segment seemed like the best place to start with this, since it affects the largest number of games simultaneously. Starting with TH02, this segment contains a set of random "core" functions needed by the binary. Image formats, sounds, input, math, it's all there in some capacity. You could maybe call it all "libzun" or something like that? But for the time being, I simply went with the obvious name, seg2. Maybe I'll come up with something more convincing in the future.


Oh, but wait, why were we assembling all the previous undecompilable ASM translation units in the 16-bit build part? By moving those to the 32-bit part, we don't even need a 16-bit TASM in our list of dependencies, as long as our build process is not fully 16-bit.
And with that, ReC98 now also builds on Windows 95, and thus, every 32-bit Windows version. 🎉 Which is certainly the most user-visible improvement in all of these two pushes. :onricdennat:


Back in 2015, I already decompiled all of TH02's seg2 functions. As suggested by the Borland compiler, I tried to follow a "one translation unit per segment" layout, bundling the binary-specific contents via #include. In the end, it required two translation units – and that was even after manually inserting the original padding bytes via #pragma codestring… yuck. But it worked, compiled, and kept the linker's job (and, by extension, segmentation worries) to a minimum. And as long as it all matched the original binaries, it still counted as a valid reconstruction of ZUN's code. :zunpet:

However, that idea ultimately falls apart once TH03 starts mixing undecompilable ASM code inbetween C functions. Now, we officially have no choice but to use multiple C and ASM translation units, with maybe only just one or two #includes in them…

…or we finally start reconstructing the actual seg2 library, turning every sequence of related functions into its own translation unit. This way, we can simply reuse the once-compiled .OBJ files for all the binaries those functions appear in, without requiring that additional layer of translation units mirroring the original segmentation.
The best example for this is TH03's almost undecompilable function that generates a lookup table for horizontally flipping 8 1bpp pixels. It's part of every binary since TH03, but only used in that game. With the previous approach, we would have had to add 9 C translation units, which would all have just #included that one file. Now, we simply put the .OBJ file into the correct place on the linker command line, as soon as we can.

💡 And suddenly, the linker just inserts the correct padding bytes itself.

The most immediate gains there also happened to come from TH03. Which is also where we did get some tiny RE% and PI% gains out of this after all, by reverse-engineering some of its sprite blitting setup code. Sure, I should have done even more RE here, to also cover those 5 functions at the end of code segment #2 in TH03's MAIN.EXE that were in front of a number of library functions I already covered in this push. But let's leave that to an actual RE push 😛


All in all though, I was just getting started with this; the real gains in terms of removed ASM files are still to come. But in the meantime, the funding situation has become even better in terms of allowing me to focus on things nobody asked for. 🙂 So here's a slightly better idea: Instead of spending two more pushes on this, let's shoot for TH05 MAINE.EXE position independence next. If I manage to get it done, we'll have a 100% position-independent TH05 by the time -Tom- finishes his MAIN.EXE PI demo, rather than the 94% we'd get from just MAIN.EXE. That's bound to make a much better impression on all the people who will then (re-)discover the project.

📝 Posted:
🚚 Summary of:
P0109
Commits:
dcf4e2c...2c7d86b
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th04- th05+ gameplay+ bullet+ micro-optimization+ glitch+ uth05win+ tasm-

Back to TH05! Thanks to the good funding situation, I can strike a nice balance between getting TH05 position-independent as quickly as possible, and properly reverse-engineering some missing important parts of the game. Once 100% PI will get the attention of modders, the code will then be in better shape, and a bit more usable than if I just rushed that goal.

By now, I'm apparently also pretty spoiled by TH01's immediate decompilability, after having worked on that game for so long. Reverse-engineering in ASM land is pretty annoying, after all, since it basically boils down to meticulously editing a piece of ASM into something I can confidently call "reverse-engineered". Most of the time, simply decompiling that piece of code would take just a little bit longer, but be massively more useful. So, I immediately tried decompiling with TH05… and it just worked, at every place I tried!? Whatever the issue was that made 📝 segment splitting so annoying at my first attempt, I seem to have completely solved it in the meantime. 🤷 So yeah, backers can now request pretty much any part of TH04 and TH05 to be decompiled immediately, with no additional segment splitting cost.

(Protip for everyone interested in starting their own ReC project: Just declare one segment per function, right from the start, then group them together to restore the original code segmentation…)


Except that TH05 then just throws more of its infamous micro-optimized and undecompilable ASM at you. 🙄 This push covered the function that adjusts the bullet group template based on rank and the selected difficulty, called every time such a group is configured. Which, just like pretty much all of TH05's bullet spawning code, is one of those undecompilable functions. If C allowed labels of other functions as goto targets, it might have been decompilable into something useful to modders… maybe. But like this, there's no point in even trying.

This is such a terrible idea from a software architecture point of view, I can't even. Because now, you suddenly have to mirror your C++ declarations in ASM land, and keep them in sync with each other. I'm always happy when I get to delete an ASM declaration from the codebase once I've decompiled all the instances where it was referenced. But for TH05, we now have to keep those declarations around forever. 😕 And all that for a performance increase you probably couldn't even measure. Oh well, pulling off Galaxy Brain-level ASM optimizations is kind of fun if you don't have portability plans… I guess?

If I started a full fangame mod of a PC-98 Touhou game, I'd base it on TH04 rather than TH05, and backport selected features from TH05 as needed. Just because it was released later doesn't make it better, and this is by far not the only one of ZUN's micro-optimizations that just went way too far.

Dropping down to ASM also makes it easier to introduce weird quirks. Decompiled, one of TH05's tuning conditions for stack groups on Easy Mode would look something like:

case BP_STACK:
	// […]
	if(spread_angle_delta >= 2) {
		stack_bullet_count--;
	}

The fields of the bullet group template aren't typically reset when setting up a new group. So, spread_angle_delta in the context of a stack group effectively refers to "the delta angle of the last spread group that was fired before this stack – whenever that was". uth05win also spotted this quirk, considered it a bug, and wrote fanfiction by changing spread_angle_delta to stack_bullet_count.
As usual for functions that occur in more than one game, I also decompiled the TH04 bullet group tuning function, and it's perfectly sane, with no such quirks.


In the more PI-focused parts of this push, we got the TH05-exclusive smooth boss movement functions, for flying randomly or towards a given point. Pretty unspectacular for the most part, but we've got yet another uth05win inconsistency in the latter one. Once the Y coordinate gets close enough to the target point, it actually speeds up twice as much as the X coordinate would, whereas uth05win used the same speedup factors for both. This might make uth05win a couple of frames slower in all boss fights from Stage 3 on. Hard to measure though – and boss movement partly depends on RNG anyway.


Next up: Shinki's background animations – which are actually the single biggest source of position dependence left in TH05.

📝 Posted:
🚚 Summary of:
P0031, P0032, P0033
Commits:
dea40ad...9f764fa, 9f764fa...e6294c2, e6294c2...6cdd229
💰 Funded by:
zorg
🏷 Tags:
rec98+ th02+ th04- th05+ file-format+ hud+ score+ tasm- tcc+ micro-optimization+ jank+

The glacial pace continues, with TH05's unnecessarily, inappropriately micro-optimized, and hence, un-decompilable code for rendering the current and high score, as well as the enemy health / dream / power bars. While the latter might still pass as well-written ASM, the former goes to such ridiculous levels that it ends up being technically buggy. If you enjoy quality ZUN code, it's definitely worth a read.

In TH05, this all still is at the end of code segment #1, but in TH04, the same code lies all over the same segment. And since I really wanted to move that code into its final form now, I finally did the research into decompiling from anywhere else in a segment.

Turns out we actually can! It's kinda annoying, though: After splitting the segment after the function we want to decompile, we then need to group the two new segments back together into one "virtual segment" matching the original one. But since all ASM in ReC98 heavily relies on being assembled in MASM mode, we then start to suffer from MASM's group addressing quirk. Which then forces us to manually prefix every single function call

with the group name. It's stupidly boring busywork, because of all the function calls you mustn't prefix. Special tooling might make this easier, but I don't have it, and I'm not getting crowdfunded for it.

So while you now definitely can request any specific thing in any of the 5 games to be decompiled right now, it will take slightly longer, and cost slightly more.
(Except for that one big segment in TH04, of course.)

Only one function away from the TH05 shot type control functions now!