⮜ Blog

⮜ List of tags

Showing all posts tagged portability-

📝 Posted:
🚚 Summary of:
💰 Funded by:
🏷 Tags:
rec98+ th05+ blitting+ portability- micro-optimization+ jank+ tasm+ tcc+

Technical debt, part 5… and we only got TH05's stupidly optimized .PI functions this time?

As far as actual progress is concerned, that is. In maintenance news though, I was really hyped for the #include improvements I've mentioned in 📝 the last post. The result: A new x86real.h file, bundling all the declarations specific to the 16-bit x86 Real Mode in a smaller file than Turbo C++'s own DOS.H. After all, DOS is something else than the underlying CPU. And while it didn't speed up build times quite as much as I had hoped, it now clearly indicates the x86-specific parts of PC-98 Touhou code to future port authors.

After another couple of improvements to parameter declaration in ASM land, we get to TH05's .PI functions… and really, why did ZUN write all of them in ASM? Why (re)declare all the necessary structures and data in ASM land, when all these functions are merely one layer of abstraction above master.lib, which does all the actual work?
I get that ZUN might have wanted masked blitting to be faster, which is used for the fade-in effect seen during TH05's main menu animation and the ending artwork. But, uh… he knew how to modify master.lib. In fact, he did already modify the graph_pack_put_8() function used for rendering a single .PI image row, to ignore master.lib's VRAM clipping region. For this effect though, he first blits each row regularly to the invisible 400th row of VRAM, and then does an EGC-accelerated VRAM-to-VRAM blit of that row to its actual target position with the mask enabled. It would have been way more efficient to add another version of this function that takes a mask pattern. No amount of REP MOVSW is going to change the fact that two VRAM writes per line are slower than a single one. Not to mention that it doesn't justify writing every other .PI function in ASM to go along with it…
This is where we also find the most hilarious aspect about this: For most of ZUN's pointless micro-optimizations, you could have maybe made the argument that they do save some CPU cycles here and there, and therefore did something positive to the final, PC-98-exclusive result. But some of the hand-written ASM here doesn't even constitute a micro-optimization, because it's worse than what you would have got out of even Turbo C++ 4.0J with its 80386 optimization flags! :zunpet:

At least it was possible to "decompile" 6 out of the 10 functions here, making them easy to clean up for future modders and port authors. Could have been 7 functions if I also decided to "decompile" pi_free(), but all the C++ code is already surrounded by ASM, resulting in 2 ASM translation units and 2 C++ translation units. pi_free() would have needed a single translation unit by itself, which wasn't worth it, given that I would have had to spell out every single ASM instruction anyway.

void pascal pi_free(int slot)
	if(pi_buffers[slot]) {
		graph_pi_free(&pi_headers[slot], &pi_buffers[slot]);
		pi_buffers[slot] = NULL;

There you go. What about this needed to be written in ASM?!?

The function calls between these small translation units even seemed to glitch out TASM and the linker in the end, leading to one CALL offset being weirdly shifted by 32 bytes. Usually, TLINK reports a fixup overflow error when this happens, but this time it didn't, for some reason? Mirroring the segment grouping in the affected translation unit did solve the problem, and I already knew this, but only thought of it after spending quite some RTFM time… during which I discovered the -lE switch, which enables TLINK to use the expanded dictionaries in Borland's .OBJ and .LIB files to speed up linking. That shaved off roughly another second from the build time of the complete ReC98 repository. The more you know… Binary blobs compiled with non-Borland tools would be the only reason not to use this flag.

So, even more slowdown with this 5th dedicated push, since we've still only repaid 41% of the technical debt in the SHARED segment so far. Next up: Part 6, which hopefully manages to decompile the FM and SSG channel animations in TH05's Music Room, and hopefully ends up being the final one of the slow ones.

📝 Posted:
🚚 Summary of:
P0088, P0089
97ce7b7...da6b856, da6b856...90252cc
💰 Funded by:
-Tom-, [Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ enemy+ hud+ animation+ portability-

As expected, we've now got the TH04 and TH05 stage enemy structure, finishing position independence for all big entity types. This one was quite straightfoward, as the .STD scripting system is pretty simple.

Its most interesting aspect can be found in the way timing is handled. In Windows Touhou, all .ECL script instructions come with a frame field that defines when they are executed. In TH04's and TH05's .STD scripts, on the other hand, it's up to each individual instruction to add a frame time parameter, anywhere in its parameter list. This frame time defines for how long this instruction should be repeatedly executed, before it manually advances the instruction pointer to the next one. From what I've seen so far, these instruction typically apply their effect on the first frame they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy structure only stores one single counter for the current loop iteration.

Just from the structure, the only innovation introduced by TH05 seems to have been enemy subtypes. These can be used to parametrize scripts via conditional jumps based on this value, as a first attempt at cutting down the need to duplicate entire scripts for similar enemy behavior. And thanks to TH05's favorable segment layout, this game's version of the .STD enemy script interpreter is even immediately ready for decompilation, in one single future push.

As far as I can tell, that now only leaves

  • .MPN file loading
  • player bomb animations
  • some structures specific to the Shinki and EX-Alice battles
  • plus some smaller things I've missed over the years
until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front page. And with that many false positives, it's quite easy to get lost with immediately reverse-engineering everything around them. This time, the rendering of the text dissolve circles, used for the stage and BGM title popups, caught my eye… and since the high-level code to handle all of that was near the end of a segment in both TH04 and TH05, I just decided to immediately decompile it all. Like, how hard could it possibly be? Sure, it needed another segment split, which was a bit harder due to all the existing ASM referencing code in that segment, but certainly not impossible…

Oh wait, this code depends on 9 other sets of identifiers that haven't been declared in C land before, some of which require vast reorganizations to bring them up to current consistency standards. Whoops! Good thing that this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(), which marks the stage background tiles within a rectangular area to be redrawn this frame. And this one must have had the hardest function signature to figure out in all of PC-98 Touhou, because it actually seems impossible. Looking at all the ways the game passes the center coordinate to this function, we have

  1. X and Y as 16-bit integer literals, merged into a single PUSH of a 32-bit immediate
  2. X and Y calculated and pushed independently from each other
  3. by-value copies of entire Point instances
Any single declaration would only lead to at most two of the three cases generating the original instructions. No way around separately declaring the function in every translation unit then, with the correct parameter list for the respective calls. That's how ZUN must have also written it.

Oh well, we would have needed to do all of this some time. At least there were quite a bit of insights to be gained from the actual decompilation, where using const references actually made it possible to turn quite a number of potentially ugly macros into wholesome inline functions.

But still, TH04 and TH05 will come out of ReC98's decompilation as one big mess. A lot of further manual decompilation and refactoring, beyond the limits of the original binary, would be needed to make these games portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a number of people see as the obvious choice for a first system to port PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort of decompiling all the remaining original ASM. But even with master.lib's MASTER_DOSV setting, these games still very much rely on PC-98 hardware, with corresponding assumptions all over ZUN's code. You will need to provide abstractions for the PC-98's superimposed text mode, the gaiji, and planar 4-bit color access in general, exchanging the use of the PC-98's GRCG and EGC blitter chips with something else. At that point, you might as well port the game to one generic 640×400 framebuffer and away from the constraints of DOS, resulting in that Doom source code-like situation which made that game easily portable to every architecture to begin with. But ZUN just wasn't a John Carmack, sorry.

Or what do I know. I've never programmed for IBM-compatible DOS, but maybe ReC98's audience does include someone who is intimately familiar with IBM-compatible DOS so that the constraints aren't much of an issue for them? But even then, 16-bit Windows would make much more sense as a first porting target if you don't want to bother with that undecompilable ASM.

At least I won't have to look at TH04 and TH05 for quite a while now. :tannedcirno: The delivery delays have made it obvious that my life has become pretty busy again, probably until September. With a total of 9 TH01 pushes from monthly subscriptions now waiting in the backlog, the shop will stay closed until I've caught up with most of these. Which I'm quite hyped for!