⮜ Blog

⮜ List of tags

Showing all posts tagged
and

📝 Posted:
🚚 Summary of:
P0148
Commits:
c940059...e1a26bb
💰 Funded by:
[Anonymous]
🏷 Tags:

Back after taking way too long to get Touhou Patch Center's MediaWiki update feature complete… I'm still waiting for more translators to test and review the new translation interface before delivering and deploying it all, which will most likely lead to another break from ReC98 within the next few months. For now though, I'm happy to have mostly addressed the nagging responsibility I still had after willing that site into existence, and to be back working on ReC98. 🙂

As announced, the next few pushes will focus on TH04's and TH05's bullet spawning code, before I get to put all that accumulated TH01 money towards finishing all of konngara's code in TH01. For a full picture of what's happening with bullets, we'd really also like to have the bullet update function as readable C code though.
Clearing all bullets on the playfield will trigger a Bonus!! popup, displayed as 📝 gaiji in that proportional font. Unfortunately, TLINK refused to link the code as soon as I referenced the function for animating the popups at the top of the playfield? Which can only mean that we have to decompile that function first… :thonk:

So, let's turn that piece of technical debt into a full push, and first decompile another random set of previously reverse-engineered TH04 and TH05 functions. Most of these are stored in a different place within the two MAIN.EXE binaries, and the tried-and-true method of matching segment names would therefore have introduced several unnecessary translation units. So I resorted to a segment splitting technique I should have started using way earlier: Simply creating new segments with names derived from their functions, at the exact positions they're needed. All the new segment start and end directives do bloat the ASM code somewhat, and certainly contributed to this push barely removing any actual lines of code. However, what we get in return is total freedom as far as decompilation order is concerned, 📝 which should be the case for any ReC project, really. And in the end, all these tiny code segments will cancel out anyway.
If only we could do the same with the data segment…


The popup function happened to be the final one I RE'd before my long break in the spring of 2019. Back then, I didn't even bother looking into that 64-frame delay between changing popups, and what that meant for the game.
Each of these popups stays on screen for 128 frames, during which, of course, another popup-worthy event might happen. Handling this cleanly without removing previous popups too early would involve some sort of event queue, whose size might even be meaningfully limited to the number of distinct events that can happen. But still, that'd be a data structure, and we're not gonna have that! :zunpet: Instead, ZUN simply keeps two variables for the new and current popup ID. During an active popup, any change to that ID will only be committed once the current popup has been shown for at least 64 frames. And during that time, that new ID can be freely overwritten with a different one, which drops any previous, undisplayed event. But surely, there won't be more than two events happening within 63 frames, right? :tannedcirno:

The rest was fairly uneventful – no newly RE'd functions in this push, after all – until I reached the widely used helper function for applying the current vertical scrolling offset to a Y coordinate. Its combination of a function parameter, the pascal calling convention, and no stack frame was previously thought to be undecompilable… except that it isn't, and the decompilation didn't even require any new workarounds to be developed? Good thing that I already forgot how impossible it was to decompile the first function I looked at that fell into this category!
Oh well, this discovery wasn't too groundbreaking. Looking back at all the other functions with that combination only revealed a grand total of 1 additional one where a decompilation made sense: TH05's version of snd_kaja_interrupt(), which is now compiled from the same C++ file for all 4 games that use it. And well, looks like some quirks really remain unnoticed and undocumented until you look at a function for the 11th time: Its return value is undefined if BGM is inactive – that is, if the user disabled it, or if no FM board is installed. Not that it matters for the original code, which never uses this function to retrieve anything from KAJA's drivers. But people apparently do copy ReC98 code into their own projects, so it is something to keep in mind.


All in all, nothing quite at jank level in this one, but we were surely grazing that tag. Next up, with that out of the way: The bullet update/step function! Very soon in fact, since I've mostly got it done already.

📝 Posted:
🚚 Summary of:
P0118
Commits:
0bb5bc3...cbf14eb
💰 Funded by:
-Tom-, Ember2528
🏷 Tags:

🎉 TH05 is finally fully position-independent! 🎉 To celebrate this milestone, -Tom- coded a little demo, which we recorded on both an emulator and on real PC-98 hardware:

For all the new people who are unfamiliar with PC-98 Touhou internals: Boss behavior is hardcoded into MAIN.EXE, rather than being scriptable via separate .ECL files like in Windows Touhou. That's what makes this kind of a big deal.


What does this mean?

You can now freely add or remove both data and code anywhere in TH05, by editing the ReC98 codebase, writing your mod in ASM or C/C++, and recompiling the code. Since all absolute memory addresses have now been converted to labels, this will work without causing any instability. See the position independence section in the FAQ for a more thorough explanation about why this was a problem.

By extension, this also means that it's now theoretically possible to use a different compiler on the source code. But:

What does this not mean?

The original ZUN code hasn't been completely reverse-engineered yet, let alone decompiled. As the final PC-98 Touhou game, TH05 also happens to have the largest amount of actual ZUN-written ASM that can't ever be decompiled within ReC98's constraints of a legit source code reconstruction. But a lot of the originally-in-C code is also still in ASM, which might make modding a bit inconvenient right now. And while I have decompiled a bunch of functions, I selected them largely because they would help with PI (as requested by the backers), and not because they are particularly relevant to typical modding interests.

As a result, the code might also be a bit confusingly organized. There's quite a conflict between various goals there: On the one hand, I'd like to only have a single instance of every function shared with earlier games, as well as reduce ZUN's code duplication within a single game. On the other hand, this leads to quite a lot of code being scattered all over the place and then #include-pasted back together, except for the places where 📝 this doesn't work, and you'd have to use multiple translation units anyway… I'm only beginning to figure out the best structure here, and some more reverse-engineering attention surely won't hurt.

Also, keep in mind that the code still targets x86 Real Mode. To work effectively in this codebase, you'd need some familiarity with memory segmentation, and how to express it all in code. This tends to make even regular C++ development about an order of magnitude harder, especially once you want to interface with the remaining ASM code. That part made -Tom- struggle quite a bit with implementing his custom scripting language for the demo above. For now, he built that demo on quite a limited foundation – which is why he also chose to release neither the build nor the source publically for the time being.
So yeah, you're definitely going to need the TASM and Borland C++ manuals there.

tl;dr: We now know everything about this game's data, but not quite as much about this game's code.

So, how long until source ports become a realistic project?

You probably want to wait for 100% RE, which is when everything that can be decompiled has been decompiled.

Unless your target system is 16-bit Windows, in which case you could theoretically start right away. 📝 Again, this would be the ideal first system to port PC-98 Touhou to: It would require all the generic portability work to remove the dependency on PC-98 hardware, thus paving the way for a subsequent port to modern systems, yet you could still just drop in any undecompiled ASM.

Porting to IBM-compatible DOS would only be a harder and less universally useful version of that. You'd then simply exchange one architecture, with its idiosyncrasies and limits, for another, with its own set of idiosyncrasies and limits. (Unless, of course, you already happen to be intimately familiar with that architecture.) The fact that master.lib provides DOS/V support would have only mattered if ZUN consistently used it to abstract away PC-98 hardware at every single place in the code, which is definitely not the case.


The list of actually interesting findings in this push is, 📝 again, very short. Probably the most notable discovery: The low-level part of the code that renders Marisa's laser from her TH04 Illusion Laser shot type is still present in TH05. Insert wild mass guessing about potential beta version shot types… Oh, and did you know that the order of background images in the Extra Stage staff roll differs by character?

Next up: Finally driving up the RE% bar again, by decompiling some TH05 main menu code.

📝 Posted:
🚚 Summary of:
P0088, P0089
Commits:
97ce7b7...da6b856, da6b856...90252cc
💰 Funded by:
-Tom-, [Anonymous], Blue Bolt
🏷 Tags:

As expected, we've now got the TH04 and TH05 stage enemy structure, finishing position independence for all big entity types. This one was quite straightfoward, as the .STD scripting system is pretty simple.

Its most interesting aspect can be found in the way timing is handled. In Windows Touhou, all .ECL script instructions come with a frame field that defines when they are executed. In TH04's and TH05's .STD scripts, on the other hand, it's up to each individual instruction to add a frame time parameter, anywhere in its parameter list. This frame time defines for how long this instruction should be repeatedly executed, before it manually advances the instruction pointer to the next one. From what I've seen so far, these instruction typically apply their effect on the first frame they run on, and then do nothing for the remaining frames.
Oh, and you can't nest the LOOP instruction, since the enemy structure only stores one single counter for the current loop iteration.

Just from the structure, the only innovation introduced by TH05 seems to have been enemy subtypes. These can be used to parametrize scripts via conditional jumps based on this value, as a first attempt at cutting down the need to duplicate entire scripts for similar enemy behavior. And thanks to TH05's favorable segment layout, this game's version of the .STD enemy script interpreter is even immediately ready for decompilation, in one single future push.

As far as I can tell, that now only leaves

until TH05's MAIN.EXE is completely position-independent.
Which, however, won't be all it needs for that 100% PI rating on the front page. And with that many false positives, it's quite easy to get lost with immediately reverse-engineering everything around them. This time, the rendering of the text dissolve circles, used for the stage and BGM title popups, caught my eye… and since the high-level code to handle all of that was near the end of a segment in both TH04 and TH05, I just decided to immediately decompile it all. Like, how hard could it possibly be? Sure, it needed another segment split, which was a bit harder due to all the existing ASM referencing code in that segment, but certainly not impossible…

Oh wait, this code depends on 9 other sets of identifiers that haven't been declared in C land before, some of which require vast reorganizations to bring them up to current consistency standards. Whoops! Good thing that this is the part of the project I'm still offering for free…
Among the referenced functions was tiles_invalidate_around(), which marks the stage background tiles within a rectangular area to be redrawn this frame. And this one must have had the hardest function signature to figure out in all of PC-98 Touhou, because it actually seems impossible. Looking at all the ways the game passes the center coordinate to this function, we have

  1. X and Y as 16-bit integer literals, merged into a single PUSH of a 32-bit immediate
  2. X and Y calculated and pushed independently from each other
  3. by-value copies of entire Point instances

Any single declaration would only lead to at most two of the three cases generating the original instructions. No way around separately declaring the function in every translation unit then, with the correct parameter list for the respective calls. That's how ZUN must have also written it.

Oh well, we would have needed to do all of this some time. At least there were quite a bit of insights to be gained from the actual decompilation, where using const references actually made it possible to turn quite a number of potentially ugly macros into wholesome inline functions.

But still, TH04 and TH05 will come out of ReC98's decompilation as one big mess. A lot of further manual decompilation and refactoring, beyond the limits of the original binary, would be needed to make these games portable to any non-PC-98, non-x86 architecture.
And yes, that includes IBM-compatible DOS – which, for some reason, a number of people see as the obvious choice for a first system to port PC-98 Touhou to. This will barely be easier. Sure, you'll save the effort of decompiling all the remaining original ASM. But even with master.lib's MASTER_DOSV setting, these games still very much rely on PC-98 hardware, with corresponding assumptions all over ZUN's code. You will need to provide abstractions for the PC-98's superimposed text mode, the gaiji, and planar 4-bit color access in general, exchanging the use of the PC-98's GRCG and EGC blitter chips with something else. At that point, you might as well port the game to one generic 640×400 framebuffer and away from the constraints of DOS, resulting in that Doom source code-like situation which made that game easily portable to every architecture to begin with. But ZUN just wasn't a John Carmack, sorry.

Or what do I know. I've never programmed for IBM-compatible DOS, but maybe ReC98's audience does include someone who is intimately familiar with IBM-compatible DOS so that the constraints aren't much of an issue for them? But even then, 16-bit Windows would make much more sense as a first porting target if you don't want to bother with that undecompilable ASM.

At least I won't have to look at TH04 and TH05 for quite a while now. :tannedcirno: The delivery delays have made it obvious that my life has become pretty busy again, probably until September. With a total of 9 TH01 pushes from monthly subscriptions now waiting in the backlog, the shop will stay closed until I've caught up with most of these. Which I'm quite hyped for!

📝 Posted:
🚚 Summary of:
P0086, P0087
Commits:
54ee99b...24b96cd, 24b96cd...97ce7b7
💰 Funded by:
[Anonymous], Blue Bolt, -Tom-
🏷 Tags:

Alright, the score popup numbers shown when collecting items or defeating (mid)bosses. The second-to-last remaining big entity type in TH05… with quite some PI false positives in the memory range occupied by its data. Good thing I still got some outstanding generic RE pushes that haven't been claimed for anything more specific in over a month! These conveniently allowed me to RE most of these functions right away, the right way.

Most of the false positives were boss HP values, passed to a "boss phase end" function which sets the HP value at which the next phase should end. Stage 6 Yuuka, Mugetsu, and EX-Alice have their own copies of this function, in which they also reset certain boss-specific global variables. Since I always like to cover all varieties of such duplicated functions at once, it made sense to reverse-engineer all the involved variables while I was at it… and that's why this was exactly the right time to cover the implementation details of Stage 6 Yuuka's parasol and vanishing animations in TH04. :zunpet:

With still a bit of time left in that RE push afterwards, I could also start looking into some of the smaller functions that didn't quite fit into other pushes. The most notable one there was a simple function that aims from any point to the current player position. Which actually only became a separate function in TH05, probably since it's called 27 times in total. That's 27 places no longer being blocked from further RE progress.

WindowsTiger already did most of the work for the score popup numbers in January, which meant that I only had to review it and bring it up to ReC98's current coding styles and standards. This one turned out to be one of those rare features whose TH05 implementation is significantly less insane than the TH04 one. Both games lazily redraw only the tiles of the stage background that were drawn over in the previous frame, and try their best to minimize the amount of tiles to be redrawn in this way. For these popup numbers, this involves calculating the on-screen width, based on the exact number of digits in the point value. TH04 calculates this width every frame during the rendering function, and even resorts to setting that field through the digit iteration pointer via self-modifying code… yup. TH05, on the other hand, simply calculates the width once when spawning a new popup number, during the conversion of the point value to binary-coded decimal. The "×2" multiplier suffix being removed in TH05 certainly also helped in simplifying that feature in this game.

And that's ⅓ of TH05 reverse-engineered! Next up, one more TH05 PI push, in which the stage enemies hopefully finish all the big entity types. Maybe it will also be accompanied by another RE push? In any case, that will be the last piece of TH05 progress for quite some time. The next TH01 stretch will consist of 6 pushes at the very least, and I currently have no idea of how much time I can spend on ReC98 a month from now…

📝 Posted:
🚚 Summary of:
P0031, P0032, P0033
Commits:
dea40ad...9f764fa, 9f764fa...e6294c2, e6294c2...6cdd229
💰 Funded by:
zorg
🏷 Tags:

The glacial pace continues, with TH05's unnecessarily, inappropriately micro-optimized, and hence, un-decompilable code for rendering the current and high score, as well as the enemy health / dream / power bars. While the latter might still pass as well-written ASM, the former goes to such ridiculous levels that it ends up being technically buggy. If you enjoy quality ZUN code, it's definitely worth a read.

In TH05, this all still is at the end of code segment #1, but in TH04, the same code lies all over the same segment. And since I really wanted to move that code into its final form now, I finally did the research into decompiling from anywhere else in a segment.

Turns out we actually can! It's kinda annoying, though: After splitting the segment after the function we want to decompile, we then need to group the two new segments back together into one "virtual segment" matching the original one. But since all ASM in ReC98 heavily relies on being assembled in MASM mode, we then start to suffer from MASM's group addressing quirk. Which then forces us to manually prefix every single function call

with the group name. It's stupidly boring busywork, because of all the function calls you mustn't prefix. Special tooling might make this easier, but I don't have it, and I'm not getting crowdfunded for it.

So while you now definitely can request any specific thing in any of the 5 games to be decompiled right now, it will take slightly longer, and cost slightly more.
(Except for that one big segment in TH04, of course.)

Only one function away from the TH05 shot type control functions now!

📝 Posted:
🚚 Summary of:
P0051, P0052, P0053
Commits:
6ed8e60...3ba536a
💰 Funded by:
-Tom-
🏷 Tags:

Boss explosions! And… urgh, I really also had to wade through that overly complicated HUD rendering code. Even though I had to pick -Tom-'s 7th push here as well, the worst of that is still to come. TH04 and TH05 exclusively store the current and high score internally as unpacked little-endian BCD, with some pretty dense ASM code involving the venerable x86 BCD instructions to update it.

So, what's actually the goal here. Since I was given no priorities :onricdennat:, I still haven't had to (potentially) waste time researching whether we really can decompile from anywhere else inside a segment other than backwards from the end. So, the most efficient place for decompilation right now still is the end of TH05's main_01_TEXT segment. With maybe 1 or 2 more reverse-engineering commits, we'd have everything for an efficient decompilation up to sub_123AD. And that mass of code just happens to include all the shot type control functions, and makes up 3,007 instructions in total, or 12% of the entire remaining unknown code in MAIN.EXE.

So, the most reasonable thing would be to actually put some of the upcoming decompilation pushes towards reverse-engineering that missing part. I don't think that's a bad deal since it will allow us to mod TH05 shot types in C sooner, but zorg and qp might disagree :thonk:

Next up: thcrap TL notes, followed by finally finishing GhostPhanom's old ReC98 future-proofing pushes. I really don't want to decompile without a proper build system.