⮜ Blog

⮜ List of tags

Showing all posts tagged tcc-, rec98- and gameplay-

📝 Posted:
🚚 Summary of:
P0165, P0166, P0167
Commits:
7a0e5d8...f2bca01, f2bca01...e697907, e697907...c2de6ab
💰 Funded by:
Ember2528
🏷 Tags:
rec98- th01+ gameplay- animation+ blitting+ bullet+ glitch+ waste+ boss+ singyoku+ mima+ elis+ tcc- menu+

OK, TH01 missile bullets. Can we maybe have a well-behaved entity type, without any weirdness? Just once?

Ehh, kinda. Apart from another 150 bytes wasted on unused structure members, this code is indeed more on the low end in terms of overall jank. It does become very obvious why dodging these missiles in the YuugenMagan, Mima, and Elis fights feels so awful though: An unfair 46×46 pixel hitbox around Reimu's center pixel, combined with the comeback of 📝 interlaced rendering, this time in every stage. ZUN probably did this because missiles are the only 16×16 sprite in TH01 that is blitted to unaligned X positions, which effectively ends up touching a 32×16 area of VRAM per sprite.
But even if we assume VRAM writes to be the bottleneck here, it would have been totally possible to render every missile in every frame at roughly the same amount of CPU time that the original game uses for interlaced rendering:

  • Note that all missile sprites only use two colors, white and green.
  • Instead of naively going with the usual four bitplanes, extract the pixels drawn in each of the two used colors into their own bitplanes. master.lib calls this the "tiny format".
  • Use the GRCG to draw these two bitplanes in the intended white and green colors, halving the amount of VRAM writes compared to the original function.
  • (Not using the .PTN format would have also avoided the inconsistency of storing the missile sprites in boss-specific sprite slots.)
That's an optimization that would have significantly benefitted the game, in contrast to all of the fake ones introduced in later games. Then again, this optimization is actually something that the later games do, and it might have in fact been necessary to achieve their higher bullet counts without significant slowdown.

Unfortunately, it was only worth decompiling half of the missile code right now, thanks to gratuitous FPU usage in the other half, where 📝 double variables are compared to float literals. That one will have to wait 📝 until after SinGyoku.


After some effectively unused Mima sprite effect code that is so broken that it's impossible to make sense out of it, we get to the final feature I wanted to cover for all bosses in parallel before returning to Sariel: The separate sprite background storage for moving or animated boss sprites in the Mima, Elis, and Sariel fights. But, uh… why is this necessary to begin with? Doesn't TH01 already reserve the other VRAM page for backgrounds?
Well, these sprites are quite big, and ZUN didn't want to blit them from main memory on every frame. After all, TH01 and TH02 had a minimum required clock speed of 33 MHz, half of the speed required for the later three games. So, he simply blitted these boss sprites to both VRAM pages, leading the usual unblitting calls to only remove the other sprites on top of the boss. However, these bosses themselves want to move across the screen… and this makes it necessary to save the stage background behind them in some other way.

Enter .PTN, and its functions to capture a 16×16 or 32×32 square from VRAM into a sprite slot. No problem with that approach in theory, as the size of all these bigger sprites is a multiple of 32×32; splitting a larger sprite into these smaller 32×32 chunks makes the code look just a little bit clumsy (and, of course, slower).
But somewhere during the development of Mima's fight, ZUN apparently forgot that those sprite backgrounds existed. And once Mima's 🚫 casting sprite is blitted on top of her regular sprite, using just regular sprite transparency, she ends up with her infamous third arm:

Ironically, there's an unused code path in Mima's unblit function where ZUN assumes a height of 48 pixels for Mima's animation sprites rather than the actual 64. This leads to even clumsier .PTN function calls for the bottom 128×16 pixels… Failing to unblit the bottom 16 pixels would have also yielded that third arm, although it wouldn't have looked as natural. Still wouldn't say that it was intentional; maybe this casting sprite was just added pretty late in the game's development?


So, mission accomplished, Sariel unblocked… at 2¼ pushes. :thonk: That's quite some time left for some smaller stage initialization code, which bundles a bunch of random function calls in places where they logically really don't belong. The stage opening animation then adds a bunch of VRAM inter-page copies that are not only redundant but can't even be understood without knowing the hidden internal state of the last VRAM page accessed by previous ZUN code…
In better news though: Turbo C++ 4.0 really doesn't seem to have any complexity limit on inlining arithmetic expressions, as long as they only operate on compile-time constants. That's how we get macro-free, compile-time Shift-JIS to JIS X 0208 conversion of the individual code points in the 東方★靈異伝 string, in a compiler from 1994. As long as you don't store any intermediate results in variables, that is… :tannedcirno:

But wait, there's more! With still ¼ of a push left, I also went for the boss defeat animation, which includes the route selection after the SinGyoku fight.
As in all other instances, the 2× scaled font is accomplished by first rendering the text at regular 1× resolution to the other, invisible VRAM page, and then scaled from there to the visible one. However, the route selection is unique in that its scaled text is both drawn transparently on top of the stage background (not onto a black one), and can also change colors depending on the selection. It would have been no problem to unblit and reblit the text by rendering the 1× version to a position on the invisible VRAM page that isn't covered by the 2× version on the visible one, but ZUN (needlessly) clears the invisible page before rendering any text. :zunpet: Instead, he assigned a separate VRAM color for both the 魔界 and 地獄 options, and only changed the palette value for these colors to white or gray, depending on the correct selection. This is another one of the 📝 rare cases where TH01 demonstrates good use of PC-98 hardware, as the 魔界へ and 地獄へ strings don't need to be reblitted during the selection process, only the Orb "cursor" does.

Then, why does this still not count as good-code? When changing palette colors, you kinda need to be aware of everything else that can possibly be on screen, which colors are used there, and which aren't and can therefore be used for such an effect without affecting other sprites. In this case, well… hover over the image below, and notice how Reimu's hair and the bomb sprites in the HUD light up when Makai is selected:

This push did end on a high note though, with the generic, non-SinGyoku version of the defeat animation being an easily parametrizable copy. And that's how you decompile another 2.58% of TH01 in just slightly over three pushes.


Now, we're not only ready to decompile Sariel, but also Kikuri, Elis, and SinGyoku without needing any more detours into non-boss code. Thanks to the current TH01 funding subscriptions, I can plan to cover most, if not all, of Sariel in a single push series, but the currently 3 pending pushes probably won't suffice for Sariel's 8.10% of all remaining code in TH01. We've got quite a lot of not specifically TH01-related funds in the backlog to pass the time though.

Due to recent developments, it actually makes quite a lot of sense to take a break from TH01: spaztron64 has managed what every Touhou download site so far has failed to do: Bundling all 5 game onto a single .HDI together with pre-configured PC-98 emulators and a nice boot menu, and hosting the resulting package on a proper website. While this first release is already quite good (and much better than my attempt from 2014), there is still a bit of room for improvement to be gained from specific ReC98 research. Next up, therefore:

  • Researching how TH04 and TH05 use EMS memory, together with the cause behind TH04's crash in Stage 5 when playing as Reimu without an EMS driver loaded, and
  • reverse-engineering TH03's score data file format (YUME.NEM), which hopefully also comes with a way of building a file that unlocks all characters without any high scores.

📝 Posted:
🚚 Summary of:
P0160, P0161
Commits:
e491cd7...42ba4a5, 42ba4a5...81dd96e
💰 Funded by:
Yanga, [Anonymous]
🏷 Tags:
rec98- th01+ gameplay- resident+ bullet+ boss+ th01-mima+ animation+ waste+ glitch+ tcc-

Nothing really noteworthy in TH01's stage timer code, just yet another HUD element that is needlessly drawn into VRAM. Sure, ZUN applies his custom boldfacing effect on top of the glyphs retrieved from font ROM, but he could have easily installed those modified glyphs as gaiji.
Well, OK, halfwidth gaiji aren't exactly well documented, and sometimes not even correctly emulated 📝 due to the same PC-98 hardware oddity I was researching last month. I've reserved two of the pending anonymous "anything" pushes for the conclusion of this research, just in case you were wondering why the outstanding workload is now lower after the two delivered here.

And since it doesn't seem to be clearly documented elsewhere: Every 2 ticks on the stage timer correspond to 4 frames.


So, TH01 rank pellet speed. The resident pellet speed value is a factor ranging from a minimum of -0.375 up to a maximum of 0.5 (pixels per frame), multiplied with the difficulty-adjusted base speed for each pellet and added on top of that same speed. This multiplier is modified

  • every time the stage timer reaches 0 and HARRY UP is shown (+0.05)
  • for every score-based extra life granted below the maximum number of lives (+0.025)
  • every time a bomb is used (+0.025)
  • on every frame in which the rand value (shown in debug mode) is evenly divisible by (1800 - (lives × 200) - (bombs × 50)) (+0.025)
  • every time Reimu got hit (set to 0 if higher, then -0.05)
  • when using a continue (set to -0.05 if higher, then -0.125)
Apparently, ZUN noted that these deltas couldn't be losslessly stored in an IEEE 754 floating-point variable, and therefore didn't store the pellet speed factor exactly in a way that would correspond to its gameplay effect. Instead, it's stored similar to Q12.4 subpixels: as a simple integer, pre-multiplied by 40. This results in a raw range of -15 to 20, which is what the undecompiled ASM calls still use. When spawning a new pellet, its base speed is first multiplied by that factor, and then divided by 40 again. This is actually quite smart: The calculation doesn't need to be aware of either Q12.4 or the 40× format, as ((Q12.4 * factor×40) / factor×40) still comes out as a Q12.4 subpixel even if all numbers are integers. The only limiting issue here would be the potential overflow of the 16-bit multiplication at unadjusted base speeds of more than 50 pixels per frame, but that'd be seriously unplayable.
So yeah, pellet speed modifications are indeed gradual, and don't just fall into the coarse three "high, normal, and low" categories.


That's ⅝ of P0160 done, and the continue and pause menus would make good candidates to fill up the remaining ⅜… except that it seemed impossible to figure out the correct compiler options for this code?
The issues centered around the two effects of Turbo C++ 4.0J's -O switch:

  1. Optimizing jump instructions: merging duplicate successive jumps into a single one, and merging duplicated instructions at the end of conditional branches into a single place under a single branch, which the other branches then jump to
  2. Compressing ADD SP and POP CX stack-clearing instructions after multiple successive CALLs to __cdecl functions into a single ADD SP with the combined parameter stack size of all function calls
But how can the ASM for these functions exhibit #1 but not #2? How can it be seemingly optimized and unoptimized at the same time? The only option that gets somewhat close would be -O- -y, which emits line number information into the .OBJ files for debugging. This combination provides its own kind of #1, but these functions clearly need the real deal.

The research into this issue ended up consuming a full push on its own. In the end, this solution turned out to be completely unrelated to compiler options, and instead came from the effects of a compiler bug in a totally different place. Initializing a local structure instance or array like

const uint4_t flash_colors[3] = { 3, 4, 5 };
always emits the { 3, 4, 5 } array into the program's data segment, and then generates a call to the internal SCOPY@ function which copies this data array to the local variable on the stack. And as soon as this SCOPY@ call is emitted, the -O optimization #1 is disabled for the entire rest of the translation unit?!
So, any code segment with an SCOPY@ call followed by __cdecl functions must strictly be decompiled from top to bottom, mirroring the original layout of translation units. That means no TH01 continue and pause menus before we haven't decompiled the bomb animation, which contains such an SCOPY@ call. 😕
Luckily, TH01 is the only game where this bug leads to significant restrictions in decompilation order, as later games predominantly use the pascal calling convention, in which each function itself clears its stack as part of its RET instruction.


What now, then? With 51% of REIIDEN.EXE decompiled, we're slowly running out of small features that can be decompiled within ⅜ of a push. Good that I haven't been looking a lot into OP.EXE and FUUIN.EXE, which pretty much only got easy pieces of code left to do. Maybe I'll end up finishing their decompilations entirely within these smaller gaps?
I still ended up finding one more small piece in REIIDEN.EXE though: The particle system, seen in the Mima fight.

I like how everything about this animation is contained within a single function that is called once per frame, but ZUN could have really consolidated the spawning code for new particles a bit. In Mima's fight, particles are only spawned from the top and right edges of the screen, but the function in fact contains unused code for all other 7 possible directions, written in quite a bloated manner. This wouldn't feel quite as unused if ZUN had used an angle parameter instead… :thonk: Also, why unnecessarily waste another 40 bytes of the BSS segment?

But wait, what's going on with the very first spawned particle that just stops near the bottom edge of the screen in the video above? Well, even in such a simple and self-contained function, ZUN managed to include an off-by-one error. This one then results in an out-of-bounds array access on the 80th frame, where the code attempts to spawn a 41st particle. If the first particle was unlucky to be both slow enough and spawned away far enough from the bottom and right edges, the spawning code will then kill it off before its unblitting code gets to run, leaving its pixel on the screen until something else overlaps it and causes it to be unblitted.
Which, during regular gameplay, will quickly happen with the Orb, all the pellets flying around, and your own player movement. Also, the RNG can easily spawn this particle at a position and velocity that causes it to leave the screen more quickly. Kind of impressive how ZUN laid out the structure of arrays in a way that ensured practically no effect of this bug on the game; this glitch could have easily happened every 80 frames instead. He almost got close to all bugs canceling out each other here! :godzun:

Next up: The player control functions, including the second-biggest function in all of PC-98 Touhou.

📝 Posted:
🚚 Summary of:
P0149, P0150, P0151, P0152
Commits:
e1a26bb...05e4c4a, 05e4c4a...768251d, 768251d...4d24ca5, 4d24ca5...81fc861
💰 Funded by:
Blue Bolt, Ember2528, -Tom-, [Anonymous]
🏷 Tags:
rec98- th04+ th05+ gameplay- bullet+ animation+ score+ glitch+ jank+ waste+ micro-optimization+ tcc- uth05win+

…or maybe not that soon, as it would have only wasted time to untangle the bullet update commits from the rest of the progress. So, here's all the bullet spawning code in TH04 and TH05 instead. I hope you're ready for this, there's a lot to talk about!

(For the sake of readability, "bullets" in this blog post refers to the white 8×8 pellets and all 16×16 bullets loaded from MIKO16.BFT, nothing else.)


But first, what was going on 📝 in 2020? Spent 4 pushes on the basic types and constants back then, still ended up confusing a couple of things, and even getting some wrong. Like how TH05's "bullet slowdown" flag actually always prevents slowdown and fires bullets at a constant speed instead. :tannedcirno: Or how "random spread" is not the best term to describe that unused bullet group type in TH04.
Or that there are two distinct ways of clearing all bullets on screen, which deserve different names:

Bullets are zapped at the end of most midboss and boss phases, and cleared everywhere else – most notably, during bombs, when losing a life, or as rewards for extends or a maximized Dream bonus. The Bonus!! points awarded for zapping bullets are calculated iteratively, so it's not trivial to give an exact formula for these. For a small number 𝑛 of bullets, it would exactly be 5𝑛³ - 10𝑛² + 15𝑛 points – or, using uth05win's (correct) recursive definition, Bonus(𝑛) = Bonus(𝑛-1) + 15𝑛² - 5𝑛 + 10. However, one of the internal step variables is capped at a different number of points for each difficulty (and game), after which the points only increase linearly. Hence, "semi-exponential".


On to TH04's bullet spawn code then, because that one can at least be decompiled. And immediately, we have to deal with a pointless distinction between regular bullets, with either a decelerating or constant velocity, and special bullets, with preset velocity changes during their lifetime. That preset has to be set somewhere, so why have separate functions? In TH04, this separation continues even down to the lowest level of functions, where values are written into the global bullet array. TH05 merges those two functions into one, but then goes too far and uses self-modifying code to save a grand total of two local variables… Luckily, the rest of its actual code is identical to TH04.

Most of the complexity in bullet spawning comes from the (thankfully shared) helper function that calculates the velocities of the individual bullets within a group. Both games handle each group type via a large switch statement, which is where TH04 shows off another Turbo C++ 4.0 optimization: If the range of case values is too sparse to be meaningfully expressed in a jump table, it usually generates a linear search through a second value table. But with the -G command-line option, it instead generates branching code for a binary search through the set of cases. 𝑂(log 𝑛) as the worst case for a switch statement in a C++ compiler from 1994… that's so cool. But still, why are the values in TH04's group type enum all over the place to begin with? :onricdennat:
Unfortunately, this optimization is pretty rare in PC-98 Touhou. It only shows up here and in a few places in TH02, compared to at least 50 switch value tables.

In all of its micro-optimized pointlessness, TH05's undecompilable version at least fixes some of TH04's redundancy. While it's still not even optimal, it's at least a decently written piece of ASM… if you take the time to understand what's going on there, because it certainly took quite a bit of that to verify that all of the things which looked like bugs or quirks were in fact correct. And that's how the code for this function ended up with 35% comments and blank lines before I could confidently call it "reverse-engineered"…
Oh well, at least it finally fixes a correctness issue from TH01 and TH04, where an invalid bullet group type would fill all remaining slots in the bullet array with identical versions of the first bullet.

Something that both games also share in these functions is an over-reliance on globals for return values or other local state. The most ridiculous example here: Tuning the speed of a bullet based on rank actually mutates the global bullet template… which ZUN then works around by adding a wrapper function around both regular and special bullet spawning, which saves the base speed before executing that function, and restores it afterward. :zunpet: Add another set of wrappers to bypass that exact tuning, and you've expanded your nice 1-function interface to 4 functions. Oh, and did I mention that TH04 pointlessly duplicates the first set of wrapper functions for 3 of the 4 difficulties, which can't even be explained with "debugging reasons"? That's 10 functions then… and probably explains why I've procrastinated this feature for so long.

At this point, I also finally stopped decompiling ZUN's original ASM just for the sake of it. All these small TH05 functions would look horribly unidiomatic, are identical to their decompiled TH04 counterparts anyway, except for some unique constant… and, in the case of TH05's rank-based speed tuning function, actually become undecompilable as soon as we want to return a C++ class to preserve the semantic meaning of the return value. Mainly, this is because Turbo C++ does not allow register pseudo-variables like _AX or _AL to be cast into class types, even if their size matches. Decompiling that function would have therefore lowered the quality of the rest of the decompiled code, in exchange for the additional maintenance and compile-time cost of another translation unit. Not worth it – and for a TH05 port, you'd already have to decompile all the rest of the bullet spawning code anyway!


The only thing in there that was still somewhat worth being decompiled was the pre-spawn clipping and collision detection function. Due to what's probably a micro-optimization mistake, the TH05 version continues to spawn a bullet even if it was spawned on top of the player. This might sound like it has a different effect on gameplay… until you realize that the player got hit in this case and will either lose a life or deathbomb, both of which will cause all on-screen bullets to be cleared anyway. So it's at most a visual glitch.

But while we're at it, can we please stop talking about hitboxes? At least in the context of TH04 and TH05 bullets. The actual collision detection is described way better as a kill delta of 8×8 pixels between the center points of the player and a bullet. You can distribute these pixels to any combination of bullet and player "hitboxes" that make up 8×8. 4×4 around both the player and bullets? 1×1 for bullets, and 8×8 for the player? All equally valid… or perhaps none of them, once you keep in mind that other entity types might have different kill deltas. With that in mind, the concept of a "hitbox" turns into just a confusing abstraction.

The same is true for the 36×44 graze box delta. For some reason, this one is not exactly around the center of a bullet, but shifted to the right by 2 pixels. So, a bullet can be grazed up to 20 pixels right of the player, but only up to 16 pixels left of the player. uth05win also spotted this… and rotated the deltas clockwise by 90°?!


Which brings us to the bullet updates… for which I still had to research a decompilation workaround, because 📝 P0148 turned out to not help at all? Instead, the solution was to lie to the compiler about the true segment distance of the popup function and declare its signature far rather than near. This allowed ZUN to save that ridiculous overhead of 1 additional far function call/return per frame, and those precious 2 bytes in the BSS segment that he didn't have to spend on a segment value. 📝 Another function that didn't have just a single declaration in a common header file… really, 📝 how were these games even built???

The function itself is among the longer ones in both games. It especially stands out in the indentation department, with 7 levels at its most indented point – and that's the minimum of what's possible without goto. Only two more notable discoveries there:

  1. Bullets are the only entity affected by Slow Mode. If the number of bullets on screen is ≥ (24 + (difficulty * 8) + rank) in TH04, or (42 + (difficulty * 8)) in TH05, Slow Mode reduces the frame rate by 33%, by waiting for one additional VSync event every two frames.
    The code also reveals a second tier, with 50% slowdown for a slightly higher number of bullets, but that conditional branch can never be executed :zunpet:
  2. Bullets must have been grazed in a previous frame before they can be collided with. (Note how this does not apply to bullets that spawned on top of the player, as explained earlier!)

Whew… When did ReC98 turn into a full-on code review?! 😅 And after all this, we're still not done with TH04 and TH05 bullets, with all the special movement types still missing. That should be less than one push though, once we get to it. Next up: Back to TH01 and Konngara! Now have fun rewriting the Touhou Wiki Gameplay pages 😛

📝 Posted:
🚚 Summary of:
P0096, P0097, P0098
Commits:
8ddb778...8283c5e, 8283c5e...600f036, 600f036...ad06748
💰 Funded by:
Ember2528, Yanga
🏷 Tags:
rec98- th01+ file-format+ pc98+ blitting+ gameplay- player+ shot+ jank+ mod+ tcc-

So, let's finally look at some TH01 gameplay structures! The obvious choices here are player shots and pellets, which are conveniently located in the last code segment. Covering these would therefore also help in transferring some first bits of data in REIIDEN.EXE from ASM land to C land. (Splitting the data segment would still be quite annoying.) Player shots are immediately at the beginning…

…but wait, these are drawn as transparent sprites loaded from .PTN files. Guess we first have to spend a push on 📝 Part 2 of this format.
Hm, 4 functions for alpha-masked blitting and unblitting of both 16×16 and 32×32 .PTN sprites that align the X coordinate to a multiple of 8 (remember, the PC-98 uses a planar VRAM memory layout, where 8 pixels correspond to a byte), but only one function that supports unaligned blitting to any X coordinate, and only for 16×16 sprites? Which is only called twice? And doesn't come with a corresponding unblitting function? :thonk:

Yeah, "unblitting". TH01 isn't double-buffered, and uses the PC-98's second VRAM page exclusively to store a stage's background and static sprites. Since the PC-98 has no hardware sprites, all you can do is write pixels into VRAM, and any animated sprite needs to be manually removed from VRAM at the beginning of each frame. Not using double-buffering theoretically allows TH01 to simply copy back all 128 KB of VRAM once per frame to do this. :tannedcirno: But that would be pretty wasteful, so TH01 just looks at all animated sprites, and selectively copies only their occupied pixels from the second to the first VRAM page.


Alright, player shot class methods… oh, wait, the collision functions directly act on the Yin-Yang Orb, so we first have to spend a push on that one. And that's where the impression we got from the .PTN functions is confirmed: The orb is, in fact, only ever displayed at byte-aligned X coordinates, divisible by 8. It's only thanks to the constant spinning that its movement appears at least somewhat smooth.
This is purely a rendering issue; internally, its position is tracked at pixel precision. Sadly, smooth orb rendering at any unaligned X coordinate wouldn't be that trivial of a mod, because well, the necessary functions for unaligned blitting and unblitting of 32×32 sprites don't exist in TH01's code. Then again, there's so much potential for optimization in this code, so it might be very possible to squeeze those additional two functions into the same C++ translation unit, even without position independence…

More importantly though, this was the right time to decompile the core functions controlling the orb physics – probably the highlight in these three pushes for most people.
Well, "physics". The X velocity is restricted to the 5 discrete states of -8, -4, 0, 4, and 8, and gravity is applied by simply adding 1 to the Y velocity every 5 frames :zunpet: No wonder that this can easily lead to situations in which the orb infinitely bounces from the ground.
At least fangame authors now have a reference of how ZUN did it originally, because really, this bad approximation of physics had to have been written that way on purpose. But hey, it uses 64-bit floating-point variables! :onricdennat:

…sometimes at least, and quite randomly. This was also where I had to learn about Turbo C++'s floating-point code generation, and how rigorously it defines the order of instructions when mixing double and float variables in arithmetic or conditional expressions. This meant that I could only get ZUN's original instruction order by using literal constants instead of variables, which is impossible right now without somehow splitting the data segment. In the end, I had to resort to spelling out ⅔ of one function, and one conditional branch of another, in inline ASM. 😕 If ZUN had just written 16.0 instead of 16.0f there, I would have saved quite some hours of my life trying to decompile this correctly…

To sort of make up for the slowdown in progress, here's the TH01 orb physics debug mod I made to properly understand them: 2020-06-13-TH01OrbPhysicsDebug.zip To use it, simply replace REIIDEN.EXE, and run the game in debug mode, via game d on the DOS prompt.
Its code might also serve as an example of how to achieve this sort of thing without position independence.


Alright, now it's time for player shots though. Yeah, sure, they don't move horizontally, so it's not too bad that those are also always rendered at byte-aligned positions. But, uh… why does this code only use the 16×16 alpha-masked unblitting function for decaying shots, and just sloppily unblits an entire 16×16 square everywhere else?

The worst part though: Unblitting, moving, and rendering player shots is done in a single function, in that order. And that's exactly where TH01's sprite flickering comes from. Since different types of sprites are free to overlap each other, you'd have to first unblit all types, then move all types, and then render all types, as done in later PC-98 Touhou games. If you do these three steps per-type instead, you will unblit sprites of other types that have been rendered before… and therefore end up with flicker.
Oh, and finally, ZUN also added an additional sloppy 16×16 square unblit call if a shot collides with a pellet or a boss, for some guaranteed flicker. Sigh.


And that's ⅓ of all ZUN code in TH01 decompiled! Next up: Pellets!

📝 Posted:
🚚 Summary of:
P0036, P0037
Commits:
a533b5d...82b0e1d, 82b0e1d...e7e1cbc
💰 Funded by:
zorg
🏷 Tags:
rec98- th04+ th05+ gameplay- player+ shot+ tcc-

And just in time for zorg's last outstanding pushes, the TH05 shot type control functions made the speedup happen!

  • TH05 as a whole is now 20% reverse-engineered, and 50% position independent,
  • TH05's MAIN.EXE is now even below TH02's in terms of not yet RE'd instructions,
  • and all price estimates have now fallen significantly.

It would have been really nice to also include Reimu's shot control functions in this last push, but figuring out this entire system, with its weird bitflags and switch statement micro-optimizations, was once again taking way longer than it should have. Especially with my new-found insistence on turning this obvious copy-pasta into something somewhat readable and terse…

But with such a rather tabular visual structure, things should now be moddable in hopefully easily consistent way. Of course, since we're only at 54% position independence for MAIN.EXE, this isn't possible yet without crashing the game, but modifiying damage would already work.

Despite my earlier claims of ZUN only having used C++ in TH01, as it's the only game using new and delete, it's now pretty much confirmed that ZUN used it for all games, as inlined functions (and by extension, C++ class methods) are the only way to get certain instructions out of the Turbo C++ code generator. Also, I've kept my promise and started really filling that decompilation pattern file.

And now, with the reverse-engineering backlog finally being cleared out, we wait for the next orders, and the direction they might focus on…

📝 Posted:
🚚 Summary of:
P0029, P0030
Commits:
6ff427a...c7fc4ca, c7fc4ca...dea40ad
💰 Funded by:
zorg
🏷 Tags:
rec98- th02+ th04+ th05+ blitting+ gameplay- midboss+ boss+ tcc-

Here we go, new C code! …eh, it will still take a bit to really get decompilation going at the speeds I was hoping for. Especially with the sheer amount of stuff that is set in the first few significant functions we actually can decompile, which now all has to be correctly declared in the C world. Turns out I spent the last 2 years screwing up the case of exported functions, and even some of their names, so that it didn't actually reflect their calling convention… yup. That's just the stuff you tend to forget while it doesn't matter.

To make up for that, I decided to research whether we can make use of some C++ features to improve code readability after all. Previously, it seemed that TH01 was the only game that included any C++ code, whereas TH02 and later seemed to be 100% C and ASM. However, during the development of the soon to be released new build system, I noticed that even this old compiler from the mid-90's, infamous for prioritizing compile speeds over all but the most trivial optimizations, was capable of quite surprising levels of automatic inlining with class methods…

…leading the research to culminate in the mindblow that is 9d121c7 – yes, we can use C++ class methods and operator overloading to make the code more readable, while still generating the same code than if we had just used C and preprocessor macros.

Looks like there's now the potential for a few pull requests from outside devs that apply C++ features to improve the legibility of previously decompiled and terribly macro-ridden code. So, if anyone wants to help without spending money…