⮜ Blog

⮜ List of tags

Showing all posts tagged shinki-

📝 Posted:
🚚 Summary of:
P0146
⌨ Commits:
08bc188...456b621
💰 Funded by:
Ember2528, -Tom-
🏷 Tags:
rec98+ th05+ tcc+ animation+ boss+ shinki- micro-optimization+ waste+ uth05win+

Y'know, I kinda prefer the pending crowdfunded workload to stay more near the middle of the cap, rather than being sold out all the time. So to reach this point more quickly, let's do the most relaxing thing that can be easily done in TH05 right now: The boss backgrounds, starting with Shinki's, 📝 now that we've got the time to look at it in detail.

… Oh come on, more things that are borderline undecompilable, and require new workarounds to be developed? Yup, Borland C++ always optimizes any comparison of a register with a literal 0 to OR reg, reg, no matter how many calculations and inlined function calls you replace the 0 with. Shinki's background particle rendering function contains a CMP AX, 0 instruction though… so yeah, 📝 yet another piece of custom ASM that's worse than what Turbo C++ 4.0J would have generated if ZUN had just written readable C. This was probably motivated by ZUN insisting that his modified master.lib function for blitting particles takes its X and Y parameters as registers. If he had just used the __fastcall convention, he also would have got the sprite ID passed as a register. 🤷
So, we really don't want to be forced into inline assembly just because of the third comparison in the otherwise perfectly decompilable four-comparison if() expression that prevents invisible particles from being drawn. The workaround: Comparing to a pointer instead, which only the linker gets to resolve to the actual value of 0. :tannedcirno: This way, the compiler has to make room for any 16-bit literal, and can't optimize anything.


And then we go straight from micro-optimization to waste, with all the duplication in the code that animates all those particles together with the zooming and spinning lines. This push decompiled 1.31% of all code in TH05, and thanks to alignment, we're still missing Shinki's high-level background rendering function that calls all the subfunctions I decompiled here.
With all the manipulated state involved here, it's not at all trivial to see how this code produces what you see in-game. Like:

  1. If all lines have the same Y velocity, how do the other three lines in background type B get pushed down into this vertical formation while the top one stays still? (Answer: This velocity is only applied to the top line, the other lines are only pushed based on some delta.)
  2. How can this delta be calculated based on the distance of the top line with its supposed target point around Shinki's wings? (Answer: The velocity is never set to 0, so the top line overshoots this target point in every frame. After calculating the delta, the top line itself is pushed down as well, canceling out the movement. :zunpet:)
  3. Why don't they get pushed down infinitely, but stop eventually? (Answer: We only see four lines out of 20, at indices #0, #6, #12, and #18. In each frame, lines [0..17] are copied to lines [1..18], before anything gets moved. The invisible lines are pushed down based on the delta as well, which defines a distance between the visible lines of (velocity * array gap). And since the velocity is capped at -14 pixels per frame, this also means a maximum distance of 84 pixels between the midpoints of each line.)
  4. And why are the lines moving back up when switching to background type C, before moving down? (Answer: Because type C increases the velocity rather than decreasing it. Therefore, it relies on the previous velocity state from type B to show a gapless animation.)
So yeah, it's a nice-looking effect, just very hard to understand. 😵

With the amount of effort I'm putting into this project, I typically gravitate towards more descriptive function names. Here, however, uth05win's simple and seemingly tiny-brained "background type A/B/C/D" was quite a smart choice. It clearly defines the sequence in which these animations are intended to be shown, and as we've seen with point 4 from the list above, that does indeed matter.

Next up: At least EX-Alice's background animations, and probably also the high-level parts of the background rendering for all the other TH05 bosses.

📝 Posted:
🚚 Summary of:
P0110
⌨ Commits:
2c7d86b...8b5c146
💰 Funded by:
[Anonymous], Blue Bolt
🏷 Tags:
rec98+ th02+ th03+ th04+ th05+ animation+ tcc+ shinki- ex-alice+

… and just as I explained 📝 in the last post how decompilation is typically more sensible and efficient than ASM-level reverse-engineering, we have this push demonstrating a counter-example. The reason why the background particles and lines in the Shinki and EX-Alice battles contributed so much to position dependence was simply because they're accessed in a relatively large amount of functions, one for each different animation. Too many to spend the remaining precious crowdfunded time on reverse-engineering or even decompiling them all, especially now that everyone anticipates 100% PI for TH05's MAIN.EXE.

Therefore, I only decompiled the two functions of the line structure that also demonstrate best how it works, which in turn also helped with RE. Sadly, this revealed that we actually can't 📝 overload operator =() to get that nice assignment syntax for 12.4 fixed-point values, because one of those new functions relies on Turbo C++'s built-in optimizations for trivially copyable structures. Still, impressive that this abstraction caused no other issues for almost one year.

As for the structures themselves… nope, nothing to criticize this time! Sure, one good particle system would have been awesome, instead of having separate structures for the Stage 2 "starfield" particles and the one used in Shinki's battle, with hardcoded animations for both. But given the game's short development time, that was quite an acceptable compromise, I'd say.
And as for the lines, there just has to be a reason why the game reserves 20 lines per set, but only renders lines #0, #6, #12, and #18. We'll probably see once we get to look at those animation functions more closely.

This was quite a 📝 TH03-style RE push, which yielded way more PI% than RE%. But now that that's done, I can finally not get distracted by all that stuff when looking at the list of remaining memory references. Next up: The last few missing structures in TH05's MAIN.EXE!

📝 Posted:
🚚 Summary of:
P0078, P0079
⌨ Commits:
f4eb7a8...9e52cb1, 9e52cb1...cd48aa3
💰 Funded by:
iruleatgames, -Tom-
🏷 Tags:
rec98+ th02+ th04+ th05+ gameplay+ animation+ bullet+ boss+ alice+ mai+ yuki+ yumeko+ shinki- ex-alice+ uth05win+

To finish this TH05 stretch, we've got a feature that's exclusive to TH05 for once! As the final memory management innovation in PC-98 Touhou, TH05 provides a single static (64 * 26)-byte array for storing up to 64 entities of a custom type, specific to a stage or boss portion. The game uses this array for

  1. the Stage 2 star particles,
  2. Alice's puppets,
  3. the tip of curve ("jello") bullets,
  4. Mai's snowballs and Yuki's fireballs,
  5. Yumeko's swords,
  6. and Shinki's 32×32 bullets,

which makes sense, given that only one of those will be active at any given time.

On the surface, they all appear to share the same 26-byte structure, with consistently sized fields, merely using its 5 generic fields for different purposes. Looking closer though, there actually are differences in the signedness of certain fields across the six types. uth05win chose to declare them as entirely separate structures, and given all the semantic differences (pixels vs. subpixels, regular vs. tiny master.lib sprites, …), it made sense to do the same in ReC98. It quickly turned out to be the only solution to meet my own standards of code readability.

Which blew this one up to two pushes once again… But now, modders can trivially resize any of those structures without affecting the other types within the original (64 * 26)-byte boundary, even without full position independence. While you'd still have to reduce the type-specific number of distinct entities if you made any structure larger, you could also have more entities with fewer structure members.

As for the types themselves, they're full of redundancy once again – as you might have already expected from seeing #4, #5, and #6 listed as unrelated to each other. Those could have indeed been merged into a single 32×32 bullet type, supporting all the unique properties of #4 (destructible, with optional revenge bullets), #5 (optional number of twirl animation frames before they begin to move) and #6 (delay clouds). The *_add(), *_update(), and *_render() functions of #5 and #6 could even already be completely reverse-engineered from just applying the structure onto the ASM, with the ones of #3 and #4 only needing one more RE push.

But perhaps the most interesting discovery here is in the curve bullets: TH05 only renders every second one of the 17 nodes in a curve bullet, yet hit-tests every single one of them. In practice, this is an acceptable optimization though – you only start to notice jagged edges and gaps between the fragments once their speed exceeds roughly 11 pixels per second:

And that brings us to the last 20% of TH05 position independence! But first, we'll have more cheap and fast TH01 progress.