- ๐ Posted:
- ๐ Summary of:
- P0244
- โจ Commits:
ac33bd2...97f0c3b
- ๐ฐ Funded by:
- Blue Bolt, [Anonymous]
- ๐ท Tags:
๐ After almost 3 years, TH04 finally caught up to TH05 and is now 100% position-independent as well! ๐
For a refresher on what this means and does not mean, check the
announcements from back in 2019 and 2020 when we chased the goal for TH05's
๐ OP.EXE
and
๐ the rest of the game. These also feature
some demo videos that show off the kind of mods you were able to efficiently
code back then. With the occasional reverse-engineering attention it
received over the years, TH04's code should now be slightly easier to work
with than TH05's was back in the day. Although not by much โ TH04 has
remained relatively unpopular among backers, and only received more than the
funded attention because it shares most of its core code with the more
popular TH05. Which, coincidentally, ended up becoming
๐ the reason for getting this done now.
Not that it matters a lot. Ever since we reached 100% PI for TH05, community
and backer interest in position independence has dropped to near zero. We
just didn't end up seeing the expected large amount of community-made mods
that PI was meant to facilitate, and even the
๐ 100% decompilation of TH01 changed nothing
about that. But that's OK; after all, I do appreciate the business of
continually getting commissioned for all the
๐ large-scale mods. Not focusing on PI is
also the correct choice for everyone who likes reading these blog posts, as
it often means that I can't go that much into detail due to cutting corners
and piling up technical debt left and right.
Surprisingly, this only took 1.25 pushes, almost twice as fast as expected. As that's closer to 1 push than it is to 2, I'm OK with releasing it like this โ especially since it was originally meant to come out three days ago. ๐ Unfortunately, it was delayed thanks to surprising website bugs and a certain piece of code that was way more difficult to document than it was to decompileโฆ The next push will have slightly less content in exchange, though.
๐ P0240 and P0241 already covered the final remaining structures, so I only needed to do some superficial RE to prove the remaining numeric literals as either constants or memory addresses. For example, I initially thought I'd have to decompile the dissolve animations in the staff roll, but I only needed to identify a single function pointer type to prove all false positives as screen coordinates there. Now, the TH04 staff roll would be another fast and cheap decompilation, similar to the custom entity types of TH04. (And TH05 as well!)
The one piece of code I did have to decompile was Stage 4's carpet
lighting animation, thanks to hex literals that were way too complicated to
leave in ASM. And this one probably takes the crown for TH04's worst set of
landmines and bloat that still somehow results in no observable bugs or
quirks.
This animation starts at frame 1664, roughly 29.5 seconds into the stage,
and quickly turns the stage background into a repeated row of dark-red plaid
carpet tiles by moving out from the center of the playfield towards the
edges. Afterward, the animation repeats with a brighter set of tiles that is
then used for the rest of the stage. As I explained
๐ a while ago in the context of TH02, the
stage tile and map formats in PC-98 Touhou can't express animations, so all
of this needed to be hardcoded in the binary.
And ZUN did start out making the right decision by only using fully-lit
carpet tiles for all tile sections defined in ST03.MAP
. This
way, the animation can simply disable itself after it completed, letting the
rest of the stage render normally and use new tile sections that are only
defined for the final light level. This means that the "initial" dark
version of the carpet is as much a result of hardcoded tile manipulation as
the animation itself.
But then, ZUN proceeded to implement it all by directly manipulating the
ring buffer of on-screen tiles. This is the lowest level before the tiles
are rendered, and rather detached from the defined content of the
๐ .MAP tile sections. Which leads to a whole
lot of problems:
If you decide to do this kind of tile ring modification, it should ideally happen at a very specific point: after scrolling in new tiles into the ring buffer, but before blitting any scrolled or invalidated tiles to VRAM based on the ring buffer. Which is not where ZUN chose to put it, as he placed the call to the stage-specific render function after both of those operations. By the time the function is called, the tile renderer has already blitted a few lines of the fully-lit carpet tiles from the defined .MAP tile section, matching the scroll speed. Fortunately, these are hidden behind the black TRAM cells above and below the playfieldโฆ
Still, the code needs to get rid of them before they would become visible. ZUN uses the regular tile invalidation function for this, which will only cause actual redraws on the next frame. Again, the tile rendering call has already happened by the time the Stage 4-specific rendering function gets called.
But wait, this game also flips VRAM pages between frames to provide a tear-free gameplay experience. This means that the intended redraw of the new tiles actually hits the wrong VRAM page. And sure, the code does attempt to invalidate these newly blitted lines every frame โ but only relative to the current VRAM Y coordinate that represents the top of the hardware-scrolled screen. Once we're back on the original VRAM page on the next frame, the lines we initially set out to remove could have already scrolled past that point, making it impossible to ever catch up with them in this way.
The only real "solution": Defining the height of the tile invalidation rectangle at 3ร the scroll speed, which ensures that each invalidation call covers 3 frames worth of newly scrolled-in lines. This is not intuitive at all, and requires an understanding of everything I have just written to even arrive at this conclusion. Needless to say that ZUN didn't comprehend it either, and just hardcoded an invalidation height that happened to be enough for the small scroll speeds defined inST03.STD
for the first 30 seconds of the stage.- The effect must consistently modify the tile ring buffer to "fix" any new tiles, overriding them with the intended light level. During the animation, the code not only needs to set the old light level for any tiles that are still waiting to be replaced, but also the new light level for any tiles that were replaced โ and ZUN forgot the second part. As a result, newly scrolled-in tiles within the already animated area will "remain" untouched at light level 2 if the scroll speed is fast enough during the transition from light level 0 to 1.
All that means that we only have to raise the scroll speed for the effect to fall apart. Let's try, say, 4 pixels per frame rather than the original 0.25:
All of this could have been so much simpler and actually stable if ZUN applied the tile changes directly onto the .MAP. This is a much more intuitive way of expressing what is supposed to happen to the map, and would have reduced the code to the actually necessary tile changes for the first frame and each individual frame of the animation. It would have still required a way to force these changes into the tile ring buffer, but ZUN could have just used his existing full-playfield redraw functions for that. In any case, there would have been no need for any per-frame tile fixing and redrawing. The CPU cycles saved this way could have then maybe been put towards writing the tile-replacing part of the animation in C++ rather than ASMโฆ
Wow, that was an unreasonable amount of research into a feature that superficially works fine, just because its decompiled code didn't make sense. To end on a more positive note, here are some minor new discoveries that might actually matter to someone:
-
The laser part of Marisa's Illusion Laser shot type always does 3
points of damage per frame, regardless of the player's power level. Its
hitbox also remains identical on all power levels, no matter how wide the
laser appears on screen. The strength difference between the levels purely
comes from the number of frames the laser stays active before a fixed
non-damaging 32-frame cooldown time:
Power level Frames per cycle (including 32-frame cooldown) 2 64 3 72 4 88 5 104 6 128 7 144 8 168 9 192 - The decay animation for player shots is faster in TH05 (12 frames) than in TH04 (16 frames).
- In the first phase of her Stage 6 fight, Yuuka moves along one of two randomly chosen hardcoded paths, defined as a set of 5 movement angles. After reaching the final point and firing a danmaku pattern, she teleports back to her initial position to repeat the path one more time before the phase times out.
- Similarly, TH04's Stage 3 midboss also goes through 12 fixed movement angles before flying off the playfield.
- The formulas for calculating the skill rating on both TH04's and TH05's final verdict screen are going to be very long and complicated.
Next up: ยพ of a push filled with random boilerplate, finalization, and TH01 code cleanup work, while I finish the preparations for Shuusou Gyoku's OpenGL backend. This month, everything should finally work out as intended: I'll complete both tasks in parallel, ship the former to free up the cap, and then ship the latter once its 5th push is fully funded.