That was quick: In a surprising turn of events, Romantique Tp themselves came in just one day after the last blog post went up, updated me with their current and much more positive opinion on Sound Canvas VA, and confirmed that real SC-88Pro hardware clamps invalid Reverb Macro values to the specified range. I promised to release a new Sound Canvas VA BGM pack for free once I knew the exact behavior of real hardware, so let's go right back to Seihou and also integrate the necessary SysEx patches into the game's MIDI player behind a toggle. This would also be a great occasion to quickly incorporate some long overdue code maintenance and build system improvements, and a migration to C++ modules in particular. When I started the Shuusou Gyoku Linux port a year ago, the combination of modules and <windows.h> threw lots of weird errors and even crashed the Visual Studio compiler. But nowadays, Microsoft even uses modules in the Office code base. This must mean that these issues are fixed by now, right?
Well, there's still a bug that causes the modularized C++ standard library to be basically unusable in combination with the static analyzer, and somehow, I was the first one to report it. So it's 3½ years after C++20 was finalized, and somehow, modules are still a bleeding-edge feature and a second-class citizen in even the compiler that supports them the best. I want fast compile times already! 😕
Thankfully, Microsoft agrees that this is a bug, and will work on it at some point. While we're waiting, let's return to the original plan of decompiling the endings of the one PC-98 Touhou game that still needed them decompiled.
After the textless slideshows of TH01, TH02 was the first Touhou game to feature lore text in its endings. Given that this game stores its 📝 in-game dialog text in fixed-size plaintext files, you wouldn't expect anything more fancy for the endings either, so it's not surprising to see that the END?.TXT files use the same concept, with 44 visible bytes per line followed by two bytes of padding for the CR/LF newline sequence. Each of these lines is typed to the screen in full, with all whitespace and a fixed time for each 2-byte chunk.
As a result, everything surrounding the text is just as hardcoded as TH01's endings were, which once again opens up the possibility of freely integrating all sorts of creative animations without the overhead of an interpreter. Sadly, TH02 only makes use of this freedom in a mere two cases: the picture scrolling effect from Reimu's head to Marisa's head in the Bad Endings, and a single hardware palette change in the Good Endings.
Hardcoding also still made sense for this game because of how the ending text is structured. The Good and Bad Endings for the individual shot types respectively share 55% and 77% of their text, and both only diverge after the first 27 lines. In straight-line procedural code, this translates to one branch for each shot type at a single point, neatly matching the high-level structure of these endings.
But that's the end of the positive or neutral aspects I can find in these scripts. The worst part, by far, is ZUN's approach to displaying the text in alternating colors, and how it impacts the entire structure of the code.
The simplest solution would have involved a hardcoded array with the color of each line, just like how the in-game dialogs store the face IDs for each text box. But for whatever reason, ZUN did not apply this piece of wisdom to the endings and instead hardcoded these color changes by… mutating a global variable before calling the text typing function for every individual line. This approach ruins any possibility of compressing the script code into loops. While ZUN did use loops, all of them are very short because they can only last until the next color change. In the end, the code contains 90 explicitly spelled-out calls to the 5-parameter line typing function that only vary in the pointer to each line and in the slower speed used for the one or two final lines of each ending. As usual, I've deduplicated the code in the ReC98 repository down to a sensible level, but here's the full inlined and macro-expanded horror:
It's highly likely that this is what ZUN hacked into his PC-98 and was staring at back in 1997.
All this redundancy bloats the two script functions for the 6 endings to a whopping 3,344 bytes inside TH02's MAINE.EXE. In particular, the single function that covers the three Good Endings ends up with a total of 631 x86 ASM instructions, making it the single largest function in TH02 and the 7th longest function in all of PC-98 Touhou. If the 📝 single-executable build for TH02's debloated and anniversary branches ends up needing a few more KB to reduce its size below the original MAIN.EXE, there are lots of opportunities to compress it all.
The ending text can also be fast-forwarded by holding any key. As we've come to expect for this sort of ZUN code, the text typing function runs its own rendering loop with VSync delays and input detection, which means that we 📝 once📝 again have to talk about the infamous quirk of the PC-98 keyboard controller in relation to held keys. We've still got 54 not yet decompiled calls to input detection functions left in this codebase, are you excited yet?!
Holding any key speeds up the text of all ending lines before the last one by displaying two kana/kanji instead of one per rendered frame and reducing the delay between the rendered frames to 1/3 of its regular length. In pseudocode:
for(i = 0; i < number_of_2_byte_chunks_on_displayed_line; i++) {
input = convert_current_pc98_bios_input_state_to_game_specific_bitflags();
add_chunk_to_internal_text_buffer(i);
blit_internal_text_buffer_from_the_beginning();
if(input == INPUT_NONE) {
// Basic case, no key pressed
frame_delay(frames_per_chunk);
} else if((i % 2) == 1) {
// Key pressed, chunk number is odd.
frame_delay(frames_per_chunk / 3);
} else {
// Key pressed, chunk number is even.
// No delay; next iteration adds to the same frame.
}
}
This is exactly the kind of code you would write if you wanted to deliberately maximize the impact of this hardware quirk. If the game happens to read the current input state right after a key up scancode for the last previously held and game-relevant key, it will then wrongly take the branch that uninterruptibly waits for the regular, non-divided amount of VSync interrupts. In my tests, this broke the rhythm of the fast-forwarded text about once per line. Note how this branch can also be taken on an even chunk: Rendering glyphs straight from font ROM to VRAM is not exactly cheap, and if each iteration (needlessly) blits one more full-width glyph than the last one, the probability of a key up scancode arriving in the middle of a frame only increases.
The fact that TH02 allows any of the supported input keys to be held points to another detail of this quirk I haven't mentioned so far. If you press multiple keys at once, the PC-98's keyboard controller only sends the periodic key up scancodes as long as you are holding the last key you pressed. Because the controller only remembers this last key, pressing and releasing any other key would get rid of these scancodes for all keys you are still holding.
As usual, this ZUN bug only occurs on real hardware and with DOSBox-X's correct emulation of the PC-98 keyboard controller.
After the ending, we get to witness the most seamless transition between ending and Staff Roll in any Touhou game as the BGM immediately changes to the Staff Roll theme, and the ending picture is shifted into the same place where the Staff Roll pictures will appear. Except that the code misses the exact position by four pixels, and cuts off another four pixels at the right edge of the picture:
Also, note the green 1-pixel line at the right edge of this specific picture. This is a bug in the .PI file where the picture is indeed shifted one pixel to the left.
What follows is a comparatively large amount of unused content for a single scene. It starts right at the end of this underappreciated 11-frame animation loaded from ENDFT.BFT:
Wastefully using the 4bpp BFNT format. The single frame at the end of the animation is unused; while it might look identical to the ZUN glyphs later on in the Staff Roll, that's only because both are independently rendered boldfaced versions of the same font ROM glyphs. Then again, it does prove that ZUN created this animation on a PC-98 model made by NEC, as the Epson clones used a font ROM with a distinctly different look.
TH02's Staff Roll is also unique for the pre-made screenshots of all 5 stages that get shown together with a fancy rotating rectangle animation while the Staff Roll progresses in sync with the BGM. The first interesting detail shows up immediately after the first image, where the code jumps over one of the 320×200 quarters in ED06.PI, leaving the screenshot of the Stage 2 midboss unused.
All of the cutscenes in PC-98 Touhou store their pictures as 320×200 quarters within a single 640×400 .PI file. Anywhere else, all four quarters are supposed to be displayed with the same palette specified in the .PI header, but TH02's Staff Roll screenshots are also unique in how all quarters beyond the top-left one require palettes loaded from external .RGB files to look right. Consequently, the game doesn't clearly specify the intended palette of this unused screenshot, and leaves two possibilities:
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the Stage 2 color palette used in-game.
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the palette specified in the .PI header. These are the colors you'd see when looking at the file in a .PI viewer, when converting it into another format with the usual tools, or in sprite rips that don't take TH02's hardcoded palette changes into account. These colors are only intended for the Stage 1 screenshot in the top-left quarter of the file.
The unused second 320×200 quarter of TH02's ED06.PI, displayed in the palette from ED06B.RGB, which the game uses for the following screenshot of the Meira fight. As it's from the same stage, it almost matches the in-game colors seen in 1️⃣, and only differs in the white color (#FFF) being slightly red-tinted (#FCC).
It might seem obvious that the Stage 2 palette in 1️⃣ is the correct one, but ZUN indeed uses ED06B.RGB with the red-tinted white color for the following screenshot of the Meira fight. Not only does this palette not match Meira's in-game appearance, but it also discolors the rectangle animation and the surrounding Staff Roll text:
Also, that tearing on frame #1 is not a recording artifact, but the expected result of yet another VSync-related landmine. 💣 This time, it's caused by the combination of 1) the entire sequence from the ending to the verdict screen being single-buffered, and 2) this animation always running immediately after an expensive operation (640×400 .PI image loading and blitting to VRAM, 320×200 VRAM inter-page copy, or hardware palette loading from a packed file), without waiting for the VSync interrupt. This makes it highly likely for the first frame of this animation to start rendering at a point where the (real or emulated) electron beam has already traveled over a significant portion of the screen.
But when I went into Stage 2 to compare these colors to the in-game palette, I found something even more curious. ZUN obviously made this screenshot with the Reimu-C shot type, but one of the shot sprites looks slightly different from how it does in-game. These screenshots must have been made earlier in development when the sprite didn't yet feature the second ring at the top. The same applies to the Stage 4 screenshot later on:
Finally, the rotating rectangle animation delivers one more minor rendering bug. Each of the 20 frames removes the largest and outermost rectangle from VRAM by redrawing it in the same black color of the background before drawing the remaining rectangles on top. The corners of these rectangles are placed on a shrinking circle that starts with a radius of 256 pixels and is centered at (192, 200), which results in a maximum possible X coordinate of 448 for the rightmost corner of the rectangle. However, the Staff Roll text starts at an X coordinate of 416, causing the first two full-width glyphs to still fall within the area of the circle. Each line of text is also only rendered once before the animation. So if any of the rectangles then happens to be placed at an angle that causes its edges to overlap the text, its removal will cut small holes of black pixels into the glyphs:
The green dotted circle corresponds to the newest/smallest rectangle. Note how ZUN only happened to avoid the holes for the two final animations by choosing an initial angle and angular velocity that causes the resulting rectangles to just barely avoid touching the TEST PLAYER glyphs.
At least the following verdict screen manages to have no bugs aside from the slightly imperfect centering of its table values, and only comes with a small amount of additional bloat. Let's get right to the mapping from skill points to the 12 title strings from END3.TXT, because one of them is not like the others:
Skill
Title
≥100
神を超えた巫女!!
90 - 99
もはや神の領域!!
80 - 99
A級シューター!!
78 - 79
うきうきゲーマー!
77
バニラはーもにー!
70 - 76
うきうきゲーマー!
60 - 69
どきどきゲーマー!
50 - 59
要練習ゲーマー
40 - 49
非ゲーマー級
30 - 39
ちょっとだめ
20 - 29
非人間級
10 - 19
人間でない何か
≤9
死んでいいよ、いやいやまじで
Looks like I'm the first one to document the required skill points as well? Everyoneelse just copy-pastes END3.TXT without providing context.
So how would you get exactly 77 and achieve vanilla harmony? Here's the formula:
* Ranges from 0 (Easy) to 3 (Lunatic). † Across all 5 stages.
With Easy Mode capping out at 85, this is possible on every difficulty, although it requires increasingly perfect play the lower you go. Reaching 77 on purpose, however, pretty much demands a careful route through the entire game, as every collected and missed item will influence the item_skill in some way. This almost feels it's like the ultimate challenge that this game has to offer. Looking forward to the first Vanilla Harmony% run!
And with that, TH02's MAINE.EXE is both fully position-independent and ready for translation. There's a tiny bit of undecompiled bit of code left in the binary, but I'll leave that for rounding up a future TH02 decompilation push.
With one of the game's skill-based formulas decompiled, it's fitting to round out the second push with the other two. The in-game bonus tables at the end of a stage also have labels that we'd eventually like to translate, after all.
The bonus formula for the 4 regular stages is also the first place where we encounter TH02's rank value, as well as the only instance in PC-98 Touhou where the game actually displays a rank-derived value to the player. KirbyComment and Colin Douglas Howell accurately documented the rank mechanics over at Touhou Wiki two years ago, which helped quite a bit as rank would have been slightly out of scope for these two pushes. 📝 Similar to TH01, TH02's rank value only affects bullet speed, but the exact details of how rank is factored in will have to wait until RE progress arrives at this game's bullet system.
These bonuses are calculated by taking a sum of various gameplay metrics and multiplying it with the amount of point items collected during the stage. In the 4 regular stages, the sum consists of:
難易度
Difficulty level* × 2,000
ステージ
(Rank + 16) × 200
ボム
max((2,500 - (Bombs used* × 500)), 0)
ミス
max((3,000 - (Lives lost* × 1,000)), 0)
靈撃初期数
(4 - Starting bombs) × 800
靈夢初期数
(5 - Starting lives) × 1,000
* Within this stage, across all continues.
Yup, 封魔録.TXT does indeed document this correctly.
As rank can range from -6 to +4 on Easy and +16 on the other difficulties, this sum can range between:
Easy
Normal
Hard
Lunatic
Minimum
2,800
4,800
6,800
8,800
Maximum
16,700
21,100
23,100
25,100
The sum for the Extra Stage is not documented in 封魔録.TXT:
クリア
10,000
ミス回数
max((20,000 - (Lives lost × 4,000)), 0)
ボム回数
max((20,000 - (Bombs used × 4,000)), 0)
クリアタイム
⌊max((20,000 - Boss fight frames*), 0) ÷ 10⌋ × 10
* Amount of frames spent fighting Evil Eye Σ, counted from the end of the pre-boss dialog until the start of the defeat animation.
And that's two pushes packed full of the most bloated and copy-pasted code that's unique to TH02! So bloated, in fact, that TH02 RE as a whole jumped by almost 7%, which in turn finally pushed overall RE% over the 60% mark. 🎉 It's been a while since we hit a similar milestone; 50% overall RE happened almost 2 years ago during 📝 P0204, a month before I completed the TH01 decompilation.
Next up: Continuing to wait for Microsoft to fix the static analyzer bug until May at the latest, and working towards the newly popular dreams of TH03 netplay by looking at some of its foundational gameplay code.
So, TH02! Being the only game whose main binary hadn't seen any dedicated
attention ever, we get to start the TH02-related blog posts at the very
beginning with the most foundational pieces of code. The stage tile system
is the best place to start here: It not only blocks every entity that is
rendered on top of these tiles, but is curiously placed right next to
master.lib code in TH02, and would need to be separated out into its own
translation unit before we can do the same with all the master.lib
functions.
In late 2018, I already RE'd
📝 TH04's and TH05's stage tile implementation, but haven't properly documented it on this
blog yet, so this post is also going to include the details that are unique
to those games. On a high level, the stage tile system works identically in
all three games:
The tiles themselves are 16×16 pixels large, and a stage can use 100 of
them at the same time.
The optimal way of blitting tiles would involve VRAM-to-VRAM copies
within the same page using the EGC, and that's exactly what the games do.
All tiles are stored on both VRAM pages within the rightmost 64×400 pixels
of the screen just right next to the HUD, and you only don't see them
because the games cover the same area in text RAM with black cells:
The initial screen of TH02's Stage 1, with the tile source
area uncovered by filling the same area in text RAM with transparent
cells instead of black ones. In TH02, this also reveals how the tile
area ends with a bunch of glitch tiles, tinted blue in the image. These
are the result of ZUN unconditionally blitting 100 tile images every
time, regardless of how many are actually contained in an
.MPN file.
These glitch tiles are another good example of a ZUN
landmine. Their appearance is the result of reading heap memory
outside allocated boundaries, which can easily cause segmentation faults
when porting the game to a system with virtual memory. Therefore, these
would not just be removed in this game's Anniversary Edition, but on the
more conservative debloated branch as well. Since the game
never uses these tiles and you can't observe them unless you manipulate
text RAM from outside the confines of the game, it's not a bug
according to our definition.
To reduce the memory required for a map, tiles are arranged into fixed
vertical sections of a game-specific constant size.
The 6 24×8-tile sections defined in TH02's STAGE0.MAP, in
reverse order compared to how they're defined in the file. Note the
duplicated row at the top of the final section: The boss fight starts
once the game scrolled the last full row of tiles onto the top of the
screen, not the playfield. But since the PC-98 text chip
covers the top tile row of the screen with black cells, this final row
is never visible, which effectively reduces a map's final tile section
to 7 rows rather than 8.
The actual stage map then is simply a list of these tile sections,
ordered from the start/bottom to the top/end.
Any manipulation of specific tiles within the fixed tile sections has to
be hardcoded. An example can be found right in Stage 1, where the Shrine
Tank leaves track marks on the tiles it appears to drive over:
This video also shows off the two issues with Touhou's first-ever
midboss: The replaced tiles are rendered below the midboss
during their first 4 frames, and maybe ZUN should have stopped the
tile replacements one row before the timeout. The first one is
clearly a bug, but it's not so clear-cut with the second one. I'd
need to look at the code to tell for sure whether it's a quirk or a
bug.
The differences between the three games can best be summarized in a table:
TH02
TH04
TH05
Tile image file extension
.MPN
Tile section format
.MAP
Tile section order defined as part of
.DT1
.STD
Tile section index format
0-based ID
0-based ID × 2
Tile image index format
Index between 0 and 100, 1 byte
VRAM offset in tile source area, 2 bytes
Scroll speed control
Hardcoded
Part of the .STD format, defined per referenced tile
section
Redraw granularity
Full tiles (16×16)
Half tiles (16×8)
Rows per tile section
8
5
Maximum number of tile sections
16
32
Lowest number of tile sections used
5 (Stage 3 / Extra)
8 (Stage 6)
11 (Stage 2 / 4)
Highest number of tile sections used
13 (Stage 4)
19 (Extra)
24 (Stage 3)
Maximum length of a map
320 sections (static buffer)
256 sections (format limitation)
Shortest map
14 sections (Stage 5)
20 sections (Stage 5)
15 sections (Stage 2)
Longest map
143 sections (Stage 4)
95 sections (Stage 4)
40 sections (Stage 1 / 4 / Extra)
The most interesting part about stage tiles is probably the fact that some
of the .MAP files contain unused tile sections. 👀 Many
of these are empty, duplicates, or don't really make sense, but a few
are unique, fit naturally into their respective stage, and might have
been part of the map during development. In TH02, we can find three unused
sections in Stage 5:
The non-empty tile sections defined in TH02's STAGE4.MAP,
showing off three unused ones.
These unused tile sections are much more common in the later games though,
where we can find them in TH04's Stage 3, 4, and 5, and TH05's Stage 1, 2,
and 4. I'll document those once I get to finalize the tile rendering code of
these games, to leave some more content for that blog post. TH04/TH05 tile
code would be quite an effective investment of your money in general, as
most of it is identical across both games. Or how about going for a full-on
PC-98 Touhou map viewer and editor GUI?
Compared to TH04 and TH05, TH02's stage tile code definitely feels like ZUN
was just starting to understand how to pull off smooth vertical scrolling on
a PC-98. As such, it comes with a few inefficiencies and suboptimal
implementation choices:
The redraw flag for each tile is stored in a 24×25 bool
array that does nothing with 7 of the 8 bits.
During bombs and the Stage 4, 5, and Extra bosses, the game disables the
tile system to render more elaborate backgrounds, which require the
playfield to be flood-filled with a single color on every frame. ZUN uses
the GRCG's RMW mode rather than TDW mode for this, leaving almost half of
the potential performance on the table for no reason. Literally,
changing modes only involves changing a single constant.
The scroll speed could theoretically be changed at any time. However,
the function that scrolls in new stage tiles can only ever blit part of a
single tile row during every call, so it's up to the caller to ensure
that scrolling always ends up on an exact 16-pixel boundary. TH02 avoids
this problem by keeping the scroll speed constant across a stage, using 2
pixels for Stage 4 and 1 pixel everywhere else.
Since the scroll speed is given in pixels, the slowest speed would be 1
pixel per frame. To allow the even slower speeds seen in the final game,
TH02 adds a separate scroll interval variable that only runs the
scroll function every 𝑛th frame, effectively adding a prescaler to the
scroll speed. In TH04 and TH05, the speed is specified as a Q12.4 value
instead, allowing true fractional speeds at any multiple of
1/16 pixels. This also necessitated a fixed algorithm
that correctly blits tile lines from two rows.
Finally, we've got a few inconsistencies in the way the code handles the
two VRAM pages, which cause a few unnecessary tiles to be rendered to just
one of the two pages. Mentioning that just in case someone tries to play
this game with a fully cleared text RAM and wonders where the flickering
tiles come from.
Even though this was ZUN's first attempt at scrolling tiles, he already saw
it fit to write most of the code in assembly. This was probably a reaction
to all of TH01's performance issues, and the frame rate reduction
workarounds he implemented to keep the game from slowing down too much in
busy places. "If TH01 was all C++ and slow, TH02 better contain more ASM
code, and then it will be fast, right?"
Another reason for going with ASM might be found in the kind of
documentation that may have been available to ZUN. Last year, the PC-98
community discovered and scanned two new game programming tutorial books
from 1991 (1, 2).
Their example code is not only entirely written in assembly, but restricts
itself to the bare minimum of x86 instructions that were available on the
8086 CPU used by the original PC-9801 model 9 years earlier. Such code is
not only suboptimal
on the 486, but can often be actually worse than what your C++
compiler would generate. TH02 is where the trend of bad hand-written ASM
code started, and it
📝 only intensified in ZUN's later games. So,
don't copy code from these books unless you absolutely want to target the
earlier 8086 and 286 models. Which,
📝 as we've gathered from the recent blitting benchmark results,
are not all too common among current real-hardware owners.
That said, all that ASM code really only impacts readability and
maintainability. Apart from the aforementioned issues, the algorithms
themselves are mostly fine – especially since most EGC and GRCG operations
are decently batched this time around, in contrast to TH01.
Luckily, the tile functions merely use inline assembly within a
typical C function and can therefore be at least part of a C++ source file,
even if the result is pretty ugly. This time, we can actually be sure that
they weren't written directly in a .ASM file, because they feature x86
instruction encodings that can only be generated with Turbo C++ 4.0J's
inline assembler, not with TASM. The same can't unfortunately be said about
the following function in the same segment, which marks the tiles covered by
the spark sprites for redrawing. In this one, it took just one dumb hand-written ASM
inconsistency in the function's epilog to make the entire function
undecompilable.
The standard x86 instruction sequence to set up a stack frame in a function prolog looks like this:
PUSH BP
MOV BP, SP
SUB SP, ?? ; if the function needs the stack for local variables
When compiling without optimizations, Turbo C++ 4.0J will
replace this sequence with a single ENTER instruction. That one
is two bytes smaller, but much slower on every x86 CPU except for the 80186
where it was introduced.
In functions without local variables, BP and SP
remain identical, and a single POP BP is all that's needed in
the epilog to tear down such a stack frame before returning from the
function. Otherwise, the function needs an additional MOV SP,
BP instruction to pop all local variables. With x86 being the helpful
CISC architecture that it is, the 80186 also introduced the
LEAVE instruction to perform both tasks. Unlike
ENTER, this single instruction
is faster than the raw two instructions on a lot of x86 CPUs (and
even current ones!), and it's always smaller, taking up just 1 byte instead
of 3. So what if you use LEAVE even if your function
doesn't use local variables? The fact that the
instruction first does the equivalent of MOV SP, BP doesn't
matter if these registers are identical, and who cares about the additional
CPU cycles of LEAVE compared to just POP BP,
right? So that's definitely something you could theoretically do, but
not something that any compiler would ever generate.
And so, TH02 MAIN.EXE decompilation already hits the first
brick wall after two pushes. Awesome! Theoretically,
we could slowly mash through this wall using the 📝 code generator. But having such an inconsistency in the
function epilog would mean that we'd have to keep Turbo C++ 4.0J from
emitting any epilog or prolog code so that we can write our
own. This means that we'd once again have to hide any use of the
SI and DI registers from the compiler… and doing
that requires code generation macros for 22 of the 49 instructions of
the function in question, almost none of which we currently have. So, this
gets quite silly quite fast, especially if we only need to do it
for one single byte.
Instead, wouldn't it be much better if we had a separate build step between
compile and link time that allowed us to replicate mistakes like these by
just patching the compiled .OBJ files? These files still contain the names
of exported functions for linking, which would allow us to look up the code
of a function in a robust manner, navigate to specific instructions using a
disassembler, replace them, and write the modified .OBJ back to disk before
linking. Such a system could then naturally expand to cover all other
decompilation issues, culminating in a full-on optimizer that could even
recreate ZUN's self-modifying code. At that point, we would have sealed away
all of ZUN's ugly ASM code within a separate build step, and could finally
decompile everything into readable C++.
Pulling that off would require a significant tooling investment though.
Patching that one byte in TH02's spark invalidation function could be done
within 1 or 2 pushes, but that's just one issue, and we currently have 32
other .ASM files with undecompilable code. Also, note that this is
fundamentally different from what we're doing with the
debloated branch and the Anniversary Editions. Mistake patching
would purely be about having readable code on master that
compiles into ZUN's exact binaries, without fixing weird
code. The Anniversary Editions go much further and rewrite such code in
a much more fundamental way, improving it further than mistake patching ever
could.
Right now, the Anniversary Editions seem much more
popular, which suggests that people just want 100% RE as fast as
possible so that I can start working on them. In that case, why bother with
such undecompilable functions, and not just leave them in raw and unreadable
x86 opcode form if necessary… But let's first
see how much backer support there actually is for mistake patching before
falling back on that.
The best part though: Once we've made a decision and then covered TH02's
spark and particle systems, that was it, and we will have already RE'd
all ZUN-written PC-98-specific blitting code in this game. Every further
sprite or shape is rendered via master.lib, and is thus decently abstracted.
Guess I'll need to update
📝 the assessment of which PC-98 Touhou game is the easiest to port,
because it sure isn't TH01, as we've seen with all the work required for the first Anniversary Edition build.
Until then, there are still enough parts of the game that don't use any of
the remaining few functions in the _TEXT segment. Previously, I
mentioned in the 📝 status overview blog post
that TH02 had a seemingly weird sprite system, but the spark and point popup
() structures showed that the game just
stores the current and previous position of its entities in a slightly
different way compared to the rest of PC-98 Touhou. Instead of having
dedicated structure fields, TH02 uses two-element arrays indexed with the
active VRAM page. Same thing, and such a pattern even helps during RE since
it's easy to spot once you know what to look for.
There's not much to criticize about the point popup system, except for maybe
a landmine that causes sprite glitches when trying to display more than
99,990 points. Sadly, the final push in this delivery was rounded out by yet
another piece of code at the opposite end of the quality spectrum. The
particle and smear effects for Reimu's bomb animations consist almost
entirely of assembly bloat, which would just be replaced with generic calls
to the generic blitter in this game's future Anniversary Edition.
If I continue to decompile TH02 while avoiding the brick wall, items would
be next, but they probably require two pushes. Next up, therefore:
Integrating Stripe as an alternative payment provider into the order form.
There have been at least three people who reported issues with PayPal, and
Stripe has been working much better in tests. In the meantime, here's a temporary Stripe
order link for everyone. This one is not connected to the cap yet, so
please make sure to stay within whatever value is currently shown on the
front page – I will treat any excess money as donations.
If there's some time left afterward, I might
also add some small improvements to the TH01 Anniversary Edition.
More than three months without any reverse-engineering progress! It's been
way too long. Coincidentally, we're at least back with a surprising 1.25% of
overall RE, achieved within just 3 pushes. The ending script system is not
only more or less the same in TH04 and TH05, but actually originated in
TH03, where it's also used for the cutscenes before stages 8 and 9. This
means that it was one of the final pieces of code shared between three of
the four remaining games, which I got to decompile at roughly 3× the usual
speed, or ⅓ of the price.
The only other bargains of this nature remain in OP.EXE. The
Music Room is largely equivalent in all three remaining games as well, and
the sound device selection, ZUN Soft logo screens, and main/option menus are
the same in TH04 and TH05. A lot of that code is in the "technically RE'd
but not yet decompiled" ASM form though, so it would shift Finalized% more
significantly than RE%. Therefore, make sure to order the new
Finalization option rather than Reverse-engineering if you
want to make number go up.
So, cutscenes. On the surface, the .TXT files look simple enough: You
directly write the text that should appear on the screen into the file
without any special markup, and add commands to define visuals, music, and
other effects at any place within the script. Let's start with the basics of
how text is rendered, which are the same in all three games:
First off, the text area has a size of 480×64 pixels. This means that it
does not correspond to the tiled area painted into TH05's
EDBK?.PI images:
The yellow area is designated for character names.
Since the font weight can be customized, all text is rendered to VRAM.
This also includes gaiji, despite them ignoring the font weight
setting.
The system supports automatic line breaks on a per-glyph basis, which
move the text cursor to the beginning of the red text area. This might seem like a piece of long-forgotten
ancient wisdom at first, considering the absence of automatic line breaks in
Windows Touhou. However, ZUN probably implemented it more out of pure
necessity: Text in VRAM needs to be unblitted when starting a new box, which
is way more straightforward and performant if you only need to worry
about a fixed area.
The system also automatically starts a new (key press-separated) text
box after the end of the 4th line. However, the text cursor is
also unconditionally moved to the top-left corner of the yellow name
area when this happens, which is almost certainly not what you expect, given
that automatic line breaks stay within the red area. A script author might
as well add the necessary text box change commands manually, if you're
forced to anticipate the automatic ones anyway…
Due to ZUN forgetting an unblitting call during the TH05 refactoring of the
box background buffer, this feature is even completely broken in that game,
as any new text will simply be blitted on top of the old one:
Wait, why are we already talking about game-specific differences after
all? Also, note how the ⏎ animation appears one line below where you'd
expect it.
Overall, the system is geared toward exclusively full-width text. As
exemplified by the 2014 static English patches and the screenshots in this
blog post, half-width text is possible, but comes with a lot of
asterisks attached:
Each loop of the script interpreter starts by looking at the next
byte to distinguish commands from text. However, this step also skips
over every ASCII space and control character, i.e., every byte
≤ 32. If you only intend to display full-width glyphs anyway, this
sort of makes sense: You gain complete freedom when it comes to the
physical layout of these script files, and it especially allows commands
to be freely separated with spaces and line breaks for improved
readability. Still, enforcing commands to be separated exclusively by
line breaks might have been even better for readability, and would have
freed up ASCII spaces for regular text…
Non-command text is blindly processed and rendered two bytes at a
time. The rendering function interprets these bytes as a Shift-JIS
string, so you can use half-width characters here. While the
second byte can even be an ASCII 0x20 space due to the
parser's blindness, all half-width characters must still occur in pairs
that can't be interrupted by commands:
As a workaround for at least the ASCII space issue, you can replace
them with any of the unassigned
Shift-JIS lead bytes – 0x80, 0xA0, or
anything between 0xF0 and 0xFF inclusive.
That's what you see in all screenshots of this post that display
half-width spaces.
Finally, did you know that you can hold ESC to fast-forward
through these cutscenes, which skips most frame delays and reduces the rest?
Due to the blocking nature of all commands, the ESC key state is
only updated between commands or 2-byte text groups though, so it can't
interrupt an ongoing delay.
Superficially, the list of game-specific differences doesn't look too long,
and can be summarized in a rather short table:
It's when you get into the implementation that the combined three systems
reveal themselves as a giant mess, with more like 56 differences between the
games. Every single new weird line of code opened up
another can of worms, which ultimately made all of this end up with 24
pieces of bloat and 14 bugs. The worst of these should be quite interesting
for the general PC-98 homebrew developers among my audience:
The final official 0.23 release of master.lib has a bug in
graph_gaiji_put*(). To calculate the JIS X 0208 code point for
a gaiji, it is enough to ADD 5680h onto the gaiji ID. However,
these functions accidentally use ADC instead, which incorrectly
adds the x86 carry flag on top, causing weird off-by-one errors based on the
previous program state. ZUN did fix this bug directly inside master.lib for
TH04 and TH05, but still needed to work around it in TH03 by subtracting 1
from the intended gaiji ID. Anyone up for maintaining a bug-fixed master.lib
repository?
The worst piece of bloat comes from TH03 and TH04 needlessly
switching the visibility of VRAM pages while blitting a new 320×200 picture.
This makes it much harder to understand the code, as the mere existence of
these page switches is enough to suggest a more complex interplay between
the two VRAM pages which doesn't actually exist. Outside this visibility
switch, page 0 is always supposed to be shown, and page 1 is always used
for temporarily storing pixels that are later crossfaded onto page 0. This
is also the only reason why TH03 has to render text and gaiji onto both VRAM
pages to begin with… and because TH04 doesn't, changing the picture in the
middle of a string of text is technically bugged in that game, even though
you only get to temporarily see the new text on very underclocked PC-98
systems.
These performance implications made me wonder why cutscenes even bother with
writing to the second VRAM page anyway, before copying each crossfade step
to the visible one.
📝 We learned in June how costly EGC-"accelerated" inter-page copies are;
shouldn't it be faster to just blit the image once rather than twice?
Well, master.lib decodes .PI images into a packed-pixel format, and
unpacking such a representation into bitplanes on the fly is just about the
worst way of blitting you could possibly imagine on a PC-98. EGC inter-page
copies are already fairly disappointing at 42 cycles for every 16 pixels, if
we look at the i486 and ignore VRAM latencies. But under the same
conditions, packed-pixel unpacking comes in at 81 cycles for every 8
pixels, or almost 4× slower. On lower-end systems, that can easily sum up to
more than one frame for a 320×200 image. While I'd argue that the resulting
tearing could have been an acceptable part of the transition between two
images, it's understandable why you'd want to avoid it in favor of the
pure effect on a slower framerate.
Really makes me wonder why master.lib didn't just directly decode .PI images
into bitplanes. The performance impact on load times should have been
negligible? It's such a good format for
the often dithered 16-color artwork you typically see on PC-98, and
deserves better than master.lib's implementation which is both slow to
decode and slow to blit.
That brings us to the individual script commands… and yes, I'm going to
document every single one of them. Some of their interactions and edge cases
are not clear at all from just looking at the code.
Almost all commands are preceded by… well, a 0x5C lead byte.
Which raises the question of whether we should
document it as an ASCII-encoded \ backslash, or a Shift-JIS-encoded
¥ yen sign. From a gaijin perspective, it seems obvious that it's a
backslash, as it's consistently displayed as one in most of the editors you
would actually use nowadays. But interestingly, iconv
-f shift-jis -t utf-8 does convert any 0x5C
lead bytes to actual ¥ U+00A5 YEN SIGN code points
.
Ultimately, the distinction comes down to the font. There are fonts
that still render 0x5C as ¥, but mainly do so out
of an obvious concern about backward compatibility to JIS X 0201, where this
mapping originated. Unsurprisingly, this group includes MS Gothic/Mincho,
the old Japanese fonts from Windows 3.1, but even Meiryo and Yu
Gothic/Mincho, Microsoft's modern Japanese fonts. Meanwhile, pretty much
every other modern font, and freely licensed ones in particular, render this
code point as \, even if you set your editor to Shift-JIS. And
while ZUN most definitely saw it as a ¥, documenting this code
point as \ is less ambiguous in the long run. It can only
possibly correspond to one specific code point in either Shift-JIS or UTF-8,
and will remain correct even if we later mod the cutscene system to support
full-blown Unicode.
Now we've only got to clarify the parameter syntax, and then we can look at
the big table of commands:
Numeric parameters are read as sequences of up to 3 ASCII digits. This
limits them to a range from 0 to 999 inclusive, with 000 and
0 being equivalent. Because there's no further sentinel
character, any further digit from the 4th one onwards is
interpreted as regular text.
Filename parameters must be terminated with a space or newline and are
limited to 12 characters, which translates to 8.3 basenames without any
directory component. Any further characters are ignored and displayed as
text as well.
Each .PI image can contain up to four 320×200 pictures ("quarters") for
the cutscene picture area. In the script commands, they are numbered like
this:
0
1
2
3
\@
Clears both VRAM pages by filling them with VRAM color 0. 🐞
In TH03 and TH04, this command does not update the internal text area
background used for unblitting. This bug effectively restricts usage of
this command to either the beginning of a script (before the first
background image is shown) or its end (after no more new text boxes are
started). See the image below for an
example of using it anywhere else.
\b2
Sets the font weight to a value between 0 (raw font ROM glyphs) to 3
(very thicc). Specifying any other value has no effect.
🐞 In TH04 and TH05, \b3 leads to glitched pixels when
rendering half-width glyphs due to a bug in the newly micro-optimized
ASM version of
📝 graph_putsa_fx(); see the image below for an example.
In these games, the parameter also directly corresponds to the
graph_putsa_fx() effect function, removing the sanity check
that was present in TH03. In exchange, you can also access the four
dissolve masks for the bold font (\b2) by specifying a
parameter between 4 (fewest pixels) to 7 (most
pixels). Demo video below.
\c15
Changes the text color to VRAM color 15.
\c=字,15
Adds a color map entry: If 字 is the first code point
inside the name area on a new line, the text color is automatically set
to 15. Up to 8 such entries can be registered
before overflowing the statically allocated buffer.
🐞 The comma is assumed to be present even if the color parameter is omitted.
\e0
Plays the sound effect with the given ID.
\f
(no-op)
\fi1
\fo1
Calls master.lib's palette_black_in() or
palette_black_out() to play a hardware palette fade
animation from or to black, spending roughly 1 frame on each of the 16 fade steps.
\fm1
Fades out BGM volume via PMD's AH=02h interrupt call,
in a non-blocking way. The fade speed can range from 1 (slowest) to 127 (fastest).
Values from 128 to 255 technically correspond to
AH=02h's fade-in feature, which can't be used from cutscene
scripts because it requires BGM volume to first be lowered via
AH=19h, and there is no command to do that.
\g8
Plays a blocking 8-frame screen shake
animation.
\ga0
Shows the gaiji with the given ID from 0 to 255
at the current cursor position. Even in TH03, gaiji always ignore the
text delay interval configured with \v.
@3
TH05's replacement for the \ga command from TH03 and
TH04. The default ID of 3 corresponds to the
gaiji. Not to be confused with \@, which starts with a backslash,
unlike this command.
@h
Shows the gaiji.
@t
Shows the gaiji.
@!
Shows the gaiji.
@?
Shows the gaiji.
@!!
Shows the gaiji.
@!?
Shows the gaiji.
\k0
Waits 0 frames (0 = forever) for an advance key to be pressed before
continuing script execution. Before waiting, TH05 crossfades in any new
text that was previously rendered to the invisible VRAM page…
🐞 …but TH04 doesn't, leaving the text invisible during the wait time.
As a workaround, \vp1 can be
used before \k to immediately display that text without a
fade-in animation.
\m$
Stops the currently playing BGM.
\m*
Restarts playback of the currently loaded BGM from the
beginning.
\m,filename
Stops the currently playing BGM, loads a new one from the given
file, and starts playback.
\n
Starts a new line at the leftmost X coordinate of the box, i.e., the
start of the name area. This is how scripts can "change" the name of the
currently speaking character, or use the entire 480×64 pixels without
being restricted to the non-name area.
Note that automatic line breaks already move the cursor into a new line.
Using this command at the "end" of a line with the maximum number of 30
full-width glyphs would therefore start a second new line and leave the
previously started line empty.
If this command moved the cursor into the 5th line of a box,
\s is executed afterward, with
any of \n's parameters passed to \s.
\p
(no-op)
\p-
Deallocates the loaded .PI image.
\p,filename
Loads the .PI image with the given file into the single .PI slot
available to cutscenes. TH04 and TH05 automatically deallocate any
previous image, 🐞 TH03 would leak memory without a manual prior call to
\p-.
\pp
Sets the hardware palette to the one of the loaded .PI image.
\p@
Sets the loaded .PI image as the full-screen 640×400 background
image and overwrites both VRAM pages with its pixels, retaining the
current hardware palette.
\p=
Runs \pp followed by \p@.
\s0
\s-
Ends a text box and starts a new one. Fades in any text rendered to
the invisible VRAM page, then waits 0 frames
(0 = forever) for an advance key to be
pressed. Afterward, the new text box is started with the cursor moved to
the top-left corner of the name area. \s- skips the wait time and starts the new box
immediately.
\t100
Sets palette brightness via master.lib's
palette_settone() to any value from 0 (fully black) to 200
(fully white). 100 corresponds to the palette's original colors.
Preceded by a 1-frame delay unless ESC is held.
\v1
Sets the number of frames to wait between every 2 bytes of rendered
text.
Sets the number of frames to spend on each of the 4 fade
steps when crossfading between old and new text. The game-specific
default value is also used before the first use of this command.
\v2
\vp0
Shows VRAM page 0. Completely useless in
TH03 (this game always synchronizes both VRAM pages at a command
boundary), only of dubious use in TH04 (for working around a bug in \k), and the games always return to
their intended shown page before every blitting operation anyway. A
debloated mod of this game would just remove this command, as it exposes
an implementation detail that script authors should not need to worry
about. None of the original scripts use it anyway.
\w64
\w and \wk wait for the given number
of frames
\wm and \wmk wait until PMD has played
back the current BGM for the total number of measures, including
loops, given in the first parameter, and fall back on calling
\w and \wk with the second parameter as
the frame number if BGM is disabled.
🐞 Neither PMD nor MMD reset the internal measure when stopping
playback. If no BGM is playing and the previous BGM hasn't been
played back for at least the given number of measures, this command
will deadlock.
Since both TH04 and TH05 fade in any new text from the invisible VRAM
page, these commands can be used to simulate TH03's typing effect in
those games. Demo video below.
Contrary to \k and \s, specifying 0 frames would
simply remove any frame delay instead of waiting forever.
The TH03-exclusive k variants allow the delay to be
interrupted if ⏎ Return or Shot are held down.
TH04 and TH05 recognize the k as well, but removed its
functionality.
All of these commands have no effect if ESC is held.
\wm64,64
\wk64
\wmk64,64
\wi1
\wo1
Calls master.lib's palette_white_in() or
palette_white_out() to play a hardware palette fade
animation from or to white, spending roughly 1 frame on each of the 16 fade steps.
\=4
Immediately displays the given quarter of the loaded .PI image in
the picture area, with no fade effect. Any value ≥ 4 resets the picture area to black.
\==4,1
Crossfades the picture area between its current content and quarter
#4 of the loaded .PI image, spending 1 frame on each of the 4 fade steps unless
ESC is held. Any value ≥ 4 is
replaced with quarter #0.
\$
Stops script execution. Must be called at the end of each file;
otherwise, execution continues into whatever lies after the script
buffer in memory.
TH05 automatically deallocates the loaded .PI image, TH03 and TH04
require a separate manual call to \p- to not leak its memory.
Bold values signify the default if the parameter
is omitted; \c is therefore
equivalent to \c15.
The \@ bug. Yes, the ¥ is fake. It
was easier to GIMP it than to reword the sentences so that the backslashes
landed on the second byte of a 2-byte half-width character pair.
The font weights and effects available through \b, including the glitch with
\b3 in TH04 and TH05.
Font weight 3 is technically not rendered correctly in TH03 either; if
you compare 1️⃣ with 4️⃣, you notice a single missing column of pixels
at the left side of each glyph, which would extend into the previous
VRAM byte. Ironically, the TH04/TH05 version is more correct in
this regard: For half-width glyphs, it preserves any further pixel
columns generated by the weight functions in the high byte of the 16-dot
glyph variable. Unlike TH03, which still cuts them off when rendering
text to unaligned X positions (3️⃣), TH04 and TH05 do bit-rotate them
towards their correct place (4️⃣). It's only at byte-aligned X positions
(2️⃣) where they remain at their internally calculated place, and appear
on screen as these glitched pixel columns, 15 pixels away from the glyph
they belong to. It's easy to blame bugs like these on micro-optimized
ASM code, but in this instance, you really can't argue against it if the
original C++ version was equally incorrect.
Combining \b and s- into a partial dissolve
animation. The speed can be controlled with \v.
Simulating TH03's typing effect in TH04 and TH05 via \w. Even prettier in TH05 where we
also get an additional fade animation
after the box ends.
So yeah, that's the cutscene system. I'm dreading the moment I will have to
deal with the other command interpreter in these games, i.e., the
stage enemy system. Luckily, that one is completely disconnected from any
other system, so I won't have to deal with it until we're close to finishing
MAIN.EXE… that is, unless someone requests it before. And it
won't involve text encodings or unblitting…
The cutscene system got me thinking in greater detail about how I would
implement translations, being one of the main dependencies behind them. This
goal has been on the order form for a while and could soon be implemented
for these cutscenes, with 100% PI being right around the corner for the TH03
and TH04 cutscene executables.
Once we're there, the "Virgin" old-school way of static translation patching
for Latin-script languages could be implemented fairly quickly:
Establish basic UTF-8 parsing for less painful manual editing of the
source files
Procedurally generate glyphs for the few required additional letters
based on existing font ROM glyphs. For example, we'd generate ä
by painting two short lines on top of the font ROM's a glyph,
or generate ¿ by vertically flipping the question mark. This
way, the text retains a consistent look regardless of whether the translated
game is run with an NEC or EPSON font ROM, or the that Neko Project II auto-generates if you
don't provide either.
(Optional) Change automatic line breaks to work on a per-word
basis, rather than per-glyph
That's it – script editing and distribution would be handled by your local
translation group. It might seem as if this would also work for Greek and
Cyrillic scripts due to their presence in the PC-98 font ROM, but I'm not
sure if I want to attempt procedurally shrinking these glyphs from 16×16 to
8×16… For any more thorough solution, we'd need to go for a more "Chad" kind
of full-blown translation support:
Implement text subdivisions at a sensible granularity while retaining
automatic line and box breaks
Compile translatable text into a Japanese→target language dictionary
(I'm too old to develop any further translation systems that would overwrite
modded source text with translations of the original text)
Implement a custom Unicode font system (glyphs would be taken from GNU
Unifont unless translators provide a different 8×16 font for their
language)
Combine the text compiler with the font compiler to only store needed
glyphs as part of the translation's font file (dealing with a multi-MB font
file would be rather ugly in a Real Mode game)
Write a simple install/update/patch stacking tool that supports both
.HDI and raw-file DOSBox-X scenarios (it's different enough from thcrap to
warrant a separate tool – each patch stack would be statically compiled into
a single package file in the game's directory)
Add a nice language selection option to the main menu
(Optional) Support proportional fonts
Which sounds more like a separate project to be commissioned from
Touhou Patch Center's Open Collective funds, separate from the ReC98 cap.
This way, we can make sure that the feature is completely implemented, and I
can talk with every interested translator to make sure that their language
works.
It's still cheaper overall to do this on PC-98 than to first port the games
to a modern system and then translate them. On the other hand, most
of the tasks in the Chad variant (3, 4, 5, and half of 2) purely deal with
the difficulty of getting arbitrary Unicode characters to work natively in a
PC-98 DOS game at all, and would be either unnecessary or trivial if we had
already ported the game. Depending on where the patrons' interests lie, it
may not be worth it. So let's see what all of you think about which
way we should go, or whether it's worth doing at all. (Edit
(2022-12-01): With Splashman's
order towards the stage dialogue system, we've pretty much confirmed that it
is.) Maybe we want to meet in the middle – using e.g. procedural glyph
generation for dynamic translations to keep text rendering consistent with
the rest of the PC-98 system, and just not support non-Latin-script
languages in the beginning? In any case, I've added both options to the
order form. Edit (2023-07-28):Touhou Patch Center has agreed to fund
a basic feature set somewhere between the Virgin and Chad level. Check the
📝 dedicated announcement blog post for more
details and ideas, and to find out how you can support this goal!
Surprisingly, there was still a bit of RE work left in the third push after
all of this, which I filled with some small rendering boilerplate. Since I
also wanted to include TH02's playfield overlay functions,
1/15 of that last push went towards getting a
TH02-exclusive function out of the way, which also ended up including that
game in this delivery.
The other small function pointed out how TH05's Stage 5 midboss pops into
the playfield quite suddenly, since its clipping test thinks it's only 32
pixels tall rather than 64:
Good chance that the pop-in might have been intended. Edit (2023-06-30): Actually, it's a
📝 systematic consequence of ZUN having to work around the lack of clipping in master.lib's sprite functions.
There's even another quirk here: The white flash during its first frame
is actually carried over from the previous midboss, which the
game still considers as actively getting hit by the player shot that
defeated it. It's the regular boilerplate code for rendering a
midboss that resets the responsible damage variable, and that code
doesn't run during the defeat explosion animation.
Next up: Staying with TH05 and looking at more of the pattern code of its
boss fights. Given the remaining TH05 budget, it makes the most sense to
continue in in-game order, with Sara and the Stage 2 midboss. If more money
comes in towards this goal, I could alternatively go for the Mai & Yuki
fight and immediately develop a pretty fix for the cheeto storage
glitch. Also, there's a rather intricate
pull request for direct ZMBV decoding on the website that I've still got
to review…
🎉 TH05 is finally fully position-independent! 🎉 To celebrate this
milestone, -Tom- coded a little demo, which we recorded on
both an emulator and on real PC-98 hardware:
You can now freely add or remove both data and code anywhere in TH05, by
editing the ReC98 codebase, writing your mod in ASM or C/C++, and
recompiling the code. Since all absolute memory addresses have now been
converted to labels, this will work without causing any instability. See
the position independence section in the FAQ
for a more thorough explanation about why this was a problem.
By extension, this also means that it's now theoretically possible
to use a different compiler on the source code. But:
What does this not mean?
The original ZUN code hasn't been completely reverse-engineered yet, let
alone decompiled. As the final PC-98 Touhou game, TH05 also happens to
have the largest amount of actual ZUN-written ASM that can't ever
be decompiled within ReC98's constraints of a legit source code
reconstruction. But a lot of the originally-in-C code is also still in
ASM, which might make modding a bit inconvenient right now. And while I
have decompiled a bunch of functions, I selected them largely
because they would help with PI (as requested by the backers), and not
because they are particularly relevant to typical modding interests.
As a result, the code might also be a bit confusingly organized. There's
quite a conflict between various goals there: On the one hand, I'd like to
only have a single instance of every function shared with earlier games,
as well as reduce ZUN's code duplication within a single game. On the
other hand, this leads to quite a lot of code being scattered all over the
place and then #include-pasted back together, except for the
places where
📝 this doesn't work, and you'd have to use multiple translation units anyway…
I'm only beginning to figure out the best structure here, and some more
reverse-engineering attention surely won't hurt.
Also, keep in mind that the code still targets x86 Real Mode. To work
effectively in this codebase, you'd need some familiarity with
memory
segmentation, and how to express it all in code. This tends to make
even regular C++ development about an order of magnitude harder,
especially once you want to interface with the remaining ASM code. That
part made -Tom- struggle quite a bit with implementing his
custom scripting language for the demo above. For now, he built that demo
on quite a limited foundation – which is why he also chose to release
neither the build nor the source publically for the time being.
So yeah, you're definitely going to need the TASM and Borland C++ manuals
there.
tl;dr: We now know everything about this game's data, but not quite
as much about this game's code.
So, how long until source ports become a realistic project?
You probably want to wait for 100% RE, which is when everything
that can be decompiled has been decompiled.
Unless your target system is 16-bit Windows, in which case you could
theoretically start right away. 📝 Again,
this would be the ideal first system to port PC-98 Touhou to: It would
require all the generic portability work to remove the dependency on PC-98
hardware, thus paving the way for a subsequent port to modern systems,
yet you could still just drop in any undecompiled ASM.
Porting to IBM-compatible DOS would only be a harder and less universally
useful version of that. You'd then simply exchange one architecture, with
its idiosyncrasies and limits, for another, with its own set of
idiosyncrasies and limits. (Unless, of course, you already happen to be
intimately familiar with that architecture.) The fact that master.lib
provides DOS/V support would have only mattered if ZUN consistently used
it to abstract away PC-98 hardware at every single place in the code,
which is definitely not the case.
The list of actually interesting findings in this push is,
📝 again, very short. Probably the most
notable discovery: The low-level part of the code that renders Marisa's
laser from her TH04 Illusion Laser shot type is still present in
TH05. Insert wild mass guessing about potential beta version shot types…
Oh, and did you know that the order of background images in the Extra
Stage staff roll differs by character?
Next up: Finally driving up the RE% bar again, by decompiling some TH05
main menu code.