FAQ

What is this about?

For now, the aim is to perfectly reconstruct the lost[citation needed] source code of the first five Touhou Project games by ZUN Soft (now Team Shanghai Alice), which were originally released exclusively for the NEC PC-9801 platform.

The original games being:

TH01: 東方靈異伝　～ The Highly Responsive to Prayers (1997)
TH02: 東方封魔録　～ the Story of Eastern Wonderland (1997)
TH03: 東方夢時空　～ Phantasmagoria of Dim.Dream (1997)
TH04: 東方幻想郷　～ Lotus Land Story (1998)
TH05: 東方怪綺談　～ Mystic Square (1998)

Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, or which comments surrounded the original code. "Perfect" therefore means that the binaries compiled from the code in the ReC98 repository are indistinguishable from ZUN's original builds, making it impossible to disprove that the original code could have looked like this. This property is maintained for every Git commit along the way.

Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community.

Who are you?

I created the Touhou Community Reliant Automatic Patcher and Touhou Patch Center in 2012, and remained a core developer of both until retiring in March 2019. Older Touhou fans might also remember me for Touhou Music Room (2010/2011) and the Touhou Vorbis Compressor (2011).

Check my GitHub page as well as the crowdfunding log here for more proof of my track record.

What's the split between OP, MAIN and MAINE? What things are they responsible for? How do they interact with each other?

Since only one executable can be active at any given time, there has to be some way of sharing data between them. This is done using a resident "Config" structure, kept at the top of conventional DOS RAM. As of 2020-02-23, we've reverse-engineered the contents of this structure for all 5 games (TH01, TH02, TH03, TH04, TH05).

What is this "position independence" thing about?

Position dependence means that a binary's references to global variables are expressed as raw number constants, rather than being named with identifiers:

mov ax, some_data   ; Position-independent
mov ax, 1234h       ; Position-dependent; assumes that
                    ; [some_data] is at address 1234h

If you increase or decrease the number of bytes anywhere in the non-header parts of an executable, you'll end up breaking most of these position-dependent references, since global variables no longer are where the game expects them to be. This will lead to quite some instability.
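
To make that failure mode concrete, here is a minimal C++ sketch rather than anything taken from the games. The `Image` struct, its contents, and all function names are hypothetical. It models a tiny data segment in which one read goes through a tracked symbol and another through a hard-coded offset, and then shifts the data by one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature data segment. `some_data_offset` plays the role of
// the symbol that an assembler would resolve for the name `some_data`.
struct Image {
    std::vector<std::uint8_t> data;
    std::size_t some_data_offset;
};

Image make_image() {
    return Image{{0x00, 0x00, 0x2A, 0x00}, 2};
}

// Position-dependent: the offset is a raw number baked into the code.
constexpr std::size_t HARDCODED_OFFSET = 2;

std::uint8_t read_position_dependent(const Image& img) {
    return img.data[HARDCODED_OFFSET];
}

// Position-independent: the offset is looked up through the symbol.
std::uint8_t read_position_independent(const Image& img) {
    return img.data[img.some_data_offset];
}

// A "mod" that adds bytes in front of the existing data. Reassembling
// updates every named reference, but raw numbers stay as they were.
void insert_bytes_at_start(Image& img, std::size_t count) {
    img.data.insert(img.data.begin(), count, 0xFF);
    img.some_data_offset += count;
}

void position_dependence_example() {
    Image img = make_image();
    assert(read_position_dependent(img) == 0x2A);
    assert(read_position_independent(img) == 0x2A);

    insert_bytes_at_start(img, 1);
    assert(read_position_independent(img) == 0x2A); // still correct
    assert(read_position_dependent(img) != 0x2A);   // now reads a neighbor
}
```

The hard-coded read does not crash here; it silently returns the wrong byte, which is the kind of instability described above.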

Now, why is this such an issue for PC-98 Touhou? 16-bit x86 code has to take segmentation into account for all its memory accesses. This means that each actual address is built out of two 16-bit values, the segment and the offset. Since offsets therefore can only range from 0 to 2^16-1, the line between actual memory offsets and numeric constants becomes blurred. Most disassemblers I know of that target this architecture therefore only make a very superficial attempt at identifying data references, and give up once arrays are involved, just leaving a numeric constant in place of such a reference. And for good reason: Doing this properly effectively requires an emulator, running the game and performing control flow analysis. Anything less than that – especially anything parsing individual lines of ASM – and you're bound to

And even with an emulator, you're still faced with the fact that on the low level of ASM and C, the declared size of an array is simply advisory anyway. So what do you put, especially when being confronted with out-of-bounds array access bugs in the original game itself?
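
For readers unfamiliar with real-mode addressing, the segment:offset scheme mentioned above can be sketched in a few lines of C++. The function name is made up for illustration; the arithmetic itself (segment times 16 plus offset) is the standard real-mode x86 rule.

```cpp
#include <cassert>
#include <cstdint>

// Real-mode x86: linear address = segment * 16 + offset.
constexpr std::uint32_t linear_address(std::uint16_t segment,
                                       std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

void segment_offset_examples() {
    // An offset is a plain 16-bit number, so its largest value is 2^16 - 1.
    assert(UINT16_MAX == 65535);

    // Two different segment:offset pairs can refer to the same byte.
    assert(linear_address(0x1234, 0x0010) == 0x12350);
    assert(linear_address(0x1235, 0x0000) == 0x12350);
}
```

Nothing in the bit pattern of a value like 1234h says whether it is an offset into a data segment or just the number 4660, which is why a disassembler cannot tell the two apart from the literal alone.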

So if you've chosen to deliver quality instead of experimental research, the best choice is to give up, not pretend to be position-independent in the first place, and treat every numeric constant that falls within the range of any data segment as a possible memory reference. Sure, this means that the actual number of memory references is lower, and thus, the actual percentage of position independence is higher than the front page may suggest. But we can't tell, and erring on the side of caution is, in my opinion, better than pretending that the code is more position-independent than it actually is, just because it ran through some sort of experimental tool.

While position-dependent code is still a significant step up from modding game binaries via hex-editing, it effectively still suffers from most of the same constraints, despite looking like regular source code that you can just arbitrarily edit and recompile. So while modding the game in all sorts of ways is already possible right now, it's harder than it needs to be. Once a binary reaches 100% position independence though, developing any sort of mod, in either C/C++ or ASM, will become trivial.

How is position independence calculated?

The absolute number is the count of all remaining hexadecimal literals in all code segments of a binary's big .asm dump file that fall into all of these categories:

1) Matches the regex (-?[0-9][0-9a-fA-F]{1,4})h for hexadecimal literals in TASM/MASM syntax

IDA dumps all number literals ≥10 as hex by default. Restricting the PI calculation to hex numbers allows us to clearly mark false positives by simply converting those numbers to decimal. Having to do this manually further communicates that every such conversion was a conscious decision, based on the newly RE'd context the number is used in.

This might seem like useless work at first, only necessary because it's dictated by some counting algorithm on a website. However, most of those false positives turn out to be things like (sub)pixel coordinates, number of score points, frame counts… which the typical person does prefer and expect to be expressed in decimal. Thus, this conversion turns into quite a quality-of-life improvement for anyone reading and modding the code. Especially with the fixed-point 12.4 "subpixel" type used for playfield-space coordinates in TH03-TH05, which we can abstract away even at the ASM level.
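
As a worked example of why decimal reads better here, the following C++ sketch converts between whole pixels and a 12.4 fixed-point value. The type and function names are hypothetical, and using a signed 16-bit integer is an assumption; the only thing taken from the text is that 4 of the 16 bits hold the fraction, giving 16 subpixels per pixel.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 12.4 fixed-point stored in a signed 16-bit integer.
using subpixel_t = std::int16_t;

// 4 fractional bits mean 2^4 = 16 subpixels per pixel.
constexpr int SUBPIXELS_PER_PIXEL = 16;

constexpr subpixel_t to_subpixel(int pixels) {
    return static_cast<subpixel_t>(pixels * SUBPIXELS_PER_PIXEL);
}

constexpr int whole_pixels(subpixel_t value) {
    return value / SUBPIXELS_PER_PIXEL;
}

constexpr int fractional_subpixels(subpixel_t value) {
    return value % SUBPIXELS_PER_PIXEL;
}

void subpixel_examples() {
    // The hex literal 180h is simply "24 pixels" in playfield space.
    assert(to_subpixel(24) == 0x180);
    assert(whole_pixels(0x180) == 24);
    assert(fractional_subpixels(0x180) == 0);

    // 18h is 1 pixel plus 8/16 of a pixel, i.e. 1.5 pixels.
    assert(whole_pixels(0x18) == 1);
    assert(fractional_subpixels(0x18) == 8);
}
```

A literal written as "24 pixels" can no longer be mistaken for a memory offset, whereas 180h could be either.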

2) Falls within the data segment ranges occupied by ZUN data

This means that structure size independence is an explicit non-goal of PI. The reason becomes clear if you look at all the things a 16-bit number literal can represent:

If we don't limit the value range to ZUN data, all the low numbers would vastly drown out the actual memory references we are trying to identify, resulting in a number that's even less representative of the code's actual position independence. And since structures can have any size, we'll necessarily have to leave them to reverse-engineering.
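
Criteria 1 and 2 can be sketched roughly as follows. This is not the counting code used by the site: the `Range` struct, the function name, and the example data range are assumptions, and negative literals are simply compared as-is. Only the regex is taken verbatim from criterion 1, and the instruction filter from criterion 3 is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical inclusive range of offsets occupied by ZUN data.
struct Range {
    long first;
    long last;
};

// Counts hex literals in TASM/MASM syntax whose value falls into `zun_data`.
std::size_t count_possible_references(const std::string& asm_text,
                                      Range zun_data) {
    // The regex from criterion 1; group 1 is the number without the `h`.
    static const std::regex hex_literal("(-?[0-9][0-9a-fA-F]{1,4})h");

    std::size_t count = 0;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(asm_text.begin(), asm_text.end(), hex_literal);
         it != end; ++it) {
        const long value = std::stol((*it)[1].str(), nullptr, 16);
        if (value >= zun_data.first && value <= zun_data.last) {
            ++count;
        }
    }
    return count;
}

void counting_example() {
    const std::string text =
        "mov ax, 1234h\n"  // hex, inside the range: counted
        "mov bx, 10h\n"    // hex, but below the range: ignored
        "mov cx, 4660\n";  // same value as 1234h, but decimal: ignored
    assert(count_possible_references(text, Range{0x1000, 0x2000}) == 1);
}
```

The third line shows the marking mechanism from criterion 1: once a number has been consciously converted to decimal, it drops out of the count.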

3) Not the argument to any x86 instruction unrelated to memory accesses

These include

Why crowdfunding?

Personally, I gain nothing from this project – neither in the fun/happiness domain, given that I don't particularly like Touhou these days, nor in the professional/employment prospect domain. At this rate, serving a weeb audience any longer with ever more elaborate projects won't get me anywhere in life that it hasn't already gotten me after doing it for so long. All things considered, most real-life IT companies I talked to don't see these past Touhou-related projects as anything particularly special. Lately, I've been getting the feeling that I should have just written them in COBOL – now that would have certainly given me the reaction from RL I was hoping for!

That leaves crowdfunding as an ethical way to balance the fandom's interest in this project with me getting old and slowly but surely wanting to have more of a RL.

Another advantage is that it's you, the patrons, who decide which game to focus on. This is particularly relevant given that I don't really care which game I end up covering – see above.

Can't a machine automate all this work? It all seems very blue-collar and mechanical.

Maybe. While it would have been an option to collect lots of money for developing an automated decompilation solution, that would have been a huge risk, and my previous attempts at it failed spectacularly. In contrast, selling small chunks of progress for an hourly wage leads to a stream of tiny, but immediate results. It may take longer in the end, but even partially reverse-engineered game code can be a tremendous help to modders. Also, naming variables, contextualizing numeric constants, and gaining the resulting insights into the game mechanics are things you simply can't get out of an automated solution.

Consider this piece of ASM:

; Somewhere…
	mov	byte_2CEC2, 40

; Somewhere else…
	cmp	byte_25351, 0
	jz	@@return_from_function
	; …
	cmp	byte_2CEC2, 0
	jz	@@down
	cmp	byte_2CEC2, 32
	jbe	@@return_from_function
	mov	byte_2CEC2, 0
	; …
@@down:
	dec	byte_25351

Now, I could simply decompile this into

// Somewhere…
	byte_2CEC2 = 40;

// Somewhere else…
	if(byte_25351 == 0) {
		return;
	}
	// …
	if(byte_2CEC2 != 0) {
		if(byte_2CEC2 <= 32) {
			return;
		}
		byte_2CEC2 = 0;
	}
	byte_25351--;

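However, that doesn't really tell you anything that you couldn't already tell from looking at the assembly. After manually reverse-engineering the meaning of these variables, we learn that byte_25351 is the number of bombs in stock, that byte_2CEC2 is a miss timer, and that the 40 it is set to is made up of 32 miss frames plus 8 more, during which a bomb is still accepted.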

And lo and behold, we just proved the existence of an 8-frame deathbomb window, ending up with an insight that's immediately valuable to many fans. Finally, let's define some symbols:

MISS_FRAMES = 32
DEATHBOMB_WINDOW = 8

; Somewhere…
	mov	_miss_time, MISS_FRAMES + DEATHBOMB_WINDOW

; Somewhere else…
	cmp	_bombs, 0
	jz	@@return_from_function
	; …
	cmp	_miss_time, 0
	jz	@@down
	cmp	_miss_time, MISS_FRAMES
	jbe	@@return_from_function
	mov	_miss_time, 0
	; …
@@down:
	dec	_bombs

And suddenly, it becomes both obvious and easily moddable to whoever reads the code, even while it's still assembly. This is the level I operate at. Decompilation only becomes mere syntactic sugar at this point.
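
For comparison, here is how the same logic might read in C++ once the symbols are named. This is only a sketch: the `PlayerState` struct and the function names are hypothetical, the parts elided with "…" in the assembly are left out, and how the timer changes from frame to frame is not shown in the snippet above, so the sketch just counts the timer values at which a bomb is accepted.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t MISS_FRAMES = 32;
constexpr std::uint8_t DEATHBOMB_WINDOW = 8;

// Hypothetical grouping; the assembly uses two separate globals.
struct PlayerState {
    std::uint8_t miss_time;
    std::uint8_t bombs;
};

// "Somewhere…": starts the miss timer at 32 + 8 = 40.
void start_miss(PlayerState& player) {
    player.miss_time =
        static_cast<std::uint8_t>(MISS_FRAMES + DEATHBOMB_WINDOW);
}

// "Somewhere else…": returns true if a bomb was consumed.
bool try_bomb(PlayerState& player) {
    if (player.bombs == 0) {
        return false;
    }
    if (player.miss_time != 0) {
        if (player.miss_time <= MISS_FRAMES) {
            return false; // too late, the window has passed
        }
        player.miss_time = 0; // bombed in time, the miss is cancelled
    }
    player.bombs--;
    return true;
}

// Counts the nonzero timer values at which a bomb still goes through.
int count_bombable_timer_values() {
    int count = 0;
    for (int t = 1; t <= MISS_FRAMES + DEATHBOMB_WINDOW; t++) {
        PlayerState player{static_cast<std::uint8_t>(t), 1};
        if (try_bomb(player)) {
            count++;
        }
    }
    return count;
}

void deathbomb_example() {
    assert(count_bombable_timer_values() == DEATHBOMB_WINDOW);
}
```

Changing `DEATHBOMB_WINDOW` is then the whole mod, which is the point the named assembly version already makes.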

How long is this crowdfunding campaign going to run?

Indefinitely – and that's the beauty of it. Whenever someone is interested, they can insert a coin, and everything in the backlog will then be turned into tangible progress.

Effectively, this project will run for as long as the market deems it valuable. Maybe we get enough to complete one game, maybe we won't. Maybe there will be no interest whatsoever for a few months, and then a small number of big transactions. Who knows.

In a way, this is therefore closer to art commissions than it is to your typical video game crowdfunding campaign.

PC-98 emulation is getting better and better, DOSBox-X even has dynamic recompilation now. Are source ports of a single game series even worth it?

Again, you decide.

Isn't uth05win already what you wanted to achieve? We even have source code for it.

Initially, I thought the same, and had the impression that uth05win's source code release would immediately obsolete ReC98.

However, while uth05win did indeed legitimately reverse-engineer most of TH05, the final port seems to have taken quite some liberties. I myself wouldn't know – for me, uth05win is still a tremendous help in reverse-engineering not only TH05, but also TH04 and, to a lesser extent, even the previous three games. Die-hard PC-98 Touhou fans, on the other hand, tend to immediately dismiss it as "not the real thing". Which, ironically, led to ReC98's approach of a provably legit source code reconstruction being appreciated more, not less, among this group of people.

Also, there's the obvious reason that I don't restrict myself to just one game.

Why do pushes that are geared towards one specific game also tend to come with progress in other games? Aren't you wasting time there by not focusing 100% on what your patrons wanted you to do?

If the same function appears in more than one game, more or less unchanged, I'd only be wasting time re-familiarizing myself with all the involved concepts months later. I think it makes more sense to immediately cover identical functionality in all games. It's basically free progress for everyone else.

Then again, the more progress is made, the less frequently this will happen, as the amount of not yet reverse-engineered code shared between the games approaches zero.

Do you have a refund policy?

Yes! You can request refunds for every push I haven't started working on yet. I will keep the money after having delivered a push though.

Can't Team Shanghai Alice take down this project and crowdfunding at any time?

While I can't promise that they won't, the same kind of source code reconstruction has been done for the Generation I-III Pokémon games, Super Mario 64, and Diablo, all of which still generate revenue for their rights holders.

Keep in mind that the product is the code, in the form of new commits in a Git repository. Once again, I do not sell the promise of a finished Windows/Linux/phone port of any of the games.

Perform a git clone after I've pushed the commits you bought, and you now have a DRM-free digital copy of the progress you paid for. Nothing I produce will ever be put behind a paywall. The only thing that is behind a paywall is the time it takes to make it all happen.

Also, consider this idea: Once the project is done, anyone can feel free to burn the reconstructed source code on a CD, and hand it to ZUN during some convention. Then, of course, ZUN has every right to commercially exploit it – which would be fine by me, as I will have been paid my fair share at that point.

Can I still help out with the reverse-engineering by contributing to the ReC98 repository?

Yes! As stated above, this is not about me making lots of profit. The community will certainly thank you for driving the total estimate further down.

Although I'd really recommend that you spend your time on a different project, one which will bring you further in life than anything related to the main Touhou series ever will.

Why a cap?

The cap corresponds to the maximum time I can healthily allocate to this project within the next 4 weeks. It is meant to

With the current rate of progress, and the cap being at the low level that it is, the project is never going to finish!

Good news, as of 2019-10-14, progress seems to be speeding up now!

Aside from that, we'll just have to wait until I have more free time, I'm afraid.

Some of the 2018 pushes were delivered months or even years after they were paid…

Back then, I not only didn't have a cap, but also vastly undersold myself, while also offering crowdfunded features for thcrap in parallel. That's why the latter are sometimes referred to in the old blog posts here. But compare that to now:

However, if you absolutely request me to prioritize an element of a game that requires a ton of not yet reverse-engineered knowledge to fully grasp, and you absolutely don't accept your money going to anything else, I will have to put that on the back burner. It will be made clear in the backlog whenever that happens, though.

I'd like to see PC-98 Policenauts (or any other DOS program compiled using Borland/Turbo C++) decompiled. What's in it for me?

The ReC98 repository includes a currently incomplete file with the ASM→C++ patterns, as well as information about the limits of decompilability. This file will be continuously updated with new insights. So while you probably wouldn't want to support this project until the very end, it might be worth supporting ReC98 for just a bit – at least until it becomes obvious that I completely figured out Turbo C++, and that other decompilation project you wanted to see made significant progress.

And who knows, maybe we will see a somewhat automated decompilation solution come out of this.

I want replays! What's in it for me?

Make sure you know someone willing to implement it. Then, tell me that replays are your goal when placing your order, and I'll keep you updated once it's trivial to implement and you can stop supporting the project. Shouldn't take all too long.

I want translations into languages with non-ASCII characters! What's in it for me?

Unless you found someone who's really willing to dig deep into PC-98 hardware details, you'd probably want to support, or wait for, the entire completed decompilation of your game of choice, since you'd probably first want a port to a modern system that supports Unicode and fonts.

Or, y'know, you can just always replace the font ROM of your PC-98 emulator of choice, because who cares about real hardware anyway, right? In that case, it will take considerably less time. Still, make sure to tell me that translations are your goal when placing your order.

I want TH03 netplay! What's in it for me?

I'm not a low-level networking person, so who knows whether doing this natively on PC-98 is actually as impractical as it sounds. Porting the game to a modern OS with a network stack first (which, again, requires a complete decompilation) will certainly be a lot more convenient to whoever ends up trying their hand at it, though.

Do you sell ad space on this site?

Every contributor, no matter how much they paid, has the option to have their name turned into a link to a URL of their choice. So if you consider that to be advertising, then yes. If you had more than that in mind, hit me up, and we might make it happen. No JavaScript or remote content, though!

Alright! I have understood what this project is about, and am convinced that I want to support it. Take me to the order form!