Let's start working on a solution.
Okay, start with the file dumping method posted by zero.
Fire up the game and generate the first line of dialog.
Type it out in Japanese. The line contains character names, which can change, so I want a part of it that never varies to search for.
もっと急いでったら (roughly, "Hurry up already!")
Search for this in the dumped files using MadEdit. Use the "find in files" option (CTRL+F).
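If you'd rather script the search than use MadEdit, here is a minimal Python sketch of the same find-in-files step. It assumes the dumps sit in one directory, end in .log (like 0017e1b0.log), and store their text as Shift-JIS. The function name `find_in_dumps` and the directory you pass it are my own, hypothetical choices.

```python
import pathlib

def find_in_dumps(dump_dir, text, encoding="shift_jis"):
    """Return the names of dumped .log files whose bytes contain `text`.

    Assumes the dumps are already-decompressed files and that the game
    stores its script text as Shift-JIS.
    """
    needle = text.encode(encoding)  # the Shift-JIS bytes we are hunting for
    hits = []
    for path in sorted(pathlib.Path(dump_dir).glob("*.log")):
        if needle in path.read_bytes():
            hits.append(path.name)
    return hits
```

For example, `find_in_dumps("dumps", "もっと急いでったら")` should list 0017e1b0.log, assuming the dumps live in a folder called dumps.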
It shows up in 0017e1b0.log. Each dump is named after the offset where its compressed data was found in the big file, so let's look there. You never know what you might find.
Okay, that file (that is currently loaded) is in the LICE (archive) file at 0x17d800. The file entries table starts at 0x17da00.
Also, the file that is currently loaded is the first sub-file in that archive. The table entry for it is at 0x17da08. That's important.
I wasn't sure about the second value in the table entry, but it looks like the decompressed size.
What we need now is to find the end of the compressed data for that file.
If we go back to the entries table at 0x17da00, the next value in the table is 0x2cb0. If that's the offset of the next file, then 0x17da00 + 0x2cb0 = 0x1806b0, so let's look there.
Well, that looks like a miss. Let's explore why:
The position of our first sub-file is 0x17da00 + 0x7b0 = 0x17e1b0. Looking there, we see that the first two bytes are 78 9C. A quick look on Stack Overflow shows that 78 9C is a common zlib header (it's what zlib writes at its default compression level), so we can expect each compressed file to start that way.
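78 9C isn't the only possible header. RFC 1950 only requires two things: the first byte must name the deflate method, and the two bytes, read as a big-endian number, must be a multiple of 31. The compression level changes the second byte, and the usual values are 78 01, 78 5E, 78 9C and 78 DA. Here is a minimal sketch of that check plus a scan for candidates. Any two bytes can match by chance inside compressed data, so a hit is only a candidate. The helper names are hypothetical.

```python
def is_zlib_header(b0, b1):
    """Check whether two bytes could be a zlib (RFC 1950) header."""
    cm = b0 & 0x0F       # compression method: 8 means deflate
    cinfo = b0 >> 4      # log2(window size) - 8; must be 7 or less
    return cm == 8 and cinfo <= 7 and (b0 * 256 + b1) % 31 == 0

def find_zlib_candidates(data, start=0):
    """Yield every offset in `data` where a plausible zlib header appears."""
    for pos in range(start, len(data) - 1):
        if is_zlib_header(data[pos], data[pos + 1]):
            yield pos
```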
Searching the hex for the next 78 9C, we find one at 0x180327. Doing the math, 0x180327 - 0x17da00 = 0x2927. That value doesn't make a lot of sense either. Just guessing, let's try another sum: 0x7b0 + 0x2cb0 = 0x3460. That doesn't make much sense either.
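Looks like the next file is at 0x180e60, and 0x180e60 - 0x17da00 = 0x3460. Things are starting to make more sense: 0x2cb0 is the compressed size of the first file. The 78 9C at 0x180327 was a false hit, because it falls inside the first file's compressed data, which runs from 0x17e1b0 to 0x180e60.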
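The hex arithmetic here is easy to slip on, so here it is checked in a few lines of Python. The constant names are mine. The values are the ones found above.

```python
TABLE_START = 0x17DA00   # start of the LICE entries table
FIRST_OFFSET = 0x7B0     # first value in the first entry (relative to TABLE_START)
FIRST_CSIZE = 0x2CB0     # third value in the first entry

first_start = TABLE_START + FIRST_OFFSET   # where the first 78 9C sits
first_end = first_start + FIRST_CSIZE      # one past the last compressed byte
next_offset = FIRST_OFFSET + FIRST_CSIZE   # relative offset of the next file

print(hex(first_start), hex(first_end), hex(next_offset))
# 0x17e1b0 0x180e60 0x3460

false_hit = 0x180327  # the stray 78 9C from the hex search
print(first_start <= false_hit < first_end)  # True: it lies inside file 1
```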
Okay, so each file's entry in the table holds three values:
1) Offset (relative to the start of the entries table, 0x17da00)
2) Decompressed size
3) Compressed size
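Here is a minimal sketch of decoding one entry. The post doesn't say how wide the fields are or what byte order they use, so this assumes three 32-bit little-endian integers packed back to back, with the offset relative to the table start. Check that against the hex before relying on it. `read_entry` is a hypothetical helper.

```python
import struct

# Assumed layout: offset, decompressed size, compressed size (32-bit LE each)
ENTRY = struct.Struct("<III")

def read_entry(data, entry_pos, table_start):
    """Decode one file entry from the archive bytes.

    Returns (absolute_offset, decompressed_size, compressed_size).
    Assumes 32-bit little-endian fields and offsets relative to table_start.
    """
    rel_offset, dsize, csize = ENTRY.unpack_from(data, entry_pos)
    return table_start + rel_offset, dsize, csize
```

Under those assumptions, `read_entry(bigfile, 0x17DA08, 0x17DA00)` would return the first file's absolute offset 0x17e1b0 and its compressed size 0x2cb0. The next entry would start `ENTRY.size` (12) bytes later.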
Progress!
Okay, now I need to get that file, the compressed version, so I can work with it a little more closely. I know Python, and zlib is part of Python's standard library. That means someone has already written the zlib compression and decompression functions, so I don't have to.
First I need to get the file though. So the compressed data for that file is at offset 0x17e1b0 and its size is 0x2cb0. Let's write a short program that dumps that file. That will help us learn how to do file I/O in Python. Take a look at this program:
http://pastebin.com/J9yTDmN6
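The pastebin program isn't reproduced here, but a dump program along those lines might look like this minimal sketch. `dump_range` and the file names in the usage line are placeholders, not the game's real names.

```python
def dump_range(src_path, dst_path, offset, size):
    """Copy `size` bytes starting at `offset` in src_path into dst_path."""
    with open(src_path, "rb") as src:
        src.seek(offset)          # jump straight to the compressed data
        chunk = src.read(size)
    if len(chunk) != size:
        raise ValueError(f"expected {size:#x} bytes, got {len(chunk):#x}")
    with open(dst_path, "wb") as dst:
        dst.write(chunk)
    return chunk
```

For this file, that would be something like `dump_range("bigfile.bin", "0017e1b0.z", 0x17E1B0, 0x2CB0)`.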
Okay, I have the file. Now I need to see if I can decompress it. Time for a Python decompress test:
http://pastebin.com/WgRHdkNG
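A decompress test can also confirm the guess that the second table value is the decompressed size. Here is a minimal sketch, with `check_decompress` as a hypothetical name.

```python
import zlib

def check_decompress(compressed, expected_size):
    """Decompress a zlib stream and compare it with the table's size field.

    expected_size is the second value from the file's table entry.
    """
    data = zlib.decompress(compressed)
    if len(data) != expected_size:
        raise ValueError(f"got {len(data):#x} bytes, table says {expected_size:#x}")
    return data
```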
That test passes.
Okay, now we need to see if we can re-compress it. If we can get exactly the same result that HobbyJapan did, then we can make things work, I think. I did a quick check with the commands:
x = zlib.compress(decompressedfiledata)
x == filedata
This returns True, so Python's zlib at its default settings reproduces the original compressed data byte for byte.
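The same check can be wrapped in a helper that tries every compression level and reports which ones reproduce the original bytes. This is a minimal sketch. It assumes the original was made with standard zlib settings apart from the level. `zlib.compress` defaults to level -1, which zlib currently treats as 6. A byte-for-byte match can also depend on the zlib build, so a miss doesn't prove much.

```python
import zlib

def matching_levels(compressed, decompressed):
    """Return the zlib levels (0-9) whose output equals `compressed` exactly."""
    return [level for level in range(10)
            if zlib.compress(decompressed, level) == compressed]
```

With this file, `matching_levels(filedata, decompressedfiledata)` should include 6, given the check above.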
We're not yet at the point where we can build a test, so let's do a mental exercise and figure out what's needed.
1) This game uses SHIFT-JIS. Both SHIFT-JIS and UTF-8 encode basic ASCII letters and digits as the same single bytes, so we need to replace some SHIFT-JIS text with ASCII text and see if it displays. It looks like it might, because the name entry screen lets you pick upper- and lower-case English letters. We know where the first line is, so it will be easy to test.
2) Recompress. The key here is to recompress it to no more than the original compressed size. From my quick test, that doesn't look possible. I compressed English text with a ratio of .41 and Japanese with a ratio of .48 (English compresses better), but English needs about 50% more characters to say the same thing, and the English ended up a little bigger. We can try a higher compression level than the default, but I doubt it will shrink things enough. It's also untested whether the game accepts it: the level changes the second header byte (78 DA instead of 78 9C), although standard zlib doesn't use that level hint when decompressing.
3) Re-insert and test. Not too hard: we overwrite the original compressed data with our new compressed data, then pad the leftover space with 00 bytes.
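The three steps above can be roughed out in Python. This sketch rests on assumptions that haven't been tested in-game. It assumes the game renders plain ASCII bytes, the line is stored as Shift-JIS in the decompressed file, and the game only needs the new compressed data to fit in the old slot with zero padding after it. It does not update the table's size fields, even though the decompressed size changes whenever the text length does. `patch_line` is a hypothetical name.

```python
import zlib

def patch_line(archive, offset, old_csize, old_line, new_line, level=9):
    """Swap one line of text inside a zlib-compressed sub-file.

    archive:   the whole big file as bytes
    offset:    absolute offset of the sub-file's compressed data
    old_csize: compressed size from the table entry
    Returns new bytes with the patched, zero-padded data in place.
    """
    # 1) Replace the Shift-JIS line with ASCII text (letters and digits
    #    are the same single bytes in ASCII and Shift-JIS).
    plain = zlib.decompress(archive[offset:offset + old_csize])
    old_bytes = old_line.encode("shift_jis")
    if old_bytes not in plain:
        raise ValueError("original line not found in decompressed data")
    plain = plain.replace(old_bytes, new_line.encode("ascii"), 1)

    # 2) Recompress; level 9 is zlib's strongest setting.
    packed = zlib.compress(plain, level)
    if len(packed) > old_csize:
        raise ValueError(f"too big by {len(packed) - old_csize} bytes")

    # 3) Re-insert, padding the leftover space with 00 bytes.
    padded = packed + b"\x00" * (old_csize - len(packed))
    return archive[:offset] + padded + archive[offset + old_csize:]
```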
I'll need to read up on those other places about where the LICE file offsets are.