Anyone good at guessing filenames?

Malvineous
Shikadi Webmaster
Posts: 382
Joined: Wed Oct 31, 2007 21:48
Location: Brisbane, Australia

Post by Malvineous »

Hi all,

I need some help finalising the reverse engineering of the .LBR archive files used by Vinyl Goddess From Mars. This file format does not store filenames, using a hash (number) derived from the filename instead. This is very easy to calculate if you know the filename, but if you only know the hash it is impossible to go back the other way and figure out the filename.

This means the only way to figure out what some of the files are called is by guessing, and seeing if the hash of the guess matches one of the existing hashes. I've already matched up all the files I can using this method, so I'm hoping some creative people might spend a few minutes and see if they can guess a few more filenames.

I have put up a quick web page which lets you type in filenames and it automatically calculates the hash and checks to see whether it matches one of the unknown values. If so it will display a message, and you can post the newly discovered filename here so I can add it to the list of known names.

Any help with this would be much appreciated! There are only 11 unknown filenames to go, so it would be great to get them all figured out.
VikingBoyBilly
Vorticon Elite
Posts: 4158
Joined: Sat Jan 05, 2008 2:06
Location: The spaghetti island of the faces of dinosaur world for a vacation

Post by VikingBoyBilly »

Can the program methodically make a list of hashes from file names alphabetically limited by the character space of the filename? Like if you know the file name can only have 8 characters and it can only be letters, numbers, and a few symbols, you can have it methodically go through a process of "guessing" by calculating every possible combination until it finds a match to the hash you put in.
Keening_Product
Kuliwho?
Posts: 2167
Joined: Fri Jan 20, 2012 7:02
Location: Tied up in the Oracle Chamber's basement

Post by Keening_Product »

Bloody hell that's hard... but fun! No luck so far but I'll keep coming back to it if needed.

How on earth did you figure out SAVEBOXO and SAVEBOXG?
NY00123
Vorticon Elite
Posts: 511
Joined: Sun Sep 06, 2009 19:36

Post by NY00123 »

VikingBoyBilly wrote:Can the program methodically make a list of hashes from file names alphabetically limited by the character space of the filename? Like if you know the file name can only have 8 characters and it can only be letters, numbers, and a few symbols, you can have it methodically go through a process of "guessing" by calculating every possible combination until it finds a match to the hash you put in.
Given these limits, maybe a simple program that guesses by the way of iterating over all possible filenames (i.e., the bruteforce way) can solve this, possibly taking advantage of GPGPU code for acceleration.
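
A minimal sketch of the brute-force idea, in JavaScript. It assumes the hash routine that is posted later in this thread; the alphabet, the length limit and the helper names `candidates` and `bruteForce` are made up for illustration and are not taken from the game or from Malvineous' page.

```javascript
// Hash routine as posted later in this thread, with the state masked to
// 16 bits on every step (same result as masking once at the end).
function calcHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash ^= (str.charCodeAt(i) & 0xff) << 8;
    for (let j = 0; j < 8; j++) {
      const carry = hash & 0x8000;
      hash = (hash << 1) & 0xffff;
      if (carry) hash ^= 0x1021;
    }
  }
  return hash;
}

// Yield every string of exactly `length` characters drawn from `alphabet`,
// in odometer order (last position changes fastest).
function* candidates(alphabet, length) {
  const idx = new Array(length).fill(0);
  while (true) {
    yield idx.map(i => alphabet[i]).join("");
    let pos = length - 1;
    while (pos >= 0 && ++idx[pos] === alphabet.length) {
      idx[pos] = 0;
      pos--;
    }
    if (pos < 0) return;
  }
}

// Try every base name of 1..maxLen characters with a fixed extension and
// collect the names whose hash is one of the unknown hashes.
function bruteForce(targets, alphabet, maxLen, ext) {
  const hits = [];
  for (let len = 1; len <= maxLen; len++) {
    for (const base of candidates(alphabet, len)) {
      const name = base + ext;
      if (targets.has(calcHash(name))) hits.push(name);
    }
  }
  return hits;
}

// Tiny demonstration: pretend we only know the hash of "BAB.GRA".
const targets = new Set([calcHash("BAB.GRA")]);
const hits = bruteForce(targets, "AB", 3, ".GRA");
console.log(hits); // includes "BAB.GRA", plus any other names that collide
```

The search space grows by a factor of the alphabet size for every extra character, so a real run over 8-character names would need the restrictions discussed in the following posts.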
Roobar
Vorticon Elite
Posts: 3276
Joined: Tue Jan 08, 2008 16:12

Post by Roobar »

I've tried all eeXX and b4XX combinations. Is that the kind of work you are asking for?

Edit: found this program to speed things up:

Image

Haven't discovered anything new yet.
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

wiivn wrote:I've tried all eeXX and b4XX combinations. Is that the kind of work you are asking?
You're supposed to type in a filename, and if it's correct, the four digit hash code that is generated as you type will match one of the four digit hash codes highlighted in yellow.
ny00123 wrote: Given these limits, maybe a simple program that guesses by the way of iterating over all possible filenames (i.e., the bruteforce way) can solve this, possibly taking advantage of GPGPU code for acceleration.

Just some back of the envelope calculations here:

An 8 character string gives 26^7*36 = 289 billion possible file names (assuming only letters and numbers are used, only the last character can be a number and the extension is known). We can knock off the first character as most of these files are in alphabetical order, leaving 11 billion possible choices, and if you restrict the second letter to allowable English combinations, then you get maybe a billion strings to start from, which seems like it should be brute-forceable.

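Since the hash has only 65536 possible values, roughly one candidate in 65536 will match any given hash. With 11 billion possible strings and one potential hash match, a list of around 170k filenames will be generated; with a billion strings it would be around 15k. I bet at least a couple of those will be the dictionary word you're looking for, though.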
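
The arithmetic can be checked in a few lines of JavaScript. This is only a sketch: it assumes hash values are spread evenly over the candidate names, and the variable and function names are made up for illustration.

```javascript
const HASH_VALUES = 0x10000; // a 4-digit hex hash has 65536 possible values

// Seven letters followed by one letter-or-digit, as in the estimate above.
const allNames = 26 ** 7 * 36;
// First character fixed, since most files are in alphabetical order.
const firstKnown = 26 ** 6 * 36;

// Expected number of candidate names sharing any one hash value,
// assuming hashes are spread evenly over the candidates.
function expectedMatches(candidateCount) {
  return candidateCount / HASH_VALUES;
}

console.log(allNames);                                // 289145166336
console.log(firstKnown);                              // 11120967936
console.log(Math.round(expectedMatches(firstKnown))); // 169693
console.log(Math.round(expectedMatches(1e9)));        // 15259
```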

Keening_Product wrote:Bloody hell that's hard... but fun! No luck so far but I'll keep coming back to it if needed.

How on earth did you figure out SAVEBOXO and SAVEBOXG?
A lot of these filenames are in the executable. The ones that we need to guess were put in the archive, but never used in the game.
Last edited by lemm on Sat Oct 19, 2013 20:38, edited 1 time in total.
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

Image

I FOUND STUFF



Bapple0.cmp was in the .exe, but for some reason not in the archive.
Roobar
Vorticon Elite
Posts: 3276
Joined: Tue Jan 08, 2008 16:12

Post by Roobar »

Found one:
Image

"You're supposed to type in a filename, and if it's correct, the four digit hash code that is generated as you type will match one of the four digit hash codes highlighted in yellow."

Then why does it mark them green if you type cdwn or eeg`?
MoffD
Vorticon Elite
Posts: 1220
Joined: Thu Jul 05, 2012 17:30
Location: /dev/null

Post by MoffD »

can you give me the method of hashing? I'm willing to try brute forcing all the max length filenames.
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

wiivn wrote:Found one:
Image

"You're supposed to type in a filename, and if it's correct, the four digit hash code that is generated as you type will match one of the four digit hash codes highlighted in yellow."

Then why does it mark them green if you type cdwn or eeg`?
Oh nice. I had guessed "swish.snd"


The hash function will take any string of characters as a potential file name and output a 4-digit code. But since there are trillions of strings and only 65536 4-digit hex numbers, many different filenames are going to result in the same 4-digit code. So, we're trying to guess what the most likely filename is, even though there are many strings of garble that will produce a "correct" result.
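
To make the collision point concrete, here is a small JavaScript sketch that looks for two different names with the same hash. It uses the hash routine from Malv's page (restated with per-step masking); `findCollision` is a made-up helper, and the restriction to four-letter uppercase names is just an assumption to keep the search small.

```javascript
// Same hash as calcHash from Malv's page, masked to 16 bits on every step.
function calcHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash ^= (str.charCodeAt(i) & 0xff) << 8;
    for (let j = 0; j < 8; j++) {
      const carry = hash & 0x8000;
      hash = (hash << 1) & 0xffff;
      if (carry) hash ^= 0x1021;
    }
  }
  return hash;
}

// Walk through every four-letter name AAAA..ZZZZ and return the first two
// names that hash to the same value. There are 26^4 = 456976 such names but
// only 65536 hash values, so a collision must exist.
function findCollision() {
  const letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  const seen = new Map(); // hash -> first name that produced it
  for (const a of letters)
    for (const b of letters)
      for (const c of letters)
        for (const d of letters) {
          const name = a + b + c + d;
          const h = calcHash(name);
          if (seen.has(h)) return [seen.get(h), name, h];
          seen.set(h, name);
        }
  return null;
}

const [first, second, shared] = findCollision();
console.log(first, second, shared.toString(16));
```

Both names printed are equally "correct" as far as the archive is concerned, which is why a match alone does not prove a filename is the real one.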
MoffD wrote:can you give me the method of hashing? I'm willing to try brute forcing all the max length filenames.

This is the JavaScript function from Malv's website.

Code:

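// Hash a filename down to the 4-digit (16-bit) value checked on the page.
function calcHash(str)
{
	var hash = 0;
	var len = str.length;
	for (var i = 0; i < len; i++) {
		// Mix the next character into the top byte of the 16-bit state
		hash ^= str.charCodeAt(i) << 8;
		// Shift out eight bits, XORing in 0x1021 whenever a 1 leaves bit 15
		for (var j = 0; j < 8; j++) {
			hash <<= 1;
			if (hash & 0x10000) hash ^= 0x1021;
		}
	}
	// Only the low 16 bits are meaningful; drop anything shifted above them
	return hash & 0xffff;
}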
If you want to decipher x86 ASM, this is the assembly function:

Code:

sub_24C22       proc far                ; CODE XREF: sub_1CAAE+9DP

arg_0           = dword ptr  6
arg_4           = word ptr  0Ah

                push    bp
                mov     bp, sp
                push    ds
                push    si
                pushf
                cld
                xor     dx, dx
                lds     si, [bp+arg_0]
                mov     cx, [bp+arg_4]

loc_24C31:                              ; CODE XREF: sub_24C22+2Bj
                lodsb
                sub     ah, ah
                xchg    ah, al
                xor     dx, ax
                push    cx
                mov     cx, 8

loc_24C3C:                              ; CODE XREF: sub_24C22:loc_24C4Aj
                mov     bx, dx
                shl     dx, 1
                and     bx, 8000h
                jz      short loc_24C4A
                xor     dx, 1021h

loc_24C4A:                              ; CODE XREF: sub_24C22+22j
                loop    loc_24C3C
                pop     cx
                loop    loc_24C31
                mov     ax, dx
                popf
                pop     si
                pop     ds
                pop     bp
                retf
sub_24C22       endp
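
For anyone comparing the two listings, here is a rough instruction-by-instruction transliteration of the assembly into JavaScript. It is a sketch, not a verified decompilation: it assumes arg_0 is a far pointer to the filename bytes and arg_4 is the byte count, which is how the `lds si` and `mov cx` lines appear to use them. The routine looks like a CRC-16 with polynomial 0x1021 and a zero initial value (the variant commonly called CRC-16/XMODEM), whose standard check value for "123456789" is 0x31C3.

```javascript
// Register-style port of sub_24C22; variable names mirror the x86 registers.
function asmHash(str) {
  let dx = 0;                               // xor dx, dx
  for (let i = 0; i < str.length; i++) {    // outer loop, cx = arg_4 (length)
    let ax = str.charCodeAt(i) & 0xff;      // lodsb / sub ah, ah
    ax = (ax << 8) & 0xffff;                // xchg ah, al
    dx ^= ax;                               // xor dx, ax
    for (let bit = 0; bit < 8; bit++) {     // mov cx, 8 ... loop loc_24C3C
      const bx = dx & 0x8000;               // mov bx, dx / and bx, 8000h
      dx = (dx << 1) & 0xffff;              // shl dx, 1
      if (bx !== 0) dx ^= 0x1021;           // jz skips the xor when bit 15 was 0
    }
  }
  return dx;                                // mov ax, dx
}

// The JavaScript function from Malv's page, for comparison.
function calcHash(str) {
  var hash = 0;
  for (var i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i) << 8;
    for (var j = 0; j < 8; j++) {
      hash <<= 1;
      if (hash & 0x10000) hash ^= 0x1021;
    }
  }
  return hash & 0xffff;
}

console.log(asmHash("123456789").toString(16)); // 31c3
console.log(asmHash("GAMEOPT.GRA") === calcHash("GAMEOPT.GRA")); // true
```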
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

Another clue:

Image

These both have the same file size, they're adjacent, and they're the same size as GAMEOPT.GRA, which is the game options window. I'd guess that they're both GRA files, and that the last character is a digit.


Malv, can your tool extract the graphics to help with guesses?


I also wonder if running through the game and logging the DOSBox output to a file would turn up anything interesting...
Roobar
Vorticon Elite
Posts: 3276
Joined: Tue Jan 08, 2008 16:12

Post by Roobar »

Thanks to the clues, I found it!

Image

I also searched Google for images as clues, since this game isn't one of my favorites.

More clues for a0f? (this thing kinda looks like a text adventure game :))
Malvineous
Shikadi Webmaster
Posts: 382
Joined: Wed Oct 31, 2007 21:48
Location: Brisbane, Australia

Post by Malvineous »

Holy crap guys, this is fantastic!! I can't believe you figured out so many so quickly! This is amazing. Many, many thanks! I've updated the page with the newly discovered filenames. I've also removed some of the filenames I guessed that I suspect are wrong, to see if you can come up with any better suggestions. These are now listed in brackets after the hash. Be aware that once a hash is marked green (e.g. if you type in my old guess) you'll have to reload the page again so it goes back to yellow, otherwise that hash will no longer be checked as you type.

@Keening_Product: As Lemm said, I scoured the .exe, as well as all files inside the .LBR, and extracted anything that looked like a filename. This got about 90% of the names. I'm not sure whether the remaining files are used or not, given that their names don't appear anywhere in the game!

@wiivn: Thanks for your correct guesses :-)

@lemm: I haven't yet reversed the .gra files, so I can't look at them to figure out what they might be. They seem to be in some kind of planar EGA-like arrangement, which is odd for VGA graphics... And since bapple0.omp was in the .exe but not in the archive, maybe it's one of those cases where they add a number to a character already in a string to construct the filename?

@MoffD: The problem isn't so much brute forcing the filenames, rather it's figuring out which of the matching filenames is the correct one.

I have actually written a program which does a slightly better job of brute-forcing the algorithm, instead calculating the hash backwards and printing all possible matches. This means you can restrict it to e.g. files ending in ".GRA" and beginning with "S", and it's quite a bit faster than brute-forcing every possible filename. However there are still thousands of matches. For example, here are all the filenames that match the a0f hash, where the first character is an S, and the second is from A to H, the extension is .GRA and a digit can only appear as the last character in the filename. Even with these restrictions there are a lot of matches.

The correct filename for the a0f hash is probably in that list, but since *all* those filenames match, which one is the correct one??
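
The "calculate the hash backwards" idea can be sketched roughly as follows. This is not Malvineous' actual program, which is not shown in the thread; it is one way to do it, assuming the hash routine posted earlier. Each step of the hash is reversible, so a known ending such as ".GRA" can be undone from the target hash, and candidates only need to be hashed forward up to that point. The names `hashFrom`, `unhashSuffix` and `search` are made up for illustration.

```javascript
// Advance the 16-bit state over the characters of str (forward direction).
function hashFrom(state, str) {
  for (let i = 0; i < str.length; i++) {
    state ^= (str.charCodeAt(i) & 0xff) << 8;
    for (let j = 0; j < 8; j++) {
      const carry = state & 0x8000;
      state = (state << 1) & 0xffff;
      if (carry) state ^= 0x1021;
    }
  }
  return state;
}

// Undo the characters of suffix, last character first, returning the state
// the hash must have had just before the suffix was processed.
function unhashSuffix(target, suffix) {
  let state = target & 0xffff;
  for (let i = suffix.length - 1; i >= 0; i--) {
    for (let j = 0; j < 8; j++) {
      // A forward step leaves bit 0 set only if it XORed in 0x1021,
      // which happens exactly when the bit shifted out was 1.
      if (state & 1) state = ((state ^ 0x1021) >> 1) | 0x8000;
      else state >>= 1;
    }
    state ^= (suffix.charCodeAt(i) & 0xff) << 8;
  }
  return state;
}

// List names of the form prefix + up to maxFree characters + ext that
// hash to target, reusing the partial state as each character is added.
function search(target, prefix, ext, alphabet, maxFree) {
  const required = unhashSuffix(target, ext);
  const found = [];
  const extend = (state, name, remaining) => {
    if (state === required) found.push(name + ext);
    if (remaining === 0) return;
    for (const ch of alphabet) {
      extend(hashFrom(state, ch), name + ch, remaining - 1);
    }
  };
  extend(hashFrom(0, prefix), prefix, maxFree);
  return found;
}

// Stand-in for an unknown hash, built from a made-up name so the demo
// has something to find.
const target = hashFrom(0, "SAB1.GRA");
const matches = search(target, "S", ".GRA", "AB01", 3);
console.log(matches); // includes "SAB1.GRA", plus any colliding names
```

This only trims the work per candidate; it does nothing about the underlying ambiguity, since every name it prints is a genuine match.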
Roobar
Vorticon Elite
Posts: 3276
Joined: Tue Jan 08, 2008 16:12

Post by Roobar »

There are 13 matches with SAVE in this list and 7 with SHWR. Maybe it's one of these.

Another suggestion is SCRFONT.GRA, since there are files with the extension SCR, though it is more likely to be some screen font, as there are other GRA font files.

Some clues for dc79? Is it something with weapons or with objects to collect?
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

I think the best course of action would be to reverse the files to get the graphics, and then to filter out the candidate file names by throwing away those that don't have an English word in them. Almost every file in the archive contains a word with three letters or more. I think you might be able to narrow your search by an order of magnitude if you did that, and combined with the information from the picture, the answer should be obvious.
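
A minimal sketch of that word filter, in JavaScript. The word list and the sample candidates here are made up for illustration; a real run would load a proper dictionary and feed in the actual list of matching names.

```javascript
// Hypothetical word list; a real run would load a full dictionary.
const WORDS = ["SAVE", "BOX", "FONT", "GAME", "OPT"];

// Strip the extension and normalise case before matching.
function baseName(filename) {
  const dot = filename.lastIndexOf(".");
  return (dot === -1 ? filename : filename.slice(0, dot)).toUpperCase();
}

// True if the base name contains any word of at least minLength letters.
function containsWord(filename, words, minLength = 3) {
  const base = baseName(filename);
  return words.some(
    w => w.length >= minLength && base.includes(w.toUpperCase())
  );
}

// Keep only the candidates that contain a dictionary word.
function filterCandidates(candidates, words) {
  return candidates.filter(name => containsWord(name, words));
}

// Made-up candidates for illustration, not real entries from the archive.
const sample = ["SAVEBOX1.GRA", "SBQXZKJ7.GRA", "SCRFONT.GRA"];
console.log(filterCandidates(sample, WORDS)); // [ 'SAVEBOX1.GRA', 'SCRFONT.GRA' ]
```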

@wiivn: The filesize would suggest that it's not a WEAP variant.