Post ASvaKph7VBuvCizdqq by misty@digipres.club
 (DIR) Post #ASvYZzttdearIF4YuO by misty@digipres.club
       2023-02-22T05:07:32Z
       
       0 likes, 0 repeats
       
       Anyone out there any good at guessing compression formats? I've got what *might* be compressed data, with separate dictionary/data files, but I don't have the first idea how I'd go about identifying the format.
       
 (DIR) Post #ASvYpL2dJreLCjTT3g by halozeta@wandering.shop
       2023-02-22T05:10:20Z
       
       0 likes, 0 repeats
       
       @misty Did you try opening it in 7zip?
       
 (DIR) Post #ASvYvK7ldtCM65jJ4K by misty@digipres.club
       2023-02-22T05:11:25Z
       
       0 likes, 0 repeats
       
       @halozeta This is a bespoke format, not anything standard. But I assume it's using a standard compression algorithm under the hood.
       
 (DIR) Post #ASvZ7jzvGsRZojyukC by halozeta@wandering.shop
       2023-02-22T05:13:39Z
       
       0 likes, 0 repeats
       
       @misty it's worth a shot. I use it to open up docx files to remove password protection by deleting a file inside.
       
 (DIR) Post #ASva7rPbnyBelacIvw by misty@digipres.club
       2023-02-22T05:24:52Z
       
       0 likes, 0 repeats
       
       @ellenor It's from a game, and it's bespoke to that game. It's not a standard file format, and `file` and standard compression programs don't recognize it.
       
 (DIR) Post #ASvaJbRD1S22hMxrV2 by moralrecordings@digipres.club
       2023-02-22T05:27:01Z
       
       0 likes, 0 repeats
       
       @misty Which game is this from?
       
 (DIR) Post #ASvaKph7VBuvCizdqq by misty@digipres.club
       2023-02-22T05:27:13Z
       
       0 likes, 0 repeats
       
       @moralrecordings Neverossa, a Hong Kong CRPG from 2005.
       
 (DIR) Post #ASvaM3K2Jubtl8mHQG by misty@digipres.club
       2023-02-22T05:27:26Z
       
       0 likes, 0 repeats
       
       @ellenor Neverossa, a Hong Kong CRPG from 2005.
       
 (DIR) Post #ASvalKWdUolyLziy7E by ClutchMark@mstdn.ca
       2023-02-22T05:32:00Z
       
       0 likes, 0 repeats
       
       @misty If you’ve got access to a Linux system you can try running the file command which identifies file types based on byte patterns in the data headers.
       
 (DIR) Post #ASvb2Kyl3xXIgQzJHU by misty@digipres.club
       2023-02-22T05:35:04Z
       
       0 likes, 0 repeats
       
       @ClutchMark I'm familiar! But since it's not a standard container `file` doesn't recognize it.
       
 (DIR) Post #ASvbMtr36bx6dkBemO by ClutchMark@mstdn.ca
       2023-02-22T05:38:49Z
       
       0 likes, 0 repeats
       
       @misty 😪​
       
 (DIR) Post #ASvbqPzqgKLSiyNWYy by tursiae@meow.social
       2023-02-22T05:44:07Z
       
       0 likes, 0 repeats
       
       @misty Does binwalk give any hints, on either the compressed file or dict?
       
 (DIR) Post #ASvc39CYR8JoOrOJPM by RL_Dane@fosstodon.org
       2023-02-22T05:46:24Z
       
       0 likes, 0 repeats
       
       @misty Probably some form of Lempel-Ziv, but there are so many variants, to say nothing of implementations. The forensics tool FOREMOST might be able to extract something, but it's highly unlikely. Best bet would be to find a (lossless) data compression forum or subreddit and ask for help with it (providing sample files), or undergo the slow process of learning how various compressors work on a low level. ... #DataCompression #AskFedi
       
 (DIR) Post #ASvc3BMuNbmz7YFbpA by RL_Dane@fosstodon.org
       2023-02-22T05:46:24Z
       
       0 likes, 0 repeats
       
       @misty ... Thirty years of tinkering with compression (only as a user) on several platforms, and I've never heard of anything storing the dictionary as a separate file, so the likelihood of finding an out-of-the-box solution is kind of slim, but someone who specializes in that field (data compression) might know of some useful shortcuts.
       
 (DIR) Post #ASvc8UKpI4brOMo5ku by moralrecordings@digipres.club
       2023-02-22T05:47:25Z
       
       0 likes, 0 repeats
       
       @misty Guessing can only take you so far... if it's custom you usually need to disassemble the EXE and see what it's doing. It might be useful to throw the file at a histogram tool; I use mrchist, which I released as part of https://github.com/moralrecordings/mrcrowbar .
       
       Some rules of thumb:
       - Low entropy, interesting byte patterns - uncompressed structured data
       - Entropy around 7.97, with 4 visible bands - DEFLATE
       - Entropy 7.99 or greater, pure noise - encrypted/obfuscated
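       
       A minimal sketch of the entropy check behind the rules of thumb above, in case no histogram tool is at hand. It computes Shannon entropy in bits per byte with the Python standard library only; it is not how mrchist works internally. The function names and the 7.9 cutoff for "probably compressed" are assumptions made for illustration, and short files will not reach the high values quoted above, so run it on a reasonably large sample.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def guess_kind(entropy: float) -> str:
    """Map an entropy value onto the rough rules of thumb from the post.

    The 7.9 lower cutoff is an assumption; the post only says "around 7.97".
    """
    if entropy >= 7.99:
        return "encrypted/obfuscated?"
    if entropy >= 7.9:
        return "compressed (e.g. DEFLATE)?"
    return "uncompressed structured data?"
```

       Usage would look like `guess_kind(shannon_entropy(open(path, "rb").read()))`, where `path` is whichever file is under suspicion. Treat the answer as a hint only; the byte histogram itself is still worth eyeballing.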
       
 (DIR) Post #ASvcQ1W5AwURE3QviK by oopsallnaps@tech.lgbt
       2023-02-22T05:50:35Z
       
       0 likes, 0 repeats
       
       @misty Could be zlib. Check the file with a hex editor program and look for 4 aligned bytes starting with `0x78 0x9C`. If you find that, you can probably just open it in Python, jump to that magic header position and decompress it with `zlib.decompress`.
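       
       The hex-editor search described above can also be scripted. The sketch below scans a blob for common two-byte zlib headers and tries to inflate from each hit with `zlib.decompressobj`, which tolerates trailing bytes after the stream. The function name `find_zlib_streams` and the list of header pairs are assumptions for illustration; `0x78 0x9C` is just the most common pair.

```python
import zlib

# Common zlib header pairs: 0x78 followed by a level-dependent flag byte.
ZLIB_HEADERS = (b"\x78\x01", b"\x78\x5e", b"\x78\x9c", b"\x78\xda")


def find_zlib_streams(data: bytes):
    """Yield (offset, decompressed_bytes) for each offset that inflates without error."""
    for header in ZLIB_HEADERS:
        start = 0
        while True:
            offset = data.find(header, start)
            if offset == -1:
                break
            start = offset + 1
            try:
                # A fresh decompressobj per attempt; bytes after the end of
                # the stream are ignored rather than raising an error.
                out = zlib.decompressobj().decompress(data[offset:])
            except zlib.error:
                continue
            if out:
                yield offset, out
```

       A two-byte pattern turns up by chance in any large file, so most hits will fail to inflate; only offsets that produce plausible output are worth a closer look.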
       
 (DIR) Post #ASveGojdWj82t3VkQa by misty@digipres.club
       2023-02-22T06:11:18Z
       
       0 likes, 0 repeats
       
       @dressupgeekout Haha, yeah, you guessed it. I was staring at the disassembly of the EXE and not getting anywhere so I figured I'd change tack.
       
 (DIR) Post #ASveeEOEJEl578f9eK by RL_Dane@fosstodon.org
       2023-02-22T05:51:51Z
       
       0 likes, 0 repeats
       
       @misty Also, the very common DEFLATE algorithm (which I believe is based on LZ77 combined with Huffman coding) is a good place to start. Almost everything uses that, except for newer high-compression programs like xz. Also, if posting elsewhere, definitely do include the year it was published like you did here, because that can help narrow things down.
       
 (DIR) Post #ASveeExgBR7At5FSu8 by RL_Dane@fosstodon.org
       2023-02-22T05:58:34Z
       
       0 likes, 0 repeats
       
       @misty Another thought: if you can get comfy with zlib, you might try brute-forcing various parameters to see if any of them successfully decompresses anything. Based on the age and the separate dictionary file, I'm guessing some variant of Lempel-Ziv à la Zip, DEFLATE, zlib, gzip, etc. But the devil's in the details: implementation specifics, window size, floating dictionary or two-pass*, and any other variables I might not know about. ...
       
 (DIR) Post #ASveeFQ2TzneJ2W76m by RL_Dane@fosstodon.org
       2023-02-22T05:58:35Z
       
       0 likes, 0 repeats
       
       @misty ... *Given a separate dictionary, a two-pass/static dictionary sounds likely. So zlib may or may not be able to read it, as it's a stream compressor by design.
       
       Also, the separate dictionary file could be indicative of something more exotic, but I don't know.
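       
       A rough sketch of the brute-force idea, using only Python's `zlib` module. It tries raw DEFLATE, zlib and gzip framings at each window size, with and without a preset dictionary passed as `zdict`. Treating the separate dictionary file as a zlib preset dictionary is purely an assumption; nothing in this thread confirms the game does that. The function name `try_inflate` is hypothetical.

```python
import zlib


def try_inflate(data: bytes, dictionary: bytes = b""):
    """Try several zlib framings and window sizes; return (wbits, used_dict, output) hits."""
    hits = []
    # Negative wbits = raw DEFLATE, 9..15 = zlib header, 25..31 = gzip header.
    candidates = (
        [-w for w in range(9, 16)]
        + list(range(9, 16))
        + [w + 16 for w in range(9, 16)]
    )
    dict_options = [b"", dictionary] if dictionary else [b""]
    for wbits in candidates:
        for zdict in dict_options:
            try:
                if zdict:
                    # Assumption: the dictionary file is a zlib preset dictionary.
                    d = zlib.decompressobj(wbits=wbits, zdict=zdict)
                else:
                    d = zlib.decompressobj(wbits=wbits)
                out = d.decompress(data)
            except (zlib.error, ValueError):
                continue
            if out:
                hits.append((wbits, bool(zdict), out))
    return hits
```

       Raw DEFLATE has no header or checksum to validate, so short garbage can occasionally "succeed"; any hit should be checked by looking at the output. An empty result only rules out these particular zlib configurations, not Lempel-Ziv schemes in general.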
       
 (DIR) Post #ASveeFxiSmjpzUH0bI by misty@digipres.club
       2023-02-22T06:15:32Z
       
       0 likes, 0 repeats
       
       @RL_Dane Thank you! Really appreciate all the tips.
       
 (DIR) Post #ASvfB1zlPfksEGrKu8 by RL_Dane@fosstodon.org
       2023-02-22T06:21:29Z
       
       0 likes, 0 repeats
       
       @misty Welcome! Good luck! :)
       
       Oh, I was going to add that based on the structure, it's definitely not a variant of RLE or just plain Huffman coding. So definitely start reading up on Lempel-Ziv and see where that takes you.
       
       https://en.wikipedia.org/wiki/LZ77_and_LZ78
       https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
       https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm
       https://en.wikipedia.org/wiki/Deflate
       https://en.wikipedia.org/wiki/Zlib
       https://en.wikipedia.org/wiki/Dictionary_coder
       
       From the sound of it though, a (presumably) fixed dictionary sounds a lot like LZ78 -- just based on my quick skim so far.
       
 (DIR) Post #ASvgiI7mZ7PXlQ28MS by XerShadowTail@chitter.xyz
       2023-02-22T06:38:41Z
       
       0 likes, 0 repeats
       
       @misty Do you have the hex bytes, like the first 1K?
       
 (DIR) Post #ASvhT34qKSawhnxDlI by misty@digipres.club
       2023-02-22T06:47:07Z
       
       0 likes, 0 repeats
       
       @moralrecordings Aha. Calling it on the LUP file makes it look like, yeah - obfuscated more likely than compressed
       
 (DIR) Post #ASviXStYNePLtILLt2 by moralrecordings@digipres.club
       2023-02-22T06:59:10Z
       
       0 likes, 0 repeats
       
       @misty Yep EXE needs disassembling, how unluppy 😬
       
 (DIR) Post #ASvlx1Vf7YBmLf2zFA by misty@digipres.club
       2023-02-22T07:37:22Z
       
       0 likes, 0 repeats
       
       @moralrecordings That Sonic guy lied to me, it’s not very happy *or* lucky!
       
       I’ve found the function where it’s reading in the two files, just got to brush up on my assembly enough to understand what it’s doing
       
 (DIR) Post #ASvmVZy5VlfL46ovw0 by moralrecordings@digipres.club
       2023-02-22T07:43:39Z
       
       0 likes, 0 repeats
       
       @misty Dunno if you're using Ghidra, but if not, the assembly-to-C decompiler that comes with it is indistinguishable from magic
       
 (DIR) Post #ASvmaUauuzMBqZKEfA by misty@digipres.club
       2023-02-22T07:44:30Z
       
       0 likes, 0 repeats
       
       @moralrecordings I haven’t used that yet, I’ll give it a go! Thank you!