[HN Gopher] PSD is not my favourite file format (2009)
       ___________________________________________________________________
        
       PSD is not my favourite file format (2009)
        
       Author : kruuuder
       Score  : 199 points
       Date   : 2021-01-28 18:10 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mmastrac wrote:
       | This used to show up often on HN but I haven't seen it in years.
       | The best comment was from this old thread [1]:
       | 
       | > I enjoyed the commit message most:
       | >
       | > r11 by paracelsus on Sep 11, 2007 Diff
       | > Photoshop loader is DONE for now, fuck you Adobe
       | 
       | [1] https://news.ycombinator.com/item?id=575122
        
       | hallarempt wrote:
       | PSD is horrible, that is certainly true. And underdocumented. And
       | dumb... And we needed to implement read/write support for that in
       | Krita.
       | 
       | But Painttool Sai's format is actively evil, with different kinds
       | of encryption for different size layers and things like that:
       | https://github.com/Wunkolo/libsai/issues/6 (We tried to use that
       | library to implement sai file format support for Krita, but then
       | ran into trouble...)
       | 
       | Manga Studio/Clip Studio Paint's file format is interesting as
       | well: it's just an SQLite database, with raster data stored as
       | blobs. Not going to implement that for Krita either.
        
         | Gualdrapo wrote:
         | Thank you for working on Krita.
        
       | krylon wrote:
       | While I totally empathize with the poor author who had to suffer
       | through this madness, I thoroughly enjoyed reading this rant. My
       | favorite part was this:
       | 
       | > PSD is not a good format. PSD is not even a bad format.
       | > Calling it such would be an insult to other bad formats, such
       | > as ...
        
         | stagger87 wrote:
         | I'd love to hear why they think JPEG is a worse format.
        
           | eyesee wrote:
           | Well for one thing it's not a file format. For that there's
           | JFIF and EXIF.
        
             | stagger87 wrote:
             | I actually thought about that when I typed my reply, but
             | decided the meaning of my question would still be clear the
             | way it was phrased. Despite that, my question is more or
             | less irrelevant since I misunderstood the author.
        
           | user-the-name wrote:
           | Not worse, that is the point. It is merely a bad format,
           | which is better than what PSD is.
           | 
           | As for why JPEG is bad: Did you know there is no one defined
           | way to store the width and height of a JPEG image?
        
             | stagger87 wrote:
             | Thank you, I totally misread that!
        
             | userbinator wrote:
             | As someone who has written a JPEG decoder, I am compelled
             | to say that you are wrong, and the width and height are
             | stored in the SOF marker segment.
             | 
             | http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%2
             | 0...
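
A minimal sketch of the point about the SOF marker segment: walk the marker segments from the start of the stream and read the height and width out of the first SOFn frame header. The function name `jpeg_dimensions` is made up for illustration, and the sketch assumes a well-formed stream with no standalone markers before the frame header.

```python
import struct

# SOF0..SOF15 carry a frame header, except 0xC4 (DHT), 0xC8 (JPG), 0xCC (DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data: bytes):
    """Return (width, height) from the first SOFn segment, or None."""
    if data[:2] != b"\xff\xd8":  # every JPEG stream starts with SOI
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or start of scan: no frame header seen
            return None
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Frame header: sample precision, then height, then width.
            _precision, height, width = struct.unpack(">BHH", data[pos + 4:pos + 9])
            return width, height
        pos += 2 + length
    return None
```

Any EXIF or XMP copy of the size lives in APP1 segments, which this loop simply skips over.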
        
             | just_for_you wrote:
             | To my knowledge, I see most JPEG images encode their height
             | in an "APP1" EXIF segment, and in their SOF1 segment (which
             | is basically just a basic-info segment for the core JPEG
             | bitstream). It's also possible to store the size in an APP1
             | XMP section, but that's mostly non-essential metadata for
             | interchange purposes, and I usually don't see the size
             | duplicated there for JPEGs anyway (PNG does though, if it
             | has an XMP chunk).
             | 
             | It's possible that some of the other officially-defined
             | segments (see https://exiftool.org/TagNames/JPEG.html)
             | might also contain duplicate info, but then that'd be
             | specific to the app that uses it, not for general-purpose
             | use.
             | 
             | Storing the size in easily-parseable EXIF metadata and then
             | once again in the JPEG bitstream seems pretty reasonable to
             | me. Not to mention if you run a metadata/EXIF stripper on a
             | JPEG file, then you remove all duplicate info and the only
             | remaining image-dimensions will be stored in the JPEG
             | bitstream anyway.
        
       | jolmg wrote:
       | So, what's this project about? There's no README nor description
       | nor anything.
       | 
       | I'm curious as to why they'd find it so worthwhile to try to
       | parse PSD despite those troubles. They seem to already have
       | parsers for other image formats.
        
         | krylon wrote:
         | Xee is an image viewer. If you follow the link to the landing
         | page of the repo, it says in the About section it is "Xee
         | source code for xCode 4.5"
        
         | folkrav wrote:
         | I've seen that rant a while ago. IIRC, Xee is an image viewer
         | for macOS.
        
         | masklinn wrote:
         | > So, what's this project about? There's no README nor
         | description nor anything.
         | 
         | It's Xee, a lightweight (and excellent) image viewer for macOS.
         | 
         | That's not the original repo (or author). The project used to
         | be on Google Code, moved to Bitbucket, and was then abandoned /
         | sold: it was open-source until version 2.2 or something, v3 is
         | closed source.
        
           | Hamuko wrote:
           | Unfortunately Xee3 is basically abandoned at this point.
           | MacPaw bought it alongside The Unarchiver and they haven't
           | really been doing anything with it. Last update was three
           | years ago. I wonder if it's just going to stop working in
           | some future macOS version.
        
       | bloudermilk wrote:
       | I had a good laugh reading this. Of course the irony is that the
       | author, after having gone through the hellish process of learning
       | the spec, didn't document any of the code (in that file).
        
       | hprotagonist wrote:
       | This is one of my favourite rants.
       | 
       | The other one is the SO answer about X/HTML parsing with regex.
       | https://stackoverflow.com/questions/1732348/regex-match-open...
        
         | Blikkentrekker wrote:
         | I find it weird that the comment is locked rather than deleted.
         | 
         | Funny it be, it does not seem _a propos_ for answering a
         | question.
        
           | _jal wrote:
           | At this point, it is a cultural artifact.
        
           | Alupis wrote:
           | It's from a time before StackOverflow (and StackExchange at
           | large) got very rigid with all the rules. Back then, there
           | were a lot of "fun" questions and answers, including the
           | infamous "What is the best comment in source code you have
           | ever encountered?"[1]
           | 
           | Some old timers might remember those days... SO was a much
           | more fun place back then. Now it's rife with down-voters,
           | close-voters and hostile-towards-newbie folks.
           | 
           | [1] https://stackoverflow.com/questions/184618/what-is-the-
           | best-...
        
           | NobodyNada wrote:
           | The post was locked because it had received hundreds of flags
           | and comments complaining that the post was broken -- even
           | _with_ the moderator note at the bottom -- and several edits
           | trying to "fix" the post. See https://meta.stackoverflow.com
           | /questions/250099#comment637_2...
        
             | Blikkentrekker wrote:
             | My point is that the comment should be deleted altogether
             | as it's more of a personal rant than a serious answer.
        
         | Hamuko wrote:
         | My favourite is wm4's rant about C locales. It's quite
         | impressive.
         | 
         | https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
        
           | stickfigure wrote:
           | "Those not comfortable with toxic language should pretend
           | this is a religious text."
           | 
           | I love it.
        
           | cataphract wrote:
           | Unsurprisingly, this kind of attitude got him kicked off his
           | own project.
        
             | Hamuko wrote:
             | Clearly I haven't been paying attention to mpv closely
             | enough.
        
             | awestroke wrote:
             | Really? Where can I read more?
        
         | dspillett wrote:
         | "Please do not flag it for our attention." is the perfect
         | finishing touch too.
        
       | Triv888 wrote:
       | Plain text file is my favorite file format... because in the end,
       | portability is what matters the most.
        
         | Arnt wrote:
         | By plain text, do you have in mind CRLF or LF, and for
         | structured data, do you have in mind YAML, JSON or one of the
         | umpty others?
        
         | gjvr wrote:
         | ASCII? UTF-8? UTF-16? ... ...
        
           | Triv888 wrote:
           | lol... yes. UTF-512 should be enough for everybody!
        
             | _kst_ wrote:
             | UTF-640K?
        
           | Dylan16807 wrote:
           | Anyone using the words "plain text" for a file had _better_
           | mean either ASCII or UTF-8.
           | 
           | And for almost all purposes these days it should be the
           | latter.
        
           | user-the-name wrote:
           | Shift-JIS? GB18030? KOI8-R?
           | 
           | "Plain text" does not really exist.
        
       | bluedino wrote:
       | Joel on Software, _"Why are Microsoft Office File Formats So
       | Complicated"_
       | 
       | https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...
       | 
       | They were designed to be fast on very old computers.
       | 
       | They were designed to use libraries.
       | 
       | They were not designed with interoperability in mind.
       | 
       | They have to reflect the history of the applications.
        
       | mst wrote:
       | Personally, rather than printing things out and then setting them
       | on fire, I've preferred printing things out and then, when I'm
       | done with them, donating them to be shredded and used as horse
       | bedding.
       | 
       | Some code just deserves to end its life being shat on by a horse.
        
       | shamyl_zakariya wrote:
       | I wrote a PSD v5 parser for BeOS back in 1999 or 2000 (it acted
       | as a plug-in for the OS's image format subsystem, and was also
       | usable by 3rd party apps to write PSD files).
       | 
       | I was one of the suckers who faxed a formal request to Adobe to
       | get the file spec.
       | 
       | I was young, and foolish, and spent the better part of a month or
       | so in a hex editor trying to understand why a single file format
       | could have like 3 different string encodings.
        
       | 1f60c wrote:
       | For posterity's sake, the link should probably be changed to
       | https://github.com/zepouet/Xee-xCode-4.5/blob/83394493f51991...
        
       | just_for_you wrote:
       | I would actually beg to disagree, as someone who was bored a few
       | weekends ago and decided to go through a few reasonably-complex
       | PSD files (eg, ones with different layer types, bezier curves,
       | etc.) in a hex editor with the official specification in-hand in
       | an attempt to understand the format. (The specification can be
       | found here: https://www.adobe.com/devnet-
       | apps/photoshop/fileformatashtml...).
       | 
       | After a few hours, I was able to grok it without too much
       | difficulty, and found the format was reasonably well laid-out.
       | Yes, it does suffer from some "let's take the in-memory
       | representation and bake it into the on-disk format"-isms, and
       | there were a few things that were not covered by the specification
       | (eg, a couple Resource IDs aren't mentioned at all, or they are,
       | but there's no documentation on how to interpret their content) -
       | but it's not anywhere close to insane.
       | 
       | To give an example, partially going off memory from what I
       | remember seeing in real files, but mostly skimming the spec, a
       | very basic gist of a typical PSD file might look something like
       | this. (I can only assume HN is going to mangle the formatting of
       | the following text, so expect to see multiple edits to this
       | comment):
       | 
       | EDIT: I was right about the formatting getting messed up. I've
       | posted it to Pastebin and will call it a day:
       | 
       | https://pastebin.com/raw/WZNrCSAP
       | 
       | So that (the Pastebin link above) is the basic and oversimplified
       | gist of a PSD file. It's mostly just reading 4-byte tags and
       | length fields. Now, _rendering_ a complex PSD file (or worse:
       | generating a complex one) is another matter, but the on-disk
       | format is pretty understandable IMO. Creating a bitmap-to-PSD
       | conversion program from scratch is also something I could see
       | being doable in a weekend if you followed the spec, as long as
       | the PSD only contained simple rasterized data (eg, no filters
       | and other fanciness) and only the bare-minimum ResourceID/Layer
       | tags.
       | 
       | Honestly, I really trust PSD as a general interchange format for
       | images. It's quirky in some places, but it's pretty logically
       | laid-out, is universally-recognized, and I can rest assured that
       | a rasterized copy of my images will always be stapled to the end
       | of the file if worst comes to worst. There really is no
       | alternative: while you can contain Photoshop data in a format
       | like TIFF (Photoshop just staples the entire PSD file into a
       | TIFF/EXIF tag anyway), if an application unaware of that PSD
       | data opens the TIFF file and saves it, the Photoshop data
       | might go poof or have a mismatch with the other TIFF data. And
       | then there's Gimp's XCF format, where the on-disk format is
       | allowed to be changed willy-nilly because you are only supposed
       | to use Gimp's official library for reading/writing to it (not to
       | mention no application really supports XCF aside from Gimp and
       | maybe a few open-source projects). And let's not even bother with
       | $application-specific formats, because they are meant only for
       | that specific application.
       | 
       | The moral of the story is PSD is an alright format.
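
A rough sketch of the "reading tags and length fields" idea for the top level of a PSD file. It assumes the big-endian layout in the published specification: a 26-byte header, then three length-prefixed sections, then the composite image data. The function name and the returned dictionary shape are made up for illustration, and only version 1 (PSD, not PSB) is handled.

```python
import struct

SECTION_NAMES = ("color_mode_data", "image_resources", "layer_and_mask_info")


def parse_psd_layout(data: bytes) -> dict:
    """Parse the fixed header and locate the top-level sections of a PSD."""
    # Signature, version, 6 reserved bytes, channels, height, width, depth, mode.
    sig, version, _reserved, channels, height, width, depth, mode = struct.unpack(
        ">4sH6sHIIHH", data[:26]
    )
    if sig != b"8BPS":
        raise ValueError("missing 8BPS signature")
    if version != 1:
        raise ValueError("only version 1 (PSD) is handled in this sketch")

    pos = 26
    sections = {}
    for name in SECTION_NAMES:
        # Each section starts with a 4-byte length that excludes the length field.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        sections[name] = (pos + 4, length)  # (payload offset, payload length)
        pos += 4 + length
    # Whatever remains is the flattened composite image.
    sections["image_data"] = (pos, len(data) - pos)

    return {
        "channels": channels,
        "width": width,
        "height": height,
        "depth": depth,
        "color_mode": mode,
        "sections": sections,
    }
```

Everything the thread complains about (resource blocks, layer records, their differing padding rules) lives inside those sections, which this sketch treats as opaque.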
        
         | hallarempt wrote:
         | No, it's nothing of the kind. It's not alright, it's not a
         | general interchange format for images that you can trust, and
         | the specification isn't complete or correct in any case.
        
       | oseph wrote:
       | Here's a "cool" PSD quirk.
       | 
       | Take a PSD that has many layers. Look at its filesize (mine is
       | ~70 MB). Add one layer to the PSD, fill it with white, and make
       | it the topmost layer. Save it as a new PSD and compare the
       | filesizes.
       | 
       | The new PSD with the white layer is 55 MB. Why?
        
         | mockery wrote:
         | Most* PSD files contain a "preview" copy of the fully-flattened
         | document (which is compressed). A flat white image compresses
         | far better, so that portion of the file doesn't take as much
         | space.
         | 
         | Depending on what your layers look like (how many, how much
         | they cover, etc.) it's not too surprising that the preview
         | image could take a substantial fraction of the total file size
         | (sounds like ~20% in this case!)
         | 
         | * I believe this behavior can be toggled off with an option.
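
A small illustration of why a flat white composite shrinks the file. It uses `zlib` purely as a stand-in compressor (an assumption: it is not necessarily what Photoshop uses for the preview), and random bytes as a stand-in for a busy, detailed image.

```python
import os
import zlib

WIDTH, HEIGHT = 256, 256


def compressed_size(pixels: bytes) -> int:
    """Size in bytes of the pixel buffer after generic compression."""
    return len(zlib.compress(pixels))


# One byte per channel, three channels per pixel.
flat_white = b"\xff" * (WIDTH * HEIGHT * 3)
busy = os.urandom(WIDTH * HEIGHT * 3)

print("flat white:", compressed_size(flat_white), "bytes")
print("busy image:", compressed_size(busy), "bytes")
```

The same raw size goes in both times; only the compressed size of the flattened copy differs.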
        
           | Dylan16807 wrote:
           | Is the preview the full resolution? That seems very overkill.
        
       | w0mbat wrote:
       | I wrote the classic-Mac image-display app, "Shomi" (originally
       | DePICT) which tried to display anything as fast as possible.
       | 
       | In modern times such an app would be just a bunch of API or
       | library calls, but in the early days I had to write all the image
       | format readers myself.
       | 
       | At the time TIFF was the worst, so complicated, so many options,
       | and you even had to write your own LZW codec that matched theirs.
       | 
       | Next worst was BMP. It's upside-down and you get random zeroes
       | where you expect sensible values.
       | 
       | I managed with a minimalist Photoshop parser (the 8BIM format
       | then) that didn't support everything but coped with the real
       | world files my designer friends tried. It's got more complicated
       | since then.
        
         | just_for_you wrote:
         | I would agree on BMP being slightly insane, since there's a
         | number of things you have to magically know about it:
         | 
         | 1) If using 8 bits per pixel or less, there's always a color
         | palette between the header and bitmap. If more than 8bpp, then
         | there's none.
         | 
         | 2) If more or equal to 8BPP, you must know that you must
         | _always_ use Bitfields to specify which bytes correspond to
         | which RGB(A) channels, and that there cannot be any overlap
         | between the channel masks.
         | 
         | 3) There's no intuitive way to tell which version of the Bitmap
         | file you have (there's like 5 major versions).
         | 
         | 4) You can specify, inline, the Chromaticity and Gamma
         | (basically an inline color profile) somewhere in the headers.
         | 
         | 5) You can also append an ICC profile directly into the file,
         | OR, store a Windows-style filesystem path to the profile (in
         | which case, you must know to ignore the in-header color
         | profile, if you have one).
         | 
         | 6) You must know that RLE compression only works if the RGB
         | channels cleanly map to one-byte-each (no packing).
         | 
         | 7) RLE compression doesn't work if you use a negative height
         | (to indicate the image is encoded from top-to-bottom, rather
         | than the standard bottom-to-top).
         | 
         | 8) The (un)official documentation Microsoft hosts states that
         | BMP files can also store PNG or JPEG bitstream data, but yet no
         | application that I've seen has ever explicitly supported this.
         | 
         | 9) You have to know that Windows only supports 2 or so
         | variations of packed 16-bit BMP files (565, etc).
         | 
         | For the most part you can just assume you're dealing with a v5
         | BMP file, read or write the most pertinent parts like the
         | dimensions (skipping dumb stuff like ICC profiles, pixels-per-
         | meter and whatnot), and just have your way with the raw data.
         | But there is still some dumb stuff in there that shouldn't be
         | there, considering it should be a straightforward format. What
         | also baffles me is that even with the bloat Microsoft added to
         | the format, BMP isn't even extensible. So it's kinda the worst
         | of both: It's slightly complicated to the point of not being
         | straightforward to work with, and yet, you can't add extra
         | info/metadata to it.
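
A sketch of the "read the most pertinent parts" approach for BMP. It assumes the common little-endian layout (a 14-byte file header followed by a DIB header whose first field is its own size) and uses that size as a heuristic for the header version. The function name and the result keys are made up for illustration, and the old 12-byte core header is deliberately not handled.

```python
import struct

# Heuristic: the DIB header size hints at which header variant was written.
DIB_NAMES = {
    40: "BITMAPINFOHEADER",
    108: "BITMAPV4HEADER",
    124: "BITMAPV5HEADER",
}


def bmp_info(data: bytes) -> dict:
    """Read dimensions, bit depth and row order from a BMP header."""
    if data[:2] != b"BM":
        raise ValueError("missing BM signature")
    (pixel_offset,) = struct.unpack("<I", data[10:14])
    (dib_size,) = struct.unpack("<I", data[14:18])
    if dib_size < 40:
        raise ValueError("core-style headers are not handled in this sketch")
    # Width and height are signed; a negative height means top-to-bottom rows.
    width, height, _planes, bpp, compression = struct.unpack("<iiHHI", data[18:34])
    return {
        "header": DIB_NAMES.get(dib_size, "unknown"),
        "width": width,
        "height": abs(height),
        "top_down": height < 0,
        "bits_per_pixel": bpp,
        "compression": compression,
        "pixel_offset": pixel_offset,
    }
```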
         | 
         | On that note, Targa is a dreamy image format. It's got a simple
         | 18-byte or so header, and then the raw data. And, optionally,
         | for Targa v2 you can append a few bytes in a footer that
         | indicate an offset for an optional standard-targa-metadata
         | area, as well as an optional metadata area for any data of your
         | own you wish to add to the file, followed by a magic string
         | that explicitly indicates you're using v2 of the Targa spec.
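
For comparison, a sketch of how little is needed for Targa. It assumes the usual layout: an 18-byte little-endian header, and for v2 files a 26-byte footer holding two offsets followed by the signature `TRUEVISION-XFILE.` and a NUL byte. The function name and result keys are made up for illustration.

```python
import struct

TGA_V2_SIGNATURE = b"TRUEVISION-XFILE.\x00"


def tga_info(data: bytes) -> dict:
    """Read the fixed Targa header and check for the optional v2 footer."""
    (
        _id_length, _color_map_type, image_type,
        _cmap_first, _cmap_length, _cmap_entry_size,
        _x_origin, _y_origin, width, height,
        pixel_depth, _descriptor,
    ) = struct.unpack("<BBBHHBHHHHBB", data[:18])

    info = {
        "image_type": image_type,
        "width": width,
        "height": height,
        "pixel_depth": pixel_depth,
        "is_v2": False,
    }
    # The v2 footer is the last 26 bytes: two 4-byte offsets plus the signature.
    if len(data) >= 18 + 26 and data[-18:] == TGA_V2_SIGNATURE:
        extension_offset, developer_offset = struct.unpack("<II", data[-26:-18])
        info.update(
            is_v2=True,
            extension_offset=extension_offset,
            developer_offset=developer_offset,
        )
    return info
```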
        
       | [deleted]
        
       | SimianLogic2 wrote:
       | I have deep sympathy for this, after having done a ton of work
       | with the SWF format and the (even more poorly documented) AEPX
       | format.
        
       | [deleted]
        
       | riffraff wrote:
       | I remember reading this years ago, I think it was my first
       | encounter with the "... fierce passion of a thousand suns".
       | 
       | It still made me chuckle even today, thanks for sharing it.
        
       | ericol wrote:
       | > this Rube Goldberg of a file format
       | 
       | Concise, to the point, and a good enough insult to keep on
       | file. I love it.
        
       | [deleted]
        
       | egonschiele wrote:
       | This came up a couple months ago. I really wish there was a
       | standard open format that did most of the things PSD does. I had
       | asked the folks at Procreate if they'd design something, since
       | they are sort of a challenger to Photoshop, but they said it
       | wasn't something they wanted to do.
        
         | iggldiggl wrote:
         | Where this might get tricky is if you seriously want to support
         | non-destructive editing as well, because in that case any
         | filter that can be applied in non-destructive mode effectively
         | needs to become part of the file format specification, too.
        
           | wongarsu wrote:
           | You could store filters generically enough that any software
           | can skip over filters it doesn't recognize (e.g. you store
           | filter type uuid, filter data length, then arbitrary filter
           | parameters), and have some open registry where you can get
           | official filter uuids in return for an example
           | implementation. Of course not everyone will register every
           | filter, so the format should probably store the image once
           | unfiltered and once with all filters applied; that way if a
           | program doesn't recognize or implement some filter it can
           | still fall back to the destructively filtered version. But of
           | course that increases file size, which may or may not be a
           | concern.
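
One way the proposal could look on disk, as a sketch only: each filter record is a 16-byte UUID, a 4-byte big-endian length, then opaque parameters. The record layout, the function names and the example UUIDs are all hypothetical; no existing format is being described.

```python
import struct
import uuid

# Hypothetical registry entries; a real registry would hand out these ids.
GAUSSIAN_BLUR = uuid.UUID(int=1)
VENDOR_SPECIAL = uuid.UUID(int=2)


def encode_filter(filter_id: uuid.UUID, params: bytes) -> bytes:
    """Serialize one record: 16-byte UUID, 4-byte length, then parameters."""
    return filter_id.bytes + struct.pack(">I", len(params)) + params


def read_filters(blob: bytes, known: dict):
    """Split records into ones we can interpret and ones we must skip."""
    recognized, skipped = [], []
    pos = 0
    while pos < len(blob):
        filter_id = uuid.UUID(bytes=blob[pos:pos + 16])
        (length,) = struct.unpack(">I", blob[pos + 16:pos + 20])
        params = blob[pos + 20:pos + 20 + length]
        if filter_id in known:
            recognized.append((known[filter_id], params))
        else:
            # The length field lets us step over parameters we cannot parse.
            skipped.append(filter_id)
        pos += 20 + length
    return recognized, skipped
```

A reader that ends up with a non-empty `skipped` list would then fall back to the pre-rendered, destructively filtered copy.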
        
         | preommr wrote:
         | Procreate is 10$ for a one-time purchase.
         | 
         | PS is 13$/mo. Actually, if you got just PS it would be 30$/mo.
         | 
         | And even then I've seen people complain that procreate is too
         | expensive.
         | 
         | I don't really have a point other than design tools and their
         | pricing is a pet issue of mine and I like bringing up how
         | insane it is at any given chance.
        
         | hallarempt wrote:
         | We started working on OpenRaster for that:
         | https://www.openraster.org/ -- but it's not really moving along
         | very well.
        
       | luc_ wrote:
       | more of these posts please 10/10
        
       | rdtsc wrote:
       | > Why, for instance, did it suddenly decide that _these_
       | particular chunks should be aligned to four bytes, and that this
       | alignement (sic) should _not_ be included in the size?
       | 
       | Not saying this happened here, but I have seen this type of
       | mistake before. It was because they "simply" cast a C/C++ struct
       | to a binary blob
       | https://en.wikipedia.org/wiki/Data_structure_alignment#Typic...
       | and wrote that to disk (or sent over the network in my case). So
       | that particular compiler version and architecture-specific struct
       | field alignment became the "official" format. It just takes one
       | goofy mistake like that and everyone has to deal with it for
       | years to come.
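
A small demonstration of the effect, using Python's `struct` module, whose `@` mode follows the platform's native C alignment while `<` packs fields back to back. The field layout (a 1-byte tag followed by a 4-byte length) is made up for illustration, and the native size depends on the platform.

```python
import struct

FIELDS = "BI"  # hypothetical record: 1-byte tag, then a 4-byte unsigned length

# "@" = native byte order with native alignment, like dumping a C struct.
native_size = struct.calcsize("@" + FIELDS)
# "<" = little-endian with no padding, like a deliberately designed format.
packed_size = struct.calcsize("<" + FIELDS)

print("native struct size:", native_size)
print("packed size:       ", packed_size)
print("native bytes:", struct.pack("@" + FIELDS, 1, 2).hex())
print("packed bytes:", struct.pack("<" + FIELDS, 1, 2).hex())
```

On many common platforms the native variant comes out larger because padding is inserted before the 4-byte field, and those padding bytes are what end up baked into the file.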
        
         | nradov wrote:
         | That was very common not long ago. Even the JPEG/EXIF image
         | file format is designed that way. So it's efficient for reading
         | and writing, but introduces a lot of potential bugs with
         | alignment and chunk size issues. Inserting an additional EXIF
         | tag is a huge hassle because then you have to recalculate all
         | the pointers, even those in other data chunks.
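
A sketch of where those pointers come from. EXIF data uses the TIFF structure, in which each 12-byte directory entry stores its value inline only if it fits in 4 bytes; otherwise the field holds an absolute offset. The helper name is made up for illustration, only the first directory is read, and tags whose inline value is itself a pointer to another directory are not detected.

```python
import struct

# Size in bytes of one element of each TIFF field type.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def offset_entries(tiff: bytes):
    """List (tag, offset) for first-IFD entries whose data lives elsewhere."""
    order = {b"II": "<", b"MM": ">"}[tiff[:2]]  # byte-order mark
    magic, ifd = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF structure")
    (count,) = struct.unpack(order + "H", tiff[ifd:ifd + 2])
    result = []
    for i in range(count):
        start = ifd + 2 + 12 * i
        tag, field_type, n, value = struct.unpack(order + "HHII", tiff[start:start + 12])
        # Values larger than 4 bytes do not fit inline, so `value` is an offset.
        if TYPE_SIZES[field_type] * n > 4:
            result.append((tag, value))
    return result
```

Every offset returned here would have to be rewritten if bytes were inserted anywhere before the data it points to.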
        
           | just_for_you wrote:
           | One form of true insanity is probably JPEG-encoded TIFF
           | files. Apparently it's so bad that for the next version of
           | the TIFF specification they are taking that insane approach
           | out completely.
           | 
           | I can't do justice to the article on TIFF's problems with
           | JPEG
           | (http://www.simplesystems.org/libtiff/TIFFTechNote2.html),
           | but my understanding is that:
           | 
           | 1) Some JPEG-specific data is moved outside of the actual
           | JPEG bitstream tag, and into separate TIFF tags, making
           | editing JPEG-in-TIFF files non-trivial.
           | 
           | 2) Size is not encoded in some of these fields, so the TIFF
           | editor you're writing will have to _partially implement a
           | JPEG decoder_ just to know the size of some of those TIFF
           | tags.
           | 
           | 3) Some tags/fields are pointers into other parts of the TIFF
           | file, meaning if you edit the file, you'll have to update the
           | file in many places.
           | 
           | (As a quick aside on insane formats, may I also mention the
           | EPWING dictionary format?)
        
         | masklinn wrote:
         | Note that the issue here is less the part that you quoted and
         | more the two sentences surrounding it: PSD has all of
         | 
         | * unaligned chunks
         | 
         | * aligned chunks with alignment included in size
         | 
         | * aligned chunks with alignment not included in size
         | 
         | The problem is not the specific choice, "Either one of these
         | three behaviours would be fine", it's that "PSD, of course,
         | uses all three, and more."
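
To make the three conventions concrete, here is a sketch of how a reader would find the next chunk under each one. The function, the convention names and the 4-byte alignment are made up for illustration; this is not PSD's actual chunk layout.

```python
def next_chunk_offset(start: int, header_size: int, declared_size: int,
                      convention: str, align: int = 4) -> int:
    """Return the offset of the chunk following the one at `start`."""
    if convention in ("unaligned", "padding_in_size"):
        # Either there is no padding, or the declared size already counts it.
        return start + header_size + declared_size
    if convention == "padding_not_in_size":
        # The reader must round the payload up to the alignment itself.
        padded = -(-declared_size // align) * align
        return start + header_size + padded
    raise ValueError(f"unknown convention: {convention}")


# A 5-byte payload behind an 8-byte chunk header starting at offset 100:
print(next_chunk_offset(100, 8, 5, "unaligned"))
print(next_chunk_offset(100, 8, 8, "padding_in_size"))
print(next_chunk_offset(100, 8, 5, "padding_not_in_size"))
```

A parser has to know, per chunk type, which branch applies, and that is the complaint.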
        
         | grishka wrote:
         | That's why you version your file formats and network protocols.
         | Actually, designing file formats and network protocols is one
         | of the few areas in software engineering where you do really
         | need future-proof design and extreme extensibility, because
         | once you release the thing, these are set in stone. Yet not
         | many people seem to realize this. They instead "future-proof"
         | their code with useless abstraction layers.
         | 
         | Anyway. Versioning helps you avoid ugly workarounds if you need
         | to extend your format in the future in ways that its current
         | version doesn't allow. You then keep the code for older
         | versions as a backwards-compatibility-only kind of deal and
         | move on to the new one.
        
           | klodolph wrote:
           | Versioning is one approach, but I favor making the format
           | extensible in the first place. For example, if you pick an
           | XML format you can add new attributes and tags, if you pick a
           | Protobuf format you can add new fields. "Extensible" sounds
           | like it can be a real mess but there are effective strategies
           | to minimize the mess.
           | 
           | There are also various chunked formats like AIFC (AIFF) and
           | PNG which can be extended by defining new chunk types,
           | without needing to change versioning. AIFC includes the
           | APPL/stoc chunk and PNG has various mechanisms.
           | 
           | There are a few problems with versioning file formats. One is
           | that you often end up with 'if (version > 3)' scattered
           | across your code base or other nonsense. Another problem is
           | that it is easy to accidentally mark the wrong version,
           | either a version which is too low (because you wrote
           | something to the file and forgot to make your encoder bump
           | the version properly) or a version which is too high (because
           | you didn't bother to use the minimum version your data
           | requires).
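
A sketch of the chunked-extension idea using PNG, where each chunk is a 4-byte length, a 4-byte type, the data and a CRC, and a lowercase first letter in the type marks a chunk that a decoder may ignore. The helper names and the private chunk type `xyZz` are made up for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk without interpreting any of them."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC


def is_ancillary(chunk_type: bytes) -> bool:
    """Bit 5 of the first type byte (lowercase letter) means 'safe to ignore'."""
    return bool(chunk_type[0] & 0x20)
```

An old reader that meets an unknown ancillary chunk can step over it using the length field, with no version number involved.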
        
             | grishka wrote:
             | I'd say do both, actually. But versioning is more important
             | IMO because it gives you the freedom to potentially start
             | from scratch keeping only a small portion of the header.
             | 
             | On mainstream file formats... It's a mixed bag. Image
             | formats -- JPEG and PNG especially -- are extensible and
             | reasonably easy to parse. It's fairly trivial to get the
             | image dimensions out of one of these without decoding the
             | compressed data. I also wrote a JPEG decoder out of
             | curiosity once to understand the compression algorithm
             | better -- it's an interesting exercise, really, every
             | software developer should try it at some point.
             | 
             | But the worst format I've ever worked with is MP3. It's an
             | absolute mess. First, there are two kinds of ID3 tags.
             | These store metadata that your player displays. ID3v1 is a
             | fixed-length, fixed-layout thing that goes on the end of
             | the file. ID3v2 is an extensible, I'd say _way too
             | extensible_ , chunked thing capable of storing literally
             | anything, including jpegs of cover arts, that goes on the
             | beginning of the file. But none of them store the duration
             | of the file. You're supposed to chop the tags off the ends
             | of it, then find the first frame of encoded data by
             | searching for the pattern 0xFFFx, read its header, and
             | determine the byte length, bitrate, the sampling rate and
             | ultimately the duration of a that frame using several
             | lookup tables. Now that you know how much audio each frame
             | contains, and how long it is, you take the size of the file
             | (minus tags of course) and divide it by the frame length,
             | then multiply by the frame duration. That's how you get the
             | duration of an MP3. A constant-bitrate one. And to seek
             | within an MP3, you calculate the offset into the file and
             | round it to the nearest frame size and just start playing
             | from there. It gets even worse with VBR, because now you
             | can no longer rely on frames being the same byte length,
             | but I don't really remember the details any more. The gist
             | of it is that there's "header" encoded into the very first
             | frame in the file, and there are two kinds of these
             | headers, and there's a sort of lookup table in it, among
             | other things, to help you seek into the right part of the
             | file because the byte offsets don't linearly correspond to
             | the playback time in a VBR file. After you seek, you have
             | to go back and forth to find the 0xFFFx and play from
             | there. Or not, because sometimes there's a 0xFFFx in the
             | middle of a frame too, so you have to have some heuristics
             | to detect that it's the real one.
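
The constant-bitrate procedure described above can be sketched in C. This is a minimal illustration under stated assumptions, not anyone's production parser: it only accepts MPEG-1 Layer III headers, it takes the first plausible header it sees (no extra heuristics against false sync patterns), and the names `Mp3FrameInfo`, `mp3_parse_header`, `mp3_find_first_frame` and `mp3_cbr_duration_seconds` are made up for this example. The caller is assumed to have already subtracted any tag bytes from `audio_bytes`.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitrates in kbit/s for MPEG-1 Layer III, indexed by the 4-bit bitrate
   field. Index 0 ("free format") and 15 (invalid) are rejected below. */
static const int kBitrateKbps[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};

/* Sampling rates in Hz for MPEG-1, indexed by the 2-bit field; 3 is reserved. */
static const int kSampleRateHz[4] = { 44100, 48000, 32000, 0 };

/* Hypothetical result type for this sketch. */
typedef struct {
    int bitrate_bps;
    int sample_rate_hz;
    int frame_bytes;   /* length of this frame, including the 4-byte header */
} Mp3FrameInfo;

/* Parses a 4-byte frame header. Returns 1 on success, 0 if the bytes are
   not a plausible MPEG-1 Layer III header. */
int mp3_parse_header(const uint8_t h[4], Mp3FrameInfo *out)
{
    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) return 0;  /* 11-bit sync */
    if (((h[1] >> 3) & 3) != 3) return 0;                 /* MPEG-1 only */
    if (((h[1] >> 1) & 3) != 1) return 0;                 /* Layer III only */

    int kbps = kBitrateKbps[h[2] >> 4];
    int rate = kSampleRateHz[(h[2] >> 2) & 3];
    int padding = (h[2] >> 1) & 1;
    if (kbps == 0 || rate == 0) return 0;

    out->bitrate_bps = kbps * 1000;
    out->sample_rate_hz = rate;
    /* Layer III frame length in bytes: 144 * bitrate / sample rate + padding. */
    out->frame_bytes = 144 * out->bitrate_bps / rate + padding;
    return 1;
}

/* Scans for the first plausible header. Returns its offset, or -1. */
long mp3_find_first_frame(const uint8_t *buf, size_t len, Mp3FrameInfo *out)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (mp3_parse_header(buf + i, out)) return (long)i;
    }
    return -1;
}

/* CBR estimate as described in the comment: number of frames (file size
   divided by frame length) times the duration of one frame. An MPEG-1
   Layer III frame always carries 1152 samples. */
double mp3_cbr_duration_seconds(size_t audio_bytes, const Mp3FrameInfo *f)
{
    double frames = (double)audio_bytes / (double)f->frame_bytes;
    return frames * 1152.0 / (double)f->sample_rate_hz;
}
```

For the header bytes `FF FB 90 00` this yields 128 kbit/s, 44100 Hz and a 417-byte frame. Because padded frames are one byte longer than unpadded ones, the result is only an estimate even for CBR files.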
        
               | klodolph wrote:
               | You should have a version, but given the (admittedly
               | unrealistic) choice between versioning and extensibility
               | I'll take extensibility every time. There are plenty of
               | formats where there's a version tag and it's never been
               | bumped past "1".
               | 
               | Early compressed audio / video formats were generally a
               | total mess, with a couple exceptions like MOV, so I'm not
               | surprised that MP3 is horrible.
        
               | dfox wrote:
               | That is because MP3 is not a file format in the first
               | place. The file is originally simply a stream of MP3
               | frames written into a file without any structure and
               | usually with a technically invalid and undecodable frame at
               | both ends (due to how MP3 compression works). Because it
               | is originally designed for transmission across some kind
               | of somewhat unreliable network, there are
               | resynchronization structures in the stream and decoders
               | tend to be able to ignore various kinds of totally
               | invalid crap in the input data. This feature is exploited
               | by all the ID3vX formats to essentially embed arbitrary
               | data into the file. Several other commonly used "MPEG
               | something" "file formats" are exactly the same thing.
               | 
               | This is somewhat ironic given the fact that quite large
               | part of MPEG specification deals with various framing and
               | metadata structures (on the other hand the overall
               | architecture of all that is best described as
               | "overengineered", so ignoring it makes some kind of
               | sense).
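
As a small illustration of how a reader steps around the tags mentioned above, here is a sketch in C. It relies on the ID3v2 header layout as I understand it (the bytes "ID3", two version bytes, one flags byte, then a 4-byte "syncsafe" size that uses only 7 bits per byte so it can never look like a frame sync) and on the fixed 128-byte ID3v1 trailer that starts with "TAG". The function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the total size in bytes of an ID3v2 tag at the start of buf
   (10-byte header + body, plus a 10-byte footer if flagged), or 0 if no
   well-formed ID3v2 header is present. */
size_t id3v2_tag_size(const uint8_t *buf, size_t len)
{
    if (len < 10 || memcmp(buf, "ID3", 3) != 0) return 0;

    /* Each size byte is "syncsafe": its top bit must be clear. */
    for (int i = 6; i < 10; i++) {
        if (buf[i] & 0x80) return 0;
    }

    size_t body = ((size_t)buf[6] << 21) | ((size_t)buf[7] << 14) |
                  ((size_t)buf[8] << 7)  |  (size_t)buf[9];
    size_t footer = (buf[5] & 0x10) ? 10 : 0;   /* ID3v2.4 footer flag */
    return 10 + body + footer;
}

/* Returns 128 if the buffer ends with an ID3v1 tag, else 0. */
size_t id3v1_tag_size(const uint8_t *buf, size_t len)
{
    if (len >= 128 && memcmp(buf + len - 128, "TAG", 3) == 0) return 128;
    return 0;
}
```

The audio payload is then whatever lies between `id3v2_tag_size(...)` bytes from the start and `id3v1_tag_size(...)` bytes from the end, which is the region a frame scanner should look at.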
        
               | just_for_you wrote:
               | This pretty much nails it.
               | 
               | I also do appreciate MP3's simplicity, in the sense that
               | it's just a series of concatenated frames. It makes it
               | really easy to just fling them over the network and get
               | streaming audio working on a client. And there's also
               | somewhat of an elegance (a very ugly and hacky elegance,
               | mind you) to being able to exploit decoders ignoring
               | malformed frames.
               | 
               | For example if you open an MP3 IceCast stream via HTTP in
               | a media player like VLC, the server (if it realizes, via
               | the HTTP request headers, that you're Icecast-aware) will
               | occasionally barf the name of the current and following
               | song into the MP3 stream. Meaning you don't need any
               | higher-level streaming protocol to deal with, and can
               | just send raw frames over the wire, where the MP3 decoder
               | will ignore the song title as a malformed MP3 frame, but
               | VLC will pick-up on the song title and display it for you
               | as the server cycles from one song to the next. Kinda
               | handy, because VLC will make use of this metadata, but at
               | the same time, the MP3 stream's URL will also work in a
               | web browser too, since the browser won't need to know how
               | to deal with a higher-level protocol before being able to
               | start receiving those frames.
               | 
               | Actually, nevermind all that. MP3 bad.
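
A rough sketch of the client side of this, in C. It assumes the commonly described "ICY" convention rather than anything stated in the comment above: the client asks for metadata with an `Icy-MetaData: 1` request header, the server answers with an `icy-metaint: N` header, and after every N bytes of audio it inserts one length byte (in units of 16 bytes) followed by that much text such as `StreamTitle='...';`. Under that assumption an aware client strips the metadata out itself instead of leaving it to the decoder. `icy_demux` is a made-up name and works on an in-memory buffer to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Splits an interleaved buffer into audio and metadata.
   - audio_out must have room for in_len bytes.
   - meta_out receives the last non-empty metadata block, NUL-terminated
     and truncated to meta_cap - 1 bytes.
   Returns the number of audio bytes written. */
size_t icy_demux(const uint8_t *in, size_t in_len, size_t metaint,
                 uint8_t *audio_out, char *meta_out, size_t meta_cap)
{
    size_t pos = 0, audio_len = 0;
    if (metaint == 0) return 0;

    while (pos < in_len) {
        /* Up to metaint bytes of audio. */
        size_t n = in_len - pos < metaint ? in_len - pos : metaint;
        memcpy(audio_out + audio_len, in + pos, n);
        audio_len += n;
        pos += n;
        if (n < metaint || pos >= in_len) break;  /* no length byte yet */

        /* One length byte, counted in 16-byte units. */
        size_t meta_len = (size_t)in[pos++] * 16;
        if (meta_len > in_len - pos) break;       /* truncated metadata */

        if (meta_len > 0 && meta_cap > 0) {
            size_t c = meta_len < meta_cap - 1 ? meta_len : meta_cap - 1;
            memcpy(meta_out, in + pos, c);
            meta_out[c] = '\0';
        }
        pos += meta_len;
    }
    return audio_len;
}
```

A real client would do this incrementally on a socket and feed only the audio bytes to the MP3 decoder.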
        
               | superjan wrote:
               | I second your observations on JPEG. It turns out that you
               | can use the same code to skim through jpeg, lossless
               | jpeg, Jpeg2000 and jpeg-ls headers. Impressive in its
               | simplicity.
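
To show why skimming JPEG headers is so simple, here is a minimal C sketch that walks the marker segments of a classic JPEG file until it finds a start-of-frame segment and reads the image size from it. It relies on the usual layout: every segment is `FF`, a marker byte, then a 2-byte big-endian length that counts itself. The function name is invented, and this sketch only recognises the original JPEG SOF markers (`C0`..`CF` except `C4`, `C8`, `CC`); covering JPEG 2000 or JPEG-LS would need their own marker codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks marker segments after SOI. On finding a start-of-frame segment,
   stores the image size and returns 1. Returns 0 if none is found or the
   data is malformed. Standalone markers such as RSTn are not expected
   before the frame header and are not handled here. */
int jpeg_dimensions(const uint8_t *buf, size_t len, int *width, int *height)
{
    if (len < 2 || buf[0] != 0xFF || buf[1] != 0xD8) return 0;  /* SOI */

    size_t pos = 2;
    while (pos + 4 <= len) {
        if (buf[pos] != 0xFF) return 0;            /* lost marker alignment */
        uint8_t marker = buf[pos + 1];
        if (marker == 0xFF) { pos++; continue; }   /* fill byte */
        if (marker == 0xD9 || marker == 0xDA) return 0;  /* EOI/SOS first */

        /* Segment length includes the two length bytes themselves. */
        size_t seg_len = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        if (seg_len < 2 || seg_len > len - pos - 2) return 0;

        if (marker >= 0xC0 && marker <= 0xCF &&
            marker != 0xC4 && marker != 0xC8 && marker != 0xCC) {
            /* SOF payload: precision (1), height (2), width (2), ... */
            if (seg_len < 7) return 0;
            *height = (buf[pos + 5] << 8) | buf[pos + 6];
            *width  = (buf[pos + 7] << 8) | buf[pos + 8];
            return 1;
        }
        pos += 2 + seg_len;                        /* skip unknown segment */
    }
    return 0;
}
```

The key property is the last line of the loop: a reader can skip any segment it does not understand because the length is always in the same place.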
        
           | spion wrote:
           | Data structures and object protocols are the analog to file
           | formats and network protocols. They're not immutable but
           | unless you want software changes to affect everything, you
           | want something at least a bit more stable
        
         | wolrah wrote:
         | As I understand it that's more or less what the older MS Office
         | formats were, just a dump of the entire OLE object representing
         | the document as-is.
         | 
         | I believe SimCity used a similar strategy as well, the .cty and
         | I think SC2k's .s2k formats just dumped the in-memory
         | representation to disk.
        
           | krylon wrote:
           | I worked at a company once where they, too, used raw memory
           | dumps as their file format.
           | 
           | The compiler flags telling the compiler how to align structs
           | were different for debug and non-debug builds. So, of course,
           | the first thing I did was to create a non-debug build and try
           | to open a file created by a debug-build executable. The
           | program crashed and burned without giving me a meaningful
           | error message, and it took me hours to understand my mistake.
           | 
           | Raw memory dumps are very neat efficiency-wise, but they are
           | extremely fragile.
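
The fragility is easy to reproduce. The sketch below declares the same two fields twice, once with the compiler's default layout and once with `#pragma pack(push, 1)` (an extension supported by GCC, Clang and MSVC). The struct and function names are made up for illustration; the point is only that the two builds would disagree about both the size of the record and the offset of `value`, so a raw dump written by one cannot be read by the other.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after `kind` so that
   `value` is aligned. */
struct RecordNatural {
    uint8_t  kind;
    uint32_t value;
};

/* Packed layout: no padding at all. */
#pragma pack(push, 1)
struct RecordPacked {
    uint8_t  kind;
    uint32_t value;
};
#pragma pack(pop)

size_t natural_size(void)         { return sizeof(struct RecordNatural); }
size_t packed_size(void)          { return sizeof(struct RecordPacked); }
size_t natural_value_offset(void) { return offsetof(struct RecordNatural, value); }
size_t packed_value_offset(void)  { return offsetof(struct RecordPacked, value); }
```

On common 32- and 64-bit targets the natural struct is typically 8 bytes with `value` at offset 4, while the packed one is 5 bytes with `value` at offset 1.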
        
             | rwallace wrote:
             | I'm curious, what was the reason for the difference? While
             | the C language standard doesn't guarantee anything about
             | alignment, I'm used to the behavior of compilers in
             | practice being consistently 'align everything on its own
             | size'; what changed between debug and release builds in
             | that case?
        
               | ender341341 wrote:
               | They mentioned compiler flags, so my guess would be that
               | they were doing something like tightly packing structs in
               | the release build to lower memory usage or something
               | similar.
        
               | saagarjha wrote:
               | Or the opposite, as packing usually requires slower,
               | unaligned accesses.
        
               | sumtechguy wrote:
               | That would depend on how much that struct is created in
               | memory vs the total code that uses it. If you have one
               | instance in memory then yeah the code is probably bigger
               | and slower (depending on arch). But if you have thousands
               | of the structs in memory the speed and code size trade
               | off may be worth it on a memory constrained system.
               | 
               | Debug also sometimes turns on overrun buffers so you can
               | check for over/under runs in your code at debug time.
               | Some compilers have this, others don't.
        
               | klodolph wrote:
               | Primitives have not historically been aligned to their
               | own size. There are plenty of older systems where double
               | floats had 32-bit alignment instead of 64-bit, or 32-bit
               | integers had 16-bit alignment.
        
               | krylon wrote:
               | In the non-debug build, the compiler was free to align
               | structs as it saw fit; in the debug build, it was told to
               | pack them really tight, no padding. Why? The only reason
               | I can think of is to save a few precious bytes of disk
               | space.
        
           | DonHopkins wrote:
           | FWIW here's the original 68k SimCity "Classic" (open source
           | Micropolis) save function, which I've long since cleaned up
           | and put in byte swapping to make it portable to SPARC and
           | x86, but yes it is just writing out some big buffers of
           | memory with a function that now swaps bytes:
           | 
           | https://github.com/SimHacker/micropolis/blob/master/Micropol.
           | ..
           | 
           | Here's some original SimCity 2000 Mac code that saves the
           | city into Mac resources (not the flat part of the file, but
           | the Mac resource fork) -- CompGameWrite actually does some
           | simple run length compression of the raw memory:
           | 
           |     Boolean DoSave(short VolNum,Byte *name){
           |         long count,CountTotal;
           |         short i,j,filnum;
           |         short x, y;
           |         Byte *SaveText[] = {"\pGame Saved As:",NIL,NIL};
           |         FInfo info;
           |         Byte *NotCity[] = {
           |             "\pSimCity 2000a will not save over a non-city file.",
           |             "\pTry to save again using a different name.",
           |             NIL,
           |         };
           | 
           |         if (name[0]>20) name[0] = 20;
           | 
           |     #ifndef DEBUG
           |         if (GetFInfo(name,VolNum,&info)==0 && info.fdType!=CITYTYPE_ID) {
           |             MessageDialog(NotCity);
           |             return FALSE;
           |         }
           |     #endif
           |         if (FSOpen(name,VolNum,&filnum)!=noErr) {
           |             if (GameError(Create(name,VolNum,APPLICATION_ID,CITYTYPE_ID))) return FALSE;
           |             if (GameError(FSOpen(name,VolNum,&filnum))) return FALSE;
           |         }
           | 
           |         if (!WriteHeader(filnum,0L)) return FALSE;
           |         WriteLength = 4;        // includes 'SCDH' but not header
           | 
           |         //*** Write ***
           |         if (!MiscWrite(filnum)) return FALSE;
           |         if (!GameWrite(filnum,'ALTM',(Ptr)AltMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XTER',(Ptr)TerrainMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XBLD',(Ptr)BuildMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XZON',(Ptr)ZoneMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XUND',(Ptr)UnderMap[0])) return FALSE;
           | 
           |         if (!CompGameWrite(filnum,'XTXT',(Ptr)TextMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XLAB',(Ptr)LabelArray)) return FALSE;
           |         if (!CompGameWrite(filnum,'XMIC',(Ptr)MicroRecord)) return FALSE;
           |         if (!CompGameWrite(filnum,'XTHG',(Ptr)ThingList)) return FALSE;
           |         if (!CompGameWrite(filnum,'XBIT',(Ptr)BitsMap[0])) return FALSE;
           | 
           |         if (!CompGameWrite(filnum,'XTRF',(Ptr)TrafficMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XPLT',(Ptr)PolluteMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XVAL',(Ptr)ValueMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XCRM',(Ptr)CrimeMap[0])) return FALSE;
           | 
           |         if (!CompGameWrite(filnum,'XPLC',(Ptr)PoliceMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XFIR',(Ptr)FireMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XPOP',(Ptr)PopMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XROG',(Ptr)ROGMap[0])) return FALSE;
           |         if (!CompGameWrite(filnum,'XGRP',(Ptr)GraphData[0])) return FALSE;
           | 
           |         //*** Close ***
           |         WriteHeader(filnum,WriteLength);
           |         GameError(SetEOF(filnum,WriteLength+8));  // 8 bytes for file header
           |         GameError(FSClose(filnum));
           |         GameError(FlushVol(NIL,VolNum));
           | 
           |         BlockMove(name,CityStr,21);
           |         DateCashTitle();
           |         FilVolNum = VolNum;
           | 
           |         //*** SUCCESS! ***
           |         SaveText[1] = CityStr;
           |         MessageDialog(SaveText);
           |         return TRUE;
           |     }
           | 
           | And here's the Mac SimEarth save function, which looks like
           | it just writes out raw uncompressed memory -- the Mac memory
           | manager has a GetPtrSize function that (obviously) tells you
           | the size of the memory allocated for a pointer:
           | 
           |     DoSave(VolNum,name)
           |     int VolNum;
           |     Str255 name;
           |     {
           |         long count,CountTotal;
           |         int i,FilNum;
           | 
           |         if (EqualString(name,"\pSimEarth",FALSE,TRUE)) {
           |             MessageDialog( "\pERROR!!",
           |                            "\pThe name 'SimEarth' is reserved",
           |                            "\pfor this application.",1);
           |             return;
           |         }
           | 
           |         if (FSOpen(name,VolNum,&FilNum)) {
           |             if (GAIAError(Create(name,VolNum,'MYCR','SAVE'))) return;
           |             if (GAIAError(FSOpen(name,VolNum,&FilNum))) return;
           |         }
           |         else if (GAIAError(SetFPos(FilNum,fsFromStart,0L))) {
           |             GAIAError(FSClose(FilNum));
           |             return;
           |         }
           | 
           |         /**Write**/
           |         CountTotal = (count=GetPtrSize(Map[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,Map[0]))) return;
           |         CountTotal += (count=GetPtrSize(Life[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,Life[0]))) return;
           |         CountTotal += (count=GetPtrSize(OcTempMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,OcTempMap[0]))) return;
           |         CountTotal += (count=GetPtrSize(OcCurrentMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,OcCurrentMap[0]))) return;
           | 
           |         CountTotal += (count=GetPtrSize(DriftMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,DriftMap[0]))) return;
           |         CountTotal += (count=GetPtrSize(EventMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,EventMap[0]))) return;
           |         CountTotal += (count=GetPtrSize(SAirTempMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,SAirTempMap[0]))) return;
           |         CountTotal += (count=GetPtrSize(SCloudDensity[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,SCloudDensity[0]))) return;
           | 
           |         CountTotal += (count=GetPtrSize(SAirCurrentMap[0]));
           |         if (GAIAError(FSWrite(FilNum,&count,SAirCurrentMap[0]))) return;
           |         PutParameters();
           |         CountTotal += (count=GetPtrSize(MiscHis));
           |         if (GAIAError(FSWrite(FilNum,&count,MiscHis))) return;
           | 
           |         /**Close**/
           |         GAIAError(SetEOF(FilNum,CountTotal));
           |         GAIAError(FSClose(FilNum));
           |         GAIAError(FlushVol(NIL,VolNum));
           |         BlockMove(name,EwinStr,256);
           |         SetWTitle(editWindow,EwinStr);
           |         MessageDialog("\pSave Complete:",EwinStr,"",1);
           |         EnableItem(GetMHandle(301),3);  /* Save */
           |         FilVolNum = VolNum;
           |     }
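
For comparison, here is a small C sketch of the portable alternative to writing raw buffers: a tagged chunk whose length is stored byte by byte in a fixed (big-endian) order, so the file reads the same on 68k, SPARC or x86 without a separate byte-swapping pass. The exact chunk layout used here (4-byte tag, 4-byte big-endian length, payload) is an assumption made for illustration and is not taken from the SimCity sources; `put_be32`, `get_be32` and `write_chunk` are invented names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores v most-significant byte first, regardless of host byte order. */
void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Reads a value written by put_be32. */
uint32_t get_be32(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Writes one chunk (tag, length, payload) into dst, which must have room
   for 8 + len bytes. Returns the number of bytes written. */
size_t write_chunk(uint8_t *dst, const char tag[4],
                   const void *data, uint32_t len)
{
    memcpy(dst, tag, 4);
    put_be32(dst + 4, len);
    memcpy(dst + 8, data, len);
    return 8 + (size_t)len;
}
```

Multi-byte values inside the payload would need the same treatment, which is presumably what the byte-swapping write function mentioned above takes care of.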
        
           | banana_giraffe wrote:
           | For what it's worth, I think that's true of the pre-1997
           | version of the format. For the format from 1997 to 2007, it's
           | a proper format, albeit a bizarre one (looking over the spec
           | makes me think the person tasked with making it was yanked
           | off the filesystem team against their will)
           | 
           | Nowadays, it's an XML/ZIP thing.
           | 
           | With things like Flatbuffers, I sometimes feel we've
           | regressed to these old formats that are just memory dumps.
        
             | DonHopkins wrote:
             | A story I heard at Sun, which may be apocryphal but was
             | fucking hilarious enough to be a repeatable rumor, was that
             | a release of an early operating system in BETA was
             | determined to be solid and tested and ready to release and
             | ship to customers, so they simply changed the version
             | string from something like "SunOS2.1BETA" to "SunOS2.1FCS"
             | (First Customer Ship), and recompiled. But the change from
             | a 12 character version to an 11 character version threw off
             | the alignment of some important data structures somewhere
             | in the kernel, and the entire OS ran MUCH SLOWER because of
             | 68k unaligned memory accesses!
        
             | klodolph wrote:
             | > With things like Flatbuffers, I sometimes feel we've
             | regressed to these old formats that are just memory dumps.
             | 
             | It depends on what your application requirements are, but
             | there are compelling arguments that on-disk / on-wire
             | representations should match in-memory representations.
             | It's not too hard to end up in a scenario where
             | encoding / decoding times are a significant contribution to
             | overall performance.
        
             | ryanianian wrote:
             | > With things like Flatbuffers, I sometimes feel we've
             | regressed to these old formats
             | 
             | Sorta, but they're expressed in IDLs that are independent
             | of a particular compiler. The downside is the drift between
             | internal structs and the IDL structs, but the upside is you
             | can use the same "memory dump" on interpreted runtimes or
             | entirely different platforms (even those with different
             | endianness). Plus impls like protobuf help guard against
             | breaking backward-compatibility by numbering fields and not
             | "allowing" you to remove fields in ways that would change
             | the structs' ABIs.
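
The value of numbered fields can be shown with a toy format in C. This is deliberately not protobuf's or FlatBuffers' actual wire format; it is a made-up encoding (1-byte field number, 1-byte length, payload, with integers stored little-endian) that demonstrates the one property that matters here: an old reader can skip fields it does not know, so adding a field does not break it. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record understood by this version of the reader. */
typedef struct {
    uint32_t width;           /* field 1 */
    uint32_t height;          /* field 2 */
    int      unknown_fields;  /* number of fields that were skipped */
} Settings;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes a sequence of (field number, length, payload) records.
   Returns 1 on success, 0 if the buffer is truncated. */
int settings_decode(const uint8_t *buf, size_t len, Settings *out)
{
    size_t pos = 0;
    out->width = 0;
    out->height = 0;
    out->unknown_fields = 0;

    while (pos < len) {
        if (len - pos < 2) return 0;           /* no room for the header */
        uint8_t field = buf[pos];
        uint8_t flen  = buf[pos + 1];
        pos += 2;
        if (flen > len - pos) return 0;        /* payload runs past the end */

        if (field == 1 && flen == 4) {
            out->width = read_le32(buf + pos);
        } else if (field == 2 && flen == 4) {
            out->height = read_le32(buf + pos);
        } else {
            out->unknown_fields++;             /* newer field: skip it */
        }
        pos += flen;
    }
    return 1;
}
```

A raw struct dump has no equivalent of the `else` branch: any change to the struct changes every offset after it.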
        
         | userbinator wrote:
         | In practice, it's the simplest and most efficient way, since
         | the majority of the time you're not going to be dealing with
         | any insanely weird architectures and stuff like 9-bit bytes has
         | thankfully disappeared from common use.
         | 
         | I do wish they'd pack their structures, however.
        
       ___________________________________________________________________
       (page generated 2021-01-28 23:02 UTC)