[HN Gopher] PSD is not my favourite file format (2009)
___________________________________________________________________
PSD is not my favourite file format (2009)
Author : kruuuder
Score : 199 points
Date : 2021-01-28 18:10 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mmastrac wrote:
| This used to show up often on HN but I haven't seen it in years.
| The best comment was from this old thread [1]:
|
| > I enjoyed the commit message most:
| >
| > > r11 by paracelsus on Sep 11, 2007 Diff
| > > Photoshop loader is DONE for now, fuck you Adobe
|
| [1] https://news.ycombinator.com/item?id=575122
| hallarempt wrote:
| PSD is horrible, that is certainly true. And underdocumented. And
| dumb... And we needed to implement read/write support for that in
| Krita.
|
| But Painttool Sai's format is actively evil, with different kinds
| of encryption for different size layers and things like that:
| https://github.com/Wunkolo/libsai/issues/6 (We tried to use that
| library to implement sai file format support for Krita, but then
| ran into trouble...)
|
| Manga Studio/Clip Studio Paint's file format is interesting as
| well: it's just an SQLite database, with raster data stored as
| blobs. Not going to implement that for Krita either.
| Gualdrapo wrote:
| Thank you for working on Krita.
| krylon wrote:
| While I totally empathize with the poor author who had to suffer
| through this madness, I thoroughly enjoyed reading this rant. My
| favorite part was this:
|
| > PSD is not a good format. PSD is not even a bad format. Calling
| > it such would be an insult to other bad formats, such as ...
| stagger87 wrote:
| I'd love to hear why they think JPEG is a worse format.
| eyesee wrote:
| Well for one thing it's not a file format. For that there's
| JFIF and EXIF.
| stagger87 wrote:
| I actually thought about that when I typed my reply, but
| decided the meaning of my question would still be clear the
| way it was phrased. Despite that, my question is more or
| less irrelevant since I misunderstood the author.
| user-the-name wrote:
| Not worse, that is the point. It is merely a bad format,
| which is better than what PSD is.
|
| As for why JPEG is bad: Did you know there is no one defined
| way to store the width and height of a JPEG image?
| stagger87 wrote:
| Thank you, I totally misread that!
| userbinator wrote:
| As someone who has written a JPEG decoder, I am compelled
| to say that you are wrong, and the width and height are
| stored in the SOF marker segment.
|
| http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20...
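| A minimal sketch of that lookup, assuming a well-formed stream: walk
| the marker segments and read height and width from the first SOF
| segment. The name `jpeg_size` and the hand-built sample stream are
| illustrative only, and standalone markers other than SOI/EOI are
| not handled.

```python
import struct

# Frame-header markers: 0xC0-0xCF except DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

def jpeg_size(data: bytes):
    """Return (width, height) from the first SOF segment, or None if absent."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before the real marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):    # EOI or SOS: no frame header was found
            return None
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if marker in SOF_MARKERS:
            # Segment payload: precision (1 byte), height (2), width (2), ...
            height, width = struct.unpack_from(">HH", data, pos + 5)
            return width, height
        pos += 2 + length             # length counts its own two bytes
    return None

# Hand-built stream: SOI, a JFIF APP0 segment, a one-component SOF0, EOI.
sample = (b"\xff\xd8"
          + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
          + b"\xff\xc0" + struct.pack(">HBHHB", 11, 8, 480, 640, 1)
          + b"\x01\x11\x00"
          + b"\xff\xd9")
print(jpeg_size(sample))  # (640, 480)
```

| Any EXIF or XMP copy of the size lives in APP1 segments, which this
| loop simply skips over by their length field.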
| just_for_you wrote:
| To my knowledge, most JPEG images encode their dimensions in an
| "APP1" EXIF segment, and again in their SOF segment (which
| is basically just a basic-info segment for the core JPEG
| bitstream). It's also possible to store the size in an APP1
| XMP section, but that's mostly non-essential metadata for
| interchange purposes, and I usually don't see the size
| duplicated there for JPEGs anyway (PNG does though, if it
| has an XMP chunk).
|
| It's possible that some of the other officially-defined
| segments (see https://exiftool.org/TagNames/JPEG.html)
| might also contain duplicate info, but then that'd be
| specific to the app that uses it, not for general-purpose
| use.
|
| Storing the size in easily-parseable EXIF metadata and then
| once again in the JPEG bitstream seems pretty reasonable to
| me. Not to mention if you run a metadata/EXIF stripper on a
| JPEG file, then you remove all duplicate info and the only
| remaining image-dimensions will be stored in the JPEG
| bitstream anyway.
| jolmg wrote:
| So, what's this project about? There's no README nor description
| nor anything.
|
| I'm curious as to why they'd find it so worthwhile to try to
| parse PSD despite those troubles. They seem to already have
| parsers for other image formats.
| krylon wrote:
| Xee is an image viewer. If you follow the link to the landing
| page of the repo, it says in the About section it is "Xee
| source code for xCode 4.5"
| folkrav wrote:
| I've seen that rant a while ago. IIRC, Xee is an image viewer
| for macOS.
| masklinn wrote:
| > So, what's this project about? There's no README nor
| description nor anything.
|
| It's Xee, a lightweight (and excellent) image viewer for macOS.
|
| That's not the original repo (or author). It used to be on Google
| Code, then moved to Bitbucket, then the author abandoned / sold
| the project: it was open-source until version 2.2 or something,
| and v3 is closed source.
| Hamuko wrote:
| Unfortunately Xee3 is basically abandoned at this point.
| MacPaw bought it alongside The Unarchiver and they haven't
| really been doing anything with it. Last update was three
| years ago. I wonder if it's just going to stop working in
| some future macOS version.
| bloudermilk wrote:
| I had a good laugh reading this. Of course the irony is that the
| author, after having gone through the hellish process of learning
| the spec, didn't document any of the code (in that file).
| hprotagonist wrote:
| This is one of my favourite rants.
|
| The other one is the SO answer about X/HTML parsing with regex.
| https://stackoverflow.com/questions/1732348/regex-match-open...
| Blikkentrekker wrote:
| I find it weird that the comment is locked rather than deleted.
|
| Funny though it be, it does not seem _à propos_ as an answer to a
| question.
| _jal wrote:
| At this point, it is a cultural artifact.
| Alupis wrote:
| It's from a time before StackOverflow (and StackExchange at
| large) got very rigid with all the rules. Back then, there
| were a lot of "fun" questions and answers, including the
| infamous "What is the best comment in source code you have
| ever encountered?"[1]
|
| Some old timers might remember those days... SO was a much
| more fun place back then. Now it's rife with down-voters,
| close-voters and hostile-towards-newbie folks.
|
| [1] https://stackoverflow.com/questions/184618/what-is-the-best-...
| NobodyNada wrote:
| The post was locked because it had received hundreds of flags
| and comments complaining that the post was broken -- even
| _with_ the moderator note at the bottom -- and several edits
| trying to "fix" the post. See
| https://meta.stackoverflow.com/questions/250099#comment637_2...
| Blikkentrekker wrote:
| My point is that the comment should be deleted altogether
| as it's more of a personal rant than a serious answer.
| Hamuko wrote:
| My favourite is wm4's rant about C locales. It's quite
| impressive.
|
| https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
| stickfigure wrote:
| "Those not comfortable with toxic language should pretend
| this is a religious text."
|
| I love it.
| cataphract wrote:
| Unsurprisingly, this kind of attitude got him kicked off his
| own project.
| Hamuko wrote:
| Clearly I haven't been paying close enough attention to
| mpv.
| awestroke wrote:
| Really? Where can I read more?
| dspillett wrote:
| "Please do not flag it for our attention." is the perfect
| finishing touch too.
| Triv888 wrote:
| Plain text file is my favorite file format... because in the end,
| portability is what matters the most.
| Arnt wrote:
| By plain text, do you have in mind CRLF or LF, and for
| structured data, do you have in mind YAML, JSON or one of the
| umpty others?
| gjvr wrote:
| ASCII? UTF-8? UTF-16? ... ...
| Triv888 wrote:
| lol... yes. UTF-512 should be enough for everybody!
| _kst_ wrote:
| UTF-640K?
| Dylan16807 wrote:
| Anyone using the words "plain text" for a file had _better_
| mean either ASCII or UTF-8.
|
| And for almost all purposes these days it should be the
| latter.
| user-the-name wrote:
| Shift-JIS? GB18030? KOI8-R?
|
| "Plain text" does not really exist.
| bluedino wrote:
| Joel on Software, _"Why are the Microsoft Office File Formats So
| Complicated?"_
|
| https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...
|
| They were designed to be fast on very old computers.
|
| They were designed to use libraries.
|
| They were not designed with interoperability in mind.
|
| They have to reflect the history of the applications.
| mst wrote:
| Personally, rather than printing things out and then setting them
| on fire, I've preferred printing things out and then, when I'm done
| with them, donating them to be shredded and used as horse bedding.
|
| Some code just deserves to end its life being shat on by a horse.
| shamyl_zakariya wrote:
| I wrote a PSD v5 parser for BeOS back in 1999 or 2000 (it acted
| as a plug-in for the OS's image format subsystem, and was also
| usable by 3rd party apps to write PSD files).
|
| I was one of the suckers who faxed a formal request to Adobe to
| get the file spec.
|
| I was young, and foolish, and spent the better part of a month or
| so in a hex editor trying to understand why a single file format
| could have like 3 different string encodings.
| 1f60c wrote:
| For posterity's sake, the link should probably be changed to
| https://github.com/zepouet/Xee-xCode-4.5/blob/83394493f51991...
| just_for_you wrote:
| I would actually beg to disagree, as someone who was bored a few
| weekends ago and decided to go through a few reasonably-complex
| PSD files (eg, ones with different layer types, bezier curves,
| etc.) in a hex editor with the official specification in-hand in
| an attempt to understand the format. (The specification can be
| found here:
| https://www.adobe.com/devnet-apps/photoshop/fileformatashtml...).
|
| After a few hours, I was able to grok it without too much
| difficulty, and found the format was reasonably well laid-out.
| Yes, it does suffer from some "let's take the in-memory
| representation and bake it into the on-disk format"-isms, and
| there were a few things that were not covered by the specification
| (eg, a couple Resource IDs aren't mentioned at all, or they are,
| but there's no documentation on how to interpret their content) -
| but it's not anywhere close to insane.
|
| To give an example, partially going off memory from what I
| remember seeing in real files, but mostly skimming the spec, a
| very basic gist of a typical PSD file might look something like
| this. (I can only assume HN is going to mangle the formatting of
| the following text, so expect to see multiple edits to this
| comment):
|
| EDIT: I was right about the formatting getting messed up. I've
| posted it to Pastebin and will call it a day:
|
| https://pastebin.com/raw/WZNrCSAP
|
| So that (the Pastebin link above) is the basic and oversimplified
| gist of a PSD file. It's mostly just reading 4-byte tags and
| length fields. Now, _rendering_ a complex PSD file (or worse:
| generating a complex one) is another matter, but the on-disk
| format is pretty understandable IMO. Creating a bitmap-to-PSD
| conversion program from scratch is also something I could see
| being doable in a weekend if you followed the spec, as long
| as the PSD only contained simple rasterized data (eg, no filters
| and other fanciness) and only the bare-minimum Resource ID/Layer
| tags.
|
| Honestly, I really trust PSD as a general interchange format for
| images. It's quirky in some places, but it's pretty logically
| laid-out, is universally-recognized, and I can rest assured that
| a rasterized copy of my images will always be stapled to the end
| of the file if worst comes to worst. There really is no
| alternative: while you can contain Photoshop data in a
| format like TIFF (Photoshop just staples the entire PSD file into
| a TIFF/EXIF tag anyway), if an application unaware of that
| PSD data opens the TIFF file and saves it, the Photoshop data
| might go poof or have a mismatch with the other TIFF data. And
| then there's Gimp's XCF format, where the on-disk format is
| allowed to be changed willy-nilly because you are only supposed
| to use Gimp's official library for reading/writing to it (not to
| mention no application really supports XCF aside from Gimp and
| maybe a few open-source projects). And let's not even bother with
| $application-specific formats, because they are meant only for
| that specific application.
|
| The moral of the story is PSD is an alright format.
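| To make the "tags and length fields" point concrete, here is a
| rough sketch of reading the fixed 26-byte header and hopping over
| the three length-prefixed sections that precede the image data. It
| assumes version 1 files (PSD, not PSB) with 4-byte section lengths;
| `parse_psd` and the synthetic sample are illustrative names, not
| part of any existing library.

```python
import struct
from typing import NamedTuple

class PsdHeader(NamedTuple):
    version: int
    channels: int
    height: int
    width: int
    depth: int
    color_mode: int

# Signature, version, 6 reserved bytes, channels, height, width,
# depth, color mode. Everything is big-endian.
HEADER = struct.Struct(">4sH6xHIIHH")

def parse_psd(data: bytes):
    """Return (header, {section name: (offset, length)})."""
    sig, *fields = HEADER.unpack_from(data, 0)
    if sig != b"8BPS":
        raise ValueError("missing 8BPS signature")
    header = PsdHeader(*fields)
    if header.version != 1:
        raise ValueError("only version 1 (PSD) is handled in this sketch")
    pos = HEADER.size
    sections = {}
    for name in ("color_mode_data", "image_resources", "layer_and_mask_info"):
        (length,) = struct.unpack_from(">I", data, pos)
        sections[name] = (pos + 4, length)
        pos += 4 + length
    # Whatever remains is the flattened image data section.
    sections["image_data"] = (pos, len(data) - pos)
    return header, sections

# Synthetic file: 200x100 RGB header, empty sections, 2 bytes of image data.
sample = (HEADER.pack(b"8BPS", 1, 3, 100, 200, 8, 3)
          + struct.pack(">I", 0)
          + struct.pack(">I", 4) + b"abcd"
          + struct.pack(">I", 0)
          + b"\x00\x00")
header, sections = parse_psd(sample)
print(header.width, header.height, sections["image_data"])
```

| Interpreting what sits inside the resource and layer sections is
| where the real work (and the complaints in the rant) begin.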
| hallarempt wrote:
| No, it's nothing of the kind. It's not alright, it's not a
| general interchange format for images that you can trust, and
| the specification isn't complete or correct in any case.
| oseph wrote:
| Here's a "cool" PSD quirk.
|
| Take a PSD that has many layers. Look at its file size (mine is
| ~70 MB). Add one layer to the PSD, fill it with white, and make it
| the topmost layer. Save it as a new PSD and compare the
| file sizes.
|
| The new PSD with the white layer is 55 MB. Why?
| mockery wrote:
| Most* PSD files contain a "preview" copy of the fully-flattened
| document (which is compressed.) Flat white image compresses far
| better, so that portion of the file doesn't take as much space.
|
| Depending on what your layers look like (how many, how much
| they cover, etc.) it's not too surprising that the preview
| image could take a substantial fraction of the total file size
| (sounds like ~20% in this case!)
|
| * I believe this behavior can be toggled off with an option.
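| The size difference is easy to reproduce in miniature. The sketch
| below assumes the preview is stored with the PackBits-style RLE that
| the PSD spec describes for image data; `packbits` and `unpackbits`
| are hand-written illustrations, not library calls.

```python
def packbits(row: bytes) -> bytes:
    """PackBits-style RLE: runs become 2 bytes, literals cost 1 extra byte."""
    out = bytearray()
    i, n = 0, len(row)
    while i < n:
        run = 1
        while i + run < n and run < 128 and row[i + run] == row[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, row[i]])      # header is -(run - 1)
            i += run
        else:
            start = i
            i += 1
            # Extend the literal until a run of two starts or 128 bytes.
            while (i < n and i - start < 128
                   and not (i + 1 < n and row[i] == row[i + 1])):
                i += 1
            out += bytes([i - start - 1]) + row[start:i]
    return bytes(out)

def unpackbits(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        h = data[i]
        if h < 128:                                # literal of h + 1 bytes
            out += data[i + 1:i + 2 + h]
            i += 2 + h
        elif h > 128:                              # run of 257 - h copies
            out += bytes([data[i + 1]]) * (257 - h)
            i += 2
        else:                                      # 128 is a no-op
            i += 1
    return bytes(out)

white_row = b"\xff" * 1000
busy_row = (bytes(range(256)) * 4)[:1000]          # no two neighbours equal
print(len(packbits(white_row)), len(packbits(busy_row)))  # 16 1008
```

| A flat row shrinks to a handful of bytes, while a busy row actually
| grows slightly, which is the effect described above.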
| Dylan16807 wrote:
| Is the preview the full resolution? That seems very overkill.
| w0mbat wrote:
| I wrote the classic-Mac image-display app, "Shomi" (originally
| DePICT) which tried to display anything as fast as possible.
|
| In modern times such an app would be just a bunch of API or
| library calls, but in the early days I had to write all the image
| format readers myself.
|
| At the time TIFF was the worst, so complicated, so many options,
| and you even had to write your own LZW codec that matched theirs.
|
| Next worst was BMP. It's upside-down and you get random zeroes
| where you expect sensible values.
|
| I managed with a minimalist Photoshop parser (the 8BIM format
| then) that didn't support everything but coped with the real
| world files my designer friends tried. It's got more complicated
| since then.
| just_for_you wrote:
| I would agree on BMP being slightly insane, since there's a
| number of things you have to magically know about it:
|
| 1) If using 8 bits per pixel or less, there's always a color
| palette between the header and bitmap. If more than 8bpp, then
| there's none.
|
| 2) If more than 8bpp, you must know that you must
| _always_ use Bitfields to specify which bits correspond to
| which RGB(A) channels, and that there cannot be any overlap
| between the channel masks.
|
| 3) There's no intuitive way to tell which version of the Bitmap
| file you have (there's like 5 major versions).
|
| 4) You can specify, inline, the Chromaticity and Gamma
| (basically an inline color profile) somewhere in the headers.
|
| 5) You can also append an ICC profile directly into the file,
| OR, store a Windows-style filesystem path to the profile (in
| which case, you must know to ignore the in-header color
| profile, if you have one).
|
| 6) You must know that RLE compression only works if the RGB
| channels cleanly map to one-byte-each (no packing).
|
| 7) RLE compression doesn't work if you use a negative height
| (to indicate the image is encoded from top-to-bottom, rather
| than the standard bottom-to-top).
|
| 8) The (un)official documentation Microsoft hosts states that
| BMP files can also store PNG or JPEG bitstream data, but yet no
| application that I've seen has ever explicitly supported this.
|
| 9) You have to know that Windows only supports 2 or so
| variations of packed 16-bit BMP files (565, etc).
|
| For the most part you can just assume you're dealing with a v5
| BMP file, read or write the most pertinent parts like the
| dimensions (skipping dumb stuff like ICC profiles, pixels-per-
| meter and whatnot), and just have your way with the raw data.
| But there is still some dumb stuff in there that shouldn't be
| there, considering it should be a straightforward format. What
| also baffles me is that even with the bloat Microsoft added to
| the format, BMP isn't even extensible. So it's kinda the worst
| of both: It's slightly complicated to the point of not being
| straightforward to work with, and yet, you can't add extra
| info/metadata to it.
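| As a rough illustration of the "read the pertinent parts" approach,
| the sketch below pulls the basics out of a BMP: the DIB header size
| is the usual hint for which header variant you have, and a negative
| height marks a top-down image. `bmp_info` and the sample bytes are
| made up for the example, and the old 12-byte core header is not
| handled.

```python
import struct

# DIB header size -> header variant (only the commonly documented ones).
DIB_HEADERS = {12: "BITMAPCOREHEADER", 40: "BITMAPINFOHEADER",
               108: "BITMAPV4HEADER", 124: "BITMAPV5HEADER"}

def bmp_info(data: bytes) -> dict:
    magic, _size, _r1, _r2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    if magic != b"BM":
        raise ValueError("missing BM signature")
    (dib_size,) = struct.unpack_from("<I", data, 14)
    if dib_size < 40:
        raise ValueError("core header layout not handled in this sketch")
    width, height, _planes, bpp, compression = struct.unpack_from(
        "<iiHHI", data, 18)
    return {
        "header": DIB_HEADERS.get(dib_size, "unknown (%d bytes)" % dib_size),
        "width": width,
        "height": abs(height),
        "top_down": height < 0,                   # negative height flips rows
        "bits_per_pixel": bpp,
        "compression": compression,
        "has_palette": bpp <= 8,
        "row_stride": (width * bpp + 31) // 32 * 4,  # rows pad to 4 bytes
        "pixel_offset": pixel_offset,
    }

# Synthetic 3x2, 24bpp, top-down image with a 40-byte info header.
sample = (struct.pack("<2sIHHI", b"BM", 54 + 24, 0, 0, 54)
          + struct.pack("<IiiHHIIiiII", 40, 3, -2, 1, 24, 0, 24, 2835, 2835, 0, 0)
          + bytes(24))
info = bmp_info(sample)
print(info["header"], info["top_down"], info["row_stride"])
```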
|
| On that note, Targa is a dreamy image format. It's got a simple
| 18-byte or so header, and then the raw data. And, optionally,
| for Targa v2 you can append a few bytes in a footer that
| indicate an offset for an optional standard-targa-metadata
| area, as well as an optional metadata area for any data of your
| own you wish to add to the file, followed by a magic string that
| explicitly indicates you're using v2 of the Targa spec.
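| For comparison, a sketch of how little is needed to write and
| re-read an uncompressed 24-bit Targa with a v2 footer. The helper
| names are invented for this example; both footer offsets are left
| at zero, meaning "no extension or developer area".

```python
import struct

# id length, color map type, image type, color map spec (first entry,
# length, entry size), x origin, y origin, width, height, depth, descriptor
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")        # 18 bytes
TGA_FOOTER = struct.Struct("<II18s")               # 26 bytes
SIGNATURE = b"TRUEVISION-XFILE.\x00"

def write_tga(width: int, height: int, bgr_pixels: bytes) -> bytes:
    if len(bgr_pixels) != width * height * 3:
        raise ValueError("expected 3 bytes (B, G, R) per pixel")
    header = TGA_HEADER.pack(0, 0, 2, 0, 0, 0, 0, 0, width, height, 24, 0)
    footer = TGA_FOOTER.pack(0, 0, SIGNATURE)      # no extension/developer area
    return header + bgr_pixels + footer

def read_tga(data: bytes):
    fields = TGA_HEADER.unpack_from(data, 0)
    width, height, depth = fields[8], fields[9], fields[10]
    has_v2_footer = (len(data) >= TGA_HEADER.size + TGA_FOOTER.size
                     and data[-len(SIGNATURE):] == SIGNATURE)
    return width, height, depth, has_v2_footer

tga = write_tga(2, 1, bytes(6))
print(len(tga), read_tga(tga))
```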
| [deleted]
| SimianLogic2 wrote:
| I have deep sympathy for this, after having done a ton of work
| with the SWF format and the (even more poorly documented) AEPX
| format.
| [deleted]
| riffraff wrote:
| I remember reading this years ago, I think it was my first
| encounter with the "... fierce passion of a thousand suns".
|
| It still made me chuckle even today, thanks for sharing it.
| ericol wrote:
| > this Rube Goldberg of a file format
|
| Concise, to the point, and a good enough insult as to keep it in
| file. I love it.
| [deleted]
| egonschiele wrote:
| This came up a couple months ago. I really wish there was a
| standard open format that did most of the things PSD does. I had
| asked the folks at Procreate if they'd design something, since
| they are sort of a challenger to Photoshop, but they said it
| wasn't something they wanted to do.
| iggldiggl wrote:
| Where this might get tricky is if you seriously want to support
| non-destructive editing as well, because in that case any
| filter that can be applied in non-destructive mode effectively
| needs to become part of the file format specification, too.
| wongarsu wrote:
| You could store filters generic enough that any software can
| skip over filters it doesn't recognize (e.g. you store filter
| type uuid, filter data length, then arbitrary filter
| parameters), have some open registry where you can get
| official filter uuids in return for an example
| implementation. Of course not everyone will register every
| filter, so the format should probably store the image once
| unfiltered and once with all filter applied, that way if a
| program doesn't recognize or implement some filter it can
| still fall back to the destructively filtered version. But of
| course that increases file size, which may or may not be a
| concern.
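| A minimal sketch of that record layout, assuming a 16-byte UUID, a
| 4-byte big-endian length, then opaque parameters. The filter IDs,
| `encode_filter`, `iter_filters` and `plan_render` are all
| hypothetical; no existing format is being described.

```python
import struct
import uuid

# Hypothetical filter IDs; a real registry would hand out proper UUIDs.
BLUR = uuid.UUID(int=1)
GLOW = uuid.UUID(int=2)

def encode_filter(filter_id: uuid.UUID, params: bytes) -> bytes:
    return filter_id.bytes + struct.pack(">I", len(params)) + params

def iter_filters(blob: bytes):
    """Yield (filter_id, params); the length field lets us skip anything."""
    pos = 0
    while pos < len(blob):
        filter_id = uuid.UUID(bytes=blob[pos:pos + 16])
        (length,) = struct.unpack_from(">I", blob, pos + 16)
        yield filter_id, blob[pos + 20:pos + 20 + length]
        pos += 20 + length

def plan_render(blob: bytes, supported: set):
    """Fall back to the stored flattened copy if any filter is unknown."""
    steps = list(iter_filters(blob))
    unknown = [fid for fid, _ in steps if fid not in supported]
    if unknown:
        return "use_flattened_copy", unknown
    return "apply_filters", steps

blob = encode_filter(BLUR, b"\x05") + encode_filter(GLOW, b"xyz")
print(plan_render(blob, {BLUR}))
```

| A reader that only knows BLUR can still walk past GLOW's parameters
| and choose the pre-rendered copy instead of failing.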
| preommr wrote:
| Procreate is $10 for a one-time purchase.
|
| PS is $13/mo. Actually, if you got just PS it would be $30/mo.
|
| And even then I've seen people complain that procreate is too
| expensive.
|
| I don't really have a point other than design tools and their
| pricing is a pet issue of mine and I like bringing up how
| insane it is at any given chance.
| hallarempt wrote:
| We started working on OpenRaster for that:
| https://www.openraster.org/ -- but it's not really moving along
| very well.
| luc_ wrote:
| more of these posts please 10/10
| rdtsc wrote:
| > Why, for instance, did it suddenly decide that _these_
| particular chunks should be aligned to four bytes, and that this
| alignement (sic) should _not_ be included in the size?
|
| Not saying this happened here, but I have seen this type of
| mistake before. It was because they "simply" cast a C/C++ struct
| to a binary blob
| https://en.wikipedia.org/wiki/Data_structure_alignment#Typic...
| and wrote that to disk (or sent over the network in my case). So
| that particular compiler version and architecture-specific struct
| field alignment became the "official" format. It just takes one
| goofy mistake like that and everyone has to deal with it for
| years to come.
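| The effect is easy to see with Python's struct module, which can
| mimic both behaviours: "@" follows the host C compiler's alignment
| rules, while "<" uses fixed sizes with no padding. The two-field
| record here is a made-up example.

```python
import struct

LAYOUT = "BI"   # hypothetical record: uint8 kind, then uint32 length

native_size = struct.calcsize("@" + LAYOUT)   # host alignment, may add padding
packed_size = struct.calcsize("<" + LAYOUT)   # explicit layout, always 5 bytes

def write_record(kind: int, length: int) -> bytes:
    # Spelling out byte order and packing keeps the on-disk layout
    # independent of whichever compiler or platform wrote the file.
    return struct.pack("<" + LAYOUT, kind, length)

def read_record(blob: bytes):
    return struct.unpack("<" + LAYOUT, blob)

print(native_size, packed_size)
print(write_record(1, 2))
```

| On common desktop platforms the native size comes out larger than
| five bytes because padding is inserted before the 32-bit field; a
| format defined by "whatever the struct looked like" inherits that
| padding.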
| nradov wrote:
| That was very common not long ago. Even the JPEG/EXIF image
| file format is designed that way. So it's efficient for reading
| and writing, but introduces a lot of potential bugs with
| alignment and chunk size issues. Inserting an additional EXIF
| tag is a huge hassle because then you have to recalculate all
| the pointers, even those in other data chunks.
| just_for_you wrote:
| One form of true insanity is probably JPEG-encoded TIFF
| files. Apparently it's so bad that for the next version of
| the TIFF specification they are taking that insane approach
| out completely.
|
| I can't do justice to the article on TIFF's problems with
| JPEG
| (http://www.simplesystems.org/libtiff/TIFFTechNote2.html),
| but my understanding is that:
|
| 1) Some JPEG-specific data is moved outside of the actual
| JPEG bitstream tag, and into separate TIFF tags, making
| editing JPEG-in-TIFF files non-trivial.
|
| 2) Size is not encoded in some of these fields, so the TIFF
| editor you're writing will have to _partially implement a JPEG
| decoder_ just to know the size of some of those TIFF tags.
|
| 3) Some tags/fields are pointers into other parts of the TIFF
| file, meaning if you edit the file, you'll have to update the
| file in many places.
|
| (As a quick aside on insane formats, may I also mention the
| EPWING dictionary format?)
| masklinn wrote:
| Note that the issue here is less the part that you quoted and
| more the two sentences surrounding it, PSD has all of
|
| * unaligned chunks
|
| * aligned chunks with alignment included in size
|
| * aligned chunks with alignment not included in size
|
| The problem is not the specific choice, "Either one of these
| three behaviours would be fine", it's that "PSD, of course,
| uses all three, and more."
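| In code, the three conventions differ only in how a reader finds
| the next chunk, which is exactly why mixing them hurts: the reader
| has to know the rule per chunk kind. A small sketch, with invented
| names and an assumed 4-byte alignment:

```python
def next_chunk_offset(data_start: int, size: int, align: int = 1,
                      size_includes_padding: bool = True) -> int:
    """Offset of the chunk following one whose payload starts at data_start."""
    if align > 1 and not size_includes_padding:
        size += -size % align   # padding is on disk but not in the size field
    return data_start + size

# Hypothetical chunk kinds, one per convention.
CONVENTIONS = {
    "unaligned":          dict(align=1),
    "padded_and_counted": dict(align=4, size_includes_padding=True),
    "padded_not_counted": dict(align=4, size_includes_padding=False),
}

for kind, rule in CONVENTIONS.items():
    print(kind, next_chunk_offset(8, 5, **rule))
```

| Applying the wrong rule to a 5-byte payload lands the reader three
| bytes short of the next chunk, and everything after that is
| garbage.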
| grishka wrote:
| That's why you version your file formats and network protocols.
| Actually, designing file formats and network protocols is one
| of the few areas in software engineering where you do really
| need future-proof design and extreme extensibility, because
| once you release the thing, these are set in stone. Yet not
| many people seem to realize this. They instead "future-proof"
| their code with useless abstraction layers.
|
| Anyway. Versioning helps you avoid ugly workarounds if you need
| to extend your format in the future in ways that its current
| version doesn't allow. You then keep the code for older
| versions as a backwards-compatibility-only kind of deal and
| move on to the new one.
| klodolph wrote:
| Versioning is one approach, but I favor making the format
| extensible in the first place. For example, if you pick an
| XML format you can add new attributes and tags, if you pick a
| Protobuf format you can add new fields. "Extensible" sounds
| like it can be a real mess but there are effective strategies
| to minimize the mess.
|
| There are also various chunked formats like AIFC (AIFF) and
| PNG which can be extended by defining new chunk types,
| without needing to change versioning. AIFC includes the
| APPL/stoc chunk and PNG has various mechanisms.
|
| There are a few problems with versioning file formats. One is
| that you often end up with 'if (version > 3)' scattered
| across your code base or other nonsense. Another problem is
| that it is easy to accidentally mark the wrong version,
| either a version which is too low (because you wrote
| something to the file and forgot to make your encoder bump
| the version properly) or a version which is too high (because
| you didn't bother to use the minimum version your data
| requires).
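| PNG is a handy illustration of the chunked approach: every chunk is
| length, 4-byte type, payload, CRC, and a lowercase first letter in
| the type marks it as ancillary, so an unknown one can be skipped.
| The sketch below builds a tiny chunk stream in memory (without real
| pixel data); `xyZz` is an invented private chunk type and the
| function names are illustrative.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    crc = zlib.crc32(ctype + payload)
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", crc))

def iter_chunks(data: bytes):
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack_from(">I4s", data, pos)
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length           # length + type + payload + CRC

def read_png(data: bytes, known=(b"IHDR", b"IDAT", b"IEND")):
    """Return ((width, height), [skipped ancillary chunk types])."""
    size, skipped = None, []
    for ctype, payload in iter_chunks(data):
        if ctype == b"IHDR":
            size = struct.unpack(">II", payload[:8])
        elif ctype not in known:
            if not ctype[0] & 0x20:  # uppercase first letter: critical chunk
                raise ValueError("unknown critical chunk %r" % ctype)
            skipped.append(ctype)
    return size, skipped

sample = (PNG_SIGNATURE
          + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0))
          + make_chunk(b"xyZz", b"hello")
          + make_chunk(b"IEND", b""))
print(read_png(sample))
```

| No version number changes when `xyZz` is introduced; old readers
| keep working because the skip rule is part of the container.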
| grishka wrote:
| I'd say do both, actually. But versioning is more important
| IMO because it gives you the freedom to potentially start
| from scratch keeping only a small portion of the header.
|
| On mainstream file formats... It's a mixed bag. Image
| formats -- JPEG and PNG especially -- are extensible and
| reasonably easy to parse. It's fairly trivial to get the
| image dimensions out of one of these without decoding the
| compressed data. I also wrote a JPEG decoder out of
| curiosity once to understand the compression algorithm
| better -- it's an interesting exercise, really, every
| software developer should try it at some point.
|
| But the worst format I've ever worked with is MP3. It's an
| absolute mess. First, there are two kinds of ID3 tags.
| These store metadata that your player displays. ID3v1 is a
| fixed-length, fixed-layout thing that goes on the end of
| the file. ID3v2 is an extensible, I'd say _way too
| extensible_, chunked thing capable of storing literally
| anything, including JPEGs of cover art, that goes on the
| beginning of the file. But neither of them stores the
| duration of the file.
|
| You're supposed to chop the tags off the ends of it, then find
| the first frame of encoded data by searching for the pattern
| 0xFFFx, read its header, and determine the byte length, the
| bitrate, the sampling rate and ultimately the duration of that
| frame using several lookup tables. Now that you know how much
| audio each frame contains, and how long it is, you take the
| size of the file (minus tags of course) and divide it by the
| frame length, then multiply by the frame duration. That's how
| you get the duration of an MP3. A constant-bitrate one. And to
| seek within an MP3, you calculate the offset into the file,
| round it to the nearest multiple of the frame size and just
| start playing from there.
|
| It gets even worse with VBR, because now you can no longer rely
| on frames being the same byte length, but I don't really
| remember the details any more. The gist of it is that there's a
| "header" encoded into the very first frame in the file, and
| there are two kinds of these headers, and there's a sort of
| lookup table in it, among other things, to help you seek into
| the right part of the file because the byte offsets don't
| linearly correspond to the playback time in a VBR file. After
| you seek, you have to go back and forth to find the 0xFFFx and
| play from there. Or not, because sometimes there's a 0xFFFx in
| the middle of a frame too, so you have to have some heuristics
| to detect that it's the real one.
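| The constant-bitrate procedure can be sketched as follows. The
| lookup tables here cover only MPEG-1 Layer III, the function names
| are invented for the example, and the estimate inherits the
| approximation described above (it ignores per-frame padding, so
| real files will be off by a little).

```python
# MPEG-1 Layer III tables; index 0 is "free format", 15 is invalid.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152

def parse_frame_header(header: bytes):
    """Return (bitrate in bit/s, sample rate in Hz, frame length in bytes)."""
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                  # 11 sync bits, all ones
        raise ValueError("no frame sync")
    version = (word >> 19) & 0x3             # 3 means MPEG-1
    layer = (word >> 17) & 0x3               # 1 means Layer III
    if version != 3 or layer != 1:
        raise ValueError("only MPEG-1 Layer III is handled in this sketch")
    kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    rate = SAMPLE_RATES[(word >> 10) & 0x3]
    padding = (word >> 9) & 0x1
    if kbps is None or rate is None:
        raise ValueError("free-format or invalid header")
    frame_len = 144 * kbps * 1000 // rate + padding
    return kbps * 1000, rate, frame_len

def cbr_duration_seconds(audio_bytes: int, first_header: bytes) -> float:
    _bitrate, rate, frame_len = parse_frame_header(first_header)
    frames = audio_bytes // frame_len
    return frames * SAMPLES_PER_FRAME / rate

header = bytes.fromhex("fffb9000")           # 128 kbit/s, 44.1 kHz, no padding
print(parse_frame_header(header))            # (128000, 44100, 417)
print(cbr_duration_seconds(417 * 100, header))
```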
| klodolph wrote:
| You should have a version, but given the (admittedly
| unrealistic) choice between versioning and extensibility
| I'll take extensibility every time. There are plenty of
| formats where there's a version tag and it's never been
| bumped past "1".
|
| Early compressed audio / video formats were generally a
| total mess, with a couple exceptions like MOV, so I'm not
| surprised that MP3 is horrible.
| dfox wrote:
| That is because MP3 is not a file format in the first
| place. The file is originally simply a stream of MP3
| frames written into a file without any structure and
| usually with technically invalid and undecodable frames at
| both ends (due to how MP3 compression works). Because it
| was originally designed for transmission across some kind
| of somewhat unreliable network, there are
| resynchronization structures in the stream and decoders
| tend to be able to ignore various kinds of totally
| invalid crap in the input data. This feature is exploited
| by all the ID3vX formats to essentially embed arbitrary
| data into the file. Several other commonly used "MPEG
| something" "file formats" are exactly the same thing.
|
| This is somewhat ironic given the fact that quite a large
| part of the MPEG specification deals with various framing and
| metadata structures (on the other hand the overall
| architecture of all that is best described as
| "overengineered", so ignoring it makes some kind of
| sense).
| just_for_you wrote:
| This pretty much nails it.
|
| I also do appreciate MP3's simplicity, in the sense that
| it's just a series of concatenated frames. It makes it
| really easy to just fling them over the network and get
| streaming audio working on a client. And there's also
| somewhat of an elegance (a very ugly and hacky elegance,
| mind you) to being able to exploit decoders ignoring
| malformed frames.
|
| For example if you open an MP3 IceCast stream via HTTP in
| a media player like VLC, the server (if it realizes, via
| the HTTP request headers, that you're Icecast-aware) will
| occasionally barf the name of the current and following
| song into the MP3 stream. Meaning you don't need any
| higher-level streaming protocol to deal with, and can
| just send raw frames over the wire, where the MP3 decoder
| will ignore the song title as a malformed MP3 frame, but
| VLC will pick-up on the song title and display it for you
| as the server cycles from one song to the next. Kinda
| handy, because VLC will make use of this metadata, but at
| the same time, the MP3 stream's URL will also work in a
| web browser too, since the browser won't need to know how
| to deal with a higher-level protocol before being able to
| start receiving those frames.
|
| Actually, nevermind all that. MP3 bad.
| superjan wrote:
| I second your observations on JPEG. It turns out that you
| can use the same code to skim through JPEG, lossless
| JPEG, JPEG 2000 and JPEG-LS headers. Impressive in its
| simplicity.
| spion wrote:
| Data structures and object protocols are the analog to file
| formats and network protocols. They're not immutable but
| unless you want software changes to affect everything, you
| want something at least a bit more stable
| wolrah wrote:
| As I understand it that's more or less what the older MS Office
| formats were, just a dump of the entire OLE object representing
| the document as-is.
|
| I believe SimCity used a similar strategy as well: the .cty and
| I think SC2k's .s2k formats just dumped the in-memory
| representation to disk.
| krylon wrote:
| I worked at a company once where they, too, used raw memory
| dumps as their file format.
|
| The compiler flags telling the compiler how to align structs
| were different for debug and non-debug builds. So, of course,
| the first thing I did was to create a non-debug build and try
| to open a file created by a debug-build executable. The
| program crashed and burned without giving me a meaningful
| error message, and it took me hours to understand my mistake.
|
| Raw memory dumps are very neat efficiency-wise, but they are
| extremely fragile.
| rwallace wrote:
| I'm curious, what was the reason for the difference? While
| the C language standard doesn't guarantee anything about
| alignment, I'm used to the behavior of compilers in
| practice being consistently 'align everything on its own
| size'; what changed between debug and release builds in
| that case?
| ender341341 wrote:
| They mentioned compiler flags, so my guess would be that
| they were doing something like tightly packing structs in
| the release build to lower memory usage or something
| similar.
| saagarjha wrote:
| Or the opposite, as packing usually requires slower,
| unaligned accesses.
| sumtechguy wrote:
| That would depend on how much that struct is created in
| memory vs the total code that uses it. If you have one
| instance in memory then yeah the code is probably bigger
| and slower (depending on arch). But if you have thousands
| of the structs in memory the speed and code size trade
| off may be worth it on a memory constrained system.
|
| Debug also sometimes turns on overrun buffers so you can
| check for over/under runs in your code at debug time.
| Some compilers have this; others don't.
| klodolph wrote:
| Primitives have not historically been aligned to their
| own size. There are plenty of older systems where double
| floats had 32-bit alignment instead of 64-bit, or 32-bit
| integers had 16-bit alignment.
| krylon wrote:
| In the non-debug build, the compiler was free to align
| structs as it saw fit, in the debug build, it was told to
| pack them really tight, no padding. Why? The only reason
| I can think of is to save a few precious bytes of disk
| space.
| DonHopkins wrote:
| FWIW here's the original 68k SimCity "Classic" (open source
| Micropolis) save function, which I've long since cleaned up
| and put in byte swapping to make it portable to SPARC and
| x86, but yes it is just writing out some big buffers of
| memory with a function that now swaps bytes:
|
| https://github.com/SimHacker/micropolis/blob/master/Micropol...
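| The byte-swapping idea, shown in miniature: if the file is defined
| as big-endian 16-bit words (as on the 68k), writing and reading
| with an explicit byte order gives the same bytes on any host. The
| `save_map` and `load_map` helpers are illustrative and not taken
| from the Micropolis source.

```python
import struct

def save_map(tiles) -> bytes:
    # ">" forces big-endian output regardless of the host CPU.
    return struct.pack(">%dH" % len(tiles), *tiles)

def load_map(blob: bytes) -> list:
    return list(struct.unpack(">%dH" % (len(blob) // 2), blob))

blob = save_map([1, 0x1234])
print(blob.hex(), load_map(blob))  # 00011234 [1, 4660]
```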
|
| Here's some original SimCity 2000 Mac code that saves the
| city into Mac resources (not the flat part of the file, but
| the Mac resource fork) -- CompGameWrite actually does some
| simple run length compression of the raw memory:
|     Boolean DoSave(short VolNum,Byte *name){
|         long count,CountTotal;
|         short i,j,filnum;
|         short x, y;
|         Byte *SaveText[] = {"\pGame Saved As:",NIL,NIL};
|         FInfo info;
|         Byte *NotCity[] = {
|             "\pSimCity 2000a will not save over a non-city file.",
|             "\pTry to save again using a different name.",
|             NIL,
|         };
|
|         if (name[0]>20) name[0] = 20;
|
|     #ifndef DEBUG
|         if (GetFInfo(name,VolNum,&info)==0 && info.fdType!=CITYTYPE_ID) {
|             MessageDialog(NotCity);
|             return FALSE;
|         }
|     #endif
|
|         if (FSOpen(name,VolNum,&filnum)!=noErr) {
|             if (GameError(Create(name,VolNum,APPLICATION_ID,CITYTYPE_ID))) return FALSE;
|             if (GameError(FSOpen(name,VolNum,&filnum))) return FALSE;
|         }
|
|         if (!WriteHeader(filnum,0L)) return FALSE;
|         WriteLength = 4; // includes 'SCDH' but not header
|
|         //*** Write ***
|         if (!MiscWrite(filnum)) return FALSE;
|         if (!GameWrite(filnum,'ALTM',(Ptr)AltMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XTER',(Ptr)TerrainMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XBLD',(Ptr)BuildMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XZON',(Ptr)ZoneMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XUND',(Ptr)UnderMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XTXT',(Ptr)TextMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XLAB',(Ptr)LabelArray)) return FALSE;
|         if (!CompGameWrite(filnum,'XMIC',(Ptr)MicroRecord)) return FALSE;
|         if (!CompGameWrite(filnum,'XTHG',(Ptr)ThingList)) return FALSE;
|         if (!CompGameWrite(filnum,'XBIT',(Ptr)BitsMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XTRF',(Ptr)TrafficMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XPLT',(Ptr)PolluteMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XVAL',(Ptr)ValueMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XCRM',(Ptr)CrimeMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XPLC',(Ptr)PoliceMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XFIR',(Ptr)FireMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XPOP',(Ptr)PopMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XROG',(Ptr)ROGMap[0])) return FALSE;
|         if (!CompGameWrite(filnum,'XGRP',(Ptr)GraphData[0])) return FALSE;
|
|         //*** Close ***
|         WriteHeader(filnum,WriteLength);
|         GameError(SetEOF(filnum,WriteLength+8)); // 8 bytes for file header
|         GameError(FSClose(filnum));
|         GameError(FlushVol(NIL,VolNum));
|         BlockMove(name,CityStr,21);
|         DateCashTitle();
|         FilVolNum = VolNum;
|
|         //*** SUCCESS! ***
|         SaveText[1] = CityStr;
|         MessageDialog(SaveText);
|         return TRUE;
|     }
|
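| For readers unfamiliar with run length compression, a generic
| byte-oriented sketch follows. This is not the scheme
| CompGameWrite uses (its encoding is not shown above); the
| (count, value) pair format and function names are assumptions
| chosen for simplicity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode src as (run length, byte value) pairs, runs capped at 255.
   Returns bytes written, or 0 if dst is too small (or n is 0). */
static size_t rle_encode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;
    size_t i = 0;
    assert(src != NULL && dst != NULL);
    while (i < n) {
        uint8_t value = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == value && run < 255) {
            run++;
        }
        if (out + 2 > dst_cap) {
            return 0;
        }
        dst[out++] = (uint8_t)run;
        dst[out++] = value;
        i += run;
    }
    return out;
}

/* Expand (run length, byte value) pairs back into raw bytes.
   Returns bytes written, or 0 if dst is too small. */
static size_t rle_decode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i + 1 < n; i += 2) {
        size_t run = src[i];
        if (out + run > dst_cap) {
            return 0;
        }
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}
```

| Tile maps with large uniform areas (water, empty land) shrink
| well under this kind of scheme, which is presumably why most
| of the maps above go through the compressing writer.
|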
| And here's the Mac SimEarth save function, which looks like
| it just writes out raw uncompressed memory -- the Mac memory
| manager has a GetPtrSize function that (obviously) tells you
| the size of the memory allocated for a pointer:
|     DoSave(VolNum,name)
|     int VolNum;
|     Str255 name;
|     {
|         long count,CountTotal;
|         int i,FilNum;
|
|         if (EqualString(name,"\pSimEarth",FALSE,TRUE)) {
|             MessageDialog(
|                 "\pERROR!!",
|                 "\pThe name 'SimEarth' is reserved",
|                 "\pfor this application.",1);
|             return;
|         }
|         if (FSOpen(name,VolNum,&FilNum)) {
|             if (GAIAError(Create(name,VolNum,'MYCR','SAVE'))) return;
|             if (GAIAError(FSOpen(name,VolNum,&FilNum))) return;
|         }
|         else if (GAIAError(SetFPos(FilNum,fsFromStart,0L))) {
|             GAIAError(FSClose(FilNum));
|             return;
|         }
|
|         /**Write**/
|         CountTotal = (count=GetPtrSize(Map[0]));
|         if (GAIAError(FSWrite(FilNum,&count,Map[0]))) return;
|         CountTotal += (count=GetPtrSize(Life[0]));
|         if (GAIAError(FSWrite(FilNum,&count,Life[0]))) return;
|         CountTotal += (count=GetPtrSize(OcTempMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,OcTempMap[0]))) return;
|         CountTotal += (count=GetPtrSize(OcCurrentMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,OcCurrentMap[0]))) return;
|         CountTotal += (count=GetPtrSize(DriftMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,DriftMap[0]))) return;
|         CountTotal += (count=GetPtrSize(EventMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,EventMap[0]))) return;
|         CountTotal += (count=GetPtrSize(SAirTempMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,SAirTempMap[0]))) return;
|         CountTotal += (count=GetPtrSize(SCloudDensity[0]));
|         if (GAIAError(FSWrite(FilNum,&count,SCloudDensity[0]))) return;
|         CountTotal += (count=GetPtrSize(SAirCurrentMap[0]));
|         if (GAIAError(FSWrite(FilNum,&count,SAirCurrentMap[0]))) return;
|         PutParameters();
|         CountTotal += (count=GetPtrSize(MiscHis));
|         if (GAIAError(FSWrite(FilNum,&count,MiscHis))) return;
|
|         /**Close**/
|         GAIAError(SetEOF(FilNum,CountTotal));
|         GAIAError(FSClose(FilNum));
|         GAIAError(FlushVol(NIL,VolNum));
|         BlockMove(name,EwinStr,256);
|         SetWTitle(editWindow,EwinStr);
|         MessageDialog("\pSave Complete:",EwinStr,"",1);
|         EnableItem(GetMHandle(301),3); /* Save */
|         FilVolNum = VolNum;
|     }
| banana_giraffe wrote:
| For what it's worth, I think that's true of the pre-1997
| version of the format. For the format from 1997 to 2007, it's
| a proper format, albeit a bizarre one (looking over the spec
| makes me think the person tasked with making it was yanked
| off the filesystem team against their will)
|
| Nowadays, it's an XML/ZIP thing.
|
| With things like Flatbuffers, I sometimes feel we've
| regressed to these old formats that are just memory dumps.
| DonHopkins wrote:
| A story I heard at Sun, which may be apocryphal but was
| fucking hilarious enough to be a repeatable rumor, was that
| a release of an early operating system in BETA was
| determined to be solid and tested and ready to release and
| ship to customers, so they simply changed the version
| string from something like "SunOS2.1BETA" to "SunOS2.1FCS"
| (First Customer Ship), and recompiled. But the change from
| a 12 character version to an 11 character version threw off
| the alignment of some important data structures somewhere
| in the kernel, and the entire OS ran MUCH SLOWER because of
| 68k unaligned memory accesses!
| klodolph wrote:
| > With things like Flatbuffers, I sometimes feel we've
| regressed to these old formats that are just memory dumps.
|
| It depends on what your application requirements are, but
| there are compelling arguments that on-disk / on-wire
| representations should match in-memory representations.
| It's not too hard to end up in a scenario where
| encoding / decoding times are a significant contribution to
| overall performance.
| ryanianian wrote:
| > With things like Flatbuffers, I sometimes feel we've
| regressed to these old formats
|
| Sorta, but they're expressed in IDLs that are independent
| of a particular compiler. The downside is the drift between
| internal structs and the IDL structs, but the upside is you
| can use the same "memory dump" on interpreted runtimes or
| entirely different platforms (even those with different
| endianness). Plus impls like protobuf help guard against
| breaking backward-compatibility by numbering fields and not
| "allowing" you to remove fields in ways that would change
| the structs' ABIs.
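|
| A toy sketch of the numbered-field idea, to show why it
| survives schema changes. This is not the protobuf or
| Flatbuffers wire format; the 5-byte field layout and the
| function names are assumptions made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each field: 1-byte field number + 4-byte little-endian value. */
#define FIELD_SIZE ((size_t)5)

/* Append one numbered field at buf[len]. Returns the new length,
   or 0 if the field does not fit in cap bytes. */
static size_t put_field(uint8_t *buf, size_t len, size_t cap,
                        uint8_t number, uint32_t value)
{
    assert(buf != NULL && len <= cap);
    if (cap - len < FIELD_SIZE) {
        return 0;
    }
    buf[len]     = number;
    buf[len + 1] = (uint8_t)(value);
    buf[len + 2] = (uint8_t)(value >> 8);
    buf[len + 3] = (uint8_t)(value >> 16);
    buf[len + 4] = (uint8_t)(value >> 24);
    return len + FIELD_SIZE;
}

/* Look up a field by number. Returns 1 and stores the value if found,
   0 otherwise. Fields with other numbers are simply skipped, so a
   reader keeps working when a writer adds fields it does not know. */
static int get_field(const uint8_t *buf, size_t len,
                     uint8_t number, uint32_t *value)
{
    assert(buf != NULL && value != NULL);
    for (size_t i = 0; i + FIELD_SIZE <= len; i += FIELD_SIZE) {
        if (buf[i] == number) {
            *value = (uint32_t)buf[i + 1]
                   | ((uint32_t)buf[i + 2] << 8)
                   | ((uint32_t)buf[i + 3] << 16)
                   | ((uint32_t)buf[i + 4] << 24);
            return 1;
        }
    }
    return 0;
}
```

| Since values are built byte by byte and fields are found by
| number rather than by struct offset, neither the host's
| endianness nor its struct padding leaks into the file.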
| userbinator wrote:
| In practice, it's the simplest and most efficient way, since
| the majority of the time you're not going to be dealing with
| any insanely weird architectures and stuff like 9-bit bytes has
| thankfully disappeared from common use.
|
| I do wish they'd pack their structures, however.
___________________________________________________________________
(page generated 2021-01-28 23:02 UTC)