[HN Gopher] SQLite Archive Files (2018)
___________________________________________________________________
SQLite Archive Files (2018)
Author : alfonsodev
Score : 159 points
Date : 2021-09-27 09:11 UTC (13 hours ago)
(HTM) web link (www.sqlite.org)
(TXT) w3m dump (www.sqlite.org)
| reacweb wrote:
| When I read "If the input X is incompressible, then a copy of X
| is returned", I worry that this is broken. If I archive a file,
| then extract it from the archive, I cannot be sure of obtaining
| the same file. If the file is already compressed to begin with,
| it will be decompressed at the end.
|
| Maybe I am wrong; I didn't know this tool before. My brief
| review of the documentation leads me to believe that it has an
| obvious problem.
| kybernetikos wrote:
| I think your concern is dealt with in the immediately following
| section:
|
| >The Y parameter is the compressed content (the output from a
| prior call to sqlar_compress()) and SZ is the original
| uncompressed size of the input X that generated Y. If SZ is
| less than or equal to the size of Y, that indicates that no
| compression occurred, and so sqlar_uncompress(Y,SZ) returns a
| copy of Y.
|
| The format stores both the possibly compressed blob _and_ the
| original size. From those two pieces of information it can
| always return the correct original file.
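|
| For reference, the table defined in the linked article is just:
|
|       CREATE TABLE sqlar(
|         name TEXT PRIMARY KEY,  -- name of the file
|         mode INT,               -- access permissions
|         mtime INT,              -- last modification time
|         sz INT,                 -- original file size
|         data BLOB               -- compressed content
|       );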
| reacweb wrote:
| Ok, I was wrong. My review was too brief.
| mbreese wrote:
| _> If SZ is less than or equal to the size of Y, that indicates
| that no compression occurred, and so sqlar_uncompress(Y,SZ)
| returns a copy of Y._
|
| This is pretty common for compression tools. If the input is
| incompressible (or not compressible by X%), then the original
| data is stored. In this case, the code is checking the stored
| size against the uncompressed size. If they are equal, then
| uncompress is a noop.
|
| My takeaway is that there is a zlib compression function built
| into SQLite, which can be pretty handy.
|
| Another benefit I can see is that because the SQLite database
| has a flexible schema, you could add new features to the
| archive while maintaining backwards compatibility. For example,
| if you wanted to add a SHA1 hash to each record, you should be
| able to, while still allowing older tools to read the updated
| file.
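|
| For example (hypothetical, not part of the sqlar spec), adding
| such a column is a single statement, and older readers that
| only query the original five columns never notice it:
|
|       -- hypothetical extra column; older tools that read only
|       -- name/mode/mtime/sz/data keep working
|       ALTER TABLE sqlar ADD COLUMN sha1 TEXT;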
| lifthrasiir wrote:
| Note that this kind of fallback is also prevalent in
| _compression stream formats_ including zlib which sqlar uses,
| so the archive format doesn 't need to reimplement the same
| fallback. The only reason it might be useful is the
| opportunistic support for random access for uncompressed
| data.
| formerly_proven wrote:
| It's still a really annoying design; one should explicitly
| communicate whether and which compression method was used
| for a given blob of data. Another DB column would have been
| a very easy way to do that here.
| m_ke wrote:
| Would love something like that for storing large image datasets
| for computer vision. Storing embeddings, predictions and metadata
| in a contiguous format with compression support, ANN indexing
| support and SQL would be amazing.
| noxer wrote:
| There is an SQLite compression and encryption "add-on" which
| seems to do what you want. Check the official website. Hint:
| You have to buy a license to use it legally.
| [deleted]
| kybernetikos wrote:
| Given how crazy the zip file format is, and the claim that sqlite
| is faster than the filesystem for small files
| (https://www.sqlite.org/fasterthanfs.html) this seems pretty
| reasonable to me.
|
| In particular, development repositories with many many small
| source files often have horrendously slow copying/deleting
| behaviour (particularly on windows) even on fast disks. I wonder
| if sqlite archive files would be a better way to store them.
| proto-n wrote:
| I wonder if the particular slowness on Windows could be related
| to how closing file handles is slow there because of blocking
| antivirus checks. Alas, I can't recall where I read about this;
| it was some kind of "a few things I learned over many years of
| programming" blog post.
| noxer wrote:
| That's why you turn AVs off, either completely or at least by
| excluding your own code folder. It won't ever do anything
| useful there, and worse, I've even had it delete/quarantine my
| own binaries in the past because it somehow got triggered.
| Also, by default it happily uploads all your debug exe files to
| MS and executes them in their sandbox environment. Absolutely
| unwanted behavior if you ask me.
| Isthatablackgsd wrote:
| This is true. Even moving folders from one drive to another
| will trigger it. I download pictures and videos weekly
| (on average 100 images a week). Often my browser will hang
| when trying to save a file to the folder, because it is
| waiting for Windows Security to finish its scan. Windows is
| proactive (sometimes too much) about security because of the
| history of how we handle security updates, and the many
| thousands of unpatched Windows installs out there infected
| with malware and ransomware.
|
| Fortunately it has an exclusion list, which can include any
| file, folder, file type, and process.
| marcodiego wrote:
| I've seen Windows take many seconds to compile a hello world.
| Compiling it for the Arduino took even longer, probably because
| it needed to generate more files.
| noxer wrote:
| I wish they would build compression directly into SQLite. I use
| SQLite as a log store, mostly dumping JSON data into it. Due to
| the lack of compression, the DB is probably 10 times the size
| it could be.
| OskarS wrote:
| I mean, they do: [1]. It's an extension, true, probably because
| they don't want to force zlib as a dependency. Still, very
| straightforward to build with and use, and as you can see from
| the code, not hard to implement either.
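|
| A rough usage sketch, assuming the extension is built as
| ./compress.so (it registers compress() and uncompress() SQL
| functions; the table and parameter names here are made up):
|
|       .load ./compress
|       CREATE TABLE logs(ts INTEGER, body BLOB);
|       INSERT INTO logs
|         VALUES (strftime('%s','now'), compress(:json_text));
|       SELECT uncompress(body) FROM logs WHERE ts >= :since;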
|
| [1]: https://sqlite.org/src/file?name=ext/misc/compress.c
| kevinmgranger wrote:
| Compression at the _database_ level would be more useful. If
| you have a large text dataset but need to use the Full Text
| Search feature, I don't think you can compress the index.
| noxer wrote:
| It exists, but it's not built in and not free.
|
| See my comment
| https://news.ycombinator.com/item?id=28670543
| phiresky wrote:
| You can also try sqlite-zstd [1], an SQLite extension that
| allows transparent compression of individual rows by
| automatically training dictionaries based on groups of data
| rows.
|
| Disclaimer: I made it and it's not production ready
|
| [1] https://github.com/phiresky/sqlite-zstd
| Cthulhu_ wrote:
| You could probably pass the data through a gzip filter before
| storage if you don't need the data to be indexable while in the
| database. I know, that's an extra step, but in most programming
| languages it's fairly straightforward.
|
| And to be honest, if you need to search through or index JSON
| blobs in a database you need to reconsider your design.
| nicoburns wrote:
| > And to be honest, if you need to search through or index
| JSON blobs in a database you need to reconsider your design.
|
| As a quick-and-dirty logging solution it makes quite a lot of
| sense. You can add whatever fields you want to your
| structured logs, and if you find you need to search on a
| particular field, just add an index for that field.
| mbreese wrote:
| I would imagine SQLite would do a lot of random access IO,
| which would make this difficult even without indexing.
|
| But there is a gzip "variant" that we use in genomics called
| bgzip that supports this. It's basically a chunked gzip file
| (multiple gzip records concatenated together) with an extra
| flag (in the gzip header) for the uncompressed length of each
| chunk. Using this information, you can do random IO on the
| compressed file while only uncompressing the chunks
| necessary. I'm sure other compression formats have the same
| support.
| masklinn wrote:
| > I would imagine SQLite would do a lot of random access
| IO, which would make this difficult even without indexing.
|
| To individual blobs? Seems unlikely.
|
| Postgres automatically compresses large blobs; it's part of
| the TOAST system:
|
| > The TOAST management code is triggered only when a row
| value to be stored in a table is wider than
| TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code
| will compress and/or move field values out-of-line until
| the row value is shorter than TOAST_TUPLE_TARGET bytes
| (also normally 2 kB, adjustable) or no more gains can be
| had.
|
| So if a TOAST-able value is larger than
| TOAST_TUPLE_THRESHOLD it'll get compressed (using pglz by
| default, from 14.0 LZ4 will finally be supported), and if
| it's still above TOAST_TUPLE_TARGET it'll get stored out-
| of-line (otherwise it gets stored inline).
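|
| As a sketch of the Postgres 14 syntax (table and column names
| made up):
|
|       -- per-column compression method (Postgres 14+)
|       ALTER TABLE blobs ALTER COLUMN body SET COMPRESSION lz4;
|       -- or change the default for newly created columns
|       SET default_toast_compression = 'lz4';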
| dunham wrote:
| Interesting. Sqlite moves some data off-page (out of the
| btree node) if the row is bigger than a certain threshold
| (the beginning of it is still stored in-page). I wonder
| if this could be hooked to compress the overflow data.
| mbreese wrote:
| Not the filesystem... to the database file itself. I
| thought that was what we were talking about.
|
| The blobs in the sqlite archive are compressed, but one
| of the critiques was that the compression was better when
| you used multiple files to build the compression
| dictionary (zip vs tarball).
|
| And now I realize that I probably replied to the wrong
| comment...
| noxer wrote:
| >You could probably pass the data through a gzip filter
| before storage
|
| That pretty much voids all the benefit of using SQLite.
|
| >And to be honest, if you need to search through or index
| JSON blobs in a database you need to reconsider your design.
|
| This is absolutely intentional: the log contains information
| needed to manually re-play certain events in case of a fatal
| error, mostly for debugging purposes or performance benchmarks.
|
| This means 99.99% of the data will never be needed and
| eventually will be discarded.
|
| But when I need to re-play data I need to be able to query
| for exactly the data I need. Each JSON "blob" (which is
| actually stored as TEXT in SQLite) has a few additional fields
| to quickly filter off 95% or more of unwanted data. The rest
| SQLite simply reads completely to separate wanted and
| unwanted entries further.
|
| With a DB size around 100GB I get the data I need by reading
| 5GB or less. That's perfectly fine for my use case; even if I
| let the DB grow quite a lot larger, it would be just fine.
| parhamn wrote:
| Very cool idea! I'm a bit torn on whether the format should
| concern itself with compression. It seems like a useful general
| blob-container strategy; might it be prudent to leave
| compression to the consumer?
| genocidicbunny wrote:
| I've hand-rolled something very similar to this, and I just
| used an extra 'tag' column that stored some extra info, often
| the compression type.
|
| FWIW, at the bottom of the linked page it says that you can
| skip the sqlar_compress/uncompress functions and just roll your
| own:
|
| > The code above is for the general case. For the special case
| of an SQLite Archive that only stores uncompressed or
| uncompressible content (this might come up, for example, in an
| SQLite Archive that stores only JPEG, GIF, and/or PNG images)
| then the content can be inserted into and extracted from the
| database without using the sqlar_compress() and
| sqlar_uncompress() functions, and the sqlar.c extension is not
| required.
| kybernetikos wrote:
| That seems like a nice effect of storing the uncompressed
| size along with the blob, and using that to control
| decompression. You can always choose to store uncompressed
| blobs; the stored size will then equal the blob length, so
| sqlite will not attempt to decompress the blob. It's not quite
| the same
| as rolling your own compression, but it's pretty easy to see
| how the format could be extended to support that.
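|
| A minimal sketch of that special case (the file name and
| parameter are made up; 420 is 0644 in decimal):
|
|       -- sz equals the blob length, so sqlar-aware readers
|       -- return the content verbatim
|       INSERT INTO sqlar(name, mode, mtime, sz, data)
|       VALUES ('logo.png', 420, strftime('%s','now'),
|               length(:blob), :blob);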
| genocidicbunny wrote:
| Using the size to trigger compression vs no-op is clever,
| but doesn't help when you have different compression
| methods that you want to use.
|
| For example, you might be using this as an archive format for
| a game, and you may want to store the texture data using a
| different compression method than the AI scripts or the
| audio data. At that point, you definitely need another
| piece of data to clue you in. A tag field is useful then,
| though I've also seen filename/file extension-based methods
| as well.
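|
| A sketch of the tag-column idea (nothing here is part of the
| sqlar spec):
|
|       -- NULL method = stored verbatim
|       ALTER TABLE sqlar ADD COLUMN method TEXT;
|       SELECT name,
|              CASE method
|                WHEN 'zlib' THEN sqlar_uncompress(data, sz)
|                ELSE data   -- dispatch other methods here
|              END AS content
|         FROM sqlar;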
| say_it_as_it_is wrote:
| S3Lite?
| cxr wrote:
| For an honest assessment, difficulty of implementation--and,
| accordingly, lack of diversity in implementations--should be
| listed in the "Disadvantages" section.
|
| (It's interesting that applications against censorship are
| brought up. Difficulty of implementation has consequences here,
| too. In order to effectively use SQLite as an archive format, the
| receiving end will need the SQLite software. By comparison, it's
| pretty trivial to craft a polyglot file that is both plain text
| and HTML and is self-extracting and assumes no software on the
| other end except to rely on the ubiquity of commodity web
| browsers. Always bet on text.)
| bastawhiz wrote:
| Sorry, how exactly do you create a "self extracting" archive
| file with only text and HTML?
| [deleted]
| [deleted]
| PostThisTooFast wrote:
| I don't see why this is better than simply implementing a table
| like this yourself.
| ComputerGuru wrote:
| I think the title could use a (2018) appended to it, just so no
| one thinks this is a new thing that will be pushed on them or
| something.
| dang wrote:
| Ok, added. Thanks!
| edwintorok wrote:
| Would be interesting if this were expanded to support more
| compression formats, e.g. zstd. Gzip is quite an old format,
| and zstd is a lot quicker to decompress.
| chrisseaton wrote:
| I don't think it uses the gzip format does it? It just uses
| deflate.
| lifthrasiir wrote:
| It's actually the zlib format. The linked post says DEFLATE,
| but the linked source code correctly mentions zlib instead.
| masklinn wrote:
| It doesn't use the gzip file format, but it uses the gzip
| compression format (which is just DEFLATE).
| rini17 wrote:
| Yes, I did double-check: there's no column to indicate the
| compression format, so I have no idea how they want to extend
| it in the future. Maybe just autodetect from the contents?
| However, that could fail if you want to store already-deflated
| files.
|
| I'd also like (nullable) mimetype column. Can be handy for
| example to store encoding for text files.
|
| On the other hand, some people call for file permissions... I'm
| not a fan; too system-dependent.
| mark-r wrote:
| Since compression can produce any bit combination with equal
| probability, the only way to reliably auto-detect from the
| contents is to put some kind of fixed flag at the front. At
| that point you get more flexibility by making it a new column
| in the table instead. It doesn't seem like it would be hard
| to extend this with a compression type and mime type if you
| need those.
|
| I question the need for multiple compression types, though;
| the difference between the default and the best won't be
| great enough to be worth the hassle, especially if it leaves
| you open to a patent troll.
| edwintorok wrote:
| `zstd` and `zlib` as the only 2 supported formats would be
| enough for me (zstd for state-of-the-art, fast compression
| and decompression, with a choice of high compression ratios
| if needed; and zlib for backward
| compatibility/portability), but others may disagree and
| want their own favourite compression format there. OTOH,
| zstd already has multi-file capability built in, so you
| don't gain much by using sqlite (unless zstd were to adopt
| sqlite as its storage format...).
| edwintorok wrote:
| Actually, it looks like if you pass multiple files to zstd
| it'll compress them individually; it won't create an
| archive, so a tar or sqlite format would still be useful
| there.
| lifthrasiir wrote:
| I'm generally supportive of SQLite's NIH syndrome---which is
| normally bad, but it can work if the trade-off is well researched
| and the resulting product is of high quality---but this one is
| not. Specifically sqlar is a _worse_ replacement of ZIP.
|
| It lacks pretty much every feature of modern compressed archive
| formats: filesystem and custom metadata beyond simple st_mode,
| solid compression, metadata compression, encryption, integrity
| checks and so on. Therefore it can only be legitimately
| compared with ZIP, which does support custom metadata, very bad
| encryption, and partial integrity checks (via zlib), and only
| lacks a guaranteed encoding for file names.
| formats it is not without a problem: for example the compression
| mode (DEFLATE vs. uncompressed) is implicitly indicated by `sz =
| length(data)` and I don't think it is a good idea. If I were
| designing sqlar and didn't want to spare an additional field I
| would have instead set sz to something negative so that it never
| collides with the compressed case (of course, if I had a chance I
| would just add a separate field instead). Pretty disappointing
| given other tools from the SQLite ecosystem.
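|
| A sketch of that alternative encoding (hypothetical, not how
| sqlar actually behaves):
|
|       -- sz >= 0: zlib-compressed, sz is the original size
|       -- sz < 0: stored verbatim; it can never collide with a
|       -- compressed blob whose length happens to equal sz
|       SELECT name,
|              CASE WHEN sz < 0 THEN data
|                   ELSE sqlar_uncompress(data, sz)
|              END AS content
|         FROM sqlar;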
| m_eiman wrote:
| Implicitly indicated compression is also a "feature" of LZJB
| which we've used for firmware update files. Figuring out what
| was going on and fixing the issue when a bootloader suddenly
| started rejecting new firmware updates was interesting and
| annoying.
|
| The updates were processed in chunks, and only rarely does a
| chunk not compress at all - which means it took a long time
| before that happened the first time. Always fun to find issues
| with code that has "always" worked and suddenly doesn't any
| longer.
|
| https://en.wikipedia.org/wiki/LZJB
| mbreese wrote:
| But given that we are talking about using a SQLite database as
| a container, these features could in theory be added in the
| future while maintaining the ability for older versions to
| still be able to read the newer files.
|
| I think that's still pretty impressive.
| bityard wrote:
| I don't think anybody is suggesting that SQLite archives are a
| suitable replacement for ZIP or tar + compressor of choice for
| general-purpose compression duties.
|
| The docs for this seem to suggest that this extension is most
| useful when your application already heavily depends on SQLite
| and you need a place to store some files but don't want to
| introduce further OS-specific dependencies into your
| application.
|
| This is a low-effort solution to a rather specific problem,
| which is why it's not part of SQLite proper. The extension
| itself is 124 lines of C including comments for heaven's sake.
| lifthrasiir wrote:
| The extension only deals with the compression part. The
| actual operation happens in the shell which is intermingled
| with general archive support. And this suggests the most
| important fact: SQLite already supports an ordinary ZIP file
| via the zipfile virtual table [1]. Both zipfile and sqlar
| extensions are enabled when compiled with zlib, and it is not
| even as if sqlar came first (they were added at roughly the
| same time, AFAIK). Therefore if you have SQLite and want cheap
| file-archive support, you can just use zip instead of the
| inferior sqlar.
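|
| For example (the archive name is made up; the columns follow
| the zipfile documentation):
|
|       -- zipfile() is a table-valued function; note the
|       -- explicit per-entry compression "method" column,
|       -- which sqlar lacks
|       SELECT name, mode, mtime, sz, method
|         FROM zipfile('example.zip');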
|
| [1] https://sqlite.org/zipfile.html
| chousuke wrote:
| To me, this feature looks more like a convenience wrapper than
| anything else. Maybe it actually was useful for something and
| got implemented for that reason.
|
| You pretty much get everything it needs "for free" with the
| standard SQLite machinery. I can't imagine the code
| implementing this is a large burden at all.
| tptacek wrote:
| ZIP encryption is an anti-feature, and you should probably
| never, ever use it. The problem is that the ZIP ecosystem is
| full of implementations of horrible insecure cipher
| constructions (even the modern implementations may support them
| for backwards compatibility). "Horrible" here means "breakable
| with a Perl script".
|
| We relied on ZIP encryption at a previous firm, because no
| client of ours could ever get PGP to work. But we had to train
| everyone to use 7z, and getting files back from clients was
| fraught (they might just click the "encrypt" button in
| whichever random ZIP program they used).
|
| If you want to encrypt an archive file, you should do it with a
| dedicated encryption program that you trust (Filippo Valsorda's
| `age` is a good bet, since it has good implementations in both
| Go and Rust, so it's very easy to integrate into designs, and
| also has a very clean, simple command line interface). You
| should not lean on ZIP's supposed AES support.
|
| The rest of this critique of `sqlar` might be valid! But I
| can't let the encryption thing pass unchallenged.
| lifthrasiir wrote:
| I think I made it clear that ZIP encryption is "very bad" :-)
| tptacek wrote:
| Totally fair! I think I'm mostly pushing back on the idea
| that archive formats with built-in encryption are a good
| idea.
| lifthrasiir wrote:
| That's reasonable. Given that compression software is
| likely what typical people would use for encryption,
| though, I'd like to see an actually good encrypted format
| design in a commonly used archival format. I'd even be more
| than happy if that were just `age` plus some carefully
| selected headers.
| nextaccountic wrote:
| To be fair, they said that ZIP had very bad encryption.
| zzo38computer wrote:
| > Specifically sqlar is a worse replacement of ZIP.
|
| I mostly agree. However, it does have some advantages too, as
| mentioned in the linked article.
|
| (An alternative way to query archives by SQL would be to
| implement a virtual table; in this case it can work with any
| archive format. In the case of ZIP format, it already does
| this.)
|
| > filesystem and custom metadata beyond simple st_mode
|
| Since it is a SQLite database, you can add additional tables
| etc, so it isn't a problem.
|
| > solid compression, metadata compression
|
| I think that keeping archiving and compression separate is
| good; a separate program should be used for archiving and for
| compression, and the latter would handle these things.
|
| However, in some circumstances, solid compression is
| undesirable (as mentioned before in HN). In some formats, solid
| compression is optional, but you might want an index, and there
| will need to be some way to find parts of the file by index.
| This is why in my own experimental concatenatable compression
| format, you can add an optional key frame index.
|
| > encryption
|
| Also encryption should be a separate program from archive and
| compression too, I should think.
|
| (However, there are extensions for SQLite for doing encryption,
| so it is possible to use those if wanted.)
|
| > the compression mode (DEFLATE vs. uncompressed) is implicitly
| indicated by `sz = length(data)` and I don't think it is a good
| idea
|
| I agree; it doesn't look good to me either.
|
| (Although the Free Hero Mesh code page file does something
| similar, in that case the uncompressed size of a lump is
| always exactly 2048 bytes anyway, so there is no need to store
| a separate uncompressed size.)
| lifthrasiir wrote:
| > Since it is a SQLite database, you can add additional
| tables etc, so it isn't a problem.
|
| You can add additional tables, but you can't add support for
| additional tables in the past versions. This is especially
| true because sqlar has no version specified.
|
| > I think that using separate archiving and compression is
| good, which would do these things. A separate program should
| be used for archive and for compression.
|
| I've heard this argument a lot and I think it is completely
| false. Archival and compression should go together; otherwise
| you get suboptimal results.
|
| Archival then compression (i.e. the traditional tarballs)
| will disable random access to individual files _and_
| metadata. This hurts performance in pretty much every use case
| except for one-shot decompression. In particular, this
| problem completely prohibits using archival then compression
| in container formats. You can reenable them by using random-
| accessible compression formats, but they are not common and
| less efficient than formats without that constraint.
|
| Compression then archival is much better than archival then
| compression because it essentially mimics the combined
| compression-archival format (I have used .bz2.tar in the
| past), but this mode is unpopular so you need to manually
| decompress them. And it is harder to use different
| compression algorithms for different files; you should put
| that info in the file name or the metadata, and that would be
| no different from compression-archival formats.
|
| Some claim that the needs for solid compression demonstrate
| the problem of combined compression-archival formats. This is
| not true and even misleading. Archival then compression
| commonly results in _worse_ compression than solid compressed
| archives even with the same algorithm. The gist of solid
| compression is a shared context for related files, so unless
| the archiver somehow reorders files it can't compete with
| combined archivers with solid compression. And if your
| archiver is doing that reordering, it is no longer decoupled
| from the compressor so it would be better to combine them. In
| addition to this, there are also alternatives to solid
| compression with different trade-offs, for example learned
| shared preset dictionaries (even zlib supports this, although
| unpopular).
|
| > Also encryption should be a separate program from archive
| and compression too, I should think.
|
| Yeah, I'm personally fine without encryption as it is really
| hard to get right (for example, you should be prepared for
| metadata encryption and key stretching). I've listed it
| partly for completeness and partly for the fact that it
| is probably the most approachable encryption software for end
| users.
|
| > (However, there are extensions for SQLite for doing
| encryption, so it is possible to use those if wanted.)
|
| And I think the official encryption extension [1] is
| proprietary.
|
| [1] https://www.sqlite.org/see/
| slaymaker1907 wrote:
| You forgot the biggest limitation of sqlar, which is the lack
| of support for large files (max is 4GB IIRC). I think this is
| really unfortunate since a sqlite based archive file format
| could be really neat and easily support a lot of the features
| you mention. Additionally, I think it is really cool to have
| the ability to inspect the file format using SQL.
|
| If sqlar were actually more usable, it would also be really
| neat for application formats since you could have additional
| metadata for your application in addition to the archive.
| Instead of having some magic MANIFEST file for Java jar files,
| that data could be stored in a separate table with its own
| indices.
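|
| Something like this (purely illustrative):
|
|       -- jar-style manifest as a queryable side table in the
|       -- same archive database
|       CREATE TABLE manifest(key TEXT PRIMARY KEY, value TEXT);
|       INSERT INTO manifest
|         VALUES ('Main-Class', 'com.example.App');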
| banana_giraffe wrote:
| As I understand things, sqlar is more of a thought experiment
| or proof of concept type thing, not a "we've got a better
| solution than zip files"
|
| https://youtu.be/8y_ABXwYtuc?t=2135
|
| It's interesting, and maybe has some use cases, but not a
| generic solution. I've certainly used a hand-rolled version of
| this idea before to handle specific scenarios where it made
| sense.
| OskarS wrote:
| To add to that list: when you do a .tar.gz, the compression
| algorithm works across file boundaries (very useful when you
| have a lot of small files), but neither sqlar nor .zip does
| this.
|
| I think the idea of using SQLite as an application file format
| is excellent, and it's great that it can serve as a general
| store of files. As a general purpose compressed archive format,
| however, I agree it leaves a lot to be desired.
| lifthrasiir wrote:
| Yeah, I even think a _full file system_ based on SQLite
| format is possible; pretty much all ingredients are there,
| just waiting to be cooked. But it should have been cooked in
| the first place.
| tobias3 wrote:
| Already there, e.g.
| https://github.com/guardianproject/libsqlfs but I'd also
| count something like s3ql, where the data is stored as S3
| objects.
| brandmeyer wrote:
| I don't think so. An important missing element in SQLite's
| format is larger block support. Modern filesystems do this
| via extents. SQLite has a strictly fixed block size and no
| support for forming larger blocks from contiguous runs of
| basic blocks.
|
| Another missing element is that SQLite's support for large
| blobs is via linked lists of pages. That's fine for
| sequential access, but unacceptable for random access.
|
| For a database to be usable as a filesystem backend, it
| must at least be composable with itself without sacrificing
| performance. Let's say you built a kernel module or FUSE
| module using sqlite as the backend. How well would sqlite
| itself perform on a file in this filesystem?
| lifthrasiir wrote:
| > Another missing element is that SQLite's support for
| large blobs is via linked lists of pages.
|
| This surprised me, and you were correct: there is no
| guarantee that related payload overflow pages are
| consecutively ordered. I thought extents could be
| implemented with those pages, but it would be pretty bad
| for random access as-is. In fact, one of the reasons I
| thought SQLite could function as a file system was its
| built-in support for random-accessible BLOBs, and some
| more research revealed that the support was based on the
| fact that SQLite loads the entire BLOB into memory. Oops.
| conradev wrote:
| That is a trade-off in that you can no longer randomly access
| a single file without decompressing the entire stream.
|
| That is often desired for file archives. For use cases like
| genomics, though, reads from random offsets are common, hence
| the bgzf format used by BAM files, which is a sequence of gzip
| chunks and an index. Lower compression ratio, but faster
| seeking.
| masklinn wrote:
| > To add to that list: when you do a .tar.gz, the compression
| algorithm works across file boundaries (very useful when you
| have a lot of small files), but neither sqlar nor .zip does
| this.
|
| That's mentioned both in the article (3.2) and GP ("solid
| compression" is the term of art for a single compression
| stream for the entire archive).
___________________________________________________________________
(page generated 2021-09-27 23:02 UTC)