[HN Gopher] I built a 2x faster lexer, then discovered I/O was t...
___________________________________________________________________
I built a 2x faster lexer, then discovered I/O was the real
bottleneck
Author : modulovalue
Score : 173 points
Date : 2026-01-20 16:10 UTC (5 days ago)
(HTM) web link (modulovalue.com)
(TXT) w3m dump (modulovalue.com)
| marginalia_nu wrote:
| Zip with no compression is a nice contender for a container
| format that shouldn't be slept on. It effectively reduces the
| I/O while, unlike TAR, allowing direct random access to the
| files without "extracting" them or seeking through the entire
| file. This works even via mmap, over HTTP range queries, etc.
|
| You can still get the compression benefits by serving files with
| Content-Encoding: gzip or whatever. Though ZIP has built-in
| compression, you can simply not use it and apply external
| compression instead, especially over the wire.
|
| It's pretty widely used, though often dressed up as something
| else. JAR files or APK files or whatever.
|
| I think the article's complaints about lacking unix access
| rights and metadata are a bit strange. That seems like a feature
| more than a bug, as I wouldn't expect this to be something that
| transfers between machines. I don't want to unpack an archive and
| have to scrutinize it for files with o+rxst permissions, or have
| their creation date be anything other than when I unpacked them.
| stabbles wrote:
| > Zip with no compression is a nice contender for a container
| format that shouldn't be slept on
|
| SquashFS with zstd compression is used by various container
| runtimes, and is popular in HPC where filesystems often have
| high latency. It can be mounted natively or with FUSE, and the
| decompression overhead is not really felt.
| ciupicri wrote:
| Wouldn't you still have a lot of syscalls?
| stabbles wrote:
| Yes, but with much lower latency. The squashfs file ensures
| the files are close together, and you benefit a lot from the
| fs cache.
| LtdJorge wrote:
| You then use io_uring
| __turbobrew__ wrote:
| Just make sure you mount the squashfs with --direct-io or
| else you will be double caching (caching the sqfs pages, and
| caching the uncompressed files within the sqfs). I have no
| idea why this isn't the default. Found this out the hard way.
| 1718627440 wrote:
| Isn't this what is already common in the Python community?
|
| > I don't want to unpack an archive and have to scrutinize it
| for files with o+rxst permissions, or have their creation date
| be anything other than when I unpacked them.
|
| I'm the opposite: when I pack and unpack something, I want the
| files to be identical, including attributes. Why should I throw
| away all the timestamps just because the files were temporarily
| in an archive?
| rustyhancock wrote:
| Yes, it's a lossy process.
|
| If your archive drops it you can't get it back.
|
| If you don't want it you can just chmod -R u=rw,go=r,a-x
| 1718627440 wrote:
| > If your archive drops it you can't get it back.
|
| Hence, the common archive format is tar not zip.
| password4321 wrote:
| > _Why should I throw away all the timestamps, just because
| the file were temporarily in an archive?_
|
| In case anyone is unaware, you don't have to throw away all
| the timestamps when using "zip with no compression". The
| metadata for each zipped file includes one timestamp
| (originally rounded to an even number of seconds, in local time).
|
| I am a big last modified timestamp fan and am often
| discouraged that scp, git, and even many zip utilities are
| not (at least by default).
| rcxdude wrote:
| git sets timestamps to checkout time partly out of necessity,
| for compatibility with build systems. If it applied the
| timestamp of when the file was last modified, then most build
| systems would break whenever you checked out an older commit.
| ralferoo wrote:
| git blame is more useful than the file timestamp in any
| case.
| nh2 wrote:
| There is some confusion here.
|
| ZIP retains timestamps. This makes sense because timestamps
| are a global concept. Consider them an attribute that depends
| only on the file itself, similar to the file's name.
|
| Owners and permissions are dependent also on the computer the
| files are stored on. User "john" might have a different user
| ID on another computer, or not exist there at all, or be a
| different John. So there isn't one obvious way to handle
| this, while there is with timestamps. Archiving tools will
| have to pick a particular way of handling it, so you need to
| pick the tool that implements the specific way you want.
| pwg wrote:
| > ZIP retains timestamps.
|
| It does, but unless the 'zip' archive creator being used
| makes use of the extensions for high-resolution timestamps,
| the basic ZIP format retains only old MS-DOS style
| timestamps (rounded to the nearest two seconds). So one may
| lose some precision in one's timestamps when passing files
| through a zip archive.
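|
| For reference, a small sketch (C, not from the thread; the
| function name and sample values are arbitrary) of decoding the
| two 16-bit MS-DOS fields, which is where the two-second
| granularity comes from:
|
|     #include <stdint.h>
|     #include <stdio.h>
|
|     /* Decode the 16-bit MS-DOS date/time stored in a basic ZIP
|      * entry; seconds are stored halved, so odd seconds are lost. */
|     static void dos_datetime(uint16_t ddate, uint16_t dtime)
|     {
|         int year  = 1980 + (ddate >> 9);   /* years since 1980 */
|         int month = (ddate >> 5) & 0x0F;   /* 1..12            */
|         int day   =  ddate       & 0x1F;   /* 1..31            */
|         int hour  =  dtime >> 11;          /* 0..23            */
|         int min   = (dtime >> 5) & 0x3F;   /* 0..59            */
|         int sec   = (dtime & 0x1F) * 2;    /* stored as sec/2  */
|         printf("%04d-%02d-%02d %02d:%02d:%02d (local)\n",
|                year, month, day, hour, min, sec);
|     }
|
|     int main(void)
|     {
|         dos_datetime(0x5B41, 0x81A5);  /* arbitrary sample values */
|         return 0;
|     }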
| zahlman wrote:
| > Isn't this what is already common in the Python community?
|
| I'm not aware of standards language mandating it, but build
| tools generally do compress wheels and sdists.
|
| If you're thinking of zipapps, those are not actually common.
| 1718627440 wrote:
| I was talking about using zipfile as a generic file format,
| instead of open and close.
| zahlman wrote:
| I'm afraid I don't understand specifically what you're
| referring to. Maybe you could show some code citations of
| popular projects doing it?
| LtdJorge wrote:
| Doesn't ZIP have all the metadata at the end of the file,
| requiring some seeking still?
| conradludgate wrote:
| Yes, but it's an O(1) random access seek rather than O(n)
| scanning seek
| marginalia_nu wrote:
| It has an index at the end of the file, yeah, but once you've
| read that bit, you learn where the contents are located and
| if compression is disabled, you can e.g. memory map them.
|
| With tar you need to scan the entire file start-to-finish
| before you know where the data is located, as it's literally
| a tape archiving format, designed for a storage medium with
| no random access reads.
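|
| To make that concrete, a minimal sketch (C, not from the article;
| no error handling, archive comments and zip64 ignored) of reading
| the end-of-central-directory record, walking the central
| directory, and computing direct pointers to stored (method 0)
| members in an mmap'd archive:
|
|     #include <fcntl.h>
|     #include <stdint.h>
|     #include <stdio.h>
|     #include <sys/mman.h>
|     #include <sys/stat.h>
|
|     static uint16_t rd16(const uint8_t *p) {
|         return p[0] | (uint16_t)p[1] << 8;
|     }
|     static uint32_t rd32(const uint8_t *p) {
|         return rd16(p) | (uint32_t)rd16(p + 2) << 16;
|     }
|
|     int main(int argc, char **argv)
|     {
|         if (argc < 2) return 1;
|         int fd = open(argv[1], O_RDONLY);
|         struct stat st;
|         fstat(fd, &st);
|         const uint8_t *z =
|             mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
|
|         /* End-of-central-directory record: scan back for magic. */
|         off_t i = st.st_size - 22;           /* 22 = minimum EOCD */
|         while (i > 0 && rd32(z + i) != 0x06054b50) i--;
|         uint16_t n   = rd16(z + i + 10);     /* total entries     */
|         uint32_t off = rd32(z + i + 16);     /* central dir start */
|
|         const uint8_t *cd = z + off;
|         for (uint16_t k = 0; k < n; k++) {   /* walk the index    */
|             uint16_t method = rd16(cd + 10); /* 0 = stored        */
|             uint32_t size   = rd32(cd + 20);
|             uint16_t nlen   = rd16(cd + 28);
|             uint32_t lho    = rd32(cd + 42); /* local hdr offset  */
|             if (method == 0) {               /* point at raw data */
|                 const uint8_t *lh = z + lho;
|                 const uint8_t *data =
|                     lh + 30 + rd16(lh + 26) + rd16(lh + 28);
|                 printf("%.*s: %u bytes at offset %ld\n", nlen,
|                        (const char *)(cd + 46), size,
|                        (long)(data - z));
|             }
|             cd += 46 + nlen + rd16(cd + 30) + rd16(cd + 32);
|         }
|         return 0;
|     }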
| 01HNNWZ0MV43FF wrote:
| I thought Tar had an extension to add an index, but I can't
| find it in the Wikipedia article. Maybe I dreamt it.
| st_goliath wrote:
| You might be thinking of _ar_ , the classic Unix ARchive that
| is used for static libraries?
|
| The format used by `ar` is quite simple, somewhat like tar,
| with files glued together, a short header in between and no
| index.
|
| Early Unix eventually introduced a program called `ranlib`
| that generates and appends an index for libraries (also
| containing extracted symbols) to speed up linking. The index
| is simply embedded as a file with a special name.
|
| The GNU version of `ar` as well as some later Unix
| descendants support doing that directly instead.
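|
| For reference, the member header `ar` uses on disk (as declared
| in <ar.h>) is 60 bytes of ASCII; the archive is the 8-byte magic
| "!<arch>\n" followed by headers and member data glued together:
|
|     /* Classic ar(1) member header, 60 ASCII bytes (see <ar.h>). */
|     struct ar_hdr {
|         char ar_name[16];  /* member name, padded with spaces */
|         char ar_date[12];  /* mtime, decimal seconds          */
|         char ar_uid[6];    /* uid, decimal                    */
|         char ar_gid[6];    /* gid, decimal                    */
|         char ar_mode[8];   /* mode, octal                     */
|         char ar_size[10];  /* member size in bytes, decimal   */
|         char ar_fmag[2];   /* always "`\n"                    */
|     };
|
| The ranlib/symbol index is just an ordinary member with a
| reserved name ("/" in GNU ar, "__.SYMDEF" in BSD) placed first
| in the archive.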
| cb321 wrote:
| Besides `ar` as a sibling observed, you might also be
| thinking of pixz - https://github.com/vasi/pixz , but really
| any archive format (cpio, etc.) can, in principle, just put a
| stake in the ground to have its last file be any kind of
| binary / whatever index file directory like Zip. Or it could
| hog a special name like .__META_INF__ instead.
| zahlman wrote:
| > It's pretty widely used, though often dressed up as something
| else. JAR files or APK files or whatever.
|
| JAR files generally do/did use compression, though. I imagine
| you _could_ forgo it, but I didn't see it being done. (But
| maybe that was specific to the J2ME world where it was more
| necessary?)
| charcircuit wrote:
| Specifically the benefit is for the native libraries within
| the file as you can map the library directly to memory
| instead of having to make a decompressed copy and then
| mapping that copy to memory.
| zahlman wrote:
| Yes, that's clear. I'm just not aware of people actually
| _doing_ that, or having done it back in the era when Java
| was more dominant.
| charcircuit wrote:
| The bigger issue is that glibc doesn't support loading
| libraries from zip archives, whereas bionic's linker does.
| So on platforms where glibc is used you wouldn't see it
| being done.
| zahlman wrote:
| Again, I was talking about Java (not C). Good to know,
| though.
| paulddraper wrote:
| > I wouldn't expect this to be something that transfers between
| machines
|
| Maybe non-UNIX machines I suppose.
|
| But I 100% need executable files to be executable.
| marginalia_nu wrote:
| Do you also want the setuid bit I added?
| Joker_vD wrote:
| Honestly, sometimes I just want to mark all files on a Linux
| system as executable and see what would even break and why.
| Seriously, why is there a whole bit for something that's
| essentially 'read permission, but you can also directly
| execute it from the shell'?
| paulddraper wrote:
| It's a security thing, in conjunction with sudoers, I
| think.
| dwattttt wrote:
| This seems like something that shouldn't be the container
| format's responsibility. You can record arbitrary metadata and
| put it in a file in the container, so it's trivial to layer
| on top.
|
| On the other hand, tie the container structure to your OS
| metadata structure, and your (hopefully good) container
| format is now stuck with portability issues between other
| OSes that don't have the same metadata layout, as well as
| your own OS in the past & future.
| hinkley wrote:
| Gzip will make most line protocols efficient enough that you
| can do away with needing to write a cryptic one that will just
| end up being friction every time someone has to triage a
| production issue. Zstd will do even better.
|
| The real one-two punch is to make your parser faster and then
| spend the CPU cycles on better compression.
| cb321 wrote:
| DNA researchers developed a parallel format for gzip they
| call "bgzip" ( https://learngenomics.dev/docs/genomic-file-
| formats/compress... ) that makes data seem less trapped
| behind a decompression perf wall. Zstd is still a bit faster
| (but < ~2X) and also gets better compression ratios
| (https://forum.nim-lang.org/t/5103#32269)
| smallstepforman wrote:
| This is how Haiku packages are managed: from the outside it's a
| single zstd file, internally all dependencies and files are
| included in a read-only file. Reduces IO, reduces file clutter,
| instant install/uninstall, zero chance for the user to corrupt
| files or dependencies, and easy to switch between versions. The
| Haiku file system also supports virtual dir mapping so the
| stubborn Linux port thinks it's talking to /usr/local/lib, but
| in reality it's part of the zstd file in /system/packages.
| 0xbadcafebee wrote:
| Strangely enough, there is a tool out there that gives Zip-like
| functionality while preserving Tar metadata functionality, that
| nobody uses. It even has extra archiving functions like binary
| deltas. dar (Disk ARchive) http://dar.linux.free.fr/
| spwa4 wrote:
| You mean ZIP?
|
| Zip has 2 tricks: First, compression is per-file, allowing
| extraction of single files without decompressing anything
| else.
|
| Second, the "directory" is at the end, not the beginning, and
| ends in the offset of the beginning of the directory. Meaning
| 2 disk seeks (matters even on SSDs) and you can show the user
| all files.
|
| Then, you know exactly what bytes are what file and
| everything's fast. Also, you can easily take the directory
| off the end of the zip file, allowing new files to be added
| without modifying the rest of the file, which can be extended
| to allow for arbitrary modification of the contents, although
| you may need to "defragment" the file.
|
| And I believe, encryption is also per-file. Meaning to
| decrypt a file you need _both_ the password and the directory
| entry, which means that if you delete a file, and rewrite
| just the directory, the data is unrecoverable without
| requiring a total rewrite of the bytes.
| hedora wrote:
| I think Zip's main trick is that it's been preloaded on
| everything forever.
| thaumasiotes wrote:
| > It effectively reduces the I/O while, unlike TAR, allowing
| direct random access to the files without "extracting" them or
| seeking through the entire file
|
| How do you access a particular file without seeking through the
| entire file? You can't know where anything is without first
| seeking through the whole file.
| charcircuit wrote:
| You look at the end of the file which tells you where the
| central directory is. The directory tells you where
| individual files are.
| nikanj wrote:
| At the end of the ZIP file, there's a central directory of
| all files contained in that archive. Read the last block,
| seek to the block containing the file you want to access,
| done
| account42 wrote:
| One problem with the zip format is that metadata is stored both
| in the central directory and also before each file's data - that
| creates ambiguity when the metadata differs, which different
| programs/libraries don't handle consistently.
| akaltar wrote:
| Amazing article, thanks for sharing. I really appreciate the deep
| investigations in response to the comments
| stabbles wrote:
| "I/O is the bottleneck" is only true in the loose sense that
| "reading files" is slow.
|
| Strictly speaking, the bottleneck was latency, not bandwidth.
| raggi wrote:
| there are a loooot of languages/compilers for which the most
| wall-time expensive operation in compilation or loading is
| stat(2) searching for files
| ghthor wrote:
| I actually ran into this issue building dependency graphs of a
| golang monorepo. We analyzed the cpu trace and found that the
| program was doing a lot of GC, so we reduced allocations. This
| was just noise though, as the runtime was simply making use of
| time spent waiting for I/O: we had shelled out to go list to get
| a json dep graph from the CLI program, which turns out to be
| slow due to stat calls and reading from disk. We replaced our
| usage of go list with a custom package import graph parser using
| the std lib parser packages, and instead of reading from disk we
| give the parser byte blobs from git, also using git ls-files to
| "stat" the files. Don't remember the specifics but I believe we
| brought the time to build the dep graph down from 30-45s to
| 500ms.
| nudpiedo wrote:
| The same thing applies to other system aspects:
|
| compressing the kernel loads it into RAM faster even though it
| still has to execute the decompression. Why?
|
| Loading from disk to RAM is a bigger bottleneck than CPU
| decompression.
|
| The same applies to algorithms: always find the largest
| bottleneck in your dependent executions and apply changes there,
| as the rest of the pipeline waits for it. Often picking the
| right algorithm "solves it", but it may be something else, like
| waiting for IO or coordinating across actors (mutexes, if
| concurrency is done the way it used to be).
|
| That's also part of the counterintuitive take that more
| concurrency brings more overhead and not necessarily faster
| execution speeds (a topic largely discussed a few years ago with
| async concurrency and immutable structures).
| direwolf20 wrote:
| Networks too. Compressing the response with gzip is usually
| faster than sending it uncompressed through the network. This
| wasn't always the case.
| solatic wrote:
| Headline is wrong. I/O wasn't the bottleneck, syscalls were the
| bottleneck.
|
| Stupid question: why can't we get a syscall to load an entire
| directory into an array of file descriptors (minus an array of
| paths to ignore), instead of calling open() on every individual
| file in that directory? Seems like the simplest solution, no?
| cb321 wrote:
| One aspect of the question is that "permissions" are mostly
| regulated at the time of open, and user code should check for
| failures. This was a driving inspiration for the tiny 27-line
| C virtual machine in https://github.com/c-blake/batch that
| allows you to, e.g., synthesize a single call that mmaps a
| whole file
| https://github.com/c-blake/batch/blob/64a35b4b35efa8c52afb64...
| which seems like it would have also helped the article author.
| stabbles wrote:
| What comes closest is scandir [1], which gives you an iterator
| of direntries, and can be used to avoid lstat syscalls for each
| file.
|
| Otherwise you can open a dir and pass its fd to openat together
| with a relative path to a file, to reduce the kernel overhead
| of resolving absolute paths for each file.
|
| [1] https://man7.org/linux/man-pages/man3/scandir.3.html
| zokier wrote:
| in what way does scandir avoid stat syscalls?
| stabbles wrote:
| Because you get an iterator over `struct dirent`, which
| includes `d_type` for popular filesystems.
|
| Notice that this avoids `lstat` calls; for symlinks you may
| still need to do a stat call if you want to stat the
| target.
| direwolf20 wrote:
| This is a (3) man page which means it's not a syscall. Have
| you checked it doesn't call lstat on each file?
| stabbles wrote:
| Fair, https://www.man7.org/linux/man-
| pages/man2/getdents64.2.html is a better link. You'd have
| to call lstat when d_type is DT_UNKNOWN
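|
| A minimal sketch of the combination (C; "src" is a placeholder
| directory, error handling omitted): readdir's d_type avoids a
| per-file lstat where the filesystem fills it in, and openat with
| the directory fd avoids re-resolving the path prefix each time:
|
|     #include <dirent.h>
|     #include <fcntl.h>
|     #include <unistd.h>
|
|     int main(void)
|     {
|         DIR *d = opendir("src");            /* placeholder dir    */
|         if (!d) return 1;
|         int dfd = dirfd(d);                 /* reused by openat() */
|
|         struct dirent *e;
|         while ((e = readdir(d)) != NULL) {
|             /* d_type spares an lstat() on filesystems that fill
|              * it in; DT_UNKNOWN would still need an fstatat().   */
|             if (e->d_type != DT_REG && e->d_type != DT_UNKNOWN)
|                 continue;
|             /* Relative path + directory fd: the kernel does not
|              * re-resolve the directory prefix for every file.    */
|             int fd = openat(dfd, e->d_name, O_RDONLY);
|             if (fd < 0)
|                 continue;
|             /* ... read the file ... */
|             close(fd);
|         }
|         closedir(d);
|         return 0;
|     }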
| arter45 wrote:
| >why can't we get a syscall to load an entire directory into an
| array of file descriptors (minus an array of paths to ignore),
| instead of calling open() on every individual file in that
| directory?
|
| You mean like a range of file descriptors you could use if you
| want to save files in that directory?
| levodelellis wrote:
| Not sure, I'd like that too.
|
| You could use io_uring, but IMO that API is annoying and I
| remember hitting limitations. One thing you could do with
| io_uring is to use openat (the op, not the syscall) with the dir
| fd (which you get from the syscall) so you can asynchronously
| open and read files; however, you couldn't open directories for
| some reason. There's a chance I may be remembering wrong.
| justsomehnguy wrote:
| If you don't need the security at all then yes. Otherwise you
| need to check every file for the permissions.
| direwolf20 wrote:
| You can probably do it with io_uring, as a generic syscall
| batching mechanism.
| king_geedorah wrote:
| io_uring supports submitting openat requests, which sounds like
| what you want. Open the dirfd, extract all the names via
| readdir and then submit openat SQEs all at once. Admittedly I
| have not used the io uring api myself so I can't speak to edge
| cases in doing so, but it's "on the happy path" as it were.
|
| https://man7.org/linux/man-pages/man3/io_uring_prep_open.3.h...
|
| https://man7.org/linux/man-pages/man2/readdir.2.html
|
| Note that the prep open man page is a (3) page. You could of
| course construct the SQEs yourself.
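|
| Roughly, with liburing (a sketch under the assumptions above;
| "src" and the file names are placeholders, error handling
| omitted, link with -luring):
|
|     #include <fcntl.h>
|     #include <liburing.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         const char *names[] = { "a.c", "b.c", "c.c" };
|         struct io_uring ring;
|         io_uring_queue_init(8, &ring, 0);
|
|         int dfd = open("src", O_RDONLY | O_DIRECTORY);
|
|         for (long i = 0; i < 3; i++) {
|             struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|             io_uring_prep_openat(sqe, dfd, names[i], O_RDONLY, 0);
|             io_uring_sqe_set_data(sqe, (void *)i);
|         }
|         io_uring_submit(&ring);      /* one syscall for all opens */
|
|         for (int i = 0; i < 3; i++) {
|             struct io_uring_cqe *cqe;
|             io_uring_wait_cqe(&ring, &cqe);
|             long idx = (long)io_uring_cqe_get_data(cqe);
|             printf("%s -> %d\n", names[idx], cqe->res);
|             io_uring_cqe_seen(&ring, cqe);  /* res < 0 is -errno  */
|         }
|         io_uring_queue_exit(&ring);
|         return 0;
|     }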
| torginus wrote:
| You have a limit of 1k simultaneous open files per process -
| not sure what overhead exists in the kernel that made them
| impose this, but I guess it exists for a reason. You might
| run into trouble if you open too many files at once (either
| the kernel kills your process, or you run into some internal
| kernel bottleneck that makes the whole endeavor not so
| worthwhile)
| dinosaurdynasty wrote:
| That's mainly for historical reasons (the select syscall can
| only handle fds < 1024); modern programs can just set their
| soft limit to their hard limit and not worry about it
| anymore: https://0pointer.net/blog/file-descriptor-
| limits.html
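|
| The dance from the linked post, for reference (safe as long as
| nothing in the process still hands fds >= 1024 to select()):
|
|     #include <stdio.h>
|     #include <sys/resource.h>
|
|     int main(void)
|     {
|         struct rlimit rl;
|         if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
|             rl.rlim_cur = rl.rlim_max;  /* raise soft to hard limit */
|             if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
|                 perror("setrlimit");
|             printf("soft fd limit now %llu\n",
|                    (unsigned long long)rl.rlim_cur);
|         }
|         return 0;
|     }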
| paulddraper wrote:
| io_uring can open multiple files.
| ori_b wrote:
| It's not the syscalls. There were only 300,000 syscalls made.
| Entering and exiting the kernel takes 150 cycles on my (rather
| beefy) Ryzen machine, or about 50ns per call.
|
| Even assuming it takes 1us per mode switch, which would be
| insane, you'd be looking at 0.3s out of the 17s for syscall
| overhead.
|
| It's not obvious to me where the overhead is, but random seeks
| are still expensive, even on SSDs.
| ncruces wrote:
| Didn't test, but my guess is it's not "syscalls" but "open,"
| "stat," etc; "read" would be fine. And something like
| "openat" might mitigate it.
| mnahkies wrote:
| Something that struck me earlier this week while profiling
| certain workloads: I'd really like a flame graph that includes
| wall time spent waiting on I/O, be it a database call, the
| filesystem or another RPC.
|
| For example, our integration test suite on a particular service
| has become quite slow, but it's not particularly clear where the
| time is going. I suspect a decent amount of time is being spent
| talking to postgres, but I'd like a low-touch way to profile
| this.
| trillic wrote:
| See if you can wrap the underlying library call to pg.query or
| whatever it is with a generic wrapper that logs time in the
| query function. Should be easy in a dynamic lang.
| Kuinox wrote:
| Tracing profiler can do exactly that, you don't need a
| dynamic lang.
| 6keZbCECT2uB wrote:
| There's prior work:
| https://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.h...
|
| There are a few challenges here.
|
| - Off-cpu is missing the interrupt with integrated collection of
| stack traces, so you either instrument a full timeline as
| threads move on and off cpu, or periodically walk every thread
| for its stack trace.
|
| - Applications have many idle threads, and waiting for IO is a
| common threadpool case, so it's more challenging to distinguish
| the thread waiting for a pool doing delegated IO from idle
| worker pool threads.
|
| Some solutions:
|
| - I've used nsight systems for non-GPU stuff to visualize
| off-CPU time equally with on-CPU time.
|
| - gdb thread apply all bt is slow but does full call stack
| walking. In python, we have py-spy dump for supported
| interpreters.
|
| - Remember that anything you can represent as call stacks and
| integers can be converted easily to a flamegraph, e.g. taking
| strace durations by tid and maybe fd and aggregating to a
| flamegraph.
| jchw wrote:
| Sounds more like the VFS layer/FS is the bottleneck. It would be
| interesting to try another FS or operating system to see how it
| compares.
| arter45 wrote:
| Some say Mac OS X is (or used to be) slower than Linux at least
| for certain syscalls.
|
| https://github.com/golang/go/issues/28739#issuecomment-10426...
|
| https://stackoverflow.com/questions/64656255/why-is-the-c-fu...
|
| https://github.com/valhalla/valhalla/issues/1192
|
| https://news.ycombinator.com/item?id=13628320
|
| Not sure what's the root cause, though.
| jchw wrote:
| This would not be surprising at all! An impressive amount of
| work has gone into making the Linux VFS and filesystem code
| fast and scalable. I'm well aware that Linux didn't invent
| the RCU scheme, but it uses variations on RCU liberally to
| make filesystem operations minimally contentious, and
| aggressively caches. (I've also learned recently that the
| Linux VFS abstractions are quite different from BSD/UNIX, and
| they don't really map to each other. Linux has many
| structures, like dentries and generic inodes, that map to
| roughly one structure in BSD/UNIX, the vnode structure. I'm
| not positive that this has huge performance implications but
| it does seem like Linux is aggressive at caching dentries
| which may make a difference.)
|
| That said, I'm certainly no expert on filesystems or OS
| kernels, so I wouldn't know if Linux would perform faster or
| slower... But it would be very interesting to see a
| comparison, possibly even with a hypervisor adding overhead.
| djmips wrote:
| profile profile profile
| rurban wrote:
| I still use a 10x faster lexer, RE2C over flex, because it does
| so much more at compile-time. And on top of that it has a bunch
| of optimization options for better compilers, like computed
| gotos.
|
| Of course syscalls suck, slurping the whole file at once always
| wins, and in this case all files at once.
|
| Kernels suck in general. You don't really need one for high perf
| and low space.
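|
| For anyone who hasn't seen the technique: computed gotos are a
| GCC/Clang extension (&&label yields a label's address) that lets
| a generated lexer jump straight to the next state instead of
| going back through a switch. A toy sketch of the dispatch
| pattern, not RE2C output:
|
|     #include <stdio.h>
|
|     /* Count words using a two-entry jump table indexed by a
|      * crude character class; generated lexers use big tables. */
|     static int count_words(const char *p)
|     {
|         static void *dispatch[2] = { &&other, &&alpha };
|         int words = 0, in_word = 0;
|     next:
|         if (*p == '\0') return words;
|         {
|             int a = (*p >= 'a' && *p <= 'z') ||
|                     (*p >= 'A' && *p <= 'Z');
|             goto *dispatch[a];            /* computed goto */
|         }
|     alpha:
|         if (!in_word) { in_word = 1; words++; }
|         p++; goto next;
|     other:
|         in_word = 0;
|         p++; goto next;
|     }
|
|     int main(void)
|     {
|         printf("%d\n", count_words("two words"));  /* prints 2 */
|         return 0;
|     }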
| sheepscreek wrote:
| I always knew this anecdotally - through experience - and told
| myself it was the file system block size that was the culprit.
| Put together with SSD random seek times, it makes sense on the
| surface. I never thought of syscalls being so expensive. But they
| might just be a symptom and not the bottleneck itself (after all,
| it's just a function call to the server). My initial thought was
| DMA. You see, CPUs usually have direct access to only one PCIe
| slot in most consumer hardware. The other PCIe and M.2 slots
| share the same bandwidth and take turns. When someone wants
| access, they do a dance with the CPU and other parts of the
| computer using INT (interrupt) instructions that make the CPU
| pause so I/O can take over for a bit. The switching back and
| forth is costly too and adds up quickly.
|
| That said, it wouldn't explain why a MacBook (which should have
| the SSD already on the fastest/dedicated pathway) would be this
| slow, unless something else in the OS was the bottleneck?
|
| I think we're just scratching the surface here and there is more
| to this story that is waiting to be discovered. But yeah, to get
| the job done, package it in fewer files for the OS, preload into
| RAM or use mmap, then profit.
| Dwedit wrote:
| On Windows, an optimization is to call CloseHandle from a
| secondary thread.
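|
| A bare-bones sketch of that idea (names are made up; a production
| version would queue handles to one long-lived closer thread
| rather than spawning a thread per handle):
|
|     #include <windows.h>
|     #include <process.h>
|
|     static unsigned __stdcall close_worker(void *h)
|     {
|         CloseHandle((HANDLE)h);       /* slow close, off hot path */
|         return 0;
|     }
|
|     static void close_handle_later(HANDLE h)
|     {
|         uintptr_t t =
|             _beginthreadex(NULL, 0, close_worker, h, 0, NULL);
|         if (t)
|             CloseHandle((HANDLE)t);   /* no need to join worker   */
|         else
|             CloseHandle(h);           /* fall back to inline close */
|     }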
| hwspeed wrote:
| Classic case of optimizing the wrong thing. I've hit similar
| issues with ML training pipelines where GPU utilization looks
| terrible because data loading is the bottleneck. The profiler
| tells you the GPU kernel is fast, but doesn't show you it's
| sitting idle 80% of the time waiting for the next batch. Amdahl's
| law is brutal when you've got a serial component in your
| pipeline.
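|
| To put rough numbers on that: with Amdahl's law the overall
| speedup is 1 / ((1 - p) + p / s), where p is the fraction of wall
| time you actually accelerated. If the GPU kernel is busy only 20%
| of the time and the loader takes the other 80%, then even an
| infinitely fast kernel caps the end-to-end speedup at
| 1 / 0.8 = 1.25x.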
| torginus wrote:
| I started programming on DOS - I remember how amazing it was
| that you basically almost talked to hardware directly; there was
| very little restriction on what you could do, and the OS (which
| imo was much more akin to a set of libraries) provided very
| little abstraction for you.
|
| Then I moved to Windows, and Linux. Each had its own
| idiosyncrasies, like how everything is a file on Linux, and
| you're supposed to write programs by chaining existing
| executables together, or on the desktop, both Win32 and X11
| started out with their own versions of UI elements, so XWindow or
| Win32 would know about where a 'button' was, and the OS was
| responsible for event handling and drawing stuff.
|
| Eventually both Windows and Linux programs moved to a model where
| the OS just gave you the window as a drawing surface, and you
| were supposed to fill it.
|
| Similarly, all other OS supplied abstractions slowly fell out of
| use beyond the bare minimum.
|
| Considering this, I wonder if it's time to design a new, much
| lower level abstraction, for file systems in this case. This
| would be a way to mmap an entire directory into the process
| space, where each file would be a struct which had a list of
| pointers to its pages on disk, and each directory would be a
| list of such entries, again stored in some data structure you
| could access. Synchronizing reads/writes would be orchestrated
| by the kernel somehow (I'm thinking locking/unlocking pages
| being written to).
|
| So that way there'd be no difference between traversing an in-
| memory data structure and reading the disk.
|
| I know this approach isn't super compatible with the async/await
| style of I/O; however, I'm not 100% convinced that's the correct
| approach either (disk paging is a fundamental feature of all
| OSes, yet is absolutely inexpressible in programming terms).
| LorenPechtel wrote:
| I'd love to see this.
|
| Bring back the "segmented" memory architecture. It was not evil
| because of segments, but because of segment size. If any
| segment can be any size the bad aspects fall away.
|
| File handles aren't needed anymore. You open a file, you get
| back a selector rather than an ID. You reference memory from
| that selector, the system silently swaps pages in as needed.
|
| You could probably do the same thing with directories but I
| haven't thought about it.
| torginus wrote:
| The idea as I stated it is super half-baked but
|
| > You could probably do the same thing with directories but I
| haven't thought about it.
|
| For example, in the FAT filesystem a directory is just a file
| with a special flag set in its directory entry, and inside said
| file there is just a list of directory entries. Not sure if
| something so simple would be a good idea, but it certainly works
| and has worked IRL.
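|
| For the curious, the on-disk layout is a 32-byte record (per the
| Microsoft FAT specification; the field names here are made up);
| a directory's contents are literally an array of these:
|
|     #include <stdint.h>
|
|     #pragma pack(push, 1)
|     struct fat_dirent {                 /* one 32-byte entry        */
|         uint8_t  name[11];              /* 8.3 name, space padded   */
|         uint8_t  attr;                  /* 0x10 = subdirectory      */
|         uint8_t  nt_reserved;
|         uint8_t  crt_time_tenth;
|         uint16_t crt_time;
|         uint16_t crt_date;
|         uint16_t last_access_date;
|         uint16_t first_cluster_hi;      /* high 16 bits (FAT32)     */
|         uint16_t write_time;
|         uint16_t write_date;
|         uint16_t first_cluster_lo;      /* start of the data chain  */
|         uint32_t file_size;             /* bytes; 0 for directories */
|     };
|     #pragma pack(pop)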
| 1vuio0pswjnm7 wrote:
| "TAR vs ZIP: Sequential vs Random Access"
|
| https://199.233.217.201/pub/pkgsrc/distfiles/dictd-1.13.3.ta...
|
| Wikipedia:
|
| "In order to efficiently store dictionary data, dictzip, an
| extension to the gzip compression format (also the name of the
| utility), can be used to compress a .dict file. Dictzip
| compresses file in chunks and stores the chunk index in the gzip
| file header, thus allowing random access to the data."
| Sparkyte wrote:
| I/O has been the bottleneck for many things, especially databases.
|
| So as someone who has seen a long spread of technological
| advancements over the years I can confidently tell you that chips
| have far surpassed any peripheral components.
|
| Kind of that scenario where compute has to be fast enough anyway
| to support I/O. So really it always has to be faster, but I am
| saying that it has exceeded those expectations.
___________________________________________________________________
(page generated 2026-01-26 15:01 UTC)