https://github.com/Jarred-Sumner/hop (Jarred-Sumner / hop, latest commit d57dd9c, Nov 10, 2021)
# hop

Simple archive format designed for quickly reading some files without extracting the entire archive. Possibly will be used in Bun.

25x faster than unzip and 10x faster than tar at reading individual files (uncompressed).

[image]

[Table: hop, tar, and zip compared on random access, fast extraction, fast archiving, compression, encryption, and append. The per-format marks were not preserved; one zip entry is qualified "(when small)".]

Features:

* Faster at printing individual files than tar & zip (compression disabled)
* Faster extraction than zip, comparable to tar (compression disabled)
* Faster archiving than zip, comparable to tar (compression disabled)

Anti-features:

* Single-threaded (but doesn't need to be)
* I wrote it in about 3 hours and there are no tests
* No checksums yet. Probably not a good idea to use this for untrusted data until that's fixed.
* Ignores symlinks
* Can't be larger than 4 GB
* Archives are read-only and file names are not normalized across platforms

## Usage

Download the binary from /releases

To create an archive:

    hop ./path-to-folder

To extract an archive:

    hop archive.hop

To print one file from the archive:

    hop archive.hop package.json

## Why?

Why can't software read many tiny files with similar performance characteristics as individual files?

* Reading and writing lots of tiny files incurs significant syscall overhead, and (npm) packages often have lots of tiny files. Zip files are unacceptably slow when read like a directory. tar files extract quickly, but are slow at non-sequential access.
* Reading directory entries (ls) in large directory trees is slow

## Some benchmarks

### On macOS 12 with an M1X

Using the tigerbeetle GitHub repo as an example

Archiving: [image]

Extracting: [image]

### On an Ubuntu AMD64 server

Extracting a node_modules folder

[image]

## Why faster?

* It stores an array of hashes, one for each file path, and the list of files is sorted lexicographically. This makes non-sequential access faster than tar, but can make creating new archives slower.
* Does not store directories, only files
* .hop files are read-only (more precisely, one could append but would have to rewrite all metadata)
* Uses `copy_file_range`, which copies data between files inside the kernel
* `packed struct` makes serialization & deserialization very fast because there is almost no encoding/decoding step.

## How does it work?

1. File contents go at the top, file metadata goes at the bottom
2. This is the metadata it currently stores:

       package Hop;

       struct StringPointer {
         uint32 off;
         uint32 len;
       }

       struct File {
         StringPointer name;
         uint32 name_hash;
         uint32 chmod;
         uint32 mtime;
         uint32 ctime;
         StringPointer data;
       }

       message Archive {
         uint32 version = 1;
         uint32 content_offset = 2;
         File[] files = 3;
         uint32[] name_hashes = 4;
         byte[] metadata = 5;
       }
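
To make the ideas under "Why faster?" and "How does it work?" concrete, here is a rough Python sketch of the same shape: contents are laid out first, metadata is built afterwards with files sorted by name, and one file is found by scanning a flat array of 32-bit name hashes. It is not hop's actual code or on-disk format. The hash function (`zlib.crc32`), the `FILE_RECORD` layout, and the helper names `build` and `read_file` are all assumptions made for illustration; the README does not say which hash hop uses or how the real lookup is done.

```python
import struct
import zlib
from typing import Dict, Optional

# Hypothetical fixed-width record mirroring the File struct in the schema:
# name (off, len), name_hash, chmod, mtime, ctime, data (off, len),
# i.e. eight little-endian uint32 values with no padding.
FILE_RECORD = struct.Struct("<8I")


def name_hash(name: bytes) -> int:
    # Stand-in hash: the README does not say which 32-bit hash hop uses.
    return zlib.crc32(name) & 0xFFFFFFFF


def build(files: Dict[bytes, bytes]) -> dict:
    """Lay out file contents first, then metadata sorted by file name."""
    content = bytearray()
    names = bytearray()
    records = []
    hashes = []
    for name in sorted(files):  # lexicographic order, as in the README
        data = files[name]
        h = name_hash(name)
        # chmod, mtime and ctime are placeholder values in this sketch.
        records.append(FILE_RECORD.pack(
            len(names), len(name), h, 0o644, 0, 0, len(content), len(data)))
        hashes.append(h)
        names += name
        content += data
    return {"content": bytes(content), "names": bytes(names),
            "records": records, "hashes": hashes}


def read_file(archive: dict, name: bytes) -> Optional[bytes]:
    """Find one file via the hash array, then slice out only its bytes."""
    wanted = name_hash(name)
    for i, h in enumerate(archive["hashes"]):
        if h != wanted:
            continue
        (n_off, n_len, _hash, _mode, _mtime, _ctime,
         d_off, d_len) = FILE_RECORD.unpack(archive["records"][i])
        # Compare the stored name to rule out a hash collision.
        if archive["names"][n_off:n_off + n_len] == name:
            return archive["content"][d_off:d_off + d_len]
    return None
```

With this sketch, `read_file(build({b"package.json": b"{}"}), b"package.json")` returns `b"{}"` without reading any other file's bytes, which is the property that makes printing a single file cheap compared with scanning a tar stream.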