https://github.com/Jarred-Sumner/hop

Jarred-Sumner / hop Public

 hop

A simple archive format designed for quickly reading individual files
without extracting the entire archive. It may be used in Bun.

25x faster than unzip and 10x faster than tar at reading individual
files (uncompressed)


Format  Random access  Fast extraction  Fast archiving  Compression  Encryption  Append
hop
tar
zip     (when small)

Features:

  * Faster at printing individual files than tar & zip (compression
    disabled)
  * Faster extraction than zip, comparable to tar (compression
    disabled)
  * Faster archiving than zip, comparable to tar (compression
    disabled)

Anti-features:

  * Single-threaded (but doesn't need to be)
  * I wrote it in about 3 hours and there are no tests
  * No checksums yet. Probably not a good idea to use this for
    untrusted data until that's fixed.
  * Ignores symlinks
  * Archives can't be larger than 4 GB (offsets and lengths are stored
    as uint32)
  * Archives are read-only and file names are not normalized across
    platforms

 Usage

Download the binary from /releases

To create an archive:

hop ./path-to-folder

To extract an archive:

hop archive.hop

To print one file from the archive:

hop archive.hop package.json

 Why?

Why can't software read many tiny files with similar performance
characteristics as individual files?

  * Reading and writing lots of tiny files incurs significant syscall
    overhead, and (npm) packages often have lots of tiny files. Zip
    files are unacceptably slow when read like a directory. tar
    files extract quickly, but are slow at non-sequential access.
  * Reading directory entries (ls) in large directory trees is slow.

 Some benchmarks

 On macOS 12 with an M1X

Using the tigerbeetle GitHub repo as an example

Archiving:

[image: archiving benchmark]

Extracting:

[image: extraction benchmark]

 On an Ubuntu AMD64 server

Extracting a node_modules folder

[image: node_modules extraction benchmark]

 Why faster?

  * It stores an array of hashes, one per file path, and the list of
    files is sorted lexicographically. This makes non-sequential
    access faster than tar, but can make creating new archives
    slower.
  * Does not store directories, only files
  * .hop files are read-only (more precisely, one could append but
    would have to rewrite all metadata)
  * Uses copy_file_range, which copies file data inside the kernel
    without passing it through userspace
  * packed struct makes serialization & deserialization very fast
    because there is almost no encoding/decoding step.
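
The hash-array idea in the first bullet can be sketched in a few lines of
Python. This is a minimal illustration, not hop's implementation:
zlib.crc32 stands in for whatever hash hop actually uses (the README does
not say), and the helper names build_index and find are hypothetical. It
assumes a lookup scans the compact hash array and compares full names
only when a hash matches.

```python
import zlib
from array import array


def name_hash(path: str) -> int:
    # Stand-in 32-bit hash (assumption): hop's real hash function is not
    # specified in this README.
    return zlib.crc32(path.encode("utf-8")) & 0xFFFFFFFF


def build_index(paths):
    """Sort paths lexicographically and build a parallel array of hashes."""
    names = sorted(paths)
    hashes = array("I", (name_hash(p) for p in names))
    return names, hashes


def find(names, hashes, path):
    """Return the index of `path` in the sorted list, or -1 if absent."""
    target = name_hash(path)
    for i, h in enumerate(hashes):
        # Integer comparison first; the string compare runs only on a match.
        if h == target and names[i] == path:
            return i
    return -1


names, hashes = build_index(["src/main.zig", "package.json", "README.md"])
print(find(names, hashes, "package.json"))  # index into the sorted list
print(find(names, hashes, "missing.txt"))   # -1 when the path is absent
```

Scanning a dense array of 32-bit integers touches far less memory than
comparing path strings one by one, which is one plausible reason a lookup
like this beats walking a tar file header by header.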

 How does it work?

 1. File contents go at the top, file metadata goes at the bottom
 2. This is the metadata it currently stores:

package Hop;

struct StringPointer {
    uint32 off;
    uint32 len;
}

struct File {
    StringPointer name;
    uint32 name_hash;
    uint32 chmod;
    uint32 mtime;
    uint32 ctime;
    StringPointer data;
}

message Archive {
    uint32 version = 1;
    uint32 content_offset = 2;
    File[] files = 3;
    uint32[] name_hashes = 4;
    byte[] metadata = 5;
}
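
To make the schema concrete, here is a rough Python sketch of how a
fixed-size File record could be packed and read back. Several things are
assumptions for illustration only: little-endian byte order, zlib.crc32
as the name hash, placeholder chmod/mtime/ctime values, and keeping
contents, names, and records in three separate buffers. The real on-disk
layout of a .hop file is not described here beyond "contents at the top,
metadata at the bottom". The names FILE_RECORD, build, and read_file are
hypothetical.

```python
import struct
import zlib

# Eight uint32 fields with no padding: name.off, name.len, name_hash,
# chmod, mtime, ctime, data.off, data.len. Little-endian is an assumption.
FILE_RECORD = struct.Struct("<8I")


def build(files):
    """Return (contents, names, records) for a dict of path -> bytes."""
    contents, names, records = bytearray(), bytearray(), bytearray()
    for path in sorted(files):
        raw_name = path.encode("utf-8")
        data = files[path]
        records += FILE_RECORD.pack(
            len(names), len(raw_name),   # name: StringPointer (off, len)
            zlib.crc32(raw_name),        # name_hash (stand-in hash)
            0o644, 0, 0,                 # chmod, mtime, ctime (placeholders)
            len(contents), len(data),    # data: StringPointer (off, len)
        )
        names += raw_name
        contents += data
    return bytes(contents), bytes(names), bytes(records)


def read_file(contents, names, records, path):
    """Scan the fixed-size records and slice out one file's bytes."""
    wanted = path.encode("utf-8")
    for fields in FILE_RECORD.iter_unpack(records):
        name_off, name_len, _, _, _, _, data_off, data_len = fields
        if names[name_off:name_off + name_len] == wanted:
            return contents[data_off:data_off + data_len]
    return None


contents, names, records = build(
    {"package.json": b'{"name":"demo"}', "README.md": b"# demo\n"}
)
print(FILE_RECORD.size)  # bytes per File record
print(read_file(contents, names, records, "package.json"))
```

Because every record has the same size and needs no per-field decoding,
reading the metadata amounts to reinterpreting bytes in place. Every
offset and length is a uint32, so none can exceed 2^32 - 1, which is
presumably where the 4 GB archive limit comes from.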

Releases 1

 
v0.0.0 Latest
Nov 10, 2021

Languages

  * Zig 94.2%
  * Makefile 5.8%
