[HN Gopher] GGUF, the Long Way Around
___________________________________________________________________
GGUF, the Long Way Around
Author : Tomte
Score : 96 points
Date : 2024-02-29 19:36 UTC (3 hours ago)
(HTM) web link (vickiboykis.com)
(TXT) w3m dump (vickiboykis.com)
| skadamat wrote:
| This is an excellent deep dive! Love the depth here, Vicki.
| cooper_ganglia wrote:
| I've been looking for a good resource on GGUF for the past week
| or so, the timing on this is awesome! Thanks!
| RicoElectrico wrote:
| As LLM architectures differ only slightly from one another, would
| it make sense to just embed the model, compiled to some sort of
| simple bytecode, right in the GGUF file? Then you would only need
| to implement specific new operations when researchers come up with
| a new model that gains enough traction to be of interest.
| sroussey wrote:
| Yeah, but you want to avoid remote code execution:
|
| https://www.bleepingcomputer.com/news/security/malicious-ai-...
| RicoElectrico wrote:
| The bytecode would not even need to be Turing-complete. Or
| maybe it could take inspiration from eBPF, which gives some
| guarantees. What you posted is related to a design oversight
| in Python's pickle format.
| sroussey wrote:
| I think ONNX does what you say.
| liuliu wrote:
| Not really. We've been down that road before. Embedding the
| computation graph in the file makes changes to the graph harder:
| you need to make sure every change is backward compatible. That
| is OK in general (we have ONNX already), but once you have
| dynamic shapes, and the different optimizations you implement
| are tied to the computation graph, it is simply not optimal.
| (BTW, this is why PyTorch just embeds the code in the .pth file;
| that is much easier to keep backward compatible than a static
| computation graph.)
| rahimnathwani wrote:
| It seems like a lot of innovation is around training, no? GGML
| (the library that reads the GGUF format) supports these values
| for the required 'general.architecture' key: llama, mpt,
| gptneox, gptj, gpt2, bloom, falcon, rwkv.
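Keys like 'general.architecture' live in the GGUF metadata section, which sits right after a small fixed-size header. A minimal sketch of parsing that preamble, assuming the published GGUF field layout (magic, version, tensor count, metadata key/value count, all little-endian); the synthetic buffer below is hand-built for illustration, not a real model file:

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size GGUF preamble: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "kv_count": n_kv}

# Build a synthetic 24-byte header to exercise the parser.
header = struct.pack("<4sIQQ", b"GGUF", 3, 0, 1)
print(read_gguf_header(header))  # {'version': 3, 'tensor_count': 0, 'kv_count': 1}
```

The metadata key/value pairs (architecture name, tokenizer, hyperparameters) follow this preamble and would be decoded next.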
| tbalsam wrote:
| Llama.cpp, I think, has a ton of clone-and-own boilerplate,
| presumably from having grown so quickly (one of their .cu files
| is roughly over 10k lines at the moment, I believe).
|
| While I haven't dug into the model storage and distribution
| format, the rewrite to GGUF for file storage seems to have been
| a big boost to the project. Thanks Phil! Cool stuff. Also, he's
| a really nice guy to boot. Please say hi from Fern to him if you
| ever run into him. I mean it literally: make his life a hellish
| barrage of nonstop greetings from Fern.
| liuliu wrote:
| I honestly think having a way to just use JSON (as safetensors
| does), msgpack, or some lightweight metadata serializer is a
| better route than coming up with a new file format. That's also
| why I just use SQLite to serialize the metadata (and the tensor
| weights, though that part is an oversight).
| andy99 wrote:
| GGUF is cleaner to read in languages that don't have a JSON
| parsing library, and it works with memory mapping in C. It's
| very appealing for minimal inference frameworks versus other
| options.
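The memory-mapping point is the key one: a mapped model file is read straight from the page cache, with no separate copy of the weights in process memory. A minimal sketch of the idea in Python's `mmap` module (the file here is a throwaway stand-in, not a real model):

```python
import mmap
import os
import tempfile

# Write a small stand-in file: 4-byte magic followed by 12 zero bytes.
fd, path = tempfile.mkstemp()
os.write(fd, b"GGUF" + bytes(12))
os.close(fd)

# Map it read-only; slicing the map reads bytes directly from the
# mapped pages instead of going through an intermediate read() buffer.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    magic = bytes(m[:4])
os.remove(path)
print(magic)  # b'GGUF'
```

In C the equivalent is `mmap()` on the file descriptor and pointer arithmetic into the mapping, which is what makes a fixed binary layout like GGUF attractive for minimal runtimes.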
| liuliu wrote:
| safetensors can mmap too, because the tensor data are just
| offsets and you are free to align them however you want.
|
| It is hard to keep metadata minimal; before long you start to
| have many different "atom"s and end up with things that mov
| supports but mp4 doesn't, etc. (The mov format is generally
| well-defined and easy to parse, but with a binary format you
| have to write your own parser, which is not a pleasant
| experience.)
|
| If you just want minimal dependency, flatbuffers,
| capnproto, json are all well-supported on many platforms.
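For context on the layout being discussed: a safetensors file is an 8-byte little-endian header length, then that many bytes of JSON metadata mapping tensor names to dtype, shape, and byte offsets, then the raw tensor bytes. A minimal sketch of reading that structure (the one-tensor buffer is built in memory for illustration):

```python
import json
import struct

def read_safetensors_header(buf: bytes):
    """safetensors layout: uint64 little-endian length N, then N bytes of
    JSON metadata, then raw tensor bytes at the offsets the JSON declares.
    Returns the metadata dict and the offset where tensor data begins."""
    (n,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(buf[8 : 8 + n])
    return header, 8 + n

# Build a tiny one-tensor file in memory: one F32 tensor of two values.
meta = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
blob = json.dumps(meta).encode()
payload = struct.pack("<Q", len(blob)) + blob + struct.pack("<2f", 1.0, 2.0)

header, data_start = read_safetensors_header(payload)
print(header["w"]["data_offsets"])  # [0, 8]
```

The `data_offsets` are relative to the end of the JSON header, which is what lets a consumer mmap the file and slice tensors out without parsing anything beyond the JSON.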
| jart wrote:
| mmap() requires that you map at page-aligned intervals that are
| congruent with the file offset. You can't just round down,
| because some GPU APIs, like Metal, require that the data
| pointers themselves be page aligned too.
| liuliu wrote:
| Yeah, safetensors separates the metadata and the tensor data.
| The metadata holds offset references into the tensor data that
| you are free to define yourself. That way, you can create files
| in the safetensors format where the tensor data itself sits at
| page-aligned offsets.
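Since the JSON header may be padded with trailing spaces, a writer can pick the header length so the tensor section starts exactly on a page boundary. A sketch of that layout choice, assuming a 4096-byte page size (the constant is illustrative; real code would query the platform):

```python
import json
import struct

PAGE = 4096  # assumed page size; pointers aligned to this satisfy mmap/Metal

def pad_header(meta: dict) -> bytes:
    """Pad the safetensors JSON header with spaces so the tensor data
    section begins on a page boundary."""
    blob = json.dumps(meta).encode()
    total = 8 + len(blob)      # uint64 length prefix + JSON bytes
    pad = (-total) % PAGE      # spaces needed to reach the next page boundary
    blob += b" " * pad         # trailing whitespace keeps the JSON valid
    return struct.pack("<Q", len(blob)) + blob

out = pad_header({"w": {"dtype": "F32", "shape": [4], "data_offsets": [0, 16]}})
print(len(out) % PAGE)  # 0 -> tensor bytes appended next start page-aligned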
| andy99 wrote:
| > GPT-Generated Unified Format
|
| GG is Georgi Gerganov
___________________________________________________________________
(page generated 2024-02-29 23:00 UTC)