https://github.com/ggerganov/llama.cpp/pull/613

ggerganov / llama.cpp

Make loading weights 10-100x faster #613

Merged: jart merged 9 commits into ggerganov:master from jart:loader on Mar 30, 2023.
+717 -328 · Conversation 35 · Commits 9 · Checks 22 · Files changed 11

jart (Collaborator) commented on Mar 29, 2023

This is a breaking change that's going to give us three benefits:

1. Your inference commands should load 100x faster.
2. You may be able to safely load models 2x larger.
3. You can run many concurrent inference processes.

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them. That lets the kernel make its file cache pages directly accessible to our inference processes, and it makes those pages much less likely to be evicted (which would force loads to hit disk), because they are no longer competing with memory pages needlessly created by gigabytes of standard i/o.

The new file format supports single-file models like LLaMA 7B, and it also supports multi-file models like LLaMA 13B. Our Python tool now merges the foo.1, foo.2, etc. files back into a single file so that the C++ code which maps it doesn't need to reshape the data every time. That has made llama.cpp much simpler; much of its load code has now been deleted.

Furthermore, this change ensures that tensors are properly aligned on a 32-byte boundary, which opens the door to additional performance gains on some microprocessors from ops that require memory alignment. Lastly, note that both POSIX and Windows are supported.

This PR solves issue #91. It was written in collaboration with @slaren, and it is rebased on PR #586, so please do not squash merge; use either merge or rebase.
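As a rough illustration of the loading strategy described above, here is a minimal, hedged sketch of mapping a weights file on POSIX systems. The function name and error handling are hypothetical; the actual llama.cpp code differs in detail.

```cpp
// Minimal sketch (hypothetical names): map a model file read-only so the
// kernel's page cache backs the weights directly, with nothing copied into
// process-private buffers up front.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

static void *map_model_file(const char *fname, size_t *out_length) {
    int fd = open(fname, O_RDONLY);
    if (fd == -1) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    // A shared read-only mapping lets every inference process that maps the
    // same file reuse one physical copy of the weights.
    void *addr = mmap(nullptr, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return nullptr;
    *out_length = (size_t)st.st_size;
    return addr;
}
```

Because the new format stores each tensor at a 32-byte-aligned offset, tensor data can then be pointed at in place inside the mapping instead of being deserialized.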
slaren added 6 commits on March 29, 2023:

* Add mmap support for model files (2a6cef6)
* Fix ggml_init_params in quantize (a1e0f17)
* Make mmap_file static (4ae12d0)
* Unmap the file in llama_free (4daaa5e)
* Always initialize mm_addr and mm_length in llama_model (812cfa1)
* Initial windows support (untested) (80c2178)

jart added the labels "performance" (speed related topics) and "breaking change" (changes that break ABIs, APIs, file formats, or other forms of backwards compatibility) on Mar 29, 2023, and mentioned this pull request in #586 (Add support for memory mapping models, closed).

luminalle commented on Mar 30, 2023

Should the other converters also be rewritten to handle this new format?

jart force-pushed the loader branch from 69debdf to b806987 on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

Yes indeed. I just fixed the quantize program. Now I'm hunting down all the tests.

jart force-pushed the loader branch from b806987 to a3307d2 on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

All tests look green except for a CMake test. For example: https://github.com/ggerganov/llama.cpp/actions/runs/4559537462/jobs/8043597142?pr=613

I'm stumped on this error. I can't figure out where the file models/ggml-vocab.bin comes from. Does anyone know? Could it be a stale cache?

FNsi commented on Mar 30, 2023 (edited)

#355 mentioned: "Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)"

bakkot reviewed llama.h on Mar 30, 2023:

```
@@ -20,7 +20,7 @@
 #endif
 #define LLAMA_FILE_VERSION 1
 #define LLAMA_FILE_MAGIC 0x67676d66 // 'ggmf' in hex
```
bakkot commented on Mar 30, 2023 (edited)

Nit: why change the magic rather than the version? I assumed the plan was to keep the magic constant forever. If you bump the version instead, old executables will recognize new model files and give a more useful error message. And it's nice to distinguish between "this is definitely a model file for this project, but it's the wrong version" and "this is some random junk we don't know anything about".

(This PR is a very neat bit of engineering; please don't let my nitpick distract from that.)

Green-Sky (Collaborator) replied on Mar 30, 2023: not a nitpick but a real change request :) (later: nvm)

ggerganov (Owner) commented on Mar 30, 2023 (edited)

@jart The models/ggml-vocab.bin is generated by convert-pth-to-ggml.py by providing an extra arg.

I expected mmap support to be much more intrusive, but in fact it turned out to be very compact. llama.cpp is much simpler now. Good stuff.

Regarding the version comment: yes, the plan was to bump the version and not the magic. But I'm OK with changing the magic to commemorate the significance of this update. In fact, maybe we can make this a thing, and everybody who makes a significant contribution to the project gets their initials appended to the version. What do you think?

Let me play with this tonight before merging. We have to take special care that all the other ggml model files floating around (Alpaca, GPT4All, Chinese LLaMA, etc.) have a nice way to convert to this new format, and update the instructions in the README. Also, some synchronisation with #545 may be needed.
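To make bakkot's distinction concrete, a check along these lines could report the two failure modes separately. This is a hypothetical sketch with made-up constants, names, and messages, not the code that ended up in llama.cpp:

```cpp
// Hypothetical sketch of a magic-then-version check; everything here is
// illustrative only.
#include <cstdint>
#include <cstdio>

enum load_status { LOAD_OK, LOAD_NOT_A_MODEL, LOAD_OLD_VERSION };

static load_status check_header(uint32_t magic, uint32_t version,
                                uint32_t want_magic, uint32_t want_version) {
    if (magic != want_magic) {
        // Unknown magic: probably not a model file for this project at all.
        fprintf(stderr, "unrecognized file (not a ggml model?)\n");
        return LOAD_NOT_A_MODEL;
    }
    if (version != want_version) {
        // Right project, wrong revision: the user can be told to re-convert.
        fprintf(stderr, "model file is from an older format revision; please re-convert\n");
        return LOAD_OLD_VERSION;
    }
    return LOAD_OK;
}
```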
jart added commit 75d1e55 ("Make loading weights 10-100x faster"); the commit message restates the PR description above and closes ggerganov#91. jart then force-pushed the loader branch from a3307d2 to 75d1e55 on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

File updated. A lot more tests are green now. No idea what's up with the sanitizer.

I thought so too! I was pleasantly surprised by how well it worked out. Glad we took a few weeks to think.

I'm honored to hear you say that. I can round the magic up to 64 bytes if you like, so there's room to hand out kudos without breaking backwards compatibility in the future. Since my initials also act as a stamp of approval, I'm going to send a follow-up change after this that hardens the loading code, so that folks can trade model files in this format on Hugging Face with maximum safety and confidence.

#545 is an ambitious unification. I've done my best to comment my changes to make the merge less painful for the author. I've sought to update the other scripts too, but I don't know how to run them. One thing you could also consider for this project is a contrib/ folder, where folks can merge as much of their own stuff as they want, under the expectation that the ones who need it are the ones who maintain it.

jart added commit a45e843 ("Ensure --mlock works properly with mmap() support").

mqy reviewed llama.cpp on Mar 30, 2023:

```c
int fd = open(fname, O_RDONLY);
if (fd == -1) return 0;
int64_t length = lseek(fd, 0, SEEK_END);
void *addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
```

mqy (Contributor) commented on Mar 30, 2023

1. Is it safer to use mmap64 for 4 GB+ files?
2. It seems mmap, mmap64 and MapViewOfFile support mapping from a given offset. Is it possible to map from header_len (as the offset)? If we can do this, there is no need to align the model file, right?

jart (Collaborator, author) replied on Mar 30, 2023

1. The right thing to do on 32-bit platforms is to have your build system define -D_FILE_OFFSET_BITS=64, which causes your system header files to automatically #define mmap mmap64.
2. File offsets passed to mmap() need to be page-size aligned, so I don't think so.
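To make jart's second answer concrete: the offset argument to mmap() must be a multiple of the page size, so mapping "starting at header_len" only works by rounding the offset down and compensating in user space, roughly as in this illustrative sketch (not code from this PR):

```cpp
// Illustration only: mapping "from an offset" requires a page-aligned offset,
// so you round down and re-add the remainder yourself. The PR avoids all of
// this by mapping from offset 0 and aligning tensor data within the file.
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

static const char *map_at_offset(int fd, off_t header_len, size_t file_len) {
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = header_len & ~(off_t)(page - 1);   // round down to a page boundary
    size_t slop   = (size_t)(header_len - aligned);    // extra bytes mapped before the data
    void *base = mmap(nullptr, file_len - (size_t)aligned,
                      PROT_READ, MAP_SHARED, fd, aligned);
    if (base == MAP_FAILED) return nullptr;
    return (const char *)base + slop;                  // first byte past the header
}
```

Mapping from offset 0, as the PR does, sidesteps the page-size dependency entirely and keeps the 32-byte tensor alignment a property of the file format alone.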
pgoodman commented on Mar 31, 2023 (edited)

@jart Is it possible to ensure the file size is a multiple of the hugepage size (e.g. using ftruncate), to benefit from fewer TLB lookups when the model data is accessed? (Corresponding mmap hints or other system-specific APIs, e.g. for macOS, might need to be used.)

jart (Collaborator, author) replied on Mar 31, 2023

It doesn't matter to mmap() if the file length isn't page-size aligned, even with smaller pages. You should be good to go if you modify the mmap() code in llama.cpp by hand and actually manage to get huge pages to work without nuking your machine :-)
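For anyone who does want to experiment in the direction pgoodman suggests, one Linux-specific starting point is to hint transparent huge pages on the mapped region with madvise(). This is a hypothetical sketch, not something this PR does, and the hint is purely advisory:

```cpp
// Hypothetical experiment (not part of this PR): ask Linux to back an
// existing mapping with transparent huge pages. The kernel is free to ignore
// the hint, and THP must be enabled system-wide or in "madvise" mode.
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

static void hint_huge_pages(void *addr, size_t length) {
#ifdef MADV_HUGEPAGE
    if (madvise(addr, length, MADV_HUGEPAGE) != 0) {
        perror("madvise(MADV_HUGEPAGE)");
    }
#else
    (void)addr; (void)length;  // not available on this platform (e.g. macOS)
#endif
}
```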
sw reviewed convert-pth-to-ggml.py and llama.cpp on Mar 30, 2023 (comments since resolved).

jart added commit "Introduce GGML migration tool for new file format" (f013c39, later c0f330f): if you deleted your old Meta LLaMA .pth files, the migrate-ggml-2023-03-30-pr613.py script will allow you to convert your old ggml files into the new mmap()'able format. jart force-pushed the loader branch from f013c39 to c0f330f on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

@ggerganov This change now includes a migration tool named migrate-ggml-2023-03-30-pr613.py. It ensures that users of the old GGML file format who've deleted the original .pth files will be able to convert their ggml+ggmf files to the new ggml+ggjt format. Please take a look.

x02Sylvie commented on Mar 30, 2023

Having an issue migrating the alpaca model ggml-alpaca-13b-q4.bin: the Python script seems to think the model has two n_parts rather than one. Would adding an --n_parts argument to the conversion script, to manually specify --n_parts 1 just like when running alpaca models on llama.cpp, resolve the issue?

jart (Collaborator, author) replied on Mar 30, 2023

@x02Sylvie I don't have access to the Alpaca model. Could you send a pull request fixing that after this gets merged?

x02Sylvie replied on Mar 30, 2023 (edited)

I don't really know Python, so I'd rather leave the pull request to someone smarter than me. I did manage to get the alpaca 13B model converted by manually setting n_parts to 1 in the .py conversion script, though I'm unsure whether that's the proper place to set n_parts. I changed

```python
def get_n_parts(dim):
    mappings = {4096: 1, 5120: 2, 6656: 4, 8192: 8}
    n_parts = mappings.get(dim)
    if n_parts is None:
        print(f"Invalid dim: {dim}")
        sys.exit(1)
    print(f"n_parts = {n_parts}\n")
    return n_parts
```

to

```python
def get_n_parts(dim):
    mappings = {4096: 1, 5120: 2, 6656: 4, 8192: 8}
    n_parts = 1
    if n_parts is None:
        print(f"Invalid dim: {dim}")
        sys.exit(1)
    print(f"n_parts = {n_parts}\n")
    return n_parts
```

The model does work after conversion, however.

jart (Collaborator, author) commented on Mar 30, 2023

Yes, that code, which essentially guesses n_parts based on the dimension sizes, looks like a LLaMA kludge to me. @ggerganov would need to weigh in before we change it. I was simply cargo-culting other parts of the codebase that do this.

rabidcopy (Contributor) commented on Mar 30, 2023

So far the conversion script works with my current ggml models. I even converted the gpt4all model with convert-gpt4all-to-ggml.py and then converted that with migrate-ggml-2023-03-30-pr613.py, and it works. I do have to manually set n_parts in the script for my larger models that are in one part, but it still works nonetheless!

jart added commit adaba69 ("Introduce GGML migration tool for new file format") and force-pushed the loader branch from c0f330f to adaba69 on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

I've just pushed an update to this PR addressing the n_parts issue. We're now using this algorithm in the migration tool:

```python
# count number of multipart files by convention
n_parts = 1
while True:
    if os.path.exists("%s.%d" % (args.fin_path, n_parts)):
        n_parts += 1
    else:
        break
```

slaren mentioned this pull request in #632 (SWAP info added to README, closed).

ggerganov (Owner) approved these changes on Mar 30, 2023

I think this is ready to merge - please go ahead.

ggerganov also left a review comment on a llama.cpp error message (since outdated):

```c
"\tsee https://github.com/ggerganov/llama.cpp/issues/91\n"
"\tuse convert-pth-to-ggml.py to regenerate from original pth\n"
"\tuse migrate-ggml-2023-03-30-pr613.py if you deleted originals\n"
, path);
```

Let's print the magic we got and the magic we expect here to help debug issues.
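A diagnostic along the lines ggerganov is asking for might look roughly like this sketch; the variable names and exact wording are illustrative, not necessarily what was merged:

```cpp
// Sketch of an error message that reports both the magic we read and the one
// we expected; names and wording are illustrative.
#include <cstdint>
#include <cstdio>

static void report_bad_magic(const char *path, uint32_t got, uint32_t want) {
    fprintf(stderr,
            "%s: invalid model file (bad magic: got %#x, want %#x)\n"
            "\tsee https://github.com/ggerganov/llama.cpp/issues/91\n"
            "\tuse convert-pth-to-ggml.py to regenerate from original pth\n"
            "\tuse migrate-ggml-2023-03-30-pr613.py if you deleted originals\n",
            path, got, want);
}
```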
jart (Collaborator, author) replied on Mar 30, 2023

As you wish. Updated.

CoderRC commented on Mar 30, 2023

Successfully compiled the master branch and jart's branch, and successfully ran ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512, in msys2 with the mingw32 gcc compiler, using:

```sh
make LDFLAGS='-D_POSIX_MAPPED_FILES -lmingw32_extended' \
     CFLAGS='-D_POSIX_MAPPED_FILES -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -mfma -mf16c -mavx -mavx2' \
     CXXFLAGS='-D_POSIX_MAPPED_FILES -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function'
```

If you're confused about how exactly I compiled it, read #103 (comment).

This should be ready to merge, since my testing and compiling did not fail.
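CoderRC's build above goes through msys2's POSIX layer (-D_POSIX_MAPPED_FILES). On native Windows, the counterpart of the mmap() call is the CreateFileMapping/MapViewOfFile pair that the "Initial windows support (untested)" commit targets; the sketch below is a hedged illustration of that path, not the exact code in this PR:

```cpp
// Hypothetical sketch of a read-only file mapping on Windows; error handling
// is reduced to the bare minimum and the names are illustrative.
#include <windows.h>
#include <cstddef>

static const void *win32_map_file(const char *fname, size_t *out_length) {
    HANDLE file = CreateFileA(fname, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return nullptr; }
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    CloseHandle(file);            // the mapping object keeps the file alive
    if (mapping == nullptr) return nullptr;
    const void *addr = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(mapping);         // the mapped view keeps the mapping alive
    if (addr == nullptr) return nullptr;
    *out_length = (size_t)size.QuadPart;
    return addr;
}
```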
jart added commit 516474b ("Introduce GGML migration tool for new file format") and force-pushed the loader branch from adaba69 to 516474b on March 30, 2023.

jart (Collaborator, author) commented on Mar 30, 2023

All tests that can be green are green. The thread sanitizer failures mentioned earlier look due to yesterday's ARM NEON changes. Proceeding with merge as advised. It's really exciting to commit my first official change on this project. Thanks @ggerganov and everyone who helped!

jart merged commit ee0c40d into ggerganov:master on Mar 30, 2023, with 18 of 22 checks passed.

ggerganov mentioned this pull request in #581 (parallelize the quantization process, closed), and prusnak mentioned it in #640 (drop quantize.py now that models are using a single file, merged).

gaceladri commented on Mar 31, 2023

Hello, I can not load gpt4all after converting it to the new ggml format using your script: python3 convert-gpt4all-to-ggml.py models/gpt4all/gpt4all-lora-quantized.bin ./models/tokenizer.model. I have opened a new issue probably related to this: #655 (comment).

gaceladri added: I could run it with the previous version, https://github.com/ggerganov/llama.cpp/tree/master-ed3c680.

rabidcopy (Contributor) commented on Mar 31, 2023 (edited)

You need to also run the resulting file through migrate-ggml-2023-03-30-pr613.py: gpt4all weights -> convert-gpt4all-to-ggml.py -> converted gpt4all weights -> migrate-ggml-2023-03-30-pr613.py -> gpt4all weights compatible with the latest version of llama.cpp.

gaceladri replied on Mar 31, 2023

It worked. Thank you for your fast response!

Nuked88 pushed a commit to Nuked88/llama.http referencing this pull request (5c8b15d, "Introduce GGML migration tool for new file format") on Mar 31, 2023, and edwios mentioned it from ggerganov/whisper.cpp#702 (Failed to load llama model, open).

Reviewers: sw, pgoodman, mqy, bakkot, and Green-Sky left review comments; ggerganov approved these changes. Labels: breaking change, performance. 14 participants: jart, luminalle, FNsi, ggerganov, x02Sylvie, rabidcopy, CoderRC, gaceladri, sw, pgoodman, mqy, bakkot, Green-Sky, slaren.