[HN Gopher] A brief history of LLaMA models
___________________________________________________________________
A brief history of LLaMA models
Author : andrewon
Score : 68 points
Date : 2023-04-28 02:26 UTC (1 day ago)
(HTM) web link (agi-sphere.com)
(TXT) w3m dump (agi-sphere.com)
| FloatArtifact wrote:
| There needs to be a site dedicated to tracking all these models
| with regular updates.
| vessenes wrote:
| Most places that recommend llama.cpp for Mac fail to mention
| https://github.com/jankais3r/LLaMA_MPS, which runs unquantized 7B
| and 13B models on the M1/M2 GPU directly. It's slightly slower (not
| by a lot) and uses significantly less energy. To me, the win of not
| having to quantize while not melting a hole in my lap is huge; I
| wish more people knew about it.
| brucethemoose2 wrote:
| There is also CodyCapybara (a 7B fine-tuned on code competitions),
| the "uncensored" Vicuna, OpenAssistant 13B (which is said to be
| very good), various non-English tunes, medalpaca... the release
| pace is maddening.
| acapybara wrote:
| And let's not forget about Alpacino (an offensive/unfiltered
| model).
| simonw wrote:
| I'm running Vicuna (a LLaMA variant) on my iPhone right now.
| https://twitter.com/simonw/status/1652358994214928384
|
| The same team that built that iPhone app - MLC - also got Vicuna
| running directly in a web browser using WebGPU:
| https://simonwillison.net/2023/Apr/16/web-llm/
| newswasboring wrote:
| With all these new AI models, Stable Diffusion and LLaMA
| especially, I'm considering switching to iPhone. I don't fully
| understand why iPhones and Macs are getting so many
| implementations, but it seems to be hardware based.
| simonw wrote:
| My understanding is that part of it is that Apple Silicon
| shares all available RAM between CPU and GPU.
|
| I'm not sure how many of these models are actively taking
| advantage of that architecture yet though.
| int_19h wrote:
| The GPU isn't actually used by llama.cpp. What makes it
| that much faster is that the workload, either on CPU or on
| GPU, is very memory-intensive, so it benefits greatly from
| fast RAM. And Apple is using LPDDR5 running at very high
| clock speeds for this shared memory setup.
|
| It's still noticeably slower than GPU, though.
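|
| A rough back-of-the-envelope for why bandwidth is what matters (my
| own approximate numbers, not benchmarks): generating a token means
| streaming essentially all of the weights through memory once, so
| bandwidth puts a hard ceiling on tokens per second.
|
|   # all figures are rough assumptions: ~7B params at 4-bit, ~100 GB/s
|   model_bytes = 7e9 * 0.5          # ~3.5 GB of quantized weights
|   bandwidth = 100e9                # bytes/s of unified memory, M2-class
|   print(bandwidth / model_bytes)   # ~28 tokens/s upper bound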
| bkm wrote:
| Homogenized hardware, I assume; this is why iOS had so many
| photography apps too.
| sp332 wrote:
| iPhones leaned into "computational photography" a long time
| ago. Eventually they added custom hardware to handle all the
| matrix multiplies efficiently. They exposed some of it to
| apps with an API called CoreML. They've been adding more
| features like on-device photo tagging, voice recognition, and
| VR stuff.
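|
| For a sense of what that API looks like, calling a Core ML model
| from Python via coremltools is roughly this (the model file and
| input name below are made up for illustration):
|
|   import coremltools as ct
|   from PIL import Image
|
|   # load a hypothetical compiled model and run one prediction
|   # (predict() only works on macOS)
|   model = ct.models.MLModel("ImageTagger.mlmodel")
|   result = model.predict({"image": Image.open("photo.jpg")})
|   print(result)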
| sagarm wrote:
| Google was the leader in computational smartphone
| photography. They released their "Night Sight" mode before
| Samsung and Apple had anything competitive.
| doodlesdev wrote:
| > Our system thinks you might be a robot! We're really
| sorry about this, but it's getting harder and harder to tell the
| difference between humans and bots these days.
|
| Yeah, fuck you too. Come on, really, why put this in front of a
| _blog post_? Is it that hard to keep up with the bot requests
| when serving a static page?
| jiggawatts wrote:
| It keeps saying the phrase "model you can run locally", but
| despite days of trying, I failed to compile any of the GitHub
| repos associated with these models.
|
| None of the Python dependencies are strongly versioned, and
| "something" happened to the CUDA compatibility of one of them
| about a month ago. The original developers "got lucky" but now
| nobody else can compile this stuff.
|
| After years of using only C# and Rust, both of which have sane
| package managers with semantic versioning, lock files,
| reproducible builds, and even SHA checksums, the Python package
| ecosystem looks ridiculously immature, even childish.
|
| Seriously, can anyone here build a docker image for running these
| models on CUDA? I think right now it's borderline impossible, but
| I'd be happy to be corrected...
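|
| To be concrete about what I mean by "strongly versioned": something
| along these lines (illustrative versions, placeholder hashes), which
| pip can enforce but which none of these repos ship:
|
|   # requirements.txt, e.g. produced by pip-compile --generate-hashes
|   torch==2.0.0 \
|       --hash=sha256:<checksum>
|   sentencepiece==0.1.97 \
|       --hash=sha256:<checksum>
|   # pip install --require-hashes -r requirements.txt then rejects
|   # anything unpinned or with a mismatching hash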
| KETpXDDzR wrote:
| llama.cpp was easy to set up IMO
| rch wrote:
| Just use Nixpkgs already.
| throwaway6734 wrote:
| There's a Rust deep learning library called dfdx that just got
| LLaMA running: https://github.com/coreylowman/llama-dfdx
| Taek wrote:
| I have it running locally using the oobabooga webui. Setup was
| moderately annoying, but I'm definitely no Python expert and I
| didn't have too much trouble.
| int_19h wrote:
| All of these things exist in the Python package ecosystem, and
| are generally much more common outside of ML/DS stuff. The
| latter... well, it reminds me of coding in the early PHP days.
| Basically, anything goes so long as it works.
___________________________________________________________________
(page generated 2023-04-29 23:00 UTC)