[HN Gopher] Ollama now supports AMD graphics cards
___________________________________________________________________
Ollama now supports AMD graphics cards
Author : tosh
Score : 383 points
Date : 2024-03-15 17:47 UTC (5 hours ago)
(HTM) web link (ollama.com)
(TXT) w3m dump (ollama.com)
| sofixa wrote:
| This is great news. The more projects do this, the less of a moat
| CUDA is, and the less of a competitive advantage Nvidia has.
| anonymous-panda wrote:
| What does performance look like?
| sevagh wrote:
| Spoiler alert: not good enough to break CUDA's moat
| qeternity wrote:
| This is not CUDA's moat. That is on the R&D/training side.
|
| Inference side is partly about performance, but mostly
| about cost per token.
|
| And given that there has been a ton of standardization
| around LLaMA architectures, AMD/ROCm can target this much
| more easily, and still take a nice chunk of the inference
| market for non-SOTA models.
| bornfreddy wrote:
| Not sure why you're downvoted, but as far as I've heard AMD
| cards can't beat the 4090 - yet.
|
| Still, I think AMD will catch up to or overtake Nvidia in
| hardware soon, but software is a bigger problem. Hopefully
| the open-source strategy will pay off for them.
| arein3 wrote:
| Really hope so, maybe this time it will catch on and last.
|
| Usually when corps open source stuff to get adoption, they
| squeeze the adopters after they gain enough market share,
| and the cycle repeats again.
| nerdix wrote:
| An RTX 4090 is about twice the price of, and 50%-ish faster
| than, AMD's most expensive consumer card, so I'm not sure
| anyone really expects AMD's card to ever surpass a 4090.
|
| A 7900 XTX beating an RTX 4080 at inference is probably a
| more realistic goal, though I'm not sure how they compare
| right now.
| Zambyte wrote:
| I get 35tps on Mistral:7b-Instruct-Q6_K with my 6650 XT.
| ixaxaar wrote:
| Hey I did, and sorry for the self promo,
|
| Please check out https://github.com/geniusrise - tool for
| running llms and other stuff, behaves like docker compose,
| works with whatever is supported by underlying engines:
|
| Huggingface - MPS, CUDA
| vLLM - CUDA, ROCm
| llama.cpp, whisper.cpp - CUDA, MPS, ROCm
|
| Also coming up integration with spark (TorchDistributor), kafka
| and airflow.
| Symmetry wrote:
| Given the price of top-line Nvidia cards, if they can be had at
| all, there's got to be a lot of effort going on behind the scenes
| to improve AMD support in various places.
| reilly3000 wrote:
| I'm curious as to how they pulled this off. OpenCL isn't that
| common in the wild relative to CUDA. Hopefully it can become
| robust and widespread soon enough. I personally succumbed to the
| pressure and spent a relative fortune on a 4090 but wish I had
| some choice in the matter.
| Apofis wrote:
| I'm surprised they didn't speak about the implementation at
| all. Anyone got more intel?
| refulgentis wrote:
| They're open source and based on llama.cpp, so nothing's
| secret.
|
| My money, looking at nothing, would be on one of the two
| Vulkan backends added in Jan/Feb.
|
| I continue to be flummoxed by a mostly-programmer-forum
| treating ollama like a magical new commercial entity breaking
| new ground.
|
| It's a CLI wrapper around llama.cpp so you don't have to
| figure out how to compile it.
| washadjeffmad wrote:
| I tried it recently and couldn't figure out why it existed.
| It's just a very feature-limited app that doesn't require
| you to know anything or be able to read a model card to "do
| AI".
|
| And that more or less answered it.
| dartos wrote:
| It's because most devs nowadays are new devs and probably
| aren't very familiar with native compilation.
|
| So compiling the correct version of llama.cpp for their
| hardware is confusing.
|
| Compound that with everyone's relative inexperience with
| configuring any given model and you have prime grounds
| for a simple tool to exist.
|
| That's what ollama and their Modelfiles accomplish.
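|
| For anyone curious, a Modelfile is only a few lines. A minimal
| sketch (the base model, name, and parameter values here are
| illustrative, not a recommendation):
|
|     # build on a model already pulled from the registry
|     FROM mistral:7b-instruct
|     # bake sampling settings into the packaged model
|     PARAMETER temperature 0.7
|     PARAMETER num_ctx 4096
|     # ship a system prompt with it
|     SYSTEM You are a concise coding assistant.
|
| Then "ollama create mycoder -f Modelfile" packages it and
| "ollama run mycoder" serves it, with no compiler flags or
| GPU-specific build steps involved.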
| tracerbulletx wrote:
| It's just because it's convenient. I wrote a rich text
| editor front end for llama.cpp and I originally wrote a
| quick go web server with streaming using the go bindings,
| but now I just use ollama because it's just simpler and
| the workflow for pulling down models with their registry
| and packaging new ones in containers is simpler. Also
| most people who want to play around with local models
| aren't developers at all.
| mypalmike wrote:
| Eh, I've been building native code for decades and hit
| quite a few roadblocks trying to get llama.cpp building
| with cuda support on my Ubuntu box. Library version
| issues and such. Ended up down a rabbit hole related to
| codenames for the various Nvidia architectures... It's a
| project on hold for now.
|
| Weirdly, the Python bindings built without issue with
| pip.
| refulgentis wrote:
| Edited it out of my original comment because I didn't
| want to seem ranty/angry/like I have some personal
| vendetta, as opposed to just being extremely puzzled, but
| it legit took me months to realize it wasn't a GUI
| because of how it's discussed on HN, i.e. as key to
| democratizing, as a large, unique, entity, etc.
|
| Hadn't thought about it recently. After seeing it again
| here, and being gobsmacked by the # of genuine, earnest,
| comments assuming there's extensive independent
| development of large pieces going on in it, I'm going
| with:
|
| - "The puzzled feeling you have is simply because
| llama.cpp is a challenge on the best of days, you need to
| know a lot to get to fully accelerated on ye average
| MacBook. and technical users don't want a GUI for an LLM,
| they want a way to call an API, so that's why there isn't
| content extolling the virtues of GPT4All*. So TL;DR
| you're old and have been on computer too much :P"
|
| but I legit don't know and still can't figure it out.
|
| * picked them because they're the most recent example of
| a genuinely democratizing tool that goes far beyond
| llama.cpp and also makes large contributions back to
| llama.cpp, ex. GPT4All landed 1 of the 2 vulkan backends
| j33zusjuice wrote:
| Ahhhh, I see what you did there.
| harwoodr wrote:
| ROCm: https://github.com/ollama/ollama/commit/6c5ccb11f993ccc
| 88c47...
| skipants wrote:
| Another giveaway that it's ROCm is that it doesn't support
| the 5700 series...
|
| I'm really salty because I "upgraded" to a 5700XT from a
| Nvidia GTX 1070 and can't do AI on the GPU anymore, purely
| because the software is unsupported.
|
| But, as a dev, I suppose I should feel some empathy that
| there's probably some really difficult problem causing
| 5700XT to be unsupported by ROCm.
| JonChesterfield wrote:
| I wrote a bunch of OpenMP code on a 5700XT a couple of
| years ago; if you're building from source it'll probably
| run fine.
| programmarchy wrote:
| Apple killed off OpenCL for their platforms when they created
| Metal, which was disappointing. Sounds like ROCm will keep it
| alive but the fragmentation sucks. Gotta support CUDA, OpenCL,
| and Metal now to be cross-platform.
| jart wrote:
| What is OpenCL? AMD GPUs support CUDA. It's called HIP. You
| just need a bunch of #define statements like this:
|     #ifndef __HIP__
|     #include <cuda_fp16.h>
|     #include <cuda_runtime.h>
|     #else
|     #include <hip/hip_fp16.h>
|     #include <hip/hip_runtime.h>
|     #define cudaSuccess      hipSuccess
|     #define cudaStream_t     hipStream_t
|     #define cudaGetLastError hipGetLastError
|     #endif
|
| Then your CUDA code works on AMD.
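|
| A toy end-to-end sketch of the same idea (file name, kernel and
| the exact set of #defines are illustrative; error checking is
| omitted; on some setups hipcc may want -x hip or a renamed
| source file):
|
|     // saxpy.cu - builds with "nvcc saxpy.cu" or "hipcc saxpy.cu"
|     #include <cstdio>
|     #include <vector>
|     #ifndef __HIP__
|     #include <cuda_runtime.h>
|     #else
|     #include <hip/hip_runtime.h>
|     #define cudaMalloc             hipMalloc
|     #define cudaMemcpy             hipMemcpy
|     #define cudaMemcpyHostToDevice hipMemcpyHostToDevice
|     #define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
|     #define cudaDeviceSynchronize  hipDeviceSynchronize
|     #define cudaFree               hipFree
|     #endif
|
|     __global__ void saxpy(int n, float a, const float *x, float *y) {
|         int i = blockIdx.x * blockDim.x + threadIdx.x;
|         if (i < n) y[i] = a * x[i] + y[i];
|     }
|
|     int main() {
|         const int n = 1 << 20;
|         std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
|         float *dx, *dy;                       // device buffers
|         cudaMalloc((void **)&dx, n * sizeof(float));
|         cudaMalloc((void **)&dy, n * sizeof(float));
|         cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
|         cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);
|         saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
|         cudaDeviceSynchronize();
|         cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
|         printf("y[0] = %f\n", hy[0]);         // expect 4.0
|         cudaFree(dx);
|         cudaFree(dy);
|     }
|
| The same source then compiles for either vendor; the only fork
| is in the preprocessor.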
| jiggawatts wrote:
| Can you explain why nobody knows this trick, for some
| values of "nobody"?
| jart wrote:
| No idea. My best guess is their background is in graphics
| and games rather than machine learning. When CUDA is all
| you've ever known, you try just a little harder to find a
| way to keep using it elsewhere.
| wmf wrote:
| People know; it just hasn't been reliable.
| jart wrote:
| What's not reliable about it? On Linux hipcc is about as
| easy to use as gcc. On Windows it's a little janky
| because hipcc is a Perl script and there's no Perl
| interpreter, I'll admit. I'm otherwise happy with it
| though. It'd be nice if they had a shell script installer
| like NVIDIA, so I could use an OS that isn't a 2 year old
| Ubuntu. I own 2 XTX cards but I'm actually switching back
| to NVIDIA on my main workstation for that reason alone.
| GPUs shouldn't be choosing winners in the OS world. The
| lack of a profiler is also a source of frustration. I
| think the smart thing to do is to develop on NVIDIA and
| then distribute to AMD. I hope things change though and I
| plan to continue doing everything I can do to support AMD
| since I badly want to see more balance in this space.
| wmf wrote:
| The compilation toolchain may be reliable but then you
| get kernel panics at runtime.
| jart wrote:
| I've heard geohot is upset about that. I haven't tortured
| any of my AMD cards enough to run into that issue yet. Do
| you know how to make it happen?
| moffkalast wrote:
| OpenCL is as dead as OpenGL and the inference implementations
| that exist are very unperformant. The only real options are
| CUDA, ROCm, Vulkan and CPU. And Vulkan is a proper pain too,
| takes forever to build compute shaders and has to do so for each
| model. It only makes sense on Intel Arc since there's nothing
| else there.
| zozbot234 wrote:
| SYCL is a fairly direct successor to the OpenCL model and is
| not quite dead; Intel seems to be betting on it more than
| others.
| mpreda wrote:
| ROCm includes OpenCL. And it's a very performant OpenCL
| implementation.
| taminka wrote:
| Why though? Except for Apple, most vendors still actively
| support it, and newer versions of OpenCL are still released...
| karmakaze wrote:
| It would serve Nvidia right if their insistence on only running
| CUDA workloads on their hardware results in adoption of
| ROCm/OpenCL.
| KeplerBoy wrote:
| OpenCL is fine on Nvidia Hardware. Of course it's a second
| class citizen next to CUDA, but then again everything is a
| second class citizen on AMD hardware.
| aseipp wrote:
| You can use OpenCL just fine on Nvidia, but CUDA is just a
| superior compute programming model overall (both in features
| and design.) Pretty much every vendor offers something
| superior to OpenCL (HIP, OneAPI, etc), because it simply
| isn't very nice to use.
| karmakaze wrote:
| I suppose that's about right. The implementors are busy
| building on a path to profit and much less concerned about
| any sort-of lock-in or open standards--that comes much
| later in the cycle.
| shmerl wrote:
| Maybe Vulkan compute? But yeah, interesting to see how.
| deadalus wrote:
| I wish AMD did well in the Stable Diffusion front because AMD is
| never greedy on VRAM. The 4060 Ti 16GB (the minimum required
| for Stable Diffusion in 2024) starts at $450.
|
| AMD with ROCm is decent on Linux but pretty bad on Windows.
| choilive wrote:
| They bump up VRAM because they can't compete on raw compute.
| risho wrote:
| It doesn't matter how much compute you have if you don't have
| enough VRAM to run the model.
| Zambyte wrote:
| Exactly. My friend was telling me that I was making a
| mistake for getting a 7900 XTX to run language models, when
| the fact of the matter is the cheapest NVIDIA card with 24
| GB of VRAM is over 50% more expensive than the 7900 XTX.
| Running a high quality model at like 80 tps is way more
| important to me than running a way lower quality model at
| like 120 tps.
| wongarsu wrote:
| Or rather, Nvidia is purposefully restricting VRAM to avoid
| gaming cards cannibalizing their supremely profitable
| professional/server cards. AMD has no relevant server cards,
| so they have no reason to hold back on VRAM in consumer cards.
| api wrote:
| They lag on software a lot more than they lag on silicon.
| Adverblessly wrote:
| I run A1111, ComfyUI and kohya-ss on an AMD (6900XT which has
| 16GB, the minimum required for Stable Diffusion in 2024 ;)),
| though on Linux. Is it a Windows-specific issue for you?
|
| Edit to add: Though apparently I still don't run ollama on AMD
| since it seems to disagree with my setup.
| simon83 wrote:
| Does anyone know how the AMD consumer GPU support on Linux has
| been implemented? It must use something other than ROCm, I assume?
| Because ROCm only supports the 7900 XTX on Linux[1], while on
| Windows[2] support is from RX 6600 and upwards.
|
| [1]:
| https://rocblas.readthedocs.io/en/rocm-6.0.0/about/compatibi...
| [2]:
| https://rocblas.readthedocs.io/en/rocm-6.0.0/about/compatibi...
| Symmetry wrote:
| The newest release, 6.0.2, supports a number of other cards[1]
| and in general people are able to get a lot more cards to work
| than are officially supported. My 7900 XT worked on 6.0.0 for
| instance.
|
| [1]https://rocm.docs.amd.com/projects/install-on-
| linux/en/lates...
| bravetraveler wrote:
| I wouldn't read too much into support. It's more about
| business/warranty/promises than what the hardware can
| actually do.
|
| I've had a 6900XT since launch and this is the first I'm
| hearing _"unsupported"_, having played with ROCm plenty over
| the years with Fedora Linux.
|
| I think, at most, it's taken a couple of key environment
| variables.
| HarHarVeryFunny wrote:
| How hard would it be for AMD just to document the levels of
| support of different cards the way NVIDIA does with their
| "compute capability" numbers ?!
|
| I'm not sure what is worse from AMD - the ML software support
| they provide for their cards, or the utterly crap
| documentation.
|
| How about one page documenting AMD's software stack compared
| to NVIDIA, one page documenting what ML frameworks support
| AMD cards, and another documenting "compute capability" type
| numbers to define the capabilities of different cards.
| londons_explore wrote:
| It _almost_ looks like they're deliberately trying not to
| win any market share.
|
| It's as if the CEO is mates with Nvidia's CEO and has an
| unwritten agreement not to try too hard to upset the
| apple cart...
|
| Oh wait... They're cousins!
| KronisLV wrote:
| Feels like all of this local LLM stuff is definitely pushing
| people in the direction of getting new hardware, since nothing
| like RX 570/580 or other older cards sees support.
|
| On one hand, the hardware nowadays is better and more powerful,
| but on the other, the initial version of CUDA came out in 2007
| and ROCm in 2016. You'd think that compute on GPUs wouldn't
| require the latest cards.
| hugozap wrote:
| I'm a happy user of Mistral on my Mac Air M1.
| isoprophlex wrote:
| How many gbs of RAM do you have in your M1 machine?
| hugozap wrote:
| 8gb
| isoprophlex wrote:
| Thanks, wow, amazing that you can already run a small
| model with so little RAM. I need to buy a new laptop; I
| guess more than 16 GB on a MacBook isn't really needed.
| SparkyMcUnicorn wrote:
| I would advise getting as much RAM as you possibly can.
| You can't upgrade later, so get as much as you can
| afford.
|
| Mine is 64GB, and my memory pressure goes into the red
| when running a quantized 70B model with a dozen Chrome
| tabs open.
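|
| (Back-of-the-envelope: a 4-bit quant of a 70B model is roughly
| 70B params x 0.5 bytes, i.e. 35-40 GB of weights before you add
| the KV cache and everything else on the machine, so even 64GB
| gets tight.)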
| dartos wrote:
| Mistral is _very_ small when quantized.
|
| I'd still go with 16 GB.
| TylerE wrote:
| I've run LLMs and some of the various image models on my
| M1 Studio 32GB without issue. Not as fast as my old 3080
| card, but considering the Mac all-in has about a 5th the
| power draw, it's a _lot_ closer than I expected. I'm not
| sure of the exact details but there is clearly some
| secret sauce that allows it to leverage the onboard NN
| hardware.
| evilduck wrote:
| I use several LLM models locally for chat UIs and IDE
| autocompletions like copilot (continue.dev).
|
| Between Teams, Chrome, VS Code, Outlook, and now LLMs my
| RAM usage sits around 20-22GB. 16GB will be a bottleneck
| to utility.
| jonplackett wrote:
| Is it easy to set this up?
| LoganDark wrote:
| Super easy. You can just head down to https://lmstudio.ai
| and pick up an app that lets you play around. It's not
| particularly advanced, but it works pretty well.
|
| It's mostly optimized for M-series silicon, but it also
| technically works on Windows, and isn't too difficult to
| trick into working on Linux either.
| glial wrote:
| Also, https://jan.ai is open source and worth trying out
| too.
| LoganDark wrote:
| Looks super cool, though it seems to be missing a good
| chunk of features, like the ability to change the prompt
| format. (Just installed it myself to check out all the
| options.) All the other missing stuff I can see though is
| stuff that LM Studio doesn't have either (such as a
| notebook mode). If it has a good chat mode then that's
| good enough for most!
| hugozap wrote:
| It is; it doesn't require any setup.
|
| After installation:
|
| > ollama run mistral:latest
| mysteria wrote:
| Ollama's backend, llama.cpp, definitely supports those older
| cards with its OpenCL and Vulkan backends, though performance
| is worse than with ROCm or CUDA. In their Vulkan thread, for
| instance,
| I see people getting it working with Polaris and even Hawaii
| cards.
|
| https://github.com/ggerganov/llama.cpp/pull/2059
|
| Personally I just run it on CPU and several tokens/s is good
| enough for my purposes.
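|
| (For reference, picking a backend is a compile-time switch in
| llama.cpp; the Makefile flag names below are as of early 2024
| and do get renamed, so check the current README:
|
|     make LLAMA_VULKAN=1     # Vulkan, reaches older cards like Polaris
|     make LLAMA_HIPBLAS=1    # ROCm/HIP
|     make LLAMA_CUBLAS=1     # CUDA
|
| with GPU offload then controlled at run time via
| --n-gpu-layers.)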
| bradley13 wrote:
| No new hardware needed. I was shocked that Mixtral runs well on
| my laptop, which has a so-so mobile GPU. Mixtral isn't hugely
| fast, but definitely good enough!
| superkuh wrote:
| llama.cpp added first-class support for the RX 580 by
| implementing the Vulkan backend. There are some issues in older
| kernel amdgpu code (5.x kernels) where an LLM process's VRAM is
| never reloaded if it gets kicked out to GTT, but overall it's
| much faster than the CLBlast OpenCL implementation.
| jmorgan wrote:
| The compatibility matrix is quite complex for both AMD and
| NVIDIA graphics cards, and I completely agree: there is a lot of
| work to do, but the hope is to gracefully fall back to older
| cards... they still speed up inference quite a bit when they do
| work!
| Zambyte wrote:
| It's pretty funny to see this blog post, when I have been running
| Ollama on my AMD RX 6650 for weeks :D
|
| They have shipped ROCm containers since 0.1.27 (21 days ago).
| This blog post seems to be published along with the latest
| release, 0.1.29. I wonder what they actually changed in this
| release with regards to AMD support.
|
| Also: see this issue[0] that I made where I worked through
| running Ollama on an AMD card that they don't "officially"
| support yet. It's just a matter of setting an environment
| variable.
|
| [0] https://github.com/ollama/ollama/issues/2870
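|
| (Concretely, the usual workaround is ROCm's GFX-version
| override so the card is treated as a nearby supported target,
| along the lines of:
|
|     HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
|
| where the exact value depends on the card's gfx target; see the
| linked issue for details.)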
|
| Edit: I did notice one change: the starcoder2[1] model works
| now. Before, it would crash[2].
|
| [1] https://ollama.com/library/starcoder2
|
| [2] https://github.com/ollama/ollama/issues/2953
| throwaway5959 wrote:
| I mean, it was 21 days ago. What's the difference?
| yjftsjthsd-h wrote:
| 2 versions, apparently
| mchiang wrote:
| While the PRs went in slightly earlier, much of the time was
| spent on testing the integrations, and working with AMD
| directly to resolve issues.
|
| There were issues that we resolved prior to cutting the
| release, and many reported by the community as well.
| SushiHippie wrote:
| Can anyone (maybe ollama contributors) explain to me the
| relationship between llama.cpp and ollama?
|
| I always thought that ollama was basically just a wrapper (i.e.
| not much changes to inference code, and only built on top) around
| llama.cpp, but this makes it seem like it is more than that?
| Zambyte wrote:
| llama.cpp also supports AMD cards:
| https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi...
| renewiltord wrote:
| Wow, that's a huge feature. Thank you, guys. By the way, does
| anyone have a preferred case where they can put 4 AMD 7900XTX?
| There are a lot of motherboards and CPUs that support 128 lanes.
| It's the physical arrangement that I have trouble with.
| nottorp wrote:
| Used crypto mining parts not available any more?
| duskwuff wrote:
| Crypto mining didn't require significant bandwidth to the
| card. Mining-oriented motherboards typically only provisioned
| a single lane of PCIe to each card, and often used anemic
| host CPUs (like Celeron embedded parts).
| renewiltord wrote:
| Exactly. They'd use PCIe x1 to PCIe x16 risers with power
| adapters. These setups, though, require high bandwidth.
| nottorp wrote:
| Oh. Shows I wasn't into that.
|
| I did once work with a crypto case, but yes, it was one
| motherboard with a lot of WiFi cards, and we still didn't need
| the PCIe lanes.
| kodarna wrote:
| I wonder why they aren't supporting the RX 6750 XT and lower yet.
| Are there architectural differences between these and the RX 6800+?
| Zambyte wrote:
| They don't support it, but it works if you set an environment
| variable.
|
| https://github.com/ollama/ollama/issues/2870#issuecomment-19...
| slavik81 wrote:
| Those are Navi 22/23/24 GPUs while the RX 6800+ GPUs are Navi
| 21. They have different ISAs... however, the ISAs are identical
| in all but name.
|
| LLVM has recently introduced a unified ISA for all RDNA 2 GPUs
| (gfx10.3-generic), so the need for the environment variable
| workaround mentioned in the other comment should eventually
| disappear.
| latchkey wrote:
| If anyone wants to run some benchmarks on MI300x, ping me.
| freedomben wrote:
| I'm thrilled to see support for RX 6800/6800 XT / 6900 XT. I
| bought one of those for an outrageous amount during the post-
| covid shortage in hopes that I could use it for ML stuff, and
| thus far it hasn't been very successful, which is a shame because
| it's a beast of a card!
|
| Many thanks to the Ollama project and llama.cpp!
| mey wrote:
| Sad to see that the cutoff is just after the 6700 XT, which is
| what is in my desktop. They indicate more devices are coming;
| hopefully that includes some of the more modern all-in-one
| chips with RDNA 2/3 from AMD as well.
| mey wrote:
| It appears that the cutoff lines up with HIP SDK support
| from AMD, https://rocm.docs.amd.com/projects/install-on-
| windows/en/lat...
| rcarmo wrote:
| Curious to see if this will work on APUs. Have a 7840HS to test,
| will give it a go ASAP.
| airocker wrote:
| I heard "Nvidia for LLM today is similar to how Sun Microsystems
| was for the web"
| api wrote:
| ... for a very brief period of time until Linux servers and
| other options caught up.
| rahimnathwani wrote:
| Here is the commit that added ROCm support to llama.cpp back in
| August:
|
| https://github.com/ggerganov/llama.cpp/commit/6bbc598a632560...
| tarruda wrote:
| Does this work with integrated Radeon graphics? If so it might be
| worth getting one of those Ryzen mini PCs to use as a local LLM
| server.
| stebalien wrote:
| Not yet: https://github.com/ollama/ollama/issues/2637
| eclectic29 wrote:
| I'm not sure why Ollama garners so much attention. It has limited
| value - it's only used for experimenting with models and can't
| serve more than one model at a time. It's not meant for production
| deployments. Granted, it makes the experimentation process
| super easy, but for something that relies on llama.cpp completely
| and whose main value proposition is easy model management, I'm not
| sure it deserves the brouhaha people are giving it.
|
| Edit: what do you do after the initial experimentation? You need
| to eventually deploy these models to production. I'm not even
| talking about giving credit to llama.cpp, just mentioning that
| this product is gaining disproportionate attention and kudos
| compared to the value it delivers. Not denying that it's a great
| product.
| reustle wrote:
| Even for just running a model locally, Ollama provided a much
| simpler "one click install" earlier than most tools. That in
| itself is worth the support.
| Aka456 wrote:
| Koboldcpp is also very, very good: plug and play, nice UI,
| nice API, Vulkan accelerated, and it has an AMD fork...
| crooked-v wrote:
| > Granted that it makes the experimentation process super easy
|
| That's the answer to your question. It may have less space than
| a Zune, but the average person doesn't care about technically
| superior alternatives that are much harder to use.
| davidhariri wrote:
| As it turns out, making it faster and better to manage things
| tends to get people's attention. I think it's well deserved.
| vikramkr wrote:
| It's nice for personal use, which is what I think it was built
| for, and has some nice frontend options too. The tooling around
| it is nice, and there are projects building in RAG etc. I don't
| think people are intending to deploy services through
| these tools.
| andrewstuart wrote:
| Sounds like you are dismissing Ollama as a "toy".
|
| Refer:
|
| https://paulgraham.com/startupideas.html
| nerdix wrote:
| The answer to your question is:
|
| ollama run mixtral
|
| That's it. You're running a local LLM. I have no clue how to
| run llama.cpp
|
| I got Stable Diffusion running and I wish there was something
| like ollama for it. It was painful.
| cbhl wrote:
| In my opinion, pre-built binaries and an easy-to-use front-end
| are things that should exist and are valid as a separate
| project unto themselves (see, e.g., HandBrake vs ffmpeg).
|
| Using the name of the authors or the project you're building on
| can also read like an endorsement, which is not _necessarily_
| desirable for the original authors (it can lead to ollama bugs
| being reported against llama.cpp instead of to the ollama devs
| and other forms of support request toil). Consider the third
| clause of BSD 3-Clause for an example used in other projects
| (although llama.cpp is licensed under MIT).
| airocker wrote:
| Running more than one is easy: put them behind a load balancer,
| with each Ollama instance in its own container or on its own port.
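|
| Something along these lines (ports and container names are
| illustrative; there is also a ROCm image for AMD cards, as
| mentioned upthread):
|
|     docker run -d -p 11434:11434 --name ollama-a ollama/ollama
|     docker run -d -p 11435:11434 --name ollama-b ollama/ollama
|
| and then round-robin the two ports behind nginx or haproxy.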
| Karrot_Kream wrote:
| I mean, it takes something difficult like an LLM and makes it
| easy to run. It's bound to get attention. If you've tried to
| get other models, like BERT-based models, to run, you'll realize
| just how big the usability gains of running Ollama are compared
| to anything else in the space.
|
| If the question you're asking is why so many folks are focused
| on experimentation instead of productionizing these models,
| then I see where you're coming from. There's the question of
| how much LLMs are actually being used in prod scenarios right
| now as opposed to just excited people chucking things at them;
| that maybe LLMs are more just fun playthings than tools for
| production. But in my experience as HN has gotten bigger, the
| number of posters talking about productionizing anything has
| really gone down. I suspect the userbase has become more
| broadly "interested in software" rather than "ships production
| facing code" and the enthusiasm in these comments reflects
| those interests.
|
| FWIW we use some LLMs in production and we do _not_ use ollama
| at all. Our prod story is very different than what folks are
| talking about here, and I'd love to have a thread that focuses
| more on language model prod deployments.
| jart wrote:
| Well you would be one of the few hundred people on the planet
| doing that. With local LLMs we're just trying to create a way
| for everyone else to use AI that doesn't require sharing all
| their data with them. The first thing everyone asks for, of
| course, is how to turn open source local LLMs into their own
| online service.
| evilduck wrote:
| I am 100% uninterested in your production deployment of rent
| seeking behavior for tools and models I can run myself. Ollama
| empowers me to do more of that easier. That's why it's popular.
| brucethemoose2 wrote:
| Interestingly, Ollama is not popular at all in the "localllama"
| community (which also extends to related discords and repos).
|
| And I think that's because of capabilities... Ollama is somewhat
| restrictive compared to other frontends. I have a litany of
| reasons I personally wouldn't run it over exui or koboldcpp,
| both for performance and output quality.
|
| This is a necessity of being stable and one-click though.
| elwebmaster wrote:
| You are not making any sense. I am running ollama and Open
| WebUI (which takes care of auth) in production.
| observationist wrote:
| There's a thing somewhat conspicuous in its absence - why isn't
| llama.cpp more directly credited and thanked for providing the
| base technology powering this tool?
|
| All the other cool "run local" software seems to have the
| appropriate level of credit. You can find llama.cpp references in
| the code, being set up in a kind of "as is" fashion such that it
| might be OK as far as MIT licensing goes, but it seems kind of
| petty to have no shout out or thank you anywhere in the
| repository or blog or ollama website.
|
| GPT4All - https://gpt4all.io/index.html
|
| LM Studio - https://lmstudio.ai/
|
| Both of these projects credit and attribute appropriately; Ollama
| seems to bend over backwards so they don't have to?
| jart wrote:
| ollama has made a lot of nice contributions of their own. It's
| a good look to give a hat tip to the great work llama.cpp is
| also doing, but they're strictly speaking not required to do
| that in their advertising any more than llama.cpp is required
| to give credit to Google Brain, and I think that's because
| llama.cpp has pulled off tricks in the execution that Brain
| never could have accomplished, just as ollama has had great
| success focusing on things that wouldn't make sense for
| llama.cpp. Besides, everyone who wants to know what's up can
| read the source code, research papers, etc. then make their own
| judgements about who's who. It's all in the open.
| dheera wrote:
| > why isn't llama.cpp more directly credited
|
| Why don't people credit Bjarne Stroustrup every time a piece of
| .cpp code is released?
| observationist wrote:
| Honestly, they probably should keep a little Bjarne shrine on
| their desk and light a little votive candle every time they
| release to production.
| ronsor wrote:
| Realistically that's the only way to reduce the amount of
| bugs in your C++ code.
| jon_richards wrote:
| Do you want machine spirits? Because that's how you get
| machine spirits.
| jjice wrote:
| Just downloaded this and gave it a go. I have no experience with
| running any local models, but this just worked out of the box on
| my 7600 on Ubuntu 22. This is fantastic.
___________________________________________________________________
(page generated 2024-03-15 23:00 UTC)