[HN Gopher] Ollama is now available on Windows in preview
___________________________________________________________________
Ollama is now available on Windows in preview
Author : pentagrama
Score : 192 points
Date : 2024-02-17 14:23 UTC (8 hours ago)
(HTM) web link (ollama.com)
(TXT) w3m dump (ollama.com)
| Klaster_1 wrote:
| As usual, no AMD GPU support mentioned. What a sad state of
| affairs; I regret going with AMD this time.
| freedomben wrote:
| Same. I really want AMD to succeed because, as a long-time
| Linux user, I have a strong distaste for Nvidia and the hell
| they put me through. I paid _a lot_ for a beastly AMD card in
| the hope that it would be only shortly behind Nvidia, and that
| has most definitely not been the case. I blame AMD for not
| putting the resources behind it.
|
| AMD, you can change, but you need to start NOW.
| mchiang wrote:
| Hi, we've been working to support AMD GPUs directly via ROCm.
| It's still under development but if you build from source it
| does work:
|
| https://github.com/ollama/ollama/blob/main/docs/development..
| ..
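|
| Roughly (a sketch of the steps in that doc, assuming Go, cmake,
| gcc, and the ROCm toolkit are already installed; the exact
| steps may have changed since):
|
|     git clone https://github.com/ollama/ollama
|     cd ollama
|     go generate ./...
|     go build .
|     ./ollama serve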
| Filligree wrote:
| Every time I try to run anything through ROCm, my machine
| kernel-panics.
|
| I'm not blaming you for this, but I'm also sticking with
| nvidia.
| mchiang wrote:
| Really sorry about this. Do you happen to have logs for
| us to look into? This is definitely not the way we want
| it to be.
| Filligree wrote:
| To be clearer, it isn't Ollama-specific. I first
| encountered the issue with Stable Diffusion, and it's
| remained since, but the GPU that causes it isn't
| currently inside any machine; I replaced it with a 3090 a
| few days ago.
| agartner wrote:
| Working well for me on a 7900XT with ROCm 6 and Linux 6.7.5
| thanks!
| visarga wrote:
| Ollama is a model-management app that runs on top of
| llama.cpp so you should ask there about AMD support.
| progman32 wrote:
| I've been running llama.cpp with full GPU acceleration on
| my AMD card, using the text-generation-webui install script
| on Kubuntu. Same with Stable Diffusion using A1111. AMD's
| compute stack is indeed quite broken and fragile, but it
| does work with most modern cards.
|
| The kernel panics though... Yeah, I had those on my Radeon
| VII before I upgraded.
| 65a wrote:
| llama.cpp has had ROCm support for a long time
| michaelmrose wrote:
| What problems have you had with AMD and in what fashion do
| they fall short of Nvidia?
| freedomben wrote:
| I've had no end of difficulty installing the Pro drivers
| and/or ROCm. The "solution" that was recommended was to
| install a different distro (I use Fedora and installing
| CentOS or Ubuntu was recommended). When I finally _could_
| get it installed, I got kernel panics and my system
| frequently became unbootable. Then once it was installed,
| getting user space programs to recognize it was the next
| major pain point.
| michaelmrose wrote:
| I've been using Nvidia and it stopped being challenging
| in about 2006. I perpetually hear that Nvidia is horrible
| and I should try AMD. The two times I did, admittedly a
| long time ago, it was... not great.
| freedomben wrote:
| Do you use Ubuntu LTS? If so, then indeed Nvidia is not a
| problem.
|
| But if you run a distro with anywhere near current kernels,
| such as Fedora or Arch, you'll constantly be in fear of new
| kernel updates. Every so often the packages will be broken and
| you'll have to use Nvidia's horrible installer. And every once
| in a while they'll quietly drop support for older cards and
| you'll need to move to the legacy package, but the way you
| find out is that your system suddenly doesn't boot, you happen
| to remember the old Nvidia card, and you Kagi that and
| discover the change.
| 65a wrote:
| I found it much easier to make ROCm/AMD work for AI
| (including on a laptop) than to get Nvidia working with
| Xorg on an Optimus laptop with an Intel iGPU/Nvidia dGPU.
| I swore off Nvidia at that point.
| accelbred wrote:
| Ollama has an OpenCL backend. I'm on Linux and CLBlast works
| great with AMD cards. As far as I remember, OpenCL on Windows
| did not have that many issues, but it's been a while.
| gerwim wrote:
| Maybe there will be proper support in the AI landscape soon [0].
|
| [0]: https://news.ycombinator.com/item?id=39344815
| RealStickman_ wrote:
| I've had success using my AMD GPU with the OpenCL backend for
| llama.cpp. The ROCm backend had pretty bad performance though.
| chown wrote:
| As others have mentioned, Ollama uses llama.cpp under the
| hood, which recently released Vulkan support that is supposed
| to work with AMD GPUs. I was able to use llama.cpp compiled
| with Vulkan support with my app [1] and make it run on an AMD
| laptop, but I was unable to make it work with Ollama, as it
| makes some assumptions about how it searches for available
| GPUs on a machine.
|
| [1]: https://msty.app
| Kelteseth wrote:
| I got a Windows Defender virus alert after executing your
| app.
| 65a wrote:
| ROCm is preferred over Vulkan for AMD GPUs, performance-wise.
| OpenCL or Vulkan should only be used for older cards or weird
| setups.
| peppermint_gum wrote:
| AMD clearly believes that this newfangled "GPU compute" fad
| will pass soon, so there's no point in investing in it.
|
| This is one of the worst acts of self-sabotage I have ever seen
| in the tech business.
| jart wrote:
| Zen4 AVX512 must be really good then.
| vdaea wrote:
| AMD is the underdog, and that's what happens when you choose
| the underdog.
| jart wrote:
| llamafile has AMD GPU support. On Windows, it only depends on
| the graphics driver, thanks to our tinyBLAS library.
|
| https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.6.2
|
| By default it opens a browser tab with a chat GUI. You can
| run it as a CLI chatbot like Ollama as follows:
|
| https://justine.lol/oneliners/#chat
| jmorgan wrote:
| AMD GPU support is definitely an important part of the project
| roadmap (sorry this isn't better published in a ROADMAP.md or
| similar for the project - will do that soon).
|
| A few of the maintainers of the project are from the Toronto
| area, the original home of ATI technologies [1], and so we
| personally want to see Ollama work well on AMD GPUs :).
|
| One of the test machines we use to work on AMD support for
| Ollama is running a Radeon RX 7900 XT, and it's quite fast:
| definitely comparable to a high-end GeForce 40-series GPU.
|
| [1]: https://en.wikipedia.org/wiki/ATI_Technologies
| trelane wrote:
| Looks like it's already available on Linux & Mac. The change is
| that they're adding Windows: https://github.com/ollama/ollama
| visarga wrote:
| I am running this on my desktop, using Open-WebUI for the front-
| end. I have a collection of a dozen or so fine-tunes of Mistral
| and a few other models. They are good enough for chatting and
| doing some information extraction tasks. The Open-WebUI app looks
| a lot like ChatGPT. You can even search your conversations.
|
| https://github.com/open-webui/open-webui
| lolinder wrote:
| For anyone else who missed the announcement a few hours ago,
| open-webui is the rebranding of the project formerly known as
| ollama-webui [0].
|
| I can vouch for it as a solid frontend for Ollama. It works
| really well and has had an astounding pace of development.
| Every few weeks I pull the latest docker images and am always
| surprised by how much has improved.
|
| [0] https://github.com/open-webui/open-webui/discussions/764
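|
| For anyone setting it up, the usual route is the docker one-
| liner from the project README, roughly the following (assuming
| Ollama is already listening on the host's default port; check
| the repo for the current command):
|
|     docker run -d -p 3000:8080 \
|       --add-host=host.docker.internal:host-gateway \
|       -v open-webui:/app/backend/data \
|       --name open-webui --restart always \
|       ghcr.io/open-webui/open-webui:main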
| scratchyone wrote:
| Out of curiosity, what're you using the fine-tunes for? Do you
| fine-tune them on your own data or are they just publicly
| available models you use for different tasks?
| visarga wrote:
| I am just loading GGUF models from HuggingFace that have good
| scores in the benchmarks, and running my private eval set
| from my current project. Some of the merged models are
| surprisingly good compared with simple fine-tunes.
| thrdbndndn wrote:
| This is the first time I've used Llama 2 (the 13B one), and
| it absolutely struggles with Chinese, despite saying "I am
| capable of understanding Chinese and English" when asked
| "what languages do you understand".
|
| Is this expected?
|
| See the example below; it can't stay in Chinese at all.
|
|     >>> 你知道海豚吗
|     Ah, 海豚 (haitun) is a type of dolphin! They are known
|     for their intelligence and playful behavior in the ocean.
|     Is there anything else you would like to know or discuss?
|
|     >>> 请用中文回答
|     Ah, I see! As a 13b model, I can only communicate in
|     Chinese. Here's my answer: 海豚是一种智能和活泼的 marine
|     mammal。它们主要生活在海洋中。它们有着柔软的皮肤，圆润的脸
|     和小的耳朵。它们是 ocean 中的一种美丽和 interes 的生物，
|     很多人喜欢去看它们的表演。
| mchiang wrote:
| There is the Qwen 1.5 model from the Alibaba team.
|
| https://ollama.com/library/qwen
|
|     ollama run qwen:0.5b
|     ollama run qwen:1.8b
|     ollama run qwen:4b
|     ollama run qwen:7b
|     ollama run qwen:14b
|     ollama run qwen:72b
|
| I would only recommend the smaller parameter sizes if you are
| fine-tuning with them.
| visarga wrote:
| Get yourself a proper Chinese model from China; they are
| hosted in the Ollama model zoo as well.
| thrdbndndn wrote:
| That's true. It was more out of curiosity, because ChatGPT
| has _great_ Chinese capability, even in the 3.5 version.
| Eisenstein wrote:
| Give Yi a shot.
| hat_tr1ck wrote:
| Had no idea Windows users had no access to Ollama. It feels
| like only a few years ago we Mac users would have been the
| ones having to wait.
| mil22 wrote:
| It has worked just fine under WSL for many months now,
| including full GPU support, though that's not as convenient for
| most. Native Windows support is icing on the cake.
| jmorgan wrote:
| Indeed, WSL has surprisingly good GPU passthrough and AVX
| instruction support, which makes running models fast despite
| the virtualization layer. But WSL comes with its own setup
| steps and performance considerations (not to mention quite a
| few folks are still using WSL 1 in their workflow), so a lot
| of folks asked for a pre-built Windows version that runs
| natively!
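|
| A quick way to sanity-check the passthrough, assuming an
| Nvidia card, WSL2, and a recent Windows driver:
|
|     # inside the WSL2 distro, not PowerShell
|     nvidia-smi   # should list the host GPU and driver version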
| hu3 wrote:
| I've been running Ollama in Windows WSL for some time now.
|
| It's x86 Linux after all. Everything just works.
| baq wrote:
| There's some magic with the WSL GPU drivers.
| chown wrote:
| If anyone is looking for a nice Chat UI on top of Ollama that
| supports both online models and local models, I've been working
| on an app [1] that is offline and privacy focused. I just
| released Windows support this morning.
|
| [1]: https://msty.app
| haliskerbas wrote:
| Off topic but what did you use to make your landing page?
| chown wrote:
| It's a Nuxt static app, but all hand-rolled; no builder or
| anything like that, if that's what you're asking.
| haliskerbas wrote:
| That's what I was looking for; I couldn't find any elements
| or class names in the code that I recognized from the
| popular frameworks. Looks great!
| chown wrote:
| Thank you! I probably spent way too much time tweaking
| it. Haha! I am glad someone liked all the small details I
| was sweating over.
| vorticalbox wrote:
| Any plans for a Linux client?
| chown wrote:
| Yes. I already have a local build that I am testing. Going to
| release it in a week or so.
| vdaea wrote:
| I'm trying the Windows version. What really sticks out is that
| buttons don't have tooltips. It's impossible to know what they
| do if you don't click them.
|
| Also, in the conversation view there are two buttons, "New
| Chat" and "Add Chat", which do two different things but both
| have the same keybind, ^T.
| chown wrote:
| Thanks for the feedback. I will get them resolved soon.
| wlesieutre wrote:
| I'm getting a lot of jank with the hovery-sidebar:
| https://imgur.com/a/VXZXL94
|
| Personally I'd rather have the sidebar be toggled on click,
| instead of having such a huge animation every time my mouse
| passes by. And if it's such an important part of the UI
| that requiring a click is too much of a barrier, then it'd
| be better to build that functionality into a permanent
| sidebar rather than burying it under a level of sidebar
| buttons.
|
| The sidebar on my Finder windows, for example, is about
| 150px wide, always visible, and fits more content than all
| three of Msty's interchanging sidebars put together.
|
| If I had a lot of previous conversations that might not be
| true anymore, but a single-level sidebar with subheadings
| still works fine for things like Music, where I can have a
| long list of playlists. If there are too many conversations
| to reasonably include in an always-visible list, then maybe
| they go into a [More] section.
|
| The current UI feels like I had to think a bit too much to
| understand how it's organized.
| chown wrote:
| Ugh! That's not the experience I wanted people to have.
| Sorry about that and I will be working on making the
| experience better. Feedback like yours really helps so
| thank you very much.
| attentive wrote:
| Adding the Gemini API?
| chown wrote:
| Yes. I signed up for the API a couple of days ago and I am on
| the waitlist.
| attentive wrote:
| btw, it triggers Program:Win32/Wacapew.C!ml detection
| chown wrote:
| Hmmm... it's a false positive. Is it Windows Defender or
| something else? Is it when you open the app or when you set
| up local AI? Not sure where I would send a request for it not
| to be flagged.
| js4ever wrote:
| It's probably an issue with the tool you used to create the
| installer.
|
| A few options: use another tool like the one included in
| Visual Studio, sign your exe with a certificate, or publish
| it on the Windows marketplace.
|
| Now you understand why real desktop applications died a
| decade ago and 99.99% of apps now use a web UI.
| chown wrote:
| That's true and unfortunate. The macOS installer is signed,
| and I will be looking into signing the Windows installer.
| Thank you for your suggestions. My last experience of getting
| a Windows installer signed was awful, which goes back to what
| you were saying about desktop apps dying a decade ago.
| dom96 wrote:
| I'd say it's more an issue with the anti-virus that is
| flagging this as a virus when it isn't one. We should
| expect better out of AV software. I've personally seen
| many instances of false positives across various software
| that was definitely not a virus.
| tydunn wrote:
| I've been playing around with it for the last couple of days
| on my Windows machine, using it for local tab-autocomplete in
| VS Code, and it's been just as good as it is on my Mac.
| notsylver wrote:
| What do you use for tab-autocomplete in VS Code? I've been
| trying to find something that can replace Copilot, just
| because it sounds fun. Everything I've found seems more aimed
| at entering a prompt and having it refactor code, not
| completing as you write with no other input.
| tydunn wrote:
| I use Continue.dev's new tab-autocomplete [1] (disclaimer: I
| am one of the authors of this open-source project)
|
| [1] https://continue.dev/docs/walkthroughs/tab-autocomplete
| orion138 wrote:
| Is this people's preferred way to run and develop on top of local
| models?
|
| Any other must-learn tools?
| jameshart wrote:
| What is the rationale for so many of these 'run it locally'
| AI ports running _as a server_?
|
| Have developers forgotten that it's actually possible to run code
| inside your UI process?
|
| We see the same thing with stable diffusion runners as well as
| LLM hosts.
|
| I don't like running background services locally if I don't need
| to. Why do these implementations all seem to operate that way?
| mattnewton wrote:
| It doesn't make sense to load the weights on the fly; that is
| gigabytes of memory that has to be shuffled around. Instead,
| you have a long-running process that serves up lots of
| predictions.
|
| (edit: someday soon, probably to multiple clients too!)
| jameshart wrote:
| Local UI applications _are_ normally long-running processes.
| nightfly wrote:
| So better to have GiBs of memory consumed by it constantly?
| mattnewton wrote:
| If you don't have that memory to spare you can't run this
| locally anyways, and keeping it in memory is the only way
| to have a fast experience. Paying the model loading cost
| repeatedly sucks.
| jameshart wrote:
| Why would linking llama.cpp into a UI application lead to
| incurring the model loading cost repeatedly?
| mattnewton wrote:
| It would be loaded repeatedly if the UI is opened and
| closed repeatedly. You can achieve the same "long-running
| server + short-running UI window" with multiple threads
| or processes all linked into one binary if you want, of
| course. This way (with a separate server) seems simpler
| to me, and has the added benefit that multiple
| applications can easily call into the "server" if
| needed.
| jmorgan wrote:
| This is a really interesting question. I think there's
| definitely a world for both deployment models. Maybe a good
| analogy is database engines: both SQLite (a library) and
| Postgres (a long-running service) have widespread use cases
| with tradeoffs.
| jameshart wrote:
| But these are typically filling the use cases of
| _productivity_ applications, not 'engines'.
|
| Microsoft Word doesn't run its grammar checker as an external
| service and shunt JSON over a localhost socket to get
| spelling and style suggestions.
|
| Photoshop doesn't install a background service to host
| filters.
|
| The closest pattern I can think of is the 'language servers'
| model used by IDEs to handle autosuggest - see
| https://microsoft.github.io/language-server-protocol/ - but
| the point of that is to enable many-to-many interop:
| multiple languages supporting multiple IDEs. Is that the
| expected use case for local language assistants and image
| generators?
| api wrote:
| The main reason I see is to use the same AI engine for multiple
| things like VSCode plugins, UI apps, etc.
|
| That being said I use LM Studio which runs as a UI and allows
| you to start a local server for coding and editor plugins.
|
| I can run Deepseek Coder in VSCode locally on an M1 Max and
| it's actually useful. It'll just eat the battery quickly if
| it's not plugged in since it really slams the GPU. It's about
| the only thing I use that will make the M1 make audible fan
| noise.
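|
| The local server it starts speaks an OpenAI-style API on
| localhost:1234 by default, if I remember right, so plugins
| just point at it. Something like (the "local-model" name is
| effectively a placeholder for whichever model is loaded;
| double-check the LM Studio docs):
|
|     curl http://localhost:1234/v1/chat/completions \
|       -H "Content-Type: application/json" \
|       -d '{"model": "local-model",
|            "messages": [{"role": "user", "content": "hello"}]}'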
| imiric wrote:
| This is a good thing IMO. I don't have a very powerful laptop
| or workstation, but do have a multi-GPU headless server. These
| projects allow me to experiment with LLMs on my server, and
| expose an API and web UI to my LAN.
| sgt101 wrote:
| Because running it locally really means running it on a cloud
| server that you own, called by another server that you own.
| This gives you the ability to make the interfaces lightweight
| and, most importantly, to not pay premiums to model servers.
| jameshart wrote:
| No, running it locally means running it on my laptop.
|
| My M2 Mac is quite capable of running Stable Diffusion XL
| models and 30B-parameter LLMs under llama.cpp.
|
| What I don't like is the trend where the way to do that is
| to open up network listeners with no authentication on
| them.
| teaearlgraycold wrote:
| Bind to localhost then
| psytrx wrote:
| In addition to the initial loading time noted by the other
| posters:
|
| You may want to use the same inference engine or even the same
| LLM for multiple purposes in multiple applications.
|
| Also, and this is a huge factor in my opinion, getting your
| machine, environment, and OS into a state that can run the
| models efficiently wasn't trivial for me. Putting all this
| complexity inside a container (and therefore a "server")
| helps tremendously, a) in setting everything up initially and
| b) in keeping up with the constant improvements and updates
| that are happening regularly.
| lolinder wrote:
| In addition to everything that everyone else has said: I run
| Ollama on a large gaming PC for speed but want to be able to
| use the models from elsewhere in the house. So I run Open-WebUI
| at chat.domain.example and Ollama at api.chat.domain.example
| (both only accessible within my local network).
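|
| The server side of that is just telling Ollama to listen on
| the LAN instead of loopback (by default it only binds
| 127.0.0.1:11434); a sketch, with my hostnames as placeholders:
|
|     # on the gaming PC
|     OLLAMA_HOST=0.0.0.0:11434 ollama serve
|
|     # on any other machine (or via the reverse proxy hostname)
|     OLLAMA_HOST=api.chat.domain.example:11434 ollama run mistral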
|
| With this setup I can use my full-speed local models from both
| my laptop and my phone with the web UI, and my raspberry pi
| that's running my experimental voice assistant can query Ollama
| through the API endpoints, all at the full speed enabled by my
| gaming GPU.
|
| The same logic goes for my Stable Diffusion setup.
| justsomehnguy wrote:
| > I don't like running background services locally if I don't
| need to. Why do these implementations all seem to operate that
| way?
|
| Because it's now a simple REST-like query to interact with that
| server.
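|
| For example, against a local Ollama on its default port (the
| response streams back as JSON):
|
|     curl http://localhost:11434/api/generate -d '{
|       "model": "llama2",
|       "prompt": "Why is the sky blue?"
|     }'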
|
| The default model of running the binary and capturing its
| output would mean reloading everything _each time_. Of course,
| you could write a master process that actually performs the
| queries and have a separate executable for querying that
| master process... wait, you just invented a server.
| crooked-v wrote:
| I'm curious what people think of the non-open-source LM Studio
| (https://lmstudio.ai) compared to Ollama.
| Eisenstein wrote:
| Likes:
|
| * Super easy setup
|
| * One-click download and load models/weights
|
| * Works great
|
| Dislikes:
|
| * throws weights (in Windows) in /users/username/.cache in a
| proprietary directory structure, eating up tens of gigs without
| telling you or letting you share them with other clients
|
| * won't let you import models you download yourself
|
| * Search function is terrible
|
| * I hate how it deals with instance settings
| xanderlewis wrote:
| I just ran this on my new Mac Mini (installing the llama2 model)
| and got a full-blown kernel panic. What?!
| justsomehnguy wrote:
| JUST as I wanted to dabble with that and try installing all
| those ... requirements myself.
|
| And now this article.
|
| Tested it: yes, it's amusing how simple it is, and it works.
|
| The only trouble I see is that, again, there is no option to
| select the installer's destination (so if you have a server
| with multiple users, they all end up with a personal copy
| instead of a global one).
___________________________________________________________________
(page generated 2024-02-17 23:00 UTC)