[HN Gopher] Ollama is now available on Windows in preview
       ___________________________________________________________________
        
       Ollama is now available on Windows in preview
        
       Author : pentagrama
       Score  : 192 points
       Date   : 2024-02-17 14:23 UTC (8 hours ago)
        
 (HTM) web link (ollama.com)
 (TXT) w3m dump (ollama.com)
        
       | Klaster_1 wrote:
        | As usual, no AMD GPU support mentioned. What a sad state of
        | affairs; I regret going with AMD this time.
        
         | freedomben wrote:
          | Same. I really want AMD to succeed because, as a long-time
          | Linux user, I have a strong distaste for Nvidia and the hell
          | they put me through. I paid _a lot_ for a beastly AMD card in
          | the hopes that it would be not far behind Nvidia, and that has
          | most definitely not been the case. I blame AMD for not putting
          | the resources behind it.
         | 
         | AMD, you can change, but you need to start NOW.
        
           | mchiang wrote:
           | Hi, we've been working to support AMD GPUs directly via ROCm.
           | It's still under development but if you build from source it
           | does work:
           | 
           | https://github.com/ollama/ollama/blob/main/docs/development..
           | ..
        
             | Filligree wrote:
             | Every time I try to run anything through ROCm, my machine
             | kernel-panics.
             | 
             | I'm not blaming you for this, but I'm also sticking with
             | nvidia.
        
               | mchiang wrote:
                | Really sorry about this. Do you happen to have logs for
                | us to look into? This is definitely not the way we want
                | it to work.
        
               | Filligree wrote:
               | To be clearer, it isn't Ollama-specific. I first
               | encountered the issue with Stable Diffusion, and it's
               | remained since, but the GPU that causes it isn't
               | currently inside any machine; I replaced it with a 3090 a
               | few days ago.
        
             | agartner wrote:
              | Working well for me on a 7900XT with ROCm 6 and Linux
              | 6.7.5, thanks!
        
           | visarga wrote:
           | Ollama is a model-management app that runs on top of
           | llama.cpp so you should ask there about AMD support.
        
             | progman32 wrote:
              | I've been running llama.cpp with full GPU acceleration on
              | my AMD card, using the text-generation-webui install script
              | on Kubuntu. Same with Stable Diffusion using A1111. AMD's
              | compute stack is indeed quite broken and more fragile than
              | Nvidia's, but it does work with most modern cards.
              | 
              | The kernel panics though... Yeah, I had those on my Radeon
              | VII before I upgraded.
        
             | 65a wrote:
             | llama.cpp has had ROCm support for a long time
        
           | michaelmrose wrote:
           | What problems have you had with AMD and in what fashion do
           | they fall short of Nvidia?
        
             | freedomben wrote:
             | I've had no end of difficulty installing the Pro drivers
             | and/or ROCm. The "solution" that was recommended was to
             | install a different distro (I use Fedora and installing
             | CentOS or Ubuntu was recommended). When I finally _could_
             | get it installed, I got kernel panics and my system
             | frequently became unbootable. Then once it was installed,
             | getting user space programs to recognize it was the next
             | major pain point.
        
               | michaelmrose wrote:
                | I've been using Nvidia and it stopped being challenging
                | in about 2006. I perpetually hear that Nvidia is horrible
                | and I should try AMD. The two times I did, admittedly a
                | long time ago, it was... not great.
        
               | freedomben wrote:
               | Do you use Ubuntu LTS? If so, then indeed Nvidia is not a
               | problem.
               | 
                | But if you run a distro with anything close to new
                | kernels, such as Fedora or Arch, you'll constantly be in
                | fear of receiving new kernel updates. Every so often the
                | packages will be broken and you'll have to use Nvidia's
                | horrible installer. And every once in a while they'll
                | quietly drop support for older cards and you'll need to
                | move to the legacy package, but the way you'll find out
                | is that your system suddenly doesn't boot, you happen to
                | remember the old Nvidia card, and you Kagi that and
                | discover the change.
        
               | 65a wrote:
                | I found it much easier to make ROCm/AMD work for AI
                | (including on a laptop) than to get Nvidia working with
                | Xorg on an Optimus laptop with an Intel iGPU/Nvidia dGPU.
                | I swore off Nvidia at that point.
        
         | accelbred wrote:
          | Ollama has an OpenCL backend. I'm on Linux and CLBlast works
          | great with AMD cards. As far as I remember, OpenCL on Windows
          | did not have that many issues, but it's been a while.
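          | 
          | For reference, a rough sketch of building llama.cpp itself
          | with the CLBlast (OpenCL) backend, assuming the CLBlast dev
          | package is installed (model path and layer count below are
          | just placeholders):
          | 
          |   git clone https://github.com/ggerganov/llama.cpp
          |   cd llama.cpp
          |   make LLAMA_CLBLAST=1
          |   ./main -m ./models/some-model.gguf -ngl 33 -p "Hello"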
        
         | gerwim wrote:
          | Maybe there will be proper support in the AI landscape soon
          | [0].
         | 
         | [0]: https://news.ycombinator.com/item?id=39344815
        
         | RealStickman_ wrote:
         | I've had success using my AMD GPU with the OpenCL backend for
          | llama.cpp. The ROCm backend had pretty bad performance, though.
        
         | chown wrote:
          | As others have mentioned, Ollama uses llama.cpp under the hood,
          | and they recently released Vulkan support which is supposed to
          | work with AMD GPUs. I was able to use llama.cpp compiled with
          | Vulkan support in my app [1] and make it run on an AMD laptop,
          | but I was unable to make it work with Ollama, as Ollama makes
          | some assumptions about how it goes about searching for
          | available GPUs on a machine.
         | 
         | [1]: https://msty.app
        
           | Kelteseth wrote:
           | I got a Windows defender Virus alert after executing your
           | app.
        
           | 65a wrote:
            | ROCm is preferred over Vulkan for AMD GPUs, performance-wise.
            | OpenCL or Vulkan should only be used for older cards or
            | weird setups.
        
         | peppermint_gum wrote:
          | AMD clearly believes that this newfangled "GPU compute" fad
          | will pass soon, so there's no point in investing in it.
         | 
         | This is one of the worst acts of self-sabotage I have ever seen
         | in the tech business.
        
           | jart wrote:
           | Zen4 AVX512 must be really good then.
        
         | vdaea wrote:
         | AMD is the underdog, and that's what happens when you choose
         | the underdog.
        
         | jart wrote:
          | llamafile has AMD GPU support. On Windows, it only depends on
          | the graphics driver, thanks to our tinyBLAS library.
         | 
         | https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.6.2
         | 
          | By default it opens a browser tab with a chat GUI. You can run
          | it as a CLI chatbot like Ollama as follows:
         | 
         | https://justine.lol/oneliners/#chat
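          | 
          | Basic usage is roughly: grab a .llamafile from the releases
          | page, mark it executable, and run it (the filename below is
          | just a placeholder; -ngl is the usual llama.cpp flag for GPU
          | offload, and the exact chat-mode flags are on the oneliners
          | page above):
          | 
          |   chmod +x mistral-7b-instruct.llamafile
          |   ./mistral-7b-instruct.llamafile -ngl 999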
        
         | jmorgan wrote:
         | AMD GPU support is definitely an important part of the project
         | roadmap (sorry this isn't better published in a ROADMAP.md or
         | similar for the project - will do that soon).
         | 
         | A few of the maintainers of the project are from the Toronto
          | area, the original home of ATI Technologies [1], and so we
         | personally want to see Ollama work well on AMD GPUs :).
         | 
         | One of the test machines we use to work on AMD support for
         | Ollama is running a Radeon RX 7900XT, and it's quite fast.
         | Definitely comparable to a high-end GeForce 40 series GPU.
         | 
         | [1]: https://en.wikipedia.org/wiki/ATI_Technologies
        
       | trelane wrote:
       | Looks like it's already available on Linux & Mac. The change is
       | that they're adding Windows: https://github.com/ollama/ollama
        
       | visarga wrote:
       | I am running this on my desktop, using Open-WebUI for the front-
       | end. I have a collection of a dozen or so fine-tunes of Mistral
       | and a few other models. They are good enough for chatting and
       | doing some information extraction tasks. The Open-WebUI app looks
        | a lot like ChatGPT. You can even search your conversations.
       | 
       | https://github.com/open-webui/open-webui
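        | 
        | If anyone wants to try the same setup, the Open-WebUI README
        | suggests running it as a container pointed at a local Ollama
        | install, roughly like this (check the README for the current
        | flags):
        | 
        |   docker run -d -p 3000:8080 \
        |     --add-host=host.docker.internal:host-gateway \
        |     -v open-webui:/app/backend/data \
        |     --name open-webui ghcr.io/open-webui/open-webui:main
        | 
        | and then opening http://localhost:3000 in a browser.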
        
         | lolinder wrote:
         | For anyone else who missed the announcement a few hours ago,
         | open-webui is the rebranding of the project formerly known as
         | ollama-webui [0].
         | 
         | I can vouch for it as a solid frontend for Ollama. It works
         | really well and has had an astounding pace of development.
         | Every few weeks I pull the latest docker images and am always
         | surprised by how much has improved.
         | 
         | [0] https://github.com/open-webui/open-webui/discussions/764
        
         | scratchyone wrote:
         | Out of curiosity, what're you using the fine-tunes for? Do you
         | fine-tune them on your own data or are they just publicly
         | available models you use for different tasks?
        
           | visarga wrote:
           | I am just loading GGUF models from HuggingFace that have good
           | scores in the benchmarks, and running my private eval set
           | from my current project. Some of the merged models are
           | surprisingly good compared with simple fine-tunes.
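            | 
            | For anyone curious, loading a downloaded GGUF into Ollama is
            | roughly: write a Modelfile pointing at the file, then create
            | and run it (filenames below are just examples):
            | 
            |   echo 'FROM ./some-finetune.Q4_K_M.gguf' > Modelfile
            |   ollama create my-finetune -f Modelfile
            |   ollama run my-finetune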
        
       | thrdbndndn wrote:
        | This is the first time I've used llama2 (the 13B one) -- and it
        | absolutely struggles with Chinese, despite saying "I am capable
        | of understanding Chinese and English" when asked "what languages
        | do you understand".
       | 
       | Is this expected?
       | 
        | See the example below; it can't stay in Chinese at all.
        | 
        |   >>> Ni Zhi Dao Hai Tun Ma  ["Do you know about dolphins?"]
        | 
        |   Ah, Hai Tun (haitun) is a type of dolphin! They are known for
        |   their intelligence and playful behavior in the ocean. Is there
        |   anything else you would like to know or discuss?
        | 
        |   >>> Qing Yong Zhong Wen Hui Da  ["Please answer in Chinese"]
        | 
        |   Ah, I see! As a 13b model, I can only communicate in Chinese.
        |   Here's my answer:
        | 
        |   Hai Tun Shi Yi Chong Zhi Neng He Huo Po De marine mammal. Ta
        |   Men Zhu Yao Sheng Huo Zai Hai Yang Zhong. Ta Men You Zhao Rou
        |   Ruan De Pi Fu, Yuan Run De Lian He Xiao De Er Duo. Ta Men Shi
        |   ocean Zhong De Yi Chong Mei Li He interes De Sheng Wu. Hen Duo
        |   Ren Xi Huan Qu Kan Ta Men De Biao Yan.
        
         | mchiang wrote:
          | There is the Qwen 1.5 model from the Alibaba team.
         | 
         | https://ollama.com/library/qwen
         | 
          |   ollama run qwen:0.5b
          |   ollama run qwen:1.8b
          |   ollama run qwen:4b
          |   ollama run qwen:7b
          |   ollama run qwen:14b
          |   ollama run qwen:72b
         | 
          | I would only recommend the smaller parameter sizes if you are
          | fine-tuning with them.
        
         | visarga wrote:
          | Get yourself a proper Chinese model from China; they are
          | hosted in the Ollama model zoo as well.
        
           | thrdbndndn wrote:
            | That's true. It was more just out of curiosity, because
            | ChatGPT has _great_ Chinese capability, even in the 3.5
            | version.
        
         | Eisenstein wrote:
         | Give Yi a shot.
        
       | hat_tr1ck wrote:
        | Had no idea Windows users had no access to Ollama. It feels like
        | only a few years ago we Mac users would have been the ones
        | having to wait.
        
         | mil22 wrote:
         | It has worked just fine under WSL for many months now,
         | including full GPU support, though that's not as convenient for
         | most. Native Windows support is icing on the cake.
        
           | jmorgan wrote:
            | Indeed, WSL has surprisingly good GPU passthrough and AVX
            | instruction support, which makes running models fast despite
            | the virtualization layer. But WSL comes with its own setup
            | steps and performance considerations (not to mention quite a
            | few folks are still using WSL 1 in their workflow), and so a
            | lot of folks asked for a pre-built Windows version that runs
            | natively!
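            | 
            | For anyone on WSL 2 with an Nvidia card, a quick sanity
            | check that the GPU passthrough is actually working
            | (substitute your own distro name):
            | 
            |   wsl.exe -d Ubuntu
            |   nvidia-smi
            | 
            | If nvidia-smi lists the GPU from inside WSL, the Linux build
            | of Ollama should be able to use it there.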
        
         | hu3 wrote:
         | I've been running Ollama in Windows WSL for some time now.
         | 
         | It's x86 Linux after all. Everything just works.
        
           | baq wrote:
            | There's some magic with the WSL GPU drivers.
        
       | chown wrote:
       | If anyone is looking for a nice Chat UI on top of Ollama that
       | supports both online models and local models, I've been working
       | on an app [1] that is offline and privacy focused. I just
       | released Windows support this morning.
       | 
       | [1]: https://msty.app
        
         | haliskerbas wrote:
         | Off topic but what did you use to make your landing page?
        
           | chown wrote:
            | It's a Nuxt static app, but all hand-rolled - no builder or
            | anything like that, if that's what you are asking.
        
             | haliskerbas wrote:
             | That's what I was looking for, I couldn't find any elements
             | or class names that I recognized in the code from the
             | popular frameworks. Looks great!
        
               | chown wrote:
               | Thank you! I probably spent way too much time tweaking
                | it. Haha! I am glad someone liked all the small details
                | I was sweating over.
        
         | vorticalbox wrote:
         | Any plans for a Linux client?
        
           | chown wrote:
           | Yes. I already have a local build that I am testing. Going to
           | release it in a week or so.
        
         | vdaea wrote:
          | I'm trying the Windows version. What really sticks out is that
          | the buttons don't have tooltips. It's impossible to know what
          | they do if you don't click them.
          | 
          | Also, in the conversation view there are two buttons, "New
          | Chat" and "Add Chat", which do two different things but both
          | have the same keybind, ^T.
        
           | chown wrote:
           | Thanks for the feedback. I will get them resolved soon.
        
             | wlesieutre wrote:
             | I'm getting a lot of jank with the hovery-sidebar:
             | https://imgur.com/a/VXZXL94
             | 
              | Personally I'd rather have the sidebar toggled on click,
              | instead of having such a huge animation every time my mouse
              | passes by. And if it's such an important part of the UI
              | that requiring a click is too much of a barrier, then it'd
              | be better to build that functionality into a permanent
              | sidebar rather than bury it under a level of sidebar
              | buttons.
             | 
              | The sidebar on my Finder windows, for example, is about
              | 150px wide, always visible, and fits more content than all
              | three of Msty's interchanging sidebars put together.
             | 
             | If I had a lot of previous conversations that might not be
             | true anymore, but a single level sidebar with subheadings
             | still works fine for things like Music where I can have a
             | long list of playlists. If it's too many conversations to
             | reasonably include in an always visible list then maybe
             | they go into a [More] section.
             | 
             | Current UI feels like I had to think a bit too much to
             | understand how it's organized.
        
               | chown wrote:
               | Ugh! That's not the experience I wanted people to have.
               | Sorry about that and I will be working on making the
               | experience better. Feedback like yours really helps so
               | thank you very much.
        
         | attentive wrote:
            | Adding the Gemini API?
        
           | chown wrote:
            | Yes. I signed up for the API a couple of days ago and I am
            | on the waitlist.
        
         | attentive wrote:
          | BTW, it triggers a Program:Win32/Wacapew.C!ml detection.
        
           | chown wrote:
            | Hmmm... it's a false positive. Is it Windows Defender or
            | something else? Is it when you open the app or when you set
            | up local AI? I'm not sure where I would send a request for
            | it to not be flagged.
        
             | js4ever wrote:
             | It's probably an issue with the tool you used to create the
             | installer.
             | 
              | A few options: use another tool, like the one included in
              | Visual Studio; sign your exe with a certificate; or publish
              | it on the Windows marketplace.
              | 
              | Now you understand why real desktop applications died a
              | decade ago and 99.99% of apps now use a web UI.
        
               | chown wrote:
                | That's true and unfortunate. The macOS installer is
                | signed, and I will be looking into signing the Windows
                | installer. Thank you for your suggestions. My last
                | experience of getting a Windows installer signed was
                | awful, which goes back to what you were saying about
                | desktop apps dying a decade ago.
        
               | dom96 wrote:
               | I'd say it's more an issue with the anti-virus that is
               | flagging this as a virus when it isn't one. We should
               | expect better out of AV software. I've personally seen
               | many instances of false positives across various software
               | that was definitely not a virus.
        
       | tydunn wrote:
        | I've been playing around with it for the last couple of days on
        | my Windows machine, using it for local tab-autocomplete in VS
        | Code, and it's been just as good as it is on my Mac.
        
         | notsylver wrote:
          | What do you use for tab-autocomplete in VS Code? I've been
          | trying to find something that can replace Copilot, just because
          | it sounds fun. Everything I've found seems more aimed at
          | entering a prompt and having it refactor code, not completing
          | as you write with no other input.
        
           | tydunn wrote:
           | I use Continue.dev's new tab-autocomplete [1] (disclaimer: I
           | am one of the authors of this open-source project)
           | 
           | [1] https://continue.dev/docs/walkthroughs/tab-autocomplete
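            | 
            | If anyone wants to try this against Ollama, the rough flow
            | is to pull a small code model locally and then point the
            | extension's tab-autocomplete settings at it, per the
            | walkthrough above (the model choice here is just an
            | example):
            | 
            |   ollama pull deepseek-coder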
        
       | orion138 wrote:
       | Is this people's preferred way to run and develop on top of local
       | models?
       | 
       | Any other must learn tools?
        
       | jameshart wrote:
        | What is the rationale for so many of these 'run it locally' AI
        | ports running _as a server_?
       | 
       | Have developers forgotten that it's actually possible to run code
       | inside your UI process?
       | 
       | We see the same thing with stable diffusion runners as well as
       | LLM hosts.
       | 
       | I don't like running background services locally if I don't need
       | to. Why do these implementations all seem to operate that way?
        
         | mattnewton wrote:
          | It doesn't make sense to load the weights on the fly - that is
          | gigabytes of memory that has to be shuffled around. Instead,
          | you have a long-running process that serves up lots of
          | predictions.
          | 
          | (edit: someday soon, probably to multiple clients too!)
        
           | jameshart wrote:
            | Local UI applications normally _are_ long-running processes.
        
           | nightfly wrote:
           | So better to have GiBs of memory consumed by it constantly?
        
             | mattnewton wrote:
                | If you don't have that memory to spare you can't run
                | this locally anyway, and keeping it in memory is the
                | only way to have a fast experience. Paying the model
                | loading cost repeatedly sucks.
        
               | jameshart wrote:
               | Why would linking llama.cpp into a UI application lead to
               | incurring the model loading cost repeatedly?
        
               | mattnewton wrote:
                | It would be loaded repeatedly if the UI is opened and
                | closed repeatedly. You can achieve the same "long-running
                | server + short-running UI window" setup with multiple
                | threads or processes all linked into one binary if you
                | want, of course. This way (with a separate server) seems
                | simpler to me, and has the added benefit that multiple
                | applications can easily call into the "server" if needed.
        
         | jmorgan wrote:
         | This is a really interesting question. I think there's
         | definitely a world for both deployment models. Maybe a good
         | analogy is database engines: both SQLite (a library) and
         | Postgres (a long-running service) have widespread use cases
         | with tradeoffs.
        
           | jameshart wrote:
            | But these are typically filling the use cases of
            | _productivity_ applications, not 'engines'.
           | 
           | Microsoft Word doesn't run its grammar checker as an external
           | service and shunt JSON over a localhost socket to get
           | spelling and style suggestions.
           | 
           | Photoshop doesn't install a background service to host
           | filters.
           | 
            | The closest pattern I can think of is the 'language server'
            | model used by IDEs to handle autosuggest - see
            | https://microsoft.github.io/language-server-protocol/ - but
            | the point of that is to enable many-to-many interop:
            | multiple languages supporting multiple IDEs. Is that the
            | expected use case for local language assistants and image
            | generators?
        
         | api wrote:
         | The main reason I see is to use the same AI engine for multiple
         | things like VSCode plugins, UI apps, etc.
         | 
          | That being said, I use LM Studio, which runs as a UI and allows
          | you to start a local server for coding and editor plugins.
         | 
         | I can run Deepseek Coder in VSCode locally on an M1 Max and
         | it's actually useful. It'll just eat the battery quickly if
         | it's not plugged in since it really slams the GPU. It's about
         | the only thing I use that will make the M1 make audible fan
         | noise.
        
         | imiric wrote:
         | This is a good thing IMO. I don't have a very powerful laptop
         | or workstation, but do have a multi-GPU headless server. These
         | projects allow me to experiment with LLMs on my server, and
         | expose an API and web UI to my LAN.
        
         | sgt101 wrote:
          | Because running it locally really means running it on a cloud
          | server that you own, called by another server that you own.
          | This gives you the ability to make the interfaces lightweight
          | and, most importantly, to not pay premiums to model servers.
        
           | jameshart wrote:
           | No, running it locally means running it on my laptop.
           | 
            | My Mac M2 is quite capable of running Stable Diffusion XL
            | models and 30B-parameter LLMs under llama.cpp.
           | 
            | What I don't like is the trend where the way to do that is
            | to open up network listeners with no authentication on them.
        
             | teaearlgraycold wrote:
             | Bind to localhost then
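              | 
              | Which is what Ollama does by default, as far as I know:
              | the server listens on 127.0.0.1:11434, and you only expose
              | it more widely by opting in via OLLAMA_HOST, e.g.
              | 
              |   OLLAMA_HOST=0.0.0.0:11434 ollama serve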
        
         | psytrx wrote:
         | In addition to the initial loading time noted by the other
         | posters:
         | 
         | You may want to use the same inference engine or even the same
         | LLM for multiple purposes in multiple applications.
         | 
          | Also, and this is a huge factor in my opinion, there's getting
          | your machine, environment and OS into a state that can run the
          | models efficiently. It wasn't trivial for me. Putting all this
          | complexity inside a container (and therefore a "server") helps
          | tremendously, a) in setting everything up initially and b) in
          | keeping up with the constant improvements and updates that are
          | happening regularly.
        
         | lolinder wrote:
         | In addition to everything that everyone else has said: I run
         | Ollama on a large gaming PC for speed but want to be able to
         | use the models from elsewhere in the house. So I run Open-WebUI
         | at chat.domain.example and Ollama at api.chat.domain.example
         | (both only accessible within my local network).
         | 
         | With this setup I can use my full-speed local models from both
         | my laptop and my phone with the web UI, and my raspberry pi
         | that's running my experimental voice assistant can query Ollama
         | through the API endpoints, all at the full speed enabled by my
         | gaming GPU.
         | 
         | The same logic goes for my Stable Diffusion setup.
        
         | justsomehnguy wrote:
         | > I don't like running background services locally if I don't
         | need to. Why do these implementations all seem to operate that
         | way?
         | 
         | Because it's now a simple REST-like query to interact with that
         | server.
         | 
          | The default model of running the binary and capturing its
          | output would mean you reload everything _each time_. Of course,
          | you can write a master process that would actually perform the
          | queries and have a separate executable for querying that master
          | process... wait, you just invented a server.
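          | 
          | As an illustration, once the Ollama server is up, any local
          | app can do something like:
          | 
          |   curl http://localhost:11434/api/generate \
          |     -d '{"model": "llama2", "prompt": "Hi there"}'
          | 
          | and the model stays loaded between calls (for a while, at
          | least) instead of being re-read from disk each time.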
        
       | crooked-v wrote:
       | I'm curious what people think of the non-open-source LM Studio
       | (https://lmstudio.ai) compared to Ollama.
        
         | Eisenstein wrote:
         | Likes:
         | 
         | * Super easy setup
         | 
         | * One-click download and load models/weights
         | 
         | * Works great
         | 
         | Dislikes:
         | 
         | * throws weights (in Windows) in /users/username/.cache in a
         | proprietary directory structure, eating up tens of gigs without
         | telling you or letting you share them with other clients
         | 
         | * won't let you import models you download yourself
         | 
         | * Search function is terrible
         | 
         | * I hate how it deals with instance settings
        
       | xanderlewis wrote:
       | I just ran this on my new Mac Mini (installing the llama2 model)
       | and got a full-blown kernel panic. What?!
        
       | justsomehnguy wrote:
        | JUST as I wanted to dabble with that and try installing all
        | those ... requirements myself.
        | 
        | And now this article.
        | 
        | Tested it: yes, it's amusing how simple it is, and it works.
        | 
        | The only trouble I see is that, again, there is no option to
        | select the destination for the installer (so if you have a
        | server and multiple users, they all end up with a personal copy
        | instead of a global one).
        
       ___________________________________________________________________
       (page generated 2024-02-17 23:00 UTC)