[HN Gopher] LM Studio - Discover, download, and run local LLMs
___________________________________________________________________
LM Studio - Discover, download, and run local LLMs
Author : victormustar
Score : 394 points
Date : 2023-11-22 09:38 UTC (13 hours ago)
(HTM) web link (lmstudio.ai)
(TXT) w3m dump (lmstudio.ai)
| d_meeze wrote:
| So if you suspect I am using this for business related purposes
| you may take any action you want to spy on me? Such Terms. Very
| Use.
| tmikaeld wrote:
| It's a standard clause for most apps. If there's a breach of
| the terms and conditions (such as using it for commercial
| purposes, like selling the software), they are allowed to
| launch an investigation. Nowhere does this mention "spying" or
| modifying the app for such use.
| isilofi wrote:
| Maybe the app is already modified for such use and just needs
| to be given a "start spying" command.
| tmikaeld wrote:
| Maybe, personally I don't trust closed source apps like
| these for that reason.
|
| And when I do try them, it's with Little Snitch blocking
| outgoing connections.
| busssard wrote:
| That's why, as a business, I would rather use a trusted FOSS
| LLM interface like text-generation-webui
| https://github.com/oobabooga/text-generation-webui
| nacs wrote:
| Or a simpler alternative: https://ollama.ai/
| hobofan wrote:
| ollama doesn't come packaged with an easy to invoke UI
| though.
| manyoso wrote:
| Gpt4all does
| moffkalast wrote:
| Much Sus. Wow.
| ionwake wrote:
| Sorry, I haven't used this product yet - do the chat messages
| get uploaded to a server?
| sumedh wrote:
| How does this App make money?
| tmikaeld wrote:
| Looks like they'll charge for a "Pro" version that can be used
| commercially. Though they'd have to confirm that - I only
| deduced it from their ToS.
| eurekin wrote:
| This app could use some simple UI improvements:
|
| - The chatbox shows a normal "write here" state even when no
| chat is actually selected. I thought my keyboard was broken
| until I discovered that
|
| - I didn't find a way to enable CUDA acceleration before
| loading a model; I only managed to set the number of GPU-
| offloaded layers and use "relaunch to apply"
|
| - Some HuggingFace models are simply not listed and there's no
| indication as to why. I guess the models are heavily curated,
| but somehow presented as a HuggingFace browser?
|
| - Scrolling in the accordion parts of the interface seems to
| respond to the mouse wheel only. I have a mouse with a damaged
| wheel and couldn't find a way to reliably navigate to the
| bottom drawers
|
| That said, I really liked the server tab, which made initial
| debugging very easy
| easygenes wrote:
| It's basically a front end for llama.cpp, so it will only show
| models with GGUF quantizations.
| eurekin wrote:
| Ah, makes sense
| hahn-kev wrote:
| Those are really weird bugs, how do you even manage that these
| days
| eurekin wrote:
| Not sure if sarcastic or not. Assuming not.
|
| I did find it quite useful for opening a socket for remote
| tooling (played around with the Continue plugin).
|
| The quirky UI did slow me down, but nothing really
| showstopping
| pointlessone wrote:
| The M1 is only 3 years old and no one cares to support Intel
| Macs anymore. There are surely a lot of them still out there.
| Are they that much worse for running LLMs?
| amrrs wrote:
| Ollama works super fine on an Intel Mac.
|
| The demo in this video is from an Intel Mac:
| https://youtu.be/C0GmAmyhVxM?si=puTCpGWButsNvKA5
|
| It also supports an OpenAI-compatible API and is completely
| open-source, unlike LM Studio.
| eurekin wrote:
| Yeah, Ollama is way nicer in UX
| urtrs wrote:
| Ollama's API is not openai compatible.
| rsolva wrote:
| Correct, but LocalAI has a compatible API for those who
| need it.
| isilofi wrote:
| Disappointing, no proper Linux support. Just "ask on discord".
| apstls wrote:
| from a pinned message on their discord:
| https://s3.amazonaws.com/releases.lmstudio.ai/prerelease/LM+...
| kgeist wrote:
| For my experiments with new self-hostable models on Linux, I've
| been using a script to download GGUF models from TheBloke on
| HuggingFace (currently, TheBloke's repository has 657 models in
| the GGUF format), which I feed to a simple program I wrote that
| invokes llama.cpp compiled with GPU support. The GGUF format and
| TheBloke are a blessing, because I'm able to check out new
| models basically on the day of their release (TheBloke is very
| fast) and without issue. However, the only frontend I have is
| the console. Judging by their site, their setup is exactly the
| same as mine (which I implemented over a weekend), except that
| they also added a React-based UI on top. I wonder how they're
| planning to commercialize it, because it's pretty trivial to
| replicate, and there are already open-source UIs like oobabooga.
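| As a sketch, the download step with huggingface_hub looks
| roughly like this (repo and file names are just examples;
| check the model card for the exact quant you want):
|
|   from huggingface_hub import hf_hub_download
|
|   # downloads one quantized GGUF file into the local HF cache
|   path = hf_hub_download(
|       repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
|       filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
|   )
|   print(path)  # pass this path to llama.cpp's -m flag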
| bambax wrote:
| I'd like to build myself a headless server to run models, that
| could be queried from various clients locally on my LAN, but am
| unsure where to start and what the hardware requirements would
| be. Software can always be changed later, but I'd rather buy
| the hardware parts only once.
|
| Do you have recommendations about this? or blog posts to get
| started? What would be a decent hardware configuration?
| fintechie wrote:
| You can currently do this on an M2 Max with ollama and a
| Next.js UI [0] running in a docker container. Any device on
| the network can use the UI... and I guess if you want a LAN
| API you just need to run another container with an OpenAI-
| compatible API that can query ollama, e.g. [1]
|
| [0]https://github.com/ivanfioravanti/chatbot-ollama
|
| [1]https://github.com/BerriAI/litellm
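| A rough sketch of what [1] enables, assuming litellm's Ollama
| support (the model name and port are illustrative):
|
|   from litellm import completion
|
|   # litellm translates an OpenAI-style call into Ollama's API
|   response = completion(
|       model="ollama/llama2",
|       messages=[{"role": "user", "content": "hello"}],
|       api_base="http://localhost:11434",
|   )
|   print(response["choices"][0]["message"]["content"])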
| rsolva wrote:
| Ollama does this. I run it in a container on my homelab
| (Proxmox on a HP EliteDesk SFF G2 800) and 7B models run
| decently fast on CPU-only. Ollama has a nice API and makes it
| easy to manage models.
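| As a taste of the API, a minimal sketch against Ollama's
| default port (the model name is whatever you've pulled):
|
|   import requests
|
|   # one-shot, non-streaming generation request
|   r = requests.post(
|       "http://localhost:11434/api/generate",
|       json={"model": "llama2",
|             "prompt": "Why is the sky blue?",
|             "stream": False},
|   )
|   print(r.json()["response"])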
|
| Together with ollama-webui, it can replace ChatGPT 3.5 for
| most tasks. I also use it in VSCode and nvim with plugins,
| works great!
|
| I have been meaning to write a short blog post about my
| setup...
| mfalcon wrote:
| I've been trying Ollama locally. I've yet to know how it'll
| behave in a production setting.
| kkielhofner wrote:
| Depending on what you mean by "production" you'll
| probably want to look at "real" serving implementations
| like HF TGI, vLLM, lmdeploy, Triton Inference Server
| (tensorrt-llm), etc. There are also more bespoke
| implementations for things like serving large numbers of
| LoRA adapters[0].
|
| These are heavily optimized for more efficient memory usage,
| performance, and responsiveness when serving large numbers of
| concurrent requests/users, in addition to model
| versioning/hot load/reload/etc., Prometheus metrics, and
| things like that.
|
| One major difference is that at this level a lot of the more
| aggressive memory-optimization techniques, and support for
| CPU, aren't even considered. Generally speaking you get GPTQ
| and possibly AWQ quantization + their optimizations + CUDA
| only. Their target users and use cases often involve
| A100s/H100s, where the goal is just to need fewer of them.
| Support for lower-VRAM cards, older CUDA compute
| architectures, etc. comes secondary to that (for the most
| part).
|
| [0] - https://github.com/S-LoRA/S-LoRA
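| To give a flavour, a minimal vLLM offline-batch sketch (the
| model name is illustrative):
|
|   from vllm import LLM, SamplingParams
|
|   # vLLM handles batching and KV-cache memory management
|   llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")
|   params = SamplingParams(temperature=0.8, max_tokens=256)
|   outputs = llm.generate(["The capital of France is"], params)
|   print(outputs[0].outputs[0].text)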
| mfalcon wrote:
| Thanks! Really helpful. I've a 3090 at home and my idea is to
| do some testing on a similar config in the cloud to get an
| idea of the number of requests that could be served.
| kkielhofner wrote:
| The good news is the number of requests and performance
| is very impressive. For example, on my RTX 4090 from
| testing many months ago with lmdeploy (it was the first
| to support AWQ) I was getting roughly 70 tokens/s each
| across 10 simultaneous sessions with LLama2-13b-Chat -
| almost 700 tokens/s total. If I were to test again now
| with all of the impressive stuff that's been added to all
| of these I'm sure it would only be better (likely
| dramatically).
|
| The bad news is that because "low VRAM cards" like the 24GB
| RTX 3090 and RTX 4090 aren't really targeted by these
| frameworks, you'll eventually run into "Yeah, you're going to
| need more VRAM for that model/configuration. That's just how
| it is." - as opposed to some of the approaches for
| local/single-session serving that emphasize memory
| optimization first and tokens/s for a single session next,
| often with no consideration or support at all for multiple
| simultaneous sessions.
|
| It's certainly possible that with time these serving
| frameworks will deploy more optimizations and strategies for
| low-VRAM cards, but if you look at timelines to even implement
| quantization support (as one example) it's definitely an
| afterthought, typically only implemented when it aligns with
| the overall "more tokens for more users across more sessions
| on the same hardware" goal.
|
| Loading a 70B model on CPU and getting 3 tokens/s (or
| whatever) is basically seen as an interesting yet
| completely impractical and irrelevant curiosity to these
| projects.
|
| In the end "the right tool for the job" always applies.
| Loic wrote:
| If I may ask, which plugins are you using in VSCode?
| rsolva wrote:
| I'm using the extension Continue: https://marketplace.vis
| ualstudio.com/items?itemName=Continue...
|
| The setup of connecting to Ollama is a bit clunky, but
| once it's set up it works well!
| akx wrote:
| Just compile llama.cpp's server example, and you have a local
| HTTP API. It also has a simple UI (disclaimer: to which I've
| contributed).
|
| https://github.com/ggerganov/llama.cpp/blob/master/examples/.
| ..
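| Once it's running, hitting it is a one-liner; a sketch against
| the server example's default port:
|
|   import requests
|
|   # the server example exposes POST /completion
|   r = requests.post(
|       "http://localhost:8080/completion",
|       json={"prompt": "Building a website is", "n_predict": 64},
|   )
|   print(r.json()["content"])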
| guntars wrote:
| LM Studio can start a local server with an OpenAI-compatible
| API. You can't do concurrent requests, but that should get
| you started.
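| A sketch using the openai client; the local port (1234 is LM
| Studio's default, as I understand it) and the model field are
| assumptions - the server uses whatever model is loaded:
|
|   from openai import OpenAI
|
|   # point the client at the local server instead of OpenAI
|   client = OpenAI(base_url="http://localhost:1234/v1",
|                   api_key="not-needed")
|   resp = client.chat.completions.create(
|       model="local-model",  # ignored by the local server
|       messages=[{"role": "user", "content": "hi"}],
|   )
|   print(resp.choices[0].message.content)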
| Havoc wrote:
| > usure where to start and what the hardware requirements
| would be
|
| Have a look at the localllama subreddit
|
| In short though: dual 3090s are common, or a single 4090, or
| various flavours of M1/M2/M3 Macs. Alternatively a P40 can be
| jury-rigged too, but research that carefully. In fact anything
| with more than one GPU is going to require careful research.
| ekianjo wrote:
| You don't need TheBloke. It's trivial to make GGUF files from
| bin models by yourself.
| cooperaustinj wrote:
| What a comment. Why do it the easy way when the more difficult
| and slower way works out to the same result!? For people who
| just want to USE models and not hack at them, TheBloke is
| exactly the right place to go.
|
| Like telling someone interested in 3D printing minis to build
| a 3D printer instead of buying one. Obviously that helps them
| get to their goal of printing minis faster, right?
| paul_n wrote:
| Actually, consider that the commenter may have helped un-
| obfuscate this world a little bit by saying that it is in
| fact easy. To be honest, the hardest part about the local LLM
| scene is the absurd amount of jargon introduced - everything
| looks a bit more complex than it is. It really is easy with
| llama.cpp; someone even wrote a tutorial here:
| https://github.com/ggerganov/llama.cpp/discussions/2948 .
|
| But yes, TheBloke tends to have conversions up very quickly
| as well and has made a name for himself for doing this
| (+more)
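| The whole flow, as a sketch run from inside a llama.cpp
| checkout (paths and model names are hypothetical):
|
|   import subprocess
|
|   # 1. convert HF/bin weights to an f16 GGUF (convert.py
|   #    writes ggml-model-f16.gguf into the model dir)
|   subprocess.run(["python3", "convert.py",
|                   "models/My-Model-HF", "--outtype", "f16"],
|                  check=True)
|   # 2. quantize it down; Q4_K_M is a common middle ground
|   subprocess.run(["./quantize",
|                   "models/My-Model-HF/ggml-model-f16.gguf",
|                   "models/my-model.Q4_K_M.gguf", "q4_k_m"],
|                  check=True)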
| andy99 wrote:
| I don't mean this as a criticism, I'm just curious because I
| work in this space too: who is this for? What is the niche of
| people savvy enough to use this who can't run one of the many
| open-source local LLM tools? It looks from the screenshot like
| it's exposing much of the complexity of configuration anyway.
| Is the value in the interface and management of conversations
| and models? It would be nice to see info or even speculation
| about the potential market segments of LLM users.
| pridkett wrote:
| In most workplaces that deal with LLMs you've got a few classes
| of people:
|
| 1. People who understand LLMs and know how to run them and
| have access to run them on the cloud.
|
| 2. People who understand LLMs well enough but don't have
| access to cloud resources - but still have a decent MacBook
| Pro. Or maybe access to cloud resources is gated by overly
| tight pipelines.
|
| 3. People who are interested in LLMs but don't have enough
| technical chops/time to get things going with llama.cpp.
|
| 4. People who are fans of LLMs but can't even install stuff
| on their computer.
|
| This is clearly for #3 and it works well for that group of
| people. It could also be for #2 when they don't want to spin up
| their own front end.
| wiihack wrote:
| I got Mistral-7b running locally, and although it wasn't hard,
| it did take some time nonetheless. I just wanted to try it out
| and was not that interested in the technical details.
| 3cats-in-a-coat wrote:
| It's for people who want to discover LLMs and either don't
| have the skill to deploy them, or value their time and prefer
| not to fool around for hours getting them to work before they
| can try them.
|
| The fact it has configuration is good, as long as it has some
| defaults.
| transcriptase wrote:
| Exactly. People like me have been waiting for a tool like
| this.
|
| I'm more than capable of compiling/installing/running pretty
| much any software, but all I want is the ability to chat with
| a LLM of my choice without spending an afternoon tabbing back
| to a 30 step esoteric GitHub .md full of caveats,
| assumptions, and requiring dependencies to be installed and
| configured according to preferences I don't have.
| paul_n wrote:
| Yeah, I think I fit into this category. If I see a new model
| announced, it's been nice to just click and evaluate for
| myself whether it's useful for me. If anyone knows other tools
| for this kind of workflow I'd love to hear about them. Right
| now I just keep my "test" prompts in a text file.
| mannyv wrote:
| It's actually quite handy. I built all the various things by
| hand at one point, but had to wipe it all. Instead of following
| the directions again I just downloaded this.
|
| Being able to swap out models is also handy. This probably
| saved a couple of hours of my life, which I appreciate.
| smusamashah wrote:
| Why is purple, or some shade of purple, the color of all AI
| products? For some reason, the landing pages of AI products
| immediately remind me of crypto products. This one doesn't
| have crypto vibes, but the colour is purple. I don't get why.
| 3cats-in-a-coat wrote:
| Because apps mostly prefer dark theme now, and dark red, brown,
| dark green and so on look weird, and gray is OK, but very
| boring, like someone desaturated the UI. Which leaves shades of
| blue and purple.
| chatmasta wrote:
| It's a default color in Tailwind CSS and is used in a lot of
| the templates and examples. Nine times out of ten, if you
| check the source of a page with this flavor of purple, you'll
| see it's using Tailwind, as the OP site in fact does.
| smusamashah wrote:
| Ah! That makes more sense. New startup, new tech, and
| therefore the new default color. I hope it's just that, and
| that because I only tend to notice AI startups, purple is
| what I end up seeing.
| replete wrote:
| RIP Intel users
| hugovie wrote:
| LM Studio is great for running local LLMs, and it also
| supports an OpenAI-compatible API. In case you need a more
| advanced UI/UX, you can use LM Studio with
| MindMac(https://mindmac.app), just check this video for
| details: https://www.youtube.com/watch?v=3KcVp5QQ1Ak.
| smcleod wrote:
| Thanks for sharing MindMac - just tried it out and it's exactly
| what I was looking for, great to see Ollama integration is
| coming soon!
| hugovie wrote:
| Thank you for your support. I just found a workaround to use
| Ollama with MindMac. Please check this video
| https://www.youtube.com/watch?v=bZfV70YMuH0 for more details.
| I will integrate Ollama more deeply in a future version.
| shanusmagnus wrote:
| MindMac is the first example I've seen where the UI for working
| w/ LLMs is not complete and utter horseshit and starts to
| support workflows that are sensible.
|
| I will buy this with so much enthusiasm if it holds up. Argh,
| this has been such a pain point.
| hnuser123456 wrote:
| This works, but I've noticed that my CPU usage goes up to
| about 30 percent, all in kernel time (Windows), after
| installing and opening this - even when it's not doing
| anything, on two separate machines... I also hear the fan
| spinning fast on my laptop.
|
| Killed the LM studio process and re-opened it and the ghost
| background usage is down to about 5%.
| manyoso wrote:
| For those looking for an open source alternative with Mac,
| Windows, Linux support check out GPT4All.io
| thejohnconway wrote:
| Terrible name, given that its value is that it runs locally,
| and you can't do that with ChatGPT.
| reustle wrote:
| This looks great!
|
| If you're looking to do the same with open source code, you could
| likely run Ollama and a UI.
|
| https://github.com/jmorganca/ollama + https://github.com/ollama-
| webui/ollama-webui
| nigma wrote:
| I'm having a lot of fun chatting with characters using Faraday
| and koboldcpp. Faraday has a great UI that lets you adjust
| character profiles, generate alternative model responses, undo,
| or edit dialogue, and experiment with how models react to your
| input. There's also SillyTavern that I have yet to try out.
|
| - https://faraday.dev/
|
| - https://github.com/LostRuins/koboldcpp
|
| - https://github.com/SillyTavern/SillyTavern
| eshack94 wrote:
| Ollama is fantastic. I cannot recommend this project highly
| enough. Completely open-source, lightweight, and a great
| community.
| activatedgeek wrote:
| I looked at Ollama before, but couldn't quite figure something
| out from the docs [1]
|
| It looks like a lot of the tooling is heavily engineered for a
| set of modern, popular LLM-esque models. And it looks like
| llama.cpp also supports LoRA models, so I'd assume there is a
| way to engineer a pipeline from LoRA to llama.cpp deployments,
| which probably covers quite a broad set of possibilities.
|
| Beyond llama.cpp, can someone point me to what the broader
| community uses for general PyTorch model deployments?
|
| I haven't quite ever self-hosted models, and am really keen to
| do so. Ideally, I am looking for something that stays close to
| the PyTorch core, and therefore allows me the flexibility to
| take any nn.Module to production.
|
| [1]:
| https://github.com/jmorganca/ollama/blob/main/docs/import.md.
| seydor wrote:
| oobabooga is no longer popular?
| oceanplexian wrote:
| oobabooga is the king.
|
| As far as I know, ollama doesn't support exllama, QLoRA fine-
| tuning, multi-GPU, etc. Text-generation-webui might seem like
| a science project, but it's leagues ahead of everything else
| (like 2-4x faster inference with the right plugins). It also
| has a nice OpenAI mock API that works great.
| smcleod wrote:
| With a couple of other folks I'm currently working on an Ollama
| GUI, it's in early stages of development -
| https://github.com/ai-qol-things/rusty-ollama
| omneity wrote:
| I really like LM Studio and had it open when I came across this
| post. LM Studio is an interesting mixture of:
|
| - A local model runtime
|
| - A model catalog
|
| - A UI to chat with the models easily
|
| - An openAI compatible API
|
| It also has several plugins, such as one for RAG (using
| ChromaDB).
|
| Personally I think the positioning is very interesting.
| They're well placed to take advantage of new capabilities in
| the open-source ecosystem.
|
| It's still unfortunate that it is not itself open-source.
| addandsubtract wrote:
| Does it also let you connect to the ChatGPT API and use it?
| omneity wrote:
| I haven't found that option. I know it exists in Gpt4all
| though.
|
| Personally I use a locally served frontend to use ChatGPT via
| API.
| SOLAR_FIELDS wrote:
| How does it compare with something like FastChat?
| https://github.com/lm-sys/FastChat
|
| The feature sets seem to have a decent amount of overlap. One
| limitation of FastChat, as far as I can tell, is that you are
| limited to the models FastChat supports (though I think it
| would be a minor change to support arbitrary models?)
| pentagrama wrote:
| Curious about this, I just downloaded it.
|
| I want to try uncensored models.
|
| I have a question: looking for the most popular "uncensored"
| model, I found "TheBloke/Luna-AI-Llama2-Uncensored-GGML", but
| it has 14 files to download, between 2 and 7 GB each. I just
| downloaded the first one: https://imgur.com/a/DE2byOB
|
| I tried the model and it works: https://imgur.com/a/2vtPcui
|
| Should I download all 14 files to get better results?
|
| Also, asking how to make a bomb, it looks like at least this
| model isn't "uncensored": https://imgur.com/a/iYz7VYQ
| jwblackwell wrote:
| Basically no open-source fine-tunes are censored. You can get
| an idea of popular models people are using here:
| https://openrouter.ai/models?o=top-weekly
|
| teknium/openhermes-2.5-mistral-7b is a good one.
|
| You don't need all 14 files, just pick one that is recommended
| as having a slight loss of quality - hover over the little (i)
| icon to find out.
| pentagrama wrote:
| Thank you! Now I checked the (i) tooltips. Just downloaded the
| biggest file (7GB), the one that says "Minimal loss of
| quality".
| nerdenough wrote:
| The readmes of their repositories each have tables that detail
| the quality of each file. The Q4_K_M and Q5_K_M variants seem
| to be the two main recommended ones for low quality loss
| without being too large.
|
| You only need one of the files, but I recommend checking out
| the GGUF version of the model (just replace GGML in the URL)
| instead of GGML. llama.cpp no longer supports GGML, and I'm
| not sure if TheBloke still uploads new GGML versions of
| models.
| 0xEF wrote:
| Honest question from someone new to exploring and using these
| models; why do you need uncensored? What are the use-cases that
| would call for it?
|
| Again, not questioning your motives or anything, just straight
| up curious. To use your example, any of us can find bomb-
| building info online fairly easily; that's been a point of
| social contention since the Anarchist Cookbook. Nobody needs
| an uncensored LLM for that, of course.
| xcv123 wrote:
| For the entertainment value
| lloydatkinson wrote:
| It's very easy to hit absurd "moral" limits on chatgpt for
| the most stupid things.
|
| Earlier I was looking for "a phrase that is used as an insult
| for someone who writes with too much rambling" and all I got
| was some bullshit about how it's sorry but it can't do that
| because it's allegedly against its OpenAI rules.
|
| So I asked again "a phrase negatively used to mean someone
| that writes too much while rambling" and it worked.
|
| I simply cannot be bothered to deal with stupid insipid
| "corporate friendly language" and other dumb restrictions.
|
| Imagine having a real conversation with someone who freaked
| out any time anything negative was discussed.
|
| TLDR: Thought police ruining LLMs
| pentagrama wrote:
| Curiosity and entertainment. Also, to have an experience that
| isn't available in the current popular consumer products like
| this one.
| CamperBob2 wrote:
| None of your business.
| rgbrgb wrote:
| perhaps they are in a war zone and must learn to make molotov
| cocktails without connecting to the internet
| OJFord wrote:
| Where do you typically have the 'safe search' setting when
| you web search? Personally I have it 'off' ('moderate' when I
| worked in an office I think) even though I'm not looking for
| anything that _ought_ to be filtered by it.
|
| I'm not using models censored or otherwise, but I imagine I'd
| feel the same way there - I won't be offended, so don't try
| to be too clever, just give me the unfiltered results and let
| me decide what's correct.
|
| (Bing AI actually banned me for trying to generate images with
| a rough likeness of myself by combining traits of minor
| celebrities - in combination it shouldn't have looked like
| those people either, so I don't think it should violate the
| ToS, and certainly it didn't in intention (I wanted _myself_,
| but it doesn't know what I look like and I couldn't provide a
| photo at the time; idk if you can now (banned!)) - so it does
| happen, 'false positive censoring' if you like.)
| mark_l_watson wrote:
| No, each download is just a different quantization of the same
| model.
| chatmasta wrote:
| Am I missing something here? I'm on a recent M2 machine. Every
| model I've downloaded fails immediately when I try to load it.
| Is there some way to get feedback on the reason for the
| failure, like a log file or something?
|
| EDIT: The problem is I'm on macOS 13.2 (Ventura). According to a
| message in Discord, the minimum version for some (most?) models
| is 13.6.
| rgbrgb wrote:
| LMStudio is great, if a bit daunting. If you're on Mac and want a
| native open source interface, try out FreeChat
| https://www.freechat.run
| NoMoreNicksLeft wrote:
| Thanks for the link.
|
| I expected it to not let me run this. I have an intel Macbook,
| was expecting that I'd need Apple Silicon... am I
| misunderstanding something? I get fairly fast results at the
| prompt with the default model. How's this thing running with
| whatever shitty GPU I have in my laptop?
| rgbrgb wrote:
| that's the magic of llama.cpp!
|
| I include a universal binary of llama.cpp's server example to
| do inference. What's your machine? The lowest spec I've heard
| it running on is a 2017 iMac with 8GB RAM (~5.5 tokens/s). On
| my m1 with 64GB RAM I get ~30 tokens per second on the
| default 7B model.
| NoMoreNicksLeft wrote:
| Macbook Pro 2020 with 16gb of system ram. I think the gpu
| is Iris Plus? But I don't much keep up on those.
|
| I'm now delving into getting this running in Terminal...
| there are a few things I want to try that I don't think the
| simple interface allows.
|
| Also, I've noticed that when chats get a few kilobytes
| long, it just seizes up and can't go further. I complained
| to it, it spent a sentence apologizing, started up where it
| left off... and got about 12 words further.
| rgbrgb wrote:
| hm yeah i think I need to update llama.cpp to get this
| fix https://github.com/ggerganov/llama.cpp/pull/3996
|
| Thanks for trying it!
| cloudking wrote:
| Is anyone using open-source models to actually get work done
| or solve problems in their software architecture? So far I
| haven't found anything near the quality of GPT-4.
| voiceblue wrote:
| WizardCoder is probably close to state of the art for open
| models as of right now.
|
| https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder
|
| https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0
|
| Top of the line consumer machines can run this at a good clip,
| though most machines will need to use a quantized model
| (ExLlamaV2 is quite fast). I found a model for that as well,
| though I haven't used it myself:
|
| https://huggingface.co/oobabooga/CodeBooga-34B-v0.1-EXL2-4.2...
| gsuuon wrote:
| Zephyr is coherent enough to bounce ideas off of, but I'm
| eagerly awaiting the day open-source models are on par,
| productivity-wise, with the big providers. I imagine some
| folks are utilizing CodeLlama 34B somehow, but I haven't been
| able to use it effectively.
| xcv123 wrote:
| > So far I haven't found anything near the quality of GPT-4
|
| GPT-4 has an estimated 1.8 trillion parameters. Orders of
| magnitude beyond open source models and ~10x GPT-3.5 which has
| 175 billion parameters.
|
| https://the-decoder.com/gpt-4-architecture-datasets-costs-an...
| ramoz wrote:
| The reality is that there is no general use case for open-
| source models the way there is for GPT-4.
|
| The decent chat ones are trained on GPT data, and they're
| basically shitty distilled models.
|
| The best use case is a narrow one that you decide and can
| create adequate fine-tuning data around. Plenty of real
| production ability here.
| cvhashim04 wrote:
| Wow this is sleek, good job.
| machiaweliczny wrote:
| https://github.com/enricoros/big-agi seems better and is open
| source
| RecycledEle wrote:
| This is what I use on Windows 10.
|
| I have an HP z440 with an E5-1630 v4 and 64GB DDR4 quad channel
| RAM.
|
| I run LLMs on my CPU, and the 7 billion parameter models spit out
| text faster than I can read it.
|
| I wish it supported LMMs (multi-modal models).
| deskamess wrote:
| Newbie question... Is this purely for hosting _text_ language
| models? Is there something similar for image models? I.e.,
| upload an image and have some local model provide some
| detection/feedback on it.
| stuckinhell wrote:
| After the latest ChatGPT debacles and the poor performance I'm
| getting from 4 Turbo, I'd really like a local version of GPT-4
| or equivalent. I'd even buy a new PC if I had to.
| ramoz wrote:
| Wouldn't everyone.
| sunsmile wrote:
| Considering the code is closed-source and they can change the
| ToS anytime to send conversation data to their servers
| whenever they want, I would like to know what the benefit of
| using this over ChatGPT would be.
| porcoda wrote:
| Amusing qualifications for the senior engineering roles they're
| hiring for:
|
| "Deep understanding of what is a computer, what is computer
| software, and how the two relate."
|
| Right after the senior ML role that requires people understand
| how to write "algorithms and programs."
|
| Kinda hard to take those kinds of requirements seriously.
| wongarsu wrote:
| The second one isn't _that_ bad in context. But the Senior
| Systems Software Engineer is wild, with "Deep understanding of
| what is a computer, what is computer software, and how the two
| relate" followed by "Experience writing and maintaining
| production code in C++14 or newer". You'd think the latter
| would imply the former, but maybe not...
|
| They seem to have even lowered expectations a bit. Two months
| ago [1] they were already hiring for that role (or a very, very
| similar one), but back then you needed experience with
| "mission-critical code in C++17", now just "production code in
| C++14".
|
| 1:
| http://web.archive.org/web/20230922170941/https://lmstudio.a...
| xcv123 wrote:
| > You'd think the latter would imply the former, but maybe
| not...
|
| I wouldn't put C++ devs on too high of a pedestal. I got away
| with writing shitty C++ code for years before I really knew
| what I was doing. It still worked though.
| xcv123 wrote:
| > "Deep understanding of what is a computer, what is computer
| software, and how the two relate."
|
| Seems like a joke, but many developers do not really understand
| what's going on behind the scenes. This gets straight to the
| point. They don't care about HR keyword matching on your CV, or
| how many years of experience you have of being a mediocre
| developer with language X or framework Y. I guess during the
| interview they will investigate whether you truly understand
| the fundamentals.
| FooBarWidget wrote:
| The settings say saving chats can take 2 GB. Why? What state
| does a chat LLM have? Isn't the only state the chat history
| text?
| whartung wrote:
| I'm late to the game about this. So I'll ask a stupid question.
|
| As a contrived example, what happens if you feed the LoTR books,
| the Hobbit, the Silmarillion, and whatever else is germane, into
| an LLM?
|
| Is there a base, empty, "ignorant" LLM that is used as a seed?
|
| Do you end up with a Middle Earth savant?
|
| Just how does all this work?
| LeoPanthera wrote:
| There isn't enough text in the Tolkien works to generate a
| functional LLM. You need to start with a base model that
| contains enough English (or the language of your choice) to
| be functional.
|
| Such a model (LLaMa is a good example) is not "ignorant," but
| rather a generalized model capable of a wide range of language
| tasks. This base model does not have specialized knowledge in
| any one area but has a broad understanding based on the diverse
| training data.
|
| If you were to "feed" Tolkien's specific books into this
| general LLM, the model wouldn't become a Middle Earth savant.
| It would still provide responses based on its broad training.
| It might generate text that reflects the style or themes of
| Tolkien's work if it has learned this from the broader training
| data, but its responses would be based on patterns learned from
| the entire dataset, not just those books.
| whartung wrote:
| So, it wouldn't necessarily "know" much about Middle Earth,
| but might take a stab at writing like Tolkien?
| jhoechtl wrote:
| Is this an alternative to privategpt or GPT4All?
| cztomsik wrote:
| Also, if you don't know what all the toggles are for, this is a
| simpler attempt by me: https://www.avapls.com/
| MuffinFlavored wrote:
| 1. Mistral
|
| 2. Llama 2
|
| 3. Code Llama
|
| 4. Orca Mini
|
| 5. Vicuna
|
| What can I do with any of these models that won't result in
| 50% hallucinations, or recommendations of code with APIs that
| don't exist, or regurgitated, historical, out-of-date
| StackOverflow answers (that it was trained on) for libraries
| whose versions/APIs have changed, etc.?
|
| Can somebody share one real use case they are using any of these
| models for?
| shironandonon_ wrote:
| Why don't you wrap it with a verification system (e.g. a web
| scraper) and auto-regenerate / jailbreak anything you don't
| like?
| MuffinFlavored wrote:
| Because I pay $20/mo for GPT-4 and don't understand why
| anybody would run a "less-good" version locally that you can
| trust less/that has less functionality.
|
| That's why I wanted to try to understand what I'm missing
| about local "toy" LLMs. How are they not just noise/nonsense
| generators?
___________________________________________________________________
(page generated 2023-11-22 23:01 UTC)