[HN Gopher] Show HN: LlamaGPT - Self-hosted, offline, private AI...
___________________________________________________________________
Show HN: LlamaGPT - Self-hosted, offline, private AI chatbot,
powered by Llama 2
Author : mayankchhabra
Score : 111 points
Date : 2023-08-16 15:05 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| belval wrote:
| Nice project! I couldn't find the information in the README.md:
| can I run this with a GPU? If so, what do I need to change? It
| seems to be hardcoded to 0 in the run script:
| https://github.com/getumbrel/llama-gpt/blob/master/api/run.s...
| crudgen wrote:
| Had the same thought, since it is kinda slow (I only have 4
| physical/8 logical cores though). But I think VRAM might be a
| problem (8 GB can work if one has a fairly recent GPU; here the
| M1/M2 might be interesting).
| mayankchhabra wrote:
| Ah yes, running on GPU isn't supported at the moment. But CUDA
| (for Nvidia GPUs) and Metal support are on the roadmap!
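|
| For the curious, GPU offloading in llama-cpp-python will look
| roughly like this once a CUDA/Metal build is wired in (the
| model path and layer count below are just examples):
|
|     from llama_cpp import Llama
|
|     # n_gpu_layers=0 keeps inference fully on the CPU; a
|     # positive value offloads that many layers to the GPU,
|     # provided llama-cpp-python was built with CUDA or Metal.
|     llm = Llama(model_path="./nous-hermes-llama2-7b.bin",
|                 n_gpu_layers=32)
|     out = llm("Q: Why is the sky blue? A:", max_tokens=64)
|     print(out["choices"][0]["text"])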
| samspenc wrote:
| Ah fascinating, just curious, what's the technical blocker? I
| thought most of the Llama models were optimized to run on
| GPUs?
| caesil wrote:
| So many projects still using GPT in their name.
|
| Is the thinking here that OpenAI is not going to defend that
| trademark? Or just kicking the can down the road on rebranding
| until the C&D letter arrives?
| schappim wrote:
| They don't have the trademark yet.
|
| OpenAI has applied to the United States Patent and Trademark
| Office (USPTO) to seek domestic trademark registration for the
| term "GPT" in the field of AI.[64] OpenAI sought to expedite
| handling of its application, but the USPTO declined that
| request in April 2023.
| khaledh wrote:
| This reminds me of the first generation of computers in the 40s
| and early 50s following the ENIAC: EDSAC, EDVAC, BINAC, UNIVAC,
| SEAC, CSIRAC, etc. It took several years for the industry to
| drop this naming scheme.
| super256 wrote:
| Well, GPT is simply an initialism for "Generative Pre-trained
| Transformer".
|
| In Germany, a trademark can be lost if it becomes a
| "Gattungsbegriff" (generic term). This happens when a trademark
| becomes so well-known and widely used that it becomes the
| common term for a product or service, rather than being
| associated with a specific company or brand.
|
| For example, if a company invented a new type of vacuum cleaner
| and trademarked the name, but then people started using that
| name to refer to all vacuum cleaners, not just those made by
| the company, the trademark could be at risk of becoming a
| generic term, which would lead to the trademark being deleted.
| I think this is basically what is happening with GPT here.
|
| Btw, there are some interesting examples from the past where
| trademarks were lost due to the brand name becoming too
| popular: Vaseline and Fon (hairdryer; everyone in Germany uses
| the term "Fon").
|
| I also found some trademarks which are at risk of being lost:
| "Lego", "Tupperware", "Post" (Deutsche Post/DHL), and "Jeep".
|
| I don't know how all this stuff works in America though. But it
| would honestly suck if such a generic term were approved as a
| trademark :/
| raffraffraff wrote:
| Actually, in the UK and Ireland a vacuum cleaner is called a
| Hoover. But in general I think we do that less than Americans.
| For example, we don't call a public address system a "Tannoy" -
| that's a brand of hi-fi speakers. And we'd say "photocopier"
| instead of Xerox.
| lee101 wrote:
| [dead]
| SubiculumCode wrote:
| I didn't see any info on how this is different from
| installing/running llama.cpp or koboldcpp. New offerings are
| awesome of course, but what is it adding?
| mayankchhabra wrote:
| The main difference is that doing it yourself means setting
| everything up manually: downloading the model, optimizing the
| parameters for the best performance, and running an API server
| and a UI front-end - which is out of reach for most non-
| technical people. With LlamaGPT, it's just one command: `docker
| compose up -d`, or a one-click install for umbrelOS home server
| users.
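|
| For reference, the whole flow per the README is roughly:
|
|     git clone https://github.com/getumbrel/llama-gpt.git
|     cd llama-gpt
|     docker compose up -d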
| SubiculumCode wrote:
| thanks. yeah, that IS useful.
|
| Anyone see if it contains utilities to import models from
| huggingface/github?
| DrPhish wrote:
| Maybe I've been at this for too long and can't see the
| pitfalls of a normal user, but how is that easier than using
| an oobabooga one-click installer (an option that's been
| around "forever")?
|
| I guess ooba one-click doesn't come with a model included,
| but is that really enough of a hurdle to stop someone from
| getting it going?
|
| Maybe I'm not seeing the value proposition of this. Glad to
| be enlightened!
| albert_e wrote:
| Oh I thought this was a quick guide to host it on any server (AWS
| / other clouds) of our choosing.
| mayankchhabra wrote:
| Yes! It can run on any home server or cloud server.
| ryanSrich wrote:
| Interesting. I might try to get this to work on my NAS.
| reneberlin wrote:
| Good luck! The tokens/sec will be below your expectations, or
| it will overheat. You really shouldn't play games with your
| data storage. You could try it with an old laptop to see how
| badly it performs. Ruining your NAS just to show that "it
| worked somehow" is a bit over the top. But I don't know -
| maybe your NAS has a powerful processor and is tuned to the
| max, and you have redundancy and don't care about losing a
| NAS? Or this was just a joke and I fell for it! ;)
| mayankchhabra wrote:
| Not sure how powerful their NAS is, but on Umbrel Home
| (which has an N5105 CPU), it's pretty usable with ~3
| tokens generated per second.
| samspenc wrote:
| I had the same question initially, was a bit confused by the
| Umbrel reference at the top, but there's a section right below
| it titled "Install LlamaGPT anywhere else" which I think should
| work on any machine.
|
| As an aside, umbrelOS actually seems like a cool concept by
| itself - good to see these "self-hosted cloud" projects coming
| together in a unified UI. I may investigate this more at some
| point.
| QuinnyPig wrote:
| I've been looking for something like this for a while. Nice!
| Atlas-Marbles wrote:
| Very cool - this looks like a combination of chatbot-ui and llama-
| cpp-python? A similar project I've been using is
| https://github.com/serge-chat/serge. Nous-Hermes-Llama2-13b is my
| daily driver and scores high on coding evaluations
| (https://huggingface.co/spaces/mike-ravkine/can-ai-code-
| resul...).
| lazzlazzlazz wrote:
| (1) What are the best more creative/less lobotomized versions of
| Llama 2? (2) What's the best way to get one of those running in a
| similarly easy way?
| mritchie712 wrote:
| try llama2-uncensored
|
| https://github.com/jmorganca/ollama
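|
| With ollama installed, it should be just (assuming the model
| tag is right):
|
|     ollama run llama2-uncensored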
| lkbm wrote:
| https://github.com/jmorganca/ollama was extremely simple to get
| running on my M1 and has a couple of uncensored models you can
| just download and use.
| brucemacd wrote:
| https://github.com/jmorganca/ollama/tree/main/examples/priva...
| - there's an example using PrivateGPT too
| dealuromanet wrote:
| Is it private and offline via ollama? Are all ollama models
| private and offline?
| [deleted]
| avereveard wrote:
| I like this for turn-by-turn conversations:
| https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b
|
| And this for zero-shot instructions: https://huggingface.co/Open-
| Orca/OpenOrcaxOpenChat-Preview2-...
|
| The easiest way would be https://github.com/oobabooga/text-
| generation-webui
|
| A slightly more complex way, which I use, is a stack with the
| llama.cpp server, an OpenAI adapter, and bettergpt as the
| frontend, using the OpenAI adapter as the custom endpoint.
| bettergpt's UX beats oobabooga's by a long way (and ChatGPT's
| in certain aspects).
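|
| Roughly, the wiring (here assuming llama-cpp-python's bundled
| OpenAI-compatible server stands in for the adapter; the model
| path and port are just examples):
|
|     # serve a local model behind an OpenAI-compatible API
|     python3 -m llama_cpp.server \
|         --model ./nous-hermes-llama2-13b.bin --port 8000
|     # then point bettergpt's custom endpoint at
|     # http://localhost:8000/v1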
| synaesthesisx wrote:
| How does this compare to just running llama.cpp locally?
| mayankchhabra wrote:
| It's an entire app (with a chatbot UI) that takes away the
| technical legwork to run the model locally. It's a simple one
| line `docker compose up -d` on any machine, or one click
| install on umbrelOS home servers.
| chasd00 wrote:
| Is it a free model, or are the politically-correct-only
| response constraints in place?
| mayankchhabra wrote:
| It's powered by Nous Hermes Llama2 7b. From their docs: "This
| model stands out for its long responses, lower hallucination
| rate, and absence of OpenAI censorship mechanisms. [...] The
| model was trained almost entirely on synthetic GPT-4 outputs.
| Curating high quality GPT-4 datasets enables incredibly high
| quality in knowledge, task completion, and style."
| benreesman wrote:
| I'm a little out of date (busy few weeks), didn't the Vicuna
| folks un-housebreak the LLaMA 2 language model (which is world
| class) with a slightly less father-knows-best Instruct tune?
| Havoc wrote:
| Llama is definitely "censored", though I've not found this to
| be an issue in practice. I guess it depends on what you want
| to do with it.
| redox99 wrote:
| llama-chat is censored, not base llama
| ccozan wrote:
| OK, since this is all running privately, how can I add my own
| private data? For example, I have a 20+ year email archive
| that I'd like to have ingested.
| rdedev wrote:
| A simple way would be to do some form of retrieval over those
| emails and add the results back into the original prompt.
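|
| As a toy sketch of that retrieval step (the email list, helper,
| and question are all placeholders; a real setup would likely
| use embeddings rather than TF-IDF):
|
|     from sklearn.feature_extraction.text import TfidfVectorizer
|     from sklearn.metrics.pairwise import cosine_similarity
|
|     emails = ["email body 1 ...", "email body 2 ..."]  # your archive
|     vec = TfidfVectorizer().fit(emails)
|     doc_mat = vec.transform(emails)
|
|     def retrieve(query, k=3):
|         # rank emails by similarity to the question, take top k
|         sims = cosine_similarity(vec.transform([query]), doc_mat)[0]
|         return [emails[i] for i in sims.argsort()[::-1][:k]]
|
|     question = "When did my lease renewal go through?"
|     context = "\n---\n".join(retrieve(question))
|     prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
|     # feed `prompt` to the local model instead of the raw question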
| cromka wrote:
| I imagine this means you'd need to come up with your own
| model, even if it's based on an existing one.
| ravishi wrote:
| And is that hard? Sorry if this is a newbie question; I'm
| really out of the loop on this tech. What would be required?
| Computing power and tagging? Or can you, like, improve the
| model without much human intervention? Can it be done
| incrementally with usage and user feedback? Would a single
| user even be able to generate enough feedback for this?
| phillipcarter wrote:
| Yes, this would be quite hard. Fine-tuning an LLM is no
| simple task. The tools and guidance around it are very new,
| and arguably not meant for non-ML Engineers.
___________________________________________________________________
(page generated 2023-08-16 23:00 UTC)