[HN Gopher] Nvidia's Chat with RTX is an AI chatbot that runs lo...
___________________________________________________________________
Nvidia's Chat with RTX is an AI chatbot that runs locally on your
PC
Author : nickthegreek
Score : 138 points
Date : 2024-02-13 14:27 UTC (8 hours ago)
(HTM) web link (www.theverge.com)
(TXT) w3m dump (www.theverge.com)
| navjack27 wrote:
| 30 and 40 series only? My 2080 Ti scoffs at the artificial
| limitation
| phone8675309 wrote:
| Don't worry, they'll be happy to charge you $750 for an entry
| level card next generation that can run this.
| tekla wrote:
| Yes peasants, Nvidia requires you to buy the latest and
| greatest expensive luxury gear, and you will BEG for it.
| nickthegreek wrote:
| You can use an older 30-series card. No latest & greatest
| required.
| nickthegreek wrote:
| A 4060 8gb is $300.
| haunter wrote:
| Cheapest 40xx is $288
|
| https://pcpartpicker.com/products/video-
| card/#c=552&sort=pri...
|
| Cheapest 8GB 30xx is $220
|
| https://pcpartpicker.com/products/video-
| card/#sort=price&c=5...
| andy_xor_andrew wrote:
| so they branded this "Chat with RTX", using the RTX branding.
| Which, originally, meant "ray tracing". And the full title of
| your 2080 Ti is the "RTX 2080 Ti".
|
| So, reviewing this...
|
| - they are associating AI with RTX (ray tracing) now (??)
|
| - your RTX card cannot chat with RTX (???)
|
| wat
| startupsfail wrote:
| No support for bf16 in a card that was released more than 5
| years ago, I guess? Support starts with Ampere?
|
| Although you'd realistically need 5-6 bit quantization to get
| anything large/usable enough running on a 12GB card. And I
| think it's just CUDA then, so you should be able to use 2080
| Ti.
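|
| As a rough back-of-envelope (assuming the weights dominate
| VRAM use and ignoring KV-cache/activation overhead; the
| snippet below is illustrative, not from the thread):
|
|     def gib(params_b, bits):
|         # GiB needed for the weights alone
|         return params_b * 1e9 * bits / 8 / 2**30
|
|     for p in (7, 13):
|         for b in (4, 5, 6, 16):
|             print(f"{p}B @ {b}-bit ~= {gib(p, b):.1f} GiB")
|
| A 13B model at 5-6 bits comes out around 7.5-9 GiB for the
| weights alone, which is roughly why 12GB is the practical
| floor for that size.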
| 0x457 wrote:
| > I pull my PC with Intel 8086 out of closet
|
| > I try to run windows 10 on it
|
| > It doesn't work
|
| > pff, Intel cpu cannot run OS meant for intel CPUs
|
| wat
|
| Jokes aside, nvidia has been using RTX branding for products
| that use Tensor Cores for a long time now. The limitation is
| due to 1st gen tensor cores not supporting the required
| precisions.
| a13o wrote:
| The marketing whiff on ray tracing happened long ago. DLSS is
| the killer app on RTX cards, another 'AI'-enabled workload.
| nottorp wrote:
| That was my first question, does it display pretty ray traced
| images instead of answers?
| speckx wrote:
| I, too, was hoping that my 2080 Ti from 2019 would suffice. =(
| operator-name wrote:
| Yeah, seems a bit odd because the TensorRT-LLM repo lists
| Turing as a supported architecture.
|
| https://github.com/NVIDIA/TensorRT-LLM?tab=readme-ov-file#pr...
| mdrzn wrote:
| Requirement:
|
| "NVIDIA GeForce(tm) RTX 30 or 40 Series GPU or NVIDIA RTX(tm)
| Ampere or Ada Generation GPU with at least 8GB of VRAM"
| strangecasts wrote:
| Unfortunately the download is taking its time - which kind of
| base model is it using and what techniques (if any) are they
| using to offload weights?
|
| Since the demo is 35 GB, my first assumption was it's bundling
| a ~13B parameter model, but if the requirement is 8 GB VRAM, I
| assume they're either doing quantization on the user's end or
| offloading part of the model to the CPU.
|
| (I also hope that Windows 11 is a suggested and not a hard
| requirement)
| operator-name wrote:
| For some reason it's actually bundling both LLaMA 13b
| (24.5GB) and Mistral 7b (13.6GB), but it only installed
| Mistral 7b. I have a 3070ti 8GB, so maybe it installs the
| other one if you have more VRAM?
| ReFruity wrote:
| I have 3070 and when I choose LLaMA in config it just
| changes it back to Mistral on launch
| nottorp wrote:
| 8 GB minimum? So they're excluding the new 3050 6 GB that is
| only powered from PCIe?
| vdaea wrote:
| I suppose this app, despite running locally, will also be
| heavily censored.
|
| Is there some local chatbot application like this, for Windows,
| that isn't hell to set up and that is not censored?
| dvngnt_ wrote:
| it uses Mistral or Llama 2
| theshrike79 wrote:
| https://lmstudio.ai
|
| You can use it to search for any model directly, then
| download and run it 100% locally.
|
| M-series Macs are the simplest, they Just Work. Even faster if
| you tick the GPU box.
|
| Windows needs the right kind of GPU to get workable speed.
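|
| If you want to script against it, LM Studio can also expose
| a local OpenAI-compatible server. A minimal sketch (the port
| and placeholder model name below are the usual defaults and
| may differ on your install):
|
|     # pip install openai  (v1 client)
|     from openai import OpenAI
|
|     # Point the client at LM Studio's local server; the API
|     # key is ignored for local use.
|     client = OpenAI(base_url="http://localhost:1234/v1",
|                     api_key="not-needed")
|     resp = client.chat.completions.create(
|         model="local-model",  # whatever model is loaded
|         messages=[{"role": "user", "content": "Hello, GPU"}],
|     )
|     print(resp.choices[0].message.content)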
| thejohnconway wrote:
| Can that interact with files on your computer, like they show
| in the video?
| operator-name wrote:
| Last time I used it, LM Studio didn't include RAG.
| mrdatawolf wrote:
| I just installed it yesterday and you are right, it does not
| seem to have RAG, but you can use something like AnythingLLM
| to do the RAG work, and it has built-in integration with
| LM Studio.
| purpleflame1257 wrote:
| You can start with Kobold.cpp which should handhold you through
| the process.
| fortran77 wrote:
| Trying it now. Doesn't seem censored to me.
| Const-me wrote:
| You could try my Mistral implementation:
| https://github.com/Const-me/Cgml/blob/master/Mistral/Mistral...
| McAtNite wrote:
| I'm struggling to understand the point of this. It appears to be
| a more simplified way of getting a local LLM running on your
| machine, but I expect less technically inclined users would
| default to using the AI built into Windows while the more
| technical users will leverage llama.cpp to run whatever models
| they are interested in.
|
| Who is the target audience for this solution?
| papichulo2023 wrote:
| Does Windows use the PC's GPU, or just the CPU, or the cloud?
| robotnikman wrote:
| If they are talking about the Bing AI, just using whatever
| OpenAI has in the cloud
| McAtNite wrote:
| I'm referring to CoPilot which, for your average
| non-technical user who doesn't care whether something is
| local or not, has the huge benefit of not requiring the
| purchase of an expensive GPU.
| zamadatix wrote:
| Never underestimate people's interest in running
| something which lets them generate crass jokes about
| their friends or smutty conversation when hosted
| solutions like CoPilot could never allow such non-puritan
| morals. If this delivers on being the easiest way to run
| local models quickly then many people will be interested.
| seydor wrote:
| Windows users who haven't bought an Nvidia card yet
| SirMaster wrote:
| This lets you run Mistral or Llama 2, so whoever has an RTX
| card and wants to run either of those models?
|
| And perhaps they will add more models in the future?
| McAtNite wrote:
| I suppose I'm just struggling to see the value add. Ollama
| already makes it dead simple to get a local LLM running, and
| this appears to be a more limited vendor locked equivalent.
|
| From my point of view the only person who would be likely to
| use this would be the small slice of people who are willing
| to purchase an expensive GPU, know enough about LLMs to not
| want to use CoPilot, but don't know enough about them to know
| of the already existing solutions.
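|
| For what it's worth, once the Ollama server is running,
| using it from code is about this involved (a sketch; the
| default port and /api/generate payload are as documented at
| the time of writing):
|
|     import json, urllib.request
|
|     # Ollama listens on localhost:11434 by default.
|     req = urllib.request.Request(
|         "http://localhost:11434/api/generate",
|         data=json.dumps({
|             "model": "mistral",
|             "prompt": "Summarize RAG in one sentence.",
|             "stream": False,
|         }).encode(),
|         headers={"Content-Type": "application/json"},
|     )
|     with urllib.request.urlopen(req) as r:
|         print(json.loads(r.read())["response"])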
| SirMaster wrote:
| I just looked up Ollama and it doesn't look like it
| supports Windows. (At least not yet)
| McAtNite wrote:
| Oh my apologies for the wild goose chase. I thought they
| had added support for Windows already. Should be possible
| to run it through WSL, but I suppose that's a solid point
| for Nvidia in this discussion.
| SirMaster wrote:
| I think there's a market for a user who is not very
| computer savvy but who at least understands how to use
| LLMs and would potentially run a chat model on their GPU,
| especially if it's just a few clicks to turn on.
| kkielhofner wrote:
| With all due respect this comment has fairly strong (and
| infamous) HN Dropbox thread vibes.
|
| It's an Nvidia "product", published and promoted via their
| usual channels. This is co-sign/official support from
| Nvidia vs "Here's an obscure name from a dizzying array of
| indistinguishable implementations pointing to some random
| open source project website and Github repo where your eyes
| will glaze over in seconds".
|
| Completely different but wider and significantly less
| sophisticated audience. The story link is on The Verge and
| because this is Nvidia it will also get immediately
| featured in every other tech publication, website,
| subreddit, forum, twitter account, youtube channel, etc.
|
| This will get more installs and usage in the next 72 hours
| than the entire Llama/open LLM ecosystem has had in its
| history.
| McAtNite wrote:
| Unfortunately I'm not aware of the reference to the HN
| Dropbox thread.
|
| I suppose my counterpoint is only that the user base
| that relies on simplified solutions is largely already
| addressed by the wide number of cloud offerings from
| OpenAI, Microsoft, Google, and whatever other random
| company has popped up. Realistically I don't know if the
| people who don't want to use those, but also don't want
| to look at GitHub pages, are really that wide of an
| audience.
|
| You could be right though. I could be out of touch with
| reality on this one, and people will rush to use the
| latest software packaged by a well known vendor.
| thecal wrote:
| It is probably the most famous HN comment ever made and
| comes up often. It is a dismissive response to Dropbox
| years ago:
|
| https://news.ycombinator.com/item?id=9224
| McAtNite wrote:
| Thanks for the explanation. I guess my only hope for not
| looking like I had a bad opinion is people's inertia to
| move beyond CoPilot.
| anonymousab wrote:
| > the user base that relies on simplified solutions is
| largely already addressed
|
| There is a wide spectrum of users for which a more white-
| labelled locally-runnable solution might be exactly what
| they're looking for. There's much more than just the two
| camps of "doesn't know what they're doing" and
| "technically inclined and knows exactly what to do" with
| LLMs.
| Capricorn2481 wrote:
| I have no idea what you're talking about and am waiting
| for an answer to OPs question. Downloading text-
| generation-webui takes a minute, let's you use any model
| and get going. I don't really understand what this Nvidia
| thing adds? It seems even more complicated than the open
| source offerings.
|
| I don't really care how many installs it gets, does it do
| anything differently or better?
| sevagh wrote:
| It brings more authority than "oh just use <string of
| gibberish from the frontpage of hn>"
| tracerbulletx wrote:
| It's a different inference engine with different
| capabilities. It should be a lot faster on Nvidia cards.
| I don't have comparable benchmarks for llama.cpp, but if
| you find some, compare them to this.
|
| https://nvidia.github.io/TensorRT-LLM/performance.html
| https://github.com/lapp0/lm-inference-engines/
| pquki4 wrote:
| Anyone who bothers to distinguish a product from
| Microsoft/nvidia/meta/someone else already knows what they
| are doing.
|
| Most users don't care whether the model is run online or
| locally. They go to ChatGPT or Bing/Copilot to get
| answers, as long as they are free. Well, if it becomes a
| (mandatory) subscription, they are more likely to pay for
| it rather than figure out how to run a local LLM.
|
| Sounds like you are the one who's not getting the
| message.
|
| So basically the only people who run a local LLM are
| those who are interested enough in this. And why would
| the brand name matter? What matters is whether a model is
| good, whether it can run on a specific machine, how fast
| it is, etc., and there are objective measures for that.
| People who run local LLMs don't automatically choose
| Nvidia's product over something else just because Nvidia
| is famous.
| se4u wrote:
| You are forgetting about developers who may want to develop
| on top of something stable and with long term support.
| That's a big market.
| McAtNite wrote:
| Would they not prefer to develop for CoPilot? In
| comparison this seems niche.
| dist-epoch wrote:
| There are developers who fail to install
| Ollama/CUDA/Python/create-venv/download-models on their
| computer after many hours of trying.
|
| You think a regular user has any chance?
| McAtNite wrote:
| Not really. I expect those users will just use copilot.
| ribosometronome wrote:
| Gamers who bought an expensive card and see this advertised
| to them in Nvidia's Geforce app?
| pquki4 wrote:
| I don't think your comment answers the question? Basically,
| those who bother to know the underlying model's name can
| already run their model without this tool from nvidia?
| fortran77 wrote:
| It seems really clear to me! I downloaded it, pointed it to my
| documents folder, and started running it. It's nothing like the
| "AI built into Windows" and it's much easier than dealing with
| rolling my own.
| operator-name wrote:
| This is a tech demo for TensorRT-LLM, which is meant to
| greatly improve inference time for compatible models.
| dkarras wrote:
| >It appears to be a more simplified way of getting a local LLM
| running on your machine
|
| No, it answers questions from the documents you provide. Off
| the shelf local LLMs don't do this by default. You need a RAG
| stack on top of it or fine tune with your own content.
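|
| For reference, the retrieval side of that is roughly what
| llama-index gives you out of the box. A minimal sketch (not
| the Chat with RTX code; the folder name and question are
| made up, and by default this uses OpenAI for embeddings/LLM
| unless you configure a local backend):
|
|     # pip install llama-index  (0.10+ import layout shown;
|     # older versions import from llama_index directly)
|     from llama_index.core import (SimpleDirectoryReader,
|                                   VectorStoreIndex)
|
|     docs = SimpleDirectoryReader("my_docs").load_data()
|     index = VectorStoreIndex.from_documents(docs)
|     engine = index.as_query_engine()
|     print(engine.query("What does the design doc say about X?"))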
| westurner wrote:
| From "Artificial intelligence is ineffective and potentially
| harmful for fact checking" (2023)
| https://news.ycombinator.com/item?id=37226233 : pdfgpt,
| knowledge_gpt, elasticsearch :
|
| > _Are LLM tools better or worse than e.g. meilisearch or
| elasticsearch for searching with snippets over a set of
| document resources?_
|
| > _How does search compare to generating things with
| citations?_
|
| pdfGPT: https://github.com/bhaskatripathi/pdfGPT :
|
| > _PDF GPT allows you to chat with the contents of your PDF
| file by using GPT capabilities._
|
| GH "pdfgpt" topic: https://github.com/topics/pdfgpt
|
| knowledge_gpt: https://github.com/mmz-001/knowledge_gpt
|
| From https://news.ycombinator.com/item?id=39112014 : paperai
|
| neuml/paperai: https://github.com/neuml/paperai :
|
| > _Semantic search and workflows for medical /scientific
| papers_
|
| RAG: https://news.ycombinator.com/item?id=38370452
|
| Google Desktop (2004-2011):
| https://en.wikipedia.org/wiki/Google_Desktop :
|
| > _Google Desktop was a computer program with desktop search
| capabilities, created by Google for Linux, Apple Mac OS X,
| and Microsoft Windows systems. It allowed text searches of a
| user 's email messages, computer files, music, photos, chats,
| Web pages viewed, and the ability to display "Google Gadgets"
| on the user's desktop in a Sidebar_
|
| GNOME/tracker-miners: https://gitlab.gnome.org/GNOME/tracker-
| miners
|
| src/miners/fs: https://gitlab.gnome.org/GNOME/tracker-
| miners/-/tree/master/...
|
| SPARQL + SQLite: https://gitlab.gnome.org/GNOME/tracker-
| miners/-/blob/master/...
|
| https://news.ycombinator.com/item?id=38355385 : LocalAI,
| braintrust-proxy; promptfoo, chainforge, mixtral
| joenot443 wrote:
| The immediate value prop here is the ability to load up
| documents to train your model on the fly. 6mos ago I was
| looking for a tool to do exactly this and ended up deciding to
| wait. Amazing how fast this wave of innovation is happening.
| amelius wrote:
| With only 8GB of VRAM, it can't be that good ...
| speed_spread wrote:
| Newer models such as Phi2 run comfortably with 4GB and are good
| enough to be useful for casual interaction. Sticking with local
| inference, multiple small models tuned for specific usage
| scenarios is where it's at.
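|
| For a sense of how little code a small model needs, here is
| a hedged sketch with Hugging Face transformers and the
| microsoft/phi-2 checkpoint (at fp16 the 2.7B weights are
| roughly 5GB, so comfortably fitting a 4GB card in practice
| means loading a quantized variant instead):
|
|     # pip install transformers torch accelerate
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     name = "microsoft/phi-2"
|     tok = AutoTokenizer.from_pretrained(name)
|     # older transformers releases may need trust_remote_code=True
|     model = AutoModelForCausalLM.from_pretrained(
|         name, torch_dtype=torch.float16, device_map="auto")
|
|     prompt = "Explain retrieval-augmented generation briefly."
|     inputs = tok(prompt, return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=60)
|     print(tok.decode(out[0], skip_special_tokens=True))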
| tuananh wrote:
| this is exactly what i want: a personal assistant.
|
| a personal assistant to monitor everything i do on my machine,
| ingest it and answer question when i need.
|
| it's not there yet (still need to manually input urls, etc...)
| but it's very much feasible.
| Xeyz0r wrote:
| But it sounds kinda creepy don't you think?
| spullara wrote:
| it is all local so, no?
| autoexec wrote:
| It generates responses locally, but does your data stay
| local? It's fine if you only ever use it on a device that
| you leave offline 100% of the time, but otherwise I'd pay
| close attention to what it's doing. Nvidia doesn't have a
| great track record when it comes to privacy (for example:
| https://news.ycombinator.com/item?id=12884762).
| operator-name wrote:
| The source is available, minus the installer. You could
| always use the base repo after verifying it:
|
| https://github.com/NVIDIA/trt-llm-rag-windows
| gmueckl wrote:
| You'd be the one controlling the off-switch and the physical
| storage devices for the data. I'd think that this fact takes
| most of the potential creep out. What am I not seeing here?
| Capricorn2481 wrote:
| > You'd be the one controlling the off-switch and the
| physical storage devices for the data
|
| Based on what? The CPU is a physical storage device on my
| PC but it still can phone home and has backdoors.
|
| Is there any reason to think Nvidia isn't collecting my
| data?
| pixl97 wrote:
| If you're on linux just monitor and block any traffic to
| random addresses.
|
| If you're on Windows, what makes you think they are not
| already?
| chollida1 wrote:
| > But it sounds kinda creepy don't you think?
|
| is the bash history command creepy?
|
| Is your browsers history command creepy?
| Nullabillity wrote:
| Yes to both?
|
| But those also don't try to reinterpret what I wrote.
| tuananh wrote:
| if it's 100% local then fine.
| mistermann wrote:
| I'd like something that monitors my history on all browsers
| (mobile and desktop, and dedicated client apps like Substack,
| Reddit, etc) and then ingests the articles (and comments, other
| links with some depth level maybe) and then allows me to ask
| questions....that would be amazing.
| tuananh wrote:
| yes, i want that too. not sure if anyone is building
| something like this?
| majestic5762 wrote:
| rewind.ai
| majestic5762 wrote:
| mykin.ai is building this with privacy in mind. It runs small
| models on-device, while large ones run in confidential VMs in
| the cloud.
| yuck39 wrote:
| Interesting. Since you are running it locally, do they still
| have to put up all the legal guardrails that we see from
| ChatGPT and the like?
| dist-epoch wrote:
| Yes, because otherwise there would be news articles "NVIDIA
| installs racist/sexist/... LLM on users computers"
| phone8675309 wrote:
| Gaming company
|
| Gaming LLM
|
| Checks out
| mchinen wrote:
| Given that you can pick llama or mistral in the NVIDIA interface,
| I'm curious if this is built around ollama or reimplementing
| something similar. The file and URL retrieval is a nice addition
| in any case.
| RockRobotRock wrote:
| Why can't this run on older devices?
| Legend2440 wrote:
| It's an LLM, older devices don't have the juice.
|
| Newer devices only barely have the juice.
| RockRobotRock wrote:
| What does that mean, though? Is it a VRAM thing? I have a 20
| series card with 11 GB and okay performance in CUDA for
| things like OpenAI Whisper.
|
| I think it could run it, albeit slowly.
| htrp wrote:
| Are there benchmarks on how much faster TensorRT is vs native
| torch/cuda?
| operator-name wrote:
| I found some official benchmarks for enterprise GPUs, but no
| comparison data. I couldn't find any benchmarks for consumer
| GPUs.
|
| https://nvidia.github.io/TensorRT-LLM/performance.html
| fisf wrote:
| https://nvidia.github.io/TensorRT-LLM/performance.html
|
| It was one of the fastest backends last time I checked (with
| vLLM and lmdeploy being comparable), but the space moves fast.
| It uses cuda under the hood, torch is not relevant in this
| context.
| operator-name wrote:
| This looks quite cool! It's basically a tech demo for TensorRT-
| LLM, a framework that amongst other things optimises inference
| time for LLMs on Nvidia cards. Their base repo supports quite a
| few models.
|
| Previously there was TensorRT for Stable Diffusion[1], which
| provided pretty drastic performance improvements[2] at the cost
| of customisation. I don't foresee this being as big of a problem
| with LLMs as they are used "as is" and augmented with RAG or
| prompting techniques.
|
| [1]: https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
| [2]:
| https://reddit.com/r/StableDiffusion/comments/17bj6ol/hows_y...
| operator-name wrote:
| Having installed this, I can say it is an incredibly thin
| wrapper around the following github repos:
|
| https://github.com/NVIDIA/trt-llm-rag-windows
| https://github.com/NVIDIA/TensorRT-LLM
|
| It's quite a thin wrapper that puts both projects into
| %LocalAppData%, along with a miniconda environment with the
| correct dependencies installed. Also, for some reason it ships
| both LLaMA 13b (24.5GB) and Mistral 7b (13.6GB) but only
| installed Mistral?
|
| Mistral 7b runs about as accurately as I remember, but
| responses are faster than I can read. This seems to come at
| the cost of context and variance/temperature - although it's a
| chat interface, the implementation doesn't seem to take into
| account previous questions or answers. Asking it the same
| question also gives the same answer.
|
| The RAG (llamaindex) is okay, but a little suspect. The
| installation comes with a default folder dataset, containing
| text files of nvidia marketing materials. When I tried asking
| questions about the files, it often cited the wrong file even
| when it gave the right answer.
| kkielhofner wrote:
| The wrapping of TensorRT-LLM alone is significant.
|
| I've been working with it for a while and it's... Rough.
|
| That said, it is extremely fast. With TensorRT-LLM and Triton
| Inference Server with conservative performance settings I get
| roughly 175 tokens/s on an RTX 4090 with Mistral-Instruct 7B.
| Following commits, PRs, etc I expect this to increase
| significantly in the future.
|
| I'm actually working on a project to better package Triton
| and TensorRT-LLM and make it "name a model and press enter"
| level usable with support for embeddings models, Whisper,
| etc.
| FirmwareBurner wrote:
| _> LLaMA 13b (24.5GB) and Mistral 7b (13.6GB) _
|
| But the HW requirements state 8GB of VRAM. How do those
| models fit in that?
| krunck wrote:
| Does this communicate with other people's (cloud) computers at
| all?
| spullara wrote:
| So you download a 35G zipfile, then extract that, then run the
| setup where you can decide not to install one of the models. Why
| doesn't it just download the model you want to use? Who did this?
|
| Oh and now the install failed with no error message. Lovely.
| 9front wrote:
| In my case the installer finished okay. It pulled in miniconda,
| nvidia-toolkit and "Mistral 7b int4". Upon launch it opens a
| gradio-chat session. For the datasets it supports .txt, .pdf,
| .doc files. There's a "YouTube URL" option and "AI model
| default".
|
| When asked "What is ChatWithRTX" the reply was:
| Chat with RTX is not mentioned in the given context
| information. It is possible that it is a separate
| event or feature that is related to the NVIDIA Lounge
| at LTX 2023 or QuakeCon 2023, where attendees can interact
| with a virtual assistant or chatbot powered by NVIDIA's GeForce
| RTX technology. However, without more information, it
| is impossible to provide a more detailed answer.
| Reference files: portal-prelude-rtx-inside-the-game-
| interview.txt
| 9front wrote:
| It's a 35.1GB download!
| politelemon wrote:
| It's the L in LLM!
| bugbuddy wrote:
| This is amazing and it shows that Nvidia is at least 3 decades
| ahead of the competitors. Imagine this turning into a powerful
| agent that can answer everything about your life. It will
| revolutionize life as we know it. This is why Nvidia stock is
| green and everything else is red today. I am glad that I went all
| in on the green team. I wish I could get more leverage at this
| point.
| dingnuts wrote:
| calm down NVIDIA marketing department, 30 years is a long time
| v4lheru wrote:
| i have a 3050 and it's failing for me. do i have to install it
| on the Windows drive - cause i don't have that much space
| there?
| haunter wrote:
| Do you have an 8GB 3050?
| mleroy wrote:
| I actually thought AMD would release something like that. But
| somehow they don't seem to see their chance.
| randerson wrote:
| The Creative Labs sound cards of the early 90's came with Dr.
| Sbaitso, an app demoing their text-to-speech engine by pretending
| to be an AI psychologist. Someone needs to remake that!
___________________________________________________________________
(page generated 2024-02-13 23:01 UTC)