[HN Gopher] Dot - A standalone open source app meant for easy us...
       ___________________________________________________________________
        
       Dot - A standalone open source app meant for easy use of local LLMs
       and RAG
        
       Author : irsagent
       Score  : 150 points
       Date   : 2024-04-07 00:41 UTC (22 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | NKosmatos wrote:
       | Looks promising, especially if you can select just your docs and
       | avoid interacting with Mistral. I'll give it a try to see how it
       | performs. So far I've had mixed results with other similar
       | solutions.
       | 
       | https://news.ycombinator.com/item?id=39925316
       | 
       | https://news.ycombinator.com/item?id=39896923
        
       | reacharavindh wrote:
       | I'm curious to try it out. There seem to be many options to
       | upload a document and ask stuff about it.
       | 
        | But the holy grail is an LLM that can successfully work on a
        | large corpus of documents and data, like Slack history or huge
        | wiki installations, and answer useful questions with proper
        | references.
       | 
       | I tried a few, but they don't really hit the mark. We need the
       | usability of a simple search engine UI with private data sources.
        
         | snowfield wrote:
          | RAG is limited in that sense, since the amount of data you
          | can send is still bounded by the number of tokens the LLM can
          | process.
          | 
          | But if all you want is a search engine, that's a bit easier.
          | 
          | The problem is often that a huge wiki installation will have
          | a lot of outdated data, which will still be an issue for an
          | LLM. And if you had fixed the data, you might as well just
          | search for the things you need, no?
        
           | rgrieselhuber wrote:
            | This gets to the heart of it. Humans are good at keeping a
            | working memory, as groups or as individuals, in the form of
            | lore.
        
           | boredemployee wrote:
            | I think it depends on what they want. A search is indeed an
            | easy solution, but if they want a summary or a generated,
            | straight answer, then things get a bit harder.
        
           | HeavyStorm wrote:
           | The LLM would have to be trained on the local data. Not
           | impossible, but maybe too costly?
        
             | snowfield wrote:
             | It sounds nice in theory but your dataset is most likely
             | too small for the LLM to "learn" anything.
        
           | IanCal wrote:
            | I'd like to play with giving it more turns. When answering
            | a question, the most interesting ones require searching,
            | reading, then searching again, reading more, etc.
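
            A minimal sketch of that multi-turn loop, with search() and
            ask_llm() as hypothetical placeholders (not anything Dot or a
            specific library exposes):

                def search(query: str) -> list[str]:
                    """Hypothetical retriever: would query a vector store
                    or search index."""
                    return [f"stub passage about: {query}"]

                def ask_llm(prompt: str) -> str:
                    """Hypothetical LLM call: would invoke a local model."""
                    return "FINAL: stub answer"

                def answer(question: str, max_turns: int = 3) -> str:
                    """Search, read, and search again until the model
                    commits to an answer."""
                    context: list[str] = []
                    query, reply = question, ""
                    for _ in range(max_turns):
                        context.extend(search(query))
                        reply = ask_llm(
                            f"Question: {question}\nContext:\n"
                            + "\n".join(context)
                            + "\nReply 'SEARCH: <new query>' for more, or"
                            " 'FINAL: <answer>' to finish."
                        )
                        if reply.startswith("SEARCH:"):
                            query = reply[len("SEARCH:"):].strip()
                        else:
                            break
                    if reply.startswith("FINAL:"):
                        return reply[len("FINAL:"):].strip()
                    return reply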
        
         | haizhung wrote:
          | Differentiable search indices go in this direction:
         | https://arxiv.org/abs/2202.06991
         | 
         | The approach in the paper has rough edges, but the metrics are
         | bonkers (double digit percentage POINTS improvement over dual
         | encoders). This paper was written before the LLM craze, and I
         | am not aware of any further developments in that area. I think
          | that this area might be ripe for some breakthrough innovation.
        
         | verdverm wrote:
          | https://www.kapa.ai/ seems to be the most popular SaaS for
          | developer tools & docs. I'm seeing it all over the place.
        
           | victor106 wrote:
            | Used it; it's just glorified marketing, and among all the
            | solutions we tried it ranked in the bottom three.
            | 
            | The best, at least for now, is to just use OpenAI's custom
            | GPTs, and with some clever (but not hard) it's quite good.
        
             | CharlesW wrote:
              | Assuming the word "prompting" got lost, can you share
              | more about this?
        
         | oulipo wrote:
         | Have you tried https://markprompt.com/ ?
        
       | PhilippGille wrote:
       | Previous submission from 20 days ago:
       | https://news.ycombinator.com/item?id=39734406
        
       | eole666 wrote:
        | Looks nice! But information about hardware requirements is
        | often missing from this kind of project:
        | 
        | - How much RAM is needed?
        | 
        | - What CPU do you need for decent performance?
        | 
        | - Can it run on a GPU? If so, how much VRAM do you need, and
        | does it work only on Nvidia?
        
         | prosunpraiser wrote:
          | Not sure if this helps, but this is from tinkering with
          | Mistral 7B on both my M1 Pro (10-core, 16 GB RAM) and WSL 2
          | w/ CUDA (Acer Predator 17, i7-7700HK, GTX 1070 Mobile, 16 GB
          | DRAM, 8 GB VRAM).
          | 
          | - Got 15-18 tokens/sec on WSL 2, slightly higher on the M1.
          | Think of that as roughly 10-15 words per second. Both were
          | using the GPU. Haven't tried CPU on the M1, but on WSL 2 it
          | was low single digits - super slow for anything productive.
          | 
          | - Used Mistral 7B via the llamafile cross-platform APE
          | executable.
          | 
          | - For local use I found increasing the context size increased
          | RAM usage a lot - but it's fast enough. I am considering
          | adding another 16x1 or 8x2.
          | 
          | Tinkering with building a RAG setup over some of my documents
          | using the vector stores and chaining multiple calls now.
        
           | spxneo wrote:
           | how does 7b match up to Mistral 8x7B?
           | 
           | coming from chatgpt4 it was a huge breath of fresh air to not
           | deal with the judeo-christian biased censorship.
           | 
           | i think this is the ideal localllama setup--uncensored,
           | unbiased, unlimited (only by hardware) LLM+RAG
        
         | alexpinel wrote:
          | Right now the minimum amount of RAM I would recommend is 16
          | GB. I think it can run with less memory, but that will
          | require a few changes here and there (although they might
          | reduce performance). I would also strongly recommend using a
          | GPU over a CPU; in my experience it can make the LLM run
          | twice as fast, if not more. Only Nvidia GPUs are supported
          | for now, and the CUDA toolkit 12.2 is required to run Dot.
        
       | logro wrote:
        | I have a reasonably vast library of technical/scientific
        | epubs/documents. Could I use this to import them and then quiz
        | the books?
        
         | alexpinel wrote:
          | Yes! Of course, because the LLM is running locally it is not
          | as advanced as bigger models like Claude or GPT, but you can
          | definitely quiz the documents. From my experience it performs
          | better with specific questions rather than more ambiguous
          | questions that require extensive understanding of the whole
          | document.
        
       | turnsout wrote:
       | Curious about the choice of FAISS. It's a bit older now, and
       | there are many more options for creating and selecting
       | embeddings. Does FAISS still offer some advantages?
        
         | simonw wrote:
         | What options do you think work better?
        
           | J_Shelby_J wrote:
            | I'm trying to build an exhaustive list of realistic
            | options. See the spreadsheet here:
           | 
           | https://shelbyjenkins.github.io/blog/retrieval-is-all-you-
           | ne...
        
           | turnsout wrote:
            | I don't have an opinion, just wondering why they didn't
            | choose another option such as Sentence Embeddings, OpenAI
            | embeddings, etc.
        
             | simonw wrote:
             | FAISS can be used with OpenAI embeddings (and any other
             | embedding model).
             | 
             | FAISS is technology for fast indexed similarity vector
             | search - using it is an independent decision from which
             | model you use to create those vectors.
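
              A minimal sketch of that separation, assuming LangChain's
              FAISS wrapper (which the snippets quoted elsewhere in the
              thread suggest Dot uses); the sentence-transformers model
              is just an example and could be swapped for OpenAI
              embeddings or any other embedding model without changing
              the FAISS side:

                  from langchain_community.embeddings import HuggingFaceEmbeddings
                  from langchain_community.vectorstores import FAISS

                  # The embedding model only produces the vectors; FAISS indexes and
                  # searches them the same way regardless of which model made them.
                  embeddings = HuggingFaceEmbeddings(
                      model_name="sentence-transformers/all-MiniLM-L6-v2"
                  )

                  texts = [
                      "Dot runs local LLMs with RAG.",
                      "FAISS does fast similarity search over vectors.",
                  ]
                  store = FAISS.from_texts(texts, embeddings)

                  hits = store.similarity_search("What does FAISS do?", k=1)
                  print(hits[0].page_content)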
        
         | alexpinel wrote:
          | Hi! I'm the guy who made Dot. I remember experimenting with a
          | few different vector stores in the early stages of the
          | project but decided to settle on FAISS. I mainly chose it
          | because it made it easy to perform the whole embedding
          | process locally, and also because it allows you to merge
          | vector stores, which is what I use to load multiple types of
          | documents at once. But I am definitely not an expert on the
          | topic and would really appreciate suggestions on other
          | alternatives that might work better! :)
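
          A minimal sketch of merging two FAISS stores as described
          above, assuming LangChain's FAISS wrapper and its merge_from
          method; the documents and embedding model below are
          placeholders, not Dot's actual loaders:

              from langchain_community.embeddings import HuggingFaceEmbeddings
              from langchain_community.vectorstores import FAISS
              from langchain_core.documents import Document

              embeddings = HuggingFaceEmbeddings(
                  model_name="sentence-transformers/all-MiniLM-L6-v2"
              )

              # Build one store per document type, then fold them into a single
              # index so one query searches everything at once.
              pdf_store = FAISS.from_documents(
                  [Document(page_content="text from a PDF", metadata={"type": "pdf"})],
                  embeddings,
              )
              ppt_store = FAISS.from_documents(
                  [Document(page_content="text from a slide", metadata={"type": "ppt"})],
                  embeddings,
              )
              pdf_store.merge_from(ppt_store)

              # The merged store now returns hits from both document types.
              print(pdf_store.similarity_search("slide", k=2))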
        
       | bee_rider wrote:
       | Imagine the marketing coup, when we're all saying "Machine
       | learning? Eh, it's all just a bunch of Dot's products."
        
       | pentagrama wrote:
        | Not sure whether to install the Windows GPU or CPU app version
        | [1].
        | 
        | I have:
        | 
        | Processor: Ryzen 5 3600
        | 
        | Video card: GeForce GTX 1660 Ti 6 GB GDDR6 (Zotac)
        | 
        | RAM: 16 GB DDR4 2666 MHz
        | 
        | Any recommendations?
        | 
        | [1] https://dotapp.uk/download.html
        
         | alexpinel wrote:
          | With those specs I would recommend the GPU version. CUDA
          | acceleration really makes it faster, but keep in mind the
          | CUDA toolkit 12.2 install will be some 3-4 GB.
        
           | pentagrama wrote:
           | Thank you!
        
       | MasterYoda wrote:
        | I have collected so much information in text files on my
        | computer that it has become unmanageable to find anything.
        | Now, with local AI solutions, I wondered if I could create a
        | smart search engine that could provide answers from the
        | information that exists in my personal data.
        | 
        | My questions are:
        | 
        | 1 - Even if there is so much data that I can no longer find
        | stuff, how much text data is needed to train an LLM to work
        | OK? I'm not after an AI that can answer general questions,
        | only an AI that should be able to answer what I already know
        | exists in the data.
        | 
        | 2 - I understand that the more structured the data is, the
        | better, but how important is that when training an LLM? Does
        | it just figure stuff out anyway in a mostly good way?
        | 
        | 3 - Any recommendations on where to start: how to run an LLM
        | locally and train it on your own data?
        
       | gavmor wrote:
       | Thanks for sharing! I look forward to playing with this once I
       | get off my phone. Took a look at the code, though, to see if
       | you've implemented any of the tricks I've been too lazy to try.
       | 
       | `text_splitter=RecursiveCharacterTextSplitter( chunk_size=8000,
       | chunk_overlap=4000)`
       | 
       | Does this simple numeric chunking approach actually work? Or are
       | more sophisticated splitting rules going to make a difference?
       | 
       | `vector_store_ppt=FAISS.from_documents(text_chunks_ppt,
       | embeddings)`
       | 
       | So we're embedding all 8000 chars behind a single vector index. I
       | wonder if certain documents perform better at this fidelity than
       | others. To say nothing of missed "prompt expansion"
       | opportunities.
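
        A runnable sketch of the quoted pipeline, assuming LangChain's
        splitter and FAISS wrapper with the same chunk parameters; the
        embedding model and the sample file name are placeholders, not
        Dot's actual configuration:

            from langchain.text_splitter import RecursiveCharacterTextSplitter
            from langchain_community.embeddings import HuggingFaceEmbeddings
            from langchain_community.vectorstores import FAISS
            from langchain_core.documents import Document

            # Same numeric chunking as the snippet above: 8000-char chunks with
            # 4000 chars of overlap between consecutive chunks.
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=8000, chunk_overlap=4000
            )
            docs = [Document(page_content=open("slides.txt").read())]  # placeholder
            text_chunks_ppt = text_splitter.split_documents(docs)

            # Each chunk is embedded once, so retrieval returns whole 8000-char
            # chunks; requires the sentence-transformers package.
            embeddings = HuggingFaceEmbeddings(
                model_name="sentence-transformers/all-MiniLM-L6-v2"
            )
            vector_store_ppt = FAISS.from_documents(text_chunks_ppt, embeddings)
            print(vector_store_ppt.similarity_search("What is on slide 3?", k=1))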
        
       ___________________________________________________________________
       (page generated 2024-04-07 23:01 UTC)