[HN Gopher] Show HN: IncarnaMind-Chat with your multiple docs us...
       ___________________________________________________________________
        
       Show HN: IncarnaMind-Chat with your multiple docs using LLMs
        
       Author : joeyxiong
       Score  : 25 points
       Date   : 2023-09-15 19:32 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | smcleod wrote:
        | Only supports private / closed LLMs like OpenAI and Claude.
        | People need to design for local LLMs first, then for-profit
        | providers.
        
         | joeyxiong wrote:
          | Yeah, this can definitely be used for local models, but the
          | problem is that most personal computers cannot host large LLMs,
          | and the cost is not cheaper than closed LLMs. For
          | organisations, though, local LLMs are a better choice.
        
       | gsuuon wrote:
       | Those diagrams are nice! What did you use to make them? The
       | sliding window mechanic is interesting but I'm not seeing how the
       | first, second and third retrievers relate. Only the final medium
       | chunks are used, but how are those arrived at?
        
         | joeyxiong wrote:
         | Hi, I created the diagrams using Figma.
         | 
         | The retrieval process consists of three stages. The first stage
         | retrieves small chunks from multiple documents to create a
          | document filter using their metadata. This filter is then
         | applied in the second stage to extract relevant large chunks,
         | essentially sections of documents, which further refines our
         | search parameters. Finally, using both the document and large
         | chunk filters, the third stage retrieves the most pertinent
         | medium-sized chunks of information to be passed to the Language
         | Model, ensuring a focused and relevant response to your query.
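          | 
          | To make the flow concrete, here is a minimal, self-contained
          | sketch of those three stages. The Chunk layout, the word-
          | overlap score() stand-in for embedding similarity, and the
          | field names are purely illustrative, not the actual
          | IncarnaMind code.
          | 
          |     from dataclasses import dataclass
          | 
          |     @dataclass
          |     class Chunk:
          |         text: str
          |         doc: str       # source document
          |         section: int   # which large chunk (section) it belongs to
          |         size: str      # "small", "large", or "medium"
          | 
          |     def score(query: str, chunk: Chunk) -> int:
          |         # Stand-in for embedding similarity: count shared words.
          |         return len(set(query.lower().split())
          |                    & set(chunk.text.lower().split()))
          | 
          |     def top(chunks, query, k):
          |         return sorted(chunks, key=lambda c: score(query, c),
          |                       reverse=True)[:k]
          | 
          |     def retrieve(query, chunks, k=3):
          |         # Stage 1: small chunks across all docs -> document filter.
          |         small = top([c for c in chunks if c.size == "small"], query, k)
          |         docs = {c.doc for c in small}
          |         # Stage 2: large chunks within those docs -> section filter.
          |         large = top([c for c in chunks
          |                      if c.size == "large" and c.doc in docs], query, k)
          |         sections = {(c.doc, c.section) for c in large}
          |         # Stage 3: medium chunks inside those sections go to the LLM.
          |         return top([c for c in chunks if c.size == "medium"
          |                     and (c.doc, c.section) in sections], query, k)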
        
       | skeptrune wrote:
       | Can we talk about how dynamic chunking works by any chance? That
       | is the most interesting piece imo.
       | 
       | We have a similar thing (w/ UIs for search/chat) at
       | https://github.com/arguflow/arguflow .
       | 
       | - nick@arguflow.gg
        
       | all2 wrote:
       | A team where I work recently rolled out a doc-answer LLM and
       | context was an issue we ran into. Retrieved doc chunks didn't
       | have nearly enough context to answer some of the broader
       | questions well.
       | 
       | Another issue I've run into with doc-answer LLMs is that they
       | don't handle synonyms well. If I don't know the terminology for
       | the tool, say llama-index [0], I can't ask around the concept to
        | see if something _like_ what I'm describing exists.
       | 
        | Part of me thinks a LangChain-style chain with the LLM in the
        | loop might be useful.
       | 
       | Something like
       | 
       | 1. User makes vague query "hey, llama-index, how do I create a
       | moving chunk answer thing with llama-index?"
       | 
        | 2. Initial context comes back to the LLM, and the LLM determines
        | there is no straightforward answer to the question.
       | 
       | 2a. The LLM might ask followup questions "when you say X, what do
       | you mean?" to clarify terms it doesn't have ready answers for.
       | 
       | 2b. The LLM says "hm, let me think about that. I'll email you
       | when I have a good answer."
       | 
       | 2c. The LLM reads the docs and relevant materials and attempts to
       | solve the problem.
       | 
       | 3. Email the user with a potential answer to the question.
       | 
        | 4. If the user OKs the answer, stashes the solution text in the
        | docs and updates an embedding table to include words/terms the
        | user used that the docs didn't contain.
       | 
       | This last step is the most important. Some kind of method to
       | capture common questions and answers, synonyms, etc. would ensure
       | that the model has access to (potentially) increasingly robust
       | information.
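        | 
        | As a rough sketch of that loop (every function and field here is
        | hypothetical glue, not any existing library's API):
        | 
        |     def answer_or_defer(query, retrieve, llm, user, index):
        |         # 1-2: first pass over whatever context comes back.
        |         draft = llm(query, retrieve(query))
        | 
        |         # 2a: if the model flags unfamiliar terms, ask the user.
        |         if draft.get("unclear_terms"):
        |             query += " " + user.ask(draft["unclear_terms"])
        |             draft = llm(query, retrieve(query))
        | 
        |         # 2b-2c: no quick answer -> defer, read more deeply, answer later.
        |         if draft.get("confidence", 0) < 0.5:
        |             user.notify("Let me think about that. I'll email you.")
        |             draft = llm(query, retrieve(query, deep=True))
        | 
        |         # 3-4: email the answer; if the user approves it, stash it in
        |         # the docs and record the user's wording as synonyms.
        |         user.email(draft["answer"])
        |         if user.approves(draft["answer"]):
        |             index.add_document(draft["answer"])
        |             index.add_synonyms(query, draft["answer"])
        |         return draft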
        
         | sergiotapia wrote:
          | You can have a pre-qualification step to sort the question
          | into several highly specific categories. Each category has
          | highly tailored context that allows much better answers.
         | 
         | Of course, you can only generate these categories once you see
         | what kind of questions your users ask, but this means your
         | product can continuously improve.
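          | 
          | A toy illustration of what I mean (the categories, keyword
          | rules, and prompts are made up; in practice the classification
          | step could itself be a cheap LLM call):
          | 
          |     CATEGORY_CONTEXT = {
          |         "billing": "Answer using the pricing and refund docs.",
          |         "install": "Answer using the setup guide.",
          |         "general": "Answer from the product overview.",
          |     }
          | 
          |     def classify(question: str) -> str:
          |         q = question.lower()
          |         if any(w in q for w in ("price", "refund", "invoice")):
          |             return "billing"
          |         if any(w in q for w in ("install", "setup", "pip")):
          |             return "install"
          |         return "general"
          | 
          |     def build_prompt(question: str) -> str:
          |         # Prepend the category's tailored context to the question.
          |         return CATEGORY_CONTEXT[classify(question)] + "\n\nQ: " + question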
        
       | pstorm wrote:
        | I'm impressed by your chunking and retrieval strategies. I think
        | this aspect is often handled too simplistically.
       | 
       | One aspect I don't quite understand is why you filter by the
       | sliding window chunks vs just using the medium chunks? If I
       | understand it correctly, you find the large chunks that contain
       | the matched small chunks from the first retrieval. Then in the
       | third retrieval, you are getting the medium chunks that comprise
       | the large chunks? What extra value does that provide?
        
         | joeyxiong wrote:
         | Thank you for your comment. The sliding window approach allows
         | me to dynamically identify relevant "large chunks," which can
         | be thought of as sections in a document. Often, your questions
         | may pertain to multiple such sections. Using only medium chunks
         | for retrieval could result in sparse or fragmented information.
         | 
         | The third retrieval focuses on "medium chunks" within these
         | identified large chunks. This ensures that only the most
         | relevant information is passed to the Language Model, enhancing
         | both time efficiency and focus. For example, if you're asking
         | for a paper summary, I can zero in on medium chunks within the
         | Abstract, Introduction, and Conclusion sections, eliminating
         | noise from other irrelevant sections. Additionally, this
         | strategy helps manage token limitations, like GPT-3.5's
          | 4000-token cap, by selectively retrieving information.
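          | 
          | For the token-limit part, the idea is just to pack the best-
          | ranked medium chunks until the budget runs out, roughly like
          | this (illustrative only; a crude word count stands in for the
          | real tokenizer):
          | 
          |     def pack_context(ranked_chunks, budget=4000):
          |         picked, used = [], 0
          |         for chunk in ranked_chunks:       # best-scoring chunks first
          |             cost = len(chunk.split())     # rough token estimate
          |             if used + cost > budget:
          |                 break
          |             picked.append(chunk)
          |             used += cost
          |         return "\n\n".join(picked)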
        
           | pstorm wrote:
           | Ah I see! So, the large/sliding window chunks act as a pre-
           | filter for the medium chunks. That makes a lot of sense. I
           | appreciate the response
        
       | dilap wrote:
       | I feel like an LLM trained on Slack could be something like the
       | perfect replacement for trying to maintain docs.
        
       | SamBam wrote:
       | Testing it out. I'm getting an error after I added my pdfs to the
        | data directory and then ran:
        | 
        |     % python docs2db.py
        |     Processing files:   6%
        |     Traceback (most recent call last):
        |       File "[...]/IncarnaMind/docs2db.py", line 179, in process_metadata
        |         file_name = doc[0].metadata["source"].split("/")[-1].split(".")[0]
        |     IndexError: list index out of range
        
         | joeyxiong wrote:
          | Hi, I've pushed a new commit to the main branch. Could you
          | please test it out? If you still get this error, you can check
          | whether your doc has the relevant metadata by adding
         | 
          |     for d in doc:
          |         print("metadata:", d.metadata)
          | 
          | before the line
          | 
          |     file_name = doc[0].metadata["source"].split("/")[-1].split(".")[0]
        
       | SamBam wrote:
       | This looks awesome, and really useful.
       | 
        | A few weeks ago I asked on Hacker News "I'm in the middle of a
        | graduate degree and am reading lots of papers, how could I get
       | ChatGPT to use my whole library as context when answering
       | questions?"
       | 
        | And I was told, basically, "It's really easy! First you just
        | extract all of the text from the PDFs into arxiv, parse to
        | separate content from style, then store that in a DuckDB
        | database with zstd compression, then just use some encoder model
        | to process all of these texts into a Qdrant database. Then use
        | Vicuna or Guanaco 30b GPTQ, with langchain, and....."
       | 
       | I was like, ok... guess I won't be asking ChatGPT where I can
       | find which paper talked about which thing after all.
        
         | jarvist wrote:
         | https://github.com/whitead/paper-qa
         | 
         | >This is a minimal package for doing question and answering
         | from PDFs or text files (which can be raw HTML). It strives to
         | give very good answers, with no hallucinations, by grounding
         | responses with in-text citations.
        
         | skeptrune wrote:
         | I don't know why you need the "ask chatGPT" piece. Why not just
         | semantic search on the documents?
         | 
         | What is the value add of generative output?
        
           | all2 wrote:
            | I think the value is "Hey, I remember a paper talking about
            | X topic with Y sentiment; it also mentioned data from <vague
            | source>. Which paper was that?"
           | 
           | If you're dealing with 100s of papers, then having a front
           | end that can deal with vague queries would be a huge benefit.
        
             | skeptrune wrote:
             | You could just write "X topic with Y sentiment similar to
             | foo/<vague-source>" into a search bar.
             | 
              | Then, plain old vector distance on your data would find the
              | relevant chunks. No need for generative AI.
             | 
             | citation to prove this works: chat.arguflow.ai /
             | search.arguflow.ai
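              | 
              | Something along these lines is enough (sentence-
              | transformers is just one embedding option here; the model
              | name and example chunks are placeholders):
              | 
              |     import numpy as np
              |     from sentence_transformers import SentenceTransformer
              | 
              |     model = SentenceTransformer("all-MiniLM-L6-v2")
              |     chunks = ["...paper A abstract...", "...paper B methods..."]
              |     vecs = model.encode(chunks, normalize_embeddings=True)
              | 
              |     def search(query, k=3):
              |         q = model.encode([query], normalize_embeddings=True)[0]
              |         scores = vecs @ q              # cosine similarity
              |         best = np.argsort(-scores)[:k]
              |         return [(chunks[i], float(scores[i])) for i in best]
              | 
              |     search("X topic with Y sentiment similar to <vague-source>")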
        
       ___________________________________________________________________
       (page generated 2023-09-15 23:00 UTC)