[HN Gopher] Show HN: Khoj - Chat offline with your second brain ...
___________________________________________________________________
Show HN: Khoj - Chat offline with your second brain using Llama 2
Hi folks, we're Debanjum and Saba. We created Khoj as a hobby
project 2+ years ago because: (1) search on the desktop sucked; we
just had keyword search on the desktop vs. Google for the internet;
and (2) natural language search models had become good and easy to
run on consumer hardware by that point. Once we made Khoj search
incremental, I completely stopped using the default incremental
search (C-s) in Emacs. Since then Khoj has grown to support more
content types, deeper integrations and chat (using ChatGPT). With
Llama 2 released last week, chat models are finally good and easy
enough to use on consumer hardware for the chat-with-docs scenario.
Khoj is a desktop application to search and chat with your personal
notes, documents and images. It is accessible from within Emacs,
Obsidian or your web browser. It works with org-mode, markdown,
PDF and JPEG files, and with Notion and GitHub repositories. It is
open-source and can work without internet access (e.g. on a
plane). Our chat feature allows you to extract answers and create
content from your existing knowledge base. Example: _"What was
that book Trillian mentioned at Zaphod's birthday last week"_. We
personally use the chat feature regularly to find links, names and
addresses (especially on mobile) and to collate content across
multiple, messy notes. It works online or offline: you can chat
without internet using Llama 2 or with internet using GPT3.5+,
depending on your requirements. Our search feature lets you
quickly find relevant notes, documents or images using natural
language. It does not use the internet. Example: searching for
_"bought flowers at grocery store"_ will find notes about _"roses
at wholefoods"_.
Quickstart: pip install khoj-assistant && khoj
See https://docs.khoj.dev/#/setup for detailed instructions. We
also have desktop apps (in beta) at
https://github.com/khoj-ai/khoj/releases/tag/0.10.0 if you want to
try them out. Please do try out Khoj and let us know if it works
for your use cases.
_Looking forward to the feedback!_
Author : 110
Score : 210 points
Date : 2023-07-30 17:14 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mlajtos wrote:
| Has anyone gotten something valuable from talking to their
| second brain? What kind of conversations are you trying to have?
| bozhark wrote:
| Traumatic Brain Injury. I can't remember yesterday.
|
| Would be hella nice to connect all the scattered lines of
| thought in various notes across a variety of platforms.
| andai wrote:
| If you're on a Mac I would strongly recommend Notational
| Velocity (or the Alt version), if they still run (I know
| Apple likes to break compatibility).
|
| I've tried dozens of notetaking apps and that's the only one
| that truly felt like a second brain.
|
| It's because of the speed. Infuriatingly, Obsidian for
| example can search just as fast, but they intentionally
| programmed in a lag after each keystroke... (I know because I
| removed it.)
| mandmandam wrote:
| Dear Lord, why would they do such a thing. I think I've
| experienced this, and decided I hated Obsidian because it
| made my computer feel slow (it's not).
| nmarinov wrote:
| > they intentionally programmed in a lag after each
| keystroke
|
| Yeah, it seems they've added a debounce. I'd prefer to
| set it to 0ms as well. Do you remember how you removed it?
| mlajtos wrote:
| I am sorry.
|
| Would a summary of the previous day be helpful to you?
| Is your memory problem only episodic, or does it extend to
| factual and kinesthetic memory as well?
| samstave wrote:
| I want a body cam that I wear and that transcribes what I did
| into something searchable...
|
| Basically like a GoPro on steroids with searchable context -
| or even the ability for me to say out loud "KEEP A NOTE OF
| THIS" and have it keep that segment tagged so it can give me
| summaries of moments I wanted particularly logged...
|
| I applied to YC with an idea 'sorta' like this almost a
| decade ago.
|
| The idea was to have a timeline of communications between
| all my contacts such that I could side-scroll a timeline
| with dots for actions such as "sent email", "made call",
| "sent text", "received text", and I could see all these
| filtered by contact/day/whatever...
|
| This was pre-Snowden, so I didn't have confirmation that
| there were already people doing this for me, just not
| letting me browse my own data ;-)
| sabakhoj wrote:
| I quite like this concept. It would be neat if you could
| relay the data to a personal server for processing and
| insight extraction. Seems feasible with a phone camera. I
| think GoPros would be limited by battery life (in my
| experience).
| andai wrote:
| Yeah, it bugs me that I don't know where I was a year ago
| but my phone company does.
|
| Can I get that via GDPR? Has anyone tried?
|
| For Android users a more straightforward option is
| location history, but you should probably turn that off.
| 110 wrote:
| Wow that's an intense use-case. I don't know how but we'd
| love to be able to support this.
|
| If you can collate your notes into markdown or some such,
| then messy notes can be handled, at least using Khoj with
| GPT3.5+.
|
| Do let us know how we can help out and what your current
| biggest pain points are.
| andai wrote:
| If you're on windows check out TimeSnapper. The classic
| version is free and works fine.
|
| It screencaps your desktop every 5 sec so you can watch a
| timelapse of how you spent your day. (Assuming it was on the
| computer!)
|
| I did find it heavy on disk usage, so I wrote an ffmpeg
| script to convert the captures to video (much more
| efficient); see the sketch below.
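|
| A rough sketch of that conversion, in case it's useful (paths
| and frame rate are illustrative, and it assumes ffmpeg is on
| your PATH and was built with glob support):
|
|     # timelapse.py: pack a folder of PNG screencaps into a video
|     import subprocess
|
|     subprocess.run([
|         "ffmpeg",
|         "-framerate", "30",        # playback speed in caps/sec
|         "-pattern_type", "glob",   # match captures by wildcard
|         "-i", "snaps/*.png",       # hypothetical capture folder
|         "-c:v", "libx264",
|         "-pix_fmt", "yuv420p",     # broad player compatibility
|         "timelapse.mp4",
|     ], check=True)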
| spdustin wrote:
| I see you're using gpt4all; do you have a supported way to change
| the model being used for local inference?
|
| A number of apps that are designed for OpenAI's completion/chat
| APIs can simply point to the endpoints served by llama-cpp-python
| [0], and function in (largely) the same way, while using the
| various models and quants supported by llama.cpp. That would
| allow folks to run larger models on the hardware of their choice
| (including Apple Silicon with Metal acceleration or NVIDIA GPUs)
| or using other proxies like openrouter.io. I enjoy openrouter.io
| myself because it supports Anthropic's 100k models.
|
| [0]: https://github.com/abetlen/llama-cpp-python
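|
| As a sketch of the pattern (server URL, model path and prompt
| are illustrative; this assumes llama-cpp-python's OpenAI-
| compatible server and the current openai Python client):
|
|     # Start the local server first, e.g.:
|     #   python -m llama_cpp.server --model llama-2-7b-chat.ggmlv3.q4_0.bin
|     import openai
|
|     openai.api_base = "http://localhost:8000/v1"  # local, not OpenAI
|     openai.api_key = "unused-for-local-server"
|
|     reply = openai.ChatCompletion.create(
|         model="local",  # the local server ignores this field
|         messages=[{"role": "user", "content": "Hello from my notes app"}],
|     )
|     print(reply.choices[0].message.content)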
| 110 wrote:
| No, we don't yet. Lots of developer folks want to try different
| models, but we want to provide simple-to-use yet deep
| assistance. We're kind of unsure what to focus on given our
| limited resources.
| vunderba wrote:
| I really like the idea of running a dedicated server that
| serves up various large language models via a standardized
| API, and then Khoj could just be pointed at one. Depending on
| the notes and the type of conversation I want to have, that
| would even allow for Khoj to swap models on the fly.
| syntaxing wrote:
| The point of gpt4all is that you can change the model with
| minimal breakage. You should be able to change this line
| https://github.com/khoj-ai/khoj/blob/master/src/khoj/process...
| to the model you want; a sketch follows below. You'll need to
| build your own local image with docker-compose, but it should
| be relatively straightforward.
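|
| Roughly, the swap amounts to pointing gpt4all's loader at a
| different file (the model name here is illustrative, not what
| Khoj ships with):
|
|     from gpt4all import GPT4All
|
|     # Load an alternative local model by name; gpt4all downloads
|     # it to its model directory if it isn't already present
|     model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")
|     print(model.generate("Say hello", max_tokens=64))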
| LanternLight83 wrote:
| It's funny that you mention `C-s`, because `isearch-forward` is
| usually used for low-latency literal matches. In what workflow
| can Khoj offer acceptable latency or superior utility as a drop-
| in replacement for isearch? Is there an example of how you might
| use it to navigate a document?
| 110 wrote:
| That's (almost) exactly what Khoj search provides: a search-as-
| you-type experience, but with a natural language (instead of
| keyword) search interface.
|
| My workflow looks like:
|
| 1. Search with Khoj search[1]: `C-c s s` <search-query> RET
| 2. Use speed keys to jump to the relevant entry[2]: `n n o 2`
|
| [1]: `C-c s` is bound to the `khoj` transient menu
| [2]: https://orgmode.org/manual/Speed-Keys.html
| umanwizard wrote:
| Markdown doesn't work on HN...
| asynchronous wrote:
| This is very cool, the Obsidian integration is a neat feature.
|
| Please, someone make a home-assistant Alexa clone for this.
| 110 wrote:
| Thanks!
|
| We've just been testing integrations over voice and WhatsApp
| these last few days[1][2] :)
|
| [1]: https://github.com/khoj-ai/khoj/tree/khoj-chat-over-whatsapp...
|
| [2]: https://github.com/khoj-ai/khoj/compare/master...features/wh...
| jigneshdarji91 wrote:
| This would be even greater if it were available as a Spotlight
| replacement (with some of the additional features that
| Spotlight supports).
| tough wrote:
| Should be easy to plug it in with a Raycast.app or Alfred.app
| plugin.
| IshKebab wrote:
| Interesting. The obvious question you haven't answered anywhere
| (as far as I can see) is what are the hardware requirements to
| run this locally?
| 110 wrote:
| Ah, you're right, I forgot to mention that. We use the Llama 2
| 7B 4-bit quantized model. The machine requirements are:
|
| Ideal: 16GB (GPU) RAM
|
| Less ideal: 8GB RAM and CPU
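|
| For intuition, here's the back-of-envelope arithmetic on why
| 4-bit quantization makes a 7B model fit (weights only, ignoring
| the KV cache and runtime overhead):
|
|     params = 7e9
|     print(params * 0.5 / 2**30)  # 4-bit weights: ~3.3 GiB
|     print(params * 2.0 / 2**30)  # fp16 weights: ~13 GiB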
| RomanHauksson wrote:
| Awesome work, I've been looking for something like this. Any
| plans to support Logseq in the future?
| 110 wrote:
| Yes, we hope to get to it soon! This has been an ask on our
| GitHub for a while[1]
|
| [1]: https://github.com/khoj-ai/khoj/issues/141
| ramesh31 wrote:
| Something I've noticed playing around with Llama 7B/13B on my
| MacBook is that it makes clear just how little RAM 16GB
| really is these days. I've had a lot of trouble running both
| inference and a web UI together locally when browser tabs take up
| 5GB alone. Hopefully we will see a resurgence of lightweight
| native UIs for these things that don't hog resources from the
| model.
| Kwpolska wrote:
| Or hopefully we will see an end of the LLM hype.
|
| Or at least models that don't hog so much RAM.
| ramesh31 wrote:
| >Or at least models that don't hog so much RAM
|
| The RAM usage is kind of the point though; we're trading
| space for time. It's not a problem that the model is using
| it, it's just that with the default choice for UI being web
| based now, the unnecessary memory usage of browsers is
| actually starting to be a real pain point.
| 110 wrote:
| 1. I hear you on going back to lightweight native apps.
| Unfortunately the Python ecosystem is not great for this.
| We use pyinstaller to create the native desktop app but
| it's a pain to manage.
|
| 2. The web UI isn't required if you use Obsidian or Emacs.
| That's just a convenient, generic interface that everyone
| can use.
| overnight5349 wrote:
| Could this do something like take in the contents of my web
| history for the day and summarize notes on what I've been
| researching?
|
| This is getting very close to my ideal of a personal AI. It's
| only gonna be a few more years until I can have a digital brain
| filled with everything I know. I can't wait
| usehackernews wrote:
| Interesting, this is the exact question that came to mind for
| me. This would address a pain point for me.
|
| Does anyone have recommendations for a tool that does it?
|
| Or, anyone want to build it together?
| agg23 wrote:
| Just a heads up: your landing page doesn't seem to mention
| Llama/the offline use case at all, only online usage via
| OpenAI.
|
| ----
|
| What model size/particular fine-tune are you using, and how
| have you observed it to perform for the use case? I've only
| started playing with Llama 2 at the 7B and 13B sizes, and I
| feel they're awfully RAM-heavy for consumer machines, though
| I'm really excited by this possibility.
|
| How is the search implemented? Is it just an embedding and vector
| DB, plus some additional metadata filtering (the date commands)?
| 110 wrote:
| Thanks for the pointer, yeah the website content has gone
| stale. I'll try to update it by end of day.
|
| Khoj is using the Llama 7B, 4-bit quantized, GGML model by
| TheBloke.
|
| It's actually the first offline chat model that gives coherent
| answers to user queries given notes as context.
|
| And it's interestingly more conversational than GPT3.5+, which
| is much more formal.
| agg23 wrote:
| Oh interesting, so you're not using Llama 2, you're using the
| original. Have you begun to evaluate Llama 2 to determine the
| differences in performance?
|
| How are you determining which notes (or snippets of notes?) to
| inject as context? Especially given the small 2048-token
| context limit with Llama 1.
| sabakhoj wrote:
| Quick clarification: we _are_ using Llama V2 7B. We didn't
| experiment with Llama 1 because we weren't sure of the
| licensing limitations.
|
| We determine note relevance by using cosine similarity
| between the query and the knowledge base (your note
| embeddings). We limit the context window for Llama 2 to 3
| notes (while OpenAI might comfortably take up to 9). The
| notes are ranked from most to least similar and truncated
| based on the context window limit. For the model we're
| using, we're still limited to 2048 tokens with Llama V2.
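|
| In rough pseudocode terms, the retrieval step looks like this
| (a minimal sketch using sentence-transformers; the encoder
| model, sample notes and top-k value are illustrative, not
| Khoj's exact configuration):
|
|     from sentence_transformers import SentenceTransformer, util
|
|     # Illustrative bi-encoder, not necessarily what Khoj ships
|     encoder = SentenceTransformer("all-MiniLM-L6-v2")
|
|     notes = ["bought roses at wholefoods",
|              "book Trillian mentioned at the party"]
|     note_embeddings = encoder.encode(notes, convert_to_tensor=True)
|
|     query = encoder.encode("flowers at grocery store",
|                            convert_to_tensor=True)
|     # Rank notes by cosine similarity; keep top 3 for Llama's context
|     hits = util.semantic_search(query, note_embeddings, top_k=3)[0]
|     context = [notes[hit["corpus_id"]] for hit in hits]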
| jmorgan wrote:
| This is a super cool project. Congrats! If you're looking at
| trying different models with one API, check out an open-source
| project a few folks and I have been working on in July, in
| case it's helpful: https://github.com/jmorganca/ollama
|
| Llama 2 gives great answers, even the 7B model. There's an
| "uncensored" 7B version as well that George Sung has fine-tuned
| for topics the default Llama 2 model won't discuss - e.g. I
| had trouble having Llama 2 review authentication/security code
| or topics:
| https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GG...
|
| From just playing around with it, the uncensored model still
| seems to know where to "draw the line" on sensitive topics,
| but YMMV.
|
| If you do end up checking out Ollama, you can try it with this
| command (there's an API too, though it's not in the docs yet):
|
|     ollama run llama2-uncensored
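|
| Calling that API from code looks roughly like this today (it's
| undocumented and subject to change, so treat the endpoint and
| payload shape as assumptions):
|
|     import json, requests
|
|     # Assumed local endpoint for the Ollama server (pre-docs API)
|     resp = requests.post(
|         "http://localhost:11434/api/generate",
|         json={"model": "llama2-uncensored",
|               "prompt": "Why is the sky blue?"},
|         stream=True,
|     )
|     # Tokens stream back as JSON objects, one per line
|     for line in resp.iter_lines():
|         if line:
|             print(json.loads(line).get("response", ""), end="")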
| matmulbro wrote:
| [flagged]
| isoprophlex wrote:
| Please don't post low-effort, shallow dismissals; without
| substantiation you're not posting anything useful, you're just
| being a loud asshole.
| coder543 wrote:
| This seems like a cool project.
|
| It would be awesome if it could also index a directory of PDFs,
| and if it could do OCR on those PDFs to support indexing
| scanned documents. Probably outside the scope of the project
| for now, but just the other day I was thinking how nice it
| would be to have a tool like this.
| 110 wrote:
| Yeah, being able to search and chat with PDF files is quite
| useful.
|
| Khoj can index a directory of PDFs for search and chat. But it
| does not currently work with scanned PDF files (i.e. ones
| without selectable text).
|
| Being able to work with those would be awesome. We just need to
| get to it. Hopefully soon!
| adr1an wrote:
| Check out pdftotext; it's a CLI tool (maybe a library too) for
| extracting PDF text. Oh sorry, I meant to say ocrmypdf - that's
| the one that makes scanned PDFs selectable. But hey, maybe it's
| worth checking both.
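|
| For what it's worth, ocrmypdf also has a Python API, so the
| preprocessing step could be as small as this (file names are
| illustrative):
|
|     import ocrmypdf
|
|     # Add a selectable text layer to a scanned PDF before
|     # indexing; skip_text leaves pages that already have text
|     # untouched
|     ocrmypdf.ocr("scanned-notes.pdf", "searchable-notes.pdf",
|                  skip_text=True)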
| samstave wrote:
| I've wanted a crawler on my machine for auto-categorizing,
| organizing, tagging and moving ALL my files across all my
| machines - so the ability to crawl PDFs, downloads,
| screenshots, pictures, etc. and give me a logical tree of the
| organization of the files - and allow me to modify it by saying
| "add all PDFs related to [subject] here and then organize by
| source/author etc... and then move all my screenshots, ordered
| by date, here"
|
| etc...
|
| I've wanted a "COMPUTER.", uh... I say "COMPUTER!", '_sir, you
| have to use the keyboard_', ah, a keyboard, how quaint...
| forever.
| 110 wrote:
| That.would.be.awesome! Khoj isn't there yet, but that
| actually shouldn't be too far away if you give it a voice
| interface and terminal access.
|
| Of course, having it be stable enough to not `rm -rf /` soon
| after is definitely not part of the warranty.
| mmanfrin wrote:
| As someone who's been getting into using Obsidian and messing
| around with chat AIs, this is excellent, thank you!
| tarwin wrote:
| Really encourages me to move to Obsidian :D
| 110 wrote:
| Thanks! Do try it out and let us know if it works for your
| use case.
| Ilnsk wrote:
| [flagged]
| tudorw wrote:
| Hi, you seem keen to share something neat you took less than 10
| minutes to implement; I'd love to see that.
| 110 wrote:
| 2.5 years! We're kind of slow :P
| calnayak wrote:
| How does one access this from a web browser?
| wg0 wrote:
| I have not tried it, but something like this should exist. I
| don't think it is going to be as usable on consumer hardware
| just yet unless you have a good enough GPU, but within a couple
| of years (or less) we'll be there, I am sure.
|
| Irrelevant opinion - the logo is beautiful, I like it, and so
| are the colours used.
|
| Lastly, for such use cases I think Llama 2 is capable enough
| that paying for ChatGPT won't be as attractive, especially when
| privacy is a concern.
|
| Keep it up. Good craftsmanship. :)
| sabakhoj wrote:
| Thanks! I do think Llama V2 is going to be a good enough
| replacement for ChatGPT (aka GPT3.5) for a lot of use cases.
| bozhark wrote:
| I'm not a software dev.
|
| Is there a way to have this bot read from a discord and google
| drive?
| syntaxing wrote:
| gpt4all itself (the library on the backend for this) has a
| similar program [1]. You just need to put everything into a
| folder. This should be straightforward for Google Drive.
| Harder for Discord, though, but I'm sure there's a bot online
| that can do the extraction.
|
| [1] https://gpt4all.io/index.html
___________________________________________________________________
(page generated 2023-07-30 23:00 UTC)