[HN Gopher] MemoryCache: Augmenting local AI with browser data
___________________________________________________________________
MemoryCache: Augmenting local AI with browser data
Author : NdMAND
Score : 356 points
Date : 2023-12-12 16:56 UTC (6 hours ago)
(HTM) web link (future.mozilla.org)
(TXT) w3m dump (future.mozilla.org)
| Jayakumark wrote:
| Was just talking about this on reddit like two days ago
|
| Instead of data going to models, we need models to come to our
| data, which is stored locally and stays local.
|
| While there are many OSS tools for loading personal data, they
| don't do images or videos. In the future everyone may get their
| own model, but for now the tech is there while the product/OSS
| is missing for everyone to get their own QLoRA, RAG, or
| summarizer.
|
| Not just messages/docs: what we read or write, and our thoughts,
| are part of what makes an individual unique. Our browsing
| history tells a lot about what we read, but no one seems to make
| use of it other than Google for ads. Almost everyone has a habit
| of reading x news site, x social network, x YouTube videos etc.
| OK, here is the summary for you from those 3 today.
|
| Was just watching this yesterday
| https://www.youtube.com/watch?v=zHLCKpmBeKA and thought: why do
| we still not have a computer secretary like her after almost 30
| years, one who is a step ahead of us?
| butz wrote:
| I assume that training LLMs locally requires high-end hardware.
| Even running a model requires a decent CPU or, better, a
| high-end GPU, though that is not as expensive as training one.
| And usually you have to use hardware that is available in the
| cloud, so not much privacy there.
| cjbprime wrote:
| You don't need to train the model on your data: you can use
| retrieval augmented generation to add the relevant documents
| to your prompt at query time.
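| A minimal sketch of that retrieval step (a toy bag-of-words
| score stands in for a real embedding model, and the actual LLM
| call is omitted; the documents and query are made up):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count over lowercased words.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    # Rank documents by similarity to the query and prepend the best
    # matches to the prompt at query time; drop non-matching docs.
    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in documents),
                    key=lambda t: t[0], reverse=True)
    context = "\n\n".join(d for score, d in scored[:top_k] if score > 0)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Firefox stores browsing history in places.sqlite.",
    "Bananas are rich in potassium.",
    "MemoryCache saves local copies of pages you browse.",
]
prompt = build_prompt("Where does Firefox keep browsing history?", docs)
```

| A real pipeline would swap embed() for a sentence-embedding
| model and pass the assembled prompt to the local model.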
| butz wrote:
| Thank you for the explanation. I see there is still a lot
| I have to learn about LLMs.
| Art9681 wrote:
| This works if the document plus prompt fit in the context
| window. I suspect the most popular task for this workflow is
| summarization, which presumably means large documents. That's
| when you begin scaling out to a vector store and implementing
| those more advanced workflows. Sending a large document does
| work on certain local models, but even with the highest-tier
| MacBook Pro a large document can quickly choke any LLM and
| bring inference speed to a crawl. Meaning, a powerful client
| is still required no matter what. Even if you generate
| embeddings in "real time" and dump them to a vector store,
| that process would be slow on most consumer hardware.
|
| If you're passing in smaller documents, then it works pretty
| well for real-time feedback.
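| The scaling-out step above usually starts with chunking: here
| is a sketch of splitting a large document into overlapping
| windows that each fit the context (word counts stand in for
| real token counts, and the sizes are illustrative):

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    # Split a long document into overlapping word windows so each
    # piece fits in the model's context window; the overlap keeps
    # sentences that straddle a boundary intact in one chunk.
    assert 0 <= overlap < chunk_size
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# 450 words -> three chunks of at most 200 words, overlapping by 20.
doc = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(doc)
```

| Each chunk then gets its own embedding in the vector store, so
| only the relevant pieces of a big document reach the prompt.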
| smcleod wrote:
| As someone else said, you don't need to train any models.
| Also, small LLMs (~7B) can run really well even on a base M1
| MacBook Air from three years ago.
| simonw wrote:
| "While there are many OSS tools for loading personal data,
| they don't do images or videos"
|
| Local models for images are getting pretty good.
|
| LLaVA is an LLM with multi-modal image capabilities that runs
| pretty well on my laptop:
| https://simonwillison.net/2023/Nov/29/llamafile/
|
| Models like Salesforce BLIP can be used to generate captions
| for images too - I built a little CLI tool for that here:
| https://github.com/simonw/blip-caption
| orbital-decay wrote:
| CogVLM blows LLaVA out of the water, although it needs a
| beefier machine (the quantized low-res version barely fits
| into 12GB of VRAM, though I'm not sure about the accuracy of
| that).
| cinntaile wrote:
| I have no actual knowledge in this area, so I'm not sure if
| it's entirely relevant, but an update from the 7th of
| December on the CogVLM repo says it now works with 11GB of
| VRAM.
| csbartus wrote:
| > Instead of data going to models, we need models to come to
| our data, which is stored locally and stays local.
|
| That's the most important idea I've read since ChatGPT
| launched last year.
|
| I'll wait for this. Then build my own private AI. And share it
| / pair it for learning with other private AIs, like a blogroll.
|
| As always, there will be two 'different' AIs: a.) the
| mainstream, centralized, ad/revenue-driven, capitalist,
| political, controlling/exploiting kind, and b.) the personal,
| trustworthy kind, polished on peer networks, fun, and
| profitable for one person or a small community.
|
| If, by chance, commercial models end up better than
| open-source models due to better access to computing
| power/data, please let me know. We can go back to SETI and
| share our idle computing power / existing knowledge.
| jakderrida wrote:
| > Our browsing history tells a lot about what we read, but no
| one seems to make use of it other than Google for ads. Almost
| everyone has a habit of reading x news site, x social network,
| x YouTube videos etc. OK, here is the summary for you from
| those 3 today.
|
| I was imagining something a little more ambitious, like a
| model that uses our search history and behavior to derive how
| best to compose a search query. Bing Chat's search queries
| look like what my uncle would type right after I explained to
| him what a search engine is. Throw in some advanced operators
| like site: or filetype:, or at least parentheses along with
| AND/OR. Surely we can fine-tune it to emulate the search
| processes of the most impressive researchers, paralegals, and
| teenagers on the spectrum who immediately fact-check your
| grandpop's Ellis Island story, with evidence he both arrived
| first and was naturalized in Chicago.
| amelius wrote:
| Local compute is so 80s, when people moved away from dumb
| terminals and mainframes, to PCs.
| simondotau wrote:
| Yes, but this time we call it "distributed computing" or
| "edge computing" instead.
| gpderetta wrote:
| Remote computing is so late '90s, when people moved away from
| PCs to servers (the dot in dot-com).
|
| Turns out this sort of stuff is cyclical.
| conradev wrote:
| > Instead of data going to models, we need models come to our
| data which is stored locally and stay locally.
|
| We are building this over at https://software.inc! We collect
| data about you (from your computer and the internet) into a
| local database and then teach models how to use it. The models
| can either be local or cloud-based, and we can route requests
| based on the sensitivity of the data or the capabilities
| needed.
|
| We're also hiring if that sounds interesting!
| gardenhedge wrote:
| Wow, nice domain. I'd work there for the name alone haha.
| voakbasda wrote:
| Am I cynical for thinking the opposite? I can't imagine they
| got that domain for a song. Spending a pile of cash on vanity
| like that is a real turn-off for me; it signals more flash
| than bang. Am I wrong to think this?
| thepra wrote:
| As far as I can see, it's just a macOS image; nothing is
| happening.
| herval wrote:
| The site's pretty funny, but it would likely be more useful
| with more information and less clicking-around nostalgia 8-)
| smith7018 wrote:
| That's because the company is more or less in
| stealth/investigatory mode. It's the same team that built
| Workflow, which was acquired by Apple and turned into
| Shortcuts.
| pradn wrote:
| Yes, we should have local models in addition to remote
| models. Remote ones are always going to be more capable, and
| we shouldn't throw that away. Augmentation is orthogonal: you
| can augment either of them with your own data.
| nullc wrote:
| Just having an archiver that gives you a traditional search
| over every webpage you've loaded - forget the AI stuff -
| would be a major advance.
|
| I don't know about everyone else, but a majority of my
| searches are for stuff I've seen before, and they're often
| frustrated by things that have gone offline, are downranked
| by search engines (e.g. old documentation on HTTP-only
| sites), or are buried by SEO.
| timenova wrote:
| I believe that's exactly what GitHub Copilot does. It first
| scans and indexes your entire codebase including dependencies
| (I think). So when it auto-completes, it heavily uses the
| context of your code, which actually makes Copilot so useful.
|
| You're absolutely right about models coming to our data! If we
| could have Copilot-like intelligence, completely on-device,
| scanning all sorts of personal breadcrumbs like messages,
| browsing history, even webpage content, it would be a game-
| changer!
| bloopernova wrote:
| Regarding PrivateGPT, if I have a 12GB Nvidia 4070 and an 11GB
| 2080ti, which LLM should I run?
|
| Edited to add: https://www.choosellm.com/ by the PrivateGPT folks
| seems to have what I needed.
| SkyMarshal wrote:
| There's a big community discussing exactly that over at
| https://www.reddit.com/r/LocalLLaMA/.
| smcleod wrote:
| +1 for r/LocalLLaMA. 23GB should allow you to run ~30B
| models, but honestly some of the newer, smaller models such
| as Mistral and friends (Zephyr etc.) are really interesting.
| You could also give Mixtral a try if you get a low-quant
| format such as this q3 -
| https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0
| .1-G...
| avallach wrote:
| PrivateGPT repository in case anyone's interested:
| https://github.com/imartinez/privateGPT . It doesn't seem to be
| linked from their official website.
| nightski wrote:
| It's linked from the MemoryCache repo listed at the bottom of
| the article: https://github.com/Mozilla-Ocho/Memory-Cache
| linsomniac wrote:
| I would sure love a way to "chat" with my browsing history and
| page content. Is there any way to automatically save off pages
| that I've visited for later processing? I looked a decade or more
| ago and didn't really find a good solution.
| kaynelynn wrote:
| Rewind.ai is pretty much this - I just installed it and am very
| happy so far.
| Alifatisk wrote:
| Isn't it Apple devices only?
| thekevan wrote:
| "Coming soon to Windows"
|
| https://www.rewind.ai/windows
| emptysongglass wrote:
| Just need a Linux version or an open source alternative
| now
| solarkraft wrote:
| I think WorldBrain (https://github.com/WorldBrain/Memex)
| promises this. While I'm also excited by the idea, I think
| there was some reason I ended up not using it.
| wizardwes wrote:
| Zotero _might_ work, but only as a highly imperfect solution,
| since it is more focused on research
| rhn_mk1 wrote:
| Check out Recoll with Recoll-WE https://addons.mozilla.org/en-
| US/firefox/addon/recoll-we/
| Dwedit wrote:
| Very misleading name. The word "Memory" has a distinct meaning in
| relation to computing, but this is more about human memories.
| Sai_ wrote:
| I was going to ignore this as a troll comment, because
| computer memory has its antecedents in human memory, but the
| commenter is right: the combination of "memory" and "cache"
| to talk about human memory seems misleading.
| CollinEMac wrote:
| I'm confused by the example they gave.
|
| > What is the meaning of a life well-lived?
|
| Is the response to this based on browser data? Based on the
| description I was expecting queries more like:
|
| > What was the name of that pdf I downloaded yesterday?
|
| > What are my top 3 most visited sites?
|
| > What type of content do I generally interact with?
| ipaddr wrote:
| That information is already available. You want a better search
| interface.
| ape4 wrote:
| They could provide a local URL called "about:wrapped" that
| gives a summary of your usage, like Spotify Wrapped: the top
| 100 sites, and you could click on a site for more info like
| which pages you visited, when, how often, etc.
| lacker wrote:
| Yes, exactly, I want a search interface that's an LLM instead
| of a bunch of menus.
| atomicUpdate wrote:
| One thing you'll see in a lot of these LLM examples and demos
| is intentionally subjective queries, so they can't be judged
| on pass/fail criteria.
|
| For example, you'll see things like "where should I visit in
| Japan?" or "how should I plan a bachelor party?", because
| there is a huge variety of answers that are all "correct",
| regardless of how much you disagree with them. There is also
| a huge number of examples for them to draw from, especially
| compared to something as specific as your browsing history.
| reqo wrote:
| What I would love to see is this model being able to learn to
| automate some tasks that I usually do, e.g. signing up for
| events, buying tickets, etc. If this had access to your login
| details and could log in, it could be a great assistant!
| redblacktree wrote:
| Or shoe bot
| candiddevmike wrote:
| Teach it to press the skip ad button
| lacker wrote:
| Or it could click "hide" on cookie banners for me!
| k1t wrote:
| They actually already added this, but it's still in a
| limited trial phase.
|
| https://support.mozilla.org/en-US/kb/cookie-banner-
| reduction
| jml7c5 wrote:
| I hope this encourages Mozilla to focus more on page archiving
| support on the web. I feel as though they missed a huge
| opportunity by not making it easy to archive pages with DOM
| snapshots, or easy to snag videos or images. (Go to Instagram and
| try to right-click -> download the image; you can't.) Would have
| been a very good way to differentiate from Chrome, as Google
| wouldn't want that available for Youtube. And "our browser can
| download videos and images from anywhere" is pretty easy to sell
| for potential users.
| eigenvalue wrote:
| Agree, it seems like it's insanely hard to back up a modern JS-
| enabled web page in a usable way that results in a single file
| which can be easily shared.
| nekitamo wrote:
| Have you tried SingleFile? It sounds like what you're looking
| for:
|
| https://github.com/gildas-lormeau/SingleFile
| eigenvalue wrote:
| Will check it out, thanks.
| BlueTemplar wrote:
| I'm baffled that support for single-file, offline HTML is
| still so bad today:
|
| https://www.russellbeattie.com/notes/posts/the-decades-long-...
|
| (I suspect it's because this goes against the wants of some
| of the biggest players, who have an incentive to make us
| leave as many online footprints as possible?)
|
| Even here, Mozilla recommends converting to PDF for easier
| (?!?) human readability. But PDF is a very bad format for
| digital documents, with no support for reflow and very bad
| multimedia support. (PDF is perhaps good for archiving
| _offline_ documents, even despite its other issues.)
| Dwedit wrote:
| "Save Page WE" will capture a DOM snapshot to a single HTML
| file. The only problem is that Data URLs encoded using Base64
| are highly bloated.
| yeukhon wrote:
| Maybe it is just me, since I lived through the Firefox OS era
| as an intern, but this feels like a possible re-entry into
| offering a Mozilla-built OS in the future. They said the
| Internet was born to connect people - but building everything
| into a browser is not the optimal way of adding all this
| fancy stuff. Firefox OS was basically a small Linux kernel
| plus Gecko plus HTML5 for rendering. So, much like iOS and
| iPadOS, Mozilla could offer a similar OS for devices/
| platforms. I mean, for the past 5 years they have invested in
| AR and VR, so I won't be surprised if they eventually bet on
| another Firefox OS...
| orbital-decay wrote:
| Classic bookmarks have failed because mnemonic organization
| doesn't scale. This kind of interface does, and can replace it
| entirely if done right.
|
| Thinking of it, something like this can be used for all your
| local files as well, acting as a better version of the old
| filesystem-as-a-database idea. Or for a specific knowledge base
| (think LLM-powered Zotero).
| AureliusMA wrote:
| Something like Orbit would be perfect
|
| https://withorbit.com/
| wintogreen74 wrote:
| Sounds like you just invented the modern version of Windows
| Longhorn
| danielovichdk wrote:
| My browsing usage is not relevant for this. I don't want to
| "chat" with my browsing history. I would simply love my
| browser to index my bookmarks on my OS so I could search the
| actual content of those bookmarks.
|
| The feedback loop gained from ChatGPT will, I assume, always
| be way better than my local GPT equivalent.
|
| But often I bookmark pages where I know the information there
| is important enough for me to come back to more than once.
|
| So I have started crafting a solution for this. It crawls the
| bookmarks in your local browser storage, downloads those
| pages, and adds them to your OS search index.
|
| That's been an itch for me for years.
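| The indexing half of that idea can be sketched as a tiny
| inverted index (the crawling/downloading of bookmarked pages
| is assumed to have already produced the page text; the URLs
| here are made up):

```python
import re
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Map each word to the set of bookmark URLs whose page text
    # contains it: a minimal full-text index over saved pages.
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: return only URLs containing every query word.
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for w in words[1:]:
        results &= index.get(w, set())
    return results

pages = {
    "https://example.com/rust": "The Rust borrow checker explained",
    "https://example.com/go": "Goroutines and channels in Go",
}
hits = search(build_index(pages), "borrow checker")
```

| A real version would feed the downloaded pages into the OS
| search index (Spotlight, Windows Search, Recoll) rather than
| an in-memory dict.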
| groestl wrote:
| Small data sets suffer from bad recall in full-text search,
| so a bit of smart fuzziness added to the search by AI could
| improve the experience of locally indexed bookmarks quite a
| bit.
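| Even without an LLM, a bit of stdlib fuzzy matching over the
| index vocabulary already rescues misspelled queries (a rough
| sketch, not necessarily the kind of smartness meant above):

```python
import difflib

def fuzzy_lookup(vocabulary: list[str], term: str, cutoff: float = 0.8) -> list[str]:
    # Map a possibly misspelled query term onto the closest words
    # actually present in the index, so recall doesn't drop to zero.
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)

vocab = ["bookmark", "browser", "history", "kubernetes"]
matches = fuzzy_lookup(vocab, "kubernets")  # misspelled query term
```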
| jval43 wrote:
| Didn't Chrome do this at the very beginning, when it was
| initially released? I faintly remember that being a feature.
|
| Personally, I would already be content if my browsers didn't
| forget their history all the time; both Firefox's and
| Safari's history is way too short-lived.
| overstay8930 wrote:
| Isn't this just Safari? *using a modern chip
| politelemon wrote:
| Did you mean to link to a forked repo?
|
| https://memorycache.ai/developer-blog/2023/11/30/we-have-a-w...
|
| links to https://github.com/misslivirose/Memory-Cache
|
| but did you mean https://github.com/Mozilla-Ocho/Memory-Cache
| no_time wrote:
| Good idea. Mozilla gets a lot of rightful hate for their
| mishandling of FF and their political preaching, but I
| believe they are still capable of developing tech that is
| both privacy-preserving and user-friendly at the same time.
|
| I use the offline translator built into FF regularly and it's
| magic. I would never have thought something like that could
| run locally without a server park's worth of hardware thrown
| at it.
|
| Here's hoping this experiment turns out the same way.
| pixxel wrote:
| Well said; I agree wholeheartedly.
| lofaszvanitt wrote:
| What is happening at Firefox is quite strange. It's like they
| are walking backwards.
| smcleod wrote:
| This seems like a sensible step in the right direction, IMO.
| (Optional) features such as local, privacy-respecting LLMs
| will help to augment people's online research, bookmarking,
| contextual search, etc.
|
| It's important that we have Firefox working on such
| experiments; otherwise, as Google adds more of their
| privacy-invading features to Chrome/Chromium, it will likely
| negatively impact people's desire to find alternative
| browsers.
| lofaszvanitt wrote:
| Yeah, but maybe, if you are constantly losing market share,
| you should work on things that appeal to a wider audience.
| Unless you have a trump card and intend to use it as a deus
| ex machina to suddenly show people you are THE browser, the
| way forward.
| nullc wrote:
| You don't gain market share by doing the same stuff the other
| free alternative does.
|
| You gain market share by doing what they refuse to do,
| however much it's in the user's interest, because their
| business is stealing the user's data and yours isn't.
| ath3nd wrote:
| They might be onto something here.
|
| Instead of doing lots of back-and-forth with the giants,
| enriching them with each prompt, you get a smaller local
| model that's much more respectful of your privacy.
|
| That's an operating model I am willing to do some OSS
| contributions to, or even bankroll.
|
| Gotta love the underdogs, even if admittedly, I am not a big
| Mozilla org fan.
| altairprime wrote:
| It's what Apple's been doing for a few years, though it remains
| unclear how much of that is "AI". So it makes sense that
| someone else would enter that niche.
| visarga wrote:
| In the future their AIs are going to talk to our AIs. Because
| we need protection.
| SpaceManNabs wrote:
| This seems like complete overkill.
|
| I don't even like having to clear my history and whatever
| regularly. I use incognito mode most of the time.
|
| Now I have to monitor what my local AI collects?
|
| "Through the lens of privacy," my ass, man.
|
| Why would I ask my browser what the meaning of a life well
| lived is?
| nektro wrote:
| you're better than this mozilla. hopping on the ai trend is
| disgusting given your alleged morals
| tesdinger wrote:
| I wish they would fix basic features such as downloading
| pictures in Firefox for Android. Often, long-pressing an
| image on the screen opens a context menu that does not allow
| downloading, only following the link associated with the
| image.
| huy77 wrote:
| So this is what growth hacking looks like: building a landing
| page for an imaginary product to test an idea's market fit?
| ChrisArchitect wrote:
| Could barely get a sense of what any of this meant from the
| shared link.
|
| Went back a bit further/to the official site:
|
| > _MemoryCache is an experimental development project to turn a
| local desktop environment into an on-device AI agent._
|
| Okayy...
|
| And this from November
|
| _Introducing Memory Cache_
|
| https://memorycache.ai/developer-blog/2023/11/06/introducing...
___________________________________________________________________
(page generated 2023-12-12 23:00 UTC)