[HN Gopher] Launch HN: LlamaFarm (YC W22) - Open-source framewor...
       ___________________________________________________________________
        
       Launch HN: LlamaFarm (YC W22) - Open-source framework for
       distributed AI
        
        Hi HN! We're Rob, Matt, and Rachel from LlamaFarm
        (https://llamafarm.dev). We're building an open-source AI
        framework based on a simple belief: the future isn't one
        massive model in the cloud--it's specialized models running
        everywhere, continuously fine-tuned from real usage.
        
        The problem: we were building AI tools and kept falling into
        the same trap--demos die before production. Everything worked
        perfectly on our laptops, but once deployed, something would
        break and RAG quality would degrade. If we ran our own model,
        it quickly went out of date. The proof-of-concept that
        impressed the team couldn't handle real-world data.
        
        Our solution: declarative AI-as-code. One YAML file defines
        models, policies, data, evals, and deployment. Instead of one
        brittle giant, we orchestrate a Mixture of Experts--many
        small, specialized models that you continuously fine-tune
        from real usage. With RAG for source-grounded answers,
        systems get cheaper, faster, and auditable.
        
        There's a short demo here:
        https://www.youtube.com/watch?v=W7MHGyN0MdQ and a more
        in-depth one at https://www.youtube.com/watch?v=HNnZ4iaOSJ4.
        
        Ultimately, we want to deliver a single, signed bundle--
        models + retrieval + database + API + tests--that runs
        anywhere: cloud, edge, or air-gapped. No glue scripts. No
        surprise egress bills. Your data stays
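        
        To make "AI-as-code" concrete, here's the flavor of config we
        mean--an illustrative sketch with simplified key names, not
        the exact schema (see the repo for the real thing):
        
          # llamafarm.yaml -- illustrative sketch, not the exact schema
          name: docs_assistant
          runtimes:
            # primary model runs locally via Ollama; if it's
            # unavailable, the runtime fails over to a hosted model
            - chat: {model: "llama3:8b", provider: "ollama", fallback: "gpt-5"}
          rag:
            embedder: "nomic-embed-text"
            database: chromaDB
        
        One file like this drives local development and deployment
        alike, which is what keeps laptop and production behavior
        identical.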
        in your runtime.
        
        We believe the AI industry is evolving the way computing did.
        Just as we went from mainframes to distributed systems and
        from monolithic apps to microservices, AI is following the
        same path: models are getting smaller and better. Mixture of
        Experts is here to stay. Qwen3 is sick. Llama 3.2 runs on
        phones. Phi-3 fits on edge devices. Domain models beat GPT-5
        on specific tasks.
        
        RAG brings specialized data to your model: you don't need a
        1T-parameter model that "knows everything." You need a smart
        model that can read _your_ data. Fine-tuning is
        democratizing: what cost $100k last year now costs $500.
        Every company will have custom models. Data gravity is real:
        your data wants to stay where it is--on-prem, in your AWS
        account, on employee laptops.
        
        Bottom line: LlamaFarm turns AI from experiments into
        repeatable, secure releases, so teams can ship fast.
        
        What we have working today:
        
        - Full RAG pipeline: 15+ document formats, programmatic
        extraction (no LLM calls needed), vector-database embedding.
        
        - Universal model layer: the same code runs against 25+
        providers, with automatic failover and cost-based routing.
        
        - Truly portable: identical behavior from laptop to
        datacenter to cloud.
        
        - Real deployment: Docker Compose works now, with Kubernetes
        basics and cloud templates on the way.
        
        Check out our readme/quickstart for easy install
        instructions:
        https://github.com/llama-farm/llamafarm?tab=readme-ov-file#-...
        Or just grab a binary for your platform directly from the
        latest release:
        https://github.com/llama-farm/llamafarm/releases/latest
        
        The vision is to run, update, and continuously fine-tune
        dozens of models across environments, with built-in RAG and
        evaluations, all wrapped in a self-healing runtime. We have
        an MVP of that today (with a lot more to do!).
        
        We'd love to hear your feedback! Think we're way off? Spot
        on? Want us to build something for your specific use case?
        We're here for all your comments!
        
       Author : mhamann
       Score  : 53 points
       Date   : 2025-10-07 15:30 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bobbyradford wrote:
       | I'm a contributor on this project and am very excited to hear
       | your feedback. We really hope that this will become a helpful
       | tool for building AI projects that you own and run yourself!
        
         | rgthelen wrote:
          | Likewise, it's been fun building in the open on this one.
          | You can download the CLI with just:
          | 
          |   curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/... | bash
          | 
          | Then run LF init to get the project started!
        
       | A4ET8a8uTh0_v2 wrote:
        | I am not sure if it is the future, but I am glad there is
        | some movement to hinder centralization in this sector as
        | much as possible (yes, I recognize the future risk, but for
        | now it counts as hindering it).
        
         | mhamann wrote:
         | 100%. We don't know what's going to happen in the future.
         | Things are evolving so quickly. Hopefully pushing back on
         | centralization now will keep the ecosystem healthier and give
         | developers real options outside the big two/three cloud
         | providers.
        
         | rgthelen wrote:
         | I am kind of militant about this. The ability to run great AI
         | models locally is critical, not just for this sector, but for
         | innovation overall. The bar of "build a few datacenters" is far
         | too high for all but the largest countries and companies in the
         | world.
        
       | smogs wrote:
       | Looks great! Congrats on the launch. How is this different than
       | llamaindex?
        
         | rachelradulo wrote:
         | Hey thanks! I'm Rachel from LlamaFarm; we actually use
         | LlamaIndex as one of our components. It's great for RAG, and we
         | didn't want to reinvent what they've already done. LlamaFarm is
         | about bundling the best of open source into a complete,
         | production-ready AI project framework. Think of us like the
         | integration and orchestration layer that makes LlamaIndex, plus
         | model management, plus prompt engineering, plus deployment
         | tools all work together seamlessly.
         | 
         | Where LlamaIndex gives you powerful RAG primitives, we give you
         | the full production system - the model failover when OpenAI is
         | down, the strategy system that adapts from development to
         | production, the deployment configs for Kubernetes. We handle
         | all the boring stuff that turns a RAG prototype into a system
         | that actually runs in production. One YAML config, one CLI
         | command, and you have everything from local development to
         | cloud deployment. :)
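          | 
          | To make the failover piece concrete, here's a sketch of
          | what that looks like in config (key names are
          | illustrative, not the exact schema):
          | 
          |   runtimes:
          |     # if the hosted model is down, retry against the local fallback
          |     - chat: {model: "gpt-5", fallback: "llama3:8b"}
          | 
          | The runtime transparently retries the request against the
          | local model, so the app keeps answering even when the
          | provider is having a bad day.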
        
       | darkro wrote:
       | I love the ethos of the project! I think your docs link might be
       | broken, however. Looking forward to checking this out!
        
         | rachelradulo wrote:
          | Hey thanks! Sorry about the broken link -- here's a
          | working docs link in the meantime:
          | https://docs.llamafarm.dev/docs/intro -- mind sharing
          | where it's broken?
        
           | darkro wrote:
            | Yea, of course. I was trying to click the docs link
            | from the homepage on llamafarm.dev from two different
            | networks and two different browsers, Edge and Brave.
            | Neither worked. Phone didn't either. It takes me to a
            | Supabase link that errors out. Hope that helps! Thanks
            | for the link! (Btw, I don't see any errors in the
            | browser console.)
        
             | rachelradulo wrote:
              | Thank you! Found and fixed two on the website --
              | appreciate the comment and the detailed testing.
        
         | rachelradulo wrote:
          | Just found and fixed a bad link at the bottom of the
          | website -- thanks again for pointing that out!
        
       | jochalek wrote:
        | Very cool to see a serious local-first effort. Looking back
        | at how far local models have come, I definitely believe
        | their usefulness, combined with RAG or in domain-specific
        | contexts, is soon to be (or already is) on par with
        | general-purpose, GPT-5-like massive-parameter cloud models.
        | The ability to generate quality responses without having to
        | relinquish private data to the cloud used to be a pipe
        | dream. It's exciting to see a team dedicated to making this
        | a reality.
        
         | mhamann wrote:
         | Thanks! It means a lot to hear you say that.
        
         | rgthelen wrote:
         | What are a few use-cases you want to see this used for?
        
           | jochalek wrote:
            | A few ideas related to healthcare:
            | 
            | - AI assistants for smaller practices without
            | enterprise EHR. Epic currently integrates third-party
            | AI assistants, but those are of course cloud services
            | and are aimed at contracts with large hospital systems.
            | They're a great step forward, but leave much to be
            | desired by doctors in actual usefulness.
            | 
            | - Consumer/patient-facing products to help people
            | synthesize all of their health information and
            | understand what their healthcare providers are doing.
            | Think of an on-device assistant that can connect with
            | something like https://www.fastenhealth.com/ to build a
            | local RAG index of their health history.
            | 
            | Overall, users can feel more confident they know where
            | their PHI is, and it's potentially easier for smaller
            | companies/start-ups to get into the healthcare space
            | without having to move/store people's PHI.
        
             | rgthelen wrote:
             | Cool ideas, thank you.
        
       | Unheard3610 wrote:
        | But wait, why should I use this for my first home-grown
        | orchestration instead of something else? Like, if I want to
        | set up a local LLM running on my old laptop for some kind
        | of RAG over all my hard drives, why is this best? Or if I
        | want agentic monitoring of alarms instead of paying for
        | SimpliSafe or Ring or whatever.
        
         | mhamann wrote:
          | Right...there are lots of ways you could do that. Most of
          | the approaches we've seen for enabling that sort of thing
          | are programmatic in nature. That's great for some people,
          | but you have to deal with shifting dependencies, sorting
          | out bugs, making sure everything connects properly, etc.
          | Some people will want that for sure, because you do get
          | control over every little piece.
         | 
         | LlamaFarm provides an abstraction over most (eventually all) of
         | those pieces. Something that should work out of the box
         | wherever you deploy it but with various knobs to customize as
         | needed (we're working on an agent to help you with this as
         | well).
         | 
         | In your example (alarm monitoring), I think right now you'd
         | still need to write the agent, but you could use LlamaFarm to
         | deploy an LLM that relied on increasingly accurate examples in
         | RAG and very easily adjust your system prompt.
        
         | rgthelen wrote:
         | Good question -- that's actually the sweet spot for LlamaFarm.
         | 
         | You can wire things together yourself (LangChain, bash, Ollama,
         | etc.), but LlamaFarm tries to make that repeatable and
         | portable. It's declarative orchestration for AI systems -- you
         | describe what you want (models, RAG, agents, vector DBs) in
         | YAML, and it runs the same way anywhere: laptop, cloud, or
         | fully air-gapped edge.
         | 
          | So instead of gluing frameworks together and having them
          | break on every update, you can do something like:
          | 
          |   name: home_guarde
          |   runtimes:
          |     - detect_motion: {model: "phi-3", provider: "lemonade"}
          |     - alert: {model: "gpt-5", fallback: "llama3:8b"}
          |   rag:
          |     embedder: "nomic-embed-text"
          |     database: chromaDB
         | 
         | ...and it just runs -- same config, same behavior, whether
         | you're doing local RAG or home monitoring. The goal isn't to
         | replace the DIY route, just to make it composable and
         | reproducible.
        
       | johnthecto wrote:
        | So this sounds like an application-layer approach, maybe
        | just shy of a Replit or Base44, with the twist that you can
        | own the pipeline. While there's something to that, I think
        | there are some further questions around differentiation
        | that need to be answered. I think the biggest challenge is
        | going to be the beachhead: what client demographic has the
        | cash to want to own the pipeline rather than use SaaS, but
        | doesn't have the staff on hand to do it?
        
         | rgthelen wrote:
          | Yeah, that's a fair framing -- it is kind of an
          | "application layer" for AI orchestration, but focused on
          | ownership and portability instead of just convenience.
          | 
          | And yeah, the beachhead will be our biggest issue --
          | where to find our first hard-core users. I was thinking
          | legal (they have a need for AI, but data cannot leave
          | their servers), healthcare (same as legal, but with more
          | regulations), and government (not right now, but they
          | normally have deep pockets).
          | 
          | What do you think is a good starting place?
        
         | mhamann wrote:
         | I think that enterprises and small businesses alike need stuff
         | like this, regardless of whether they're software companies or
         | some other vertical like healthcare or legal. I worked at IBM
         | for over a decade and it was always preferable to start with an
         | open source framework if it fit your problem space, especially
         | for internal stuff. We shipped products with components built
         | on Elastic, Drupal, Express, etc.
         | 
         | You could make the same argument for Kubernetes. If you have
         | the cash and the team, why not build it yourself? Most don't
         | have the expertise or the time to find/train the people who do.
         | 
         | People want AI that works out of the box on day one. Not day
         | 100.
        
       | outfinity wrote:
       | Hope you are right and we decentralize AI properly...
        
         | rgthelen wrote:
          | The hardest part, honestly, is the runtime. How do we
          | actually make it super easy to deploy this? We are still
          | working on that. Where do you see a few good places to
          | focus first? I was thinking AWS and Google, since both
          | have good GPU pricing models, but I am probably missing a
          | few good ones!
        
       | bityard wrote:
       | Open source but backed by venture capital, so what is your
       | monetization strategy?
        
         | rachelradulo wrote:
          | Fair question. The core will always stay open source and
          | free. We'll monetize around it with things like managed
          | hosting, enterprise support, and compliance options
          | (HIPAA, SOC 2, etc.). Basically, we make money when teams
          | want someone to stand behind it in production, not for
          | using the software itself. But let us know if you have
          | other ideas! We're still new to open source.
        
       | serjester wrote:
        | Congrats on the launch. YC 2022? I'm assuming this was a
        | pivot - what led to it, and how do you guys plan on making
        | money long term?
        
         | rachelradulo wrote:
          | Yep! We were working on an authentication startup
          | (https://news.ycombinator.com/item?id=30615352) and built
          | it to $1.5M in ARR, but then we saw an even bigger pain
          | point: local AI is hard. When we tried building a
          | corporate knowledge base with RAG and local models, we
          | hit the same wall: a painful gap between prototype and
          | production.
          | 
          | Production-ready enterprise AI requires solving model
          | management, RAG pipelines, model fine-tuning, prompt
          | engineering, failover, cost optimization, and deployment
          | orchestration. You can't just be good at one or two of
          | these; you have to be great at all of them or your
          | project won't succeed. And so LlamaFarm was born!
          | 
          | Monetization-wise, we're open source and free forever,
          | with revenue coming from enterprise support, managed
          | deployments, and compliance packages--basically,
          | companies pay for confidence, not code.
        
       | singlepaynews wrote:
        | Very cool. I jumped in here thinking it was going to be
        | something else, though: a packaged service for distributing
        | on-prem model running across multiple GPUs.
        | 
        | I'm basically imagining a vast.ai-type deployment of an
        | on-prem GPT: assuming that most infra is consumer GPUs on
        | consumer devices, the idea is running the "company cluster"
        | as the combined compute of the company's machines.
        
         | mhamann wrote:
         | Great point. I can see how you'd land there. Also a great idea!
         | xD
         | 
         | Maybe a better descriptor is "self-sovereign AI?" "Self-hosted
         | AI?"
        
         | olokobayusuf wrote:
         | We're building something closer to this at Muna:
         | https://docs.muna.ai . Check us out and let me know what you
         | think!
        
           | dang wrote:
           | https://news.ycombinator.com/item?id=43119777
        
       | olokobayusuf wrote:
       | This is super interesting! I'm the founder of Muna
       | (https://docs.muna.ai) with much of the same underlying
       | philosophy, but a different approach:
       | 
       | We're building a general purpose compiler for Python. Once
       | compiled, developers can deploy across Android, iOS, Linux,
       | macOS, Web (wasm), and Windows in as little as two lines of code.
       | 
       | Congrats on the launch!
        
         | mhamann wrote:
         | Oh! Muna looks cool as well! I've just barely glanced at your
         | docs page so far, but I'm definitely going to explore further.
         | One of the biggest issues in the back of our minds is getting
         | models running on a variety of hardware and platforms. Right
         | now, we're just using Ollama with support for Lemonade coming
         | soon. But both of these will likely require some manual setup
         | before deploying LlamaFarm.
        
           | olokobayusuf wrote:
           | We should collab! We prefer to be the underlying
           | infrastructure behind the scenes, and have a pretty holistic
           | approach towards hardware coverage and performance
           | optimization.
           | 
            | Read more:
            | 
            | - https://blog.codingconfessions.com/p/compiling-python-to-run...
            | - https://docs.muna.ai/predictors/ai#inference-backends
        
       ___________________________________________________________________
       (page generated 2025-10-07 23:00 UTC)