[HN Gopher] Comparing our Rust-based indexing and querying pipel...
___________________________________________________________________
Comparing our Rust-based indexing and querying pipeline to
Langchain
Author : tinco
Score : 92 points
Date : 2024-10-01 15:09 UTC (7 hours ago)
(HTM) web link (bosun.ai)
(TXT) w3m dump (bosun.ai)
| pjmlp wrote:
| Most of the Python libraries are bindings to native libraries
| anyway.
|
| Any other ecosystem is able to plug into the same underlying
| native libraries, or even call them directly when it is the
| same language.
|
| In a way, the performance pressure on the Python world is
| kind of interesting; without it, the CPython folks would
| never have reconsidered their stance on performance.
| OptionOfT wrote:
| Most of these native libraries' output isn't 1:1 mappable to
| Python. Depending on the data, you need to write native data
| wrappers or, worse, marshal the data into managed memory. The
| overhead can be high.
|
| It gets worse because Python doesn't expose memory management
| to you. Initially this is an advantage, but later on it
| causes bloat.
|
| Python is an incredibly easy interface over these native
| libraries, but has a lot of runtime costs.
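A minimal ctypes sketch (a hypothetical illustration, not code from any library discussed in the thread) shows both halves of that tradeoff: the binding is trivial to write, but every call marshals data across the boundary.

```python
import ctypes
import ctypes.util

# The "incredibly easy interface": load the C math library and
# declare one function's signature.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# The runtime cost: each call marshals a Python float into a C
# double and boxes the result back into a Python object. One call
# is cheap; doing it per element in a Python loop pays that
# boundary cost on every iteration.
roots = [libm.sqrt(float(i)) for i in range(10_000)]
assert roots[9] == 3.0
```

Real binding libraries amortize this by moving whole buffers (or whole loops) across the boundary at once rather than one value at a time.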
| pjmlp wrote:
| Yet another reason to use native compiled languages with
| bindings to the same C and C++ libraries.
|
| If using C++20 onwards, it is relatively easy to have
| similarly high-level abstractions; one just needs to let go
| of the C-isms that many insist on using.
|
| Here Rust clearly has an advantage: it doesn't allow
| copy-pasting of C-like code.
|
| Naturally D and Swift with their safety and C++ interop,
| would be an option as well.
| __coaxialcabal wrote:
| Have you had any success using LLMs to rewrite Python to
| Rust?
| throwup238 wrote:
| They're very good at porting code between languages, but
| going from a dynamically typed language with a large
| standard library to a static one with a large library
| ecosystem requires a bit more hand-holding. It helps to
| specify the Rust libraries you want to use (and their
| versions), and you'll probably want to give a few rounds of
| feedback and error correction before the code is ready.
| nicce wrote:
| > Python is an incredibly easy interface over these native
| libraries, but has a lot of runtime costs.
|
| It also means that many people use Python without
| understanding which part of the code is actually fast. They
| mix Python code with wrappers to native libraries, and
| sometimes the Python code slows down the overall work
| substantially without people knowing where the fault is,
| e.g. using Python math mixed with NumPy bindings when they
| could do it with NumPy alone.
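A small sketch of that mistake (illustrative, not from the article), assuming NumPy is available: the first version calls NumPy per element from a Python loop, the second stays vectorized, yet both compute the same thing.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Slow: a Python-level loop that calls into NumPy per element,
# paying interpreter and boxing overhead on every iteration.
slow = np.array([np.sin(v) ** 2 + np.cos(v) ** 2 for v in x])

# Fast: one vectorized expression; the loop stays in native code.
fast = np.sin(x) ** 2 + np.cos(x) ** 2

# Same result, very different runtime.
assert np.allclose(slow, fast)
```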
| oersted wrote:
| Indeed, but Python is used to orchestrate all these lower-level
| libraries. If you have Python on top, you often want to call
| these libraries in a loop or, more often, within parallelized
| multi-stage pipelines.
|
| Overhead and parallelization limitations become a serious issue
| then. Frameworks like PySpark take your Python code and are
| able to distribute it better, but it's still (relatively) very
| slow and clunky. Or they can limit what you can do to a
| natively implemented DSL (often SQL, or some DataFrame API, or
| an API to define DAGs and execute them within a native engine),
| but you can't do much serious data work without UDFs, where
| again Python comes in. There are tricks but you can never
| really avoid the limitations of the Python interpreter.
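The same tradeoff shows up even on a single machine; a pandas sketch (illustrative, not from the article), where the vectorized expression runs in the native engine while `apply` routes every row through the Python interpreter:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Native path: the whole expression is evaluated in vectorized
# native code, no per-row Python involved.
df["total_native"] = df["price"] * df["qty"]

# UDF path: the same logic, but each row round-trips through the
# interpreter -- the per-row cost that limits Python UDFs in
# engines like PySpark too.
df["total_udf"] = df.apply(lambda r: r["price"] * r["qty"], axis=1)

assert (df["total_native"] == df["total_udf"]).all()
```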
| lmeyerov wrote:
| At least for Louie.ai, basically genAI-native computational
| notebooks where operational analysts ask for intensive
| analytics tasks, like pulling Splunk/Databricks/Neo4j data,
| wrangling it in some runtime, clustering/graphing/etc. it,
| and generating interactive viz, Python has ups and downs:
|
| On the plus side, it means our backend gets to handle
| small/mid datasets well. Apache Arrow adoption in analytics
| packages is strong, so zero-copy and columnar flows over
| many rows are normal. Pushing that to the GPU or another
| process is also great.
|
| OTOH, one of our greatest issues is the GIL. Yes, it shows
| up a bit in single-user code (not discussed in the post),
| especially when doing divide-and-conquer flows for a user.
| However, the bigger issue is stuffing many concurrent users
| into the same box to avoid blowing your budget. We would
| like the memory-sharing benefits of threads but, because of
| the GIL, end up wanting the isolation benefits of
| multiprocess. A bit same-but-different: we stream results to
| the browser as agents progress through your investigation,
| and that has not been as smooth as what we have done with
| other languages.
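A minimal sketch of the GIL issue (illustrative workload, not Louie.ai code): threads share memory, but CPU-bound pure-Python work doesn't parallelize under the GIL, which is what pushes you toward multiprocess and its isolation costs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python CPU work: the GIL is held for the whole loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

work = [200_000] * 4

t0 = time.perf_counter()
sequential = [cpu_bound(n) for n in work]
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded = list(ex.map(cpu_bound, work))
t_threads = time.perf_counter() - t0

# Threads share memory, but the GIL serializes this workload:
# t_threads comes out roughly equal to t_seq, not t_seq / 4.
# Escaping that means multiprocessing -- and giving up cheap
# in-process memory sharing.
assert threaded == sequential
```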
|
| And moving to multiprocess is no panacea. E.g., a local
| embedding engine is expensive to run in-process per worker
| because modern models have high RAM needs. So that biases
| toward using a local inference server for what is meant to
| be an otherwise local call, which is doable, but
| representative of the extra work needed for production-grade
| software.
|
| Interesting times!
| dmezzetti wrote:
| I've covered this before in articles such as this:
| https://neuml.hashnode.dev/building-an-efficient-sparse-keyw...
|
| You can make anything performant if you know the right buttons to
| push. While Rust makes it easy in some ways, Rust is also a
| difficult language to develop with for many developers. There is
| a tradeoff.
|
| I'd also say LangChain's primary goal isn't performance;
| it's convenience and functionality coverage.
| timonv wrote:
| Cool, that's a fun read! I recently added sparse vector
| support to fastembed-rs, with SPLADE, not BM25. Still, it
| would be nice to compare the two.
| swyx wrote:
| i mean LLM based or not has nothing to do with it, this is a
| standard optimization, scripting lang vs systems lang story.
| godelski wrote:
| Shhhh, let this one go. So many people don't get optimization
| and why it is needed that I'll take anything we can get. Hell,
| I routinely see people saying no one needs to know C because
| python calls C in "the backend" (who the fuck writes "the
| backend" then?). The more people that can learn some HPC and
| parallelism, the better.
| pjmlp wrote:
| Even better if they would learn about these amazing managed
| languages where we can introspect the generated machine code
| of their dynamic compilers.
| godelski wrote:
| Agree, but idk what the gateway in is since I'm so
| desperate for people to just get the basic concepts.
| dboreham wrote:
| Obviously AI writes the backend.
| serjester wrote:
| I'm surprised they don't talk about the business side of
| this - did they have users complaining about the speed? At
| the end of the day they only increased performance by 50%.
|
| These kinds of optimizations seem awesome once you have a
| somewhat mature product, but you really have to wonder if
| this is the best use of a startup's very limited bandwidth.
| godelski wrote:
| > At the end of day they only increased performance by 50%.
| > only 50%.
|
| I'm sorry... what?! That's a lot of improvement and will save
| you a lot of money. 10% increases are quite large!
|
| Think about it this way: if you have a task that takes an
| hour and you turn that into 59 minutes and 59 seconds, it
| might seem like nothing (about 0.03%). But now consider you
| have a million users; that's a million seconds, or 277 hrs!
| This can save you money, since you are often paying by the
| hour in one way or another (even if you own the system, your
| energy has a cost that's dynamic). If this is a task run
| frequently, you're saving a lot of time in aggregate,
| despite not a lot per person. But even for a single person,
| this is helpful if more devs do this. Death by a thousand
| cuts.
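The aggregate arithmetic checks out; a quick sanity-check sketch using the numbers from the example above:

```python
# One second shaved off an hour-long task (1:00:00 -> 59:59).
saved_per_task_s = 1
task_s = 3600

# Per task, it feels like nothing...
fraction_pct = 100 * saved_per_task_s / task_s
assert round(fraction_pct, 2) == 0.03  # about 0.03% of the task

# ...but across a million users it adds up.
users = 1_000_000
total_saved_hours = users * saved_per_task_s / 3600
assert int(total_saved_hours) == 277  # ~277 hours in aggregate
```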
|
| But in the specific case, if a task takes an hour and you save
| 50%, your task takes 30 minutes. Maybe the task here took only
| a few minutes, but people will be chaining these together quite
| a lot.
| lpapez wrote:
| Maybe these optimizations benefit the two users who do the
| operation three times a year.
|
| In such an extreme case no amount of optimization work would
| be profitable.
|
| So the parent comment asks a very valid question: how much
| total time was saved by this and who asked for it to be saved
| (paying or free tier customers for example)?
|
| People who see the business side of things rightfully fear
| when they hear the word "optimization", it's often not the
| best use of limited development resources - especially in an
| early stage product under development.
| sroussey wrote:
| I do wish that when people write about optimization that
| they would then multiply by usage, or something similar.
|
| Another way is to show CPU usage over a fleet of servers
| before and after. Then reshuffle the servers, run on fewer
| of them, and use the number of servers no longer needed as
| the metric.
|
| The number of servers has direct costs as well as indirect
| costs, so you can even derive a dollar value - more so if
| you have a growth rate.
| godelski wrote:
| > I do wish that when people write about optimization
| that they would then multiply by usage, or something
| similar.
|
| How? You can give specific examples and then people make
| the same complaints because it isn't relevant to their
| use case. It's fairly easy to extrapolate the numbers to
| specific cases. We are humans, and we can fucking
| generalize. I'll agree there isn't much to the article,
| but I find this ask a bit odd. Do you not have all the
| information to make that calculation yourself? They
| should have done that if they're addressing their
| manager, but it looks like a technical blog where I think
| it is fair to assume the reader is technical and can make
| these extrapolations themselves.
| godelski wrote:
| > So the parent comment asks a very valid question: how
| much total time was saved by this and who asked for it to
| be saved (paying or free tier customers for example)?
|
| That is a hard question to answer because it very much
| depends on the use case, which is why I gave a vague
| response in my comment. Truth be told, __there is no
| answer__ BECAUSE it depends on context. In the case of AI
| agents, yeah, 50% is going to save you a ton of money. If
| you make LLM calls once a day, then no, probably not. Part
| of being the developer is to determine this tradeoff.
| Specifically, that's what technical managers are for,
| communicating technical stuff to business people (sure,
| your technical manager might not be technical, but someone
| being bad at their job doesn't make the point irrelevant,
| it just means someone else needs to do the job).
|
| You're right about early-stage products, but there are lots
| of moderate and large businesses (and yes, startups) that
| don't optimize but should. Most software never gets
| optimized, and that has led to a lot of enshittification.
| Yes, move fast and break things, but go back and clean up,
| optimize, and reduce your tech debt, because you left a mess
| of broken stuff in your wake. But it is weird to pigeonhole
| this to early-stage startups.
| jahewson wrote:
| > 10% increases are quite large!
|
| You have to ask yourself, 10% of what? I don't usually mind
| throwing 10% more compute or memory at a problem but I do
| mind if it's 10x more. I've shipped 100x perf improvements in
| the past where 1.5x would have been a waste of engineering
| time. A more typical case is a 10x or 20x improvement that's
| worth a few days coding. Now, if I'm working on a mature
| system that's had tens of thousands of engineering hours
| devoted to it, and is used by thousands of users, then I
| might be quite happy with 10%. Though I also may not! The
| broader context matters.
| godelski wrote:
| Sure, but I didn't shy away from the fact that it is case
| dependent. In fact, you're just talking about the
| metaoptimization. Which for any optimization, needs to be
| considered too.
| timonv wrote:
| Core maintainer of Swiftide here. That's a fair comment!
| Additionally, it's interesting to note that almost all the
| time is spent in FastEmbed / ONNX in the Swiftide benchmark.
| A more involved follow-up with chunking and transformation
| could be very interesting and, anecdotally, shows far bigger
| differences. We have not had the time yet to fully dive into
| this.
|
| Personally, I just love code being fast, and Rust is incredible
| to work with. Exceptions granted, I'm more productive with Rust
| than any other language. And it's fun.
| satvikpendem wrote:
| I was asking the same question; it turns out mistral.rs [0]
| has pretty good abstractions for not having to depend on and
| package llama.cpp for every platform.
|
| [0] https://github.com/EricLBuehler/mistral.rs
| RcouF1uZ4gsC wrote:
| Why not use C++?
|
| For the most part, these aren't security critical components.
|
| You already have a massive amount of code you can use, like,
| say, llama.cpp.
|
| You get the performance that you do with Rust.
|
| Compared to Python, in addition to performance, you also get a
| much easier deployment story.
| oersted wrote:
| If you already have substantial experience with C++, this
| could be a good option. But I'd say nowadays that learning
| to use Rust *well* is much easier than learning to use C++
| *well*. And even if the Rust ecosystem is a lot less mature,
| I'd say it's already better for these use cases.
|
| Indeed, here security (generally safety) is a secondary
| concern and is not the main reason for choosing Rust,
| although welcome. It's just that Rust has everything that
| C++ gives you, but in a more modern and ergonomic package.
| Although, again, I can see how someone already steeped in
| C/C++ for years might not feel that, and reasonably so. But
| I think I can fairly safely say that Rust is just "a better
| C++" from the perspective of someone starting from scratch
| now.
| outworlder wrote:
| Indeed.
|
| Plus, one doesn't usually just 'learn C++'. It's a herculean
| effort, and I've yet to meet anyone, even people who have
| used C++ exclusively for their entire careers, who could
| confidently say they "know C++". They may be comfortable
| with whatever subset of C++ their company uses, while
| another company's codebase will look completely alien, often
| ignoring entire features they relied on, and vice versa.
|
| Despite that, it's still a substantial time commitment, to
| the point that many (if not most) people working on C++ have
| made that their career; it's not just a tool anymore at that
| point. They may be more willing to jump entire industries
| rather than jump to another language. It is a generalization,
| but I have seen that far too often at this point.
|
| If someone is making a significant time investment starting
| today, I too would suggest investing in Rust instead. It also
| requires a decent time investment, but the rewards are great.
| Instead of learning where all the (hidden) landmines are, you
| learn how to write code that can't have those landmines in
| the first place. You aren't losing much either, other than
| the ability to read existing C++ codebases.
| riku_iki wrote:
| > But I'd say nowadays that learning to use Rust _well_ is
| much easier than learning to use C++ _well_.
|
| For someone (me) who was making a choice recently, it is not
| that obvious. I tried to learn through Rust examples and
| ecosystems, and there were many more WTF moments compared to
| writing C++ as "C with classes" + Boost. Especially when
| writing close-to-the-metal performance code, Rust has many
| abstractions with unobvious performance implications.
| tcfhgj wrote:
| > rust has many abstractions with unobvious performance
| implications.
|
| such as?
| riku_iki wrote:
| this article has several examples:
| https://blog.polybdenum.com/2021/08/09/when-zero-cost-
| abstra...
| IshKebab wrote:
| Rust is much better than C++ overall and far easier to debug
| (C++ is prone to _very_ difficult-to-debug memory errors
| which don't happen in Rust).
|
| The main reasons to use C++ these days are compatibility with
| existing code (C++ and Rust are a bit of a pain to mix), and if
| a big dependency is C++ (e.g. Qt).
| pjmlp wrote:
| Additionally, the industry standards for GPGPU APIs and the
| tooling ecosystem.
|
| Maybe one day we'll get Live++ or the Visual Studio
| debugging experience for Rust, given that plenty of
| Microsoft projects now use Rust.
| Philpax wrote:
| Why use C++? What's the benefit over Rust here?
| timonv wrote:
| I've worked with C++ in the past; it's subject to taste. I
| like how Rust's rigidness empowers rapid change _without_
| breaking things.
|
| Besides, the Rust ML ecosystem is also very mature:
| llama.cpp has native bindings (which Swiftide supports),
| there are ONNX bindings, ndarray (NumPy in Rust) works
| great, there's Candle, and lots of processing utilities.
| Additionally, many language ecosystems are rewriting parts
| in Rust, and more often than not these are available in
| Rust as well.
| roca wrote:
| Lots of reasons, but a big one is that dependency and build
| management in C++ is absolutely hellish unless you use stuff
| like Conan which nobody knows. In Rust, you use Cargo and
| everyone is happy.
| pjmlp wrote:
| There are lots of things I don't know until I learn how to
| use them, duh.
|
| Cargo is great, for pure Rust codebases, otherwise it is
| build.rs or having to learn another build system, and then
| people aren't that happy any longer.
| riku_iki wrote:
| You can always use something as simple as Make for your C++
| proj with manually dumping dependencies to some libs folder.
| zie1ony wrote:
| DSPy is in Python, so it must be Python. Sorry bro :P
| bborud wrote:
| It would be helpful to move to a compiled language with a decent
| toolchain. Rust and Go are good candidates.
| sandGorgon wrote:
| this is very cool!
|
| We built something for our internal consumption (and it is
| now used in quite a few places in India).
|
| Edgechains is declarative (jsonnet-based), so chains +
| prompts are declarative. And we built a WASM compiler (in
| Rust, based on WasmEdge).
|
| https://github.com/arakoodev/EdgeChains/actions/runs/1039197...
| zozbot234 wrote:
| Am I the only one who thinks a Swift IDE project should be called
| Taylor?
| Svoka wrote:
| I would name it Tailor
| giancarlostoro wrote:
| Sure, but this is a Rust project for building LLMs called
| Swiftide, not a Swift IDE...
|
| https://swiftide.rs/what-is-swiftide/
| zitterbewegung wrote:
| This is a comparison of apples to oranges. Langchain has an
| order of magnitude more examples, integrations, and
| features, and it also rewrote its whole architecture to try
| to make the chaining more understandable. I don't see enough
| documentation in this pipeline to understand how to migrate
| my app to it. I also realize it would take me at least a
| week even to migrate my own app to Langchain's rewrite.
|
| Langchain is used because it was a first mover; for the same
| reason, that's its Achilles' heel. It isn't used for speed
| at all.
| elpalek wrote:
| Langchain and other frameworks are too bloated. They're good
| for demos, but I highly recommend building your own pipeline
| in production. It's not really that complicated, and you can
| have much better control over the implementation. Plus you
| don't need 99% of the packages that come with Langchain,
| which reduces security vulnerabilities.
|
| I've written a series of RAG notebooks on how to implement RAG in
| python directly, with minimal packages. I know it's not in Rust
| or C++, but it can give you some ideas on how to do things
| directly.
|
| https://github.com/yudataguy/RawRAG
| cpill wrote:
| Trouble is that the Langchain community is large and jumps
| on the latest research papers almost immediately after they
| come out, which is a big advantage if you're a small team.
___________________________________________________________________
(page generated 2024-10-01 23:01 UTC)