[HN Gopher] We rewrote a high-performance database in Rust
       ___________________________________________________________________
        
       We rewrote a high-performance database in Rust
        
       Author : gk1
       Score  : 90 points
       Date   : 2022-10-18 18:51 UTC (4 hours ago)
        
 (HTM) web link (www.pinecone.io)
 (TXT) w3m dump (www.pinecone.io)
        
       | ayewo wrote:
       | Rewriting in Rust is not a meme, it's a cycle.
       | 
       | Before Rust became viable, rewrites were done in Go.
       | 
       | From the archives:
       | 
       | - Rewriting a large production system in Go
       | https://news.ycombinator.com/item?id=6234736 (2013)
       | 
       | - How We Moved Our API From Ruby to Go
       | https://news.ycombinator.com/item?id=9693743 (2015)
       | 
       | - Matrix and Riot Confirmed as the Basis for France's Secure
       | Instant Messenger App
       | https://news.ycombinator.com/item?id=16938545 (2018)
       | 
       | - Toward Vagrant 3.0
       | https://news.ycombinator.com/item?id=27476676 (2021)
       | 
       | - I'm porting the TypeScript type checker tsc to Go
       | https://news.ycombinator.com/item?id=30074414 (2022)
        
         | pjmlp wrote:
         | Which is why old timers eventually learn to just deliver with
         | boring technology.
        
           | riskable wrote:
           | > Which is why old timers eventually learn to just deliver
           | with ~~boring~~ _buggy_ technology.
           | 
           | There's a reason why folks take the time to rewrite things in
           | Rust. No matter how good you are at C/C++ you will encounter
           | bugs that you would not have if you had written it in Rust.
        
             | pjmlp wrote:
             | Assuming there is even a Rust library replacement to start
             | with.
             | 
             | People keep forgetting C++ has 30 years of being deployed
             | in production.
             | 
             | Rust in 2022 is like using C++ in the 1990s in terms of
             | ecosystem.
        
               | darksaints wrote:
               | The C++ IDEs available up until about 10 years ago were
               | complete garbage. C++ _still doesn't even have a good
               | package manager_. All the build systems are pure chaos.
               | The largest C++ package manager has 1500 packages. In
               | comparison, rust's package manager and build system are
               | way easier to use and already have 94,000 packages
               | available to users.
               | 
               | That's not exactly fair to C++ because entire categories
               | of dev tools (like build systems, package managers, IDEs,
               | debuggers, version control, static analyzers, etc.) have
               | matured after C++ did. And let's not forget that when C++
               | was new, most libraries were proprietary licensed and
               | paid for, whereas today almost all libs are open source.
               | And those improvements (along with general size of the
               | programming community) mean that a trendy language today
               | is going to develop and mature a lot faster than a
               | formerly trendy language did 30 years ago.
               | 
               | IMO Rust in 2022 is a lot closer to java in 2005 than C++
               | in the 1990s.
        
               | pjmlp wrote:
               | Another one that never used Borland, Apple, IBM IDEs.
               | 
               | Where is the Rust IDE that is half as capable as C++
               | Builder, MPW/Metrowerks, Visual Age, Zortech?
               | 
               | Considering all features they offered across the board in
               | the box, not only code completion.
        
               | darksaints wrote:
               | I've used Borland. I'm sure it was marvelous at the time,
               | but it doesn't hold a candle to CLion or Visual Studio in
               | the 2010s or later.
        
               | pjmlp wrote:
               | CLion now ships a cross platform C++ framework with it?
               | 
               | As for Visual Studio, yeah it is great in all aspects,
               | except having nothing else beyond MFC to offer on the GUI
               | department, WinUI is still a mess after UWP.
               | 
               | In any case your examples are for C++ IDEs, reinforcing
               | my case of C++ tooling versus Rust.
        
               | howinteresting wrote:
               | This is just plain false. C++ in the 1990s had nothing
               | like serde for example.
        
               | jerf wrote:
               | The minimum bar for a language has moved up significantly
               | since the 1990s. It isn't enough to just have a neat new
               | idea, you need to ship with nearly-best-of-breed JSON
               | serialization, a web server, a huge standard library with
               | not just strings but things like compression and a lot of
               | networking, and a laundry list of other things (give or
               | take a few things) just to make it to the "barely viable
               | alternate choice" point.
               | 
               | Nice as a language consumer, but a bummer that building
               | new languages and getting some attention is so much
               | harder than it used to be.
        
               | xani__ wrote:
        
               | marcosdumay wrote:
               | Well, Rust checks all of those boxes.
               | 
               | Yes, it's a problem for that language you plan on
               | creating (try specializing into a niche). But it's not
               | something that should impact Rust's adoption.
        
               | pjmlp wrote:
               | With various levels of completeness.
        
               | rastignack wrote:
               | Where is the production-grade, pure-Rust TLS library?
               | Key-value store? LDAP client? SSH client?
        
               | pitaj wrote:
               | Aren't the most commonly used libraries for all of those
               | written in C, not C++?
               | 
               | Regardless, I'm surprised you haven't heard of rustls -
               | https://github.com/rustls/rustls
        
               | rastignack wrote:
               | You have great C++ libraries for those. Not in pure Rust
               | though.
        
               | jhgg wrote:
               | > production grade and pure rust tls library
               | 
               | You mean rustls? https://github.com/rustls/rustls
        
               | rastignack wrote:
               | Pure Rust? No.
        
               | bogeholm wrote:
               | You're right, I see some *.md-files in the GitHub repo
        
               | howinteresting wrote:
               | I haven't used them much, but sled, ldap3 and thrussh do
               | exist. As Rust gains further in popularity I'd expect
               | more of these to become production ready. Meanwhile
               | there's always C and C++ interop.
        
               | rastignack wrote:
               | Sled and thrussh are not production grade. I don't
               | particularly want to delve into details as I think the
               | effort is laudable. I can explain my position in private
               | if need be.
        
               | howinteresting wrote:
               | OK. I mean they're probably less mature, yeah. But high-
               | quality Rust bindings to libssh2 and RocksDB do exist so
               | _shrug_
        
               | pjmlp wrote:
               | And? A drop in the ocean of libraries.
        
               | howinteresting wrote:
               | And virtually no one should be starting new projects in
               | C++ and everyone should switch to Rust.
        
               | pjmlp wrote:
               | Start by removing C++ from Rust compiler.
               | 
               | Then go around for Khronos, NVidia, Microsoft, Sony,
               | Nintendo, Unreal, Godot,.... to support Rust on their
               | SDKs.
        
           | nicoburns wrote:
           | I honestly feel like rust _is_ boring technology in most
           | senses of the word. It "just works" more than almost any
           | other technology that I've used. The ownership system is new
           | and different, but that's really the only thing.
        
             | xani__ wrote:
        
             | lijogdfljk wrote:
             | Honestly, so is Go. I used to use Go, and it's boring as
             | hell. I hate it for a few choices they made, but they
             | definitely achieved their goal. It is quite boring.
             | 
             | I agree with you though, so is Rust. The less boring areas
             | imo these days aren't languages (at least none i see), as
             | all the good languages are boring. Zig for example, is
             | pretty mundane too.
             | 
             | The older i get the more i value confidence in a product.
             | Confidence that it won't crash at runtime. Confidence that
             | i won't be bugged over the weekend. etc
        
             | notriddle wrote:
             | Rust is not boring technology. There's too much ecosystem
             | churn, and new language features are deployed too often.
             | 
             | C++ isn't boring technology, either. If you just want to
             | deliver value, I'd recommend Java.
        
               | zozbot234 wrote:
               | > There's too much ecosystem churn, and new language
               | features are deployed too often.
               | 
               | Not much of an issue if you stick to the stable subset of
               | the language, and libraries that work within that subset.
        
               | lijogdfljk wrote:
               | > There's too much ecosystem churn, and new language
               | features are deployed too often.
               | 
               | That kinda feels like saying Linux is too crazy because
               | new apps get made for Linux frequently.
               | 
               | You can use the same part of the language tomorrow that
               | you used today. Nothing is changing out from under you.
               | If you're afraid of libraries, don't use them. You'd have
               | the same problem in any ecosystem that is new, no?
        
               | notriddle wrote:
               | > That kinda feels like saying Linux is too crazy because
               | new apps get made for Linux frequently.
               | 
               | Apps are okay, but other parts of userland that roll out
               | breaking changes on a regular basis are definitely a
               | problem [1] [2] [3]. Even if they aren't technically part
               | of the kernel, they are usually used with it to provide a
               | complete working system, and they break stuff all the
               | time.
               | 
               | [1]: https://lwn.net/Articles/904892/
               | 
               | [2]: https://lwn.net/Articles/840430/
               | 
               | [3]: https://lwn.net/Articles/777595/
        
               | howinteresting wrote:
               | I've led and been on teams that have written multiple
               | production-grade Rust services that have together
               | delivered 100MM+ USD of value. The number of production
               | bugs has been in the single digits, with exactly one
               | outage that lasted more than a few minutes in the last 3
               | years. How about yourself?
               | 
               | In my experience, Rust delivers by far the fewest number
               | of bugs in production out of any mainstream language. It
               | gets the fundamentals right like nothing before it. &,
               | &mut, Send and Sync take care of many classes of bugs in
               | the inner loop of productivity.
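
A minimal sketch of the Send and Sync point above, using only the standard library. The function name `parallel_count` and the numbers are illustrative and do not come from the thread. The counter is shared through `Arc<Mutex<_>>`, which the compiler accepts for cross-thread use; the comment notes what happens with a type that is not `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from `workers` threads, `per_worker` times each.
/// (Hypothetical helper, written only to illustrate Send/Sync checking.)
fn parallel_count(workers: usize, per_worker: usize) -> usize {
    // Arc<Mutex<usize>> is Send + Sync, so clones may be moved into other threads.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Swapping Arc for std::rc::Rc above would fail to compile:
    // Rc is not Send, so thread::spawn rejects the closure.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

The design choice here is that the unsynchronized version is not something a test has to catch; it is a type error before the program runs.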
        
               | doliveira wrote:
               | With my (admittedly limited) experience with the Hadoop
               | ecosystem, I'd sincerely beg for people to stop writing
               | databases in Java... Apart from the way bigger system
               | requirements, dependency version hell, having to monitor
               | GC pauses is just so, so annoying
        
         | avgcorrection wrote:
         | The stable release of Go was maybe four years or so before
         | Rust. So what you're saying seems to be that people like to
         | rewrite their tech in young and hyped (for good or bad or
         | neutral) languages. Because there is little connection between
         | Rust and Go (other than chronology).
        
           | sbdivuvu wrote:
        
         | [deleted]
        
       | einpoklum wrote:
       | The authors of this post rewrote _their own_ DBMS in Rust. Which
       | is perfectly ok, but I'm not sure I would trust them to decide
       | that theirs is a "high-performance" DBMS. They don't have any
       | benchmark results except images of their own internal performance
       | measures; they don't offer any way of comparing their performance
       | with other DBMSes (e.g. Vectorwise/Actian Vector, ClickHouse,
       | DuckDB etc. - not to mention Oracle, MS or SAP offerings); and
       | they only have marketing blurb about their numbers: "Up to 10x
       | performance" (with no baseline of course).
       | 
       | So, they took some DBMS (which is probably not so hot in terms of
       | performance) and rewrote it in Rust. Surely possible, possibly
       | useful, but not much to write home about if one is interested in
       | DBMS performance.
        
       | 22SAS wrote:
       | Glanced through the article, and I see no comparisons on how
       | performance of the DB is in Rust versus their current C++
       | implementation, no mention of if maintaining the Rust code is
       | easier than their C++ codebase, no stats on how devs are ramping
       | up and how it's tackling their "hard to find a dev who knows both
       | C++ and Python well" issue.
        
         | jeroenhd wrote:
         | In the next paragraph they state:
         | 
         | > We looked at and compared several languages - Go, Java, C++,
         | > and Rust. We knew that C++ was harder to scale and maintain
         | > high quality as you build a dev team; that Java doesn't
         | > provide the flexibility and systems programming language we
         | > needed; and that Go is also a garbage collected language.
         | > This left us with Rust. With Rust, the pros around
         | > performance, memory management, and ease of use outweighed
         | > the cons of it not yet being a very established language.
         | 
         | In other words, they wanted to unify the programming languages
         | and evaluated several. Rust won out of those for performance
         | reasons.
         | 
         | The article is a short recap of a 40 minute video. The video
         | has more context and explains the intentions much better than
         | the web page.
         | 
         | They show a graph of performance over time as the rewrite
         | progressed. There were some small optimisations and problems, a
         | few big regressions, and then a huge improvement that was
         | maintained. Looks like the rewrite process made the database
         | perform significantly better. There's nothing on how much this
         | was caused by the language switch itself, but that's
         | functionally impossible: nobody is rewriting their application
         | twice to see what rewrite is better.
        
           | lijogdfljk wrote:
           | > but that's functionally impossible: nobody is rewriting
           | their application twice to see what rewrite is better.
           | 
           | Agreed, and hypothetically the 2nd rewrite should _still_ be
           | better than the first. So the language would have to make it
           | significantly worse to outweigh the yet again experience in
           | improving things.
           | 
           | To be clear though i'm not stating that every rewrite is
           | assured to be better. However a carefully considered rewrite
           | has a much easier time making decisions learned from any
           | warts discovered in previous implementations. God knows
           | there's always _some_ warts.
           | 
           | As a Rust fanatic, i wouldn't expect the performance gains to
           | be due to Rust itself. It's not expected to be _faster_ than
           | C/C++ typically. Just comparable.
        
         | dxhdr wrote:
         | Article also states that the switch from C++ to Rust improves
         | "low level optimized instruction sets, memory layout, and
         | running async tasks."
         | 
         | The first two are also strengths of C++, and for the third the
         | article says that "Rust is async, and Tokio is the one of the
         | most popular async providers ... However, it's not great for
         | running CPU intensive workloads, like with Pinecone." Puzzling.
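
One common way to keep CPU-bound work from stalling an async executor is to run it on dedicated threads instead. The sketch below shows that idea with standard-library scoped threads only; it does not use Tokio or any async runtime. The names `score` and `score_all` are hypothetical, and nothing here is claimed to reflect how Pinecone structured its code.

```rust
use std::thread;

/// Hypothetical CPU-bound task: sum of squares of one vector.
fn score(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum()
}

/// Scores every batch on up to `workers` dedicated threads and returns the
/// results in input order.
fn score_all(batches: &[Vec<f32>], workers: usize) -> Vec<f32> {
    let workers = workers.max(1);
    // Ceiling division; never zero, because chunks(0) would panic.
    let chunk = ((batches.len() + workers - 1) / workers).max(1);
    let mut out = vec![0.0f32; batches.len()];

    // Scoped threads may borrow `batches` and `out`; all of them are joined
    // before `scope` returns.
    thread::scope(|s| {
        for (inputs, outputs) in batches.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (batch, slot) in inputs.iter().zip(outputs.iter_mut()) {
                    *slot = score(batch);
                }
            });
        }
    });

    out
}

fn main() {
    let batches = vec![vec![1.0, 2.0], vec![3.0]];
    println!("{:?}", score_all(&batches, 2));
}
```

Each thread gets a disjoint mutable slice of the output, so no locking is needed and the borrow checker verifies that the slices do not overlap.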
        
           | rwaksmunski wrote:
           | I've had more luck with async-std over Tokio for more CPU
           | intensive workloads. But then again, I ran it on a kqueue
           | platform so my experience is probably not representative.
        
           | pclmulqdq wrote:
           | My past experience with Rust async code is that both async-
           | std and Tokio are fairly unimpressive on performance (as
           | async code goes), particularly if you compare to ScyllaDB's
           | runtime or other similar C++ async runtimes.
        
       | kaladin_1 wrote:
       | I have no problem with people rewriting their projects in
       | whatever language they see fit.
       | 
       | What stood out for me in the article is him saying that it's
       | difficult getting developers with experience in both Python and
       | C++.
       | 
       | So, I wonder, if his in-house devs could pick up Rust that they
       | previously couldn't write, why does he think he can not hire a
       | good programmer and charge him to learn the stack the company
       | uses? Why must they employ someone that already writes Python or
       | C++?
       | 
       | Is Rust such a straight-forward language that people new to the
       | language can write a very performant programme?
        
         | pornel wrote:
         | Despite Rust's steep learning curve, it's also paradoxically
         | easy to add novice Rust programmers to a project.
         | 
         | This is because inexperienced Rust programmers are relatively
         | harmless. Noob mistakes won't compile, rather than running into
         | dangerous gotchas. You can tell noobs not to use `unsafe` (and
         | there are ways to enforce that), and mostly they'll just write
         | inefficient or non-idiomatic code, but the code will be free
         | from data races and memory corruption.
         | 
         | The strictness of the Rust compiler is quite the opposite of
         | something like the C++ Core Guidelines where the majority of
         | the rules aren't enforced by the compiler, and have to be in
         | the programmers' head first.
         | 
         | Noobs make lifetime errors and fight the Rust compiler, but
         | imagine working with a compiler that _doesn't_ tell you when
         | you have lifetime errors.
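
One way to enforce the "no `unsafe`" rule mentioned above is the built-in `unsafe_code` lint. The sketch below applies it to a single module; the crate-wide form is `#![forbid(unsafe_code)]` at the top of `main.rs` or `lib.rs`. The module name `app` and the function `longest` are made up for illustration, and the function is deliberately a little wasteful to mirror the "inefficient but safe" code described in the comment.

```rust
// Any `unsafe` block or function inside this module becomes a hard compile
// error that cannot be overridden further down with #[allow(unsafe_code)].
#[forbid(unsafe_code)]
mod app {
    /// Returns a copy of the first longest word. It clones every new
    /// candidate, which is wasteful but memory-safe.
    pub fn longest(words: &[String]) -> Option<String> {
        let mut best: Option<String> = None;
        for w in words {
            let replace = match &best {
                Some(b) => w.len() > b.len(),
                None => true,
            };
            if replace {
                best = Some(w.clone());
            }
        }
        best
    }

    // Uncommenting this function makes the build fail under the lint:
    // pub fn peek(p: *const u8) -> u8 { unsafe { *p } }
}

fn main() {
    let words = vec!["a".to_string(), "abc".to_string(), "ab".to_string()];
    println!("{:?}", app::longest(&words));
}
```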
        
           | krona wrote:
           | Memory sanitizers, address sanitizers, leak sanitizers,
           | threading sanitizers, undefined behaviour sanitizers. The
           | visual studio core guidelines checker. The clang-tidy core
           | guideline checker. I could go on but my point is, the
           | landscape does not really look like how you've painted it.
        
             | atoav wrote:
             | All of these tools and we still have exploitable buffer
             | overflows in 2022. So either the tools are not working,
             | people are not using them or they are using them but can
             | simply ignore critical warnings.
             | 
             | Your mileage may vary, but I think Rust offers a well
             | considered step in the right direction. Stupid and
             | dangerous code of the kind _every_ developer will produce
             | once in a while just won't compile in Rust. You cannot
             | forget to run a check, you can't hide behind not knowing a
             | tool. You can't ignore the warning if you want a running
             | program.
             | 
             | That is not nothing.
        
             | abc_lisper wrote:
             | Yeah, but Rust guarantees sanity by design. Sanitizers are
             | a patch, and hence not comprehensive.
        
             | lijogdfljk wrote:
             | They painted it like reality, though, no?
             | 
             |  _You_ seem to paint the landscape as full of tools and
             | imply that they're used. Either they're insufficient or
             | they're often under utilized, simply due to the number of
             | bugs we see. No?
        
             | pornel wrote:
             | I know about these, but there is a marked difference
             | between Rust and these tools.
             | 
             | Static analysis tools have a much harder job analyzing C++
             | (aliasing and escape analysis are way harder, and static
             | analysis of thread-safety is basically impossible due to
             | lack of thread-safety info in the type system). The results
             | are a trade-off between being sparse or having false
             | positives.
             | 
             | The sanitizers only catch issues they can observe at run
             | time, and that relies on having sufficient test and fuzz
             | coverage. Some data races are incredibly hard to reproduce,
             | and might depend on a timing difference that won't happen
             | in your test harness.
             | 
             | OTOH Rust proves absence of these issues by construction,
             | at compile time.
             | 
             | It's like a difference between dynamically-typed and
             | statically-typed languages. Sure, you can fuzz type errors
             | out of JS or Python, but in statically-typed languages such
             | errors are eliminated entirely at compile time. Rust
             | extends this experience to more classes of errors.
        
               | djwatson24 wrote:
               | > and that relies on having sufficient test and fuzz
               | coverage
               | 
               | At the faang I worked at, some small portion of servers
               | ran the sanitizers in prod, so you're not reliant on test
               | coverage nearly so much for catching rare issues.
        
               | krona wrote:
               | _The results are a trade-off between sparse or having
               | false positives._
               | 
               | Rust just takes the other side of the trade-off, and will
               | reject valid programs. Hence why the unsafe keyword
               | exists, and why tools like Miri
               | (https://github.com/rust-lang/miri) exist specifically for
               | Rust.
        
               | timeon wrote:
               | > Rust just takes the other side of the trade-off, and
               | will reject valid programs.
               | 
               | Are we still talking about the ease of adding novice Rust
               | programmers to a project?
        
         | dymk wrote:
         | It takes longer to learn how to use C++ to the same level of
         | proficiency and correctness compared to Rust, in my experience.
         | It's harder to write an incorrect program in Rust.
        
           | lupire wrote:
           | What are the main correctness risks in C++ if you just never
           | use a raw pointer?
        
             | bcrosby95 wrote:
             | One thing off the top of my head, from experience:
             | 
             | std::string s(s);
             | 
             | To be fair, compilers will warn you about this nowadays.
             | But when I converted a C codebase to C++ 20 years ago they
             | didn't.
             | 
             | IIRC, references can also refer to de-allocated memory.
             | Also, if you don't pass-by-reference or pointer, you can
             | literally "slice" the dynamic doohickies off your instance
             | so your AlbinoCat behaves like a Cat because all that extra
             | special stuff is gone as far as the function is concerned.
             | 
             | This is just off the top of my head after not working with
             | C++ for 20 years. I'm sure with all the new features it's
             | gained over the past 20 years there are whole new exciting
             | ways to blow your leg off.
        
             | steveklabnik wrote:
             | I don't know about "main", but like, you don't need raw
             | pointers to have UB. unique_ptr is nullptr after you move it.
             | 
             | And even then, my understanding is that raw pointers are
             | still intended to be used in Modern C++: they're there for
             | when you don't want to transfer ownership.
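
For comparison with the moved-from `unique_ptr` case above, here is a small sketch of how the same mistake looks in Rust. `Buffer` and `consume` are hypothetical names invented for the example. The use after the move is left commented out because the compiler rejects it.

```rust
/// Hypothetical owned resource, standing in for whatever a unique_ptr would hold.
struct Buffer {
    bytes: Vec<u8>,
}

/// Takes ownership of the buffer; the caller cannot use it afterwards.
fn consume(buf: Box<Buffer>) -> usize {
    buf.bytes.len()
}

fn main() {
    let buf = Box::new(Buffer { bytes: vec![1, 2, 3] });
    let n = consume(buf); // ownership moves into `consume` here
    println!("consumed {n} bytes");

    // Uncommenting the next line is rejected at compile time with
    // error[E0382] (borrow of moved value) instead of reading a null pointer:
    // println!("{}", buf.bytes.len());
}
```

When ownership should not be transferred, the function can take `&Buffer` instead, which roughly plays the role the comment assigns to raw pointers in modern C++.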
        
             | mamcx wrote:
             | Well...
             | 
             |     UNSAFE {
             |         // TODO: Verify all the lines, all the time, are ok
             |         // Just like you do testing, documentation, security
             |         // and all that
             |         // ok?
             |         #include <iostream>
             |         using namespace std;
             |         int main() {
             |             // YOUR CODE
             |         }
             |     }
        
               | lupire wrote:
               | What are you saying?
        
               | mamcx wrote:
               | Well, if the question is:
               | 
               | > What are the main correctness risks in C++ if you just
               | never use a raw pointer?
               | 
               | All code in C/C++ IS a correctness "risk". Only
               | constant, manual inspection could (maybe) say otherwise.
               | 
               | What Rust gives is significant reduction of the risks.
        
         | ekidd wrote:
         | I've written quite a bit of production code in C++, Python and
         | Rust, and currently work on a hybrid Rust/Python system. Here's
         | my experience:
         | 
         | - C++ is an unusually large language. And it has many historic
         | footguns, requiring a higher level of vigilance and code
         | review. If I were starting a brand new project today, I
         | wouldn't try to build a team of C++ programmers.
         | 
         | - Untyped Python becomes more difficult to refactor and
         | maintain once you reach 50k to 80k lines on a group project.
         | Typed Python, however, scales nicely beyond this size.
         | 
         | - Rust is a "medium-sized" language. It requires developers to
         | learn more than Go or Python does, but less than C++. And Rust
         | has far fewer traps for the unwary and the reckless than C++.
         | Rust's tooling is also very good in many areas.
         | 
         | - It's tempting to split a project into a fast "core" language,
         | and high-level "glue" language. There are real advantages to
         | this. (Which is why I've done it on one recent project!) But
         | this also comes with costs: everyone needs to be fairly good at
         | two languages, and switch back and forth. And you pay a tax at
         | the boundary.
         | 
         | If I were building a brand new database (and a team to maintain
         | it), I'd actually be strongly tempted to use Rust exclusively.
         | But this is partly because databases rarely have a "business
         | logic" layer that changes constantly, so there's less need for
         | a high level scripting language.
         | 
         | But with a different team or different constraints, C++ could
         | also be the right choice.
        
           | hutzlibu wrote:
           | "But this also comes with costs: everyone needs to be fairly
           | good at two languages, and switch back and forth."
           | 
           | Why does everyone need to be good at both languages? You can
           | separate and have the core people writing efficient low level
           | code - and you have higher level scripting/gluing code.
        
             | pornel wrote:
             | You will have Conway's law in your codebase. Coordination
             | between teams is hard, so teams will prefer to implement
             | features entirely in their language, even where that is
             | technically suboptimal.
             | 
             | You will get hot loops in Python, because a Rust programmer
             | wasn't around, and Rust programmers implementing whole
             | complex business logic in Rust behind a single `do_it()`
             | Python call.
        
               | hutzlibu wrote:
               | Communication and coordination is surely hard and things
               | like that surely happen, but this is why project
               | management exist.
               | 
               | If it is doing things right, then the Rust people don't
               | do complex business logic, because it is not assigned to
               | them and they would not even have the details.
               | 
               | And if the Python people were too eager and have core
               | stuff implemented and it is affecting performance, then
               | you can always reimplement it low level.
               | 
               | It all depends on the project of course, of what would be
               | the best mix.
        
           | lupire wrote:
           | A brand new project doesn't need legacy C++ footguns. It can
           | use modern C++.
           | 
           | The nice part of Python (usually) is that you don't _need_ to
           | be "good at it" if you aren't trying to write a super
           | polymorphic core that runs super efficient computations like
           | scipy. If you have a fast core engine for the inner loop, a
           | slow Python management layer is plenty fast.
        
           | dkarl wrote:
           | > Typed Python, however, scales nicely beyond this size.
           | 
           | Could you say more about what tools and practices make this
           | possible, beyond simply adding type annotations in your code?
           | Asking for a friend.
           | 
           | > It's tempting to split a project into a fast "core"
           | language, and high-level "glue" language.
           | 
           | I did this with C++ and Boost Python back in the day and
           | loved the experience. I wonder if Rust will someday get a
           | high-level language for writing applications and scripts on
           | top of a Rust codebase, like Boost Python for C++ or Tcl for
           | C.
        
             | heavyset_go wrote:
             | > _Could you say more about what tools and practices make
             | this possible, beyond simply adding type annotations in
             | your code? Asking for a friend._
             | 
             | Python with type annotations works really well with type
             | checkers like Mypy, along with LSP servers, and both of
             | those integrate with most development environments.
             | 
             | Using a Python-oriented IDE like PyCharm with type
             | annotated Python also allows for better refactoring
             | options. It reduces the uncertainty and guesswork an IDE's
             | static analyzer must engage in for even basic features
             | you'd take for granted with an IDE and a statically typed
             | language.
             | 
             | In practice, developers don't have to keep what can be a
             | massively complex application running in their heads to
             | modify code accurately. A nicely typed project makes it
             | easy to see exactly what types of data are being passed
             | around and modified. Before gradual typing, you'd have to
             | backtrack to all of a function's call sites to understand
             | exactly what kind of data it takes and returns. With
             | gradual typing, you can just look at types and rely on Mypy
             | to ensure the right data is actually being shuffled around.
             | 
             | > _I did this with C++ and Boost Python back in the day and
             | loved the experience. I wonder if Rust will someday get a
             | high-level language for writing applications and scripts on
             | top of a Rust codebase, like Boost Python for C++ or Tcl
             | for C._
             | 
             | I haven't used Boost Python, but there are some options for
             | Rust and Python that work well and seem to suit this use
             | case like PyO3.
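A minimal sketch of the typed-Python workflow described above. The `Order` record and `total_cents` helper are hypothetical names made up for illustration, not from the thread. The point is only that the annotations tell a reader (and a checker such as Mypy) what a function takes and returns without visiting its call sites.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type, used only for illustration.
    sku: str
    quantity: int
    unit_price_cents: int


def total_cents(orders: list[Order]) -> int:
    """Sum the cost of all orders, in cents."""
    return sum(o.quantity * o.unit_price_cents for o in orders)


orders = [Order("widget", 2, 250), Order("gadget", 1, 1000)]
print(total_cents(orders))

# A type checker should reject the call below before the code runs,
# because a list of str is not a list of Order:
# total_cents(["widget", "gadget"])
```

Running a type checker over a file like this should flag the commented-out call without executing anything. The plain interpreter would only fail once the bad call actually runs.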
        
           | exceptione wrote:
           | Especially business logic should be taken in the firm grip of
           | static compile time guarantees that the hand of a strong type
           | system delivers. Even more so if it changes constantly!
           | Refactoring without fear.
           | 
           | Only software that does not have to run correctly
           | (prototypes, personal hobby projects) can get away with a
           | non-static type system.
           | 
           | When I have to pick a tool and I see it is written in Python
           | I will have a look for an alternative if possible. Because I
           | know it will have many bugs: some known, lots hidden.
        
         | MisterTea wrote:
         | > What stood out for me in the article is him saying that it's
         | difficult getting developers with experience in both Python and
         | C++.
         | 
         | More like they had difficulty finding cheap experienced
         | c++/python devs.
        
         | arriu wrote:
         | I agree, rust has a difficult learning curve. I've often heard
         | at least a year is required to really feel confident.
        
           | gotts wrote:
           | 6 months is what I heard. I'm currently at ~2 and 6 sounds
           | like a pretty good estimate.
        
           | heavyset_go wrote:
           | It takes a relatively short time to be proficient enough to
           | make useful contributions, maybe half a year to a year to be
           | confident. You can give an experienced developer the Rust
           | book and have them contributing to a Rust codebase quickly.
        
         | heavyset_go wrote:
         | The footgun-to-appropriate-feature ratio is higher with C++
         | than Rust. Rust also has some excellent Python integration
         | options that are relatively easy to use.
        
       | smitty1e wrote:
       | What is a vector database?
       | 
       | https://www.pinecone.io/learn/vector-database/
       | 
       | ...was less than informative.
        
         | jiggawatts wrote:
         | Standard row-oriented databases store columns on disk like so:
         | ABCABCABCABC
         | 
         | Vector databases store them like this:
         | AAAABBBBCCCC
         | 
         | This allows faster queries if you just need one (or a few)
         | columns, because unrelated columns don't have to be processed
         | at all. Caches are more efficient, vector CPU instructions can
         | be used, etc...
         | 
         | The downside is that random single row access is more expensive
         | because a row has to be reassembled from many locations.
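The layout described above is what is usually called column-oriented (columnar) storage. A minimal sketch of the two layouts and the trade-off, using plain Python lists and made-up data; the field names `a`, `b`, `c` and all helper functions are assumptions for illustration only.

```python
FIELDS = ("a", "b", "c")

# Three records with fields a, b, c (illustrative data only).
rows = [
    {"a": 1, "b": 10, "c": 100},
    {"a": 2, "b": 20, "c": 200},
    {"a": 3, "b": 30, "c": 300},
]

# Row-oriented: the values of one record sit together (ABCABCABC).
row_layout = [row[name] for row in rows for name in FIELDS]

# Column-oriented: the values of one field sit together (AAABBBCCC).
columns = {name: [row[name] for row in rows] for name in FIELDS}
column_layout = [value for name in FIELDS for value in columns[name]]


def sum_b_row_layout(flat, width=len(FIELDS), offset=1):
    # Has to stride over the unrelated a and c values.
    return sum(flat[offset::width])


def sum_b_column_layout(cols):
    # Reads one contiguous run of values.
    return sum(cols["b"])


def fetch_row(cols, i):
    # Reassembling a single row touches every column.
    return {name: cols[name][i] for name in FIELDS}


print(row_layout)
print(column_layout)
print(sum_b_row_layout(row_layout), sum_b_column_layout(columns))
print(fetch_row(columns, 1))
```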
        
       | [deleted]
        
       | makmanalp wrote:
       | > If you're using a higher level language, you're not going to
       | have access to how the memory is laid out. A simple change, like
       | removing indirection in our list, was an order of magnitude
       | improvement in our latencies since there's memory prefetching in
       | the compiler and the CPU can anticipate which vectors are going
       | to be loaded next in order to improve the memory footprint.
       | 
       | This is a common experience and I'm still surprised by the choice
       | I constantly see to use a managed-memory language to build a
       | database - one of a very small set of special cases where having
       | full control over the memory layout might just be a reasonable
       | thing to want. In this universe (absent doing something
       | completely absurd) it's not algorithmic complexity but managing
       | data locality in the cache hierarchy (e.g. reading things from L3
       | vs main memory vs disk) that makes things orders of magnitude
       | faster, especially if you're in the realm of doing things like
       | SIMD operations to speed things up.
       | 
       | Perhaps there's some level of suck we're willing to tolerate for
       | all the other benefits you get, but I've been noticing a pattern
       | of "align things just so at the higher level and hope they mostly
       | turn out the way you want at the lower level" (e.g. also with the
       | Apache java-y databases like hadoop / hbase / cassandra which I
       | guess were mostly supposed to derive their total throughput from
       | massive scale rather than per-node performance) which is a bit
       | funny.
       | 
       | But also it seems like part of Rust's promise was "low level but
       | make it high level" which seems to be succeeding (zero-cost
       | abstractions and whatnot), so I imagine this will get better over
       | time - having not attempted a project like this myself, I'm not
       | sure what the limitations you'd run up against are in terms of
       | laying things out in memory in a favorable way - I imagine the
       | kind of massive manually managed arena allocations and ad-hoc
       | pointers going everywhere that one normally does doesn't really
       | fly.
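A rough sketch of what "removing indirection" in a list of vectors can look like, written in Python only to illustrate the layout idea. The article's change was made in Rust and its actual data structures are not shown in this thread, so `DIM`, `nested`, `flat`, and both helpers are assumptions. A list of lists holds references to separately allocated objects, while `array.array` keeps its values in one contiguous buffer.

```python
from array import array

DIM = 4  # assumed vector dimension, chosen for illustration

# Indirect layout: the outer list holds references to three inner lists,
# and each inner list holds references to separate float objects.
nested = [[float(i * DIM + j) for j in range(DIM)] for i in range(3)]

# Flat layout: all values live back to back in one buffer of C floats.
flat = array("f", (x for vec in nested for x in vec))


def get_vector(buf, i, dim=DIM):
    """Return vector i as a slice of the flat buffer."""
    return buf[i * dim:(i + 1) * dim]


def dot(buf, i, query, dim=DIM):
    """Dot product of vector i with query, using index arithmetic only."""
    base = i * dim
    return sum(buf[base + k] * query[k] for k in range(dim))


print(list(get_vector(flat, 1)))
print(dot(flat, 2, [1.0, 0.0, 0.0, 1.0]))
```

Nothing here measures speed, and in CPython the interpreter overhead would likely dominate anyway. It only shows the layout difference the comment is pointing at.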
        
         | pjmlp wrote:
         | Because it is a fake dichotomy.
         | 
         | D, Nim, C#, Swift, not to count all of those that existed since
         | Xerox PARC days.
        
       | lesuorac wrote:
       | > As you can see in the above graph, a commit was merged that
       | caused a huge spike. However, with Criterion, an open source
       | benchmarking tool, we were easily able to identify it, mitigate
       | it, and push a fix.
       | 
       | Wonder what the commit was that caused a more than 2x regression
       | and got a fix instead of an undo.
        
       | menaerus wrote:
        
         | sbdivuvu wrote:
        
         | TehCorwiz wrote:
         | I care.
         | 
         | Programming languages give us different frameworks and
         | guardrails to express computational tasks similarly to how
         | written and spoken languages give us a different set of
         | concepts with which to express ideas. New languages mean a
         | potentially different way of thinking about a problem. Some
         | ideas which are difficult to express in one language are
         | trivial in another.
         | 
         | Discovering these differences is one of the joys of language
         | learning. Language learning requires practice, and rewriting a
         | known work (or translating it you might say) is a great way to
         | deepen your understanding and test which ideas are easier or
         | harder.
        
         | [deleted]
        
       | lupire wrote:
       | I liked the part where they said Python is too slow because it's
       | garbage collected, and didn't show any metrics, and then built a
       | new solution in Rust and didn't show metrics to compare to the
       | original system.
       | 
       | Makes me think the eng lead just wanted to do Rust, and made up a
       | rationalization.
        
         | viig99 wrote:
         | Same, "We knew that C++ was harder to scale and maintain high
         | quality as you build a dev team" just sounds arbitrary and
         | like a weak excuse to use Rust. C++20 is as scalable as Rust,
         | with a very rich ecosystem.
        
         | cercatrova wrote:
         | Well, we already know Python is inherently slower than Rust or
         | any compiled language really, so does one really need metrics
         | to know that the Rust implementation was faster?
        
           | codespin wrote:
           | a perf change without perf numbers is a bug
        
           | ketralnis wrote:
           | Yes. If your bottleneck is magnetic disc seeking times, no
           | amount of language change is going to move the needle (hah!).
        
       | rs_rs_rs_rs_rs wrote:
       | >In addition, it's challenging to find developers with experience
       | in both Python and C++
       | 
       | So you decided on a language that makes it even harder to find
       | experienced developers?
        
         | masklinn wrote:
         | Anecdotally, a lot of rust-curious people seem to know python.
         | Projects like pyo3 help a lot as they make it much easier (=
         | safe) to build native modules compared to C, let alone C++.
        
           | pclmulqdq wrote:
           | Rust is seen as more approachable by Javascript and Python
           | devs, so they tend to learn it more often than C or C++.
           | 
           | It is a lot more similar to JS than C++ is.
        
             | 8jy89hui wrote:
             | > It is a lot more similar to JS than C++ is.
             | 
             | That is strange as I have experienced the opposite. I've
             | written all three languages and I've noticed that JS
             | patterns don't translate well to Rust. Many C++ patterns
             | translate well to Rust (albeit after a bit of borrow
             | checker fighting).
             | 
             | Thoughts?
        
               | smilekzs wrote:
               | Rust iterators give you JS vibes, with gotchas mostly
               | related to lambda capture lifetimes. Once you accept
               | that sometimes a `collect` is the easiest way out, it
               | feels okay at the end of the day.
        
         | goodpoint wrote:
        
         | mistrial9 wrote:
         | it is arguable that C++ in the modern days is no longer "one
         | language" due to style, libraries, language features and code-
         | base legacy; you have to find a coder that will fit your C++
         | world, not just C++
        
           | pjmlp wrote:
           | Just like it will happen to Rust when it achieves 30 years of
           | history, getting features every six weeks.
           | 
           | How many epochs will exist in 30 years?
        
       | Thaxll wrote:
        
       | fwip wrote:
       | > First of all, Python is a garbage collected language, which
       | means it can be extremely slow for writing anything high
       | performance at scale.
       | 
       | I don't think garbage collection is in the top 3 causes of why
       | Python is slow.
        
         | 0x457 wrote:
         | > First of all, Python is a garbage, which means it can be
         | extremely slow for writing anything high performance at scale.
         | 
         | Fixed it.
        
           | pjmlp wrote:
           | Thankfully Fintech and military weapon control systems aren't
           | high performance.
        
         | mattnewton wrote:
         | This might be a difference of semantics- there is a difference
         | between garbage collection as a concept being slow and python's
         | GIL approach. My understanding is that the GIL would almost
         | always make the top 3 reasons why Python is slow in practice
         | - it works for a very specific single threaded execution model
         | but can't really take advantage of modern processors.
        
           | slt2021 wrote:
           | I think GIL is not the reason for slowness, it just specifies
           | single threaded interpreter execution model. You can always
           | spin up more interpreters to take advantage of multiple
           | cores.
           | 
           | The reason for slowness - is the weak dynamic type system of
           | Python.
           | 
           | Every single instruction needs to be type checked at
           | runtime, which makes everything slow.
           | 
           | Compare to C#/Java, which have GC but are both amazingly
           | fast because these languages have stricter type systems. If
           | you add a JIT on top (which can selectively replace MSIL/Java
           | opcodes with native machine instructions), it brings perf on
           | par with natively compiled languages like C++.
        
             | riku_iki wrote:
             | also, is py still interpreted or jit-compiled?
        
             | kstrauser wrote:
             | I'd argue Python is strongly, dynamically typed, and that
             | "weak" and "dynamic" are on different axes.
        
               | [deleted]
        
             | PartiallyTyped wrote:
             | > I think GIL is not the reason for slowness, it is the
             | weak dynamic types of Python that make it slow.
             | 
             | Python is structurally typed, that makes it dynamic, but it
             | is not weak as there is no type coercion.
             | 
             | > Every single instruction need to be type checked at
             | runtime and thus making everything slow.
             | 
             | This is also wrong, python does not type check anything,
             | not in the "regular" manner of typechecking. It relies on
             | structural typing, if it quacks like a duck, then it is
             | treated like a duck.
             | 
             | In fact, PyPy is an argument against your position as it
             | still allows the same (more or less) behaviour that python
             | has while operating a lot faster due to JIT.
             | 
             | Python doesn't have the luxury of compilation that C# and
             | Java have, nor is the VM intended to be high performing.
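A small sketch of the two points above, using made-up `Duck` and `Robot` classes: Python resolves `speak` by attribute lookup at call time rather than by checking a declared type, and it does not coerce between `str` and `int` implicitly.

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    # Unrelated to Duck, but it happens to have the same method.
    def speak(self) -> str:
        return "Beep"


def make_it_speak(thing) -> str:
    # No declared type is checked here; the lookup of `speak`
    # happens when this line runs.
    return thing.speak()


def add_text_and_number() -> str:
    try:
        return "1" + 1  # no implicit str/int coercion
    except TypeError as exc:
        return f"TypeError: {exc}"


print(make_it_speak(Duck()))
print(make_it_speak(Robot()))
print(add_text_and_number())
```

Passing an object without a `speak` method, such as `make_it_speak(42)`, fails with an `AttributeError` only when the call executes, which is the dynamic part.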
        
               | pjmlp wrote:
               | PyPy is still slower than the competition and largely
               | ignored by the Python community.
        
       | ordiel wrote:
       | This only proves some skilled people have too much free time and
       | no creativity...
        
       | jalino23 wrote:
       | rewriting everything in rust is not just a meme?
        
         | codegeek wrote:
         | It's like everyone is trying to do things in "Rust" because
         | it's the new thing to do.
        
           | dymk wrote:
           | Rust is 12 years old. 1.0 was released in 2015.
           | 
           | We've been past the "it's the shiny new thing" phase for a
           | while.
        
         | likeabbas wrote:
         | Memes typically have some basis in reality. If your project
         | reaches a point where it could benefit from fearless
         | concurrency or better memory control, Rust is probably your
         | best bet at the moment.
         | 
         | I could see huge benefits from Kafka and Cassandra being re-
         | written in Rust.
        
           | dominotw wrote:
           | kafka clone redpanda is written in rust ?
        
             | tilt_error wrote:
             | No. C++ and the C* (seastar) framework.
        
             | eatonphil wrote:
             | Hmm? I'm pretty sure it's written in C++.
             | 
             | See also, their install dependencies script.
             | 
             | https://github.com/redpanda-data/redpanda/blob/dev/install-d...
        
           | tilt_error wrote:
           | They are rewritten in C++ already; Redpanda and ScyllaDB,
           | respectively. Why waste the effort of rewriting it once
           | again?
        
             | agallego wrote:
             | I tried in 2017 writing it in rust and found some compiler
             | bugs. I also found compiler bugs in c++ tho to be honest,
             | but I felt more comfortable in c++ so decided to write the
             | first version of it in c++. The huge advantage is that
             | storage engines in particular need to be more conservative
             | in many dimensions and having seen success with scylla,
             | seastar was appealing to me as a 'tried and tested' base
             | for storage systems.
             | 
             | Prior systems I had built with facebook folly (c++ lib) and
             | had also written my own eventing systems in the past, but
             | the real value is having seastar being battle tested since
             | 2016. Largely it has been the right decision for us, as
             | redpanda, for its young age, has benefited from the
             | stability of seastar.
        
             | arcticbull wrote:
             | C++ isn't better in the ways outlined.
             | 
             | I'm not necessarily advocating it [1] but the parent's
             | claim was that those programs could benefit from memory
             | safety, thread safety and better concurrency and C++ does
             | not deliver along that axis.
             | 
             | [1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...
        
         | dymk wrote:
         | Why do you think the meme exists?
        
       ___________________________________________________________________
       (page generated 2022-10-18 23:01 UTC)