https://old.reddit.com/r/rust/comments/1dpvm0j/120ms_to_30ms_python_to_rust/

r/rust | this post was submitted on 27 Jun 2024 | 153 points (87% upvoted)
120ms to 30ms: Python to Rust (self.rust)
submitted 6 hours ago by stephenlblum

We love to see performance numbers; they are a core objective for us. We are excited to share another milestone in our ongoing effort: a 4x reduction in write latency for our data pipeline, bringing it down from 120ms to 30ms. This improvement is the result of transitioning from a C library accessed through a Python application to a fully Rust-based implementation. This is a light introduction to our architectural changes, the real-world results, and the impact on system performance and user experience. Chart A and Chart B are shown in the image above.

So Why Did We Switch from Python to Rust?

Our Data Pipeline Is Used by All Services

Our data pipeline is the backbone of our real-time communication platform.
Our team is responsible for copying event data from all our APIs to all our internal systems and services: data processing, event storage and indexing, connectivity status, and lots more. Our primary goal is to ensure up-to-the-moment accuracy and reliability for real-time communication.

Before our migration, the old pipeline used a C library accessed through a Python service, which buffered and bundled data. This buffering was the critical aspect causing our latency. We wanted to optimize, and we knew it was achievable. We explored a transition to Rust, as its performance, memory safety, and concurrency capabilities have benefited us before. It was time to do it again!

We Highly Value Rust's Advantages: Performance and Asynchronous IO

Rust excels in performance-intensive environments, especially when combined with asynchronous IO libraries like Tokio. Tokio provides a multithreaded, non-blocking runtime for writing asynchronous applications in Rust. The move to Rust allowed us to leverage these capabilities fully, enabling high throughput and low latency, all with compile-time memory and concurrency safety.

Memory and Concurrency Safety

Rust's ownership model provides compile-time guarantees for memory and concurrency safety, which preempts the most common issues such as data races, memory leaks, and invalid memory access. This is advantageous for us: going forward, we can confidently manage the lifecycle of the codebase, allowing ruthless refactoring if needed later. And there's always a "needed later" situation.

Technical Implementation: Architectural Changes and Service-to-Service Messaging with MPSC and Tokio

The previous architecture relied on a service-to-service message-passing system that introduced considerable overhead and latency. A Python service used a C library for buffering and bundling data, and when messages were exchanged among multiple services, delays occurred, escalating the system's complexity.
The buffering mechanism within the C library acted as a substantial bottleneck, resulting in an end-to-end latency of roughly 120 milliseconds. We thought this was optimal because our per-event latency average was 40 microseconds. While this looked good from the old Python service's perspective, downstream systems took a hit at unbundle time, which made overall latency higher.

Chart B above shows that after we deployed, the average per-event latency increased from the original 40 microseconds to 100. This seems non-optimal: Chart B should show reduced latency, not an increase! But when we step back and look at the reason, we can see how this happens. The good news is that downstream services can now consume events more quickly, one by one, without needing to unbundle, so the overall end-to-end latency could dramatically improve from 120ms to 30ms. The new Rust application can fire off events instantly and concurrently. This approach was not possible with our Python service, since it would also have required a rewrite to use a different concurrency model. We probably could have rewritten it in Python, but if it's going to be a rewrite, we might as well make the best rewrite we can, with Rust!

Resource Reduction: CPU and Memory

Our Python service would consume upwards of 60% of a core. The new Rust service consumes less than 5% across multiple cores. The memory reduction was dramatic as well, with Rust operating at about 200MB versus Python's gigabytes of RAM.

New Rust-based Architecture

The new architecture leverages Rust's powerful concurrency mechanisms and asynchronous IO capabilities. Service-to-service message passing was replaced with multiple instances of Multi-Producer, Single-Consumer (MPSC) channels. Tokio is built for efficient asynchronous operations, which reduces blocking and increases throughput. Our data processing was streamlined by eliminating the intermediary buffering stages and opting instead for concurrency and parallelism.
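As a rough, self-contained illustration of why firing events concurrently instead of sequentially cuts end-to-end latency, here is a sketch (not our production code). It uses OS threads from the standard library, and the 10ms sleep is a hypothetical stand-in for a network write; the production service uses Tokio tasks rather than threads:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for an IO-bound operation such as a network write.
fn send_event(_id: u32) {
    thread::sleep(Duration::from_millis(10));
}

fn main() {
    let ids: Vec<u32> = (0..8).collect();

    // Sequential sends: total time is roughly 8 * 10ms.
    let start = Instant::now();
    for &id in &ids {
        send_event(id);
    }
    let sequential = start.elapsed();

    // Concurrent sends: the waits overlap, so total time is roughly 10ms.
    let start = Instant::now();
    let handles: Vec<_> = ids
        .iter()
        .map(|&id| thread::spawn(move || send_event(id)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let concurrent = start.elapsed();

    println!("sequential: {sequential:?}, concurrent: {concurrent:?}");
    assert!(concurrent < sequential);
}
```

The same overlap is what an async runtime gives you, but with tasks that are far cheaper than OS threads, which matters at production event rates.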
These changes improved performance and efficiency.

Example Rust App

The code isn't a direct copy; it's a stand-in sample that mimics what our production code does. The code also shows only one MPSC channel, where our production system uses many.

1. Cargo.toml: We need to include dependencies for Tokio and any other crates we might be using (like async-channel for events).
2. Event definition: The Event type is used in the code, but we have many types not shown in this example.
3. Event stream: event_stream is referenced but not created the way we do it with many streams. That depends on your approach, so the example keeps things simple.

The following is a Rust example with code and a Cargo.toml file, plus event definitions and event stream initialization.

Cargo.toml

    [package]
    name = "tokio_mpsc_example"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    tokio = { version = "1", features = ["full"] }

main.rs

    use tokio::sync::mpsc;
    use tokio::task::spawn;
    use tokio::time::{sleep, Duration};

    // Define the Event type
    #[derive(Debug)]
    struct Event {
        id: u32,
        data: String,
    }

    // Function to handle each event
    async fn handle_event(event: Event) {
        println!("Processing event: {:?}", event);
        // Simulate processing time
        sleep(Duration::from_millis(200)).await;
    }

    // Function to process data received by the receiver
    async fn process_data(mut rx: mpsc::Receiver<Event>) {
        while let Some(event) = rx.recv().await {
            handle_event(event).await;
        }
    }

    #[tokio::main]
    async fn main() {
        // Create the channel with a buffer size of 100
        let (tx, rx) = mpsc::channel(100);

        // Spawn a task to process the received data
        spawn(process_data(rx));

        // Simulate an event stream with dummy data for demonstration
        let event_stream = vec![
            Event { id: 1, data: "Event 1".to_string() },
            Event { id: 2, data: "Event 2".to_string() },
            Event { id: 3, data: "Event 3".to_string() },
        ];

        // Send events through the channel
        for event in event_stream {
            if tx.send(event).await.is_err() {
                eprintln!("Receiver dropped");
            }
        }
    }

Rust Sample Files

1. Cargo.toml:
   + Specifies the package name, version, and edition.
   + Includes the necessary tokio dependency with the "full" feature set.
2. main.rs:
   + Defines an Event struct.
   + Implements the handle_event function to process each event.
   + Implements the process_data function to receive and process events from the channel.
   + Creates an event_stream with dummy data for demonstration purposes.
   + Uses the Tokio runtime to spawn a task for processing events and sends events through the channel in the main function.

Benchmark Tools Used for Testing

To validate our performance improvements, extensive benchmarks were conducted in development and staging environments. Tools such as hyperfine (https://github.com/sharkdp/hyperfine) and criterion.rs (https://crates.io/crates/criterion) were used to gather latency and throughput metrics. Various scenarios were simulated to emulate production-like loads, including peak traffic periods and edge cases.

Production Validation

To assess real-world performance in the production environment, continuous monitoring was implemented using Grafana and Prometheus. This setup allowed us to track key metrics such as write latency, throughput, and resource utilization. Alerts and dashboards were configured to promptly identify any deviations or bottlenecks in the system's performance, so potential issues could be addressed quickly. We of course deployed carefully, to a low percentage of traffic over several weeks. The charts you see are from the full deploy after our validation phase.

Benchmarks Are Not Enough

Load testing demonstrated improvements, though testing doesn't prove success so much as provide evidence. Write latency was consistently reduced from 120 milliseconds to 30 milliseconds. Response times improved, and end-to-end data availability was accelerated. These advancements significantly improved overall performance and efficiency.
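For quick local checks of per-event timing before reaching for a full benchmark harness, a plain timing loop is often enough. This is a minimal sketch only; process_event here is a hypothetical stand-in for real pipeline work, and for statistically rigorous numbers a tool like criterion.rs should be used instead:

```rust
use std::time::Instant;

// Hypothetical stand-in for the per-event work done by the pipeline.
fn process_event(payload: &str) -> usize {
    payload.len()
}

fn main() {
    let iterations: u32 = 10_000;
    let start = Instant::now();
    let mut total = 0usize;
    for i in 0..iterations {
        total += process_event(&format!("event-{i}"));
    }
    let elapsed = start.elapsed();
    println!("processed {iterations} events, {total} bytes");
    // Integer division of a Duration gives the mean per-event latency.
    println!("average per-event latency: {:?}", elapsed / iterations);
}
```

A loop like this is fine for spotting order-of-magnitude changes; it does not account for warm-up, outliers, or compiler optimizations eliding unused work, which is exactly what criterion handles.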
Before and After

In the legacy system, service-to-service messaging was done with C library buffering. This involved multiple services in the message-passing loop, and the C library added latency through event buffering. The Python service added an extra layer of latency due to Python's Global Interpreter Lock (GIL) and its inherent operational overhead. These factors resulted in high end-to-end latency, complicated error handling and debugging, and limited scalability due to the bottlenecks introduced by event buffering and the Python GIL.

After implementing Rust, message passing via direct channels eliminated intermediary services, while Tokio enabled non-blocking asynchronous IO, significantly boosting throughput. Rust's strict compile-time guarantees reduced runtime errors and gave us robust performance. Improvements observed included a reduction in end-to-end latency from 120ms to 30ms, enhanced scalability through efficient resource management, and improved error handling and debugging facilitated by Rust's strict typing and error-handling model. It's hard to argue for using anything other than Rust.

Deployment and Operations

Minimal Operational Changes

The deployment underwent minimal modifications to accommodate the migration from Python to Rust; we kept the same deployment process and CI/CD. Configuration management continued to leverage existing tools such as Ansible and Terraform, facilitating seamless integration. This gave us a smooth transition without disrupting the existing deployment process. This is a common approach: you want to change as little as possible during a migration. That way, if a problem occurs, you can isolate the footprint and find the problem sooner.

Monitoring and Maintenance

Our application is integrated with the existing monitoring stack, comprising Prometheus and Grafana, enabling real-time metrics monitoring.
Rust's memory safety features and reduced runtime errors have significantly decreased the maintenance overhead, resulting in a more stable and efficient application. It's great to watch our build system work, and even better to catch errors during development on our laptops, before we push commits that would cause builds to fail.

Practical Impact on User Experience

Improved Data Availability

Quicker write operations allow for near-instantaneous data readiness for reads and indexing, leading to user-experience enhancements: reduced latency in data retrieval enables more efficient and responsive applications. Real-time analytics and insights are better too, providing businesses with up-to-date information for informed decision-making. Furthermore, faster propagation of updates across all user interfaces ensures that users always have access to the most current data, enhancing collaboration and productivity for the teams who use the APIs we offer. The latency improvement is noticeable from an external perspective, and combining our APIs now ensures that data is available sooner.

Increased System Scalability and Reliability

Businesses get a serious advantage: they can analyze larger amounts of data without their systems slowing down. This means you can keep up with the user load. And let's not forget the added bonus of a more resilient system with less downtime. We're running a business with a billion connected devices, where disruptions are a no-no and continuous operation is a must.

Future Plans and Innovations

Rust has proven successful in improving performance and scalability, and we are committed to expanding its use throughout our platform. We plan to extend Rust implementations to other performance-critical components, ensuring that the platform as a whole benefits from its advantages.
As part of our ongoing commitment to innovation, we will continue to focus on performance tuning and architectural refinements in Rust, ensuring that it remains the optimal choice for mission-critical applications. We will also explore new asynchronous patterns and concurrency models in Rust, pushing the boundaries of what is possible with high-performance computing.

Technologies like Rust enhance our competitive edge and let us remain the leader in our space. Our critical infrastructure is Rusting in the best possible way, ensuring that our real-time communication services remain best in class. The transition to Rust has not only reduced latency significantly but also laid a strong foundation for future enhancements in performance, scalability, and reliability. We deliver the best possible experience for our users, and Rust, combined with our dedication to providing the best API service possible to billions of users, serves that goal.

---------------------------------------------------------------------

all 7 comments, sorted by: best

[-] epicar | 54 points | 5 hours ago

> We love to see performance numbers. It is a core objective for us. We are excited

who is we/us?

[-] stephenlblum[S] | 22 points | 5 hours ago

we/us = PubNub. We are a distributed team of engineers working at PubNub. Rust is our favorite language, and we are working to make sure we get to use as much Rust as possible. The outcomes are great each time we deploy a new Rust service.
Our repeated success allows us to keep taking advantage of what Rust offers.

[-] stephenlblum[S] | 34 points | 5 hours ago

Thank you to the Reddit r/rust community for requesting a rewrite of the original posted article. The original article was fluffy; it had no substance other than us saying, "look! we did a thing!". The new, updated post was improved with help from u/rtkay123, u/Buttleston, u/the-code-father, and u/RedEyed__. Thank you! The improvements they helped us with:

1. Proper graphs and charts with labels, legends, and details on the chart axes.
2. Removing logos and names to prevent any possible advertisement.
3. Posting directly to Reddit (vs. linking out).
4. Covering all the details and questions asked here and elsewhere.
5. Annotated images using the Reddit annotation feature.

[-] the-code-father | 1 point | 15 minutes ago

Fwiw I think people are fine (at least I am) with out-links that raise awareness of a product/company that uses Rust, as long as the post contributes something meaningful/educational. It's definitely a much better post this time around!

[-] danted002 | 8 points | 1 hour ago

Interesting. It would actually help to see what the original Python code did and what the original C lib was. From experience, a rewrite usually introduces optimisations to a pipeline based on existing usage, which by itself brings a minimum 2x performance gain. You also mentioned the GIL being a blocker; however, you also mentioned using a C lib, which is odd because all current established C libs release the GIL. So, without the code per se, I would say your 4x improvement comes from the fact that you don't buffer anymore and you stopped using a C lib that was blocking the GIL for no reason, not from the fact that you switched languages.
A Python-to-Rust rewrite of CPU-bound operations usually brings a minimum 10x performance boost. See uv, ruff, and pydantic, all of which boast improvements between 10x and 30x, so the fact that you only got a 4x boost seems like a "canary in the coal mine" type of thing. For comparison, on one of my projects that processed somewhere between 5k and 10k events per second, we achieved a 7x performance boost just by switching from a sync codebase to an async one. Wouldn't you know it, the IO was the actual bottleneck all along.

[-] stephenlblum[S] | 1 point | 1 hour ago

Hi u/danted002, excellent question! The performance gain really was 10x+ like you mentioned. From a CPU and scale perspective we did meet those gains: CPU utilization was reduced, and we can process more events per second. The end-to-end latency is now optimal from the transmission (send) side that the new Rust service is responsible for. This was the larger latency improvement we could achieve by rewriting the transmitter. The remaining 30ms of latency comes from the downstream systems.

During the life of the original Python service, about 10 years, we spent time optimizing our Python event service pipeline. We did our best to make it performant for what it could do for us. The bundling approach was a general improvement, though it required blocking the CPU, with the GIL in our way. We knew a rewrite would eventually be needed to get to the next level. We were considering a better concurrency model in Python, potentially multiprocessing, or an async approach similar to what we did with Rust. The buffering and event bundling happened in Python land, which was CPU-bound, and this is where the GIL came into play; threads couldn't help us. While the buffering approach was originally an optimization, it prevented us from pushing performance further. We really did want to move to the async approach like you were describing.
It was one of the options on our list. Our code was old and needed a rewrite anyway to achieve the async approach. We could have done this in Python; it really could have stayed in the Python world. We chose Rust since we are getting good at it, and each time we deploy a new Rust service it reduces our memory and CPU usage at the level of gains you mentioned. We are happy with the upgrade and looking to repeat this for our other services where it makes sense. Some of them are fine in Python today. If we can capture some noteworthy gains and a rewrite is on the table, then Rust is the #1 choice.

[-] danted002 | 2 points | 42 minutes ago

Thank you for the follow-up. It answered all my questions. Basically you reached the end of what could be optimised with the existing codebase and required a full overhaul of the system. Since you needed a full rewrite anyway, you went with Rust because... well, it's Rust. Who doesn't love a low-level language that can "talk" in high-level abstractions, and which incidentally is also being fully embraced by the Python ecosystem? Thank you once again for the explanation.