https://thenewstack.io/how-meta-patches-linux-at-hyperscale/
How Meta Patches Linux at Hyperscale

Linux / Operations

Patching Linux is easy. Except when you need to patch tens of thousands of servers without downtime. Here's how Meta does it.

Dec 1st, 2023 3:00am by Steven J. Vaughan-Nichols

Feature image by Casey Allen on Unsplash.

RICHMOND, Va.
-- Anyone with a tech clue can patch a Linux server. But patching thousands of them without any downtime is not easy. At the Linux Plumbers Conference, the invite-only conference of top Linux kernel developers held earlier this month, Meta Linux kernel engineer Breno Leitao explained how Facebook pulls the trick off with its millions of servers around the world.

With ordinary techniques, Leitao said, it would take more than 45 days to roll out a new kernel to all machines. As he put it, "Draining and un-draining hosts is hard." You can say that again.

That may be fine for a minor update, but it won't work for a security patch. So Meta uses Kernel Live Patching (KLP) with Red Hat's Kpatch to deliver patches quickly. With KLP, you can apply the latest security updates to a running Linux kernel without rebooting, which maximizes system uptime and availability.

Live Kernel Patches

Kernel live patches are delivered as packages of modified code that are separate from the main kernel package. The live patches are cumulative, so the latest patch contains all the fixes from the previous ones for that kernel package. Each kernel live patch package is tied to the exact kernel revision for which it is issued.

Live patches won't work on everything, though. You can't patch data or data structures. Another problem is that extra engineering work is usually required to make a live patch. As Leitao warned, "It's not just as simple as compiling the live patch, and knowing it'll be safe and applying it. These are kernel modules, you can break things if you're not careful. There are no guarantees provided that the patch itself is correct."

Kpatch works by comparing the original and patched kernels and then using a customized kernel module to patch the new code into the running kernel. Kpatch then checks the stacks of running processes to see whether the patch can be applied without any harmful effects.
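The stack check described above can be illustrated with a toy model. The Python sketch below is not Kpatch's code, which runs inside the kernel. It only shows the idea that a patch is held back while any task still has a to-be-patched function on its call stack. The helper names `blocking_tasks` and `safe_to_patch`, the task names, and the sample stacks are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def blocking_tasks(stacks: Dict[str, List[str]], patched: Set[str]) -> Dict[str, List[str]]:
    """Return the tasks whose call stacks contain a function the patch replaces.

    `stacks` maps a task name to the function names on its stack (hypothetical
    input; a real implementation would walk kernel stacks, not Python lists).
    """
    blocked: Dict[str, List[str]] = {}
    for task, frames in stacks.items():
        hits = [fn for fn in frames if fn in patched]
        if hits:
            blocked[task] = hits
    return blocked


def safe_to_patch(stacks: Dict[str, List[str]], patched: Set[str]) -> bool:
    """A patch is considered safe here only if no task is inside a patched function."""
    return not blocking_tasks(stacks, patched)


if __name__ == "__main__":
    # Illustrative stacks only; not captured from a real system.
    stacks = {
        "nginx": ["do_syscall_64", "tcp_sendmsg", "tcp_write_xmit"],
        "cron": ["do_syscall_64", "do_nanosleep"],
    }
    print(safe_to_patch(stacks, {"ext4_file_write_iter"}))  # no task is in this function
    print(blocking_tasks(stacks, {"tcp_sendmsg"}))          # nginx is still inside it
```

In this model, a caller would retry the check until `blocking_tasks` comes back empty and only then switch over to the new functions.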
When it's safe, Kpatch uses ftrace to redirect calls from the old functions to the patched ones, after which the outdated code is no longer executed. And there you are: your server is patched, and there's been no downtime.

Of course, it's not that simple in practice. Leitao explained, "At Meta, when we apply a live patch, it usually takes one to two seconds to apply the patch to the host. That's to a single host, obviously not to like the whole fleet of servers, but one to two seconds for a host is really, really fast compared to even kexec," the Linux kernel mechanism for booting a new kernel. "It doesn't require any downtime or workload migration, you just apply the live patch, and off you go."

How to Patch Millions of Machines

But when you're talking about millions of machines, that's not the entire story. Meta does find bugs during its patch rollouts, so the administrators start by patching a release candidate tier. As the package roller delivers the RPM-based patches, the servers' health is checked automatically. Meta looks for crashes, major alarms, application problems, and performance changes in the new kernels. This data is pulled from a variety of sources, including crashes, netconsole results, and core dumps. If the error rate goes over one crash per thousand servers, the patch is pulled, and the old kernel is restored.

With over a billion users, Facebook also keeps a close eye on performance. As Leitao said, "The live patch performance overhead is small, but there is always a concern when a relatively hot function is patched."

While Meta uses Kpatch, there are alternatives: SUSE offers kGraft, Oracle uses Ksplice, and Canonical supports Livepatch. Regardless of the code, they all deliver similar results.

So, if you'd rather not have downtime with your servers, data centers, and clouds, follow Meta's example and use live patching. You'll be glad you did.

Steven J.
Vaughan-Nichols, aka sjvn, has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast internet connection, WordStar was the state-of-the-art word processor, and we liked it.

Red Hat and Oracle are sponsors of The New Stack.

(c) The New Stack 2023