# Performant database tree traversal with Rails

Learn how to solve a tree traversal N+1 query problem in your Rails application.

Written by Mike Coutermarsh
Last updated July 12, 2023

We recently solved an interesting performance problem in our Rails API when performing a tree traversal. We have some code that figures out a 3-way merge for your database schemas. To do this, we need to calculate a merge base between two schemas (like git!).

Our Rails API keeps track of changes to each PlanetScale database schema. We call each of these changes a "schema snapshot," similar to a git commit that stores the state of the schema at a specific time. Each snapshot can have one or two parents. When merging branches, we perform a breadth-first search over the history of each branch until we find the common ancestor of both branches. This is the merge base.

[Figure: the merge base with two arrows coming out of it: one for `alter table teams` and another for `alter table users`. The `alter table users` node has an arrow coming out of it for `create table payments`.]

## Going one-by-one is slow

Finding the merge base involves checking each snapshot in the history of both branches. Usually, this is fast because most databases only have a couple of changes. Others, though, would have thousands. This is where we'd run into performance issues, since we were doing a database query for every step as we traversed the tree. Even though each query was fast, the network time alone added up quickly.

```sql
select * from schema_snapshots where id = 20
select * from schema_snapshots where id = 21
select * from schema_snapshots where id = 22
select * from schema_snapshots where id = 23
select * from schema_snapshots where id = 24
-- *thousands more queries*
```

## Solving the N+1 problem

This is a classic "N+1" performance problem. For each step in the process, we trigger another query. Usually, fixing these in Rails is quite simple: there is an `includes` method for preloading the data we need. Unfortunately, in this case, the data structure means the normal preloading techniques do not work. We had to invent something new.

### In-memory cache

First, we created an in-memory cache for storing the snapshots. This is a temporary cache that only exists to find the merge base. In-memory is important here because our primary performance issue was caused by the sheer number of network requests.

```ruby
class SnapshotMergeBaseCache
  attr_reader :store, :hits, :misses

  def initialize
    @store = {}
    @misses = 0
    @hits = 0
  end

  def get(id)
    if @store[id]
      @hits += 1
      return @store[id]
    end

    @misses += 1
    add(SchemaSnapshot.find(id))
  end

  def add(node)
    @store[node.id] = node
  end
end
```

This gives us a place to store all of our snapshots in memory. It uses a simple hash where the id is the key and the value is the snapshot data.

The `add` method puts a snapshot into the cache. The `get` method retrieves it, falling back to a database lookup on a miss. The `get` method also keeps stats on hits/misses. We used these stats to understand how well the cache worked once in production.
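To make the cache's role concrete, here is a minimal sketch of how a breadth-first merge base search could lean on it. The `find_merge_base` method, the `parent_ids` accessor on snapshots, and the `from_head`/`into_head` arguments (the most recent snapshot on each branch) are hypothetical names for illustration, not the exact code in our API.

```ruby
require "set"

# Hypothetical sketch: walk both branches' histories breadth-first,
# using the cache so each visited snapshot costs at most one query.
def find_merge_base(from_head, into_head, cache)
  seen_from = Set.new
  seen_into = Set.new
  queue = [[from_head.id, seen_from], [into_head.id, seen_into]]

  until queue.empty?
    id, seen = queue.shift
    next if seen.include?(id)
    seen.add(id)

    # If the other branch already visited this snapshot, it is a common ancestor.
    other = seen.equal?(seen_from) ? seen_into : seen_from
    return cache.get(id) if other.include?(id)

    # cache.get avoids a round trip whenever the snapshot was preloaded.
    cache.get(id).parent_ids.each { |parent_id| queue << [parent_id, seen] }
  end

  nil # the branches share no history
end
```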
### Preloading the cache

Now that we have the cache, the next step is bulk-loading it with snapshots. Conveniently, the snapshot history is pretty predictable when finding the merge base. We could preload the X most recent snapshots for each branch and drastically reduce the number of trips back to the database.

```ruby
cache = SnapshotMergeBaseCache.new

from_branch.recent_snapshots.limit(FROM_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end

into_branch.recent_snapshots.limit(INTO_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end
```

Now, when running our breadth-first search, we can use `cache.get(id)` to find the next node. It hits the cache in most cases, avoiding the network request and solving our performance problem.

### Rolling out & testing

Making changes like this can be tricky. There is often a wide gap between what you expect to happen and the reality of production.

First, we needed to ensure it was accurate. We ran a few tests where we calculated the merge base using both the old and new methods for thousands of databases. This made us confident the new code was returning the correct results.

We then used feature flags to test the rollout of the new code path and recorded data on how it performed. The hits and misses stats proved useful for fine-tuning the number of snapshots we preloaded. After a couple of iterations, we released it to 100% of our customers.

## Alternative solutions

Adding an in-memory cache is just one way of solving this problem. It worked out best for us because of the high number of snapshots we needed to traverse for some databases. It was also simple to layer this solution onto our existing code without many major changes, which reduced the risk when rolling it out.

### Database recursive CTE

One option is letting the database do the work. This can be accomplished with a recursive common table expression. With this, the database could follow the pointer to each record until it finds the common merge base. A rough sketch of what this could look like appears at the end of this post.

### Materialized path

The materialized path technique is a way to represent a graph in a SQL database. It stores the relationship history in a single column, such as `20/19/15/10/5/3/1`. By doing this, you can look at the history of two nodes and find their common parent.

This is a great option that works well for tree structures with a known size limit. In our case, storing thousands of relationships didn't make this feasible.
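To illustrate, here is a rough sketch of that comparison step in plain Ruby, assuming each node stores its ancestry in a path string like the one above, ordered from the node itself back to the root. The `common_ancestor_id` helper and the example paths are made up for illustration; this isn't something we built.

```ruby
require "set"

# Find the closest common parent of two nodes whose ancestry is stored
# as a materialized path such as "20/19/15/10/5/3/1".
def common_ancestor_id(path_a, path_b)
  ancestors_a = path_a.split("/")
  ancestors_b = path_b.split("/").to_set

  # The first id in A's history that also appears in B's history is
  # their closest shared ancestor.
  ancestors_a.find { |id| ancestors_b.include?(id) }
end

common_ancestor_id("20/19/15/10/5/3/1", "22/18/15/10/5/3/1") # => "15"
```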
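As mentioned in the recursive CTE section above, here is a rough sketch of what that alternative could look like. It assumes a single `parent_id` column on `schema_snapshots`, MySQL 8's `WITH RECURSIVE` support, and `from_head_id`/`into_head_id` placeholders for the most recent snapshot id on each branch. Since our snapshots can have two parents, a real query would need to be more involved; this is illustrative only, not the approach we shipped.

```ruby
# Hypothetical sketch: let the database walk the ancestry of one branch
# head in a single query instead of one query per snapshot.
ANCESTORS_SQL = <<~SQL
  WITH RECURSIVE ancestors AS (
    SELECT id, parent_id, 0 AS depth
    FROM schema_snapshots
    WHERE id = ?
    UNION ALL
    SELECT s.id, s.parent_id, a.depth + 1
    FROM schema_snapshots s
    INNER JOIN ancestors a ON s.id = a.parent_id
  )
  SELECT id FROM ancestors ORDER BY depth
SQL

from_ids = SchemaSnapshot.find_by_sql([ANCESTORS_SQL, from_head_id]).map(&:id)
into_ids = SchemaSnapshot.find_by_sql([ANCESTORS_SQL, into_head_id]).map(&:id)

# With single-parent histories, the first ancestor of one branch that also
# appears in the other branch's ancestry is the merge base.
merge_base_id = from_ids.find { |id| into_ids.include?(id) }
```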