[HN Gopher] Optimizing Postgres table layout for maximum efficiency
       ___________________________________________________________________
        
       Optimizing Postgres table layout for maximum efficiency
        
       Author : napsterbr
       Score  : 65 points
       Date   : 2024-10-06 15:50 UTC (1 day ago)
        
 (HTM) web link (r.ena.to)
 (TXT) w3m dump (r.ena.to)
        
       | remus wrote:
       | Very interesting, but I think the author overstates the
       | importance of alignment a little. Unless your data/indexes
       | are already at a challenging size for your hardware (or you
       | expect them to be imminently), fiddling with byte alignment
       | details feels like a fairly premature optimisation.
       | 
       | Disk is cheap, memory is plentiful, your time is expensive etc.
        
         | koolba wrote:
         | > Disk is cheap, memory is plentiful, your time is expensive
         | etc.
         | 
         | Taking the time to know the in-memory sizing of your data
         | types is well worth it. Taking the time to think about the
         | types to use and to sort them by size is minimal, and also
         | well worth it.
         | 
         | It may make sense for the system to do it automatically for
         | newly created tables. But maybe not as it's possible you'd want
         | the data layout to match some existing structure.
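         | 
         | A minimal sketch of what sorting by size buys (hypothetical
         | table; per the Postgres docs, int8 and timestamptz align to
         | 8 bytes, int4 to 4, int2 to 2, boolean to 1):

```sql
-- Columns in arbitrary order: padding is inserted before each
-- column to satisfy its alignment requirement.
CREATE TABLE events_padded (
    active     boolean,      -- 1 byte + 7 bytes padding
    created_at timestamptz,  -- 8 bytes, 8-byte aligned
    kind       int2,         -- 2 bytes + 2 bytes padding
    source_id  int4,         -- 4 bytes
    user_id    int8          -- 8 bytes
);                           -- 32 bytes of column data per row

-- Same columns sorted largest-first: no internal padding,
-- 23 bytes of data (rounded up to 24) per row.
CREATE TABLE events_aligned (
    created_at timestamptz,
    user_id    int8,
    source_id  int4,
    kind       int2,
    active     boolean
);
```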
        
         | egnehots wrote:
         | As mentioned in the article, it's a good idea to consider this
         | when creating a new table, since it's essentially a free
         | optimization. However, it's probably not worth the hassle of
         | reordering a production table for that.
         | 
         | > Disk is cheap, memory is plentiful, but your time is
         | expensive, etc.
         | 
         | One thing to keep in mind, though, is that while you often have
         | plenty of disk space, RAM is still relatively expensive. It's
         | also divided into many smaller buffers, such as working memory
         | and shared buffers, which are not that large. These
         | optimizations help to fit more data into cache.
         | 
         | However, what the article said about alignment being important
         | for indexes is somewhat misleading. Reordering an index field
         | is not the same as reordering columns in a table. Besides having
         | to rewrite queries, it also changes the access pattern and the
         | time required to access the data, which is often much more
         | significant than the space saved. Indexes are, by nature, a
         | tradeoff where you give up space to gain time, so this mindset
         | doesn't really apply there.
        
           | napsterbr wrote:
           | Hey, author here.
           | 
           | > Indexes are, by nature, a tradeoff where you give up space
           | to gain time, so this mindset doesn't really apply there.
           | 
           | I agree that (re)aligning indexes is a different beast
           | entirely, but (as mentioned in my recommendation) ideally the
           | developer should keep this in mind when creating the index
           | initially.
           | 
           | Factors like cardinality and even readability should take
           | precedence over perfect alignment, but all else being equal,
           | aligning your indexes from the very moment they are
           | introduced in the codebase is the ideal scenario IMO.
        
         | sgarland wrote:
         | > Disk is cheap, memory is plentiful, your time is expensive
         | etc.
         | 
         | Spend 30 minutes one day playing around with Postgres, trying
         | different column combinations out. Boom, you now know how best
         | to order columns. This doesn't seem like a big ask.
         | 
         | The flip side is that changing data at scale is HARD, so if you
         | put things like this off, when you do finally need to squeeze
         | bytes, it's painful.
         | 
         | Also, memory is absolutely not plentiful. That's generally the
         | biggest bottleneck (or rather, the lack of it then makes IO the
         | bottleneck) for an RDBMS, assuming you have connection pooling
         | and aren't saturating the CPU with overhead.
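         | 
         | For instance, a quick psql session along these lines (table
         | and column names are made up) shows the difference directly:

```sql
CREATE TABLE t_padded  (a boolean, b timestamptz, c int2, d int8);
CREATE TABLE t_aligned (b timestamptz, d int8, c int2, a boolean);
INSERT INTO t_padded  VALUES (true, now(), 1, 2);
INSERT INTO t_aligned VALUES (now(), 2, 1, true);

-- pg_column_size() on a whole row reports the materialized
-- tuple size; the size-sorted layout comes out several bytes
-- smaller per row, purely from avoided padding.
SELECT pg_column_size(p.*) FROM t_padded  p;
SELECT pg_column_size(q.*) FROM t_aligned q;
```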
        
       | koolba wrote:
       | The content itself is fine but the unnecessary image above the
       | label " _You have the right to remain aligned_ " with the usual
       | AI-generated garbage text cheapens the entire article.
        
         | loloquwowndueo wrote:
         | Are you kidding, that image is hilarious.
         | 
         | (Articles that intersperse annoying memes every 3 paragraphs
         | with obnoxious gif animations are much much worse).
        
       | wccrawford wrote:
       | Interesting. But if this is important, why doesn't Postgres do it
       | invisibly, automatically? Surely there's a most-correct sequence,
       | with the longest values first, and the shortest values last, and
       | then (probably) the variable-length values?
       | 
       | It could then happily report back the data in the order that the
       | DB Admin/developer expects.
        
         | j45 wrote:
         | Reordering columns in MySQL is pretty trivial, and I've
         | wondered why it's not the same in Postgres, even at the GUI
         | level.
         | 
         | Ordering columns for performance might be a different kind
         | of recommendation, though.
        
         | jeltz wrote:
         | Because nobody has implemented it yet. A patch would be
         | welcomed by many, but nobody has written one.
        
       | johnthuss wrote:
       | This is a super useful thing to know and I'm thankful for this
       | article highlighting this aspect of Postgres.
       | 
       | But I would disagree with the takeaway to focus on optimizing
       | your indexes and not your tables. The reason is that the order of
       | columns in a multi-column index is highly meaningful and
       | intentional in order to support match on a range of values for
       | the last column in the index. The way databases work, you can
       | only utilize a multi-column index on (customer_id int4, date
       | timestamp) if you have an equality match on customer_id, like
       | "WHERE customer_id = 1 AND date BETWEEN '2024-01-01' AND
       | '2025-01-01'". If
       | you reorder these columns in the index to put the larger date
       | column first, then, sure, you save space in the index, but you
       | also make it worthless - it will never be used by the query
       | above. As such, optimizing a multi-column index is only useful
       | when all the columns are queried for equality rather than a
       | range.
       | 
       | In contrast, when you are creating a NEW table you might not
       | think hard about the order of the columns in the table, and
       | especially not about the data-sizes of each column and their
       | alignment. But doing so at the time you create the table can be
       | tremendously beneficial if it is going to be very large. It is
       | important to note that you not only save space on disk, but
       | also in precious RAM when the tuples are loaded.
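       | 
       | The example above as concrete DDL (table and index names are
       | hypothetical):

```sql
CREATE INDEX orders_customer_date_idx
    ON orders (customer_id, date);

-- Served by a single index range scan: equality on the leading
-- column narrows the search to one customer, and that customer's
-- entries in the date range are contiguous in the index.
SELECT *
FROM orders
WHERE customer_id = 1
  AND date BETWEEN '2024-01-01' AND '2025-01-01';

-- With the columns reversed, (date, customer_id), the same query
-- can no longer descend to one customer's slice; every entry in
-- the date range has to be examined and filtered.
```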
        
         | sgarland wrote:
         | This (index ordering resulting in useless indices) is not true,
         | at least not in newer versions of Postgres (I just tried with
         | 15).
         | 
         | While the query will take much longer (for me, it was about 47
         | msec vs 0.6 msec for 2,000,000 rows), it still uses the index.
         | 
         | Similarly, while normally you wouldn't expect a query using
         | predicate Y to use an index defined on columns (X, Y, Z) if
         | predicate X is also not referenced, Postgres may choose to do
         | so, depending on table stats, and the relative difference
         | between random_page_cost and seq_page_cost. I've seen it happen
         | before.
        
           | ghusbands wrote:
           | > This (index ordering resulting in useless indices) is not
           | true, at least not in newer versions of Postgres (I just
           | tried with 15).
           | 
           | > While the query will take much longer (for me, it was about
           | 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the
           | index.
           | 
           | I'd argue that something being 78x slower can make it pretty
           | useless, but it is indeed at least used, in some cases.
        
         | jeltz wrote:
         | This is a limitation which is currently being worked on. The
         | order will still matter, of course, but it will allow
         | PostgreSQL to make some use of indexes even when the order
         | of columns does not match.
         | 
         | https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZd...
        
           | johnthuss wrote:
           | >>> This is feasible in cases where the total number of
           | distinct values in the column 'a' is reasonably small (think
           | tens or hundreds, perhaps even thousands for very large
           | composite indexes).
           | 
           | It's great this is improving, but this is a fairly narrow
           | improvement. Personally, the multi-column indexes I use would
           | not be improved by this change since column 'a' does not
           | store a "reasonably small" distribution of values.
        
       | rtuin wrote:
       | It's one of those things where you don't know how it applies
       | to your Postgres DB until you run into a situation where you
       | do.
       | 
       | The author explains this very well, it's a good read! I
       | learned about this padding a little over a year ago, while I
       | was designing a data-intensive application with a colleague. I
       | was
       | skeptical about the advantage at first, but for our specific
       | design, where we have 100 to 480+ columns in one table it makes a
       | huge difference on table store size. Not so much on the indexes,
       | though.
        
         | sgarland wrote:
         | > where we have 100 to 480+ columns in one table
         | 
         | I'm sorry, what? Why?
        
           | rtuin wrote:
           | Glad you asked!
           | 
           | This system contains measurements and state of physical
           | devices (time series). It's designed for both heavy write and
           | read, with slight emphasis on write. Each table is one type
           | of device and contains 1 to 5 different measurements/states.
           | But here's the trick: because data is queried with a
           | minimum bucket size of 15 minutes, I figured we could just
           | create a column for each measurement + quarter of the day
           | (i.e. measure0000, measure0015), so that's 100 columns per
           | measurement (96 quarters + 4 for DST), include the date in
           | the key, et voila: excellent write performance (because
           | it's mainly UPDATE queries) and good read performance.
           | 
           | Okay, the queries to make sense of the data aren't pretty,
           | but can be generated.
           | 
           | I find it really cool how effective this is for time-series
           | data without Postgres extensions (we're on RDS).
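           | 
           | A sketch of the layout described (names and the float4
           | measurement type are assumptions on my part):

```sql
CREATE TABLE device_temperature (
    device_id   int8,
    day         date,
    measure0000 float4,
    measure0015 float4,
    measure0030 float4,
    -- ... one column per 15-minute bucket: 96 quarters plus
    -- 4 DST spares, 100 in total ...
    PRIMARY KEY (device_id, day)
);

-- Each incoming reading touches exactly one cell of one row:
UPDATE device_temperature
SET measure0015 = 21.7
WHERE device_id = 42 AND day = '2024-10-06';
```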
        
             | napsterbr wrote:
             | One interesting thing your team may want to look into (if
             | you haven't already) is compression.
             | 
             | Of course there are a multitude of variables we don't
             | have access to from the outside, but Postgres only
             | compresses data that is TOASTed, and based on your
             | description of the table, the data is not being TOASTed
             | (and therefore not being compressed).
             | 
             | Instead, if you could somehow pack your timeseries entries
             | into an array, you would get the benefits of compression
             | automatically.
             | 
             | Given your write performance requirements, using an
             | array may be out of the question (and you may get too
             | much overhead from dead tuples) -- but who knows? It's
             | always a good idea to benchmark.
             | 
             | I actually considered mentioning this in the post but
             | figured it was too long already; it could be material
             | for a future one :)
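             | 
             | For illustration, a sketch of the array variant (names
             | are hypothetical); note that a tuple is only TOASTed --
             | and thus compressed -- once it exceeds roughly 2 kB, so
             | the packing granularity matters:

```sql
CREATE TABLE device_temperature_packed (
    device_id int8,
    day       date,
    readings  float8[],  -- one entry per 15-minute bucket
    PRIMARY KEY (device_id, day)
);

-- Updating one bucket rewrites the whole array (a new tuple
-- under MVCC), which is where the dead-tuple overhead would
-- come from:
UPDATE device_temperature_packed
SET readings[2] = 21.7
WHERE device_id = 42 AND day = '2024-10-06';
```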
        
             | aidos wrote:
             | The updates wouldn't be in place anyway because of the
             | Postgres mvcc approach, right?
             | 
             | I can see the advantage in terms of just needing a
             | single tuple for reads. So a timestamp + value model
             | would likely take twice as much heap space as your
             | approach?
             | 
             | Given that you're probably always just inserting new data
             | you could use a brin index to get fast reads on the date
             | ranges. Would be interesting to see it in action and play
             | around to see the tradeoffs. The model you've settled on
             | sounds like it would be a pain to query.
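             | 
             | The BRIN idea, sketched against a conventional
             | row-per-reading model (names are hypothetical):

```sql
CREATE TABLE readings (
    device_id int8,
    ts        timestamptz,
    value     float8
);

-- A BRIN index stores only a min/max of ts per block range, so
-- it stays tiny and works well when rows are appended in
-- roughly ts order.
CREATE INDEX readings_ts_brin ON readings USING brin (ts);

SELECT device_id, ts, value
FROM readings
WHERE ts >= '2024-10-06' AND ts < '2024-10-07';
```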
        
       | OliverJones wrote:
       | Good material!
       | 
       | It has to be said, the order of columns in correctly designed
       | multicolumn BTREE indexes is governed by the shape of the queries
       | the indexes support.
       | 
       | So don't arbitrarily reorder columns in your indexes to handle
       | alignment.
        
       | delduca wrote:
       | Could the Django ORM deal with this?
        
         | sgarland wrote:
         | It could, but it doesn't as far as I'm aware. You're
         | certainly able to override the classes yourself, though.
         | Make a lookup table of column type / length and then reorder
         | based on that.
        
         | jbkkd wrote:
         | It can't right now, but this is a doable extension
        
       | SoftTalker wrote:
       | This is the sort of thing that good DBAs used to know about and
       | manage, but nowadays that isn't a fashionable job.
        
       | didgetmaster wrote:
       | It's been a while since I dug into the bowels of PG, but the
       | overall row size with respect to page size used to be
       | important. If you had a table where every row took up 4097
       | bytes and the page size was 8K, then your disk footprint for
       | that table was doubled: only one row fit in a page, and the
       | other (nearly) half of each page was wasted.
       | 
       | If that is still true, then alignment issues could push you over
       | the edge if you are close to it.
        
       | gnfargbl wrote:
       | The ever-useful postgres_dba tool
       | (https://github.com/NikolayS/postgres_dba) can help with this.
       | Install it, start psql, run :dba and choose option p1 to see how
       | much table space you would save by reordering your columns.
        
       | tiffanyh wrote:
       | How much of this article is rooted in latency assumptions
       | from mechanical spinning-disk hard drives, vs SSD/NVMe?
        
       ___________________________________________________________________
       (page generated 2024-10-07 23:00 UTC)