[HN Gopher] Optimizing Postgres table layout for maximum efficiency
___________________________________________________________________
Optimizing Postgres table layout for maximum efficiency
Author : napsterbr
Score : 65 points
Date : 2024-10-06 15:50 UTC (1 days ago)
(HTM) web link (r.ena.to)
(TXT) w3m dump (r.ena.to)
| remus wrote:
| Very interesting, but I think the author overstates the
| importance of alignment a little. Unless your data/indexes are
| already of a challenging size (or you expect them to be
| imminently) for your hardware then fiddling with byte alignment
| details feels like a fairly premature optimisation.
|
| Disk is cheap, memory is plentiful, your time is expensive etc.
| koolba wrote:
| > Disk is cheap, memory is plentiful, your time is expensive
| etc.
|
| Taking the time to know the in memory sizing for your data
| types is well worth it. Taking the time to think about the
| types to use and sorting them by size is also minimal and well
| worth it.
|
| It may make sense for the system to do it automatically for
| newly created tables. But maybe not as it's possible you'd want
| the data layout to match some existing structure.
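|
| The padding effect is easy to sketch. Below is a toy Python model
| of the alignment rules for a few fixed-width types (the
| size/alignment values match the Postgres docs; the row_size helper
| is hypothetical, not the server's algorithm, and it ignores the
| tuple header):

```python
# Toy model of alignment padding for a few fixed-width Postgres
# types. (size_bytes, alignment) pairs follow the documentation;
# row_size is a hypothetical helper, not server code, and it
# ignores the per-tuple header.

TYPES = {
    "int8": (8, 8),  # bigint, timestamp
    "int4": (4, 4),  # integer
    "int2": (2, 2),  # smallint
    "bool": (1, 1),
}

def row_size(columns):
    """Sum column sizes, padding each column up to its alignment."""
    offset = 0
    for type_name in columns:
        size, align = TYPES[type_name]
        offset += (-offset) % align  # padding bytes before this column
        offset += size
    return offset

# Interleaving small and large types wastes 11 bytes on padding:
print(row_size(["bool", "int8", "int4", "int8", "int2"]))  # 34
# Sorting largest-to-smallest eliminates it:
print(row_size(["int8", "int8", "int4", "int2", "bool"]))  # 23
```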
| egnehots wrote:
| As mentioned in the article, it's a good idea to consider this
| when creating a new table, since it's essentially a free
| optimization. However, it's probably not worth the hassle of
| reordering a production table for that.
|
| > Disk is cheap, memory is plentiful, but your time is
| expensive, etc.
|
| One thing to keep in mind, though, is that while you often have
| plenty of disk space, RAM is still relatively expensive. It's
| also divided into many smaller buffers, such as working memory
| and shared buffers, which are not that large. These
| optimizations help to fit more data into cache.
|
| However, what the article said about alignment being important
| for indexes is somewhat misleading. Reordering an index field
| is not the same as reordering columns in a table. Besides having
| to rewrite queries, it also changes the access pattern and the
| time required to access the data, which is often much more
| significant than the space saved. Indexes are, by nature, a
| tradeoff where you give up space to gain time, so this mindset
| doesn't really apply there.
| napsterbr wrote:
| Hey, author here.
|
| > Indexes are, by nature, a tradeoff where you give up space
| to gain time, so this mindset doesn't really apply there.
|
| I agree that (re)aligning indexes is a different beast
| entirely, but (as mentioned in my recommendation) ideally the
| developer should keep this in mind when creating the index
| initially.
|
| Factors like cardinality and even readability should take
| precedence over perfect alignment, but all else being equal,
| aligning your indexes from the very moment they are
| introduced in the codebase is the ideal scenario IMO.
| sgarland wrote:
| > Disk is cheap, memory is plentiful, your time is expensive
| etc.
|
| Spend 30 minutes one day playing around with Postgres, trying
| different column combinations out. Boom, you now know how best
| to order columns. This doesn't seem like a big ask.
|
| The flip side is that changing data at scale is HARD, so if you
| put things like this off, when you do finally need to squeeze
| bytes, it's painful.
|
| Also, memory is absolutely not plentiful. That's generally the
| biggest bottleneck (or rather, the lack of it then makes IO the
| bottleneck) for an RDBMS, assuming you have connection pooling
| and aren't saturating the CPU with overhead.
| koolba wrote:
| The content itself is fine but the unnecessary image above the
| label " _You have the right to remain aligned_ " with the usual
| AI-generated garbage text cheapens the entire article.
| loloquwowndueo wrote:
| Are you kidding? That image is hilarious.
|
| (Articles that intersperse annoying memes every 3 paragraphs
| with obnoxious gif animations are much much worse).
| wccrawford wrote:
| Interesting. But if this is important, why doesn't Postgres do it
| invisibly, automatically? Surely there's a most-correct sequence,
| with the longest values first, and the shortest values last, and
| then (probably) the variable-length values?
|
| It could then happily report back the data in the order that the
| DB Admin/developer expects.
| j45 wrote:
| Updating column order in MySQL is pretty trivial, and I've
| wondered why it's not the same in Postgres, even at the GUI
| level.
|
| Ordering columns for performance might be a different order of
| recommendation.
| jeltz wrote:
| Because nobody has implemented it yet. A patch would be very
| welcome by many but nobody has written one.
| johnthuss wrote:
| This is a super useful thing to know and I'm thankful for this
| article highlighting this aspect of Postgres.
|
| But I would disagree with the takeaway to focus on optimizing
| your indexes and not your tables. The reason is that the order of
| columns in a multi-column index is highly meaningful and
| intentional in order to support match on a range of values for
| the last column in the index. The way databases work you can only
| utilize a multi-column index on (customer_id int4, date
| timestamp) if you have an equality match on customer_id, like "WHERE
| customer_id = 1 AND date BETWEEN '2024-01-01' AND '2025-01-01'". If
| you reorder these columns in the index to put the larger date
| column first, then, sure, you save space in the index, but you
| also make it worthless - it will never be used by the query
| above. As such, optimizing a multi-column index is only useful
| when all the columns are queried for equality rather than a
| range.
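|
| A toy model makes the leading-column effect concrete: a composite
| B-tree index behaves like a list sorted by (customer_id, date), so
| equality on the leading column turns a date range into one
| contiguous slice, while putting date first would scatter one
| customer's rows across the whole index (illustrative Python, not
| real B-tree code):

```python
# Toy model of a composite index as a sorted list of
# (customer_id, date) tuples -- illustrative only, not real
# B-tree code.
import bisect

# 3 customers x 10 days, kept in index order
index = sorted((cust, day) for cust in range(1, 4) for day in range(10))

def range_scan(cust, day_lo, day_hi):
    """Contiguous slice for customer = cust AND day BETWEEN lo AND hi."""
    lo = bisect.bisect_left(index, (cust, day_lo))
    hi = bisect.bisect_right(index, (cust, day_hi))
    return index[lo:hi]

print(range_scan(2, 3, 5))  # [(2, 3), (2, 4), (2, 5)]
```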
|
| In contrast, when you are creating a NEW table you might not
| think hard about the order of the columns in the table, and
| especially not about the data-sizes of each column and their
| alignment. But doing so at the time you create the table can be
| tremendously beneficial if it is going to be very large. It is
| important to note that you not only save space on-disk, but in
| precious RAM when the tuples are loaded.
| sgarland wrote:
| This (index ordering resulting in useless indices) is not true,
| at least not in newer versions of Postgres (I just tried with
| 15).
|
| While the query will take much longer (for me, it was about 47
| msec vs 0.6 msec for 2,000,000 rows), it still uses the index.
|
| Similarly, while normally you wouldn't expect a query using
| predicate Y to use an index defined on columns (X, Y, Z) if
| predicate X is also not referenced, Postgres may choose to do
| so, depending on table stats, and the relative difference
| between random_page_cost and seq_page_cost. I've seen it happen
| before.
| ghusbands wrote:
| > This (index ordering resulting in useless indices) is not
| true, at least not in newer versions of Postgres (I just
| tried with 15).
|
| > While the query will take much longer (for me, it was about
| 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the
| index.
|
| I'd argue that something being 78x slower can make it pretty
| useless, but it is indeed at least used, in some cases.
| jeltz wrote:
| This is a limitation which is currently being worked on. The
| order will still matter of course but it will allow PostgreSQL
| to make some use of indexes even when the order of columns does
| not match.
|
| https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZd...
| johnthuss wrote:
| >>> This is feasible in cases where the total number of
| distinct values in the column 'a' is reasonably small (think
| tens or hundreds, perhaps even thousands for very large
| composite indexes).
|
| It's great this is improving, but this is a fairly narrow
| improvement. Personally, the multi-column indexes I use would
| not be improved by this change since column 'a' does not
| store a "reasonably small" distribution of values.
| rtuin wrote:
| It's one of those things where you don't know how it applies to
| your Postgres DB until you run into a situation where you do.
|
| The author explains this very well, it's a good read! I've
| learned about this padding a little over a year ago, while I was
| designing a data intensive application with a colleague. I was
| skeptical about the advantage at first, but for our specific
| design, where we have 100 to 480+ columns in one table it makes a
| huge difference on table store size. Not so much on the indexes,
| though.
| sgarland wrote:
| > where we have 100 to 480+ columns in one table
|
| I'm sorry, what? Why?
| rtuin wrote:
| Glad you asked!
|
| This system contains measurements and state of physical
| devices (time series). It's designed for both heavy write and
| read, with slight emphasis on write. Each table is one type
| of device and contains 1 to 5 different measurements/states.
| But here's the trick: because data is queried with minimum
| bucket size of 15 minutes, I figured we could just create a
| column for each measurement + quarter of the day (i.e.
| measure0000, measure0015), so that's 100 columns for each
| measurement (96 quarters + 4 for DST), include the date in the
| key, et voila: excellent write performance (because it's
| mainly UPDATE queries) and good read performance.
|
| Okay, the queries to make sense of the data aren't pretty,
| but can be generated.
|
| I find it really cool how effective this is for time-series
| data without Postgres extensions (we're on RDS).
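|
| The column scheme described above can be sketched like this (the
| measureHHMM naming pattern follows the comment's measure0000 /
| measure0015 examples; the helpers are hypothetical, and the 4
| extra DST columns are left out of the toy version):

```python
# Sketch of the wide-table scheme: one column per measurement per
# 15-minute bucket. Naming pattern (measure0000, measure0015, ...)
# is taken from the comment; helpers are hypothetical.

def quarter_columns(prefix="measure"):
    """One column name per quarter-hour of the day."""
    return [
        f"{prefix}{hour:02d}{minute:02d}"
        for hour in range(24)
        for minute in (0, 15, 30, 45)
    ]

def bucket_column(hour, minute, prefix="measure"):
    """Map a time of day to the column its measurement lands in."""
    return f"{prefix}{hour:02d}{(minute // 15) * 15:02d}"

cols = quarter_columns()
print(len(cols))              # 96 regular buckets per day
print(bucket_column(13, 37))  # measure1330 -> an UPDATE touches one column
```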
| napsterbr wrote:
| One interesting thing your team may want to look into (if
| you haven't already) is compression.
|
| Of course there are a multitude of variables we don't have
| access from the outside, but Postgres only compresses data
| that is TOASTed, and based on your description of the
| table, the data is not being TOASTed (and therefore not
| being compressed).
|
| Instead, if you could somehow pack your timeseries entries
| into an array, you would get the benefits of compression
| automatically.
|
| Given your write performance requirements, using an array
| may be out-of-question (and you may get too much overhead
| from dead tuples) -- but who knows? Always a good idea to
| benchmark.
|
| I actually considered mentioning this at the post but
| figured it was too long already and could be the material
| for a future one :)
| aidos wrote:
| The updates wouldn't be in place anyway because of the
| Postgres mvcc approach, right?
|
| I can see the advantage in terms of just needing a single
| tuple for a reads. So a timestamp + value model would
| likely take twice as much heap space as your approach?
|
| Given that you're probably always just inserting new data
| you could use a brin index to get fast reads on the date
| ranges. Would be interesting to see it in action and play
| around to see the tradeoffs. The model you've settled on
| sounds like it would be a pain to query.
| OliverJones wrote:
| Good material!
|
| It has to be said, the order of columns in correctly designed
| multicolumn BTREE indexes is governed by the shape of the queries
| the indexes support.
|
| So don't arbitrarily reorder columns in your indexes to handle
| alignment.
| delduca wrote:
| Could the Django ORM deal with this?
| sgarland wrote:
| Yes, but it doesn't, as far as I'm aware. You can certainly
| override the classes yourself, though. Make a lookup table of
| column type / length and then reorder based on that.
| jbkkd wrote:
| It can't right now, but this is a doable extension.
| SoftTalker wrote:
| This is the sort of thing that good DBAs used to know about and
| manage, but nowadays that isn't a fashionable job.
| didgetmaster wrote:
| It's been a while since I dug into the bowels of PG, but the
| overall row size with respect to page size used to be important.
| If you had a table where every row took up 4097 bytes and the
| page size was 8K, then your disk footprint for that table was
| doubled. Only one row fit in a page, and the other (nearly) half
| of each page was wasted.
|
| If that is still true, then alignment issues could push you over
| the edge if you are close to it.
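|
| The arithmetic above can be checked directly (toy math that
| ignores page and tuple header overhead):

```python
# Toy math for the page-fill effect: heap rows are packed whole
# into 8 KB pages, so a row just over half a page leaves (nearly)
# half of every page empty. Ignores header overhead.

PAGE_SIZE = 8192

def pages_needed(row_size, n_rows):
    rows_per_page = PAGE_SIZE // row_size  # rows never span pages
    return -(-n_rows // rows_per_page)     # ceiling division

print(pages_needed(4096, 1000))  # 500: two rows fit per page
print(pages_needed(4097, 1000))  # 1000: one extra byte doubles the footprint
```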
| gnfargbl wrote:
| The ever-useful postgres_dba tool
| (https://github.com/NikolayS/postgres_dba) can help with this.
| Install it, start psql, run :dba and choose option p1 to see how
| much table space you would save by reordering your columns.
| tiffanyh wrote:
| How much of this article is rooted in hard drives with physically
| mechanical spinning disk latency assumptions, vs ssd/nvme?
___________________________________________________________________
(page generated 2024-10-07 23:00 UTC)