[HN Gopher] An unexpected find that freed 20GB of unused index s...
___________________________________________________________________
An unexpected find that freed 20GB of unused index space (2021)
Author : skadamat
Score : 212 points
Date : 2023-08-28 14:42 UTC (8 hours ago)
(HTM) web link (hakibenita.com)
(TXT) w3m dump (hakibenita.com)
| voiper1 wrote:
| (2021) I thought the picture looked familiar.
| pletnes wrote:
  | Does this principle apply if you have one or a few values that
  | are very common? E.g. an integer column where 90% of the values
  | are 0?
| convivialdingo wrote:
  | An index is most efficient when the indexed values are mostly
  | unique. An index can still be very useful for grouping, but if
  | there are only a few distinct groups relative to the number of
  | rows, you will end up wasting a lot of space and cycles
  | searching through the index.
| lozenge wrote:
| Yes, Postgres keeps track of the most common values. If it
| knows 0 is common and "where val = 0" will keep 90% of rows it
| might choose to use a table scan instead of the index.
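  | You can see what the planner is working from in pg_stats (a
  | rough illustration; the table and column names are made up):
  |
  |       -- Per-column statistics the planner consults
  |       SELECT most_common_vals, most_common_freqs
  |       FROM pg_stats
  |       WHERE tablename = 'events' AND attname = 'val';
  |
  |       -- With ~90% zeroes, this will typically show a seq scan
  |       EXPLAIN SELECT * FROM events WHERE val = 0;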
| teddyh wrote:
| Couldn't you create separate indexes for the 0 and non-0 cases?
| recursive wrote:
  | If zeroes make up most of the values, it could be slower to
  | use the index when searching for zeroes. How so? Reading
  | columns that aren't included in the index requires following
  | a reference back to the table row. If 90% of the rows in your
  | query are zeroes, you'd be better off not using that index in
  | your query plan. A naive filter, possibly even after a full
  | table scan, is likely to be faster than going through the
  | index.
| teddyh wrote:
| Would an index of the non-0 rows still help?
| recursive wrote:
| Yes. If your WHERE clause is searching for only the ones
| for instance.
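  | A sketch of that kind of partial index (the table and column
  | names are made up):
  |
  |       -- Only index the rare, non-zero rows
  |       CREATE INDEX events_val_nonzero_idx
  |           ON events (val)
  |           WHERE val <> 0;
  |
  |       -- The planner can use it here, because val = 1 implies
  |       -- the index predicate val <> 0
  |       SELECT * FROM events WHERE val = 1;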
| ChristianGeek wrote:
| Only if you used null to represent the common value, which
| generally isn't a good idea.
| hinkley wrote:
| The first large scale project I worked on, my team couldn't
| figure out why operations had slowed down as the data set grew.
|
  | Indexes have O(log n) insertion time per record. If you had
  | 1,000 records in your test database, then by the time you
  | approach 65k records your per-insert time has gone up by about
  | 60% (2^10 vs 2^16 records). Success slows everything down, and
  | there are only so many server upgrades available.
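  | (Spelled out: log2(1,000) is about 10 and log2(65,536) is 16,
  | so the per-insert index work grows by roughly 16/10 = 1.6x.)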
|
| Add a couple new indexes for obscure features the business was
| looking for, and now you're up to double.
| winrid wrote:
  | Double a very small number. If that's seriously an issue, use a
  | faster disk. So many people try to run a DB on EBS with fewer
  | IOPS than my PC from 2015.
|
| I manage plenty of DBs with hundreds of millions of records and
| 40+ indexes per table/collection...
| vjerancrnjak wrote:
| Insertion should still be extremely fast for such a small
| index, right?
|
  | Doing a binary search within a b-tree page takes <100 cycles,
  | and a full b-tree traversal over 100M records should still be
  | measured in microseconds, if not hundreds of nanoseconds.
| hinkley wrote:
| This conversation is about unrepresentative sample data.
| vjerancrnjak wrote:
| Are you saying that the index is made on columns which only
| have few unique values so you need to scan everything all
| the time?
|
  | Given that you've included the log(n) claim, I thought you
  | were referring to a normally functioning btree.
|
| A situation where 65k rows end up slowing down things
| severely should have more to do with other database
| features (foreign keys and similar).
| lazide wrote:
  | 'It depends' - don't forget about locking overhead, needing
  | to flush multiple pages/leaves as thresholds get hit, disk I/O
  | wait times, etc.
|
| At the beginning it's rarely noticeable, but it does exist
| and under heavy load or with a lot of index writes needing to
| happen it can cause very noticeable overhead.
| vjerancrnjak wrote:
  | Yes, but those costs don't scale with the log(n) depth of
  | the index tree; they depend on something other than the
  | index data structure.
| efxhoy wrote:
  | I spent last week freeing up 200GB from our 600GB db with just
  | reindex and pg_repack. The worst offender was a table with 17GB
  | of data carrying 142GB of indexes; a reindex took the indexes
  | down to 21GB. The indexing on that table is crazy, with multiple
  | indexes over different sets of columns.
|
  | A contributing factor to the huge indexes, I think, was the
  | distribution of the data. The table has had inserts, updates
  | and deletes
| continuously since 2015. Data is more likely to be deleted the
| older it gets so there's more data from recent years, but about
| 0.1% of the data is still from 2015. I think maybe this skewed
| distribution with a very long tail meant vacuum had a harder time
| dealing with that index bloat.
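  | For anyone wanting to try the same, roughly what's involved
  | (the table and database names are placeholders):
  |
  |       -- Rebuild all indexes on the table without blocking
  |       -- writes (REINDEX ... CONCURRENTLY needs PostgreSQL 12+)
  |       REINDEX TABLE CONCURRENTLY big_table;
  |
  |       -- pg_repack, run from the shell, rewrites the table
  |       -- itself to reclaim heap bloat online:
  |       --   pg_repack --table=big_table mydb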
| dang wrote:
| Discussed at the time:
|
| _An unexpected find that freed 20GB of unused index space in
| PostgreSQL_ - https://news.ycombinator.com/item?id=25988871 - Feb
| 2021 (78 comments)
| halayli wrote:
  | I highly recommend pganalyze.com for discovering unused indexes,
  | optimization opportunities, and high-latency queries.
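  | If you'd rather check by hand first, the stats views get you a
  | long way (a rough query; note that indexes backing primary keys
  | and unique constraints will show up too and shouldn't be
  | dropped):
  |
  |       -- Indexes never scanned since the stats were last reset
  |       SELECT schemaname, relname, indexrelname,
  |              pg_size_pretty(pg_relation_size(indexrelid)) AS size
  |       FROM pg_stat_user_indexes
  |       WHERE idx_scan = 0
  |       ORDER BY pg_relation_size(indexrelid) DESC;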
| aftbit wrote:
| Making indexes smaller is nice even when you have a ton of
| storage, as then more can fit into the hot set. However as
| someone who runs TB of databases, "just provision more storage"
| is always a valid option. Especially if you are outside the
| cloud. If you have your own hardware, new enterprise NVMe SSDs
| are about $80/TB, and DDR4 RAM is around $1.20/GB. Four hours of
| engineering time (very roughly $1000) buys either 800 GB of RAM
| or 12 TB of storage.
| paulddraper wrote:
  | Neither one will save you from slower writes due to unnecessary
  | indices.
| dchftcs wrote:
  | Depending on the scale and complexity, if you make no effort to
  | control resource usage, costs grow exponentially, sometimes even
  | without growth in the business, because the requirements just
  | become more complex. A given optimization may save you 1TB today
  | but end up saving 2TB some years down the road; with just a few
  | of these decisions you could be looking at a difference of an
  | order of magnitude or more, especially at larger scale. Overall
  | there's always a balance to be struck.
| rockostrich wrote:
  | "Just provision more storage" is usually the answer on the cloud
  | as well, unless someone has a concrete idea of how to use less
  | storage. The math is a bit trickier, though, since it's a
  | monthly price rather than a one-time cost (although if your
  | database is growing at a constant rate, that one-time
  | provisioning is really just a monthly price on-prem as well).
| crabbone wrote:
| This is not at all a good way to compare.
|
| Engineering hours: there's a good chance you pay for that once,
| and that solves the problem.
|
  | 10 SSDs: these require rack space, electricity, PCIe slots,
  | timely replacement, management software... and most of these
  | expenses are recurring. If done once, the existing
  | infrastructure can sometimes absorb them (e.g. you might already
  | have empty space in a rack, or spare PCIe slots), but that only
  | works at small scale.
|
| Another aspect of this trade-off: systems inevitably lose
| performance per unit of equipment as they grow due to
| management expenses and increased latency. So, if you keep
| solving problems by growing the system, overall, the system
| will become more and more "sluggish" until it becomes
| unserviceable.
|
  | On the other hand, solutions which minimize the resources
  | necessary to accomplish a task increase overall performance per
  | unit. In other words, they create a higher-quality system, which
  | is an asset in its own right.
| krembo wrote:
  | Even on RDS, the price of 20GB is negligible for most companies
  | and isn't worth the added effort and engineer salaries spent
  | looking into it. From a DBA perspective, though, that's a cool
  | find. Thanks for sharing.
| pgwhalen wrote:
| I'm not super in the know about it, but I believe the _total_
| cost of SAN storage at my employer (on prem) is ~10X that.
| codetrotter wrote:
| > Four hours of engineering time (very roughly $1000)
|
| Why $1000 for four hours?
|
| Does it cost your company $150,000 per month per engineer?
| vikingerik wrote:
| Well, there's a multiplier between the nominal capacity and the
| capacity you actually need to purchase for your whole system.
  | You're not buying just that 1 TB. You probably want at least
  | two live failover servers, maybe more. Plus however many
| many layers of backups and disaster recovery; a year of weekly
| backups makes your 1 TB into 50, even if it's offline storage.
|
| My own company struggles with that - throwing more storage on
| the live db is easy, so we've kept doing that for years, but
| pushing around multi-terabyte backups is getting cumbersome and
| we're going to have to slim down the data in prod even at the
| cost of engineer effort.
| danielheath wrote:
  | Weekly full backups are quite large vs. e.g. ZFS snapshots,
  | right?
| Dylan16807 wrote:
| Are you putting indexes into your backups?
| krab wrote:
  | pg_basebackup does, like any physical replication of
  | Postgres.
| nine_k wrote:
| Backing up indexes makes sense if you care about restoring
| the DB _quickly_.
| [deleted]
| _a_a_a_ wrote:
| > new enterprise NVMe SSDs are about $80/TB, and DDR4 RAM is
| around $1.20/GB
|
| sounds incredibly cheap
| telios wrote:
  | The post mentions B-tree de-duplication, which is present in
  | PostgreSQL 13 but not in 12, the version they're using; at the
  | same time, they note that the vast majority of values in some
  | of their foreign key indexes are NULL.
|
| I have to wonder if B-tree de-duplication would have helped with
| that particular case? The PostgreSQL 13 documentation seems to
| imply it, as far as I can tell[0] (under 63.4.2):
|
| > B-Tree deduplication is just as effective with "duplicates"
| that contain a NULL value, even though NULL values are never
| equal to each other according to the = member of any B-Tree
| operator class.
|
  | I don't think it would be as effective as the partial index
  | applied in the post; I'm just curious.
|
| [0]: https://www.postgresql.org/docs/13/btree-implementation.html
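  | (One way to check on your own data would be to compare an
  | index's size before and after a REINDEX on 13+, e.g.:
  |
  |       SELECT pg_size_pretty(pg_relation_size('my_fk_index'));
  |
  | with 'my_fk_index' standing in for one of those mostly-NULL
  | foreign key indexes. As far as I know, indexes carried over by
  | pg_upgrade only benefit from deduplication once they've been
  | reindexed.)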
| luhn wrote:
  | In the previous discussion of this article, an HN user did the
  | math: pg12 uses 16 bytes per NULL and pg13 uses 6.32 bytes per
  | NULL (https://news.ycombinator.com/item?id=25989467). So
  | definitely some pretty significant savings there.
| zzzeek wrote:
  | Wow, they are not kidding, this is truly "we saved 20GB with one
  | weird trick". We get a lot of requests from users wanting to use
  | these unusual PostgreSQL index forms that are largely unheard of
  | in old-school Oracle / SQL Server shops; I didn't realize
  | Postgres was indexing NULL values either.
| aidos wrote:
| The fact that this is news to you makes me feel better about
| not considering it myself. I suspect we have a few large
| indexes that could be given the same treatment.
| hn_throwaway_99 wrote:
  | Wow, thanks for this great writeup! I thought it was really
  | useful, not just for the partial-index "find" that is the focus
  | of the post (though that was a great thing to be aware of), but
  | also for the general overview of good techniques and things to
  | watch for in Postgres if you're worried about using space
  | inefficiently. Saving this one for reference!
|
  | One minor "word to the wise", because this was a great post but
  | one that also has the potential to be misused: if you work at a
  | startup or early stage company, it is nearly _always_ the better
  | decision to throw more disk space at a storage problem like this
  | than worry about optimizing for size. Developers are expensive,
  | disk space is cheap.
| nico wrote:
| > if you work at a startup or early stage company, it is nearly
| always the better decision to throw more disk space at a
| storage problem like this than worry about optimizing for size.
| Developers are expensive, disk space is cheap
|
| This is great advice. In general at the beginning it is better
| to keep things as simple as possible.
|
  | At one fast-growing startup I worked at, one of the founders
  | insisted we keep upgrading just one machine (plus redundancy
  | and backups). It was a great strategy! Kept the architecture
| super simple, easy to manage, easy to debug and recover. For
| the first 5 years of the company, the whole thing ran on one
| server, while growing exponentially and serving millions of
| users globally.
|
| After seeing that, it's clear to me you should only upgrade
| when needed, and in the simplest, most straightforward way
| possible.
| jtc331 wrote:
  | There's a lot of emphasis here on just adding storage, but that
  | misses the fact that unnecessary indexing also affects write and
  | read performance (potentially quite significantly).
|
  | Using a partial index where it obviously matches the use case
  | (like when a majority of values are null) is just _correct
  | modeling_ and should not be considered a premature optimization
  | or a waste of developer time.
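  | A minimal sketch of what that looks like in Postgres (the table
  | and column names here are made up):
  |
  |       -- Index only the rows that actually have a value; the
  |       -- ~90% NULL rows take up no space in the index at all
  |       CREATE INDEX orders_cancellation_id_idx
  |           ON orders (cancellation_id)
  |           WHERE cancellation_id IS NOT NULL;
  |
  | Queries like "WHERE cancellation_id = $1" can still use it,
  | since a non-NULL equality match implies the index predicate.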
| hyperman1 wrote:
| In a similar vein, I got good value from these scripts:
|
| https://github.com/NikolayS/postgres_dba
|
  | I managed to free 10%, i.e. about 100 GB, of storage by
  | reordering the columns of a big table.
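  | The saving comes from alignment padding: Postgres aligns 8-byte
  | columns (bigint, timestamptz) on 8-byte boundaries, so
  | interleaving them with 1- and 4-byte columns wastes pad bytes
  | in every row. A toy illustration (made-up tables):
  |
  |       -- 32 bytes of data per row: bool(1) + 7 pad + bigint(8)
  |       --                           + int(4) + 4 pad + ts(8)
  |       CREATE TABLE t_padded (flag boolean, id bigint,
  |                              qty integer, created timestamptz);
  |
  |       -- 24 bytes per row: widest columns first, narrow last
  |       CREATE TABLE t_packed (id bigint, created timestamptz,
  |                              qty integer, flag boolean);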
___________________________________________________________________
(page generated 2023-08-28 23:00 UTC)