[HN Gopher] A read query can write to disk: a Postgres story
___________________________________________________________________
A read query can write to disk: a Postgres story
Author : yakkomajuri
Score : 91 points
Date : 2021-10-31 12:08 UTC (1 day ago)
(HTM) web link (www.mutuallyhuman.com)
(TXT) w3m dump (www.mutuallyhuman.com)
| krembo wrote:
| I do professional services work optimizing on-prem and cloud
| Postgres databases. It always surprises me how much companies
| tend to take cloud databases as automated and rely on them to
| do auto-tuning for everything.
|
| Well, they don't, and as soon as the product goes live and the
| amount of data grows even slightly, performance degrades
| dramatically.
|
| * Never count on cloud databases to replace a DBA's work.
|
| * Don't take ORM optimizations for granted. They all suck at one
| point or another.
|
| * If you can't afford an FTE, bring in outside professionals
| for short gigs from time to time to fix and boost performance.
| spicybright wrote:
| > Don't take ORM optimizations for granted.
|
| I've been bitten many a time by this.
|
| I'd even go as far as to say that if you don't know how to
| write SQL that does the same thing your ORM framework is
| doing, you're setting yourself up for a lot of pain later when
| you do something non-trivial.
| beaconstudios wrote:
| this is why nowadays I prefer to use query builders (like
| knex - knexjs.org) instead of ORMs. You can optimise your
| queries without having huge blobs of raw SQL in the middle of
| model code, and you can build models (or service objects, as
| I prefer) manually that are completely transparent.
| berkes wrote:
| Worse is that ORMs tend to creep all through your code,
| introducing all sorts of coupling and responsibility issues
| down the road and, most of all, leading to unexpected issues.
|
| Sure, it's nice being able to say `ThisPost.authors.sort(DESC,
| 'postCount').where('active = true').select('name',
| 'avatarUrl', 'postCount')` anywhere in your codebase, but you
| have now spread knowledge of how to sort and filter, what
| attributes there are, what their relations are and so on,
| throughout your codebase.
|
| Or, more on-topic: you have now spread knowledge about what
| can be fetched efficiently, and what will cause performance
| pain, throughout your code. Suddenly some view template, or a
| completely unrelated validation, can cause your database to go
| down.
|
| Rails' ActiveRecord is especially cumbersome in this: it
| offers "sharp knives"[0] but hardly any tooling to protect
| yourself (or that new intern) from leaving all those sharp
| knives lying around. A "has_and_belongs_to_many :users" and a
| ".where(users: { status: 'active', comment_count: '>10' })"
| are just two lines. But they will haunt you forever and will
| probably bring your database down under any real load.
|
| [0] https://rubyonrails.org/doctrine/#provide-sharp-knives
| wpietri wrote:
| Yeah, for me this is a classic "easy vs simple" example.
|
| Cloud databases are a very easy option. But Postgres is 1.3
| million lines of code, and on top of that you add the
| complexity of the cloud vendor's environment, choices, and
| custom code. It may be easy, but it's definitely not simple.
| selfhoster11 wrote:
| > It always surprises me how much companies tend to take cloud
| databases as automated and rely on them to do auto-tuning for
| everything.
|
| Automation and auto-tuning is kind of the whole value
| proposition of managed services, isn't it?
| paulryanrogers wrote:
| IME, there is a spectrum from barely hosted (rented bare
| metal) to white-glove service (Heroku)
| latch wrote:
| This is pretty basic PostgreSQL tuning (1). An oft-cited problem
| with PostgreSQL is that it uses a process-per-connection model
| (a session, in PG lingo, I think). `work_mem` is the amount of
| memory a connection has to, you know, do work: sorting,
| DISTINCT, some joins.
|
| Number of Connections * work_mem is usually going to eat up the
| biggest chunk of your PostgreSQL's memory. AFAIK, the
| configuration is extremely coarse. There's no way for example to
| say: I want all the connections from username "myapp" to have
| 10MB and those from user "reporting" to have 100MB. And it can't
| be adjusted on the fly, per query.
|
| Being able to set aside 200GB of memory to be used, as needed, by
| all connections (maybe with _some_ limits to prevent an
| accident), would solve a lot of problems (and introduce a bunch
| of them, I know).
|
| Since they're on RDS, I can't help but point out that I've
| seen DB queries on bare metal run orders of magnitude faster:
| 10 minutes down to 1 second, that type of thing. I couldn't
| help but wonder if they'd even notice the disk-based sorting
| on a proper (yes, I said it) setup.
|
| (1)
| https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Serv...
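|
| For a rough sense of the ceiling, the two knobs involved are
| easy to check (a sketch; the numbers are made-up examples):
|
|         SHOW max_connections;  -- say 200
|         SHOW work_mem;         -- say 64MB
|         -- naive worst case: 200 x 64MB = 12.5GB for sort/hash
|         -- areas alone, on top of shared_buffers and the rest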
| mikeklaas wrote:
| > And it can't be adjusted on the fly, per query.
|
| Sure it can:
|
|         SET LOCAL work_mem = '256MB';
|         SELECT * FROM ...
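|
| (Note that SET LOCAL only takes effect inside a transaction
| block, so in practice you wrap it; the setting reverts when
| the transaction ends:)
|
|         BEGIN;
|         SET LOCAL work_mem = '256MB';
|         SELECT * FROM ...;
|         COMMIT;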
| ianleeclark wrote:
| > `work_mem` is the amount of memory a connection has to, you
| know, do work.
|
| It's the amount of memory a connection has to do work per
| sorting/hashing operation.
| https://www.postgresql.org/docs/12/runtime-config-resource.h...
|
| > There's no way for example to say: I want all the connections
| from username "myapp" to have 10MB and those from user
| "reporting" to have 100MB.
|
| Yes, you can, with:
|
|         ALTER ROLE myapp SET work_mem TO '10MB';
|
| and
|
|         ALTER ROLE reporting SET work_mem TO '100MB';
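|
| Those per-role defaults are easy to confirm; they apply to new
| sessions of those roles:
|
|         SELECT rolname, rolconfig FROM pg_roles
|         WHERE rolname IN ('myapp', 'reporting');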
| jhgb wrote:
| > Yes, you can with [...a helpful example...]
|
| Some days it's nice to have a lucky 10k problem
| (https://xkcd.com/1053) in place of a c10k problem
| (http://www.kegel.com/c10k.html). Thanks.
| WJW wrote:
| I worked on a MySQL-on-RDS setup once where the company was
| splurging (and being charged through the nose, ofc) on a
| backing EBS volume with 20k IOPS. Meanwhile nobody had bothered
| to update the default settings, which limited all background
| work to use no more than 100 IOPS. Needless to say, this DB had
| severe performance problems.
|
| Managed databases in the cloud are a bit like ORMs: they
| significantly reduce the amount of knowledge needed 99% of the
| time. Sadly, when the other 1% hits, it also means that nobody
| on the team has built up the DBA skills required for the big
| hairy problem. I see this pattern repeated all the time when
| consulting for startups that have just gotten their first real
| traction and now find the magic database box no longer working
| as before.
| idiocrat wrote:
| Call it driving with the handbrake on.
|
| You have bought your 1000 hp car and have been driving it
| as-is since you left the car dealer's parking lot. Nobody told
| you to release the handbrake.
| WJW wrote:
| Quite so. Of course, in most countries all drivers will
| have had mandatory driving lessons before they got their
| license and the instructor will have told them about the
| handbrake. No such license requirement exists for databases
| and it shows.
| daemoncoder wrote:
| I like to say they should remove the block from under the
| accelerator pedal.
| throwaway20371 wrote:
| > Sadly, when the 1% of the time hits this also means that
| nobody on the team has built up the DBA skills required for
| the big hairy problem. I see this pattern repeated all the
| time when consulting for startups that just had their first
| real traction and now the magic database box is no longer
| working as before.
|
| I call this the Kubernetes effect. 99% of the time, people
| think it's easy. Then the 1% hits and they have to completely
| rebuild it from the ground up. For us it was cluster
| certificate expiration and etcd corruption, and some other
| problems we couldn't ever pin down.
|
| The other effect (which I'm sure has a name) is when they
| depend on it for a long time, but then a problem starts to
| occur, and because they can't figure out a way to solve it,
| they move to a completely different tech ("We had problems
| with updating our shards so we moved to a different
| database") and encounter a whole new problem and repeat the
| cycle.
| mcherm wrote:
| I think DB tuning seems like a perfect business opportunity
| for specialist consulting. Doing a really good job requires
| deep expertise but then it only needs to be repeated
| occasionally (if the operational profile or the data size
| changes significantly).
|
| But I don't see this model much. I mostly hear of companies
| struggling to develop internal expertise or going without.
|
| Why?
| BrentOzar wrote:
| > But I don't see this model much. Why?
|
| It does exist (I'm one of them), and it's much more common
| on expensive databases like Microsoft SQL Server and
| Oracle. With free databases, it's relatively inexpensive to
| throw more hardware at the problem. When licensing costs
| $2,000 USD per CPU core or higher, there's much more
| incentive for companies to bring an expert in quickly to
| get hardware costs under control.
| redis_mlc wrote:
| Yup, SQL Server performance and licensing (!) consultants
| came out of retirement after MS doubled the licensing
| costs in 2012.
|
| Managers don't like paying for optimization consultants,
| but it's the easiest way to both improve current
| performance and scale a master.
|
| Note that Oracle positions MySQL as a replacement for SQL
| Server, as "moderate-performance databases." :)
|
| Source: DBA.
|
| https://metrixdata360.com/pricing-series/1500-price-
| increase...
| Intermernet wrote:
| I did this for years. I was a server optimisation
| consultant. I'd always advise my clients that increasing
| the specs of the hardware would probably be cheaper in the
| short term than hiring me to go through the various
| configurations with a fine-toothed comb. I also promised a
| fixed price report on what needed to be done, as well as a
| fixed price quote on doing the actual work. I found that
| this level of transparency and honesty never failed, and
| the work was always approved. I'd also offer training for
| the internal team.
|
| I've pretty much completely changed careers now, but I
| found that this model did work well. It involved a lot of
| constant research and occasionally a huge amount of stress,
| but I loved it!
| apavlo wrote:
| > I think DB tuning seems like a perfect business
| opportunity
|
| I got you fam: https://ottertune.com/
| ComodoHacker wrote:
| An Oracle story. Read queries can also write to disk due to
| the way Oracle handles consistency. Information on which
| transactions last changed particular rows in a table gets
| stored with the rows. A read query then looks up those
| transactions to see whether they were committed or not. If
| not, the read query skips the rows they touched (and looks up
| their previous versions). If they were, that transaction info
| gets cleared, so subsequent queries don't have to look it up
| again.
|
| This is a simplified explanation, but the point is that the
| process executing a read query is often best positioned to do
| this maintenance work. This decision trades a little latency
| for throughput.
| ccleve wrote:
| Postgres does something similar. Every time a query returns a
| row, the system must do a lookup into a visibility map to see
| if the row is visible to the current transaction. It can also
| check if it's a completely dead row. If there is an index
| involved, then on the next row fetch Postgres will report to
| the index whether the last row was dead or not. The index can
| elect to delete the row reference, which means a write during a
| select statement.
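|
| You can watch this happen: run a read-only query over freshly
| written rows and check the buffer counters (a sketch; the
| table name is made up):
|
|         EXPLAIN (ANALYZE, BUFFERS)
|         SELECT count(*) FROM freshly_updated_table;
|         -- "Buffers: shared hit=... read=... dirtied=..."
|         -- a plain SELECT dirtying pages it merely read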
| sokols wrote:
| Similarly, in Oracle, temporary tablespaces might be utilized
| when a query performs sort operations, for example.
| chasil wrote:
| This Oracle behavior is called "block cleanout," and it makes
| it unnecessary for a transaction to touch all affected blocks
| when it commits or rolls back - some future transaction will
| do this.
|
| Oracle also has two init.ora parameters, SORT_AREA_SIZE and
| SORT_AREA_RETAINED_SIZE, that are similar to the WORK_MEM
| mentioned in the subject article.
|
| As far as I know, SORT_AREA_SIZE is global, and differing
| values cannot be assigned to roles.
| dspillett wrote:
| This is not a pg-specific thing; in SQL Server, for instance,
| this is referred to as "spilling to disk". Large sorts can
| spill to disk: the query planner may assume, based on what
| stats it has about the relevant indexes, that a sort will be
| done in RAM, but the query later has to spill to disk, making
| the chosen plan sub-optimal.
| bbarnett wrote:
| Some queries in MySQL result in internal temporary tables, and
| some of those end up as disk-based tables, due to query
| complexity.
|
| https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-t...
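|
| EXPLAIN makes these visible (a sketch; the table and columns
| are made up):
|
|         EXPLAIN SELECT a, COUNT(*) FROM t
|         GROUP BY a ORDER BY COUNT(*) DESC;
|         -- Extra: Using temporary; Using filesort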
| scottlamb wrote:
| Even SQLite3 does this.
| https://www.sqlite.org/queryplanner.html says under "Sorting":
|
| > Furthermore, the entire output is accumulated in temporary
| storage (which might be either in main memory or on disk,
| depending on various compile-time and run-time settings) which
| can mean that a lot of temporary storage is required to
| complete the query.
|
| It also will sometimes create an index during a query.
| https://www.sqlite.org/tempfiles.html mentions this under
| "Transient Indices".
| whoomp12342 wrote:
| Am I stupid? Is this just disk caching or page filing? Why is
| this so clickbaity?
| mnw21cam wrote:
| The database has analysed the query that is being run, and
| decided that the quickest way to fulfil that query is to take
| 27MB of data and sort it into a file on disc before reading it
| back in with the correct order. This is caused by the work_mem
| setting in the database being set too low, preventing the
| database from contemplating just sorting the whole thing in
| memory.
|
| The default work_mem setting for Postgres has historically
| been very low. It's fine for reading single rows from a single
| table using an index, but as soon as you put a few joins in
| there it's not adequate. Increasing this limit should be one
| of the first steps of setting up Postgres.
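|
| The tell-tale is right there in EXPLAIN ANALYZE (a sketch; the
| query and numbers are made up):
|
|         EXPLAIN ANALYZE SELECT * FROM t ORDER BY b;
|         -- Sort Method: external merge  Disk: 27000kB
|         SET work_mem = '64MB';
|         EXPLAIN ANALYZE SELECT * FROM t ORDER BY b;
|         -- Sort Method: quicksort  Memory: 35000kB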
___________________________________________________________________
(page generated 2021-11-01 23:02 UTC)