[HN Gopher] A read query can write to disk: a Postgres story
       ___________________________________________________________________
        
       A read query can write to disk: a Postgres story
        
       Author : yakkomajuri
       Score  : 91 points
       Date   : 2021-10-31 12:08 UTC (1 days ago)
        
 (HTM) web link (www.mutuallyhuman.com)
 (TXT) w3m dump (www.mutuallyhuman.com)
        
       | krembo wrote:
        | I do professional services work optimizing on-prem and cloud
        | Postgres databases. It always surprises me how much companies
       | tend to take cloud databases as automated and rely on them to do
       | auto-tuning for everything.
       | 
        | Well, they don't, and as soon as the product goes live and the
        | amount of data grows even slightly, the performance degrades
        | exponentially.
       | 
        | * Never count on cloud databases to replace a DBA's work.
       | 
       | * Don't take ORM optimizations for granted. They all suck at one
       | point or another.
       | 
       | * If you can't afford an FTE, take short professional gigs from
       | time to time to fix and boost performance.
        
         | spicybright wrote:
         | > Don't take ORM optimizations for granted.
         | 
         | I've been bitten many a time by this.
         | 
          | I'd even go so far as to say that if you don't know how to
          | write the SQL your ORM framework is generating for you,
          | you're setting yourself up for a lot of pain later when you
          | do something non-trivial.
        
           | beaconstudios wrote:
            | This is why I nowadays prefer to use query builders (like
            | knex - knexjs.org) instead of ORMs. You can optimise your
            | queries without having huge blobs of raw SQL in the middle
            | of model code, and you can manually build models (or
            | service objects, as I prefer) that are completely
            | transparent.
        
           | berkes wrote:
            | Worse is that ORMs tend to creep all through your code,
            | introducing not only all sorts of coupling and
            | responsibility issues down the road, but above all
            | unexpected issues.
            | 
            | Sure, it's nice being able to say `ThisPost.authors.sort(DESC,
            | 'postCount').where('active = true').select('name',
            | 'avatarUrl', 'postCount')` anywhere in your codebase, but
            | you have now spread knowledge of how to sort and filter,
            | what attributes there are, what their relations are and so
            | on, throughout your codebase.
            | 
            | Or, more on-topic: you have now spread knowledge about what
            | can be fetched efficiently, and what will cause performance
            | pain, throughout your code. Suddenly some view template, or
            | a completely unrelated validation, can bring your database
            | down.
            | 
            | Rails' ActiveRecord is especially cumbersome in this: it
            | offers "sharp knives"[0] but hardly any tooling to protect
            | yourself (or that new intern) from leaving all those sharp
            | knives lying around. A "has_and_belongs_to_many :users" and
            | a ".where(users: { status: 'active', comment_count: '>10' })"
            | are just two lines, but they will haunt you forever and
            | will probably bring your database down under any real load.
           | 
           | [0] https://rubyonrails.org/doctrine/#provide-sharp-knives
        
         | wpietri wrote:
         | Yeah, for me this is a classic "easy vs simple" example.
         | 
         | Cloud databases are a very easy option. But Postgres is 1.3
         | million lines of code, and on top of that you add the
         | complexity of the cloud vendor's environment, choices, and
         | custom code. It may be easy, but it's definitely not simple.
        
         | selfhoster11 wrote:
         | > It always surprises me how much companies tend to take cloud
         | databases as automated and rely on them to do auto-tuning for
         | everything.
         | 
         | Automation and auto-tuning is kind of the whole value
         | proposition of managed services, isn't it?
        
           | paulryanrogers wrote:
            | IME, there is a spectrum from barely hosted (rented bare
            | metal) to white-glove service (Heroku).
        
       | latch wrote:
        | This is pretty basic PostgreSQL tuning (1). An oft-cited
        | problem with PostgreSQL is that it uses a process-per-connection
        | model (a connection is a "session" in PG lingo, I think).
        | `work_mem` is the amount of memory a connection has to, you
        | know, do work: sorting, distinct, some joins.
       | 
       | Number of Connections * work_mem is usually going to eat up the
       | biggest chunk of your PostgreSQL's memory. AFAIK, the
       | configuration is extremely coarse. There's no way for example to
       | say: I want all the connections from username "myapp" to have
       | 10MB and those from user "reporting" to have 100MB. And it can't
       | be adjusted on the fly, per query.
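        | 
        | (As a minimal sketch of that back-of-envelope budget - the
        | values are whatever your instance happens to have:)
        | 
        |     -- Each connection may use up to work_mem per sort/hash
        |     -- node, so max_connections * work_mem is a crude upper
        |     -- bound on this chunk of memory.
        |     SHOW max_connections;
        |     SHOW work_mem;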
       | 
       | Being able to set aside 200GB of memory to be used, as needed, by
       | all connections (maybe with _some_ limits to prevent an
       | accident), would solve a lot of problems (and introduce a bunch
       | of them, I know).
       | 
        | Since they're on RDS, I can't help but point out that I've seen
        | DB queries on bare metal run orders of magnitude faster - 10
        | minutes down to 1 second type thing. I couldn't help but wonder
        | whether they'd even notice the disk-based sorting on a proper
        | (yes, I said it) setup.
       | 
       | (1)
       | https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Serv...
        
         | mikeklaas wrote:
         | > And it can't be adjusted on the fly, per query.
         | 
          | Sure it can:
          | 
          |     SET LOCAL work_mem = '256MB';
          |     SELECT * FROM ...
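          | 
          | (A minimal sketch of the scoping, since it's easy to trip on:
          | SET LOCAL only lasts for the current transaction, so it has
          | to run inside one. The query here is just an illustrative big
          | sort.)
          | 
          |     BEGIN;
          |     SET LOCAL work_mem = '256MB';  -- reverts at COMMIT/ROLLBACK
          |     SELECT * FROM generate_series(1, 1000000) g
          |     ORDER BY g DESC;               -- the big sort gets the bump
          |     COMMIT;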
        
         | ianleeclark wrote:
         | > `work_mem` is the amount of memory a connection has to, you
         | know, do work.
         | 
         | It's the amount of memory a connection has to do work per
         | sorting/hashing operation.
         | https://www.postgresql.org/docs/12/runtime-config-resource.h...
         | 
         | > There's no way for example to say: I want all the connections
         | from username "myapp" to have 10MB and those from user
         | "reporting" to have 100MB.
         | 
          | Yes, you can, with
          | 
          | `ALTER ROLE myapp SET work_mem TO '10MB';`
          | 
          | and
          | 
          | `ALTER ROLE reporting SET work_mem TO '100MB';`
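          | 
          | (One hedge worth adding: per-role settings only apply to new
          | sessions, not ones already open. What got stored can be
          | checked with a catalog query like this:)
          | 
          |     SELECT r.rolname, s.setconfig
          |     FROM pg_db_role_setting s
          |     JOIN pg_roles r ON r.oid = s.setrole;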
        
           | jhgb wrote:
           | > Yes, you can with [...a helpful example...]
           | 
           | Some days it's nice to have a lucky 10k problem
           | (https://xkcd.com/1053) in place of a c10k problem
           | (http://www.kegel.com/c10k.html). Thanks.
        
         | WJW wrote:
         | I worked on a MySQL-on-RDS setup once where the company was
         | splurging (and being charged through the nose, ofc) on a
         | backing EBS volume with 20k IOPS. Meanwhile nobody had bothered
         | to update the default settings, which limited all background
         | work to use no more than 100 IOPS. Needless to say, this DB had
         | severe performance problems.
         | 
         | Managed databases in the cloud are a bit like ORMs: they
         | significantly reduce the amount of knowledge needed 99% of the
         | time. Sadly, when the 1% of the time hits this also means that
         | nobody on the team has built up the DBA skills required for the
         | big hairy problem. I see this pattern repeated all the time
         | when consulting for startups that just had their first real
         | traction and now the magic database box is no longer working as
         | before.
        
           | idiocrat wrote:
            | Call it driving with the hand brake on.
            | 
            | You have bought your 1000 hp car and have been driving it
            | as-is since you left the car dealer's parking lot. Nobody
            | told you to release the hand brake.
        
             | WJW wrote:
             | Quite so. Of course, in most countries all drivers will
             | have had mandatory driving lessons before they got their
             | license and the instructor will have told them about the
             | handbrake. No such license requirement exists for databases
             | and it shows.
        
             | daemoncoder wrote:
             | I like to say they should remove the block from under the
             | accelerator pedal.
        
           | throwaway20371 wrote:
           | > Sadly, when the 1% of the time hits this also means that
           | nobody on the team has built up the DBA skills required for
           | the big hairy problem. I see this pattern repeated all the
           | time when consulting for startups that just had their first
           | real traction and now the magic database box is no longer
           | working as before.
           | 
           | I call this the Kubernetes effect. 99% of the time, people
           | think it's easy. Then the 1% hits and they have to completely
           | rebuild it from the ground up. For us it was cluster
           | certificate expiration and etcd corruption, and some other
           | problems we couldn't ever pin down.
           | 
           | The other effect (which I'm sure has a name) is when they
           | depend on it for a long time, but then a problem starts to
           | occur, and because they can't figure out a way to solve it,
           | they move to a completely different tech ("We had problems
           | with updating our shards so we moved to a different
           | database") and encounter a whole new problem and repeat the
           | cycle.
        
           | mcherm wrote:
           | I think DB tuning seems like a perfect business opportunity
           | for specialist consulting. Doing a really good job requires
           | deep expertise but then it only needs to be repeated
           | occasionally (if the operational profile or the data size
           | changes significantly).
           | 
           | But I don't see this model much. I mostly hear of companies
           | struggling to develop internal expertise or going without.
           | 
           | Why?
        
             | BrentOzar wrote:
             | > But I don't see this model much. Why?
             | 
             | It does exist (I'm one of them), and it's much more common
             | on expensive databases like Microsoft SQL Server and
             | Oracle. With free databases, it's relatively inexpensive to
             | throw more hardware at the problem. When licensing costs
             | $2,000 USD per CPU core or higher, there's much more
             | incentive for companies to bring an expert in quickly to
             | get hardware costs under control.
        
               | redis_mlc wrote:
               | Yup, SQL Server performance and licensing (!) consultants
               | came out of retirement after MS doubled the licensing
               | costs in 2012.
               | 
               | Managers don't like paying for optimization consultants,
               | but it's the easiest way to both improve current
               | performance and scale a master.
               | 
               | Note that Oracle positions MySQL as a replacement for SQL
               | Server, as "moderate-performance databases." :)
               | 
               | Source: DBA.
               | 
               | https://metrixdata360.com/pricing-series/1500-price-
               | increase...
        
             | Intermernet wrote:
             | I did this for years. I was a server optimisation
             | consultant. I'd always advise my clients that increasing
             | the specs of the hardware would probably be cheaper in the
             | short term than hiring me to go through the various
              | configurations with a fine-toothed comb. I also promised a
             | fixed price report on what needed to be done, as well as a
             | fixed price quote on doing the actual work. I found that
             | this level of transparency and honesty never failed, and
             | the work was always approved. I'd also offer training for
             | the internal team.
             | 
             | I've pretty much completely changed careers now, but I
             | found that this model did work well. It involved a lot of
             | constant research and occasionally a huge amount of stress,
             | but I loved it!
        
             | apavlo wrote:
             | > I think DB tuning seems like a perfect business
             | opportunity
             | 
             | I got you fam: https://ottertune.com/
        
       | ComodoHacker wrote:
        | An Oracle story. Read queries can also write to disk due to the
        | way Oracle handles consistency. Information about which
        | transactions last changed particular rows in a table is stored
        | alongside the rows. A read query then looks up those
        | transactions to see whether they were committed. If not, the
        | read query skips the rows they touched (and looks up their
        | previous versions). If they were committed, that transaction
        | info gets cleared, so subsequent queries don't have to look it
        | up again.
        | 
        | This is a simplified explanation, but the point is that the
        | process executing a read query is often best positioned to do
        | this maintenance work. The decision trades a little latency for
        | throughput.
        
         | ccleve wrote:
          | Postgres does something similar. Every time a query returns a
          | row, the system must check whether the row is visible to the
          | current transaction. It can also check whether it's a
          | completely dead row. If an index is involved, then on the
          | next row fetch Postgres will report to the index whether the
          | last row was dead. The index can then elect to delete the row
          | reference, which means a write during a SELECT statement.
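          | 
          | (A minimal sketch to watch this happen - the usual culprit is
          | Postgres setting hint bits on freshly committed rows, a
          | detail the above glosses over; "some_table" stands in for any
          | table you just bulk-loaded:)
          | 
          |     EXPLAIN (ANALYZE, BUFFERS)
          |     SELECT count(*) FROM some_table;
          |     -- "Buffers: shared read=... dirtied=N" on a plain SELECT
          |     -- is exactly a read query writing pages.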
        
         | sokols wrote:
          | Similarly, in Oracle, temporary tablespaces may be used when
          | a query performs sort operations, for example.
        
         | chasil wrote:
          | This Oracle behavior is called "block cleanout", and it makes
          | it unnecessary for a transaction to touch all affected blocks
          | when it commits or rolls back - some future transaction will
          | do this.
         | 
         | Oracle also has two init.ora parameters, SORT_AREA_SIZE and
         | SORT_AREA_RETAINED_SIZE, that are similar to the WORK_MEM
         | mentioned in the subject article.
         | 
         | As far as I know, SORT_AREA_SIZE is global, and differing
         | values cannot be assigned to roles.
        
       | dspillett wrote:
        | This is not a pg-specific thing; in SQL Server, for instance,
        | this is referred to as "spilling to disk". Large sorts can
        | spill to disk: the query planner may assume, based on the stats
        | it has about the relevant indexes, that a sort will be done in
        | RAM, but later have to spill to disk, so it picks a plan that
        | is sub-optimal.
        
         | bbarnett wrote:
          | Some queries in MySQL result in internal temporary tables,
          | and some of those end up as on-disk tables, depending on the
          | query's complexity.
         | 
         | https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-t...
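          | 
          | (A minimal sketch for spotting it - these are standard MySQL
          | status counters:)
          | 
          |     SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';
          |     -- compare Created_tmp_disk_tables to Created_tmp_tables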
        
         | scottlamb wrote:
         | Even SQLite3 does this.
         | https://www.sqlite.org/queryplanner.html says under "Sorting":
         | 
         | > Furthermore, the entire output is accumulated in temporary
         | storage (which might be either in main memory or on disk,
         | depending on various compile-time and run-time settings) which
         | can mean that a lot of temporary storage is required to
         | complete the query.
         | 
         | It also will sometimes create an index during a query.
         | https://www.sqlite.org/tempfiles.html mentions this under
         | "Transient Indices".
        
       | whoomp12342 wrote:
        | Am I stupid? Is this just disk caching or page filing? Why is
        | this so clickbaity?
        
         | mnw21cam wrote:
          | The database has analysed the query being run and decided
          | that the quickest way to fulfil it is to take 27MB of data
          | and sort it into a file on disc before reading it back in the
          | correct order. This is caused by the work_mem setting in the
          | database being set too low, preventing the database from
          | contemplating just sorting the whole thing in memory.
          | 
          | The default work_mem setting for Postgres has historically
          | been very low. It's fine for reading single rows from a
          | single table using an index, but as soon as you put a few
          | joins in there it's not adequate. Increasing this limit
          | should be one of the first steps of setting up Postgres.
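          | 
          | You can see which way the planner went with EXPLAIN ANALYZE
          | (a minimal sketch; the table and column names are
          | illustrative):
          | 
          |     SET work_mem = '4MB';
          |     EXPLAIN ANALYZE
          |     SELECT * FROM big_table ORDER BY created_at;
          |     --   Sort Method: external merge  Disk: 27000kB
          |     SET work_mem = '64MB';
          |     EXPLAIN ANALYZE
          |     SELECT * FROM big_table ORDER BY created_at;
          |     --   Sort Method: quicksort  Memory: 30000kB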
        
       ___________________________________________________________________
       (page generated 2021-11-01 23:02 UTC)