[HN Gopher] Things I hate about PostgreSQL (2020)
___________________________________________________________________
Things I hate about PostgreSQL (2020)
Author : latch
Score : 372 points
Date : 2021-04-06 07:20 UTC (15 hours ago)
(HTM) web link (rbranson.medium.com)
(TXT) w3m dump (rbranson.medium.com)
| vardaro wrote:
| For a moment there I thought it was gonna be written by
| Richard Branson.
| barrkel wrote:
| My single biggest beef about PG is the lack of query planner
| hints.
|
| Unplanned query plan changes as data distribution shifts can
| and do cause queries to perform orders of magnitude worse.
| Queries that used to execute in milliseconds can start taking
| minutes without warning.
|
| Even the ability to freeze query plans would be useful,
| independent of query hints. In practice, I've used CTEs to force
| query evaluation order. I've considered implementing a query
| interceptor which converts comments into before/after per-
| connection settings tweaks, like turning off sequential scan (a
| big culprit for performance regressions, when PG decides to do a
| sequential scan of a big table rather than believe an inner join
| is actually sparse and will be a more effective filter).
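|
| Roughly what that looks like in practice (table names here are
| made up; since PG 12 the CTE needs MATERIALIZED to stay an
| optimization fence):
|
| SET enable_seqscan = off;  -- per-connection nudge
| WITH sparse AS MATERIALIZED (
|     SELECT id FROM small_side WHERE flag
| )
| SELECT b.* FROM big_table b JOIN sparse s ON s.id = b.id;
| RESET enable_seqscan;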
| orthoxerox wrote:
| ClickHouse is the opposite: it has no optimizer, so your SQL
| must be structured the way you want it to run: deeply nested
| subqueries with one JOIN per SELECT. But at least you can be
| sure your query runs the way you intended.
| jeff-davis wrote:
| PostgreSQL offers a config where you can control join order
| to match the query text:
|
| https://www.postgresql.org/docs/13/runtime-config-
| query.html...
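|
| Concretely, that's the join_collapse_limit /
| from_collapse_limit knobs; setting them to 1 makes the planner
| keep the join order as written:
|
| SET join_collapse_limit = 1;
| SET from_collapse_limit = 1;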
| rwmj wrote:
| An interesting approach and I'm not sure if I'd prefer it (I
| happen to like my queries being optimized automatically for
| my very tiny databases). But wouldn't it be possible to
| modify PostgreSQL to work this way too? It's unclear why
| you'd want to switch to a whole new DBMS for this.
| jeff-davis wrote:
| Can you enumerate some use cases you've run into? Sometimes
| looking at the individual use cases leads to better features
| than trying to generalize too quickly. For instance,
| controlling join order might suggest a different solution than
| a cardinality misestimate or a costing problem.
|
| Query plan freezing seems like an independently useful feature.
| zepearl wrote:
| > _My single biggest beef about PG is the lack of query planner
| hints._
|
| Same here.
|
| I did evaluate whether to use PG for my stuff, but not having
| any hint available at all makes dealing with problems super-
| hard and potential bad situations become super-risky (esp. for
| PROD environments where you'll need an immediate fix if things
| go wrong for any reason, and especially involving 3rd party
| software which might not allow you to change the SQLs that it
| executes).
|
| Not saying that it should be as hardcore as Oracle (hundreds of
| hints available, at the same time a quite stubborn optimizer),
| but not having anything that can be used is the other bad
| extreme.
|
| I'd like as well to add that using hints doesn't always have
| to be the result of something that was implemented in a bad
| way - many times I as a human just knew better than the DB
| about how many rows would be accessed/why/how/when/etc... (e.g.
| maybe just the previous "update"-sql could have changed the
| data distribution in one of the tables but statistics would not
| immediately reflect that change) and not being able to force
| the execution to be done in a certain way (by using a hint)
| just left me without any options.
|
| MariaDB's optimizer can often be a "dummy" even with simple
| queries, but at least it provides some way (hints) to steer it
| in the right direction => in this case I feel like I have more
| options without having to rethink & reimplement the whole DB
| approach every time some SQL doesn't perform.
| jeff-davis wrote:
| "many times I as a human just knew better than the DB about
| how many rows would be accessed/why/how/when/etc..."
|
| Would you say the primary problem that you have with the
| planner is a misestimate of the number of rows input/output
| from a subplan? Or are you encountering other problems, too?
| nrdvana wrote:
| (not the OP but...) I have had 3 cases in the last year
| where a postgres instance with less than millions of rows
| per table has decided to join with fancy hash algorithms
| that result in tens of seconds per query instead of the 5ms
| that it would take when it uses nested loops (i.e.
| literally start with the table in the from clause, apply
| some where clause, join to next table, apply more where
| clause, join to next table, and so on)
|
| I do believe the planner was coming up with vast mis-
| estimates in some of those cases. 2 of the 3 were cases
| where the fully joined query would have been massive, but
| we were displaying it in a paged interface and only wanted
| 100 rows at a time.
|
| One was a case where I was running a "value IN (select
| ...)" subquery where the subquery was very fast and
| returned a very small number of rows, but postgres decided
| to be clever and merge that subquery into the parent. I
| fixed that one by running two separate queries, plugging
| the result of the first into the second.
|
| For one of the others, we actually had to re-structure the
| table and use a different primary key that matched the
| auto-inc id column of its peer instead of using the
| symbolic identifier (which was equally indexed). In that
| case we were basically just throwing stuff at the wall to
| see what sticks.
|
| I have no idea what we'd do if one of these problems just
| showed up suddenly in production, which is kind of scary.
|
| I'm sure the postgres optimizer is doing nice things for us
| in places of the system that we don't even realize, but I'm
| sorely tempted to just find some way to disable it entirely
| and live with whatever performance we get from nested
| loops. Our data is already structured in a way that matches
| our access patterns.
|
| The most frustrating part of it all is how much time we can
| waste fighting the query planner when the solution is so
| obvious that even sqlite could handle it faster.
|
| For context, I've only been using postgres professionally
| for about a year, having come from mysql, sql server, and
| sqlite, and I'm certainly still on the learning curve to
| figure out how the planner works and how to live with it.
| Meanwhile, postgres' feature set is so much better than
| mysql's or sql server's that I'd never consider going back.
| ganomi wrote:
| Take a look at this Postgres Extension:
| http://pghintplan.osdn.jp/pg_hint_plan.html
|
| I am even using this with AWS RDS since it comes in the set of
| default extensions that can be activated.
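|
| The hints go in a comment ahead of the statement, something
| like this (table and index names invented for illustration):
|
| /*+ IndexScan(orders orders_created_at_idx) */
| SELECT * FROM orders WHERE created_at > now() - interval '1 day';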
| rattray wrote:
| pg_hint_plan is now being developed on GitHub:
| https://github.com/ossc-db/pg_hint_plan
|
| I recently put up a PR for a README, you can read it here:
| https://github.com/ossc-
| db/pg_hint_plan/blob/8a00e70c387fc07...
| ollysb wrote:
| This looks very interesting. I had real difficulty where I
| needed both a btree and gin(pg_trgm) index on the same
| column. When using `like` postgres would consistently choose
| the btree index which resulted in performance that was
| something like 15secs as opposed to the 200ms or so I'd see
| if the gin index were used. In the end I added two separate
| columns, one for each index so that I could force the correct
| one to be used for a particular query.
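|
| With hints, something like this might have avoided the two-
| column workaround (names made up; GIN indexes are only
| reachable via bitmap scans, hence that hint):
|
| CREATE INDEX people_name_trgm ON people
|     USING gin (name gin_trgm_ops);
|
| /*+ BitmapScan(people people_name_trgm) */
| SELECT * FROM people WHERE name LIKE '%smith%';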
| natmaka wrote:
| Isn't the optimizer fooled by some inadequately set parameter,
| for example "effective_cache_size"?
|
| The planner may also be fooled by a too-small data sample; you
| may try: ALTER TABLE table_name ALTER COLUMN column_name SET
| STATISTICS 10000;
|
| Can't you use the autovacuumer to kick off an ANALYZE whenever
| there is a risk of data distribution shift? ALTER TABLE
| table_name SET (autovacuum_analyze_scale_factor = X,
| autovacuum_analyze_threshold = Y);
| brianwawok wrote:
| It's pretty hard to fix a complex black-box query planner
| with an estimate of when another black box analyze command
| will fix it.
|
| That said, if you have good monitoring, you can hopefully
| find out when a query gets hosed and at least have a chance
| to fix it.. it's not terribly often.
| gkop wrote:
| Don't blindly set stats to 10000, an intermediate value
| between the default of 100 and the max of 10000 may give you
| the best plans; experiment to find out.
| joana035 wrote:
| My only complaint about PostgreSQL is COUNT() being quite slow
| compared with MySQL.
|
| Everything else is pretty good, MySQL has compressed tables, but
| in PostgreSQL the same amount of data already takes less space by
| default.
|
| Pghero/pg_stat_statements are also very handy.
|
| But "hate"? No, no hate here :)
| dijit wrote:
| just so you're aware, COUNT() on mysql can lie.
|
| Basically it's fetching metadata on the table, which can in
| some cases not be updated (yet), whereas in pg it actually
| counts entries in the index.
| joana035 wrote:
| Doesn't this happen only when using sql_calc_found_rows?
| meritt wrote:
| That's only for MyISAM which sees very little use today. The
| InnoDB engine on MySQL does a full row count and is also
| relatively slow.
| jhgb wrote:
| Does it really count entries in the index? For example, in
| Firebird, it has to fetch rows because of row versioning
| (which happens in data pages, not in indices), and since
| PostgreSQL does versioning, too, I would have assumed that
| it's subject to the same limitation if it wants to return a
| correct answer for the current transaction.
| rattray wrote:
| You may be interested in this technique, or some of the others
| in the article:
| https://www.citusdata.com/blog/2016/10/12/count-performance/...
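|
| One trick from there is leaning on the planner's own estimate
| instead of counting (only as accurate as the last ANALYZE):
|
| SELECT reltuples::bigint AS approx_count
| FROM pg_class WHERE relname = 'my_table';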
|
| EDIT: I'm also curious what version of Postgres you've
| experienced this on? Sounds like there may have been
| improvements to COUNT(DISTINCT) in v11+
| caioariede wrote:
| If you're not making a lot of writes, that may be a good
| approach, since AFAIK the counts will only be updated after
| the table is ANALYZE'd
| natmaka wrote:
| One thing I hate about such articles is this "((use)) a managed
| database service" hint. Many if not most readers' data are
| confidential and storing them on a machine managed by unknown
| people seems foolish to me. Am I paranoid?
| mixedCase wrote:
| You are, unless you have a very good reason to treat your cloud
| provider as a likely malicious actor, in which case good luck
| setting up your own data center.
| dewey wrote:
| > Am I paranoid?
|
| Yes, because letting someone who knows what they are doing run
| your database is in most cases a better idea / more secure than
| doing it yourself if that's not your main business. If you pick
| a reputable provider there's not really an incentive for them
| to not keep your data confidential.
|
| Example: all the open MongoDB instances whose owners exposed
| them to the internet with a simple configuration mistake.
| latch wrote:
| Yes, but no.
|
| I'm a staunch believer that multi-tenant hardware and managed
| services are _obvious_ no-gos for privacy reasons.
|
| But, having done B2B where I had to deal with security
| procedures/questionnaires/documentation/checklists from large
| customers, no one else agrees.
| watermelon0 wrote:
| From the point of view of those large customers, would you
| really trust people working in company X more than
| AWS/Azure/GCP? Especially since those customers already use
| other SaaS providers, which probably use at least one of the
| big cloud providers.
|
| There definitely are companies that employ great engineers,
| follow best practices, and can be on par with big cloud
| providers, but generally you shouldn't really expect that. In
| such cases, I'd rather see them leverage managed services
| instead of deploying their own servers.
| mpolun wrote:
| I don't see how it's any different than using any hosting
| provider. It's probably worth encrypting your databases, but if
| you don't trust your hosting provider you're hosed -- managed
| service or not.
|
| If your paranoia is justified (which it may be, depending on
| your needs), you need to host the machines in your own
| datacenter.
| vbezhenar wrote:
| One thing that I miss from PostgreSQL is transparent encryption.
| Some information systems require encryption of personal data by
| law. It's trivially implemented with commercial databases, so you
| can enable it and tick a box. Not so much with Postgres.
| robertlagrant wrote:
| This seems better done at the storage layer. Doing it in the
| database layer is a good idea if you're monetizing your
| database by CPU core, of course.
| vemv wrote:
| Like many developers, I've used postgresql unquestioningly for
| many years, say 10. But if you ask me, I've rarely have had to
| face stringent scaling or availability requirements.
|
| Many deployments were a variation of AWS RDS or a similarly
| managed offering for HA. And they weren't exactly flawless.
|
| So this article is good food for thought. Why are we using
| postgresql? Because of the features, developer-friendliness, and
| so on.
|
| Does that align with actual customer requirements? Not
| necessarily.
|
| This problem framing aligns well with other phenomena one can
| observe. i.e. many complexities in our industry are completely
| self-inflicted (some personal 'favorites': OOP, microservices,
| SPAs, graphql).
|
| What if postgresql could be added to that list?
|
| Note that I'm not necessarily questioning postgresql as a
| project, but instead, the hype process that can lead communities
| to unquestioningly adopt this or that product without making an
| actually informed assessment.
| __jem wrote:
| You have to put your data somewhere... are commercial database
| vendors or nosql solutions _that_ much better?
| vemv wrote:
| I cannot make an expert assessment myself but for one thing
| mysql is occasionally picked precisely because of its HA
| story.
|
| It was the case in a couple past jobs of mine, back then I
| didn't particularly appreciate it but now I might see it with
| different eyes.
|
| Also commercial databases do have a better HA story - at
| least that's their reputation.
|
| The current status quo is that everything should be free, but
| obviously that fundamentally contradicts being a
| _professional software developer_ in the first place.
| MaxBarraclough wrote:
| What do you make of Amazon Aurora?
| collyw wrote:
| Do you have any good examples of Graphql increasing complexity
| (as opposed to REST)? A dev at my work is pushing for it, and I
| am against it because it's another tech that isn't really
| solving any problems that we currently have (too much
| complexity is our number one problem). It would be good to have
| some examples to show them.
| allochthon wrote:
| I like to think of GraphQL as a specific tool to address a
| specific set of challenges. Is it convenient to be able to
| batch up a bunch of requests in one go for a frontend JS
| client? Would it be useful to be able to specify a subset of
| the fields to be returned in a query? If the answers to these
| questions and others are "yes," GraphQL might be a good fit
| for what one is doing.
|
| I've personally had good experiences with it. It does bring
| in additional complexity. Is the complexity worth taking on
| in a specific dev team? This is something that will depend on
| the team.
|
| It is usually the case that a tool is not useful outside of
| the set of challenges it is tailored to address. And in some
| cases a tool is not well designed or implemented. It's always
| good to use the right tool for the job, at the level of
| complexity that a team is in a good position to handle.
| gsvclass wrote:
| You can use GraphJin; it automatically compiles GraphQL
| queries into Postgres SQL ones. The core idea is to
| simplify frontend development
| https://github.com/dosco/graphjin
| nicoburns wrote:
| I think Postgres is great for many of the same reasons that
| people often (wrongly) tout NoSQL systems for. It's flexible,
| featureful, simple and quick to get started. And unlike most
| NoSQL systems, it has full ACID compliance and can scale well
| past MVP stage to the point that most businesses will never hit
| its limitations.
|
| If you do hit really huge scale then you will need to start
| looking beyond Postgres to solutions like Cassandra, Scylla,
| etc. But hopefully by that point you have a large dev team
| capable of handling the extra complexity.
| theshrike79 wrote:
| PostgreSQL is better at being MongoDB than Mongo is.
|
| You can just add a JSON column and do queries on the content,
| index the table on individual values etc.
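|
| For example (a quick sketch):
|
| CREATE TABLE docs (id bigserial PRIMARY KEY, body jsonb);
| CREATE INDEX ON docs USING gin (body jsonb_path_ops);
| SELECT * FROM docs WHERE body @> '{"status": "active"}';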
| pillefitz wrote:
| One of the benefits of using Mongo is its horizontal
| scalability, not necessarily its ability to store
| documents.
| jeltz wrote:
| I wouldn't say PostgreSQL is a self-inflicted problem, like
| e.g. some argue microservices are; rather, choosing PostgreSQL
| optimizes for developer experience over things like flawless
| HA and zero-downtime upgrades. So the tradeoff is being able
| to build products faster and cheaper while sacrificing some
| reliability.
| kleebeesh wrote:
| Does anyone know if there is a way to monitor sudden changes in
| planning behavior for a given query? For example, I'd like to
| monitor/alert on this kind of situation: I'm executing the same
| query for many weeks/months, some table involved in the query is
| slowly growing over time, at some point the query planner
| responds to the growth by changing the plan.
| dang wrote:
| Discussed at the time:
|
| _PostgreSQL's Imperfections_ -
| https://news.ycombinator.com/item?id=22775330 - April 2020 (134
| comments)
|
| Other things someone else hated:
|
| _Things I Hate About PostgreSQL (2013)_ -
| https://news.ycombinator.com/item?id=12467904 - Sept 2016 (114
| comments)
| agentultra wrote:
| Gosh I remember when Postgres _didn't_ have any streaming
| replication. That was a huge pain point. You had to manually ship
| the WAL files to the standby and use a trigger for fail-over...
| and pray that your standby was actually up to date.
|
| The code in Postgres is written in a pragmatic, no-nonsense style
| and overall I'm quite happy with it. I've been bitten at times by
| run-away toast table bloat and the odd query plan misfire. But
| over all it's been a really solid database to work with.
| nezirus wrote:
| I think this is a good list, one needs to know potential pitfalls
| and plan accordingly.
|
| As for point #7, if your upgrade requires hours, you are holding
| it wrong, try pg_upgrade --link:
| https://www.endpoint.com/blog/2015/07/01/how-fast-is-pgupgra...
|
| (as usual, before letting pg_upgrade mess with on disk data, make
| proper backups with pg_basebackup based tools such as barman).
| briffle wrote:
| my only complaint with the pg_upgrade (with or without the
| --link) is that for some reason it does not move the statistics
| over, and you have to rebuild them, or have horrible
| performance for a while, until each table hits its auto-analyze
| threshold.
|
| I'm doing some testing now for my DB, and rebuilding all the
| stats takes far, far longer than the upgrade. The upgrade takes
| seconds, and it takes a while to analyze multi-TB sized tables,
| even on SSDs.
| fdr wrote:
| Yeah, there's some kludgey workaround for this that is
| definitely 80/20 kind of material...pg_upgrade will generate
| a script that does progressively more accurate re-ANALYZE so
| you're not flying your early queries totally blind. Maybe
| look into running that.
| mikl wrote:
| I think it's worth mentioning that most of these problems only
| occur at a scale that only top 1% of companies will reach. I've
| been using PostgreSQL for over a decade without reaching any of
| the mentioned scaling-related problems.
|
| PostgreSQL is still the best general purpose database in my
| opinion, and you can then consider using something else for parts
| of your application if you have special needs. I've used
| Cassandra alongside PostgreSQL for massive write loads with great
| success.
| I_am_tiberius wrote:
| The problem is that you want to build something that can scale
| in the future.
| theshrike79 wrote:
| I think I've heard a saying about this, something about
| premature optimisation...
| I_am_tiberius wrote:
| Sure you shouldn't care about scaling at the beginning. But
| why should you start using a system that you already know
| won't scale in the future?
| WJW wrote:
| Because the hyperscalable databases are much more
| difficult to set up, use and administer. It's not a
| "free" upgrade, it'll slow down everything else you do.
| spaetzleesser wrote:
| There are a lot of dimensions to scaling. It's hard to
| predict where you really will have to scale up.
| oblio wrote:
| > Sure you shouldn't care about scaling at the beginning.
| But why should you start using a system that you already
| know won't scale in the future?
|
| Because it's well supported and solid otherwise? There's
| a wealth of documentation, resources of many kinds,
| software built around it (debugging, tracing, UIs, etc.).
| Because there's a solid community available that can help
| you with your problems?
|
| What alternative technology is there that scales better?
| I guess MySQL could be it, but doesn't MySQL also come
| with a ton of its own footguns?
| I_am_tiberius wrote:
| I use Postgres at the moment and I'm happy except for the
| process per connection part and the upgrade part. Knowing
| what I know now I think MySQL would have made me happier.
| On the other hand, it may have caused other issues I
| don't have with Postgres. I just hope the Postgres team
| maintains its roadmap based on posts like this.
| why_Mr_Anderson wrote:
| But you have to _get_ to that future first! If you lose
| your customers because you can't deliver something on
| time due to the complexity of your 'scaling-proof' system or
| because you can't accommodate changes requested by
| clients because they would compromise your architecture,
| scaling will be the last of your worries.
| theshrike79 wrote:
| Only if I _know_ I'm creating something that will
| definitely have huge amounts of concurrent users and
| someone pays me to make it scale from the start.
|
| For a hobby project that might take off or might not,
| there's really no point in making everything
| "webscale"[0] just in case.
|
| [0] https://youtu.be/b2F-DItXtZs
| o_p wrote:
| It's better to work on getting all those users before planning
| what color the Ferrari will be.
| I_am_tiberius wrote:
| Exactly. That's why it would be good to have a system which
| is prepared for scaling in the future.
| oblio wrote:
| What's that system? MySQL? Are there any other OSS
| RDBMSes which are comparable and scale better?
| I_am_tiberius wrote:
| I would only have thought of MySQL.
| doctor_eval wrote:
| MySQL isn't a general solution to problems of scale,
| because you don't know what problems you're going to have
| until you have them. So for example if your scaling
| problem is ACID compliant database updates - say you're
| the next fintech - then I was under the impression that
| MySQL would be the last database you'd want to be using.
| Have I missed something?
| I_am_tiberius wrote:
| I'm no expert and can't answer that. It was just my
| impression, and I might be wrong, that MySQL is better
| suited for scaling purposes. Currently I'm working on
| a SaaS product and the test instance that runs on Digital
| Ocean sometimes causes connection limit issues (even with
| a connection pool). Sure, my code is maybe not
| perfectly utilizing connections, but I'm really afraid
| that this happens in production and I don't know how to
| fix it. On my test environment I just restart everything,
| but on a production environment I can't do that all the
| time.
| aidos wrote:
| The default limit on Postgres is 100, so you need to ask
| yourself why you're exhausting all those connections. The
| issue isn't the DB, it's the code making the connections.
| Advice: don't fret scaling issues, get your fundamentals
| right.
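|
| A quick way to see what's actually holding them:
|
| SELECT state, count(*) FROM pg_stat_activity GROUP BY state;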
| redis_mlc wrote:
| The default MySQL storage engine since 2010 has been
| InnoDB, which is ACID-compliant.
|
| Overall, MySQL and Postgres are great Open Source
| databases, but MySQL has the edge for enterprise features
| (i.e. replication options) and a simpler grant system.
|
| Source: DBA.
| dkarl wrote:
| While the executives are dreaming of exotic cars, the
| engineers are dreaming of exotic architectures. The
| difference is that when the CEO says, "It's crucial that I
| have this Ferrari BEFORE the business takes off," nobody
| takes them seriously.
| notyourday wrote:
| The really funny part is that the engineers don't just
| dream of those architectures, they implement them. That's
| how you get an app that adds two numbers that runs on
| K8S, requires four databases, a queuing system, a deploy
| pipeline, a grafana/prometheus cluster, some ad-hoc Rust
| kernel driver and a devops team.
| marcus_holmes wrote:
| ffs, this attitude causes massively more problems than it
| solves.
|
| 1. You can always change later. Uber switched from Postgres
| to MySQL when they had already achieved massive scale.
|
| 2. You don't know what scaling problems you're going to get
| until you've scaled.
|
| 3. Systems designed to scale properly sacrifice other
| abilities in order to do that. You're actively hurting your
| velocity with this attitude.
|
| 4. Every single expert in the field who has done this says
| to start with a monolith and break it out into microservices
| as the product matures. Yet every startup founds itself on K8s
| because "we'll need it when we hit scale so we might as well
| start with it".
|
| 5. Twitter's Fail Whale - the problems that failing to scale
| properly bring are less than the problems of not being
| flexible enough in the early stages.
|
| Build it simple, and adapt it as you go. Messing up your
| architecture and slowing down your development now to cope
| with a problem you don't have is crazy.
| I_am_tiberius wrote:
| > ffs, this attitude causes massively more problems than it
| solves.
|
| I don't think that it causes so many problems to just use
| MySQL instead of Postgres from the very beginning of a
| project. I like using Postgres and I understand that I
| shouldn't care about scaling, but if I make a good decision
| from the very beginning it can't hurt.
| doctor_eval wrote:
| Is MySQL a general solution to scaling? What if your
| scaling problem is with _writes_?
| derekperkins wrote:
| That's why Vitess is so awesome - you can scale writes
| infinitely. There's not a truly comparable option for
| Postgres
| marcus_holmes wrote:
| Uber switched because of a very specific problem they had
| with the internals of Postgres, that was handled
| differently in MySQL (which I believe is now "solved"
| anyway).
|
| It's not that MySQL scales better than Postgres, but that
| Uber hit a particular specific scaling problem that they
| could solve by switching to MySQL.
|
| You could well use MySQL "because it scales better" and
| then hit a particular specific problem that would be
| solved by switching to Postgres.
| chrisandchris wrote:
| I would rather use Postgres and have an RDBMS that is
| quite strict and migrate data later, instead of having an
| RDBMS that just does what it likes sometimes.
|
| For example, query your table "picture" with a first
| column "uuid" (varchar) with the following query:
|
| SELECT * FROM picture WHERE uuid = 123;
|
| I don't know what you expect, I expect the query to fail
| because a number is not a string. MySQL thinks otherwise.
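|
| (Postgres, for the record, rejects it outright with roughly:
| ERROR: operator does not exist: character varying = integer)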
| cafard wrote:
| In Oracle it will fail, but only if uuid has characters
| that can't be parsed as numbers...
| chrisandchris wrote:
| Does that make it better? IMHO, it's even worse.
| deckard1 wrote:
| > You don't know what scaling problems you're going to get
| until you've scaled.
|
| This is the point I keep repeating.
|
| If you find yourself needing to scale, the way you scale
| likely does not match what anyone else is doing. The way
| Netflix scaled does not look anything like the way WhatsApp
| scaled. The application dictates the architecture. Not the
| other way around. Netflix started as a DVD service. Their
| primary scaling concerns were probably keeping a LAMP stack
| running and how the hell to organize, ship, and receive
| thousands of DVDs a day. These scaling problems have little
| in common with their current, streaming, scaling problems.
|
| It's a weird thing that developers love to discuss and hype
| up scale and scaling technology and then turn around and
| warn against the dangers of premature optimization in code.
| If you ask me, the mother of all premature optimization is
| scaling out your architecture to multiple servers, sharding
| when you don't need to, dealing with load balancing,
| multiple security layers, availability, redundancy, data
| consistency, containers, container orchestration, etc. All
| for a system that could, realistically, run quite
| adequately on an off-the-shelf Best Buy laptop. We have
| gigabit ethernet and USB 3 on a Raspberry Pi today and
| people are still shocked you could run a site like HN off a
| single server. We've all been so lobotomized by the cloud hype
| of the 2010s that we can't even function without AWS
| holding our hand.
| webreac wrote:
| I have not seen comments about technical debt. I think you
| are right: It is good to take shortcuts to ship faster.
| When you do that, you accumulate technical debt. I think it
| is important to identify it and to remain aware of this
| debt. I've seen too many people in denial who resist
| change.
| marcus_holmes wrote:
| It's not even tech debt. It's like a "tech short" -
| assuming you'll have this specific scaling problem in the
| future, and paying the cost now.
| jakeva wrote:
| "Tech short" - I love it. I'm going to use that.
| yen223 wrote:
| I am partial to the "don't solve problems you don't have"
| argument which holds true in a lot of cases.
|
| That said, the database is the one part of the system that
| is very tricky to evolve after the fact. Data migrations
| are hard. It's worth investing a little bit of time upfront
| to get it right.
| nicoburns wrote:
| > Data migrations are hard.
|
| Yes, which is exactly why you shouldn't go with a highly
| scalable database solution. All of the solutions for
| really big scale involve storing data in non-normalised
| form, which means the pain of frequent data migrations
| while developing features.
|
| Best to avoid this until you have to.
| doctor_eval wrote:
| Agree entirely. You're going to have to migrate anyway.
| May as well migrate from a database that's easy to work
| with.
| zaarn wrote:
| Don't do anything obviously complex with your RDBMS and
| migrations are free. If all you need is a few views,
| tables and FKs, then migration between RDBMS' should be
| low effort if you have a decent RSM or ORM to plug behind
| it. And even with more involved things, I've written low-
| effort migrations from and to various RDBMS'; it's not
| black magic.
|
| The little time upfront is "use pgsql unless there is a
| good reason not to" as your first choice.
| slt2021 wrote:
| if you don't change the schema dramatically, then it doesn't
| make much sense to migrate to another RDBMS, because most
| engines have pretty much similar query planners (if you're
| not doing "anything obviously complex").
|
| if you do migrate due to scaling issues, then the schema
| must evolve, for example: add in-memory db for caching,
| db sharding/partitioning, table partitioning, hot/cold
| data split, OLTP/OLAP split, etc.
| zaarn wrote:
| Scaling issues can present themselves in numerous ways
| which may not require an in-memory DB,
| sharding/partitioning, hot/cold or such to be changed;
| they may even already be present.
|
| In a lot of cases, these can be used and added without
| locking you out of migration since parts of these are
| deeper application level or just DB side. The query
| planner isn't the end-all of performance, there is plenty
| of differences between MySQL and PgSQL performance
| behaviour that might force you to switch even though the
| query planner won't drastically change things.
| barrkel wrote:
| I've hit many query performance regression problems with ~10
| million rows, which required rewriting with CTEs and other
| techniques to tweak the planner. This isn't a large scale at
| all.
| latch wrote:
| PostgreSQL is great, but I don't think your statement is
| particularly true.
|
| Process per connection is pretty easy to accidentally run into,
| even at small scale. So now you need to manage another piece of
| infrastructure to deal with it.
|
| Downtime for upgrades impacts everyone. Just because you're
| small scale doesn't mean your users don't expect (possibly
| contractually) availability.
|
| Replication: see point above.
|
| General performance: Query complexity is the other part of the
| performance equation, and it has nothing to do with scale.
| Small data (data that fits in RAM) can still be attacked with
| complex queries that can benefit from things such as clustered
| index and hints.
| andoriyu wrote:
| > Process per connection is pretty easy to accidentally run
| into, even at small scale. So now you need to manage another
| piece of infrastructure to deal with it.
|
| Most places where I saw this as an issue were ones where
| developers thought that tweaking the number of connections
| would give them a linear boost in performance. Those are the
| same people who think adding more writers to an RWLock will
| improve write performance.
|
| I agree that it's easy to run into and pretty silly
| concurrency pattern for today's time. At the same time, it's
| just a thing you need to be aware of when using PostgreSQL
| and design your service with that in mind.
| mekster wrote:
| Every time I see comments that praise PostgreSQL over MySQL
| without any explanation, I tend to think they're trying to
| bury a product from Oracle rather than expressing a real need
| for one over the other.
| sli wrote:
| Maybe some younger developers, but I'd imagine a lot of us
| grew to dislike MySQL years before Oracle bought it (in
| 2010). I'd switched to Postgres already by then.
| KptMarchewa wrote:
| >I think it's worth mentioning that most of these problems only
| occur at a scale that only top 1% of companies will reach.
|
| If you're talking about 1% of all software companies, then it's
| not true. You don't need to be a B2C company with XXX million
| users to have a lot of data.
|
| >PostgreSQL is still the best general purpose database in my
| opinion, and you can then consider using something else for
| parts of your application if you have special needs.
|
| Well, yes, you're already talking about one mitigation strategy
| to not hit these scaling problems.
| john8903 wrote:
| Not true - I work at a company of 400 people, and we ran into
| the Process-Per-Connection / pgbouncer issue.
| mikl wrote:
| I guess that's very dependent on what kind of framework
| you're using. The only PostgreSQL driver I've seen that does
| not have connection pooling built in is the PHP one (since
| PHP's runtime model does not work in a way where that would
| be easily possible).
| tomnipotent wrote:
| > not have connection pooling
|
| Local connection pooling only goes a very small way to
| mitigate this issue. If you have enough servers hitting PG,
| you're going to need to add in something like PgBouncer
| sooner or later.
| darksaints wrote:
| It's still fairly easy to hit problems even when you're
| using an application level connection pool, simply because
| it's so damn easy to scale up application nodes.
| hattar wrote:
| Company of 20 here. Same.
| holtalanm wrote:
| > I think it's worth mentioning that most of these problems
| only occur at a scale that only top 1% of companies will reach
|
| I'll echo what another commenter said. Tons of data != tons of
| profit.
|
| Tons of data just means tons of data.
|
| Source: Worked on an industrial operations workflow application
| that handled literally _billions_ of records in the database.
| Sure, the companies using the software were highly profitable,
| but I wouldn't have called the company I worked with 'top 1%'
| considering it was a startup.
| hobs wrote:
| It's really not that hard to have billions of rows in a
| modern data ingest situation, especially if you allow
| soft-deleting/versioning.
|
| Honestly anything that fits on one hard drive shouldn't be
| called "tons of data."
| [deleted]
| remus wrote:
| I don't think this is necessarily true. Say you have 100
| sensors sampling at 1kHz for a year, you'd have ~3 trillion
| rows in your database and plenty of potential for scaling
| issues at a very reasonable price.
| refenestrator wrote:
| In that specific case, you probably want to roll up that
| time-series data as it gets older, while keeping the full
| dataset in a flat file system for data science etc if you
| need it.
|
| You probably never need a millisecond-granularity data point
| from 6 months ago in your database.
| sitkack wrote:
| They probably shouldn't be rows at all. They are
| effectively low frequency sound files. I'd probably store
| them in parquet and query them via an FDW in Postgres.
| dls2016 wrote:
| I'm starting a project in this realm right now, though only
| three sensors to begin with. Generally I'm leaning towards
| "everything in Postgres", but I think I'm going to store the
| raw sensor data in the filesystem.
| neolog wrote:
| How frequently do they sample?
| dls2016 wrote:
| 500Hz
| beckingz wrote:
| This is what I'm doing right now for my home sensor
| network.
|
| 3 ESP8266's with temperature, humidity, and light sensors
| sending a reading every second to a python app that writes
| a row to postgres on a raspberry pi 3.
|
| So far the hardest bit has been getting all the services to
| restart on pi restart. Postgres works just fine.
| dls2016 wrote:
| I'm going to be using an RPI as well and have been
| messing with systemd. I know people have opinions, but I
| don't and it was straightforward to configure a Python
| script to run as a service.
| nitwit005 wrote:
| Most of the time, people just decide on the aggregates they
| want from those sensors in advance, and discard the raw data.
|
| I worked at a company that had some IOT devices logging
| "still alive and working fine" every couple of minutes. There
| was no point to holding onto that data. You only needed to
| know when the status changed or it stopped reporting in, as
| that's all anyone cared about.
| darkstar_16 wrote:
| I think a lot of the issues that people complain about in
| PostgreSQL come from the fact that the default config is not
| very good if one wants to run it in production, even for
| relatively small workloads. Things like process per connection
| can bite you if you're not aware of how PG works.
| forinti wrote:
| I would love it if PostgreSQL had packages.
|
| I have tons of PL/SQL that I would like to move from Oracle to
| PostgreSQL.
| thornygreb wrote:
| I used to miss packages but I mainly use schemas now to
| organize. Not the same I know but as good as it gets. I also
| add https://github.com/okbob/plpgsql_check so that I can find
| bugs earlier since plpgsql is not compiled like pl/sql.
| forinti wrote:
| That is the best way, I think. And it even has some plusses:
| you could have local tables/views.
|
| But, if you already have a lot of packages and a lot of
| schemas, separating your packages into schemas seems a bit
| daunting. Even more so if they have a lot of dependencies.
| fdr wrote:
| I think this is a reasonable list of weaknesses, with a few
| quibbles. I guess since I've built parts of Heroku Postgres, and
| Citus Cloud, and now Crunchy Bridge...maybe I'd know.
|
| On the other hand...on the whole...maintaining Postgres is
| probably among the cheapest pieces of software on which I have to
| do so, which is why the cloud business model works. Something
| less stable (in all senses of the word) would chew up too much
| time per customer.
| rattray wrote:
| I'd be very curious to hear your quibbles!
| fdr wrote:
| I don't think the post informs on Physical and Logical
| replication that well.
|
| Most database systems of adequate budget and maturity
| implement both, for various reasons.
| rattray wrote:
| Interesting, thanks. Yeah I was surprised to hear his
| skepticism of logical replication, but I've never operated
| it in production before. Curious for resources on that.
| fdr wrote:
| You mean physical, re: skepticism. Just different things.
| Bulky for "CREATE INDEX" or "VACUUM", but also faster for
| a lot of things (no decoding) and able to more naturally
| deal with incomplete transactions. A good way to get a
| feel for that is to read how people compare using either
| one for proprietary databases that have both.
| bpodgursky wrote:
| I'm surprised nobody is complaining about the complexity of the
| permission system.
|
| I'm a generally smart guy, but setting up default permissions so
| that new tables created by a service user are owned by the
| application user... is shockingly complicated.
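|
| For what it's worth, the closest I got was ALTER DEFAULT
| PRIVILEGES, which covers grants but not ownership (role names
| below are made up):
|
| ALTER DEFAULT PRIVILEGES FOR ROLE service_user IN SCHEMA public
|     GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;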
|
| (I love using Postgres overall, and have no intention of going
| back to MySQL.)
| pyrophane wrote:
| Yes! Postgres permissions are a huge pain to manage! You have
| to worry about table ownership, grants on existing objects, and
| default grants on new objects.
| [deleted]
| otikik wrote:
| I will add one minor point to this list:
|
| The name.
|
| To this day I am convinced that the haphazard UpperCASE usage
| is what has granted us:
|
| - A database called PostgreSQL
|
| - A library called libpostgres
|
| - An app folder called postgres
|
| - An executable called psql
|
| - A host of client libraries which chose to call themselves Pg or
| a variation.
| SigmundA wrote:
| Let's not forget the column casing issue: you either get
| columns to match your app's casing and have to quote them
| everywhere, or live with them being lower-cased automatically.
| https://dev.to/lefebvre/dont-get-bit-by-postgresql-case-sens...
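|
| A minimal illustration (made-up table):
|
| CREATE TABLE t ("userId" int);
| SELECT userId FROM t;    -- ERROR: column "userid" does not exist
| SELECT "userId" FROM t;  -- works, but you quote it forever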
| icedchai wrote:
| This is true. Many people get confused by the name. I've met
| several developers who refer to it as "Postgray" or some
| variation.
| wosc wrote:
| Recently I saw "post grayskull" on twitter, that's now my
| favourite. ;)
| ironmagma wrote:
| Don't forget "libpq"
| cpa wrote:
| Fun fact: PQ is short for toilet paper in French, so libpq
| always cracks me up. But then again there's a theorem prover
| called Coq (which is indeed pronounced as you imagine, it
| means rooster) and it was named by French researchers at
| INRIA!
| brandmeyer wrote:
| As an American, I feel like I have to deliberately mis-
| pronounce 'coq' the theorem prover like 'coke' the soda.
| ironmagma wrote:
| Same. It really doesn't help that their logo is skin-
| colored and the shape that it is...
| zdkl wrote:
| libcaca, also french: http://caca.zoy.org/
| gonzus wrote:
| Ok, let's talk about this... pico, the editor and SI
| prefix, means "dick" in at least some Spanish-speaking
| countries... Source of endless nerd jokes.
| mst wrote:
| Not to mention the (un)surprisingly low sales in spanish
| speaking countries of the car known as the Vauxhall Nova
| ...
| pdw wrote:
| PostgreSQL used to be called Postgres. They renamed it when
| they added SQL support.
| rattray wrote:
| Renaming would be even worse!
| skrebbel wrote:
| Seems to me that they could safely rename to Postgres without
| much downside.
| rattray wrote:
| Among other potential issues, this would make it much
| harder to search for information related to the database.
| Starting out, it'd always make sense to google for eg
| "postgres ilike", but for new features you'd have to search
| for eg "NewNameSQL kindalike" (assuming a new ILIKE
| replacement called KINDALIKE comes along in pg15 aka
| newname3).
|
| Even years in to the rename, newcomers to NewNameSQL would
| need to be told that it used to be called Postgres and that
| they should look for things related to that too.
|
| Tools and code that refer to Postgres would all have to
| change their names, including those developed internally,
| open-source, closed-source, and no-longer-maintained. Not
| all would, and some would change the name and functionality
| at the same time.
|
| It'd be chaos.
| kgraves wrote:
| Postgres still doesn't support native global replication :(
| This is kind of a deal breaker for me.
| I_am_tiberius wrote:
| Is the process per connection issue the reason why Digital Ocean
| etc. have such low limits on their concurrent connection settings?
| Even on my test database I sometimes run out of connections.
| dboreham wrote:
| Yes
| Tostino wrote:
| Set up a droplet with a load balancer
| I_am_tiberius wrote:
| Need to investigate this, thanks.
| 3pt14159 wrote:
| > many of the issues brought up in this post can be reduced or
| eliminated by using a managed database service like Heroku
| PostgreSQL, ...
|
| They come with their own issues though. I was unable to change a
| ludicrously small variable (which I think was temp_buffers) on
| Heroku's largest Postgres option. There was no solution. I just
| had to accept that the cloud provider wouldn't let me use this
| tool and code around what otherwise would have worked.
|
| That said, at least backups and monitoring are easy.
| bvm wrote:
| Ah I remember when Mandrill was hit with the XID thing, painful
| day.
| I_am_tiberius wrote:
| Are some of these problems solved by CitusData?
| taspeotis wrote:
| Microsoft is working hard to fix a lot of the problems e.g.
| connection scalability [1].
|
| [1] https://techcommunity.microsoft.com/t5/azure-database-for-
| po...
| I_am_tiberius wrote:
| That's great. Connection scalability is my biggest issue with
| Postgres currently. It sounds like they work on it by
| committing directly to Postgres and not only to Citus.
| robertlagrant wrote:
| They're investing rather a lot into Postgres; it's great.
| datavirtue wrote:
| This blog post answered a lot of questions related to the
| internals, allowing me to make a better (real) comparison between
| SQL Server and Postgres.
|
| For all of the issues he pointed out, SQL Server simply does
| things differently and suffers none of the stated pitfalls.
| Well, you can't get the source code, and it is not free.
| roenxi wrote:
| Rather than a query planner, an interesting approach would be to
| expose the more stable part of the internals with a new language
| and let people roll their own query plans. Then Postgres can be
| NoSQL too and we can all be happy.
|
| I'm not hopeful that it would be technically feasible, but it
| isn't obvious that Postgres needs to only support SQL as an
| interface. The SQL language is so horrible I assume it is
| already translated into some intermediate representation.
| snuxoll wrote:
| > expose the more stable part of the internals with a new
| language and let people roll their own query plans.
|
| You're basically talking about ISAM style access at this point.
| Even IBM started discouraging that on IBM i and is pushing
| developers to use embedded SQL instead.
| assface wrote:
| > Rather than a query planner, an interesting approach would be
| to expose the more stable part of the internals with a new
| language and let people roll their own query plans.
|
| Those that are ignorant of history are doomed to repeat it.
|
| Go read Stonebraker's "What Goes Around Comes Around"
|
| https://15721.courses.cs.cmu.edu/spring2020/papers/01-intro/...
| rattray wrote:
| Another recent Postgres-complaint post from one of the best
| engineers I've worked with: https://blog.nelhage.com/post/some-
| opinionated-sql-takes/
|
| Quoting his conclusion:
|
| > As for Postgres, I have enormous respect for it and its
| engineering and capabilities, but, for me, it's just too damn
| operationally scary. In my experience it's much worse than MySQL
| for operational footguns and performance cliffs, where using it
| slightly wrong can utterly tank your performance or availability.
| ... Postgres is a fine choice, especially if you already have
| expertise using it on your team, but I've personally been burned
| too many times.
|
| He wrote that shortly after chasing down a gnarly bug caused by
| an obscure Django/Postgres crossover:
| https://buttondown.email/nelhage/archive/22ab771c-25b4-4cd9-...
|
| Personally, I'd still opt for Postgres every time - the
| featureset is incredible, and while it may have scary footguns,
| it's better to have footguns than bugs - at least you can do
| something about them.
|
| Still, I absolutely wish the official Postgres docs did a better
| job outlining How Things Can Go Wrong, both in general and on the
| docs page for each given feature.
| paulryanrogers wrote:
| It's interesting how personal scars can entrench one's
| perspective. After MySQL 8's renaming-table-will-crash-server
| bug I'm reluctant to use it for new projects.
| aaron-santos wrote:
| In any sufficiently large dev team the technology choices
| trend toward zero. Because everyone has their "I will not
| work with this tech after dealing with its nasty bug."
| paulryanrogers wrote:
| Indeed. And products can get better and worse over time. If
| there's a consistent pattern of critical bugs at least with
| FOSS there is the possibility of forks and 3rd party
| patches.
| jrockway wrote:
| I'm personally guilty of this mindset, but it's something I'm
| working on. After you get burned by a system, you know of the
| bug, and you can fix it. But the instinct is to switch to a
| new system, or to rewrite the system. That does get rid of
| all the bugs in the old system! But, in the process you've
| replaced them with brand new bugs that nobody has seen or
| heard from, until they decide to crawl into your mouth while
| you're asleep and you wake up in a panic. It's bugs all the
| way down, folks... this is software we're dealing with!
|
| I think sendmail is the classic example of a program with so
| many bugs it had to be rewritten from scratch, and many
| people did. All of those alternatives, even qmail (widely
| debated:
| https://www.qualys.com/2020/05/19/cve-2005-1513/remote-
| code-...), ended up with a bug or security problem too. And
| they seem to have even fixed sendmail. It's still around and
| it doesn't take down the whole Internet every three weeks
| anymore. Wow! Sometimes there are just a million bugs, and
| you fix all one million of them, and then there aren't any
| more bugs.
| psanford wrote:
| > It's still around and it doesn't take down the whole
| Internet every three weeks anymore.
|
| Maybe that's because the vast majority of email no longer
| goes through sendmail.
| rattray wrote:
| Does anyone know of a quality, comprehensive book that
| enumerates all the things to watch out for and problems to
| proactively prevent when operating Postgres at scale?
| eor wrote:
| It's not a book, but Christophe Pettus' blog
| (https://thebuild.com/blog) has a lot of really good
| information. In particular, his talk "Breaking PostgreSQL at
| Scale" goes through the problems you run into as you hit
| different levels of scale
| (https://thebuild.com/presentations/2019-fosdem-broken.pdf)
| rattray wrote:
| Thanks! Very helpful, that talk looks like a (very)
| condensed version of what I was looking for.
|
| It looks like the video for that talk is here:
| https://www.youtube.com/watch?v=XUkTUMZRBE8
| twunde wrote:
| FYI, for anyone interested Pettus does consulting work as
| the founder of https://pgexperts.com/. Highly recommend
| working with him if you need a postgres DBA.
| gshulegaard wrote:
| Thanks for that! Been doing a lot of Postgres work but
| first time seeing that slide deck.
| oauea wrote:
| As long as mysql can't run ddl statements in a transaction it's
| worthless as far as I'm concerned.
|
| Also the thing where they (used to?) silently truncate your
| data when it wouldn't fit a column is absolutely insane. I'll
| take operational footguns over losing half my data every damn
| time.
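|
| A sketch of what I mean (Postgres rolls the whole thing back
| cleanly; the notable exception is CREATE INDEX CONCURRENTLY,
| which can't run inside a transaction):
|
| BEGIN;
| ALTER TABLE orders ADD COLUMN note text;
| CREATE INDEX orders_note_idx ON orders (note);
| ROLLBACK;  -- schema untouched, nothing to clean up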
| Foobar8568 wrote:
| MySQL is as advanced as Oracle on this topic (DDL in
| transaction), unless Oracle has changed in the recent years.
| jeltz wrote:
| Which means equally useless. I agree with him about
| transactional DDL. Having worked both with it and without
| it I would never want to go back to MySQL.
| Foobar8568 wrote:
| I have never said that it was useful :o I work with SQL
| Server, and I have always been amazed that DDL isn't
| transactional in Oracle. And it's supposed to be a
| "serious" database. That and the empty string being
| equal to NULL, but I think they did a 180 on that point in
| recent years.
| skunkworker wrote:
| As nice as having DDL in a transaction is, once you get to
| scale it isn't used much, as you'll be doing index creation
| concurrently, which cannot be done inside a transaction.
| setr wrote:
| Till v8.0.16 mysql used to accept and then just ignore check
| constraints
|
| I've never been so offended by a technology as the day I
| discovered that; it's not a misfeature and it's not a bug --
| only pure malice could have driven such a decision.
| zodiakzz wrote:
| Don't forget the 3-byte encoding they invented and call
| 'utf8'.
| stepbeek wrote:
| The lack of ddl in a transaction is what scared me away too.
| Having to manually clean up after a failed migration just
| felt like something I shouldn't be thinking about.
| carlos_rpn wrote:
| They don't truncate data anymore, unless you enable it in the
| configuration (it's disabled by default). Invalid data
| (0000-00-00) is also not accepted anymore.
| lokedhs wrote:
| I'm definitely no fan of mysql. I have, like many others,
| been scarred by its misfeatures. However, not having DDL in
| transactions isn't really a barrier to being useful. Oracle
| doesn't have transactional DDL either, and say what you will
| about the company, the product has proven itself.
| jhgb wrote:
| But Firebird does have it, too. I find it funny how
| supposedly top products sometimes lack the most basic
| features.
| gshulegaard wrote:
| Sorry for the long, rambling comment. After I wrote it I wasn't
| sure it added much, but since I invested so much time writing
| it I figured someone might find something in it useful, so on
| that off chance I am posting it.
|
| ---
|
| Those were really interesting reads, and it's obvious to me
| that the author is well experienced even if I find myself at
| odds with some of the points and ultimate conclusion. To be
| explicit, there _are_ points which resonated strongly with me.
|
| I am by no means an expert, and fairly middling in experience
| by any nominal measure, but I _have_ spent a significant
| portion of my professional experience scaling PostgreSQL so I
| thought I would throw out my $0.02. I have seen many of the
| common issues:
|
| - Checkpoint bloat
|
| - Autovacuum deficiencies
|
| - Lock contention
|
| - Write amplification
|
| and even some less widely known (maybe even esoteric) issues
| like:
|
| - Index miss resulting in seq scan (see "random_page_cost"
| https://www.postgresql.org/docs/13/runtime-config-query.html)
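|
| (For that last one, the usual fix on SSD-backed storage is
| simply lowering the knob from its spinning-disk default of
| 4.0, e.g. SET random_page_cost = 1.1;)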
|
| I originally scaled out Postgres 9.4 for a SaaS monitoring and
| analytics platform, which I can only describe as being a very
| "hands on" or a manual process. Mostly because many performance
| oriented features like:
|
| - Parallel execution (9.6+) (originally limited in 9.6 and
| expanded in later releases)
|
| - Vacuum and other parallelization/performance improvements
| (9.6+)
|
| - Declarative partitioning (10.0) (Hash based partitions added
| in 11.0)
|
| - Optional JIT compiling of some SQL to speed up expression
| evaluation (11.0)
|
| - (and more added in 12 and 13)
|
| Simply didn't exist yet. But even without all of that we were
| able to scale our PostgreSQL deployment to handle a few
| terabytes of data ingest a day by the time I left the project.
| The team was small, between 4-7 (average 5) full time team
| members over 3 years including product and QA. I think that it
| was possible--somewhat surprisingly--then, and has been getting
| steadily easier/better ever since.
|
| I think the general belief that it is difficult to scale or
| requires a high level of specialization is at odds with my
| personal experience. I doubt anyone would consider me a
| specialist; I see myself as an average DB _user_ who has had
| the good fortune (or misfortune) to deal with data sets large
| enough to expose some less common challenges. Ultimately, I
| think most engineers would have come up with similar (if not
| the same) solutions after reading the same documentation we
| did. Another way to say this is that I don't think there is
| much magic to scaling Postgres; it is more straightforward than
| common belief suggests, and I think fear of the unknown plays a
| bigger role than PostgreSQL being intrinsically harder to scale
| than other RDBMSs.
|
| The size and scope of the PostgreSQL feature set can make it
| somewhat difficult to figure out where to start, but I think
| this is a challenge for any feature-rich, mature tool and the
| quality of the PostgreSQL documentation is a huge help to
| actually figuring out a solution in my experience.
|
| Also, with the relatively recent (last 5 years or so) rise of
| PostgreSQL horizontal-scale projects like Citus and TimescaleDB,
| I think it is even easier to scale PostgreSQL. Most recently, I
| used Citus to implement a single (sharded) storage/warehouse
| for my current project. I have been _very_ pleasantly surprised
| by how easy it was to create a hybrid data model which handles
| everything from OLTP single-node data to auto-partitioned time
| series tables. There are some gotchas and lessons learned, but
| that's probably a blog post in its own right, so I'll just
| leave it as a qualification that it's not a magic bullet that
| completely abstracts the nuances of how to scale PostgreSQL
| (but it does a darned lot).
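|
| For a sense of how little ceremony Citus adds on top of plain
| PostgreSQL, sharding a table is roughly one call (the table and
| distribution column here are hypothetical):
|         SELECT create_distributed_table('events', 'tenant_id');
| after which ordinary INSERTs and SELECTs against the table are
| routed to the right shards for you.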
|
| TL;DR: I think scaling PostgreSQL is easier than most believe
| and have done it with small teams (< 5) without deep PostgreSQL
| expertise. New features in PostgreSQL core and tangential
| projects like Citus and TimescaleDB have made it even easier.
| jbluepolarbear wrote:
| They're so right about performance gotchas. I worked on a large
| Java project a few years back that was transitioning from MySQL
| to Postgres, and after the migration performance was abysmal. I
| then spent the next 5 months optimizing queries. A lot of the
| issues came down to how MySQL and Postgres handle lookups in
| inner joins differently. I would still pick Postgres over MySQL
| because the tools and features around it are very good.
| twic wrote:
| My experience was the opposite - a Java app running on MySQL
| that had painfully slow joins, that immediately got much
| faster on porting to PostgreSQL!
| runeks wrote:
| Devil's advocate: could it simply be that someone spent 5
| months optimizing queries for MySQL before switching to
| Postgres? Such that Postgres performance isn't "worse", it
| just doesn't plan queries in the same way that MySQL does.
| jbluepolarbear wrote:
| It was definitely that. The queries were built to take
| advantage of MySQL features.I joined after they just
| switched and was told to optimize queries. Was a pretty
| toxic job, I was hired work on streaming media systems
| (like Hulu, ESPN, etc) but instead they had me doing query
| optimizations. I quit after 5 months. I was never able talk
| to my boss once, he avoided me and was always too busy.
| sdevonoes wrote:
| For the ones who know: is MySQL (without Percona) affected by the
| same issues the author is talking about?
| derekperkins wrote:
| Most of these are Postgres specific, though MySQL is going to
| have its own list of issues.
| sethhochberg wrote:
| It's a mixed bag; the author does make a few direct comparisons
| to MySQL.
|
| In my experience MySQL replication and cluster defaults are
| generally a lot more robust, whether you're using Galera or PXC
| or even just master-slave replication. Other pain points the
| author discussed, like Postgres effectively being major-version
| incompatible between different replicas, are less of an issue
| in MySQL, especially since it is most typically configured with
| statement-based replication (i.e., SQL statements sent over the
| wire instead of a block format for data on disk). With
| statement-based replication, as long as the same SQL statements
| are supported on all members of the cluster / replicas,
| differences in server version are usually negligible. You
| wouldn't want to make a practice out of replicating between
| different MySQL server versions, but you could, and this is
| sometimes very useful for online upgrades.
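|
| (For reference, the replication format is just a per-server
| setting in MySQL, e.g.:
|         SET GLOBAL binlog_format = 'STATEMENT';
| Newer MySQL versions default to row-based replication, so
| statement-based is something you configure explicitly.)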
|
| MySQL absolutely fares better on the thread-per-connection
| front, a single MySQL server can manage a far larger pool of
| connections than a single equivalent Postgres server can
| without extra tooling.
|
| InnoDB also uses index-organized tables, which tend to be more
| space-efficient in some scenarios, and has native compression
| too - but both MySQL and Postgres can benefit from running on
| compressed volumes in ZFS.
|
| Honestly, I think most of the hate MySQL gets is perhaps
| rightfully justified for far earlier versions of the database
| or specifically for the MyISAM storage engine. But if you're
| using MySQL 5.7ish or later, InnoDB, and some of the other
| Percona tooling for things like online DDLs you've got an
| extremely robust DBMS to work with. My current company uses
| Postgres on RDS, but I've maintained complex MySQL setups in
| the past on bare metal, and either approach has been perfectly
| serviceable for long term production use.
| lmarcos wrote:
| > But if you're using MySQL 5.7ish or later, InnoDB, and some
| of the other Percona tooling for things like online DDLs
| you've got an extremely robust DBMS to work with.
|
| Would you say that Percona is a must when using MySQL in
| production? I have some experience with Postgres, and the fact
| that pgbouncer is needed in production environments makes me
| think "why doesn't Postgres come with batteries included?".
| sethhochberg wrote:
| The landscape is already a bit weird in terms of "out of
| the box" features for MySQL because there are at least a
| few major distributions between Oracle MySQL, Percona
| XtraDB, MariaDB, etc. They're generally all compatible.
| They have some different defaults, and very rarely
| different features. But particularly for XtraDB you tend to
| get access to better diagnostic info and some better
| defaults for InnoDB (or at least you used to, my info may
| be a year or two out of date at this point).
|
| While it is true that MySQL support for online DDLs has
| gotten much better over the years, I think the tools like
| pt-online-schema-change are still extremely valuable -
| there are still certain kinds of changes that you can't
| make with an online DDL in MySQL, or sometimes you
| specifically don't want to take that approach. But I'd
| think of the Percona Toolkit stuff as more a nice set of
| tools to have in your DBA toolkit rather than an essential
| part of your DBMS for anyone running it in production. It's
| not like the pgbouncer situation: everybody wants to avoid
| process-per-connection, but plenty of people can get by
| without complex online schema migrations.
| oblio wrote:
| Maybe there are some core PostgreSQL hackers here:
|
| I know this probably sounds silly, but the transaction ID
| thing does seem like a big deal. Is it really insurmountable
| to make it a 64-bit value? That would push the problem up to a
| level where only very, very few companies would ever hit it,
| and from a (huge) distance the change doesn't seem like it
| should be a huge problem.
| jeltz wrote:
| There have been several discussions about this and, if I recall
| correctly, the main issue is that this would bloat the tuple
| size even more (PostgreSQL already has a large per-tuple
| overhead). The most promising proposal I have seen is to have
| 64-bit XIDs but store only the lower 32 bits per tuple, with a
| per-page epoch for the upper bits.
| tandr wrote:
| Store it (somehow) like protobuf does - the smaller the
| number, the fewer bytes it takes?
| glogla wrote:
| The first paragraph,
|
| > Over the last few years, the software development community's
| love affair with the popular open-source relational database has
| reached a bit of a fever pitch. This Hacker News thread covering
| a piece titled "PostgreSQL is the worlds' best database", busting
| at the seams with fawning sycophants lavishing unconditional
| praise, is a perfect example of this phenomenon.
|
| is exactly the kind of gratuitous over the top statement that
| makes me immediately lose respect for the author.
|
| And skimming the rest of the article, most of the "points"
| are just complaints about the tradeoffs made in various design
| decisions like MVCC, heap tables with indexes on the side, etc.
| The author is basically complaining "it's not MySQL".
|
| Don't waste your time with this article.
| junippor wrote:
| Ok, what do you say about this one?
|
| > #9: Ridiculous No-Planner-Hints Dogma
|
| One of these "query shifts" that the author mentions happened
| with a production database where I work. It was down for two
| days. The query planner used to like using index X but at some
| point decided it didn't want to use that and decided it wanted
| to do a table scan inside a loop instead. Meaning: one day a
| certain query was working fine, the next day the same query
| never finishes. In my opinion this is unacceptable.
|
| What's your take?
| why_Mr_Anderson wrote:
| (I'm not very familiar with Postgres, but this is common
| among RDBMSs.) What changed is size and/or statistics. Also,
| if the query optimizer supports something like parameter
| sniffing, that could be what happened. Unacceptable? Not
| really. Annoying? Very much so.
| crimper wrote:
| I think you need to provide more details for a good reply.
| What changed between the time the index was used and when it
| wasn't? I also had to "convince" PostgreSQL to use my index,
| but that led to a much better design.
| junippor wrote:
| > I think you need to provide more details for a good
| reply. What changed between the time the index was used and
| when it wasn't? I also had to "convince" PostgreSQL to use
| my index, but that led to a much better design.
|
| I disagree: given that nothing changed, I don't think any
| details need to be provided.
|
| The question is NOT "Is postgresql's choice better than
| mine?" The question is "A certain design was working and
| suddenly broke because one day the query planner decided to
| start choosing a different (and unusable) plan - is this
| ever acceptable?" and the answer is obviously No,
| regardless of the details.
| lmm wrote:
| I guarantee you that something changed. Maybe the row
| count passed a certain threshold. Maybe you upgraded the
| database version.
|
| If you don't want the query planner to pull arbitrary
| execution behaviour out of its ass, why are you using an
| SQL database in the first place? The whole point of SQL
| is that you declare your queries and leave it up to the
| planner to decide, and for that to be at all workable the
| planner needs to be free to decide arbitrarily based on
| its own heuristics, which will sometimes be wrong.
| barrkel wrote:
| Thing is, MySQL, with judicious use of STRAIGHT_JOIN,
| won't do the same thing. And generally MySQL is much more
| predictable because it's much less sophisticated: it only
| has a couple of join strategies (pre 8.0, only nested
| loop join) and quite limited query rewriting, so you can,
| with practice, predict the query plan as you write the
| SQL. And in practice, there are usually only two or three
| really big tables involved in performance-sensitive
| queries, tables you need to watch to make sure you don't
| end up doing scans on them. The rest of the tables you
| can leave up to the planner.
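|
| A sketch of what that looks like in practice (the table names
| are invented): the STRAIGHT_JOIN modifier tells MySQL to join
| the tables in the order they appear in the FROM clause, so the
| big table is only ever probed through the small, filtered one:
|         SELECT STRAIGHT_JOIN o.*
|         FROM customers c
|         JOIN orders o ON o.customer_id = c.id
|         WHERE c.region = 'EU';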
| SigmundA wrote:
| The problem is that something changed at a random time in a
| production DB on the weekend in the middle of the night.
| What changed? Is that logged somewhere?
|
| Other databases show that you can let the planner decide
| when you don't specify anything, but still override it with
| some simple hints, because I as the developer am in charge,
| not the planner.
| sgift wrote:
| > I disagree: given that nothing changed, I don't think
| any details need to be provided.
|
| You sound like a typical enterprise customer. "The whole
| system stopped working!!!!" "What did you change?"
| "Nothing!!!" "Are you sure?" "Yes!!!" .. searching
| around, looking into logs, and so on .. "Could it be that
| someone did x? The logs say x has happened and had to be
| done manually." "Oh yes. x was done by me."
|
| But, obviously, nothing has changed.
| junippor wrote:
| You sound like you have to deal with idiots all the time
| and resent that.
|
| You also sound like you don't know much about PostgreSQL
| if you can't immediately see what happened.
| jeltz wrote:
| I feel that is the least fair of the complaints (I agree with
| several of them and have some of my own too). Not because
| query hints are not desirable, but because who is going to pay
| for maintaining them? It is not really dogma (I, with some
| help, managed to convince them to merge one very specific
| query hint: MATERIALIZED for CTEs); rather, they do not want
| to expose more of the innards of the query planner than
| necessary, so as not to slow down planner development. Planner
| development is hard enough as is; with query hints it is
| going to become at least twice as hard due to even more fear
| of breaking applications.
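|
| For reference, that hint looks like this in 12+ (the table
| name is made up):
|         WITH expensive AS MATERIALIZED (
|             SELECT * FROM big_table WHERE flag
|         )
|         SELECT * FROM expensive WHERE id = 42;
| Without MATERIALIZED the planner is free to inline the CTE into
| the outer query; with it, the CTE is evaluated once and acts as
| an optimization fence.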
| barrkel wrote:
| STRAIGHT_JOIN is probably my favourite feature of MySQL in
| terms of planner hints; but there's actually a deeper
| inconsistency behind the philosophy.
|
| Usually, when you get a bad query plan, it's because the
| join order isn't right. Outside the start table and hash
| joins, indexes need to match up with both predicates and
| the join keys. Get the wrong join order and then your
| indexes aren't used.
|
| Since you need to specify which indexes to build and
| maintain, and such indexes are generally predicated on the
| query plan, why not ensure that the query is using the
| expected indexes?
|
| If one really wants to go down the route of no optimizer
| hints, then the planner should start making decisions about
| what indexes to build and update. Go all in.
| SigmundA wrote:
| Right, DB developers decide what indexes to add but aren't
| allowed to decide if and how they are used.
|
| Hints for join order / type and which indexes to use would
| go a long way; that's pretty much all I need on MSSQL
| Server if the planner is not cooperating.
| magicalhippo wrote:
| > Join order
|
| Had to fight this a few times: the planner thought it was
| smart to scan an index for a few million rows, then throw
| almost all of them away in a join further up, ending up
| with a few hundred rows.
|
| That caused the query to take almost a minute. Once the
| join order was inverted (I think I ended up with nested
| queries) the thing took a second or two.
| jeltz wrote:
| > why not ensure that the query is using the expected
| indexes?
|
| How should your database know this unless you explicitly
| named the indexes in your query? Just because there
| exists an index on a particular predicate does not mean
| that using the index would result in a faster query.
| fuy wrote:
| I wouldn't paint all hints with the same brush - it's not
| like the very fact of having a hint exposes query planner
| internals for arbitrary usage. Some hints may be more
| useful than others, and some may be less complicated to
| maintain - why not try to investigate if there's an
| intersection of these two sets that would be a valuable
| addition to Postgres?
| jeltz wrote:
| Maybe. The traditional query hinting implementations like
| MySQL's are all about exposing planner internals, but
| maybe if someone proposed a form of query hints which is
| less intrusive then there might be a fruitful discussion
| about it. I think a huge issue is that as soon as someone
| mentions query hints, people immediately think of MySQL's
| and similar solutions.
| magicalhippo wrote:
| We've had a few cases like that at work with SQLAnywhere,
| where it suddenly switches to table scans of some critical
| table.
|
| In almost all cases simply recalculating statistics fixes it.
| We had one or two cases where we needed to drop and recreate
| some of the indexes, which was much more annoying.
|
| Doesn't happen often, but really annoying when it does.
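|
| (For the PostgreSQL case the equivalent fix is just refreshing
| the planner statistics on the affected table, e.g.:
|         ANALYZE some_table;   -- table name hypothetical
| Autovacuum normally does this in the background, but it can lag
| behind after large data changes.)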
| chalst wrote:
| This comment is very far from the mark: it would be closer to
| the truth to say the article is mostly complaining about lack
| of tools to help ameliorate problems caused by the
| architecture.
|
| The author clearly likes Postgres and ends the piece by saying
| he expects all the problems he talks about to be solved in
| time.
| yygbnku wrote:
| The connection issue is the most important one.
| osigurdson wrote:
| I agree. PgBouncer (or something better) should be baked into
| Postgres.
| drewbug01 wrote:
| > #1: Disastrous XID Wraparound
|
| > Pretty much any non-trivial PostgreSQL install that isn't
| staffed with a top expert will run into it eventually.
|
| I agree that this landmine is particularly nasty - and I think it
| needs to be fixed upstream somehow. But I do think it is fairly
| well known at this point. Or at least, people outside of "top
| expert[s]" have heard of it and are at least aware of the problem
| by now.
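|
| For anyone who wants to keep an eye on it without special
| tooling, one common check (just a query against the system
| catalogs) is how old each database's frozen XID is:
|         SELECT datname, age(datfrozenxid) AS xid_age
|         FROM pg_database
|         ORDER BY xid_age DESC;
| Autovacuum's freeze machinery normally keeps that number in
| the low hundreds of millions; anything creeping toward ~2
| billion is the danger zone.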
| dfox wrote:
| In normal use XID wraparound is not particularly problematic.
| For it to be an issue you either have to have an application
| that for some reason assumes XIDs increase monotonically (for
| example in its implementation of optimistic locking), or you
| have a significantly larger issue with totally unmanaged MVCC
| bloat (caused either by not running vacuum at all or by having
| a ridiculous number of ridiculously long-running transactions
| active at once).
|
| But then there is an interesting related issue in some client
| libraries: the XID is a 32-bit unsigned value, and some
| libraries which transparently implement optimistic locking
| (e.g. ODBC) interpret it as a 32-bit signed value. I somewhat
| suspect that most people who had a "significant production
| outage caused by XID wraparound" were in fact bitten by this
| or something similar.
| candiddevmike wrote:
| Only thing I really hate about PostgreSQL (probably not specific
| to it) is the lack of visibility into triggers. Give me Jaeger-
| style tracing for each trigger with per-statement durations and
| I would be a very happy dev.
| drewbug01 wrote:
| Your statement intrigued me, so I fired up the ol' googles and
| started looking to see if anyone had tried this. And within the
| first page of results I found a comment from you a few months
| ago saying the same thing! :)
|
| This seems really interesting - at least for debugging (I worry
| that it would tank performance under load). Have you considered
| trying to work on it? My googling suggests that you seem rather
| interested in the idea! The Postgres community is overall
| really welcoming to contributions (as is the OpenTelemetry
| community, hint hint).
| candiddevmike wrote:
| I keep posting it on HN hoping a Postgres dev hears my plea
| :)
|
| I've never programmed in C for anything serious, so I'm not
| sure where I'd even start. I _think_, based on my limited
| knowledge of postgres extensions, you'd have to bake the
| jaeger sampling into PG proper--I don't think extensions can
| intercept/inspect triggers.
| sitkack wrote:
| I think the solution would be to add triggers to the dtrace
| probes.
|
| https://www.postgresql.org/docs/current/dynamic-trace.html
| throw1234651234 wrote:
| 1. Can't define variables as easily as in SQL.
|
| 2. Weak support for functions/SPs as a result.
|
| 3. Naming conventions inconsistent with SQL which makes any ORMs
| a pain.
| erezsh wrote:
| imho all of these are easily offset by the fact that unlike in
| sqlite or mysql, `cast('1,031' as int)` will throw an error
| instead of returning 1.
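|
| A quick illustration of the difference (note that MySQL spells
| the target type SIGNED rather than int):
|         SELECT CAST('1,031' AS int);
|         -- PostgreSQL: ERROR: invalid input syntax for type integer
|         -- MySQL (CAST AS SIGNED): returns 1, with a truncation warning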
| throw1234651234 wrote:
| Schemas are also a lot more usable than namespaces in SQL, as
| a random note.
| dvdkon wrote:
| What are you referring to as "SQL"? As far as I know, ISO-
| standard SQL doesn't have variables outside of its procedural
| language. I'm also not aware of any SQL-wide naming
| conventions. But I haven't read any of the standards, so I
| could be missing something.
| throw1234651234 wrote:
| "T-SQL" - in reality, everyone just refers to it as SQL
| Server even though it's obviously not a language.
|
| I find this syntax super helpful:
|
|         DECLARE @someName AS VARCHAR(100)
|         SET @someName = 'Bob'
|
| When coupled with SPs and Functions, you can write some
| fairly readable logic in SQL. I am completely against this
| for most applications, but when you need it, it helps.
| dvdkon wrote:
| Oh, alright. I know T-SQL and have actually missed
| variables in PostgreSQL before. Just please don't shorten
| it to "SQL", it's only one dialect of many.
| dragonwriter wrote:
| > have actually missed variables in PostgreSQL before
|
| You can just use a DO-block with plpgsql (or any other
| installed procedural language) code to make variables
| available in Postgres. T-SQL just somewhat elides the
| distinction between declarative and procedural code.
|
| https://www.postgresql.org/docs/current/sql-do.html
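|
| A minimal sketch of what that looks like (the variable is of
| course made up):
|         DO $$
|         DECLARE
|             some_name varchar(100) := 'Bob';
|         BEGIN
|             RAISE NOTICE 'Hello, %', some_name;
|         END
|         $$;
| It is more ceremony than the T-SQL version, but it gives you
| variables, conditionals and loops inside a single statement.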
| ksec wrote:
| >While much of this praise is certainly well-deserved, the lack
| of meaningful dissent left me a bit bothered.
|
| Had the same feeling when I was reading that thread, and have
| had it for quite some time whenever the hype gets over the top.
|
| The problem is that tech often seems to be a cult. On HN,
| mentioning that MySQL is better at certain things and hoping
| Postgres improves will draw out the Oracle haters and Postgres
| apologists, or, as they are titled in Silicon Valley,
| evangelists.
|
| And reading through all the blog posts from the author, this
| [1] caught my attention. Part of it is relevant to the
| discussion because AWS RDS solves most of those shortcomings.
| What I didn't realise was the 78% premium over EC2.
|
| [1] RDS Pricing Has More Than Doubled
|
| https://rbranson.medium.com/rds-pricing-has-more-than-double...
| twic wrote:
| I have a kneejerk reaction against "there is something,
| anything at all, wrong with PostgreSQL" posts. I don't _think_
| it's because I'm in a cult.
|
| I think it's because, despite real flaws, PostgreSQL is still
| the best all-round option, and still the thing I would most
| like to find when I move to a new company. Every post pointing
| out a flaw with PostgreSQL is potentially ammunition for an
| energetic but misguided early-stage employee of that company to
| say "no, let's not use PostgreSQL, let's use
| ${some_random_database_you_will_regret} instead".
|
| I suppose the root of this is that I basically don't trust
| other programmers to make good decisions.
| ksec wrote:
| That is true as well. I guess my point is that I want balanced
| views. I don't want one-sided opinion pieces.
| twic wrote:
| Me neither! As long as those balanced views are only posted
| on the secret internet, where mature and sophisticated
| programmers such as the two of us can read them.
| nesarkvechnep wrote:
| Hate is a strong feeling to have towards a database.
| wideareanetwork wrote:
| I find it hard to understand that despite long term withering
| criticism of the XID Wraparound issue, a definitive solution does
| not seem to have been prioritized.
___________________________________________________________________
(page generated 2021-04-06 23:01 UTC)