hngopher.com

       [HN Gopher] Why Uber Engineering Switched from Postgres to MySQL...
       ___________________________________________________________________
        
       Why Uber Engineering Switched from Postgres to MySQL (2016)
        
       Author : syspec
       Score  : 110 points
       Date   : 2021-02-27 07:34 UTC (15 hours ago)
        
 (HTM) web link (eng.uber.com)
 (TXT) w3m dump (eng.uber.com)
        
       | blowski wrote:
       | I spent a whole decade saying "Why do I need Postgres? MySQL is
       | fine."
       | 
       | Started using Postgres a couple of years ago, and I now can't
       | believe I ever lived without window functions, native arrays,
       | custom types, etc.
        
         | nkozyra wrote:
         | My experience as well. I first got the itch when mysql lagged
         | on postgis. Found so many features that have saved me so much
         | time. More sensible defaults, too.
        
         | mmcgaha wrote:
         | Arrays in Postgres are my guilty pleasure. I know I shouldn't
         | use them but I just cannot help myself.
        
           | lucian1900 wrote:
           | They're great for denormalisation, which is often an
           | appropriate trade off.
        
             | bitexploder wrote:
             | Even storing JSON blobs isn't the hit it used to be. It's
             | not relational, but you can index on fields in the JSON and
             | query it effectively. The Postgres array and other types
             | are great for stuff at the edges of your database or when
             | you know you won't need to build relations into or on that
             | data. RDBMS like Postgres really provide a powerful,
             | powerful technology for managing data.
        
             | colanderman wrote:
             | Yes! Have one-to-many data that you know you will usually
             | always grab all in one go, and won't ever participate in a
             | relation? Arrays are the way to go. JSONB can be similarly
             | highly appropriate. They can greatly reduce the number of
             | disk reads needed for certain workloads.
             | 
             | Don't forget that both can be indexed in Postgres! And the
             | indexes are more powerful than what you can do with the
             | equivalent relational layout, as they support efficient
             | subset queries.
        
               | sk5t wrote:
               | Same goes for hstore and the @> style operators.
        
         | dcposch wrote:
         | MySQL has window functions and native JSON now.
        
         | darksaints wrote:
         | Postgres was my first, and the Postgres docs were foundational
         | for someone like me.
         | 
         | Tried MySQL a couple years later, and every day I used it I
         | found a new reason to never use it again.
        
           | spiralx wrote:
           | I remember reading the MySQL developers justifications for
           | why transactions weren't important along with articles such
           | as these back in the 00s
           | 
           | https://sql-info.de/mysql/gotchas.html
           | 
           | https://fromdual.com/mysql-limitations
           | 
           | and deciding a database that treats 1/0 as NULL and allows
           | inserting February 31st as a valid date wasn't worth
           | bothering with.
        
           | axegon_ wrote:
           | If you liked the docs, you should definitely check out the
           | source code. It's not just poetry by C standards, it's poetry
           | by Shakespearean standards. Hands down the best code I've
           | ever seen. What makes it more astonishing is that it's a
           | product of relatively small community scattered around the
           | globe.
        
         | tpetry wrote:
         | The best part is transactional ddl statements. You can do your
         | database migration in a transaction, if something fails the
         | transaction is rolled back compared to an invalid state with
         | mysql.
        
           | chousuke wrote:
           | Being used to PostgreSQL, this one actually bit me once
           | during a production upgrade. A database migration failed due
           | to a Galera transaction size limit that I unfortunately
           | hadn't caught testing the migrations on a single database,
           | and I had to restore from the pre-upgrade backup before
           | resolving the issue and continuing. It wasn't a major issue
           | (the upgrade finished well within the acceptable downtime
           | window), but until then I had assumed that the migrations
           | would be transactional because of course they should be, it's
           | a database!
           | 
           | Now I know better than to assume everything you do in a
           | database is transactional. :P
        
             | spuz wrote:
             | Which database are you talking about? Postgres or Mysql?
        
               | llarsson wrote:
               | Something that happened when they were on MySQL, before
               | PostgreSQL.
               | 
               | The phrasing at the start and "Galera" mention shows
               | this.
        
           | derekperkins wrote:
           | That was added a couple years ago in MySQL 8, as were window
           | functions and arbitrary checks. The two databases continue to
           | fill in the gaps that were historically reasons to choose one
           | over the other.
        
             | paulryanrogers wrote:
             | Are Mysql 8 DDL changes transactional across tables? If not
             | then in what sense are they transactional?
             | 
             | Because to this day I still have to deal with locking
             | problems unless my changes are within a narrow window: last
             | column, no on-update or on-delete, not changing type, etc.
        
         | bombcar wrote:
         | I've wanted to try post geese but have never really had a
         | chance - everything I do is "prepackaged" and things like
         | Wordpress or Confluence really don't seem to care if it is
         | MySQL or Postgres.
        
           | [deleted]
        
           | noir_lord wrote:
           | You should definitely have a gander.
        
             | rrauenza wrote:
             | Especially at the possibilities of a lack of down time.
        
             | dismalpedigree wrote:
             | This really goosed my energy levels this morning!
        
           | blowski wrote:
           | I feel that pain! I knew MySQL so well that it felt risky to
           | use a different database, yet if I used it on a non-serious
           | project, how would I get real-world experience?
           | 
           | I learned a lot from a book from one of the core contributors
           | to Postgres - https://theartofpostgresql.com/. It has actual
           | real world examples with realistic datasets to experiment
           | with.
        
             | Proziam wrote:
             | I wish this resource existed (or that I knew it existed if
             | it did) years ago.
             | 
             | I always seem to learn about the things that would have
             | made my life easier after I've already done things the hard
             | way.
        
               | denysvitali wrote:
               | No pain no gain I guess. Learning things in a difficult
               | way is not a waste of time tbh
        
       | frankietaylr wrote:
       | Has Postgres architecture changed since Postgres 9.2 in terms of
       | the inefficiencies mentioned in the article?
        
         | pizza234 wrote:
         | The main point, clustered vs. nonclustered indexing, is
         | architectural, and not inherently inefficient; it depends on
         | the use case.
         | 
         | "Highly advanced" databases give both options, but AFAIK,
         | MySQL/PGSQL will likely not offer this, at least for a very
         | long time, since it requires radical changes.
        
           | evanelias wrote:
           | On the one hand, MySQL has offered this for two decades, by
           | virtue of pluggable storage engines being core to its design.
           | Some storage engines use clustered indexes and some do not.
           | The user can decide which one matches their use-case; very
           | large companies can design their own custom special-purpose
           | storage engines; etc.
           | 
           | On the other hand, mixing storage engines in a single db
           | instance has operational downsides (especially re: crash-safe
           | replication). And InnoDB is by far the dominant storage
           | engine, and is probably unlikely to offer nonclustered
           | indexing, so from that perspective I agree with your point.
        
             | Tostino wrote:
             | It'll be interesting to see how things shake out when some
             | of the other implementations using postgres's pluggable
             | storage API start maturing. I wonder if it'll have some of
             | the same operational downsides that mixing storage in MySQL
             | has.
        
         | glogla wrote:
         | Sure, but it does not really matter.
         | 
         | Uber has not switched from _Postgres used as RDBMS_ to _MySQL
         | used as RDBMS_ , they switched from _Postgres used as RDBMS_ to
         | _MySQL used as key-value storage layer of homegrown sharded
         | non-relational database_.
         | 
         | This has pretty much no bearing on anyone using Postgres or
         | MySQL in reasonable way.
        
           | throwdbaaway wrote:
           | Exactly. I think the prior HN discussions failed to call out
           | how painful it is to do any sort of schema migration against
           | a big InnoDB table [1][2].
           | 
           | Well known MySQL uses such as Facebook TAO and this Uber
           | Schemaless are typically abstractions built on top of MySQL,
           | which means the schemas are pretty much static, and they
           | don't feel the schema migration pain.
           | 
           | For a typical RoR startup that relies on a RDBMS, please,
           | stay away from MySQL.
           | 
           | [1] Yes, I know about the INSTANT ADD COLUMN patch from
           | Tencent Games that landed in MySQL 8.0, and which has had
           | major bug fixes in at least 8.0.14 and 8.0.20.
           | 
           | [2] A side effect is that MySQL now has a thriving ecosystem
           | of schema migration tools (pt-osc, lhm, gh-ost), while
           | Postgres has none, and there are situations where there is
           | indeed no choice but to rewrite the table, e.g. changing a
           | column type from int to bigint.
        
             | evanelias wrote:
             | To echo and add to rwultsch's sibling comment:
             | 
             | * Facebook had extremely frequent schema changes, and
             | powerful declarative schema management automation to
             | support this
             | 
             | * The TAO (or more correctly "UDB") use-case supported
             | using many separate tables, not one giant generic key/value
             | table as people seem to assume
             | 
             | * The non-UDB MySQL use-cases at Facebook, in combination,
             | are still larger than the vast, vast majority of all other
             | companies' databases. These non-UDB databases use a wide
             | range of MySQL's functionality. The frequent claims that
             | "Facebook used MySQL just as a dumb K/V store" are
             | absolutely incorrect and have never been correct.
        
             | rwultsch wrote:
             | FB had plenty of schema changes. I know, I wrote software
             | to push them out. The important concepts for pushing schema
             | at scale became part of Skeema.
        
           | alberth wrote:
           | What about Postgres native key-store called HStore?
           | 
           | https://www.postgresql.org/docs/8.3/hstore.html
           | 
           | I love Postgres just as much as anyone but Uber use case
           | still seemed to be a better fit for MySQL. I was hopeful this
           | would kickstart a renewed focus on features / architecture
           | within thr Postgres community and I'm not certain anything
           | resulted from this. Hope to be wrong obviously.
        
         | user5994461 wrote:
         | The replication was redone around that time (not sure which
         | version exactly). It's still working on the same principles
         | though, sending queries and redoing them on each replica.
         | 
         | Before in short, the WAL was sent every minute and always 10MB
         | even if there were no changes. Now it's more adaptive, actually
         | doing nothing when they are no changes, and picking up quicker
         | when changes begin.
         | 
         | I am surprised they don't mention this point because the
         | replication was really unusable in PostgreSQL.
         | 
         | There are still spikes (write amplification) and other drawback
         | from this design, but at least it doesn't shit itself under no
         | activity.
        
           | Quekid5 wrote:
           | I don't understand this. Why would it be sending queries and
           | redoing on a replica _and_ sending the WAL? Just sending the
           | WAL would seem to be sufficient, or alternatively: sending
           | queries would be redundant if you just send the WAL and apply
           | directly at the secondaries.
        
             | amenonsen wrote:
             | It doesn't make sense to you because everything in the
             | comment you're replying to is wrong.
             | 
             | Neither log shipping (copying WAL files one by one) nor
             | streaming replication (sending a stream of WAL) works by
             | sending queries. WAL segments are 16MB by default, and the
             | default archive_timeout is 0, not 1 minute (and the archive
             | timeout is not applicable to streaming replication anyway).
             | There is also nothing "adaptive" about the replication--
             | when there is no traffic, there will be ~no changes, and
             | when there are changes, they will be sent to the replica.
             | 
             | I don't understand what the comment is suggesting used to
             | happen in periods of no activity that made replication
             | unusable, but it is also probably incorrect, and has
             | nothing to do with the write amplification problem.
        
               | Quekid5 wrote:
               | Thank you for confirming my suspicions :).
        
             | chousuke wrote:
             | In streaming (physical) replication, PostgreSQL sends the
             | WAL only, and it's applied on the replica; no "sending of
             | queries" is involved, or even possible; with physical
             | replication, the secondary has to keep itself identical
             | with the primary, otherwise replication will fail. This is
             | why you can't use physical replication across major
             | versions, since they can't be bit-for-bit identical.
             | 
             | In more recent versions there's "logical replication",
             | which sort of "sends the queries", in that the secondary
             | node has its own database state that does not have to be
             | exactly identical with the primary, allowing for
             | replication across major versions.
             | 
             | In my opinion though, unless you really _need_ logical
             | replication for some reason, stick with streaming
             | replication. It 's much easier to understand and there are
             | fewer failure modes.
        
               | amenonsen wrote:
               | > In more recent versions there's "logical replication",
               | which sort of "sends the queries"
               | 
               | What it sends is not the queries, but a logical
               | description of the changes to each row that were made by
               | running the query. So an UPDATE that changes N rows would
               | generate N changes to be applied to the corresponding
               | rows (usually identified by primary key) on the logical
               | replica, not a single update that had to be "re-
               | executed".
        
       | moonbug wrote:
       | Article about Uber are always instructive: read and then do the
       | exact opposite.
        
       | chovybizzass wrote:
       | because its simpler? i found postgres to be overly complicated
       | compared to MySQL/Maria
        
         | DaiPlusPlus wrote:
         | Define: "complicated"
        
           | corty wrote:
           | PostgreSQL: "your date 2020-02-31 isn't a date, fix that"
           | 
           | MySQL: "2020-02-31? Whatever man, I'll just enter
           | something..."
        
             | evanelias wrote:
             | MySQL's default settings reject invalid dates for over 5
             | years now, since MySQL 5.7.
             | 
             | How long are people going to keep repeating this complaint?
             | Literally every version of MySQL and MariaDB that allows
             | invalid dates by default (MySQL 5.6 and older, MariaDB 10.1
             | and older) has reached end-of-life for upstream support
             | from the vendor!
        
               | [deleted]
        
             | consp wrote:
             | Considering the US uses a weird date format, I definitely
             | prefer the former in combination with input sanitation
             | forcing you to thing about your actions before assuming the
             | database will fix it for you.
        
               | corty wrote:
               | Agreed. As with strong typing in programming languages, I
               | do prefer a database to be strict in rejecting invalid
               | inputs. MySQL does two bad things here: It accepts an
               | invalid input plus it interprets it creatively, producing
               | something the user most probably didn't intend. In that
               | respect, MySQL is almost as bad as Excel creatively
               | "interpreting" dates.
               | 
               | Another example of a database doing improper things would
               | be Oracle mixing up the empty string with NULL. In
               | Oracle, both are the same...
               | 
               | MySQL has a few more of those gotchas, e.g. regarding
               | broken charsets (UTF-8 isn't 'utf8', it is 'utf8mb4',
               | 'utf8' is an alias for 'utf8mb3' which is a broken
               | subset). I wouldn't use MySQL for any data that was
               | important to get back consistently. However, since Uber
               | seems to be using some schemaless "we don't care"-layer
               | anyways, that point is moot for the original article.
        
               | vinger wrote:
               | utf8mb3 isn't a broken subset. It includes characters in
               | the basic multilingual plane which is fine for 99% of
               | cases.
               | 
               | uft8mb4 uses more bytes than necessary for most. You get
               | 255 varchar limits with uft8mb3, I think you only get 192
               | characters with uft8mb4.
        
               | Alexendoo wrote:
               | uft8mb4 does not use more bytes than utf8mb3, for
               | anything representable by utf8mb3 the size in uft8mb4 is
               | identical. Anything that would be 4 bytes in uft8mb4 is
               | not able to be stored in utf8mb3
               | 
               | utf8mb3 is definitely a broken subset, it's deprecated at
               | the very least
        
               | bbarnett wrote:
               | https://dev.mysql.com/doc/refman/8.0/en/sql-
               | mode.html#sql-mo...
               | 
               | Much of your MySQL complaint is not a MySQL issue, but a
               | config issue.
               | 
               | And yes, powerful config options are good, not bad.
        
               | BitPirate wrote:
               | One can still complain about mysqls dumb defaults.
               | 
               | https://dev.mysql.com/doc/refman/8.0/en/innodb-
               | parameters.ht...
               | 
               | "Hey, let's just not act transactional on a timeout by
               | default"
        
               | evanelias wrote:
               | That default is sensible. You're making it sound like
               | this setting applies to all "timeouts" regardless of
               | type. That is not the case.
               | 
               | This setting applies to lock-wait timeouts. The default
               | value allows applications to decide on the correct course
               | of action: either re-try just the statement that timed
               | out (without having to re-do the previous parts of the
               | transaction), or rollback the transaction.
               | 
               | The application still receives an error on the timeout
               | regardless of this setting. The database doesn't
               | automatically commit the previous statements in the
               | transaction regardless of this setting.
        
             | exikyut wrote:
             | Ooh, sounds like PHP 5!
        
             | rubyist5eva wrote:
             | The mysql behavior terrifies me because eventually you're
             | going to end up with something you didn't expect and it's
             | going to be a pain in the butt to track it down.
        
             | isoprophlex wrote:
             | That doesn't exactly sell me on MySQL, sounds like a recipe
             | for disaster
        
               | corty wrote:
               | There is more, however not everything is still valid or
               | valid for every table type in MySQL: https://sql-
               | info.de/mysql/gotchas.html
        
               | evanelias wrote:
               | "This page deals with issues related to MySQL 4.1 and
               | earlier, not 5.0"
               | 
               | MySQL 5.0 came out over _15 years ago_.
        
             | chris_wot wrote:
             | That's not decreasing complexity, that's just shifting it
             | out of the database layer.
             | 
             | See https://news.ycombinator.com/item?id=26272084
        
             | martimarkov wrote:
             | Oh yeah old complication of sanitising input...
             | 
             | It's like asking you for an int and you entering 2.358 and
             | saying that's just simple.
        
           | sschueller wrote:
           | Recovering a split brain Galera cluster. Fun times...
        
           | eznzt wrote:
           | Just creating a new user is annoying enough.
           | 
           | Permissions are also much more complex.
           | 
           | What the hell are schemas?
        
             | looperhacks wrote:
             | How is creating a new user complicated? The normal CREATE
             | USER is all I've ever needed to create a new user in
             | postgres (assuming I don't have set up the pg_hba so that I
             | need to allow every user separately)
        
               | mgkimsal wrote:
               | Most tutorials/instructions I read have you use
               | "createuser" command from the system shell. But... you
               | have to be able to switch to a system 'postgres' user
               | _first_ , which ... perhaps you don't have privileges to
               | do, or need sudo access or whatnot.
               | 
               | If you _can_ install postgres, _connect to it directly
               | with some sort of root identity_ , then immediately
               | create users and databases (as is the case with pretty
               | much every mysql walk-through I've ever seen), it's not a
               | default.
               | 
               | https://wiki.postgresql.org/wiki/First_steps
               | 
               | "The default authentication mode is set to 'ident' which
               | means a given Linux user xxx can only connect as the
               | postgres user xxx."
               | 
               | This alone is a complicated/confusing thing, because it's
               | mixing system accounts with the db server accounts/access
               | - and none of that is obvious, and doesn't quite map to
               | how other databases handle things. I've never had to have
               | matching system account names for user access in MSSQL,
               | for example.
        
               | chousuke wrote:
               | With MySQL, you'll still have to switch to root to
               | connect by default? I honestly don't remember, since it's
               | been ages since I set up MySQL manually.
               | 
               | If MySQL actually allows administrative access out-of-
               | the-box without any kind of special authorization, then
               | that's a _terribly_ insecure default.
               | 
               | With PostgreSQL, you have to switch to the superuser to
               | configure things further because that's the only sane
               | default you can have on an unconfigured system. If you
               | can run commands as the user PostgreSQL is running as,
               | you are "safe" to trust, and PostgreSQL will let you in.
               | 
               | UNIX ident authentication is also is extremely convenient
               | for local applications, since you don't even have to have
               | a password for the account, or make the PostgreSQL server
               | network-accessible in any way.
               | 
               | Oracle can do the same thing, and so can MySQL,
               | apparently (with IDENTIFIED VIA unix_socket).
               | 
               | MySQL user management has its own complexity in that you
               | have to manage "user@address" identities, and the same
               | user at different addresses or auth methods can have
               | different permissions. How's that "simple"? With
               | PostgreSQL, your users will at least map to the same user
               | regardless of how they authenticate themselves.
        
               | mgkimsal wrote:
               | "With MySQL, you'll still have to switch to root to
               | connect by default? "
               | 
               | You connect with a root account from any account, and
               | when installed, the root account password is part of the
               | setup process.
               | 
               | "and the same user at different addresses or auth methods
               | can have different permissions"....
               | 
               | It joe@localhost and joe@remotehost don't have to be 'the
               | same user' in that they're not tied to a system account
               | in any way.
               | 
               | Granting different privileges to joe@local and joe@remote
               | based on where they're coming from isn't necessarily
               | "simple", but no one claimed it was. My own response was
               | validating that PostgreSQL user setup was somewhat
               | confusing.
               | 
               | EDIT: Bringing up "mysql sucks" points when I was
               | explaining how PostgreSQL 'create user' stuff can be
               | confusing just reeks of whataboutism.
        
               | chousuke wrote:
               | I'm just not sure how it's confusing? PostgreSQL users
               | aren't "tied" to system accounts either. You can have any
               | number of PostgreSQL users that have no system
               | equivalent.
               | 
               | In fact, the process seems to be exactly the same as with
               | MySQL: I just tried installing the MariaDB server (dnf
               | install mariadb-server), and it didn't prompt me for an
               | admin user; instead, I can directly connect to the
               | database as root using sudo, so in this case it appears
               | to be doing the exact same thing that PostgreSQL does.
               | 
               | It just happens to be that by default the "postgres"
               | superuser has a corresponding "postgres" system user that
               | can log in via OS authentication, so you need to switch
               | to the postgres user instead of root.
               | 
               | EDIT: Maybe some of the confusion stems from the fact
               | that the documentation you linked seems to assume that
               | the database is created according to convention to run as
               | the "postgres" user (as it usually is). If your user
               | didn't have the required permission to switch to the
               | postgres user, they wouldn't be able to install the
               | database as said user in the first place.
               | 
               | If you install PostgreSQL as your own user (which is not
               | a good idea if you have any other option), you will not
               | need to switch users as you will obviously have access to
               | the database files and can do whatever you want, anyway.
        
               | rleigh wrote:
               | This depends entirely on how you want to set up and run
               | the system. For packaged versions running as a system
               | service with a dedicated service user, this is absolutely
               | correct. And I would argue, it's a pretty sensible
               | default arrangement.
               | 
               | But... there's absolutely nothing prohibiting you from
               | running initdb as a regular user and then running the
               | main daemon with your credentials. You are then the
               | database owner and superuser. This type of thing is
               | really useful for integration testing. But it's
               | potentially useful when you don't care about the
               | multiuser aspect and just want to have it run.
        
               | vinger wrote:
               | Does postgres still require a separate user to access the
               | db.
               | 
               | I remember this being a limitation in 2008.
               | 
               | That forced user creation always pushed me to mysql
               | because I hate having separate users for each service
               | because you still have to manage and account for these
               | extra accounts.
        
             | isoprophlex wrote:
             | "What the hell is scoping? Why not put every variable in
             | global scope? Much less complex"
        
             | paulryanrogers wrote:
             | Schemas are SQL standard namespaces within a DB. You can
             | join between them. Mysql allows joining across databases,
             | so it doesn't implement schemas.
        
             | tinus_hn wrote:
             | It would be great if there was some management GUI for
             | these tasks so you don't have to look up the syntax for
             | these things that in many deployments you only do once.
        
               | mvanbaak wrote:
               | pgadmin ;-P
        
               | tinus_hn wrote:
               | This actually looks pretty reasonable, I am going to look
               | into it. First I need to figure out how to open up the
               | server for connections but still limit it, though.
        
               | jhauris wrote:
               | Look at the pg_hba.conf file (probably something like
               | /etc/postgresql/<version>/main/pg_hba.conf).
               | 
               | https://www.postgresql.org/docs/current/auth-pg-hba-
               | conf.htm...
        
               | tinus_hn wrote:
               | Perhaps it's better to only listen to localhost and
               | connect through a SSH proxy connection
        
               | rleigh wrote:
               | Or DataGrip if you already have a JetBrains licence.
        
               | rektide wrote:
               | both kubernetes operators have some ok user management
               | built in
        
             | lucian1900 wrote:
             | What MySQL calls databases are actually schemas. They're
             | even aliases as such.
             | 
             | MySQL doesn't have multiple SQL databases, you've been
             | using multiple schemas.
        
             | oftenwrong wrote:
             | Schemas are similar to databases in mysql. They serve as a
             | namespace. In mysql you can have a database `foo` and a
             | table `foo.bar`. In postgres you can have a schema `foo`
             | and a table `foo.bar`. In postgres you can have multiple
             | databases in a cluster, and multiple schemas within each of
             | those databases.
        
               | eznzt wrote:
               | What you are basically saying is that yes, they are much
               | more complex.
        
               | desas wrote:
               | It's another layer, you don't have to use it. If you
               | pretend schemas don't exist you basically never know they
               | do, unless you go looking for complexity in the postgres
               | bowels.
        
               | Tostino wrote:
               | To be honest, it's better to think of MySQL db = Postgres
               | schema, because I'm MySQL you can do cross db queries and
               | there is no intermediary schema level, and in Postgres
               | you can do cross schema queries, but not cross db
               | queries.
        
               | chris_wot wrote:
               | No, what you mistake for "complexity" seems to be your
               | general unfamiliarity of what a schema is. In fact, when
               | you understand what schemas are, then it actually makes a
               | lot more sense.
        
               | rleigh wrote:
               | They are not complex, and they are entirely optional.
               | They are just a namespace. You are free to never, ever
               | use them.
               | 
               | For a company the size of Uber, I don't think spending
               | five minutes reading the documentation for createuser is
               | a significant burden to deployment. PostgreSQL is very
               | easy to deploy.
        
       | [deleted]
        
       | petergeoghegan wrote:
       | I committed a patch that added a mechanism I called "bottom-up
       | index deletion" recently:
       | 
       | https://www.postgresql.org/docs/devel/btree-implementation.h...
       | 
       | https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
       | 
       | Bottom-up deletion is specifically designed to ameliorate what
       | the blog post refers to as "write amplification". Testing has
       | shown that it's very effective with many workloads.
        
         | Tostino wrote:
         | Just wanted to say how impressed I was with this solution and
         | the results it achieved when I was following the development on
         | -hackers.
        
           | petergeoghegan wrote:
           | Thanks
        
           | syspec wrote:
           | What is -hackers?
        
             | petergeoghegan wrote:
             | The Postgres community mailing list for development work --
             | pgsql-hackers.
        
       | KingOfCoders wrote:
       | 2016.
        
       | junon wrote:
       | There's some historical context for this article. 2016 was a year
       | of RAPID growth for Uber. There was a running statistic
       | internally that your employee ID would be at the median point
       | just 6 months after being hired.
       | 
       | They were trying to hire (and poach) just about anyone they could
       | around this time. Therefore, these articles are... very shiny,
       | compared to the actual tech applied internally (note that even
       | though Uber is referred to in the third person here, this is on
       | uber.com and written by an Uber employee).
       | 
       | I worked at Uber for a year. Schemaless was... meh. Nobody really
       | liked using it, nobody really understood it, and you weren't
       | really allowed to host your own instance - you had to have
       | another internal team do it for you, which didn't help the
       | "understanding" problem.
       | 
       | It smelled distinctly of "not invented here" syndrome. A number
       | of things inside Uber worked that way - the culture was so
       | competitive and brutal, performance reviews were always a
       | massacre, so everyone was trying to outshine their peers (or
       | outright climb on their backs, etc).
       | 
       | This resulted in a LOT of "tech" being "invented" that 1:1 did
       | something already prominent in open-source or was already an
       | enterprise solution (probably cheaper than paying engineers to do
       | it) but since actually achieving it and having your name on it
       | meant you would look better for a promotion or a bonus or
       | whatever over a colleague meant it was worth it to the individual
       | to reinvent the wheel. Rinse and repeat over and over again.
       | 
       | I'm not an enemy of reinventing the wheel, mind you. But only if
       | the new wheel works significantly better than the old one. This
       | was rarely the case at Uber.
       | 
       | Postgres was still used somewhat commonly at Uber when I was
       | there, but they were really pushing for Schemaless internally. It
       | felt very overkill for just about everything outside the platform
       | teams and was always, without fail, a massive pain to deal with.
       | 
       | Don't be fooled by these Uber engineering articles. This was PR
       | to bolster up their OSS image to outsiders to help with hiring
       | and poaching at the time. Things internally looked very
       | different.
        
         | midrus wrote:
         | I think this applies to most companies. What they write in
         | their blogs is a shiny, optimistic, limited view of the best
         | part of their best system or similar. Once inside, things are
         | never that great.
         | 
         | I myself was very ashamed of a company I worked for (also SF
         | based) blog post... even the author of the post was a very well
         | known open source maintainer of many libraries of a very
         | popular programming language. Reading the posts in the blog was
         | like.... I cannot believe we lie this big... internally things
         | were just crap, and what the blog post made look like it was
         | the norm, was just a side project of this person.
         | 
         | So, never trust companies blog posts by default.
        
           | fma wrote:
           | Can we maybe get a list of company blogs that are legit? As
           | in they practice what they preach?
           | 
           | Facebook, Amazon, Netflix Google, Microsoft comes to the top
           | of my head but someone is free to burst my bubble.
        
             | staticassertion wrote:
             | Cloudflare's blogs seem far less markety and far more about
             | sharing interesting technical wins.
        
               | noir_lord wrote:
               | I've dealt with cloudflare people semi-regulary recently
               | and they seem to know their shit, they also (from an
               | outside perspective) seem to have a really good culture.
               | 
               | Questions get answered quickly and with a level of detail
               | appropriate to the person asking.
               | 
               | Really impressive given the usual "Enterprise(TM)" level
               | of support from most service suppliers.
        
             | spiralx wrote:
             | Figma and Instagram have had good articles on their blogs,
             | although I'm not a regular follower of either.
        
           | mlthoughts2018 wrote:
           | I agree. I once worked for a company that wrote a blog post
           | on reinventing and upgrading A/B tests, but the ugly truth
           | was that the company couldn't run an A/B test to save its
           | life. Every A/B test was a disaster, there were fragmented
           | different frameworks for A/B tests in different parts of the
           | product, with inconsistent and unreliable data, and the core
           | clickstream ingestion system that was the foundation for any
           | possible way of testing would crash and go down for hours
           | every few weeks, and product management would just silently
           | ignore any effects of missing data or correlation between
           | failures in different product features. Even though we had a
           | team of statistical researchers, they were treated as if
           | legit stats 101 concerns were just ivory tower academic hair
           | splitting and often were silently omitted from being in A/B
           | test design, recap or decision meetings, which were instead
           | run by product managers with no stats training.
           | 
           | I remember interviewing a candidate once who said he was
           | excited about the role because of that A/B testing blog post,
           | and I just thought - geez what a completely soulless bait and
           | switch ploy.
        
         | supergirl wrote:
         | is schemaless still the db now?
        
           | junon wrote:
           | No idea, but I would wager so. The scale it was being used at
           | would be hard to replace, and they'd have little reason to do
           | so.
           | 
           | EDIT: To be clear, there is no "the DB" at Uber. It was the
           | main database flavor that the larger teams used, but they
           | used everything at Uber, from MySQL to postgres, to mongo,
           | etc. Sometimes with things on top, sometimes directly. For
           | more analytical/financial things, they used
           | HBase/Hadoop/Cassandra, or even older things like old IBM
           | database tech from the 80's. Really weird mix of stuff, and
           | it really depended on which high-profile engineers they hired
           | in which part of the company.
        
       | arnejenssen wrote:
       | Does Uber use event sourcing?
        
         | exhaze wrote:
         | Yes
        
       | lumost wrote:
       | TBH, everything that was listed as a complaint could be a
       | complaint for nearly any transactional RDBMS. For workloads that
       | require heavy always on replication and availability RDBMS's
       | haven't been the go to solution for a long time vs. distributed
       | DBs. Changing from Postgres to Mysql or MySQL to Postgres (or
       | even Oracle) won't really buy you much if you're running into
       | these issues.
       | 
       | Even this one
       | 
       | > The bug we ran into only affected certain releases of Postgres
       | 9.2 and has been fixed for a long time now. However, we still
       | find it worrisome that this class of bug can happen at all.
       | 
       | (rare/specific) Data Corruption bugs around master-promotion and
       | handoff occur in every major DB. MySQL is no different, and I've
       | personally had to track down issues in a few popular products. If
       | you run thousands of copies of a piece of software with different
       | workloads and hardware configurations... you're going to find
       | bugs.
       | 
       | After all - how many DBs passed Jepsen on the first shot!
        
       | ignoramous wrote:
       | Previous discussions:
       | 
       | 2016: https://news.ycombinator.com/item?id=12166585
       | 
       | 2018: https://news.ycombinator.com/item?id=17280239
       | 
       | Community responses:
       | 
       | - https://news.ycombinator.com/item?id=12216680
       | 
       | - https://news.ycombinator.com/item?id=12179222
        
         | chris_wot wrote:
         | The response was:
         | 
         | https://news.ycombinator.com/item?id=14222721
        
           | asah wrote:
           | Even though 99.9% of applications will never run into Uber's
           | issues, it's been 4 years and 4 major versions later, and I'd
           | love to review these complaints and see if they still apply
           | to PG 13.
        
             | Tostino wrote:
             | Wait until 14 for that comparison and it'll look much
             | better. Bottom up index deletion helped solve some of the
             | write amplification issues.
        
               | petergeoghegan wrote:
               | There is also index deduplication in Postgres 13 and the
               | B-Tree enhancements in Postgres 12. All of these
               | enhancements significantly improved the situation for
               | workloads affected by what the blog post calls write
               | amplification. (I myself call this phenomenon index
               | version churn, since it is more descriptive and has less
               | baggage.)
               | 
               | I was the author of all of the above, including the
               | Postgres 14 work you mentioned (though Anastasia
               | Lubennikova was the primary author of index
               | deduplication). To me it feels like one very large
               | project -- the effects are cumulative, and each major
               | Postgres version had B-Tree work that built on the last
               | release in one way or another.
        
         | dang wrote:
         | Thanks! Here's an annotated list of all those (plus chris_wot's
         | and a couple others):
         | 
         |  _Why Uber Engineering Switched from Postgres to MySQL (2016)_
         | - https://news.ycombinator.com/item?id=17280239 - June 2018 (47
         | comments)
         | 
         |  _Re: Why Uber Engineering Switched from Postgres to MySQL_ -
         | https://news.ycombinator.com/item?id=12179222 - July 2016 (67
         | comments)
         | 
         |  _Why Uber Engineering Switched from Postgres to MySQL_ -
         | https://news.ycombinator.com/item?id=12166585 - July 2016 (294
         | comments)
         | 
         |  _Thoughts on Uber's List of Postgres Limitations_ -
         | https://news.ycombinator.com/item?id=12216680 - Aug 2016 (103
         | comments)
         | 
         |  _A PostgreSQL response to Uber [pdf]_ -
         | https://news.ycombinator.com/item?id=14222721 - April 2017 (82
         | comments)
         | 
         |  _Why we lost Uber as a user_ -
         | https://news.ycombinator.com/item?id=12201353 - Aug 2016 (285
         | comments)
         | 
         |  _Uber 's Move Away from PostgreSQL_ -
         | https://news.ycombinator.com/item?id=12223216 - Aug 2016 (15
         | comments)
        
       | burnthrow wrote:
       | 2016
        
       | huy-nguyen wrote:
       | How does Schemaless compare to Vitess (https://vitess.io/)?
        
       | villgax wrote:
       | Lol, a db that squirms at unicode/utf-8 out of the box?
        
       ___________________________________________________________________
       (page generated 2021-02-27 23:03 UTC)