[HN Gopher] Why Uber Engineering Switched from Postgres to MySQL...
___________________________________________________________________
Why Uber Engineering Switched from Postgres to MySQL (2016)
Author : syspec
Score : 110 points
Date : 2021-02-27 07:34 UTC (15 hours ago)
(HTM) web link (eng.uber.com)
| blowski wrote:
| I spent a whole decade saying "Why do I need Postgres? MySQL is
| fine."
|
| Started using Postgres a couple of years ago, and I now can't
| believe I ever lived without window functions, native arrays,
| custom types, etc.
| nkozyra wrote:
| My experience as well. I first got the itch when MySQL lagged
| behind PostGIS. Found so many features that have saved me so
| much time. More sensible defaults, too.
| mmcgaha wrote:
| Arrays in Postgres are my guilty pleasure. I know I shouldn't
| use them but I just cannot help myself.
| lucian1900 wrote:
| They're great for denormalisation, which is often an
| appropriate trade off.
| bitexploder wrote:
| Even storing JSON blobs isn't the hit it used to be. It's
| not relational, but you can index on fields in the JSON and
| query it effectively. The Postgres array and other types
| are great for stuff at the edges of your database or when
| you know you won't need to build relations into or on that
| data. RDBMS like Postgres really provide a powerful,
| powerful technology for managing data.
| colanderman wrote:
| Yes! Have one-to-many data that you know you will almost
| always fetch in one go, and that won't ever participate in a
| relation? Arrays are the way to go. JSONB can be similarly
| highly appropriate. They can greatly reduce the number of
| disk reads needed for certain workloads.
|
| Don't forget that both can be indexed in Postgres! And the
| indexes are more powerful than what you can do with the
| equivalent relational layout, as they support efficient
| subset queries.
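The subset-query point can be illustrated with a toy inverted index, the idea behind Postgres's GIN indexes on arrays and JSONB. Each element maps to the set of row ids that contain it, so a containment query such as `tags @> ARRAY['postgres','arrays']` becomes an intersection of posting sets. This is a minimal in-memory sketch; the class and method names are hypothetical, and real GIN indexes store posting lists on disk and are considerably more involved.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: element -> set of row ids whose array contains it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.rows = {}

    def insert(self, row_id, elements):
        self.rows[row_id] = list(elements)
        for element in set(elements):
            self.postings[element].add(row_id)

    def contains_all(self, query):
        """Row ids whose array contains every element of `query` (like `@>`)."""
        query = set(query)
        if not query:
            # Every array contains the empty set.
            return set(self.rows)
        # Intersect the smallest posting sets first to keep intermediates small.
        sets = sorted((self.postings.get(e, set()) for e in query), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
            if not result:
                break
        return result


idx = InvertedIndex()
idx.insert(1, ["postgres", "mysql"])
idx.insert(2, ["postgres", "gis", "arrays"])
idx.insert(3, ["postgres", "arrays"])
print(sorted(idx.contains_all(["postgres", "arrays"])))  # [2, 3]
```

The equivalent normalized layout (a separate row per tag) would need a join or a GROUP BY ... HAVING count to answer the same "has all of these" question.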
| sk5t wrote:
| Same goes for hstore and the @> style operators.
| dcposch wrote:
| MySQL has window functions and native JSON now.
| darksaints wrote:
| Postgres was my first, and the Postgres docs were foundational
| for someone like me.
|
| Tried MySQL a couple years later, and every day I used it I
| found a new reason to never use it again.
| spiralx wrote:
| I remember reading the MySQL developers' justifications for
| why transactions weren't important, along with articles such
| as these, back in the 00s:
|
| https://sql-info.de/mysql/gotchas.html
|
| https://fromdual.com/mysql-limitations
|
| and deciding a database that treats 1/0 as NULL and allows
| inserting February 31st as a valid date wasn't worth
| bothering with.
| axegon_ wrote:
| If you liked the docs, you should definitely check out the
| source code. It's not just poetry by C standards, it's poetry
| by Shakespearean standards. Hands down the best code I've
| ever seen. What makes it more astonishing is that it's a
| product of a relatively small community scattered around the
| globe.
| tpetry wrote:
| The best part is transactional DDL statements. You can run your
| database migration in a transaction; if something fails, the
| transaction is rolled back, instead of leaving the database in
| an invalid state as with MySQL.
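The migration-in-a-transaction pattern can be tried without a Postgres server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since SQLite also runs DDL transactionally; the table names and the deliberately failing third step are made up for illustration. With Postgres the same BEGIN/ROLLBACK flow applies to most DDL through any driver, with exceptions such as `CREATE INDEX CONCURRENTLY`, which cannot run inside a transaction block.

```python
import sqlite3


def run_migration(conn, statements):
    """Apply all DDL statements atomically: either every one succeeds or none do."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/COMMIT/ROLLBACK above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

migration = [
    "ALTER TABLE users ADD COLUMN email TEXT",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)",
    "CREATE TABLE users (id INTEGER)",  # fails: table already exists
]

try:
    run_migration(conn, migration)
except sqlite3.OperationalError as exc:
    print("migration failed:", exc)

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns)  # ['id', 'name']  (the ADD COLUMN was rolled back)
print(tables)   # ['users']       (orders was never committed)
```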
| chousuke wrote:
| Being used to PostgreSQL, this one actually bit me once
| during a production upgrade. A database migration failed due
| to a Galera transaction size limit that I unfortunately
| hadn't caught when testing the migrations on a single database,
| and I had to restore from the pre-upgrade backup before
| resolving the issue and continuing. It wasn't a major issue
| (the upgrade finished well within the acceptable downtime
| window), but until then I had assumed that the migrations
| would be transactional because of course they should be, it's
| a database!
|
| Now I know better than to assume everything you do in a
| database is transactional. :P
| spuz wrote:
| Which database are you talking about? Postgres or Mysql?
| llarsson wrote:
| Something that happened when they were on MySQL, before
| PostgreSQL.
|
| The phrasing at the start and the "Galera" mention show
| this.
| derekperkins wrote:
| That was added a couple years ago in MySQL 8, as were window
| functions and arbitrary CHECK constraints. The two databases continue to
| fill in the gaps that were historically reasons to choose one
| over the other.
| paulryanrogers wrote:
| Are Mysql 8 DDL changes transactional across tables? If not
| then in what sense are they transactional?
|
| Because to this day I still have to deal with locking
| problems unless my changes are within a narrow window: last
| column, no on-update or on-delete, not changing type, etc.
| bombcar wrote:
| I've wanted to try post geese but have never really had a
| chance - everything I do is "prepackaged" and things like
| Wordpress or Confluence really don't seem to care if it is
| MySQL or Postgres.
| [deleted]
| noir_lord wrote:
| You should definitely have a gander.
| rrauenza wrote:
| Especially at the possibilities of a lack of down time.
| dismalpedigree wrote:
| This really goosed my energy levels this morning!
| blowski wrote:
| I feel that pain! I knew MySQL so well that it felt risky to
| use a different database, yet if I used it on a non-serious
| project, how would I get real-world experience?
|
| I learned a lot from a book from one of the core contributors
| to Postgres - https://theartofpostgresql.com/. It has actual
| real world examples with realistic datasets to experiment
| with.
| Proziam wrote:
| I wish this resource existed (or that I knew it existed if
| it did) years ago.
|
| I always seem to learn about the things that would have
| made my life easier after I've already done things the hard
| way.
| denysvitali wrote:
| No pain, no gain, I guess. Learning things the hard way
| is not a waste of time, tbh.
| frankietaylr wrote:
| Has Postgres architecture changed since Postgres 9.2 in terms of
| the inefficiencies mentioned in the article?
| pizza234 wrote:
| The main point, clustered vs. nonclustered indexing, is
| architectural, and not inherently inefficient; it depends on
| the use case.
|
| "Highly advanced" databases give both options, but AFAIK,
| MySQL/PGSQL will likely not offer this, at least for a very
| long time, since it requires radical changes.
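A rough model of why the index layout matters for updates, which is the trade-off the article centers on. It assumes a Postgres-style heap, where index entries point at a physical row location, so a non-HOT update needs a new entry in every index (HOT, heap-only tuple, updates can skip index maintenance when no indexed column changes and the new version fits on the same page). It contrasts this with an InnoDB-style clustered table, where secondary entries store the primary key, so only indexes on changed columns need touching. This is a simplified Python sketch; HOT, page details, and MVCC cleanup are ignored, and the index names are hypothetical.

```python
def index_writes_for_update(indexes, changed_columns, design):
    """Count index entries written by one row UPDATE under a simplified model.

    indexes: dict of index name -> tuple of indexed columns; "primary" is the PK.
    design:  "heap" (entries point at a physical row location, Postgres-style)
             or "clustered" (secondary entries store the PK, InnoDB-style).
    """
    changed = set(changed_columns)
    if design == "heap":
        # A non-HOT update writes the new row version at a new location,
        # so every index needs an entry pointing at it.
        return len(indexes)
    if design == "clustered":
        # Changing the primary key moves the row, so every index is touched.
        if changed & set(indexes.get("primary", ())):
            return len(indexes)
        # Otherwise: one write for the row in the clustered index, plus any
        # secondary index whose key columns changed.
        secondary = sum(
            1
            for name, columns in indexes.items()
            if name != "primary" and changed & set(columns)
        )
        return 1 + secondary
    raise ValueError(f"unknown design: {design!r}")


indexes = {
    "primary": ("id",),
    "by_email": ("email",),
    "by_city": ("city",),
    "by_created": ("created_at",),
}
print(index_writes_for_update(indexes, ["city"], "heap"))       # 4
print(index_writes_for_update(indexes, ["city"], "clustered"))  # 2
```

Neither layout is free: the clustered design pays instead with an extra primary-key lookup on every secondary-index read, which is why the choice depends on the workload.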
| evanelias wrote:
| On the one hand, MySQL has offered this for two decades, by
| virtue of pluggable storage engines being core to its design.
| Some storage engines use clustered indexes and some do not.
| The user can decide which one matches their use-case; very
| large companies can design their own custom special-purpose
| storage engines; etc.
|
| On the other hand, mixing storage engines in a single db
| instance has operational downsides (especially re: crash-safe
| replication). And InnoDB is by far the dominant storage
| engine, and is probably unlikely to offer nonclustered
| indexing, so from that perspective I agree with your point.
| Tostino wrote:
| It'll be interesting to see how things shake out when some
| of the other implementations using postgres's pluggable
| storage API start maturing. I wonder if it'll have some of
| the same operational downsides that mixing storage in MySQL
| has.
| glogla wrote:
| Sure, but it does not really matter.
|
| Uber has not switched from _Postgres used as RDBMS_ to _MySQL
| used as RDBMS_, they switched from _Postgres used as RDBMS_ to
| _MySQL used as key-value storage layer of homegrown sharded
| non-relational database_.
|
| This has pretty much no bearing on anyone using Postgres or
| MySQL in a reasonable way.
| throwdbaaway wrote:
| Exactly. I think the prior HN discussions failed to call out
| how painful it is to do any sort of schema migration against
| a big InnoDB table [1][2].
|
| Well known MySQL uses such as Facebook TAO and this Uber
| Schemaless are typically abstractions built on top of MySQL,
| which means the schemas are pretty much static, and they
| don't feel the schema migration pain.
|
| For a typical RoR startup that relies on an RDBMS, please,
| stay away from MySQL.
|
| [1] Yes, I know about the INSTANT ADD COLUMN patch from
| Tencent Games that landed in MySQL 8.0, and which has had
| major bug fixes in at least 8.0.14 and 8.0.20.
|
| [2] A side effect is that MySQL now has a thriving ecosystem
| of schema migration tools (pt-osc, lhm, gh-ost), while
| Postgres has none, and there are situations where there is
| indeed no choice but to rewrite the table, e.g. changing a
| column type from int to bigint.
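The tools listed above work roughly by building a shadow copy of the table with the new definition, copying rows over in small chunks, and swapping names at the end. The Python sketch below shows only that copy-and-swap skeleton, using the built-in `sqlite3` module as a stand-in database; the function name, table names, and chunk size are hypothetical. It ignores the hard part those tools actually solve, capturing writes that arrive during the copy (pt-osc does this with triggers, gh-ost by reading the binary log). SQLite's column types are loose, so the INT to BIGINT change here only demonstrates the mechanics.

```python
import sqlite3


def copy_and_swap(conn, table, new_ddl, chunk_size=1000):
    """Rebuild `table` via a shadow table, copying rows in primary-key chunks.

    Assumes `conn` is in autocommit mode (isolation_level=None), `table` has an
    integer primary key named `id`, and `new_ddl` creates f"{table}_new" with
    the same column order. Table names are trusted, not user input.
    Writes that happen during the copy are NOT captured.
    """
    shadow = f"{table}_new"
    conn.execute(new_ddl)
    last_id = 0
    while True:
        conn.execute("BEGIN")
        rows = conn.execute(
            f"SELECT id FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if rows:
            upper = rows[-1][0]
            conn.execute(
                f"INSERT INTO {shadow} SELECT * FROM {table} WHERE id > ? AND id <= ?",
                (last_id, upper),
            )
            last_id = upper
        conn.execute("COMMIT")  # each chunk is its own short transaction
        if not rows:
            break
    # Cut over in one transaction: old table moves aside, shadow takes its name.
    conn.execute("BEGIN")
    conn.execute(f"ALTER TABLE {table} RENAME TO {table}_old")
    conn.execute(f"ALTER TABLE {shadow} RENAME TO {table}")
    conn.execute(f"DROP TABLE {table}_old")
    conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, counter INT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(i, i * 10) for i in range(1, 2501)])
copy_and_swap(
    conn,
    "events",
    "CREATE TABLE events_new (id INTEGER PRIMARY KEY, counter BIGINT)",
)
print(conn.execute("SELECT COUNT(*), SUM(counter) FROM events").fetchone())  # (2500, 31262500)
```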
| evanelias wrote:
| To echo and add to rwultsch's sibling comment:
|
| * Facebook had extremely frequent schema changes, and
| powerful declarative schema management automation to
| support this
|
| * The TAO (or more correctly "UDB") use-case supported
| using many separate tables, not one giant generic key/value
| table as people seem to assume
|
| * The non-UDB MySQL use-cases at Facebook, in combination,
| are still larger than the vast, vast majority of all other
| companies' databases. These non-UDB databases use a wide
| range of MySQL's functionality. The frequent claims that
| "Facebook used MySQL just as a dumb K/V store" are
| absolutely incorrect and have never been correct.
| rwultsch wrote:
| FB had plenty of schema changes. I know, I wrote software
| to push them out. The important concepts for pushing schema
| at scale became part of Skeema.
| alberth wrote:
| What about Postgres's native key-value store, hstore?
|
| https://www.postgresql.org/docs/8.3/hstore.html
|
| I love Postgres just as much as anyone, but Uber's use case
| still seemed to be a better fit for MySQL. I was hopeful this
| would kickstart a renewed focus on features / architecture
| within the Postgres community, and I'm not certain anything
| resulted from this. Hope to be wrong, obviously.
| user5994461 wrote:
| The replication was redone around that time (not sure which
| version exactly). It's still working on the same principles
| though, sending queries and redoing them on each replica.
|
| Before in short, the WAL was sent every minute and always 10MB
| even if there were no changes. Now it's more adaptive, actually
| doing nothing when there are no changes, and picking up quicker
| when changes begin.
|
| I am surprised they don't mention this point because the
| replication was really unusable in PostgreSQL.
|
| There are still spikes (write amplification) and other drawbacks
| from this design, but at least it doesn't shit itself under no
| activity.
| Quekid5 wrote:
| I don't understand this. Why would it be sending queries and
| redoing on a replica _and_ sending the WAL? Just sending the
| WAL would seem to be sufficient, or alternatively: sending
| queries would be redundant if you just send the WAL and apply
| directly at the secondaries.
| amenonsen wrote:
| It doesn't make sense to you because everything in the
| comment you're replying to is wrong.
|
| Neither log shipping (copying WAL files one by one) nor
| streaming replication (sending a stream of WAL) works by
| sending queries. WAL segments are 16MB by default, and the
| default archive_timeout is 0, not 1 minute (and the archive
| timeout is not applicable to streaming replication anyway).
| There is also nothing "adaptive" about the replication--
| when there is no traffic, there will be ~no changes, and
| when there are changes, they will be sent to the replica.
|
| I don't understand what the comment is suggesting used to
| happen in periods of no activity that made replication
| unusable, but it is also probably incorrect, and has
| nothing to do with the write amplification problem.
| Quekid5 wrote:
| Thank you for confirming my suspicions :).
| chousuke wrote:
| In streaming (physical) replication, PostgreSQL sends the
| WAL only, and it's applied on the replica; no "sending of
| queries" is involved, or even possible; with physical
| replication, the secondary has to keep itself identical
| with the primary, otherwise replication will fail. This is
| why you can't use physical replication across major
| versions, since they can't be bit-for-bit identical.
|
| In more recent versions there's "logical replication",
| which sort of "sends the queries", in that the secondary
| node has its own database state that does not have to be
| exactly identical with the primary, allowing for
| replication across major versions.
|
| In my opinion though, unless you really _need_ logical
| replication for some reason, stick with streaming
| replication. It's much easier to understand and there are
| fewer failure modes.
| amenonsen wrote:
| > In more recent versions there's "logical replication",
| which sort of "sends the queries"
|
| What it sends is not the queries, but a logical
| description of the changes to each row that were made by
| running the query. So an UPDATE that changes N rows would
| generate N changes to be applied to the corresponding
| rows (usually identified by primary key) on the logical
| replica, not a single update that had to be "re-
| executed".
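amenonsen's distinction can be sketched in Python: the primary runs the UPDATE once, and what travels to a logical replica is one change record per affected row, keyed by primary key, which the replica applies without re-running the original WHERE clause. The record shape and function names below are hypothetical and do not reflect Postgres's actual logical decoding output format.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class RowChange:
    """One logical change: new column values for the row with primary key `pk`."""
    table: str
    pk: int
    new_values: tuple  # (column, value) pairs


def run_update_on_primary(db, table, predicate, assignments):
    """Execute an UPDATE on the primary; return one RowChange per row it touched."""
    changes = []
    for pk, row in db[table].items():
        if predicate(row):
            row.update(assignments)
            changes.append(RowChange(table, pk, tuple(assignments.items())))
    return changes


def apply_on_replica(replica, changes):
    """Apply row changes by primary key; the UPDATE's WHERE clause is never re-run."""
    for change in changes:
        replica[change.table][change.pk].update(dict(change.new_values))


primary = {
    "trips": {
        1: {"city": "SF", "fare": 10},
        2: {"city": "SF", "fare": 12},
        3: {"city": "NYC", "fare": 9},
    }
}
replica = copy.deepcopy(primary)

# Roughly: UPDATE trips SET fare = 15 WHERE city = 'SF'
changes = run_update_on_primary(primary, "trips", lambda r: r["city"] == "SF", {"fare": 15})
apply_on_replica(replica, changes)
print(len(changes))        # 2: one statement, two row-level changes
print(replica == primary)  # True
```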
| moonbug wrote:
| Articles about Uber are always instructive: read and then do the
| exact opposite.
| chovybizzass wrote:
| Because it's simpler? I found Postgres to be overly complicated
| compared to MySQL/Maria.
| DaiPlusPlus wrote:
| Define: "complicated"
| corty wrote:
| PostgreSQL: "your date 2020-02-31 isn't a date, fix that"
|
| MySQL: "2020-02-31? Whatever man, I'll just enter
| something..."
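The strict behavior corty prefers is also easy to get in application code. A minimal Python sketch using the standard library's `datetime.date.fromisoformat`, which rejects impossible calendar dates such as February 31st; `parse_strict_date` is a hypothetical helper name.

```python
from datetime import date


def parse_strict_date(text):
    """Return a date for a YYYY-MM-DD string; raise ValueError if it is not a real calendar date."""
    try:
        return date.fromisoformat(text)
    except ValueError as exc:
        raise ValueError(f"invalid date {text!r}: {exc}") from None


print(parse_strict_date("2020-02-29"))  # 2020-02-29 (2020 is a leap year)
try:
    parse_strict_date("2020-02-31")
except ValueError as exc:
    print(exc)  # rejected instead of being stored as something else
```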
| evanelias wrote:
| MySQL's default settings reject invalid dates for over 5
| years now, since MySQL 5.7.
|
| How long are people going to keep repeating this complaint?
| Literally every version of MySQL and MariaDB that allows
| invalid dates by default (MySQL 5.6 and older, MariaDB 10.1
| and older) has reached end-of-life for upstream support
| from the vendor!
| [deleted]
| consp wrote:
| Considering the US uses a weird date format, I definitely
| prefer the former in combination with input sanitation
| forcing you to think about your actions before assuming the
| database will fix it for you.
| corty wrote:
| Agreed. As with strong typing in programming languages, I
| do prefer a database to be strict in rejecting invalid
| inputs. MySQL does two bad things here: It accepts an
| invalid input plus it interprets it creatively, producing
| something the user most probably didn't intend. In that
| respect, MySQL is almost as bad as Excel creatively
| "interpreting" dates.
|
| Another example of a database doing improper things would
| be Oracle mixing up the empty string with NULL. In
| Oracle, both are the same...
|
| MySQL has a few more of those gotchas, e.g. regarding
| broken charsets (UTF-8 isn't 'utf8', it is 'utf8mb4',
| 'utf8' is an alias for 'utf8mb3' which is a broken
| subset). I wouldn't use MySQL for any data that was
| important to get back consistently. However, since Uber
| seems to be using some schemaless "we don't care"-layer
| anyways, that point is moot for the original article.
| vinger wrote:
| utf8mb3 isn't a broken subset. It includes characters in
| the basic multilingual plane which is fine for 99% of
| cases.
|
| utf8mb4 uses more bytes than necessary for most. You get a
| 255-character limit on indexed VARCHARs with utf8mb3; I think
| you only get 191 characters with utf8mb4.
| Alexendoo wrote:
| utf8mb4 does not use more bytes than utf8mb3: for
| anything representable by utf8mb3, the size in utf8mb4 is
| identical. Anything that would be 4 bytes in utf8mb4 is
| not able to be stored in utf8mb3.
|
| utf8mb3 is definitely a broken subset; it's deprecated, at
| the very least.
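The byte-count argument can be checked directly. utf8mb3 can only hold characters whose UTF-8 encoding is at most 3 bytes (the Basic Multilingual Plane, code points up to U+FFFF), while utf8mb4 covers the full range with up to 4; any given character takes the same number of bytes in both. The Python sketch below also reproduces the index-length arithmetic behind the 255-character figure, assuming InnoDB's 767-byte index key prefix limit for the older COMPACT/REDUNDANT row formats (DYNAMIC raises it to 3072 bytes). The helper names are hypothetical.

```python
def utf8_len(text):
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_utf8mb3(text):
    """True if every character needs at most 3 UTF-8 bytes (i.e. lies in the BMP)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def max_indexable_chars(prefix_limit_bytes, max_bytes_per_char):
    """Longest VARCHAR an index can cover when sized for the worst-case character."""
    return prefix_limit_bytes // max_bytes_per_char


# "\U0001F697" is a car emoji, outside the BMP.
for sample in ["hello", "h\u00e9llo", "\u65e5\u672c", "\U0001F697"]:
    print(ascii(sample), utf8_len(sample), fits_utf8mb3(sample))
print(max_indexable_chars(767, 3))  # 255
print(max_indexable_chars(767, 4))  # 191
```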
| bbarnett wrote:
| https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sql-mo...
|
| Much of your MySQL complaint is not a MySQL issue, but a
| config issue.
|
| And yes, powerful config options are good, not bad.
| BitPirate wrote:
| One can still complain about MySQL's dumb defaults.
|
| https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.ht...
|
| "Hey, let's just not act transactional on a timeout by
| default"
| evanelias wrote:
| That default is sensible. You're making it sound like
| this setting applies to all "timeouts" regardless of
| type. That is not the case.
|
| This setting applies to lock-wait timeouts. The default
| value allows applications to decide on the correct course
| of action: either re-try just the statement that timed
| out (without having to re-do the previous parts of the
| transaction), or rollback the transaction.
|
| The application still receives an error on the timeout
| regardless of this setting. The database doesn't
| automatically commit the previous statements in the
| transaction regardless of this setting.
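evanelias's point is that, by default, a lock-wait timeout undoes only the statement that timed out and leaves the next step to the application. A minimal Python sketch of one such policy: retry the timed-out statement a few times, then roll back the whole transaction. `LockWaitTimeout`, `FakeConnection`, and `run_transaction` are hypothetical stand-ins, not a real driver API; a real MySQL driver surfaces this as its own exception type carrying error 1205.

```python
class LockWaitTimeout(Exception):
    """Hypothetical stand-in for a driver's lock-wait-timeout error."""


class FakeConnection:
    """Records successful statements; fails chosen statements a set number of times."""

    def __init__(self, failures=None):
        self.failures = dict(failures or {})
        self.log = []

    def execute(self, sql):
        if self.failures.get(sql, 0) > 0:
            self.failures[sql] -= 1
            raise LockWaitTimeout(sql)  # only this statement is undone
        self.log.append(sql)

    def begin(self):
        self.log.append("BEGIN")

    def commit(self):
        self.log.append("COMMIT")

    def rollback(self):
        self.log.append("ROLLBACK")


def run_transaction(conn, statements, max_retries=2):
    """Retry individual statements on lock-wait timeout; roll back if retries run out."""
    conn.begin()
    try:
        for sql in statements:
            for attempt in range(max_retries + 1):
                try:
                    conn.execute(sql)
                    break  # statement succeeded; earlier statements are kept
                except LockWaitTimeout:
                    if attempt == max_retries:
                        raise  # give up on the whole transaction
        conn.commit()
    except Exception:
        conn.rollback()
        raise


ok = FakeConnection(failures={"UPDATE seats SET n = n - 1": 1})
run_transaction(ok, ["INSERT INTO trips VALUES (1)", "UPDATE seats SET n = n - 1"])
print(ok.log)
```

Here the INSERT is not re-executed when the UPDATE times out once; only the UPDATE is retried before COMMIT.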
| exikyut wrote:
| Ooh, sounds like PHP 5!
| rubyist5eva wrote:
| The mysql behavior terrifies me because eventually you're
| going to end up with something you didn't expect and it's
| going to be a pain in the butt to track it down.
| isoprophlex wrote:
| That doesn't exactly sell me on MySQL, sounds like a recipe
| for disaster
| corty wrote:
| There is more, though not everything is still valid, or
| valid for every table type in MySQL:
| https://sql-info.de/mysql/gotchas.html
| evanelias wrote:
| "This page deals with issues related to MySQL 4.1 and
| earlier, not 5.0"
|
| MySQL 5.0 came out over _15 years ago_.
| chris_wot wrote:
| That's not decreasing complexity, that's just shifting it
| out of the database layer.
|
| See https://news.ycombinator.com/item?id=26272084
| martimarkov wrote:
| Oh yeah old complication of sanitising input...
|
| It's like asking you for an int and you entering 2.358 and
| saying that's just simple.
| sschueller wrote:
| Recovering a split brain Galera cluster. Fun times...
| eznzt wrote:
| Just creating a new user is annoying enough.
|
| Permissions are also much more complex.
|
| What the hell are schemas?
| looperhacks wrote:
| How is creating a new user complicated? The normal CREATE
| USER is all I've ever needed to create a new user in
| Postgres (assuming I haven't set up pg_hba so that I
| need to allow every user separately).
| mgkimsal wrote:
| Most tutorials/instructions I read have you use the
| "createuser" command from the system shell. But... you
| have to be able to switch to a system 'postgres' user
| _first_, which... perhaps you don't have privileges to
| do, or need sudo access or whatnot.
|
| Being able to install Postgres, _connect to it directly
| with some sort of root identity_, and then immediately
| create users and databases (as is the case with pretty
| much every MySQL walk-through I've ever seen) is not the
| default.
|
| https://wiki.postgresql.org/wiki/First_steps
|
| "The default authentication mode is set to 'ident' which
| means a given Linux user xxx can only connect as the
| postgres user xxx."
|
| This alone is a complicated/confusing thing, because it's
| mixing system accounts with the db server accounts/access
| - and none of that is obvious, and doesn't quite map to
| how other databases handle things. I've never had to have
| matching system account names for user access in MSSQL,
| for example.
| chousuke wrote:
| With MySQL, you'll still have to switch to root to
| connect by default? I honestly don't remember, since it's
| been ages since I set up MySQL manually.
|
| If MySQL actually allows administrative access out-of-
| the-box without any kind of special authorization, then
| that's a _terribly_ insecure default.
|
| With PostgreSQL, you have to switch to the superuser to
| configure things further because that's the only sane
| default you can have on an unconfigured system. If you
| can run commands as the user PostgreSQL is running as,
| you are "safe" to trust, and PostgreSQL will let you in.
|
| UNIX ident authentication is also extremely convenient
| for local applications, since you don't even have to have
| a password for the account, or make the PostgreSQL server
| network-accessible in any way.
|
| Oracle can do the same thing, and so can MySQL,
| apparently (with IDENTIFIED VIA unix_socket).
|
| MySQL user management has its own complexity in that you
| have to manage "user@address" identities, and the same
| user at different addresses or auth methods can have
| different permissions. How's that "simple"? With
| PostgreSQL, your users will at least map to the same user
| regardless of how they authenticate themselves.
| mgkimsal wrote:
| "With MySQL, you'll still have to switch to root to
| connect by default? "
|
| You connect with a root account from any account, and
| when installed, the root account password is part of the
| setup process.
|
| "and the same user at different addresses or auth methods
| can have different permissions"....
|
| joe@localhost and joe@remotehost don't have to be 'the
| same user' in that they're not tied to a system account
| in any way.
|
| Granting different privileges to joe@local and joe@remote
| based on where they're coming from isn't necessarily
| "simple", but no one claimed it was. My own response was
| validating that PostgreSQL user setup was somewhat
| confusing.
|
| EDIT: Bringing up "mysql sucks" points when I was
| explaining how PostgreSQL 'create user' stuff can be
| confusing just reeks of whataboutism.
| chousuke wrote:
| I'm just not sure how it's confusing? PostgreSQL users
| aren't "tied" to system accounts either. You can have any
| number of PostgreSQL users that have no system
| equivalent.
|
| In fact, the process seems to be exactly the same as with
| MySQL: I just tried installing the MariaDB server (dnf
| install mariadb-server), and it didn't prompt me for an
| admin user; instead, I can directly connect to the
| database as root using sudo, so in this case it appears
| to be doing the exact same thing that PostgreSQL does.
|
| It just happens to be that by default the "postgres"
| superuser has a corresponding "postgres" system user that
| can log in via OS authentication, so you need to switch
| to the postgres user instead of root.
|
| EDIT: Maybe some of the confusion stems from the fact
| that the documentation you linked seems to assume that
| the database is created according to convention to run as
| the "postgres" user (as it usually is). If your user
| didn't have the required permission to switch to the
| postgres user, they wouldn't be able to install the
| database as said user in the first place.
|
| If you install PostgreSQL as your own user (which is not
| a good idea if you have any other option), you will not
| need to switch users as you will obviously have access to
| the database files and can do whatever you want, anyway.
| rleigh wrote:
| This depends entirely on how you want to set up and run
| the system. For packaged versions running as a system
| service with a dedicated service user, this is absolutely
| correct. And I would argue, it's a pretty sensible
| default arrangement.
|
| But... there's absolutely nothing prohibiting you from
| running initdb as a regular user and then running the
| main daemon with your credentials. You are then the
| database owner and superuser. This type of thing is
| really useful for integration testing. It's also
| useful when you don't care about the
| multiuser aspect and just want to have it run.
| vinger wrote:
| Does Postgres still require a separate user to access the
| db?
|
| I remember this being a limitation in 2008.
|
| That forced user creation always pushed me to MySQL,
| because I hate having separate users for each service:
| you still have to manage and account for these
| extra accounts.
| isoprophlex wrote:
| "What the hell is scoping? Why not put every variable in
| global scope? Much less complex"
| paulryanrogers wrote:
| Schemas are SQL standard namespaces within a DB. You can
| join between them. Mysql allows joining across databases,
| so it doesn't implement schemas.
| tinus_hn wrote:
| It would be great if there was some management GUI for
| these tasks so you don't have to look up the syntax for
| these things that in many deployments you only do once.
| mvanbaak wrote:
| pgadmin ;-P
| tinus_hn wrote:
| This actually looks pretty reasonable, I am going to look
| into it. First I need to figure out how to open up the
| server for connections but still limit it, though.
| jhauris wrote:
| Look at the pg_hba.conf file (probably something like
| /etc/postgresql/<version>/main/pg_hba.conf).
|
| https://www.postgresql.org/docs/current/auth-pg-hba-conf.htm...
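For the "open up but still limit" question, the key property of pg_hba.conf is that records are checked top to bottom and the first one matching the connection type, database, user, and client address decides the authentication method. There is no fall-through to later records, and a connection that matches no record is rejected. The Python sketch below models only that first-match rule with the standard `ipaddress` module; the rule tuples are a simplified, hypothetical representation (no connection-type column, `local` socket lines, groups, or `samehost`/`samenet` keywords), not a parser for the real file.

```python
import ipaddress

# Simplified host rules: (database, user, CIDR, method); "all" matches anything.
RULES = [
    ("all", "all", "127.0.0.1/32", "scram-sha-256"),
    ("appdb", "app", "10.0.0.0/24", "scram-sha-256"),
    ("all", "all", "0.0.0.0/0", "reject"),  # explicit catch-all for any other IPv4 client
]


def auth_method(rules, database, user, client_ip):
    """Return the method of the first matching rule, or None (connection refused)."""
    addr = ipaddress.ip_address(client_ip)
    for rule_db, rule_user, cidr, method in rules:
        if rule_db not in ("all", database):
            continue
        if rule_user not in ("all", user):
            continue
        if addr not in ipaddress.ip_network(cidr):
            continue
        return method  # first match wins; later lines are never consulted
    return None


print(auth_method(RULES, "appdb", "app", "10.0.0.7"))    # scram-sha-256
print(auth_method(RULES, "otherdb", "app", "10.0.0.7"))  # reject
```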
| tinus_hn wrote:
| Perhaps it's better to listen only on localhost and
| connect through an SSH tunnel.
| rleigh wrote:
| Or DataGrip if you already have a JetBrains licence.
| rektide wrote:
| both kubernetes operators have some ok user management
| built in
| lucian1900 wrote:
| What MySQL calls databases are actually schemas. They're
| even aliased as such.
|
| MySQL doesn't have multiple SQL databases, you've been
| using multiple schemas.
| oftenwrong wrote:
| Schemas are similar to databases in mysql. They serve as a
| namespace. In mysql you can have a database `foo` and a
| table `foo.bar`. In postgres you can have a schema `foo`
| and a table `foo.bar`. In postgres you can have multiple
| databases in a cluster, and multiple schemas within each of
| those databases.
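The hierarchy described above (a cluster holds databases, each database holds schemas, each schema holds tables) fits in a few lines of Python. The sketch also models how Postgres resolves an unqualified table name: it walks `search_path` and picks the first schema containing that name, where the default path is `"$user", public`. The class, function, and table names are hypothetical, and unqualified CREATE TABLE is simplified to always use `public`.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """One database in a cluster; maps schema name -> set of table names."""
    schemas: dict = field(default_factory=lambda: {"public": set()})

    def create_table(self, name):
        # Simplification: unqualified names always go to "public" here.
        schema, _, table = name.rpartition(".")
        self.schemas.setdefault(schema or "public", set()).add(table)


def resolve(db, name, search_path=("$user", "public"), current_user="app"):
    """Return "schema.table" for `name`, or None if it cannot be found."""
    if "." in name:
        schema, table = name.split(".", 1)
        return name if table in db.schemas.get(schema, set()) else None
    for entry in search_path:
        # "$user" stands for a schema named after the current role.
        schema = current_user if entry == "$user" else entry
        if name in db.schemas.get(schema, set()):
            return f"{schema}.{name}"
    return None


# A cluster holds several databases; each has its own schemas.
cluster = {"main": Database(), "analytics": Database()}
db = cluster["main"]
db.create_table("foo.bar")
db.create_table("trips")      # lands in public
db.create_table("app.trips")  # shadows public.trips for role "app"

print(resolve(db, "foo.bar"))  # foo.bar
print(resolve(db, "trips"))    # app.trips
print(resolve(db, "bar"))      # None: schema foo is not on the search_path
```

If you never create extra schemas, everything lives in `public` and the extra layer stays invisible.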
| eznzt wrote:
| What you are basically saying is that yes, they are much
| more complex.
| desas wrote:
| It's another layer, you don't have to use it. If you
| pretend schemas don't exist you basically never know they
| do, unless you go looking for complexity in the postgres
| bowels.
| Tostino wrote:
| To be honest, it's better to think of MySQL db = Postgres
| schema, because in MySQL you can do cross db queries and
| there is no intermediary schema level, and in Postgres
| you can do cross schema queries, but not cross db
| queries.
| chris_wot wrote:
| No, what you mistake for "complexity" seems to be your
| general unfamiliarity with what a schema is. In fact, when
| you understand what schemas are, then it actually makes a
| lot more sense.
| rleigh wrote:
| They are not complex, and they are entirely optional.
| They are just a namespace. You are free to never, ever
| use them.
|
| For a company the size of Uber, I don't think spending
| five minutes reading the documentation for createuser is
| a significant burden to deployment. PostgreSQL is very
| easy to deploy.
| [deleted]
| petergeoghegan wrote:
| I committed a patch that added a mechanism I called "bottom-up
| index deletion" recently:
|
| https://www.postgresql.org/docs/devel/btree-implementation.h...
|
| https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
|
| Bottom-up deletion is specifically designed to ameliorate what
| the blog post refers to as "write amplification". Testing has
| shown that it's very effective with many workloads.
| Tostino wrote:
| Just wanted to say how impressed I was with this solution and
| the results it achieved when I was following the development on
| -hackers.
| petergeoghegan wrote:
| Thanks
| syspec wrote:
| What is -hackers?
| petergeoghegan wrote:
| The Postgres community mailing list for development work --
| pgsql-hackers.
| KingOfCoders wrote:
| 2016.
| junon wrote:
| There's some historical context for this article. 2016 was a year
| of RAPID growth for Uber. There was a running statistic
| internally that your employee ID would be at the median point
| just 6 months after being hired.
|
| They were trying to hire (and poach) just about anyone they could
| around this time. Therefore, these articles are... very shiny,
| compared to the actual tech applied internally (note that even
| though Uber is referred to in the third person here, this is on
| uber.com and written by an Uber employee).
|
| I worked at Uber for a year. Schemaless was... meh. Nobody really
| liked using it, nobody really understood it, and you weren't
| really allowed to host your own instance - you had to have
| another internal team do it for you, which didn't help the
| "understanding" problem.
|
| It smelled distinctly of "not invented here" syndrome. A number
| of things inside Uber worked that way - the culture was so
| competitive and brutal, performance reviews were always a
| massacre, so everyone was trying to outshine their peers (or
| outright climb on their backs, etc).
|
| This resulted in a LOT of "tech" being "invented" that 1:1 did
| something already prominent in open source, or that was already
| an enterprise solution (probably cheaper than paying engineers
| to do it). But since building it and having your name on it
| made you look better than a colleague for a promotion or a
| bonus or whatever, it was worth it to the individual to
| reinvent the wheel. Rinse and repeat over and over again.
|
| I'm not an enemy of reinventing the wheel, mind you. But only if
| the new wheel works significantly better than the old one. This
| was rarely the case at Uber.
|
| Postgres was still used somewhat commonly at Uber when I was
| there, but they were really pushing for Schemaless internally. It
| felt very overkill for just about everything outside the platform
| teams and was always, without fail, a massive pain to deal with.
|
| Don't be fooled by these Uber engineering articles. This was PR
| to bolster up their OSS image to outsiders to help with hiring
| and poaching at the time. Things internally looked very
| different.
| midrus wrote:
| I think this applies to most companies. What they write in
| their blogs is a shiny, optimistic, limited view of the best
| part of their best system or similar. Once inside, things are
| never that great.
|
| I myself was very ashamed of a blog post from a company I
| worked for (also SF-based)... even the author of the post was
| a very well-known open source maintainer of many libraries of a
| very popular programming language. Reading the posts in the
| blog was like.... I cannot believe we lie this big...
| internally things were just crap, and what the blog post made
| look like the norm was just a side project of this person.
|
| So, never trust companies blog posts by default.
| fma wrote:
| Can we maybe get a list of company blogs that are legit? As
| in they practice what they preach?
|
| Facebook, Amazon, Netflix, Google, and Microsoft come to the top
| of my head, but someone is free to burst my bubble.
| staticassertion wrote:
| Cloudflare's blogs seem far less markety and far more about
| sharing interesting technical wins.
| noir_lord wrote:
| I've dealt with Cloudflare people semi-regularly recently
| and they seem to know their shit, they also (from an
| outside perspective) seem to have a really good culture.
|
| Questions get answered quickly and with a level of detail
| appropriate to the person asking.
|
| Really impressive given the usual "Enterprise(TM)" level
| of support from most service suppliers.
| spiralx wrote:
| Figma and Instagram have had good articles on their blogs,
| although I'm not a regular follower of either.
| mlthoughts2018 wrote:
| I agree. I once worked for a company that wrote a blog post
| on reinventing and upgrading A/B tests, but the ugly truth
| was that the company couldn't run an A/B test to save its
| life. Every A/B test was a disaster, there were fragmented
| different frameworks for A/B tests in different parts of the
| product, with inconsistent and unreliable data, and the core
| clickstream ingestion system that was the foundation for any
| possible way of testing would crash and go down for hours
| every few weeks, and product management would just silently
| ignore any effects of missing data or correlation between
| failures in different product features. Even though we had a
| team of statistical researchers, they were treated as if
| legit stats 101 concerns were just ivory tower academic hair
| splitting and often were silently omitted from being in A/B
| test design, recap or decision meetings, which were instead
| run by product managers with no stats training.
|
| I remember interviewing a candidate once who said he was
| excited about the role because of that A/B testing blog post,
| and I just thought: geez, what a completely soulless bait-and-
| switch ploy.
| supergirl wrote:
| Is Schemaless still the DB now?
| junon wrote:
| No idea, but I would wager so. The scale it was being used at
| would be hard to replace, and they'd have little reason to do
| so.
|
| EDIT: To be clear, there is no "the DB" at Uber. It was the
| main database flavor that the larger teams used, but Uber
| used everything, from MySQL to Postgres to Mongo, etc.
| Sometimes with things on top, sometimes directly. For more
| analytical/financial things, they used HBase/Hadoop/Cassandra,
| or even legacy IBM database tech from the '80s. Really weird
| mix of stuff, and it really depended on which high-profile
| engineers they hired in which part of the company.
| arnejenssen wrote:
| Does Uber use event sourcing?
| exhaze wrote:
| Yes
| lumost wrote:
| TBH, everything that was listed as a complaint could be a
| complaint about nearly any transactional RDBMS. For workloads
| that require heavy, always-on replication and availability,
| RDBMSs haven't been the go-to solution for a long time
| compared with distributed DBs. Changing from Postgres to MySQL
| or MySQL to Postgres (or even Oracle) won't really buy you
| much if you're running into these issues.
|
| Even this one
|
| > The bug we ran into only affected certain releases of Postgres
| 9.2 and has been fixed for a long time now. However, we still
| find it worrisome that this class of bug can happen at all.
|
| (Rare/specific) data corruption bugs around master promotion and
| handoff occur in every major DB. MySQL is no different, and I've
| personally had to track down issues in a few popular products. If
| you run thousands of copies of a piece of software with different
| workloads and hardware configurations... you're going to find
| bugs.
|
| After all, how many DBs passed Jepsen on the first try?
| ignoramous wrote:
| Previous discussions:
|
| 2016: https://news.ycombinator.com/item?id=12166585
|
| 2018: https://news.ycombinator.com/item?id=17280239
|
| Community responses:
|
| - https://news.ycombinator.com/item?id=12216680
|
| - https://news.ycombinator.com/item?id=12179222
| chris_wot wrote:
| The response was:
|
| https://news.ycombinator.com/item?id=14222721
| asah wrote:
| Even though 99.9% of applications will never run into Uber's
| issues, it's been 4 years and 4 major versions since, and I'd
| love to review these complaints and see if they still apply
| to PG 13.
| Tostino wrote:
| Wait until 14 for that comparison and it'll look much
| better. Bottom-up index deletion helped solve some of the
| write amplification issues.
| petergeoghegan wrote:
| There is also index deduplication in Postgres 13 and the
| B-Tree enhancements in Postgres 12. All of these
| enhancements significantly improved the situation for
| workloads affected by what the blog post calls write
| amplification. (I myself call this phenomenon index
| version churn, since it is more descriptive and has less
| baggage.)
|
| I was the author of all of the above, including the
| Postgres 14 work you mentioned (though Anastasia
| Lubennikova was the primary author of index
| deduplication). To me it feels like one very large
| project -- the effects are cumulative, and each major
| Postgres version had B-Tree work that built on the last
| release in one way or another.
| dang wrote:
| Thanks! Here's an annotated list of all those (plus chris_wot's
| and a couple others):
|
| _Why Uber Engineering Switched from Postgres to MySQL (2016)_
| - https://news.ycombinator.com/item?id=17280239 - June 2018 (47
| comments)
|
| _Re: Why Uber Engineering Switched from Postgres to MySQL_ -
| https://news.ycombinator.com/item?id=12179222 - July 2016 (67
| comments)
|
| _Why Uber Engineering Switched from Postgres to MySQL_ -
| https://news.ycombinator.com/item?id=12166585 - July 2016 (294
| comments)
|
| _Thoughts on Uber's List of Postgres Limitations_ -
| https://news.ycombinator.com/item?id=12216680 - Aug 2016 (103
| comments)
|
| _A PostgreSQL response to Uber [pdf]_ -
| https://news.ycombinator.com/item?id=14222721 - April 2017 (82
| comments)
|
| _Why we lost Uber as a user_ -
| https://news.ycombinator.com/item?id=12201353 - Aug 2016 (285
| comments)
|
| _Uber's Move Away from PostgreSQL_ -
| https://news.ycombinator.com/item?id=12223216 - Aug 2016 (15
| comments)
| burnthrow wrote:
| 2016
| huy-nguyen wrote:
| How does Schemaless compare to Vitess (https://vitess.io/)?
| villgax wrote:
| Lol, a db that squirms at unicode/utf-8 out of the box?
___________________________________________________________________
(page generated 2021-02-27 23:03 UTC)