[HN Gopher] Spending $5k to learn how database indexes work
       ___________________________________________________________________
        
       Spending $5k to learn how database indexes work
        
       Author : anglinb
       Score  : 119 points
       Date   : 2021-11-06 18:25 UTC (4 hours ago)
        
 (HTM) web link (briananglin.me)
 (TXT) w3m dump (briananglin.me)
        
       | sushsjsuauahab wrote:
       | Ssh into an ec2 instance, install mysql, and you'll never pay
       | more than $7.50 a month!
        
       | NicoJuicy wrote:
       | 0,15 $ per query... The world has gone insane.
        
         | cperciva wrote:
         | It's not quite so crazy when you phrase it as $0.15 _for
         | reading a million items from the database_.
        
           | Spooky23 wrote:
           | 25 years ago i built and ran a biggish database system that
           | supported a reservation system.
           | 
           | Even given the limitations of the time (RDBMS cost, 9GB
           | disks, Sun kit, etc), our cost of good sold for that type of
           | workload was exponentially less. (At scale) Today, I could
           | probably run that company off my MacBook Pro and have room to
           | spare.
           | 
           | That said, the rationale for choosing this technology is
           | cute: "After seeing a ton of the best GitHub engineers end up
           | at PlanetScale and seeing the process GitHub went through to
           | issue simple migrations, we chose to use their service."
           | 
           | If you use the same methodology to choose a database that the
           | public uses to choose between Bud Light, Miller Lite, and
           | Coors Light, expect a suboptimal outcome.
        
             | Nextgrid wrote:
             | I have seen so much over-engineered startup shit costing
             | thousands in AWS fees that could run faster on my laptop.
        
           | williamdclt wrote:
           | No that's still crazy. Scanning a million items isn't a big
           | workload.
        
             | cperciva wrote:
             | I'm not saying that it's not crazy. Just that it's _less_
             | crazy.
        
             | friedman23 wrote:
             | I agree, it might be distributed in some way which is
             | driving up the cost?
        
               | Nextgrid wrote:
               | The greed is distributed across the cloud provider's
               | C-suite for sure. To be fair to them they refunded the
               | bill afterwards according to the article but IMO we
               | should not be accepting this kind of pricing models as
               | normal.
               | 
               | A million items isn't a big deal, distributed or not. If
               | anything, if your distributed architecture makes reading
               | a million items _more_ costly than a single machine doing
               | it then it 's time to go back to the drawing board.
        
               | williamdclt wrote:
               | They mention that PlanetScale uses Vitess, so indeed it
               | might be distributed
        
           | akudha wrote:
           | One of my employers was using BigQuery. I was so scared that
           | I might accidentally run queries and get a big bill, even
           | though our tables weren't that big.
           | 
           | It is funny to look back, but getting huge bills without even
           | realizing that we're doing something wrong is very real
           | possibility. Cloud vendors happily make their pricing opaque
           | as it benefits them.
           | 
           | I'd avoid even the best product in the industry, if their
           | pricing is opaque. Or if there is a "Contact Us" button when
           | there needn't be.
        
           | XCSme wrote:
           | Still crazy, charging $0.15 for probably less than a few ms
           | of computing power.
        
             | cperciva wrote:
             | If those million items need to be read from different
             | servers -- as they might well, in a distributed database --
             | it's definitely not just a few ms of computing power.
             | 
             | For reference, reading a million items of up to 1 kB each
             | costs $0.125 with on-demand dynamodb.
        
           | NicoJuicy wrote:
           | 20 of those queries give me a VPS at Hetzner for a month.
           | 
           | Or 1 query = 3 gb / snapshot storage for a month.
        
       | smoldesu wrote:
       | Huh, learning about this "Superwall" product constitutes as my
       | horror-story-of-the-day. It's paywalling as a service, just what
       | the industry needed. Thankfully it appears to be quarantined to
       | iOS right now, but God does it feel like we're headed right back
       | into Stallman's predictions about how SAASS will ruin the
       | landscape of commercial technology.
        
         | jakemor wrote:
         | Two sides to every coin. Ethical developers need ways to make
         | low prices work, which is impossible without good testing
         | suites.
        
           | smoldesu wrote:
           | These days, if you're developing something for profit it's
           | pretty hard to see your software as ethical. You're either
           | trying to empower your user or trying to monetize them, the
           | two will always fight one another and snuff the other out
           | unless you, the developer, take a stand.I fully understand
           | the market for proprietary software, but trying to define
           | some ethical middle ground is just blatant lip service,
           | nothing else.
        
         | Nextgrid wrote:
         | It's better than ads though.
        
           | andybak wrote:
           | It's better than targetted ads built on intrusive tracking
           | that also enables several other abusive business practices.
           | It's not better than "good old fashioned ads". Heck I'd even
           | be ok with targeted ads if it could be done without the rest
           | of the "destroy civil society" that seems to come along for
           | the ride.
        
             | Nextgrid wrote:
             | Even if you solve the privacy problem, there's still a
             | problem with advertising which is that it's inherently at
             | odds with the user's interests.
             | 
             | An advertising-funded product will always prioritize
             | engagement - they want you to "engage" with the product
             | even if it means degrading the experience intentionally
             | such as making a process manual or take more steps than
             | necessary (so that you are exposed to more ads). The
             | "destroy civil society" problem you mention is a direct
             | consequence of the pursuit of engagement.
             | 
             | In contrast, with a paid product, the company's interests
             | are directly aligned with yours and they have no incentive
             | to intentionally degrade the experience or get in your way
             | any more than necessary. They don't care about how much you
             | "engage" with the product as long as the bill gets paid (if
             | anything, the less you engage the better as it uses less
             | server resources).
        
               | smoldesu wrote:
               | Paid products can still do all the same datamining that
               | advertisers do, and there are markets for buying/selling
               | that info (eg. Palantir). The truth is that all forms of
               | monetization are inherently at-odds with the user's
               | interests. Paying for an app doesn't magically make this
               | friction go away, and it _certainly_ doesn 't reduce the
               | incentive for developers to abuse your trust.
        
               | Nextgrid wrote:
               | > there are markets for buying/selling that info (eg.
               | Palantir)
               | 
               | A major reason for these markets is that the information
               | can be used for advertising or marketing targeting. In a
               | world where the majority of products/services are paid
               | and the amount of advertising is significantly reduced
               | there will be much less demand for this information,
               | leading to lower prices and even lower payoff from
               | selling this information, not to mention potential legal
               | risks (GDPR, CCPA, etc).
        
               | smoldesu wrote:
               | Palantir's biggest customers are government entities and
               | market researchers, not advertisers. In a world where
               | advertising has been significantly reduced, their
               | products would become _more valuable_ , since regular
               | analytics would become inaccessible. Compared to the data
               | these corporate aggregators collect, ad fingerprinting
               | seems trivial.
               | 
               | The biggest crux of this, though, is the fact that both
               | of these monetization schemes are destructive. Paying for
               | software simply doesn't make sense in most cases, as
               | there aren't that many people who are developing novel
               | solutions these days. That's the exact reason why
               | advertising is so popular: the market _knows_ that
               | relying on your conscious contribution is unsustainable,
               | so why should you believe otherwise?
        
               | dhosek wrote:
               | You're assuming it's either/or. When I worked at eHarmony
               | we had a monitor that would scroll user feedback
               | submitted online near the developer area. One of the most
               | common complaints was about ads being shown to paid
               | users.
               | 
               | I asked a PM about this and was told that the money they
               | made was too much to turn down.
               | 
               | Capitalism. It's why we can't have nice things.
        
               | Nextgrid wrote:
               | I don't disagree with that, but that's still not a reason
               | to turn down paid options. With ads, you are guaranteed
               | to get a bad experience. With paid product, there is
               | _potential_ for a bad experience, but at least it 's not
               | the guaranteed default.
               | 
               | Regulation around ads is the only definitive solution,
               | but in the meantime if there's a business model that's
               | non-toxic I'm not going to hate on it even if technically
               | someone could still misuse it.
        
       | radu_floricica wrote:
       | I'm just leaving this here:
       | 
       | https://www.hetzner.com/dedicated-rootserver/ax161/configura...
       | 
       | Draw that nice red slide all the way to the right. No, it's not
       | storage. Yeah, it's actually affordable. Yeah, that was a sexual
       | sound you just made.
       | 
       | You do have to be prepared to know some basic sysadmin, or pay
       | somebody to do it for you. My newest server has about 60 cores
       | and half a tera of ram. Surprisingly, it's not uber sharp - I
       | went with high core count so individual queries actually got
       | slower for about 20%. But that load... you can't even tell if the
       | cpu load gauge is working. I can't wait to fill it up :D Maybe
       | this black friday season I'll get it to 10%.
        
         | eknkc wrote:
         | We are currently on AWS, and we have several dedicated servers
         | on LeaseWeb that we offload computational work on. These are
         | cheap beasts.
         | 
         | Still, I'd not run my RDBMS on an unmanaged, non replicated
         | dedicated server and I'd not bother setting up multiple servers
         | with failover, automated backups etc and keep updating them.
         | Fuck that, I'll pay whatever AWS says RDS costs.
        
         | amenod wrote:
         | ...and Hetzner just started offering their services in the US a
         | few days ago. (EDIT: not affiliated)
         | 
         | If you do something stupid with your code at least you won't go
         | bankrupt, only your service will be slower.
        
           | Nextgrid wrote:
           | With the performance of these servers you have a huge margin
           | for stupidity before you even notice any slowdowns.
        
             | [deleted]
        
           | throwawayapples wrote:
           | I was just comparing the pricing to OVH's cloud VPS's
           | (https://us.ovhcloud.com/) and, accounting for the currency
           | conversion and OVH's huge amount of free unmetered bandwidth
           | (vs 20TB for H), it actually looks like OVH is even cheaper.
           | 
           | How can DO and Vultr even compete? Probably on the basis of
           | their nicer dashboards and easier sign-up flow (especially
           | OVH's)
        
             | radu_floricica wrote:
             | Tried OVH once. Couldn't even finish the account setup. Was
             | rather dissapointed because I wanted not to depend on just
             | one provider. Might give them a chance again.
        
               | kingcharles wrote:
               | I just recently signed up for some stupidly low-priced
               | VPS on OVH and encountered no problems at all with the
               | sign-up. The only issue I had was when I went to increase
               | the RAM - it said the price was $1/month, but when I
               | bought it I was charged $5 and it took support a week to
               | get back to me to tell me I didn't read the small print
               | properly and that it said they would have to upgrade the
               | Windows license which was an extra fee.
               | 
               | tl;dr: watch the small print and their support is slow as
               | fuck. otherwise, incredible value for money.
        
               | joconde wrote:
               | I rented a VPS from the French website (their native
               | language), and it was confusing. I remember looping
               | through the same 2 pages 10 times before actually finding
               | the instance's access instructions.
               | 
               | Of course, you don't have to care about the console once
               | you have SSH access to the server. It seems to be pretty
               | good service for the money.
        
               | NicoJuicy wrote:
               | Hetzner is German, not french.
               | 
               | Ovh is french.
        
             | johndough wrote:
             | I think Hetzner removed their traffic limit for dedicated
             | servers https://www.hetzner.com/news/traffic-limit/
        
           | pronoiac wrote:
           | Oooooooooh!
        
           | jaden wrote:
           | Just to clarify, Hetzner now has Cloud servers in the US.
           | Dedicated servers are still only available in Europe.
        
             | adventured wrote:
             | I appreciate that info. I tried Hetzner a decade ago from
             | the US just to see what kind of latency I might expect.
             | I've been waiting for them to finally get US located
             | services to give them a serious shot (vs Digital Ocean &
             | Co.).
        
         | flippant wrote:
         | What are you using this for?
        
           | radu_floricica wrote:
           | www.couriermanager.com - SaaS for courier companies,
           | basically. The new server is the common pool instance - I
           | have others for dedicated clients.
        
       | crorella wrote:
       | It amazes me that things so basic and fundamental like
       | understanding the way indexes work are often overlooked or not
       | leveraged
        
       | ihusasmiiu wrote:
       | Let me understand please. These people are selling a commercial
       | product and their team has no idea whatsoever of what an index
       | is? And this is news?
        
         | fabian2k wrote:
         | It sounds a bit more like they were confused by the automatic
         | index creation for foreign keys they expected to be there. So
         | they probably knew they'd need an index, just assumed this was
         | implicit in the foreign key.
        
       | rabuse wrote:
       | Gotta love cloud pricing. This is why I colocate.
        
       | arpa wrote:
       | Well this was an embarrasing read.
        
       | max_hammer wrote:
       | I hate this pricing model
       | 
       | My company in boarded `fivetran` to source data from different
       | tools.
       | 
       | Budget got exhausted in sourcing `iterable` data
        
       | foreigner wrote:
       | The real answer here is cost limiting. I don't want my cloud
       | provider to keep working at the cost of an order of magnitude
       | higher bill than I was expecting because of a bug in my code. I
       | want to be able to set a billing limit and have them degrade or
       | stop there service if I exceed the limit.
       | 
       | AFAIK AWS doesn't have that. They do have the ability to send me
       | alerts if my bill is unexpectedly high, but they still keep
       | working until I go bankrupt. It's possible to use those alerts to
       | implement your own "broke man's switch", but they don't have it
       | built in.
        
         | babayega2 wrote:
         | That's why we use DigitalOcean a lot in Africa. You know
         | upfront how much you will spend.
        
       | racl101 wrote:
       | Yeah, It cost me two bad months of high RDS fees to learn about
       | indexes. $900 in total.
       | 
       | Then a bro showed me one night about the magic of indexes. 5
       | minutes worth of advice saved me hundreds of dollars per month in
       | the future and all he asked in return was for some beer and
       | chicken wings.
       | 
       | Now that is a good bro.
       | 
       | I'm happy to say I've paid it forward myself.
        
       | bigbillheck wrote:
       | I'm not a DB expert, but "750k users in a month." doesn't sound
       | like a quantity that you'd need to use some kind of fancy special
       | tooling for.
        
         | dahdum wrote:
         | It's far from big data even if they grow 10x in the next year,
         | but if you are unfamiliar with database migrations and branches
         | I can see the appeal of the product.
        
         | mkl95 wrote:
         | In a world where Juicero raised $120m, selling overengineered
         | solutions for simple problems is not necessarily a bad idea.
        
           | Nextgrid wrote:
           | The problem isn't _selling_ (there 's plenty of dubious or
           | badly-priced products being around), it's that someone
           | thought _buying_ said product was a good idea.
        
       | xupybd wrote:
       | No foreign keys to make migrations easier. That doesn't sound
       | like the best trade off to me.
       | 
       | Having the database constrained as much as possible makes
       | maintenance so much easier. Many bugs don't escape into
       | production as they're caught by the database constraints. Those
       | that do get out do less damage to the data.
       | 
       | I know scale comes with trade offs but that seems extreme to me.
        
         | cerved wrote:
         | oh no, foreign keys are useless because it's the apps
         | responsibility to delete
         | 
         | /s
        
         | derekperkins wrote:
         | I'm a Vitess maintainer and I feel the same way. I don't plan
         | to use any of the Online DDL because you'll have to pry my
         | foreign keys out of my cold, dead hands. I understand the
         | reasoning and limitations, but like you, the trade-off isn't
         | worth it to me.
        
           | anglinb wrote:
           | I'm so curious, so you maintain Vitess but don't use it
           | personally?
        
             | derekperkins wrote:
             | I do use it in production and have for years, just not the
             | online schema changes. It's fantastic and FKs are supported
             | in a single shard, which we use heavily.
        
           | leetrout wrote:
           | What are your thoughts on Citus and Cockroach where foreign
           | keys are still supported when creating partitioned clusters?
           | 
           | Is due to fundamental differences in postgres vs innodb?
        
             | derekperkins wrote:
             | Vitess still supports foreign keys (single shard), using
             | MySQL, just not with the Online DDL functionality.
             | 
             | I think Cockroach tries to be a little too magical, which
             | is great for starting up a cluster, but I think you can
             | architect for much better performance with Vitess and
             | owning your sharding model. I'm also very happy to use
             | InnoDB, one of the most battle tested db engines to ever
             | exist, while Cockroach is currently rewriting theirs from
             | scratch. At the distributed level, I don't know of any
             | massive scale adopters of Cockroach yet, though I'm not
             | 100% looped in, so forgive me if I'm ignorant of them. On
             | the other hand, Vitess has seen adopters like Slack,
             | GitHub, Square, HubSpot, YouTube, with many more in various
             | stages of adoption.
             | 
             | I feel like Citus might be trying to be too many things and
             | so hasn't gotten the traction that Vitess has. Vitess has
             | nailed OLTP at scale, while Citus is trying to also do OLAP
             | and be a single source. That's the holy grail, but I'm not
             | sure that any technology is close to handling both of those
             | well yet.
        
         | dhosek wrote:
         | It's possible during a migration to drop a constraint, make the
         | update and restore the constraint. If a schema migration tool
         | doesn't automate this or at least permit it, it's not a good
         | schema migration tool.
        
       | fabian2k wrote:
       | That pricing model seems rather inherently tricky to me, and also
       | quite expensive. At $1.50 per 10 million rows read this can get
       | very expensive the moment you do a full table scan on any non-
       | trivial table. And while this example is a trivial case where you
       | only need minimal database knowledge to ensure that no full table
       | scan is necessary, many real world cases are much more complex.
       | 
       | It also seems very expensive compared to just renting DBs by
       | instance, if you put any real load onto this. I can see this
       | being attractive if your use case only queries single rows by
       | key, but it's essentially a big minefield for any query more
       | complex than that. A database with a rather opaque query planner
       | doesn't seem like a good fit for this kind of pricing.
        
       | nrmitchi wrote:
       | As the author touches on, the main problem here isn't learning
       | about indexes. It's about "infinity scaling" working _too well_
       | for people who do not understand the consequences.
       | 
       | In no sane version of the world should "not adding a db index"
       | lead to getting a 50x bill at the end of the month without
       | knowing.
       | 
       | I am a strong believer that services that are based on "scale
       | infinitly" really need hard budget controls, and slower-scaling
       | (unless explicitly overidden/allowed, of course).
       | 
       | If I accidently push very non-performant code, I kind of expect
       | my service to get less performant, quickly realize the problem,
       | and fix it. I don't expect a service to seemingly-magically
       | detect my poor code, increase my bill by a couple orders-of-
       | magnitude, and only alert me hours (if not days) later.
        
         | anglinb wrote:
         | Haha yep, I was like wait I'm used to getting feedback from the
         | system telling me I messed up and this I barley noticed.
         | PlanetScale has Query Statistics that are really useful for
         | spotting slow queries but don't expose the "rows read" so you
         | can't really tie this view back to billing. I think they're
         | aware of this though and might expose that information.
        
       | revskill wrote:
       | Thanks. At least i'll never use PlatnetScale. A good service
       | should have config for me to alert/prevent these kinds of money
       | wasting cases.
       | 
       | Imagine how many wasted $$$ they earned based on common knowledge
       | that they should prevent for customers instead.
        
       | Grimm1 wrote:
       | I feel like indexes are a pretty fundamental type of DB
       | knowledge. In fact I'd say it's table stakes knowledge you should
       | have if you're working with them. Further more, knowing that
       | ForeignKeys typically apply an index to that column is also in my
       | head basic knowledge. I'm sorry you got burnt, and congrats on
       | learning a lesson, but you could have gotten the same knowledge
       | by ever googling MySql ForeignKeys and saved yourself a headache.
       | 
       | In fact it's like a big bullet point near the top of the docs
       | page.
       | 
       | "MySQL requires indexes on foreign keys and referenced keys so
       | that foreign key checks can be fast and not require a table scan.
       | In the referencing table, there must be an index where the
       | foreign key columns are listed as the first columns in the same
       | order. Such an index is created on the referencing table
       | automatically if it does not exist. This index might be silently
       | dropped later if you create another index that can be used to
       | enforce the foreign key constraint. index_name, if given, is used
       | as described previously."
       | 
       | I'm not entirely sure why buzz around "developer learns basic
       | knowledge" has this on the front page.
        
         | Nextgrid wrote:
         | > I'm not entirely sure why buzz around "developer learns basic
         | knowledge" has this on the front page.
         | 
         | The problem is that in the old days, not knowing about indexes
         | left you with an underperforming system or downtime. But in The
         | Cloud(tm) it leaves you with an unreasonably huge bill and that
         | somehow as an industry we're accepting this as normal.
        
           | aspenmayer wrote:
           | Using money to solve business problems is good business
           | sense, but only if that's the best way to spend that money. I
           | agree with you that the status quo is normal, but
           | nonsensical.
        
           | Grimm1 wrote:
           | Which really is a head scratcher. You'd figure especially as
           | a startup seeing a 5k oopsie isn't really as acceptable.
           | Mistakes do happen and I don't mean any shade to this
           | particular person (they'll never make this mistake again) but
           | as an industry the aggregate consequence of this is you have
           | a lot of waste and stupid choices that then have to be
           | cleaned up when more knowledgeable (read highly paid) people
           | are introduced later on.
           | 
           | They'll have to clean up the mess which causes real business
           | consequences that, and I've personally seen this, will
           | directly impact bottom line and have no quick or easy
           | solution to wiggle out of.
           | 
           | Maybe it's acceptable for products like this because the
           | balance between good engineering and company health probably
           | aren't as cut and clear but stuff like this always makes me
           | sad because it's such low hanging fruit, it doesn't require
           | any real effort, just basic curiosity around your job.
        
             | xupybd wrote:
             | The best technical people aren't always the best to start a
             | business. The goal is to make money not have perfect code.
        
               | Nextgrid wrote:
               | If someone comes to me and tries to sell me a service
               | that can leave me with an infinite bill I'd look at them
               | funny and walk away. But that's just me and maybe I just
               | don't get it and I'm not "startuping" right.
        
               | Grimm1 wrote:
               | Cool, and if this was a case of bike shedding over
               | something that hadn't just cost that early stage startup
               | 5 grand I'd agree with you.
               | 
               | However regurgitating a platitude that everyone,
               | including myself, learned when we tried getting our first
               | business off the ground doesn't add much value here.
               | 
               | Had this been a Database with 10million rows it would
               | have cost them 50k, and this is incredibly basic
               | programming knowledge.
               | 
               | Basic proficiency is a far cry from worrying about best
               | technical talent and not a particularly egregious ask.
        
             | Nextgrid wrote:
             | I have no problems with a developer doing a 5k oopsie with
             | things like card processing or an area that has a
             | legitimate potential for direct monetary losses (such as
             | payment processing where a bug could allow customers to
             | order goods without actually paying).
             | 
             | I have a problem with whoever looked at <insert your
             | favorite on-prem RDBMS here> and said "nah, let's go with a
             | cloud-based solution that charges per-query and gives us an
             | essentially infinite financial liability".
        
               | Buttons840 wrote:
               | It's not so clear cut. What's the cost of losing the
               | entire on-prem database? Do you trust a company who hired
               | a developer who didn't know about indexes to hire a rock
               | solid DBA? And how much does that DBA cost?
        
               | Nextgrid wrote:
               | > What's the cost of losing the entire on-prem database?
               | 
               | Backing up an on-prem DB doesn't require specialist DBA
               | knowledge. Basic UNIX skills are enough. Not to mention,
               | since you're not in the cloud, bandwidth or efficiency is
               | not a concern - feel free to rsync your entire DB off to
               | a backup server every 5 minutes.
               | 
               | > to hire a rock solid DBA? And how much does that DBA
               | cost?
               | 
               | They didn't have a DBA here either, and this "cloud"
               | didn't save them. But at least with an on-prem Postgres
               | the worst they'd have is significantly reduced
               | performance*, where as here they had a 5k bill.
               | 
               | *actually the price/performance ratio for bare-metal
               | servers is so good that a 100$/month server would
               | probably take their unindexed query and work totally fine
               | (as a side effect of the entire working dataset being in
               | the kernel's RAM-based IO cache).
        
               | fabian2k wrote:
               | Rsyncing the database won't work in many cases, this
               | doesn't ensure your backup is consistent. That is really,
               | really dangerous advice, especially as you might not
               | notice this if you test the process while the database is
               | idle.
               | 
               | For Postgres you either use the pg-dump command and
               | backup the dump or you setup WAL archiving and save base
               | backups and the WAL files as they are created.
               | 
               | This isn't rocket science, but you really should read the
               | manual at least once before doing this. Just copying the
               | files is not the right way to backup a database (unless
               | you really know what you're doing and are ensuring
               | consistency in some other ways).
        
               | Nextgrid wrote:
               | I am not saying that rsync or cp is the right way to
               | backup a DB, I was just giving a very crude example. I
               | absolutely agree with the issues you're raising.
               | 
               | However, I'd still take recovering a DB that has been
               | backed up by rsync/cp over a DB that's not been backed up
               | at all. If you really can't be bothered to do it the
               | right way, you're still better off doing _something_ than
               | running with no backups at all.
        
               | arpa wrote:
               | HA/clustered/replicated DB setups are not rocket science.
               | Backups are not rocket science. Losing an on-prem
               | database irrevocably never happened for me in 20 years.
        
               | fabian2k wrote:
               | You can test backups, that doesn't require much
               | expertise, only effort.
        
           | heisenbit wrote:
           | Yes we really should not accept this. The ability to impose
           | limits on spending is key to control an enterprise. Whole
           | security certification guacamole is based on having
           | established controls. But where the bit hits the fan control
           | is absent.
        
         | scottlamb wrote:
         | Good for you. But I think you're being uncharitable by failing
         | to distinguish between "concept I didn't understand" and "thing
         | I forgot to consider until I saw the problem it caused". The
         | title also suggests the former, but I think the author is being
         | a bit humble by underplaying his existing knowledge. Likely he
         | actually did know what indexes are before; if you asked him to
         | detail how MySQL foreign keys work he might have even
         | remembered to say they add an implicit index. But it's super
         | easy to miss that you're depending on a side effect like that
         | until you see the slow query (or, in this case, high bill).
         | 
         | When you're programming, how many compiler errors do you see a
         | day? (For me, easily dozens, likely hundreds.) Do you think
         | each one indicates a serious gap in your knowledge?
         | 
         | Along these lines: imposter syndrome is a common problem in our
         | industry. One way it can manifest is junior engineers can
         | thinking they're bad programmers when they repeatedly see walls
         | of compiler errors. I think it'd help a lot to show them a
         | ballpark of how often senior engineers see the same thing. [1]
         | I know that when I'm actively writing new code (especially in
         | languages that deliberately produce errors at compile time
         | rather than runtime), I see dozens and dozens of errors during
         | a normal day. I don't think this is a sign I'm a bad
         | programmer. I think it just means I'm moving fast and trusting
         | the compiler to point out the problems it can find rather than
         | wasting time and headspace on finding them myself. I pay more
         | attention to potential errors that I know won't get caught
         | automatically and particularly to ones that can have serious
         | consequences.
         | 
         | I think the most important thing the author learned is that
         | failing to add an index can cost this much money before you
         | notice.
         | 
         | Ideally the author and/or the vendor will also brainstorm ways
         | to make these errors obvious _before_ the high bill. Load
         | testing with realistic data is one way (though people talk
         | about load testing a lot more than they actually do it).
         | Another would be watching for abrupt changes in the operations
         | the billing is based on.
         | 
         | [1] This is something I wish I'd done while at Google. They
         | have the raw data for this with their cloud-based work trees
         | (with FUSE) and cloud-based builds. I think the hardest part
         | would be to classify when someone is actively developing new
         | code, but it seems doable.
        
           | Grimm1 wrote:
           | No you've missed my point, the author seemingly didn't know
           | that ForeignKeys applied indexes by default in MySql. It's
           | not "Concept I didn't understand", clearly they're capable of
           | understanding because they did after they ran into the issue.
           | It's about not having had basic knowledge to begin with.
           | 
           | But he didn't see compiler errors, he caused monetary cost to
           | his employer.
           | 
           | When I deploy something that unintentionally causes a large
           | monetary bill to my employer, then yes I do believe that
           | indicates a gap in knowledge so I don't in anyway believe I'm
           | being uncharitable. Or and this would be worse, a lack of
           | caring. (Which is not what I think happened here though)
           | 
           | I won't respond to your imposter syndrome bit I don't really
           | think it's relevant to my point.
        
             | scottlamb wrote:
             | I think you didn't read through to this part of my comment:
             | 
             | > I think the most important thing the author learned is
             | that failing to add an index can cost this much money
             | before you notice.
             | 
             | > Ideally the author and/or the vendor will also brainstorm
             | ways to make these errors obvious _before_ the high bill.
             | Load testing with realistic data is one way (though people
             | talk about load testing a lot more than they actually do
             | it). Another would be watching for abrupt changes in the
             | operations the billing is based on.
        
               | Grimm1 wrote:
               | No I did, but since I disagree with your earlier point
               | about how much existing knowledge they have it kind of by
               | default means I disagree with what they took away from
               | this incident.
               | 
               | It's also highly speculative so like I'm not going to go
               | back and forth on it.
               | 
               | Needing a vendor to hand hold your likely highly paid dev
               | seems like a bad fix to me.
               | 
               | Also not having an index _isn 't an error_ it can be a
               | valid choice based on your situation and query load which
               | is why people should know the situations when they're
               | needed.
               | 
               | I think people should simply be better. A lot of people
               | don't like hearing that though so usually I keep it to my
               | private chats where people seem more willing to cop to
               | that fact.
               | 
               | I know we disagree, I know you're going to continue
               | disagreeing, I know I don't want to have the
               | conversation.
        
               | scottlamb wrote:
               | > I know we disagree, I know you're going to continue
               | disagreeing, I know I don't want to have the
               | conversation.
               | 
               | Please consider not chiming in on the next article like
               | this then. I think your attitude of (paraphrasing) "no
               | good programmer would have made the costly mistake you
               | shared, and articles about it aren't worthwhile" is super
               | harmful to our industry. It's the polar opposite of the
               | blameless postmortem approach I'm fond of.
        
               | Grimm1 wrote:
               | This one in particular is not worthwhile on the front
               | page of HN, that's my take. They're most definitely
               | useful for beginners, or maybe people just learning about
               | databases.
               | 
               | I'm not going to not post simply because you find it
               | disagreeable, there are plenty of people here who seem to
               | agree with me.
               | 
               | Blameless post mortems are great, for your team. I am not
               | his team mate, and I don't really feel a kinship with
               | every developer under the sun. And for what it's worth I
               | don't blame this developer for anything. If anything I
               | lament the institutions that failed them on the way to
               | this point in time. To me this is a symptom of systemic
               | rot.
        
             | Nextgrid wrote:
             | > When I deploy something that unintentionally causes a
             | large monetary bill to my employer, then yes I do believe
             | that indicates a gap in knowledge so I don't in anyway
             | believe I'm being uncharitable.
             | 
             | It depends, if you've been given a loaded footgun it's not
             | entirely your fault when it inevitably goes off.
             | 
             | Let's go back to your "compiler errors" scenario, and let's
             | say someone decided that the company should be using a
             | cloud-based compiler that happens to charge per error. I
             | wouldn't blame developers for falling into a trap that
             | challenges all known assumptions.
             | 
             | The problem is that there is a DB that charges insane
             | amounts of money per row processed with no upper limit
             | _and_ that someone actually thought it was a good idea to
             | use it.
        
               | Grimm1 wrote:
               | First its not _my_ "compiler errors" scenario it's the
               | person who initially replied to me. Sure whatever, I
               | don't think I ever insinuated I thought that was a good
               | idea, it runs in parallel with the issue I have.
        
               | tristor wrote:
               | I am somewhat shocked to find that an RDBMS is considered
               | a "loaded footgun" in 2021. Perhaps grandparent isn't the
               | most charitable in their interpretation, but I am in full
               | agreement. It continues to astound me how little about
               | the basics of databases most developers know, and how
               | strongly resistant they are to trying to learn.
        
               | Nextgrid wrote:
               | An RDBMS that scales infinitely while charging you per-
               | row goes against the usual assumptions learned in the
               | past decades, so I'd say yes that's a loaded footgun.
        
         | AnotherGoodName wrote:
         | You'd be surprised and frustrated. If you ever see someone say
         | "We hired Oracle consultants and they are miracle workers" or
         | "NoSQL is sooo much faster than SQL" you can be pretty sure
         | they missed databases 101 and the requirement to add indexes.
        
         | pjscott wrote:
         | One of the best database habits I've ever developed is to run
         | EXPLAIN on every query that I expect to run repeatedly, then
         | sanity-check the output. It's very little effort, and has
         | prevented _so_ much hassle.
        
         | yashap wrote:
         | Seriously. Like, every junior dev has to learn DB indexing
         | basics sometime, and apparently the other of this blog post
         | just did. But really can't understand why this article is
         | getting voted to the top of HN.
        
           | azeirah wrote:
           | What I gained from the article wasn't that the dev was
           | unaware of indices, it's that he didn't realise indices were
           | missing due to how planetscale's database disallows foreign
           | keys.
           | 
           | I never worked with a database that doesn't have foreign keys
           | and it's not unthinkable to forget when you do for the first
           | time, that foreign keys were what created indexes for you
           | automatically.
           | 
           | A little bit of planning could have prevented that though :/
        
         | watt wrote:
         | You are absolutely missing the point. The point is not about
         | indexes or full table scans, but it's a about cloud providers
         | who will charge you for every row "inspected" and how a full
         | table scan might cost you $0.15 and it would add up. It's not
         | about slow performance which you can diagnose and fix, it's
         | about getting an unexpected $5k bill, which you can't fix.
         | 
         | And in the end, if the cloud provider wants to charge you for
         | rows "inspected", this can't be buried in small print. That's
         | unacceptable!
         | 
         | The billing must come with up-front, red capital letters
         | warning, and must come with alerts when your bill is
         | unexpectedly little high (higher than expected, not just 10x or
         | 100x higher). It must automatically shut down the process,
         | requiring the customer confirm they want proceed, that you
         | actually want to spend all that money. And it must be on the
         | cloud provider to detect billing anomalies and fully own them
         | in case it goes the wrong way. This is the cloud "bill of
         | rights" we need.
        
         | derekdahmer wrote:
         | I've been using relational databases for web apps for my entire
         | career and probably would have made this same mistake if using
         | PlanetScale for the first time.
         | 
         | The author had two misunderstandings:
         | 
         | 1) An index isn't created automatically
         | 
         | 2) You get billed for the number of rows scanned, not the
         | number of rows returned
         | 
         | Even if I noticed #1, I probably wouldn't have guessed at #2
         | for the same reason as the author.
        
         | cerved wrote:
         | they were using some kind of foreign keyless MySQLish whatever
         | thing
        
       | AnotherGoodName wrote:
       | I've seen things you people wouldn't believe. Millions burnt on
       | consultants and licensing Oracle. I watched C series startups
       | throwing it all away in a move to NoSQL. All those Amazon RDS
       | fees will be lost in time.
        
         | ushakov wrote:
         | > I watched C series startups throwing it all away in a move to
         | NoSQL
         | 
         | have you forgot that MongoDB is web scale?
         | 
         | https://www.youtube.com/watch?v=b2F-DItXtZs
        
         | vincentpants wrote:
         | Like tears in the rain
        
           | phoenixdblack wrote:
           | These two might just be the best comments i have ever seen on
           | HN
        
             | redisman wrote:
             | Time to die
        
           | xiphias2 wrote:
           | That last metaphore was added by the actor himself...the
           | director asked him to have some human showing in him, and now
           | it's history.
        
         | [deleted]
        
         | racl101 wrote:
         | I knew the cadence of this sentence sounded familiar.
         | 
         | Nice!
        
         | bombcar wrote:
         | I felt a great disturbance in the Billing, as if millions of
         | rows suddenly cried out in read and were suddenly repeated. I
         | fear something terrible has happened.
        
       | samlambert wrote:
       | This is definitely a lesson in the importance of indexes in
       | general. We are well aware of the potential pitfalls with our
       | current pricing. I'm happy to say we are nearly done modeling
       | different metering rates for the product which would mean
       | significantly lower bills for our users and avoid issues like
       | this.
       | 
       | It's core to our mission that our product's pricing is accessible
       | and friendly to small teams. Part of being in beta was us wanting
       | to figure out the best pricing based on usage patterns. That work
       | is nearly done. As the post mentions we've credited back the
       | amount.
        
         | Nextgrid wrote:
         | Would it be possible for you to add a spending cap? The user
         | should be able to tell your system "here's how much I want to
         | spend max" and if they exceed that they start getting errors or
         | reduced performance.
        
           | samlambert wrote:
           | Yes absolutely. We have daily and monthly budgets on the
           | roadmap to make sure nobody gets a surprise bill in the
           | future. This and more tools to make sure you are running your
           | database optimally.
        
         | andybak wrote:
         | Glad you came here to say this because my takeaway was "What a
         | terrifying pricing model. That would keep me up at night"
        
         | anglinb wrote:
         | Thanks Sam! As mentioned in the post, the PlantScale team was
         | quick to credit our account for the overages and help us figure
         | out what was going on. I'm personally super bullish on
         | PlanetScale!
         | 
         | With any new product there will be tradeoffs and rough edges
         | but the positives, like easy migrations and database branches
         | have definitely outweighed any difficulties.
        
           | brasetvik wrote:
           | Kudos for being open about your mistakes.
           | 
           | Could you share a little bit about what your thought process
           | was in general when picking a database technology?
           | 
           | You call out "easy migrations and database branches"
           | outweighing other quirks, so some pros and cons weighing
           | must've happened :)
           | 
           | Is it easy, for example, to test things in your dev
           | environment with realistic amounts of data, and to get an
           | understanding of how the queries will execute, etc? These
           | seem somewhat basic, and would've probably caught this (also
           | kinda basic, sorry :) problem early. (As in discovering "why
           | is this query that should be a few ms with an index lookup
           | taking so long?" early on)
        
             | anglinb wrote:
             | Thanks! The decision making process was pretty
             | unsophisticated tbh. Basically I spent the last couple
             | years working at GitHub as a security engineer and had been
             | pretty comfortable with MySQL so wanted to stick with that.
             | I had heard from our database team how annoying migrations
             | were and I had previously locked a postgres database
             | multiple times in production trying to deploy a migration,
             | so MySQL + safe migrations + some of the best engineers
             | I've worked with pouring all their time into PlanetScale,
             | made a ton of sense. So basically a combination of proven
             | underlying tech + believing in the team.
             | 
             | The migration workflow is really cool, basically when we
             | create a PR we branch our production database and apply any
             | migrations that are included in that PR and then that
             | branch is used in our per-PR review environment. (Just
             | Heroku Review Apps), then when we merge the PR, we also
             | merge the deploy request in PlanetScale. Database branching
             | is a super powerful concept once you've leaned into it.
             | 
             | We don't really do any sort of load testing in a dev
             | environment. We have one customer who is also a co-founder
             | of our company so we just deploy whatever changes we're
             | unsure about (after automated testing) to his application
             | and see what happens. If anything looks off in Grafana
             | we'll make take a look but it's usually "good enough" or
             | "totally broken", very rarely do we take time to make
             | something 25% better if it already works. The time to fix
             | vs speed of shipping features tradeoff doesn't make sense
             | for us.
             | 
             | In this specific case, the query was taking place in a
             | background job so 10ms - 500ms didn't really matter to us
             | so we didn't really measure the timing, if we had we may
             | have noticed it was slow but kinda a testament to
             | PlanetScale that we didn't even notice ;)
        
               | brasetvik wrote:
               | Thanks for the background :)
               | 
               | Lock behaviour of migrations are indeed very important to
               | be aware of.
               | 
               | Braintree (which are heavy Postgres users) have a pretty
               | good post on that - https://medium.com/paypal-
               | tech/postgresql-at-scale-database-...
        
           | samlambert wrote:
           | We are so glad to have you as customers and I can't wait to
           | partner as you grow to mega scale.
        
       | XCSme wrote:
       | I was actually considering PlanetScale, but them saying "Every
       | time a query retrieves a row from the database, it is counted as
       | a row read." when it's actually all the scanned rows, sounds
       | intentionally confusing. "Retrieving" sounds like it should only
       | be counted rows returned by a query.
        
       | dreyfan wrote:
       | Don't use DB providers that charge for rows/data scanned. Use
       | Amazon RDS or Google Cloud SQL or just install it yourself on a
       | VM. Pay for CPU, memory, and storage instead.
        
         | ldoughty wrote:
         | Rows returned model works really well for certain data loads
         | (where all data customers use is customer-keyed)....
         | 
         | This model also scales DOWN really well .. while still
         | providing good scalable availability...
         | 
         | That said, I DO agree with the sentiment of paying for a set
         | performance level (clu, memory, storage), to provide
         | predictable pricing.. obviously these guys were bit by the
         | scaling capability.
         | 
         | I do a lot of pet projects, and I find DynamoDB works really
         | well because my pet projects cost $0 most months... And I don't
         | have to worry about servers, maintenance, or what not... I'm
         | happy to do that at work, but I don't want that for my friends
         | & fun projects... And I've not seen a decent DB managed RDS for
         | <$5/month
        
           | boulos wrote:
           | Disclosure: I used to work on Google Cloud.
           | 
           | This is why BigQuery offers both models and lets you control
           | the caps [1].
           | 
           | Buying fixed compute is effectively buying a throughput cap.
           | Hard Quotas provide a similar function, but aren't a useful
           | budgeting tool if you can't set them yourself.
           | 
           | "Serverless" without limits is basically "infinite
           | throughput, infinite budget" (though App Engine had quotas
           | since day 1 and then budgets once charging was added). The
           | _default_ quotas give you some of that budget  / throughput
           | capping, but again if you can't lower them they might not
           | help you.
           | 
           | Either way, BQ won't drop _ingestion_ or storage though
           | because almost nobody wants their data deleted. As a
           | provider, implementing _strict_ budgets is impossible without
           | having a fairly complex policy  "if over $X/second stop all
           | activity, oh except let me still do admin work, like adding
           | indexes? Over $Y/second delete everything". I think having
           | user adjustable quotas and throughput caps per "dimension"
           | makes more sense but it puts the burden on the user and _no_
           | provider offers good enough user control over quota.
           | 
           | tl;dr: true budgets are hard to do, but every provider should
           | strive to offer better quota/throughput controls.
           | 
           | [1] https://cloud.google.com/bigquery/pricing
           | 
           | [2] https://cloud.google.com/bigquery/docs/reservations-
           | workload...
        
             | [deleted]
        
           | [deleted]
        
         | mike_hock wrote:
         | Rent metal and run your own MySQL/Postgres/...
         | 
         | One insert every 3 seconds. Could run that off a 10 year old
         | laptop.
        
           | marcinzm wrote:
           | The time spent setting it up and managing it and then having
           | to deal with backups/environment clones/access
           | control/scaling limitations/etc. outweighs the savings for
           | almost any company paying US wages. Especially since you'd
           | need metal for everything and not just the db due to network
           | latency.
        
             | Nextgrid wrote:
             | Every cloud-hosted startup I've consulted for had a full-
             | time devops guy wrangling Terraform and YAML files. The
             | cloud requires an equivalent time investment.
        
               | marcinzm wrote:
               | Bare metal requires the equivalent of all of that devops
               | stuff and then more. That is if you actually want parity
               | and not just a half assed version that hurts developer
               | productivity and causes technical debt.
        
             | rytcio wrote:
             | I think you're over estimating how complicated that stuff
             | is...
        
               | marcinzm wrote:
               | It's very easy to do it in a half-assed way and much
               | harder to do it at scale in a production environment with
               | many developers without hurting developer productivity at
               | all.
        
           | dreyfan wrote:
           | Sure, but we're addressing people who are so far on the other
           | end of the spectrum they're using a "serverless" database
           | where they pay for the number of rows scanned per query. I
           | think a managed VM is a better middle-ground for their
           | capability level while still delivering massive cost-savings.
           | 
           | Amazon RDS lowest-tier runs about $13/mo for 10GB storage, 2
           | vCPUs and 1GB memory with automated backups and push-button
           | restoring. And that would have likely met all of their needs
           | with capacity to spare.
        
       ___________________________________________________________________
       (page generated 2021-11-06 23:00 UTC)