[HN Gopher] Spending $5k to learn how database indexes work
___________________________________________________________________
Spending $5k to learn how database indexes work
Author : anglinb
Score : 119 points
Date : 2021-11-06 18:25 UTC (4 hours ago)
(HTM) web link (briananglin.me)
(TXT) w3m dump (briananglin.me)
| sushsjsuauahab wrote:
| Ssh into an ec2 instance, install mysql, and you'll never pay
| more than $7.50 a month!
| NicoJuicy wrote:
| 0,15 $ per query... The world has gone insane.
| cperciva wrote:
| It's not quite so crazy when you phrase it as $0.15 _for
| reading a million items from the database_.
| Spooky23 wrote:
| 25 years ago i built and ran a biggish database system that
| supported a reservation system.
|
| Even given the limitations of the time (RDBMS cost, 9GB
| disks, Sun kit, etc), our cost of good sold for that type of
| workload was exponentially less. (At scale) Today, I could
| probably run that company off my MacBook Pro and have room to
| spare.
|
| That said, the rationale for choosing this technology is
| cute: "After seeing a ton of the best GitHub engineers end up
| at PlanetScale and seeing the process GitHub went through to
| issue simple migrations, we chose to use their service."
|
| If you use the same methodology to choose a database that the
| public uses to choose between Bud Light, Miller Lite, and
| Coors Light, expect a suboptimal outcome.
| Nextgrid wrote:
| I have seen so much over-engineered startup shit costing
| thousands in AWS fees that could run faster on my laptop.
| williamdclt wrote:
| No that's still crazy. Scanning a million items isn't a big
| workload.
| cperciva wrote:
| I'm not saying that it's not crazy. Just that it's _less_
| crazy.
| friedman23 wrote:
| I agree, it might be distributed in some way which is
| driving up the cost?
| Nextgrid wrote:
| The greed is distributed across the cloud provider's
| C-suite for sure. To be fair to them they refunded the
| bill afterwards according to the article but IMO we
| should not be accepting this kind of pricing models as
| normal.
|
| A million items isn't a big deal, distributed or not. If
| anything, if your distributed architecture makes reading
| a million items _more_ costly than a single machine doing
| it then it 's time to go back to the drawing board.
| williamdclt wrote:
| They mention that PlanetScale uses Vitess, so indeed it
| might be distributed
| akudha wrote:
| One of my employers was using BigQuery. I was so scared that
| I might accidentally run queries and get a big bill, even
| though our tables weren't that big.
|
| It is funny to look back, but getting huge bills without even
| realizing that we're doing something wrong is very real
| possibility. Cloud vendors happily make their pricing opaque
| as it benefits them.
|
| I'd avoid even the best product in the industry, if their
| pricing is opaque. Or if there is a "Contact Us" button when
| there needn't be.
| XCSme wrote:
| Still crazy, charging $0.15 for probably less than a few ms
| of computing power.
| cperciva wrote:
| If those million items need to be read from different
| servers -- as they might well, in a distributed database --
| it's definitely not just a few ms of computing power.
|
| For reference, reading a million items of up to 1 kB each
| costs $0.125 with on-demand dynamodb.
| NicoJuicy wrote:
| 20 of those queries give me a VPS at Hetzner for a month.
|
| Or 1 query = 3 gb / snapshot storage for a month.
| smoldesu wrote:
| Huh, learning about this "Superwall" product constitutes as my
| horror-story-of-the-day. It's paywalling as a service, just what
| the industry needed. Thankfully it appears to be quarantined to
| iOS right now, but God does it feel like we're headed right back
| into Stallman's predictions about how SAASS will ruin the
| landscape of commercial technology.
| jakemor wrote:
| Two sides to every coin. Ethical developers need ways to make
| low prices work, which is impossible without good testing
| suites.
| smoldesu wrote:
| These days, if you're developing something for profit it's
| pretty hard to see your software as ethical. You're either
| trying to empower your user or trying to monetize them, the
| two will always fight one another and snuff the other out
| unless you, the developer, take a stand.I fully understand
| the market for proprietary software, but trying to define
| some ethical middle ground is just blatant lip service,
| nothing else.
| Nextgrid wrote:
| It's better than ads though.
| andybak wrote:
| It's better than targetted ads built on intrusive tracking
| that also enables several other abusive business practices.
| It's not better than "good old fashioned ads". Heck I'd even
| be ok with targeted ads if it could be done without the rest
| of the "destroy civil society" that seems to come along for
| the ride.
| Nextgrid wrote:
| Even if you solve the privacy problem, there's still a
| problem with advertising which is that it's inherently at
| odds with the user's interests.
|
| An advertising-funded product will always prioritize
| engagement - they want you to "engage" with the product
| even if it means degrading the experience intentionally
| such as making a process manual or take more steps than
| necessary (so that you are exposed to more ads). The
| "destroy civil society" problem you mention is a direct
| consequence of the pursuit of engagement.
|
| In contrast, with a paid product, the company's interests
| are directly aligned with yours and they have no incentive
| to intentionally degrade the experience or get in your way
| any more than necessary. They don't care about how much you
| "engage" with the product as long as the bill gets paid (if
| anything, the less you engage the better as it uses less
| server resources).
| smoldesu wrote:
| Paid products can still do all the same datamining that
| advertisers do, and there are markets for buying/selling
| that info (eg. Palantir). The truth is that all forms of
| monetization are inherently at-odds with the user's
| interests. Paying for an app doesn't magically make this
| friction go away, and it _certainly_ doesn 't reduce the
| incentive for developers to abuse your trust.
| Nextgrid wrote:
| > there are markets for buying/selling that info (eg.
| Palantir)
|
| A major reason for these markets is that the information
| can be used for advertising or marketing targeting. In a
| world where the majority of products/services are paid
| and the amount of advertising is significantly reduced
| there will be much less demand for this information,
| leading to lower prices and even lower payoff from
| selling this information, not to mention potential legal
| risks (GDPR, CCPA, etc).
| smoldesu wrote:
| Palantir's biggest customers are government entities and
| market researchers, not advertisers. In a world where
| advertising has been significantly reduced, their
| products would become _more valuable_ , since regular
| analytics would become inaccessible. Compared to the data
| these corporate aggregators collect, ad fingerprinting
| seems trivial.
|
| The biggest crux of this, though, is the fact that both
| of these monetization schemes are destructive. Paying for
| software simply doesn't make sense in most cases, as
| there aren't that many people who are developing novel
| solutions these days. That's the exact reason why
| advertising is so popular: the market _knows_ that
| relying on your conscious contribution is unsustainable,
| so why should you believe otherwise?
| dhosek wrote:
| You're assuming it's either/or. When I worked at eHarmony
| we had a monitor that would scroll user feedback
| submitted online near the developer area. One of the most
| common complaints was about ads being shown to paid
| users.
|
| I asked a PM about this and was told that the money they
| made was too much to turn down.
|
| Capitalism. It's why we can't have nice things.
| Nextgrid wrote:
| I don't disagree with that, but that's still not a reason
| to turn down paid options. With ads, you are guaranteed
| to get a bad experience. With paid product, there is
| _potential_ for a bad experience, but at least it 's not
| the guaranteed default.
|
| Regulation around ads is the only definitive solution,
| but in the meantime if there's a business model that's
| non-toxic I'm not going to hate on it even if technically
| someone could still misuse it.
| radu_floricica wrote:
| I'm just leaving this here:
|
| https://www.hetzner.com/dedicated-rootserver/ax161/configura...
|
| Draw that nice red slide all the way to the right. No, it's not
| storage. Yeah, it's actually affordable. Yeah, that was a sexual
| sound you just made.
|
| You do have to be prepared to know some basic sysadmin, or pay
| somebody to do it for you. My newest server has about 60 cores
| and half a tera of ram. Surprisingly, it's not uber sharp - I
| went with high core count so individual queries actually got
| slower for about 20%. But that load... you can't even tell if the
| cpu load gauge is working. I can't wait to fill it up :D Maybe
| this black friday season I'll get it to 10%.
| eknkc wrote:
| We are currently on AWS, and we have several dedicated servers
| on LeaseWeb that we offload computational work on. These are
| cheap beasts.
|
| Still, I'd not run my RDBMS on an unmanaged, non replicated
| dedicated server and I'd not bother setting up multiple servers
| with failover, automated backups etc and keep updating them.
| Fuck that, I'll pay whatever AWS says RDS costs.
| amenod wrote:
| ...and Hetzner just started offering their services in the US a
| few days ago. (EDIT: not affiliated)
|
| If you do something stupid with your code at least you won't go
| bankrupt, only your service will be slower.
| Nextgrid wrote:
| With the performance of these servers you have a huge margin
| for stupidity before you even notice any slowdowns.
| [deleted]
| throwawayapples wrote:
| I was just comparing the pricing to OVH's cloud VPS's
| (https://us.ovhcloud.com/) and, accounting for the currency
| conversion and OVH's huge amount of free unmetered bandwidth
| (vs 20TB for H), it actually looks like OVH is even cheaper.
|
| How can DO and Vultr even compete? Probably on the basis of
| their nicer dashboards and easier sign-up flow (especially
| OVH's)
| radu_floricica wrote:
| Tried OVH once. Couldn't even finish the account setup. Was
| rather dissapointed because I wanted not to depend on just
| one provider. Might give them a chance again.
| kingcharles wrote:
| I just recently signed up for some stupidly low-priced
| VPS on OVH and encountered no problems at all with the
| sign-up. The only issue I had was when I went to increase
| the RAM - it said the price was $1/month, but when I
| bought it I was charged $5 and it took support a week to
| get back to me to tell me I didn't read the small print
| properly and that it said they would have to upgrade the
| Windows license which was an extra fee.
|
| tl;dr: watch the small print and their support is slow as
| fuck. otherwise, incredible value for money.
| joconde wrote:
| I rented a VPS from the French website (their native
| language), and it was confusing. I remember looping
| through the same 2 pages 10 times before actually finding
| the instance's access instructions.
|
| Of course, you don't have to care about the console once
| you have SSH access to the server. It seems to be pretty
| good service for the money.
| NicoJuicy wrote:
| Hetzner is German, not french.
|
| Ovh is french.
| johndough wrote:
| I think Hetzner removed their traffic limit for dedicated
| servers https://www.hetzner.com/news/traffic-limit/
| pronoiac wrote:
| Oooooooooh!
| jaden wrote:
| Just to clarify, Hetzner now has Cloud servers in the US.
| Dedicated servers are still only available in Europe.
| adventured wrote:
| I appreciate that info. I tried Hetzner a decade ago from
| the US just to see what kind of latency I might expect.
| I've been waiting for them to finally get US located
| services to give them a serious shot (vs Digital Ocean &
| Co.).
| flippant wrote:
| What are you using this for?
| radu_floricica wrote:
| www.couriermanager.com - SaaS for courier companies,
| basically. The new server is the common pool instance - I
| have others for dedicated clients.
| crorella wrote:
| It amazes me that things so basic and fundamental like
| understanding the way indexes work are often overlooked or not
| leveraged
| ihusasmiiu wrote:
| Let me understand please. These people are selling a commercial
| product and their team has no idea whatsoever of what an index
| is? And this is news?
| fabian2k wrote:
| It sounds a bit more like they were confused by the automatic
| index creation for foreign keys they expected to be there. So
| they probably knew they'd need an index, just assumed this was
| implicit in the foreign key.
| rabuse wrote:
| Gotta love cloud pricing. This is why I colocate.
| arpa wrote:
| Well this was an embarrasing read.
| max_hammer wrote:
| I hate this pricing model
|
| My company in boarded `fivetran` to source data from different
| tools.
|
| Budget got exhausted in sourcing `iterable` data
| foreigner wrote:
| The real answer here is cost limiting. I don't want my cloud
| provider to keep working at the cost of an order of magnitude
| higher bill than I was expecting because of a bug in my code. I
| want to be able to set a billing limit and have them degrade or
| stop there service if I exceed the limit.
|
| AFAIK AWS doesn't have that. They do have the ability to send me
| alerts if my bill is unexpectedly high, but they still keep
| working until I go bankrupt. It's possible to use those alerts to
| implement your own "broke man's switch", but they don't have it
| built in.
| babayega2 wrote:
| That's why we use DigitalOcean a lot in Africa. You know
| upfront how much you will spend.
| racl101 wrote:
| Yeah, It cost me two bad months of high RDS fees to learn about
| indexes. $900 in total.
|
| Then a bro showed me one night about the magic of indexes. 5
| minutes worth of advice saved me hundreds of dollars per month in
| the future and all he asked in return was for some beer and
| chicken wings.
|
| Now that is a good bro.
|
| I'm happy to say I've paid it forward myself.
| bigbillheck wrote:
| I'm not a DB expert, but "750k users in a month." doesn't sound
| like a quantity that you'd need to use some kind of fancy special
| tooling for.
| dahdum wrote:
| It's far from big data even if they grow 10x in the next year,
| but if you are unfamiliar with database migrations and branches
| I can see the appeal of the product.
| mkl95 wrote:
| In a world where Juicero raised $120m, selling overengineered
| solutions for simple problems is not necessarily a bad idea.
| Nextgrid wrote:
| The problem isn't _selling_ (there 's plenty of dubious or
| badly-priced products being around), it's that someone
| thought _buying_ said product was a good idea.
| xupybd wrote:
| No foreign keys to make migrations easier. That doesn't sound
| like the best trade off to me.
|
| Having the database constrained as much as possible makes
| maintenance so much easier. Many bugs don't escape into
| production as they're caught by the database constraints. Those
| that do get out do less damage to the data.
|
| I know scale comes with trade offs but that seems extreme to me.
| cerved wrote:
| oh no, foreign keys are useless because it's the apps
| responsibility to delete
|
| /s
| derekperkins wrote:
| I'm a Vitess maintainer and I feel the same way. I don't plan
| to use any of the Online DDL because you'll have to pry my
| foreign keys out of my cold, dead hands. I understand the
| reasoning and limitations, but like you, the trade-off isn't
| worth it to me.
| anglinb wrote:
| I'm so curious, so you maintain Vitess but don't use it
| personally?
| derekperkins wrote:
| I do use it in production and have for years, just not the
| online schema changes. It's fantastic and FKs are supported
| in a single shard, which we use heavily.
| leetrout wrote:
| What are your thoughts on Citus and Cockroach where foreign
| keys are still supported when creating partitioned clusters?
|
| Is due to fundamental differences in postgres vs innodb?
| derekperkins wrote:
| Vitess still supports foreign keys (single shard), using
| MySQL, just not with the Online DDL functionality.
|
| I think Cockroach tries to be a little too magical, which
| is great for starting up a cluster, but I think you can
| architect for much better performance with Vitess and
| owning your sharding model. I'm also very happy to use
| InnoDB, one of the most battle tested db engines to ever
| exist, while Cockroach is currently rewriting theirs from
| scratch. At the distributed level, I don't know of any
| massive scale adopters of Cockroach yet, though I'm not
| 100% looped in, so forgive me if I'm ignorant of them. On
| the other hand, Vitess has seen adopters like Slack,
| GitHub, Square, HubSpot, YouTube, with many more in various
| stages of adoption.
|
| I feel like Citus might be trying to be too many things and
| so hasn't gotten the traction that Vitess has. Vitess has
| nailed OLTP at scale, while Citus is trying to also do OLAP
| and be a single source. That's the holy grail, but I'm not
| sure that any technology is close to handling both of those
| well yet.
| dhosek wrote:
| It's possible during a migration to drop a constraint, make the
| update and restore the constraint. If a schema migration tool
| doesn't automate this or at least permit it, it's not a good
| schema migration tool.
| fabian2k wrote:
| That pricing model seems rather inherently tricky to me, and also
| quite expensive. At $1.50 per 10 million rows read this can get
| very expensive the moment you do a full table scan on any non-
| trivial table. And while this example is a trivial case where you
| only need minimal database knowledge to ensure that no full table
| scan is necessary, many real world cases are much more complex.
|
| It also seems very expensive compared to just renting DBs by
| instance, if you put any real load onto this. I can see this
| being attractive if your use case only queries single rows by
| key, but it's essentially a big minefield for any query more
| complex than that. A database with a rather opaque query planner
| doesn't seem like a good fit for this kind of pricing.
| nrmitchi wrote:
| As the author touches on, the main problem here isn't learning
| about indexes. It's about "infinity scaling" working _too well_
| for people who do not understand the consequences.
|
| In no sane version of the world should "not adding a db index"
| lead to getting a 50x bill at the end of the month without
| knowing.
|
| I am a strong believer that services that are based on "scale
| infinitly" really need hard budget controls, and slower-scaling
| (unless explicitly overidden/allowed, of course).
|
| If I accidently push very non-performant code, I kind of expect
| my service to get less performant, quickly realize the problem,
| and fix it. I don't expect a service to seemingly-magically
| detect my poor code, increase my bill by a couple orders-of-
| magnitude, and only alert me hours (if not days) later.
| anglinb wrote:
| Haha yep, I was like wait I'm used to getting feedback from the
| system telling me I messed up and this I barley noticed.
| PlanetScale has Query Statistics that are really useful for
| spotting slow queries but don't expose the "rows read" so you
| can't really tie this view back to billing. I think they're
| aware of this though and might expose that information.
| revskill wrote:
| Thanks. At least i'll never use PlatnetScale. A good service
| should have config for me to alert/prevent these kinds of money
| wasting cases.
|
| Imagine how many wasted $$$ they earned based on common knowledge
| that they should prevent for customers instead.
| Grimm1 wrote:
| I feel like indexes are a pretty fundamental type of DB
| knowledge. In fact I'd say it's table stakes knowledge you should
| have if you're working with them. Further more, knowing that
| ForeignKeys typically apply an index to that column is also in my
| head basic knowledge. I'm sorry you got burnt, and congrats on
| learning a lesson, but you could have gotten the same knowledge
| by ever googling MySql ForeignKeys and saved yourself a headache.
|
| In fact it's like a big bullet point near the top of the docs
| page.
|
| "MySQL requires indexes on foreign keys and referenced keys so
| that foreign key checks can be fast and not require a table scan.
| In the referencing table, there must be an index where the
| foreign key columns are listed as the first columns in the same
| order. Such an index is created on the referencing table
| automatically if it does not exist. This index might be silently
| dropped later if you create another index that can be used to
| enforce the foreign key constraint. index_name, if given, is used
| as described previously."
|
| I'm not entirely sure why buzz around "developer learns basic
| knowledge" has this on the front page.
| Nextgrid wrote:
| > I'm not entirely sure why buzz around "developer learns basic
| knowledge" has this on the front page.
|
| The problem is that in the old days, not knowing about indexes
| left you with an underperforming system or downtime. But in The
| Cloud(tm) it leaves you with an unreasonably huge bill and that
| somehow as an industry we're accepting this as normal.
| aspenmayer wrote:
| Using money to solve business problems is good business
| sense, but only if that's the best way to spend that money. I
| agree with you that the status quo is normal, but
| nonsensical.
| Grimm1 wrote:
| Which really is a head scratcher. You'd figure especially as
| a startup seeing a 5k oopsie isn't really as acceptable.
| Mistakes do happen and I don't mean any shade to this
| particular person (they'll never make this mistake again) but
| as an industry the aggregate consequence of this is you have
| a lot of waste and stupid choices that then have to be
| cleaned up when more knowledgeable (read highly paid) people
| are introduced later on.
|
| They'll have to clean up the mess which causes real business
| consequences that, and I've personally seen this, will
| directly impact bottom line and have no quick or easy
| solution to wiggle out of.
|
| Maybe it's acceptable for products like this because the
| balance between good engineering and company health probably
| aren't as cut and clear but stuff like this always makes me
| sad because it's such low hanging fruit, it doesn't require
| any real effort, just basic curiosity around your job.
| xupybd wrote:
| The best technical people aren't always the best to start a
| business. The goal is to make money not have perfect code.
| Nextgrid wrote:
| If someone comes to me and tries to sell me a service
| that can leave me with an infinite bill I'd look at them
| funny and walk away. But that's just me and maybe I just
| don't get it and I'm not "startuping" right.
| Grimm1 wrote:
| Cool, and if this was a case of bike shedding over
| something that hadn't just cost that early stage startup
| 5 grand I'd agree with you.
|
| However regurgitating a platitude that everyone,
| including myself, learned when we tried getting our first
| business off the ground doesn't add much value here.
|
| Had this been a Database with 10million rows it would
| have cost them 50k, and this is incredibly basic
| programming knowledge.
|
| Basic proficiency is a far cry from worrying about best
| technical talent and not a particularly egregious ask.
| Nextgrid wrote:
| I have no problems with a developer doing a 5k oopsie with
| things like card processing or an area that has a
| legitimate potential for direct monetary losses (such as
| payment processing where a bug could allow customers to
| order goods without actually paying).
|
| I have a problem with whoever looked at <insert your
| favorite on-prem RDBMS here> and said "nah, let's go with a
| cloud-based solution that charges per-query and gives us an
| essentially infinite financial liability".
| Buttons840 wrote:
| It's not so clear cut. What's the cost of losing the
| entire on-prem database? Do you trust a company who hired
| a developer who didn't know about indexes to hire a rock
| solid DBA? And how much does that DBA cost?
| Nextgrid wrote:
| > What's the cost of losing the entire on-prem database?
|
| Backing up an on-prem DB doesn't require specialist DBA
| knowledge. Basic UNIX skills are enough. Not to mention,
| since you're not in the cloud, bandwidth or efficiency is
| not a concern - feel free to rsync your entire DB off to
| a backup server every 5 minutes.
|
| > to hire a rock solid DBA? And how much does that DBA
| cost?
|
| They didn't have a DBA here either, and this "cloud"
| didn't save them. But at least with an on-prem Postgres
| the worst they'd have is significantly reduced
| performance*, where as here they had a 5k bill.
|
| *actually the price/performance ratio for bare-metal
| servers is so good that a 100$/month server would
| probably take their unindexed query and work totally fine
| (as a side effect of the entire working dataset being in
| the kernel's RAM-based IO cache).
| fabian2k wrote:
| Rsyncing the database won't work in many cases, this
| doesn't ensure your backup is consistent. That is really,
| really dangerous advice, especially as you might not
| notice this if you test the process while the database is
| idle.
|
| For Postgres you either use the pg-dump command and
| backup the dump or you setup WAL archiving and save base
| backups and the WAL files as they are created.
|
| This isn't rocket science, but you really should read the
| manual at least once before doing this. Just copying the
| files is not the right way to backup a database (unless
| you really know what you're doing and are ensuring
| consistency in some other ways).
| Nextgrid wrote:
| I am not saying that rsync or cp is the right way to
| backup a DB, I was just giving a very crude example. I
| absolutely agree with the issues you're raising.
|
| However, I'd still take recovering a DB that has been
| backed up by rsync/cp over a DB that's not been backed up
| at all. If you really can't be bothered to do it the
| right way, you're still better off doing _something_ than
| running with no backups at all.
| arpa wrote:
| HA/clustered/replicated DB setups are not rocket science.
| Backups are not rocket science. Losing an on-prem
| database irrevocably never happened for me in 20 years.
| fabian2k wrote:
| You can test backups, that doesn't require much
| expertise, only effort.
| heisenbit wrote:
| Yes we really should not accept this. The ability to impose
| limits on spending is key to control an enterprise. Whole
| security certification guacamole is based on having
| established controls. But where the bit hits the fan control
| is absent.
| scottlamb wrote:
| Good for you. But I think you're being uncharitable by failing
| to distinguish between "concept I didn't understand" and "thing
| I forgot to consider until I saw the problem it caused". The
| title also suggests the former, but I think the author is being
| a bit humble by underplaying his existing knowledge. Likely he
| actually did know what indexes are before; if you asked him to
| detail how MySQL foreign keys work he might have even
| remembered to say they add an implicit index. But it's super
| easy to miss that you're depending on a side effect like that
| until you see the slow query (or, in this case, high bill).
|
| When you're programming, how many compiler errors do you see a
| day? (For me, easily dozens, likely hundreds.) Do you think
| each one indicates a serious gap in your knowledge?
|
| Along these lines: imposter syndrome is a common problem in our
| industry. One way it can manifest is junior engineers can
| thinking they're bad programmers when they repeatedly see walls
| of compiler errors. I think it'd help a lot to show them a
| ballpark of how often senior engineers see the same thing. [1]
| I know that when I'm actively writing new code (especially in
| languages that deliberately produce errors at compile time
| rather than runtime), I see dozens and dozens of errors during
| a normal day. I don't think this is a sign I'm a bad
| programmer. I think it just means I'm moving fast and trusting
| the compiler to point out the problems it can find rather than
| wasting time and headspace on finding them myself. I pay more
| attention to potential errors that I know won't get caught
| automatically and particularly to ones that can have serious
| consequences.
|
| I think the most important thing the author learned is that
| failing to add an index can cost this much money before you
| notice.
|
| Ideally the author and/or the vendor will also brainstorm ways
| to make these errors obvious _before_ the high bill. Load
| testing with realistic data is one way (though people talk
| about load testing a lot more than they actually do it).
| Another would be watching for abrupt changes in the operations
| the billing is based on.
|
| [1] This is something I wish I'd done while at Google. They
| have the raw data for this with their cloud-based work trees
| (with FUSE) and cloud-based builds. I think the hardest part
| would be to classify when someone is actively developing new
| code, but it seems doable.
| Grimm1 wrote:
| No you've missed my point, the author seemingly didn't know
| that ForeignKeys applied indexes by default in MySql. It's
| not "Concept I didn't understand", clearly they're capable of
| understanding because they did after they ran into the issue.
| It's about not having had basic knowledge to begin with.
|
| But he didn't see compiler errors, he caused monetary cost to
| his employer.
|
| When I deploy something that unintentionally causes a large
| monetary bill to my employer, then yes I do believe that
| indicates a gap in knowledge so I don't in anyway believe I'm
| being uncharitable. Or and this would be worse, a lack of
| caring. (Which is not what I think happened here though)
|
| I won't respond to your imposter syndrome bit I don't really
| think it's relevant to my point.
| scottlamb wrote:
| I think you didn't read through to this part of my comment:
|
| > I think the most important thing the author learned is
| that failing to add an index can cost this much money
| before you notice.
|
| > Ideally the author and/or the vendor will also brainstorm
| ways to make these errors obvious _before_ the high bill.
| Load testing with realistic data is one way (though people
| talk about load testing a lot more than they actually do
| it). Another would be watching for abrupt changes in the
| operations the billing is based on.
| Grimm1 wrote:
| No I did, but since I disagree with your earlier point
| about how much existing knowledge they have it kind of by
| default means I disagree with what they took away from
| this incident.
|
| It's also highly speculative so like I'm not going to go
| back and forth on it.
|
| Needing a vendor to hand hold your likely highly paid dev
| seems like a bad fix to me.
|
| Also not having an index _isn 't an error_ it can be a
| valid choice based on your situation and query load which
| is why people should know the situations when they're
| needed.
|
| I think people should simply be better. A lot of people
| don't like hearing that though so usually I keep it to my
| private chats where people seem more willing to cop to
| that fact.
|
| I know we disagree, I know you're going to continue
| disagreeing, I know I don't want to have the
| conversation.
| scottlamb wrote:
| > I know we disagree, I know you're going to continue
| disagreeing, I know I don't want to have the
| conversation.
|
| Please consider not chiming in on the next article like
| this then. I think your attitude of (paraphrasing) "no
| good programmer would have made the costly mistake you
| shared, and articles about it aren't worthwhile" is super
| harmful to our industry. It's the polar opposite of the
| blameless postmortem approach I'm fond of.
| Grimm1 wrote:
| This one in particular is not worthwhile on the front
| page of HN, that's my take. They're most definitely
| useful for beginners, or maybe people just learning about
| databases.
|
| I'm not going to not post simply because you find it
| disagreeable, there are plenty of people here who seem to
| agree with me.
|
| Blameless post mortems are great, for your team. I am not
| his team mate, and I don't really feel a kinship with
| every developer under the sun. And for what it's worth I
| don't blame this developer for anything. If anything I
| lament the institutions that failed them on the way to
| this point in time. To me this is a symptom of systemic
| rot.
| Nextgrid wrote:
| > When I deploy something that unintentionally causes a
| large monetary bill to my employer, then yes I do believe
| that indicates a gap in knowledge so I don't in anyway
| believe I'm being uncharitable.
|
| It depends, if you've been given a loaded footgun it's not
| entirely your fault when it inevitably goes off.
|
| Let's go back to your "compiler errors" scenario, and let's
| say someone decided that the company should be using a
| cloud-based compiler that happens to charge per error. I
| wouldn't blame developers for falling into a trap that
| challenges all known assumptions.
|
| The problem is that there is a DB that charges insane
| amounts of money per row processed with no upper limit
| _and_ that someone actually thought it was a good idea to
| use it.
| Grimm1 wrote:
| First its not _my_ "compiler errors" scenario it's the
| person who initially replied to me. Sure whatever, I
| don't think I ever insinuated I thought that was a good
| idea, it runs in parallel with the issue I have.
| tristor wrote:
| I am somewhat shocked to find that an RDBMS is considered
| a "loaded footgun" in 2021. Perhaps grandparent isn't the
| most charitable in their interpretation, but I am in full
| agreement. It continues to astound me how little about
| the basics of databases most developers know, and how
| strongly resistant they are to trying to learn.
| Nextgrid wrote:
| An RDBMS that scales infinitely while charging you per-
| row goes against the usual assumptions learned in the
| past decades, so I'd say yes that's a loaded footgun.
| AnotherGoodName wrote:
| You'd be surprised and frustrated. If you ever see someone say
| "We hired Oracle consultants and they are miracle workers" or
| "NoSQL is sooo much faster than SQL" you can be pretty sure
| they missed databases 101 and the requirement to add indexes.
| pjscott wrote:
| One of the best database habits I've ever developed is to run
| EXPLAIN on every query that I expect to run repeatedly, then
| sanity-check the output. It's very little effort, and has
| prevented _so_ much hassle.
| yashap wrote:
| Seriously. Like, every junior dev has to learn DB indexing
| basics sometime, and apparently the other of this blog post
| just did. But really can't understand why this article is
| getting voted to the top of HN.
| azeirah wrote:
| What I gained from the article wasn't that the dev was
| unaware of indices, it's that he didn't realise indices were
| missing due to how planetscale's database disallows foreign
| keys.
|
| I never worked with a database that doesn't have foreign keys
| and it's not unthinkable to forget when you do for the first
| time, that foreign keys were what created indexes for you
| automatically.
|
| A little bit of planning could have prevented that though :/
| watt wrote:
| You are absolutely missing the point. The point is not about
| indexes or full table scans, but it's a about cloud providers
| who will charge you for every row "inspected" and how a full
| table scan might cost you $0.15 and it would add up. It's not
| about slow performance which you can diagnose and fix, it's
| about getting an unexpected $5k bill, which you can't fix.
|
| And in the end, if the cloud provider wants to charge you for
| rows "inspected", this can't be buried in small print. That's
| unacceptable!
|
| The billing must come with up-front, red capital letters
| warning, and must come with alerts when your bill is
| unexpectedly little high (higher than expected, not just 10x or
| 100x higher). It must automatically shut down the process,
| requiring the customer confirm they want proceed, that you
| actually want to spend all that money. And it must be on the
| cloud provider to detect billing anomalies and fully own them
| in case it goes the wrong way. This is the cloud "bill of
| rights" we need.
| derekdahmer wrote:
| I've been using relational databases for web apps for my entire
| career and probably would have made this same mistake if using
| PlanetScale for the first time.
|
| The author had two misunderstandings:
|
| 1) An index isn't created automatically
|
| 2) You get billed for the number of rows scanned, not the
| number of rows returned
|
| Even if I noticed #1, I probably wouldn't have guessed at #2
| for the same reason as the author.
| cerved wrote:
| they were using some kind of foreign keyless MySQLish whatever
| thing
| AnotherGoodName wrote:
| I've seen things you people wouldn't believe. Millions burnt on
| consultants and licensing Oracle. I watched C series startups
| throwing it all away in a move to NoSQL. All those Amazon RDS
| fees will be lost in time.
| ushakov wrote:
| > I watched C series startups throwing it all away in a move to
| NoSQL
|
| have you forgot that MongoDB is web scale?
|
| https://www.youtube.com/watch?v=b2F-DItXtZs
| vincentpants wrote:
| Like tears in the rain
| phoenixdblack wrote:
| These two might just be the best comments i have ever seen on
| HN
| redisman wrote:
| Time to die
| xiphias2 wrote:
| That last metaphore was added by the actor himself...the
| director asked him to have some human showing in him, and now
| it's history.
| [deleted]
| racl101 wrote:
| I knew the cadence of this sentence sounded familiar.
|
| Nice!
| bombcar wrote:
| I felt a great disturbance in the Billing, as if millions of
| rows suddenly cried out in read and were suddenly repeated. I
| fear something terrible has happened.
| samlambert wrote:
| This is definitely a lesson in the importance of indexes in
| general. We are well aware of the potential pitfalls with our
| current pricing. I'm happy to say we are nearly done modeling
| different metering rates for the product which would mean
| significantly lower bills for our users and avoid issues like
| this.
|
| It's core to our mission that our product's pricing is accessible
| and friendly to small teams. Part of being in beta was us wanting
| to figure out the best pricing based on usage patterns. That work
| is nearly done. As the post mentions we've credited back the
| amount.
| Nextgrid wrote:
| Would it be possible for you to add a spending cap? The user
| should be able to tell your system "here's how much I want to
| spend max" and if they exceed that they start getting errors or
| reduced performance.
| samlambert wrote:
| Yes absolutely. We have daily and monthly budgets on the
| roadmap to make sure nobody gets a surprise bill in the
| future. This and more tools to make sure you are running your
| database optimally.
| andybak wrote:
| Glad you came here to say this because my takeaway was "What a
| terrifying pricing model. That would keep me up at night"
| anglinb wrote:
| Thanks Sam! As mentioned in the post, the PlantScale team was
| quick to credit our account for the overages and help us figure
| out what was going on. I'm personally super bullish on
| PlanetScale!
|
| With any new product there will be tradeoffs and rough edges
| but the positives, like easy migrations and database branches
| have definitely outweighed any difficulties.
| brasetvik wrote:
| Kudos for being open about your mistakes.
|
| Could you share a little bit about what your thought process
| was in general when picking a database technology?
|
| You call out "easy migrations and database branches"
| outweighing other quirks, so some pros and cons weighing
| must've happened :)
|
| Is it easy, for example, to test things in your dev
| environment with realistic amounts of data, and to get an
| understanding of how the queries will execute, etc? These
| seem somewhat basic, and would've probably caught this (also
| kinda basic, sorry :) problem early. (As in discovering "why
| is this query that should be a few ms with an index lookup
| taking so long?" early on)
| anglinb wrote:
| Thanks! The decision making process was pretty
| unsophisticated tbh. Basically I spent the last couple
| years working at GitHub as a security engineer and had been
| pretty comfortable with MySQL so wanted to stick with that.
| I had heard from our database team how annoying migrations
| were and I had previously locked a postgres database
| multiple times in production trying to deploy a migration,
| so MySQL + safe migrations + some of the best engineers
| I've worked with pouring all their time into PlanetScale,
| made a ton of sense. So basically a combination of proven
| underlying tech + believing in the team.
|
| The migration workflow is really cool, basically when we
| create a PR we branch our production database and apply any
| migrations that are included in that PR and then that
| branch is used in our per-PR review environment. (Just
| Heroku Review Apps), then when we merge the PR, we also
| merge the deploy request in PlanetScale. Database branching
| is a super powerful concept once you've leaned into it.
|
| We don't really do any sort of load testing in a dev
| environment. We have one customer who is also a co-founder
| of our company so we just deploy whatever changes we're
| unsure about (after automated testing) to his application
| and see what happens. If anything looks off in Grafana
| we'll make take a look but it's usually "good enough" or
| "totally broken", very rarely do we take time to make
| something 25% better if it already works. The time to fix
| vs speed of shipping features tradeoff doesn't make sense
| for us.
|
| In this specific case, the query was taking place in a
| background job so 10ms - 500ms didn't really matter to us
| so we didn't really measure the timing, if we had we may
| have noticed it was slow but kinda a testament to
| PlanetScale that we didn't even notice ;)
| brasetvik wrote:
| Thanks for the background :)
|
| Lock behaviour of migrations are indeed very important to
| be aware of.
|
| Braintree (which are heavy Postgres users) have a pretty
| good post on that - https://medium.com/paypal-
| tech/postgresql-at-scale-database-...
| samlambert wrote:
| We are so glad to have you as customers and I can't wait to
| partner as you grow to mega scale.
| XCSme wrote:
| I was actually considering PlanetScale, but them saying "Every
| time a query retrieves a row from the database, it is counted as
| a row read." when it's actually all the scanned rows, sounds
| intentionally confusing. "Retrieving" sounds like it should only
| be counted rows returned by a query.
| dreyfan wrote:
| Don't use DB providers that charge for rows/data scanned. Use
| Amazon RDS or Google Cloud SQL or just install it yourself on a
| VM. Pay for CPU, memory, and storage instead.
| ldoughty wrote:
| Rows returned model works really well for certain data loads
| (where all data customers use is customer-keyed)....
|
| This model also scales DOWN really well .. while still
| providing good scalable availability...
|
| That said, I DO agree with the sentiment of paying for a set
| performance level (clu, memory, storage), to provide
| predictable pricing.. obviously these guys were bit by the
| scaling capability.
|
| I do a lot of pet projects, and I find DynamoDB works really
| well because my pet projects cost $0 most months... And I don't
| have to worry about servers, maintenance, or what not... I'm
| happy to do that at work, but I don't want that for my friends
| & fun projects... And I've not seen a decent DB managed RDS for
| <$5/month
| boulos wrote:
| Disclosure: I used to work on Google Cloud.
|
| This is why BigQuery offers both models and lets you control
| the caps [1].
|
| Buying fixed compute is effectively buying a throughput cap.
| Hard Quotas provide a similar function, but aren't a useful
| budgeting tool if you can't set them yourself.
|
| "Serverless" without limits is basically "infinite
| throughput, infinite budget" (though App Engine had quotas
| since day 1 and then budgets once charging was added). The
| _default_ quotas give you some of that budget / throughput
| capping, but again if you can't lower them they might not
| help you.
|
| Either way, BQ won't drop _ingestion_ or storage though
| because almost nobody wants their data deleted. As a
| provider, implementing _strict_ budgets is impossible without
| having a fairly complex policy "if over $X/second stop all
| activity, oh except let me still do admin work, like adding
| indexes? Over $Y/second delete everything". I think having
| user adjustable quotas and throughput caps per "dimension"
| makes more sense but it puts the burden on the user and _no_
| provider offers good enough user control over quota.
|
| tl;dr: true budgets are hard to do, but every provider should
| strive to offer better quota/throughput controls.
|
| [1] https://cloud.google.com/bigquery/pricing
|
| [2] https://cloud.google.com/bigquery/docs/reservations-
| workload...
| [deleted]
| [deleted]
| mike_hock wrote:
| Rent metal and run your own MySQL/Postgres/...
|
| One insert every 3 seconds. Could run that off a 10 year old
| laptop.
| marcinzm wrote:
| The time spent setting it up and managing it and then having
| to deal with backups/environment clones/access
| control/scaling limitations/etc. outweighs the savings for
| almost any company paying US wages. Especially since you'd
| need metal for everything and not just the db due to network
| latency.
| Nextgrid wrote:
| Every cloud-hosted startup I've consulted for had a full-
| time devops guy wrangling Terraform and YAML files. The
| cloud requires an equivalent time investment.
| marcinzm wrote:
| Bare metal requires the equivalent of all of that devops
| stuff and then more. That is if you actually want parity
| and not just a half assed version that hurts developer
| productivity and causes technical debt.
| rytcio wrote:
| I think you're over estimating how complicated that stuff
| is...
| marcinzm wrote:
| It's very easy to do it in a half-assed way and much
| harder to do it at scale in a production environment with
| many developers without hurting developer productivity at
| all.
| dreyfan wrote:
| Sure, but we're addressing people who are so far on the other
| end of the spectrum they're using a "serverless" database
| where they pay for the number of rows scanned per query. I
| think a managed VM is a better middle-ground for their
| capability level while still delivering massive cost-savings.
|
| Amazon RDS lowest-tier runs about $13/mo for 10GB storage, 2
| vCPUs and 1GB memory with automated backups and push-button
| restoring. And that would have likely met all of their needs
| with capacity to spare.
___________________________________________________________________
(page generated 2021-11-06 23:00 UTC)