[HN Gopher] Almost every infrastructure decision I endorse or re...
       ___________________________________________________________________
        
       Almost every infrastructure decision I endorse or regret
        
       Author : slyall
       Score  : 1007 points
       Date   : 2024-02-09 11:05 UTC (1 days ago)
        
 (HTM) web link (cep.dev)
 (TXT) w3m dump (cep.dev)
        
       | shrubble wrote:
       | "Since the database is used by everyone, it becomes cared for by
       | no one. Startups don't have the luxury of a DBA, and everything
       | owned by no one is owned by infrastructure eventually"
       | 
       | I think adding a DBA or hiring one to help you layout your
       | database should not be considered a 'luxury'...
        
         | winrid wrote:
         | Yeah I mean, hiring one person to own that for 5-10 teams is
         | pretty cheap... Cheaper than each team constantly solving the
         | same problems and relearning the same gotchas/operational stuff
         | that doesn't add much value when writing your application code.
        
         | steveBK123 wrote:
         | There's even consultants you can hire out by the day instead of
         | a full-time DBA.
         | 
         | Maybe you need help with setup for a few weeks/months, and then
         | some routine billable hours per month for maintenance / change
         | advice.
        
       | Scubabear68 wrote:
       | The kitchen sink database used by everybody is such a common
       | problem, yet it is repeated over and over again. If you grow it
       | becomes significant tech debt and a performance bottleneck.
       | 
       | Fortunately, with managed DBs like RDS it is really easy to run
       | individual DB clusters per major app.
        
         | sgarland wrote:
         | The downside is then you have many, many DBs to fight with, to
         | monitor, to tune, etc.
         | 
         | This is rarely a problem when things are small, but as they
         | grow, the bad schema decisions made by empowering DBA-less
         | teams to run their own infra become glaringly obvious.
        
           | Scubabear68 wrote:
           | Not a downside to me. Each team maintains their own DB and
           | pays for their own choices.
           | 
           | In the kitchen sink model all teams are tied together for
           | performance and scalability, and some bad apple applications
           | can ruin the party for everyone.
           | 
           | Seen this countless times doing due diligence on startups.
           | The universal kitchen sink DB is almost always one of the
           | major tech debt items.
        
             | sgarland wrote:
             | I'm a DBRE, which means it's somehow always my fault until
             | proven otherwise. And even then, it's usually on me to work
             | around the insane schema dreamt up by the devs.
             | 
             | Multi-tenant DBs can work fine as long as every app has its
             | own users, everyone goes through a connection pooler / load
             | balancer, and every user has rate limits. You want to write
             | shitty queries that time out? Not my problem. Your GraphQL
             | BFF bullshit is trying to make 10,000 QPS? Nope, sorry, try
             | again later.
             | 
             | EDIT: I say "not my problem," but as mentioned, it
             | inevitably becomes my problem. Because "just unblock them
             | so the site is functional" is far more attractive to the
             | C-Suite than "slow down velocity to ensure the dev teams
             | are doing things right."
        
               | CoolCold wrote:
               | You forgot the modern mantra - dev team is always right!
        
               | Scubabear68 wrote:
               | I agree. My gripe was everybody in the same schema with a
               | global "app" user.
        
               | dalyons wrote:
               | Or, you just avoid doing multi tenet from the start and
               | none of those become your problem to unblock. What's the
               | downside?
        
               | sgarland wrote:
               | Done that as well; it still becomes my problem because
               | teams without RDBMS knowledge eventually break it, and...
               | then I get paged.
               | 
               | Full Stack is a lie, and the sooner companies accept that
               | and allow people to specialize again, and to pay for the
               | extra headcount, the better off everyone will be.
        
               | dalyons wrote:
               | I disagree I guess. Multiple companies I've worked at
               | have broken up their shared db into many dbs that
               | individual teams own the operations of, and it works just
               | fine. At significant scale in traffic and # of eng. No
               | central dbas needed - smaller databases require much less
               | skills to manage. The teams that own them learn enough.
        
             | maccard wrote:
             | > Not a downside to me. Each team maintains their own DB
             | and pays for their own choices.
             | 
             | This is how you end up with the infamous "jira and
             | confluence have two different markdown flavors" issue.
        
               | Sankozi wrote:
               | I don't think Jira and Confluence different markdown
               | setup is due to them not sharing their databases. It is
               | just poor product management from Attlasian.
        
               | maccard wrote:
               | My point is that forcing these arbitrary decisions is
               | poor product management.
        
           | vrosas wrote:
           | Bad schema decisions are made regardless of whether you're
           | one database or 50. At least with many databases the problems
           | are localized.
        
             | sgarland wrote:
             | But then the DB Team - if you have one - is responsible for
             | 50 databases, each full of their own unique problems.
             | 
             | This will undoubtedly go over poorly, but honestly I think
             | every data decision should be gated through the DB Team
             | (again, if you have them). Your proposed schema isn't
             | normalized? Straight to jail. You don't want to learn SQL?
             | Also straight to jail. You want to use a UUIDv4 as a
             | primary key? Believe it or not, jail.
             | 
             | The most performant and referentially sound app in the
             | world, because of jail.
        
               | inquist wrote:
               | What's wrong with uuidv4 as PK?
        
               | marcosdumay wrote:
               | Serial integers always work better than any uuid as PKs,
               | but the thing with uuid4 is that it disrupts any kind of
               | index or physical ordering you decide to put on your
               | data.
               | 
               | Uuids are really for external communication, not in-
               | system organization.
        
               | dalyons wrote:
               | FWIW this isn't true anymore with newer uuid schemes like
               | v7 that are roughly time sortable.
        
               | ildjarn wrote:
               | Serial index forces a synchronisation point on every
               | entity that can create records. If this is only ever a
               | single database that's fine but plenty of apps can't
               | scale this way.
        
               | marcosdumay wrote:
               | They don't. Clustered databases deal with parallel
               | generation of them just fine.
               | 
               | They require periodic synchronization. What isn't a big
               | deal at all and is required by many other database
               | features.
        
               | sgarland wrote:
               | If you have a sharded DB, each instance can get its own
               | range of ints, which are periodically refreshed.
               | 
               | PlanetScale uses int PKs [0], and they seem to have
               | scaled just fine.
               | 
               | [0]:
               | https://github.com/planetscale/discussion/discussions/366
        
               | sgarland wrote:
               | Anything non-k-sortable in a B[+,-]tree will cause a ton
               | of page splits. This is a more noticeable performance
               | impact in RDBMS with a clustered index (MySQL's InnoDB,
               | MS SQL Server) [0], but it also impacts Postgres [1] in
               | multiple [2] ways.
               | 
               | [0]: https://www.percona.com/blog/uuids-are-popular-but-
               | bad-for-p...
               | 
               | [1]: https://www.cybertec-postgresql.com/en/unexpected-
               | downsides-...
               | 
               | [2]: https://www.2ndquadrant.com/en/blog/on-the-impact-
               | of-full-pa...
        
               | Glyptodon wrote:
               | What's the best non serial option for PKs in your view?
               | Or do you prefer dual PK approach?
        
               | Sankozi wrote:
               | No single team should not be responsible for all
               | databases. If such team exists they will either become
               | bottleneck for every other team (by auditing carefully
               | each schema change) or become bloated and not utilized
               | 90% of time, or (most common) they will become nearly
               | useless or even harmful - they will not be really
               | responsible and they will act as dumb proxy - they will
               | introduce latency to the schema updates, but they will
               | not bother to check them very well (why would they? they
               | are not responsible for the whole product, just for the
               | database), some DB refactoring/migrations will be totally
               | abandoned because DB team will make them too painful.
               | 
               | DB team could act as an auditor and expert support, but
               | they should never be fully responsible for DB layer.
        
               | sgarland wrote:
               | > If such team exists they will either become bottleneck
               | for every other team (by auditing carefully each schema
               | change)
               | 
               | That's the point. Would you send a backend code review to
               | a frontend team? Why do DBs not deserve domain expertise,
               | especially when the entire company depends on them?
               | 
               | > they are not responsible for the whole product, just
               | for the database
               | 
               | I assure you, that's a lot to be responsible for at
               | scale.
               | 
               | > DB team could act as an auditor and expert support, but
               | they should never be fully responsible for DB layer.
               | 
               | Again, the issue here is when the DB gets borked enough
               | that a SME is required to fix it, they effectively do
               | become responsible, because no CTO is going to accept,
               | "sorry, we'll be down for a couple of days because our
               | team doesn't really know how this thing works."
               | 
               | And if your answer is, "AWS Premium Support," they'll
               | just tell you to upsize the instance. Every time. That is
               | not a long-term strategy.
        
           | calvinmorrison wrote:
           | It's because I hate databases and programming separately. I
           | would rather slow code then have to dig into some database
           | procdure. Its just another level of separation thats too
           | mentally hard to manage. Its like... my queries go into a VM
           | and now I have to worry about how the VM is performing.
           | 
           | I wish and maybe there is a programming language with first
           | class database support. I mean really first class not just
           | let me run queries but almost like embedded into the language
           | in a primal way where I can both deal with my database
           | programming fancyness and my general development together.
           | 
           | Sincerely someone who inherited a project from a DBA.
        
             | sgarland wrote:
             | > I mean really first class not just let me run queries but
             | almost like embedded into the language
             | 
             | Not quite embedded into the OS, but Django is a damn good
             | ORM. I say that as a DBRE, and someone obsessed with
             | performance (inherent issues with interpreted languages
             | aside).
        
             | leetharris wrote:
             | The closest thing to what you're describing is Prisma in
             | Node. It generates a Typescript file from your schema so
             | you get code completion on your data. And it exists
             | somewhere between a query builder and a traditional ORM.
             | 
             | I have worked in many languages with many ORMs and this has
             | been my personal favorite.
        
               | sgarland wrote:
               | Until Prisma can manage JOINs [0] there is no way I can
               | recommend it.
               | 
               | [0]: https://github.com/prisma/prisma/discussions/12715
        
               | kkarimi wrote:
               | The support for JOINs is coming, currently under a
               | feature flag [0]
               | 
               | [0]: https://github.com/prisma/prisma/issues/5184#issueco
               | mment-18...
        
               | mkesper wrote:
               | But the migration stuff is a horrible joke. No way to
               | just rollback a broken migration.
               | https://www.prisma.io/docs/orm/prisma-
               | migrate/workflows/gene...
        
             | chasd00 wrote:
             | The language you're talking about is APEX. I believe it
             | comes from Oracle and is the backend language for
             | Salesforce development. You'll like the first class
             | database support but that's about it.
        
         | el_benhameen wrote:
         | Lots of interesting comments on this one. Anyone have any good
         | resources for learning how not to fuck up schema/db design for
         | those of us who will probably never have a DBA on the team?
        
           | magicalhippo wrote:
           | Good question. We don't have a DBA either. I've learned SQL
           | as needed and while I'm not terrible, it's still daunting
           | when making the schema for a new module that might require
           | 10-20 tables or more.
           | 
           | One thing that has worked well for us is to alway include the
           | top-most parent key in all child tables down yhe hierarchy.
           | This way we can load all the data for say an order without
           | joins/exists.
           | 
           | Oh and never use natural keys. Each time I thought finally I
           | had a good use-case, it has bitten me in some way.
           | 
           | Apart from that we just try to think about the required data
           | access and the queries needed. Main thing is that all queries
           | should go against indexes in our case, so we make sure the
           | schema supports that easily. Requires some educated guesses
           | at times but mostly it's predictable IME.
           | 
           | Anyway would love to see a proper resource. We've made some
           | mistakes but I'm sure there's more to learn.
        
             | AznHisoka wrote:
             | Not to pick on you, but is SQL not basic knowledge for
             | every software engineer these days? Or have times changed?
        
               | rswail wrote:
               | Times have changed. If you have C# programmers and they
               | can't do it in Entity Framework/LINQ, then they can't do
               | it.
        
               | neonsunset wrote:
               | This seems like a stereotype from 2010s and disconnected
               | from reality today.
        
               | mordae wrote:
               | Nope. None of my below 30 colleagues know SQL. They use
               | ORM in REPL or visual tools.
        
               | neonsunset wrote:
               | LINQPad is awesome and EF Core is just _this_ good so I
               | can see why some would just choose not to deal with SQL.
               | 
               | With that said, this still sounds like a strange
               | situation - most colleagues, acquaintances and people I
               | consulted know they way around SQL and dropping down to
               | 'dbset.FromSql($"SELECT {...' is very commonplace out of
               | the need to use sprocs, views or have tighter control
               | over the query.
        
               | deskamess wrote:
               | I had not updated LINQPad in a while and just saw the
               | price this year. Eeesh. I now live in a .NET Interactive
               | (Jupyter like) environment.
        
               | magicalhippo wrote:
               | Perhaps I undersold myself a little. By the time I got my
               | first job I was fairly well versed in SQL querying, and
               | these days I feel comfortable writing what I'd consider
               | complex queries. That is with various window functions,
               | nested queries, recursion (though I try to avoid that)
               | etc, and I have a good handle on what the query optimizer
               | likes and doesn't like.
               | 
               | But schema design is something else. I still take my time
               | doing that.
               | 
               | Especially since our application is written with
               | backwards compatibility in mind, so changing schema after
               | it's deployed is something we try very hard to avoid.
               | 
               | But yeah, when hiring we require they are comfortable
               | writing "normal" SQL queries (multiple joins, aggregation
               | etc).
        
           | marcosdumay wrote:
           | > not to fuck up schema/db design
           | 
           | The neat thing is, you don't. Nobody ever avoids fucking up
           | db design.
           | 
           | The best you can do is decide what is really important to get
           | right, and not fuck that part up.
        
             | gregw2 wrote:
             | Wow, what an astute comment! Thank you!
             | 
             | P.S. to the original person concerned about this though...
             | for your own sake and your successors, please keep trying.
        
               | marcosdumay wrote:
               | Assuming that was sarcastic, you are free to try, I guess
               | everyone needs to try it once.
               | 
               | Just do the exercise of deciding what is really important
               | first, so you can make sure you succeed for that stuff.
        
         | nitwit005 wrote:
         | The moment you have two databases is the moment you need to
         | deal with data consistency problems.
         | 
         | If you can't do something like determine if you can delete
         | data, as the article mentions, you won't be able to produce an
         | answer to how to deal with those problems.
        
         | eduction wrote:
         | Management problem masquerading as a tech problem.
         | 
         | Being shared between applications is literally what databases
         | were invented to do. That's why you learn a special dsl to
         | query and update them instead of just doing it in the same
         | language as your application.
         | 
         | The problem is that data is a shared resource. The database is
         | where multiple groups in an organization come together to get
         | something they all need. So it needs to be managed. It could be
         | a dictator DBA or a set of rules designed in meetings and
         | administered by ops, or whatever.
         | 
         | But imagine it was money. Different divisions produce and
         | consume money just like data. Would anyone imagine suggesting
         | either every team has their own bank account or total
         | unfettered access to the corporate treasury? Of course not. You
         | would make a system. Everyone would at least mildly hate it.
         | That's how databases should generally be managed once the
         | company is any real size.
        
           | dalyons wrote:
           | Why would you make it a shared resource if you don't have to?
           | 
           | Decades of experience have shown us the massive costs of
           | doing so - the crippled velocity and soul crushing agony of
           | dba change control teams, the overhead salary of database
           | priests, the arcane performance nightmares, the nuclear blast
           | radius, the fundamental organizational counter-incentives of
           | a shared resource .
           | 
           | Why on earth would we choose to pay those terrible prices in
           | this day and age, when infrastructure is code, managed
           | databases are everywhere and every team can have their own
           | thing. You didn't have a choice previously, now you do.
        
             | eduction wrote:
             | You wouldn't but in any decent sized organization you will
             | have to. If it is an organization that needs to exist there
             | will be some common set of critical data.
        
               | webo wrote:
               | In my experience, isolated (repeated) data storage
               | paradigm is even more common at large organizations. They
               | share data via services, ETLs, event buses, etc.
        
               | dalyons wrote:
               | That's just not true though, I've worked at decent sized
               | companies without shared RDBMs, so you don't have to.
               | 
               | You DO have to share data in other ways, usually
               | datawarehouse or services, but that is not the same
               | thing.
        
               | eduction wrote:
               | To me this is semantics. So it's a data warehouse rather
               | than a database. Ok. Or we share data from a common
               | source via "services" - ok but that's another word for a
               | database and a client (using http to do the talking
               | doesn't really change anything).
               | 
               | I'm not saying literally every source of data has to be
               | shared and centrally managed. I'm also not saying "rdbms
               | accessed via traditional client and queried via sql" when
               | I say database. I'm just saying a shared database of some
               | shape is inevitable.
        
               | dalyons wrote:
               | Ok, but the OP and the article are talking specifically
               | about a directly shared rdbms scenario, not some nebulous
               | concept of shared data.
               | 
               | Also, operationally it's not "semantics" at all. You
               | don't get into (many) operational problems with analysts
               | sharing a datawarehouse. You absolutely do with online
               | apps sharing a rdbms, they aren't the same thing.
        
           | IggleSniggle wrote:
           | ...I worked at a large software organization where larger
           | teams had their own bank account, and there was a lot of
           | internal billing, etc, mixed with plenty of funny-money to go
           | along with it. That's not a contradiction, though, it
           | perfectly illustrated your point for me.
        
       | hayst4ck wrote:
       | I would love to see this type of thing from multiple sources.
       | This reflects a lot of my own experience.
       | 
       | I think the format of this is great. I suppose it would take a
       | motivated individual to go around and ask people to essentially
       | fill out a form like this to get that.
        
         | kaycebasques wrote:
         | I also think it's a great format.
         | 
         | One suggestion if we're gonna standardize around this format.
         | Avoid the double negatives. In some cases author says "avoided
         | XYZ" and then the judgment was "no regrets". Too many layers
         | for me to parse there. Instead, I suggest each section being
         | the product that was used. If you regret that product, in the
         | details is where you mention the product you should have used.
         | Or you have another section for product ABC and you provide the
         | context by saying "we adopted ABC after we abandoned XYZ".
         | 
         | I don't recommend trying to categorize into general areas like
         | logging, postmortems, etc. Just do a top-level section for each
         | product.
        
       | electroly wrote:
       | > The markup cost of using RDS (or any managed database) is worth
       | it.
       | 
       | Every so often I price out RDS to replace our colocated SQL
       | Server cluster and it's so unrealistically expensive that I just
       | have to laugh. It's absurdly far beyond what I'd be willing to
       | pay. The markup is enough to pay for the colocation rack, the AWS
       | Direct Connects, the servers, the SAN, the SQL Server licenses,
       | the maintenance contracts, _and a full-time in-house DBA_.
       | 
       | https://calculator.aws/#/estimate?id=48b0bab00fe90c5e6de68d0...
       | 
       | Total 12 months cost: 547,441.85 USD
       | 
       | Once you get past the point where the markup can pay for one or
       | more full-time employees, I think you should consider doing that
       | instead of blindly paying more and more to scale RDS up. You're
       | REALLY paying for it with RDS. At least re-evaluate the choices
       | you made as a fledgling startup once you reach the scale where
       | you're paying AWS "full time engineer" amounts of money.
        
         | vasco wrote:
         | That's a huge instance with an enterprise license on top. Most
         | large SaaS companies can run off of $5k / m or cheaper RDS
         | deployments which isn't enough to pay someone. The amount of
         | people running half a million a year RDS bills might not be
         | that large. For most people RDS is worth it as soon as you have
         | backup requirements and would have to implement them yourself.
        
           | electroly wrote:
           | Definitely--I recommend this _after_ you 've reached the
           | point where you're writing huge checks to AWS. Maybe this is
           | just assumed but I've never seen anyone else add that nuance
           | to the "just use RDS" advice. It's always just "RDS is worth
           | it" full stop, as in this article.
        
             | Aeolun wrote:
             | To some extend that is probably true, because when you've
             | built a business that needs a 500k/year database fully on
             | RDS it's already priced into your profits, and switching to
             | a self-hosted database will seem unacceptably risky for
             | something that works just fine.
        
               | groestl wrote:
               | > it's already priced into your profits
               | 
               | Assuming you have any. You might not, because of AWS.
        
             | sroussey wrote:
             | I mean, just use supabase instead. So much easier than RDS.
             | Why even deal with AWS directly? Might as well have a Colo
             | if you need AWS.
        
           | sgarland wrote:
           | > Most large SaaS companies can run off of $5k / m or cheaper
           | RDS
           | 
           | Hard disagree. An r6i.12xl Multi-AZ with 7500 IOPS / 500 GiB
           | io1 books at $10K/month on its own. Add a read replica, even
           | Single-AZ at a smaller size, and you're half that again. And
           | this is without the infra required to run a load balancer /
           | connection pooler.
           | 
           | I don't know what your definition of "large" is, but the
           | described would be adequate at best at the ~100K QPS level.
           | 
           | RDS is expensive as hell, because they know most people don't
           | want to take the time to read docs and understand how to
           | implement a solid backup strategy. That, and they've somehow
           | convinced everyone that you don't have to tune RDS.
        
             | rswail wrote:
             | If you're not using GP3 storage that provides 12K minimum
             | IOPS without requiring provisioned IOPS for >400GB storage,
             | as well as 4 volume striping, then you're overpaying.
             | 
             | If you don't have a reserved instance, then you're giving
             | up potentially a 50% discount on on-demand pricing.
             | 
             | An r6i.12xl is a huge instance.
             | 
             | There are other equivalents in the range of instances
             | available (and you can change them as required, with
             | downtime).
        
               | sgarland wrote:
               | > GP3... as well as 4 volume striping
               | 
               | For MySQL and Postgres, RDS stripes across four volumes
               | once you hit 400 GiB. Doesn't matter the type.
               | 
               | The latency variation on gp3 is abysmal [0], and the
               | average [1] isn't great either. It's probably fine if you
               | have low demands, or if your working set fits into memory
               | and you can risk the performance hit when you get an
               | uncached query.
               | 
               | 12K IOPS sounds nice until you add latency into it. If
               | you have 2 msec latency, then (ignoring various other
               | overheads, and kernel or EBS command merging) the maximum
               | a single thread can accomplish in one second is (1000
               | msec / 1 sec / 2 msec) = 500 I/O. Depending on your needs
               | that may be fine, of course.
               | 
               | > If you don't have a reserved instance, then you're
               | giving up potentially a 50% discount on on-demand
               | pricing.
               | 
               | True, of course. Large customers also don't pay retail.
               | 
               | > An r6i.12xl is a huge instance.
               | 
               | I mean, it goes well past that to .32xl, so I wouldn't
               | say it's huge. I work with DBs with 1 TiB of RAM, and I'm
               | positive there are people here who think those are toys.
               | The original comment I replied to said, "large SaaS," and
               | a .12xl, as I said, would be roughly adequate for ~100K
               | QPS, assuming no absurdly bad queries.
               | 
               | [0]: https://www.percona.com/blog/performance-of-various-
               | ebs-stor...
               | 
               | [1]: https://silashansen.medium.com/looking-into-the-new-
               | ebs-gp3-...
        
           | dzikimarian wrote:
           | >Most large SaaS companies can run off of $5k / m or cheaper
           | RDS deployments which isn't enough to pay someone.
           | 
           | After initial setup, managing equivalent of $5k/m RDS is not
           | full time job. If you add to this, that wages differ a lot
           | around the world, $5k can take you very, very far in terms of
           | paying someone.
        
         | renewiltord wrote:
         | You don't get the higher end machines on AWS unless you're a
         | big guy. We have Epyc 9684X on-prem. Cannot match that at the
         | price on AWS. That's just about making the choices. Most
         | companies are not DB-primary.
        
           | sgarland wrote:
           | I think most people who've never experienced native NVMe for
           | a DB are also unaware of just how blindingly fast it is. Even
           | io2 Block Express isn't the same.
        
             | renewiltord wrote:
             | Yes. We have it 4x striped on those same machines. Burns
             | like lightning.
        
               | sgarland wrote:
               | The only problem is it hides all of the horrible queries.
               | Ah well, can't have it all.
        
               | Cacti wrote:
               | I have one of those. It's so fast I don't even know what
               | to do with it.
        
               | icelancer wrote:
               | Ha, I did just the same thing - and also optimized for an
               | extremely fast per-thread CPU (which you never get from
               | managed service providers).
               | 
               | The query times are incredible.
        
             | sroussey wrote:
             | Most databases expressly say don't run storage over a
             | network.
        
               | amluto wrote:
               | To be fair, most networked filesystems are nowhere near
               | as good as EBS. That's one AWS service that takes real
               | work to replicate on-prem.
               | 
               | OTOH, as noted, EBS does not perform as well as native
               | NVMe and is hilariously expensive if you try. And quite a
               | few use cases are just fine on plain old NVMe.
        
               | tpetry wrote:
               | Thats because EBS is a network block device and not a
               | network filesystem - that would be EFS. And with network
               | block devices you get the same perf and better compared
               | to EBS.
        
             | ndriscoll wrote:
             | Funny enough, the easiest way to experience this is
             | probably to do some performance experimentation on the
             | machine you code on. If it's a laptop made in the last few
             | years, the performance you can get out of it knowing that
             | it's sipping on a 45W power brick with probably not great
             | cooling will make you very skeptical of when people talk
             | about "scale".
        
         | steveBK123 wrote:
         | RDS pricing is deranged at the scales I've seen too. $60k/year
         | for something I could run on just a slice of one of my on-prem
         | $20k servers. This is something we would have run 10s of.
         | $600k/year operational against sub-$100k capital cost pays
         | DBAs, backups, etc with money to spare.
         | 
         | Sure, maybe if you are some sort of SaaS with a need for a
         | small single DB, that also needs to be resilient, backed up,
         | rock solid bulletproof.. it makes sense? But how many cases are
         | there of this? If its so fundamental to your product and needs
         | such uptime & redundancy, what are the odds its also reasonably
         | small?
        
           | macNchz wrote:
           | > Sure, maybe if you are some sort of SaaS with a need for a
           | small single DB, that also needs to be resilient, backed up,
           | rock solid bulletproof.. it makes sense? But how many cases
           | are there of this?
           | 
           | Most software startups these days? The blog post is about
           | work done at a startup after all. By the time your db is big
           | enough to cost an unreasonable amount on RDS, you're likely a
           | big enough team to have options. If you're a small startup,
           | saving a couple hundred bucks a month by self managing your
           | database is rarely a good choice. There're more valuable
           | things to work on.
        
             | tw04 wrote:
             | >By the time your db is big enough to cost an unreasonable
             | amount on RDS, you're likely a big enough team to have
             | options.
             | 
             | By the time your db is big enough to cost an unreasonable
             | amount on RDS, you've likely got so much momentum that
             | getting off is nearly impossible as you bleed cash.
             | 
             | You can buy a used server and find colocation space and
             | still be pennies on the dollar for even the smallest
             | database. If you're doing more than prototyping, you're
             | probably wasting money.
        
               | theptip wrote:
               | That's just another way of saying the opportunity cost
               | isn't worth paying to do the migration.
               | 
               | Optionality and flexibility are extremely valuable, and
               | that is why cloud compute continues to be popular,
               | especially for rapidly/burstily growing businesses like
               | startups.
        
               | latch wrote:
               | I don't mean to pick on your specific comments, but I
               | find these analysis almost always lack a crucial
               | perspective: level of knowledge. This is the single
               | biggest factor, and it's the hardest one to be honest
               | about. No one wants to say "RDS is a good choice . . .
               | because I don't know how nor have I ever self managed a
               | database."
               | 
               | If you want a different opportunity cost, get people with
               | different experience. If RDS is objectively expensive,
               | objectively slow, but subjectively easy, change the
               | subject.
        
               | pcl wrote:
               | _> No one wants to say  "RDS is a good choice . . .
               | because I don't know how nor have I ever self managed a
               | database."_
               | 
               | I don't think that's accurate. I've self-managed
               | databases, and I still think that RDS is compelling for
               | small engineering teams.
               | 
               | There's a lot to get right when managing a database, and
               | it's easy to screw something up. Perhaps none of the
               | individual parts are super-complicated, but the cost of
               | failure is high. Outsourcing that cost to AWS is pretty
               | compelling.
               | 
               | At a certain team size, you'll end up with a section of
               | the team that's dedicated to these sorts of careful
               | processes. But the first place these issues come up is
               | with the database, and if you can put off that bit of
               | organizational scaling until later, then that's a great
               | path to choose.
        
               | maccard wrote:
               | I disagree here. This falls apart when you zoom out one
               | step. I'm perfectly capable of managing a database. I'm
               | also capable of maintaining load balancers, redis,
               | container orchestrators, Jenkins, perforce, grafana,
               | Loki, Oncall, individually. But each of those has the
               | high chance of being a distraction from what our software
               | actually does.
               | 
               | Its about tradeoffs, and some tradeoffs are often more
               | applicable than others - getting a ping at 7am on a
               | Sunday because your ec2 instance filled it's drive up
               | with logs and your log rotation script failed because it
               | didn't have a long enough retey is a problem I'm happy to
               | outsource when I should be focusing on the actual app.
        
               | graemep wrote:
               | People do not really understand the value of the former.
               | Even dealing with financial options (buy/sell and
               | underlying) which are a pure form of it, people either do
               | not understand the value, or do so in a very abstract way
               | they do not intuit.
        
               | matwood wrote:
               | Good point. And, since you brought up financials, you
               | also see this when people use a majority of their savings
               | to lump sum pay off a mortgage. They take an overweighted
               | view of saving on interest and, IMO, underweight the
               | flexibility of liquidity.
        
               | graemep wrote:
               | On the other hand cloud platforms can be hard to migrate
               | off, which is very much taking away options.
        
               | macNchz wrote:
               | In the small SaaS startup case, I'd say the production
               | database is typically the most critical single piece of
               | infra, so self hosting is just not a compelling
               | proposition unless you have a strong technical reason
               | where having super powerful database hardware is
               | important, or a team with multiple people who have
               | sysadmin or DBA experience. I think both of those cases
               | are unusual.
               | 
               | I've been the guy managing a critical self-hosted
               | database in a small team, and it's such a distraction
               | from focusing on the actual core product.
               | 
               | To me, the cost of RDS covers tons of risks and time
               | sinks: having to document the db server setup so I'm not
               | the only one on the team who actually knows how to
               | operate it, setting up monitoring, foolproof backups so I
               | don't need to worry that they're silently failing because
               | a volume is full and I misconfigured the monitoring, PITR
               | for when someone ships a bad migration, one click HA so
               | the database itself is very unlikely to wake me at 3am,
               | blue/green deploys to make major version upgrades totally
               | painless, never having to think about hardware failures
               | or borked dist-upgrades, and so on.
               | 
               | Each of those is ultimately either undifferentiated work
               | to develop in-house RDS features that could have been
               | better spent on product, or a risk of significant data
               | loss, downtime, or firefighting. RDS looks like a pretty
               | good deal, up to a point.
        
               | remus wrote:
               | I like fiddling with databases, but I totally agree with
               | this. Unless you really need a big database and are going
               | to save 100k+ per year by going self managed then RDS or
               | similar just saves you so much stress. We've been using
               | it for the best part of 10 years and uptime and latency
               | have consistently been excellent, and functionality is
               | all rock solid. I never have to think about it, which is
               | just what I want from something so core to the business.
        
               | matwood wrote:
               | I _am_ good at databases (have been a DBA in the past),
               | and 100% agree with this. RDS is easy to standup and get
               | all the things you mentioned, and not have to think about
               | again. If we grow to the point where the overhead is more
               | than a FT DBA, awesome. It means we are successful, and
               | are fortunate to have options.
        
               | rnts08 wrote:
               | Unfortunately there are so many people and teams who
               | thinks that simply running their databases on RDS means
               | that they're backed up, highly-available and can be
               | easily load balanced, upgraded, partitioned, migrated and
               | so on which is simply not the case with the basic
               | configuration.
               | 
               | RDS is a great choice, for prototyping and only for
               | production if you know what you're doing when setting it
               | up.
               | 
               | FWIW, this is common in all cloud deployments, people
               | assume that running something "severless" is a magical
               | silver bullet.
        
               | macNchz wrote:
               | Well...just using the defaults when creating an RDS
               | Postgres in the console give you an HA cluster with two
               | read replicas, 7 days of backups restorable to any point
               | in time, automatic minor version upgrades, and very easy
               | major upgrades. So unless you start actively unchecking
               | stuff those are not entirely invalid assumptions.
        
               | optymizer wrote:
               | I agree, but I also classify some of these as "learn them
               | once and you're all set".
               | 
               | Maybe it takes you a month the first time around and a
               | week the 10th time around. First product suffers, the
               | other products not so much. Now it just takes a week of
               | your time and does not require you to pay large AWS fees,
               | which means you are not bleeding money
               | 
               | I like to set up scrappy products that do not rack up
               | large monthly fees. This means I can let them run
               | unprofitable for longer and I don't have to seek an
               | investor early, which would light up a large fire under
               | everyone's butts and start influencing timelines because
               | now they have the money and want a return asap.
               | 
               | I'll launch a week later - no biggie usually. I could
               | have come up with the idea a month later, so I'm still 3
               | weeks early ;)
               | 
               | It doesn't work for all projects, obviously, but I've
               | seen plenty of SaaS start out with a shopping spree, then
               | pay monthly fees and purchase licenses for stuff that
               | they could have set up for free if they put some (usually
               | not a lot) effort into it. When times get rough, the
               | shorter runway becomes a hard fact of life. Maybe they
               | wouldn't have needed a VC and could have bootstrapped and
               | also survived for longer.
        
               | macNchz wrote:
               | Learning it all is what gave me an appreciation for RDS!
               | I've self managed a number of Postgres and MySQL
               | databases, including a 10TB Postgres cluster with all of
               | the HA and backup niceties.
               | 
               | While I generally agree as far as initial setup time
               | goes, I favor RDS because I can forget about it, whereas
               | the hand rolled version demands ongoing maintenance, and
               | incurs a nonzero chance of simple mistakes that, if made,
               | could result in a 100% dataloss unrecoverable scenario.
               | 
               | I'm also mostly talking about typical, funded startups
               | here, as opposed to indie/solo devs. If you're flying
               | solo launching a tiny proof of concept that may only ever
               | have a few users, by all means run it yourself if you'd
               | like, but if you've raised money to grow faster and are
               | paying employees to iterate rapidly searching for
               | PMF...just pay for RDS and make sure as much time as
               | possible is spent on product features that provide actual
               | business value. It starts at like $15/month. The cost of
               | simply not being laser-focused on product is far greater.
        
               | crazygringo wrote:
               | > _you 've likely got so much momentum that getting off
               | is nearly impossible as you bleed cash._
               | 
               | Databases are not particularly difficult to migrate
               | between machines. Of all the cloud services to migrate,
               | they might actually be the easiest, since the databases
               | don't have different API's that need to be rewritten for,
               | and database replication is a well-established thing.
               | 
               | Getting off is quite the _opposite_ of nearly impossible.
        
           | viraptor wrote:
           | Lots of cases. It doesn't even have to be a tiny database.
           | Within <1TB range there's a huge number of online companies
           | that don't need to do more than hundreds of queries per
           | second, but need the reliability and quick failover that RDS
           | gives them. The $600k cost is absurd indeed, but it's not the
           | range of what those companies spend.
           | 
           | Also, Aurora gives you the block level cluster that you can't
           | deploy on your own - it's way easier to work with than the
           | usual replication.
        
             | steveBK123 wrote:
             | Once you commit to more deeply Amazon flavored parts of AWS
             | like Aurora, aren't you now fairly committed to hoping your
             | scale never exceeds the cost-benefit tradeoff?
        
               | viraptor wrote:
               | Or you're realistic about what you're doing. Will you
               | _ever_ need to scale more than 10x? And on the timescales
               | where you do grow over 10x, would it be better to
               | reconsider /re-architect everything anyway?
               | 
               | I mean, I'm looking after a 4 instance Aurora cluster
               | which is great feature wise, is slightly overprovisioned
               | for special events, and is more likely to shrink than
               | grow 2x in the next decade. If we start experiencing any
               | issues, there's lots of optimisations that can be still
               | gained from better caching and that work will be cheaper
               | than the instance size upgrade.
        
               | zmgsabst wrote:
               | ...no?
               | 
               | There's still a defined cost to swapping your DB code
               | over to a different backend. At the point where it
               | becomes uneconomical, you're also at a scale you can
               | afford rewriting a module.
               | 
               | That's why we have things like "hexagonal architecture",
               | which focus on isolating the storage protocol from the
               | code. There's an art to designing such that your
               | prototype can scale with only minor rework -- but that's
               | why we have senior engineers.
        
               | callalex wrote:
               | If you're paying list price at scale you are doing it
               | very wrong.
        
               | tw04 wrote:
               | Sure, but if you're paying anywhere near list price for
               | your on-prem hardware at scale you're also doing it
               | wrong. I've never seen a scenario where Amazon discounts
               | exceed what you would get from a hardware or software
               | vendor at the same scale.
        
               | osigurdson wrote:
               | Interesting how cloud services are sold like used cars.
        
               | rswail wrote:
               | It's more interesting how cloud services are sold like
               | any other consumables or corporate services.
               | 
               | No one runs their own electricity supply (well until
               | recently with renewables/storage), they buy it as a
               | service, up to a pretty high scale before it becomes more
               | economic to invest the capex and opex to run your own.
        
               | nemothekid wrote:
               | If my scale exceeds the cost benefit tradeoff, then I
               | will thank God/Allah/Buddah/Spaghetti Monster.
               | 
               | These questions always sound flawed to me. It's like
               | asking won't I regret moving to California and paying
               | high taxes once I start making millions of dollars?
               | Maybe? But that's an amazing problem to have and one that
               | I may be much better equipped to solve.
               | 
               | If you are small, RDS is much cheaper, and many company
               | killing events, such as not testing your backups are
               | solved. If you are big and you can afford a 60K/yr RDS
               | bill than you can make changes to move on-prem. Or you
               | can open up excel and do the math if your margins are
               | meaningfully affected by moving on-prem.
        
               | pclmulqdq wrote:
               | I assume that you do that math on all your new features
               | too, right? The calculation of how much extra money they
               | will bring in?
               | 
               | On some level, AWS/GCP/California relies on you doing
               | this calculation for the things that you can do it on
               | easily (the savings of moving away), while not doing this
               | calculation on things where it's hard to do (new
               | development). That way, you can pretend that your new
               | features are a lot more valuable than the $Xk/year you
               | will save by moving your infra.
        
               | nemothekid wrote:
               | > _The calculation of how much extra money they will
               | bring in?_
               | 
               | Yes, I've done the math. The piece you are missing is,
               | saving money on infra will bring in $0 new dollars. There
               | is a floor to how much money I can save. There is no
               | ceiling to how much money the right feature can bring in.
               | Penny pinching on infra, especially when the amount of
               | money is saved is less than the cost of an engineer is
               | almost always a waste of time while you are growing a
               | company. If you are at the point where you are wasting
               | 1x,2x,3x of an engineers salary of superflous
               | infrastructure - then congratulations you have survived
               | the great filter for 99% of startups.
               | 
               | > _That way, you can pretend that your new features are a
               | lot more valuable than the $Xk /year you will save by
               | moving your infra._
               | 
               | Finding product market fit is 1000x harder than moving
               | from RDS to On-prem. If you haven't solved PMF, then no
               | amount of $Xk/year in savings will save you from having
               | to shut down your company.
        
               | pclmulqdq wrote:
               | I am well aware of the math on that. Also, switching to
               | faster infra can be a surprising benefit to your revenue,
               | by the way, if it makes your app feel nicer.
               | 
               | The thing is, most features, particularly later in the
               | life of a company, don't have an easy-to-measure revenue
               | impact, and I suspect that many features are actually
               | worth $0 of revenue. However, they cost money to
               | implement (both in engineering time and infra), making
               | them very much net negative value propositions. This is
               | why Facebook and Google can cut tons of staff and lose
               | nothing off their revenue number.
               | 
               | Also, there's a bit of a gambling mentality here which is
               | that a feature could be worth effectively infinite
               | revenue (ie it could be the thing that gives you PMF), so
               | it's always worth doing over things with known, bounded
               | impact on your bottom line. However, improving your
               | efficiency gives you more cracks at finding good features
               | before you run out of money.
        
               | matwood wrote:
               | Agree. "What if you're wildly successful and get huge?"
               | Awesome, we'll solve the problem then. The other part is
               | what if AWS was a part of becoming successful? IE, it
               | freed my small team from having to worry all that much
               | about a database and instead focused on features.
        
               | rswail wrote:
               | Aurora supports standard Postgres clients.
               | 
               | So moving to/from Aurora/RDS/own EC2/on-prem _should_ be
               | a matter of networking and changing connection strings in
               | the clients.
               | 
               | Your operational requirements and processes
               | (backup/restore, failover, DR etc) will change, but
               | that's because you're making a deliberate decision
               | weighing up those costs vs benefits.
        
               | gregw2 wrote:
               | Pro tip side note:
               | 
               | You can use DNS to mitigate the pain of changing those
               | connection strings, decoupling client change management
               | from backend change process, or if you had foresight, not
               | having to change client connection strings at all.
        
           | DiggyJohnson wrote:
           | The US DoD for sure.
        
           | amluto wrote:
           | I have a small MySQL database that's rather important, and
           | RDS was a complete failure.
           | 
           | It would have cost a negligible amount. But the sheer amount
           | of time I wasted before I gave up was honestly quite
           | surprising. Let's see:
           | 
           | - I wanted one simple extension. I could have compromised on
           | this, but getting it to work on RDS was a nonstarter.
           | 
           | - I wanted RDS to _import the data_. Nope, RDS isn't "SUPER,"
           | so it rejects a bunch of stuff that mysqldump emits. Hacking
           | around it with sed was not confidence-inspiring.
           | 
           | - The database uses GTIDs and needed to maintain replication
           | to a non-AWS system. RDS nominally supports GTID, but the
           | documented way to enable it at import time strongly suggests
           | that whoever wrote the docs doesn't actually understand the
           | purpose of GTID, and it wasn't clear that RDS could do it
           | right. At least Azure's docs suggested that I could have
           | written code to target some strange APIs to program the thing
           | correctly.
           | 
           | Time wasted: a surprising number of hours. I'd rather give
           | someone a bit of money to manage the thing, but it's still on
           | a combination of plain cloud servers and bare metal. Oh well.
        
             | blantonl wrote:
             | replication to non-AWS systems. "simple" extension problems
             | importing data into RDS because of your custom stuff
             | lurking in a mysqldump
             | 
             | Sounds like you are walking massive edge
        
           | ehnto wrote:
           | > Sure, maybe if you are some sort of SaaS with a need for a
           | small single DB, that also needs to be resilient, backed up,
           | rock solid bulletproof.. it makes sense? But how many cases
           | are there of this?
           | 
           | Very small businesses with phone apps or web apps are often
           | using it. There are cheaper options of course, but when there
           | is no "prem" and there are 1-5 employees then it doesn't make
           | much sense to hire for infra. You outsource all digital work
           | to an agency who sets you up a cloud account so you have
           | ownership, but they do all software dev and infra work.
           | 
           | > If its so fundamental to your product and needs such uptime
           | & redundancy, what are the odds its also reasonably small?
           | 
           | Small businesses again, some of my clients could probably run
           | off a Pentium 4 from 2008, but due to nature of the org and
           | agency engagement it often needs to live in the cloud
           | somewhere.
           | 
           | I am constantly beating the drum to reduce costs and use as
           | little infra as needed though, so in a sense I agree, but the
           | engagement is what it is.
           | 
           | Additionally, everyone wants to believe they will need to
           | hyperscale, so even medium scale businesses over-provision
           | and some agencies are happen to do that for them as they
           | profit off the margin.
        
             | graemep wrote:
             | A lot of my clients are small businesses in that range or
             | bigger.
             | 
             | AWS and the like are rarely a cost effective option, but it
             | is something a lot of agencies like, largely because they
             | are not paying the bills. The clients do not usually care
             | because they are comfortable with a known brand and the
             | costs are a small proportion of the overall costs.
             | 
             | A real small business will be fine just using a VPS
             | provider or a rented server. This solves the problem of not
             | having on premise hardware. They can then run everything on
             | a single server, which is a lot simpler to set up, and a
             | lot simpler to secure. That means the cost of paying
             | someone to run it is a lot lower too as they are needed
             | only occasionally.
             | 
             | They rarely need very resilient systems as they amount of
             | money lost to downtime is relatively small - so even on AWS
             | they are not going to be running in multiple availability
             | zones etc.
        
           | neeleshs wrote:
           | Out of curiosity, who is your onprem provider?
        
           | kunley wrote:
           | RDS is not so bulletproof as advertised, and the support is
           | first arrogant then (maybe) helpful.
           | 
           | People pay for RDS because they want to believe in a fairy
           | tale that it will keep potential problems away and that it
           | worked well for other customers. But those mythical other
           | customers also paid based on such belief. Plus, no one wants
           | to admit that they pay money in such irrational way. It's a
           | bubble
        
             | AtlasBarfed wrote:
             | Plus aws outright lie to us about zero downtime upgrades.
             | 
             | Come time for force major upgrade shoved down our throat?
             | Downtime, surprise, surprise
        
           | thelastparadise wrote:
           | > $600k/year operational against sub-$100k capital cost pays
           | DBAs, backups, etc with money to spare.
           | 
           | One of these is not like the others (DBAs are not capex.)
           | 
           | Have you ever considered that if a company can get the same
           | result for the same price ($100K opex for RDS vs same for
           | human DBA), it actually makes much more sense to go the route
           | that takes the human out of the loop?
           | 
           | The human shows up hungover, goes crazy, gropes Stacy from
           | HR, etc.
           | 
           | RDS just hums along without all the liabilities.
        
             | AaronM wrote:
             | Not only that, you can't just have one DBA. You need a team
             | a them, otherwise that person is going to be on call 24/7,
             | can never take a vacation, etc. Your probably looking at a
             | minimum of 3.
        
             | tpetry wrote:
             | And when you have performance issues you still need a DBA.
             | Because RDS only runs your database. It is up to you to
             | make it fast.
        
               | icedchai wrote:
               | You'll need an engineer with database skills, not a
               | dedicated DBA. I haven't seen a small company with a full
               | time DBA in well over a decade. If you can learn a
               | programming language, you can learn about indexes and
               | basic tuning parameters (buffer pool, cache, etc.)
        
         | infecto wrote:
         | The problem you have here is by the time you reach the size of
         | this DB, you are on a special discount rate within AWS.
        
           | jacurtis wrote:
           | Discount rates are actually much better too on the bigger
           | instances. Therefore the "sticker price" that people compare
           | on the public site is no where close to a fair comparison.
           | 
           | We technically aren't supposed to talk about pricing
           | publically, but I'm just going to say that we run a few 8XL
           | and 12Xl RDS instances and we pay ~40% off the sticker price.
           | 
           | If you switch to Aurora engine the pricing is absurdly
           | complex (its basically impossible to determine without a
           | simulation calculator) but AWS is even more aggressive with
           | discounting on Aurora, not to mention there are some legit
           | amazing feature benefits by switching.
           | 
           | I'm still in agreeance that you could do it cheaper yourself
           | at a Data Center. But there are some serious tradeoffs made
           | by doing it that way. One is complexity and it certainly
           | requires several new hiring decisions. Those have their own
           | tangible costs, but there are a huge amount of intangible
           | costs as well like pure inconvenience, more people
           | management, more hiring, split expertise, complexity to
           | network systems, reduce elasticity of decisions, longer
           | commitments, etc.. It's harder to put a price on that.
           | 
           | When you account for the discounts at this scale, I think the
           | cost gap between the two solutions is much smaller and these
           | inconveniences and complexities by rolling it yourself are
           | sometimes worth bridging that smaller gap in cost in order to
           | gain those efficiencies.
        
             | jq-r wrote:
             | > but I'm just going to say that we run a few 8XL and 12Xl
             | RDS instances and we pay ~40% off the sticker price.
             | 
             | Genuinely curious, how do you that?
             | 
             | We pay a couple of million dollars per year and the biggest
             | spend is RDS. The bulk of those are 8xl and 12xl as you
             | mention and we have a lot of these. We do have savings
             | plans, but those are nowhere near 40%.
        
               | hardolaf wrote:
               | Yeah 40% seems like a pipedream. I was at a Fortune 500
               | defense firm and we couldn't get any cloud provider to
               | even offer us anything close to that discount if we
               | agreed to move to them for 3-4 years minimum. That org
               | ended up not migrating because it was significantly
               | cheaper to buy land and build datacenters from scratch
               | than to rent in the cloud.
        
               | overstay8930 wrote:
               | There are basically no discounts in govcloud
        
               | CubsFan1060 wrote:
               | At least according to: https://instances.vantage.sh/rds/?
               | selected=db.r6g.16xlarge,d...
               | 
               | It looks like a reserved instance is 35% off sticker
               | price? Add probably a discount and you'd be around 40%
               | off.
        
             | CubsFan1060 wrote:
             | The new Aurora pricing model helps, and is honestly the
             | only reason we're able to use it. It caps costs:
             | https://aws.amazon.com/blogs/aws/new-amazon-aurora-i-o-
             | optim...
        
         | nyc_data_geek wrote:
         | Some orgs are looking at moving back to on prem because they're
         | figuring this out. For a while it was vogue to go from capex to
         | opex costs, and C suite people were incentivized to do that via
         | comp structures, hence "digital transformation" ie: migration
         | to public cloud infrastructure. Now, those same orgs are
         | realizing that renting computers actually costs more than
         | owning them, when you're utilizing them to a significant
         | degree.
         | 
         | Just like any other asset.
        
           | nextos wrote:
           | Same experience here. As a small organization, the quotes we
           | got from cloud providers have always been prohibitively
           | expensive compared to running things locally, even when we
           | accounted for geographical redundancy, generous labor costs,
           | etc. Plus, we get to keep _know how_ and avoid lock-in, which
           | are extremely important things in the long term.
           | 
           | Besides, running things locally can be refreshingly simple if
           | you are just starting something and you don't need tons of
           | extra stuff, which becomes accidental complexity between you,
           | the problem, and a solution. This old post described that
           | point quite well by comparing Unix to Taco Bell:
           | http://widgetsandshit.com/teddziuba/2010/10/taco-bell-
           | progra.... See HN discussion:
           | https://news.ycombinator.com/item?id=10829512.
           | 
           | I am sure for some use-cases cloud services might be worth
           | it, especially if you are a large organization and you get
           | huge discounts. But I see lots of business types blindly
           | advocating for clouds, without understanding costs and
           | technical tradeoffs. Fortunately, the trend seems to be
           | plateauing. I see an increasing demand for people with HPC,
           | DB administration, and sysadmin skills.
        
             | layoric wrote:
             | > Plus, we get to keep know how and avoid lock-in, which
             | are extremely important things in the long term.
             | 
             | So much this. The "keep know how" has been so greatly
             | avoided over the past 10 years, I hope people with these
             | skills start getting paid more as more companies realize
             | the cost difference.
        
               | lanstin wrote:
               | When I started working in the 1980s (as a teenager but
               | getting paid) there was a sort of battle between the
               | (genuinely cool and impressive) closed technology of IBM
               | and the open world of open standards/interop like TCP/IP
               | and Unix, SMTP, PCs, even Novell sort of, etc. There was
               | a species of expert that knew the whole product offering
               | of IBM, all the model numbers and recommended solution
               | packages and so on. And the technology was good - I had
               | an opportunity to program a 3093K(?) CM/VMS monster with
               | APL and rexx and so on. Later on I had a job working with
               | AS/400 and SNADS and token ring and all that, and it was
               | interesting; thing is they couldn't keep up and the more
               | open, less greedy, hobbyists and experts working on Linux
               | and NFS and DNS etc. completely won the field. For
               | decades, open source, open standards, and
               | interoperability dominated and one could pick the best
               | thing for each part of the technology stack, and be
               | pretty sure that the resultant systems would be good. Now
               | however, the Amazon cloud stacks are like IBM in the
               | 1980s - amazingly high quality, but not open; the cloud
               | architects master the arcane set of product offerings and
               | can design a bespoke AWS "solution" to any problems. But
               | where is the openness? Is this a pendulum that goes back
               | and forth (and many IBM folks left IBM in the 1990s and
               | built great open technologies on the internet) or was it
               | a brief dawn of freedom that will be put down by the
               | capital requirements of modern compute and networking
               | stacks?
               | 
               | My money is on openness continuing to grow and more and
               | more pieces of the stack being completely owned by
               | openness (kernels anyone?) but one doesn't know.
        
               | nyc_data_geek wrote:
               | Even without owning the infrastructure, running in the
               | cloud without know-how is very dangerous.
               | 
               | I hear tell of a shop that was running on ephemeral
               | instance based compute fleets (EC2 spot instances, iirc),
               | with all their prod data in-memory. Guess what happened
               | to their data when spot instance availability cratered
               | due to an unusual demand spike? No more data, no more
               | shop.
               | 
               | Don't even get me started on the number of privacy
               | breaches because people don't know not to put customer
               | information in public cloud storage buckets.
        
             | hardolaf wrote:
             | I was part of a relatively small org that wanted us to move
             | to cloud dev machines. As soon as they saw the size of our
             | existing development docker images that were 99.9% vendor
             | tools in terms of disk space, they ran the numbers and told
             | us that we were staying on-prem. I'm fairly sure just
             | loading the dev images daily or weekly would be more
             | expensive than just buying a server per employee.
        
             | nicbou wrote:
             | Is there a bit of risk involved since the know-how has a
             | will of its own and sometimes gets sick?
             | 
             | If I had a small business with very clever people I'd be
             | very afraid of what happens if they're not available for a
             | while.
        
           | stingraycharles wrote:
           | That's made possible because of all the orchestration
           | platforms such as Kubernetes being standardized, and as such
           | you can get pretty close to a cloud experience while having
           | all your infrastructure on-premise.
        
             | nyc_data_geek wrote:
             | Yes, virtualization, overprovisioning and containerization
             | have all played a role in allowing for efficient enough
             | utilization of owned assets that the economics of cloud are
             | perhaps no longer as attractive as they once were.
        
           | oooyay wrote:
           | Context: I build internal tools and platforms. Traffic on
           | them varies, but some of them are quite active.
           | 
           | My nasty little secret is for single server databases I have
           | zero fear of over provisioning disk iops and running it on
           | SQLite or making a single RDBMS server in a container. I've
           | never actually run into an issue with this. It surprises me
           | the number of internal tools I see that depend on large RDS
           | installations that have piddly requirements.
        
             | DeathArrow wrote:
             | >making a single RDBMS server in a container
             | 
             | On what disk is the actual data written? How do you do
             | backups, if you do?
        
               | BirAdam wrote:
               | In most setups like this, it's going to be spinning rust
               | with mdadm, and MySQL dumps that get created via cron and
               | sent to another location.
        
             | dvfjsdhgfv wrote:
             | The problem with single instance is that while performance-
             | wise it's best (at least on bare metal), there comes a
             | moment when you simply have too much data and one machine
             | can't handle. Your your scenario, it may never come up, but
             | many organizations face this problem sooner or later.
        
               | oooyay wrote:
               | I agree, my point is that clusters are overused. Most
               | applications simply don't need them and it results in a
               | lot of waste. _Much_ of this has to do with engineers
               | being tasked with an assortment of roles these days, so
               | they obviously opt for the solution where a database and
               | upgrades are managed for them. I 've just found that
               | managing a single containers upgrades aren't that big of
               | an issue.
        
           | chii wrote:
           | i would imagine that cloud infrastructure has the ability for
           | fast scale up, unlike self-owned infrastructure.
           | 
           | For example, how long does it take to rent another rack that
           | you didnt plan for?
           | 
           | And not to mention that the cost of cloud management
           | platforms that you have to deploy to manage these owned
           | assets is not free.
           | 
           | I mean, how come even large consumers of electricity does not
           | buy and own their own infrastructure to generate it?
        
             | pinkgolem wrote:
             | >I mean, how come even large consumers of electricity do
             | not buy and own their own infrastructure to generate it?
             | 
             | They sure do? BASF has 3 power plants in Hamburg, Disney
             | operate Reedy Creek Energy with at least 1 power plant and
             | I could list a fair bit more...
             | 
             | >For example, how long does it take to rent another rack
             | that you didnt plan for?
             | 
             | I mean, you can also rent hardware a lot cheaper then on
             | AWS. There certainly are providers where you can rent out a
             | rack for a month within minutes
        
               | sseagull wrote:
               | Some universities also have their own power plants. It's
               | also becoming more common to at least supplement power on
               | campus with solar arrays.
        
             | tpetry wrote:
             | Ordering that amount of amount of servers takes about one
             | hour with hetzner. If you truly want a complete rack on
             | your own maybe a few days as they have to do it manually.
             | 
             | Most companies don't need to scale up full racks in
             | seconds. Heck, even weeks would be ok for most of them to
             | get new hardware delivered. The cloud planted the lie into
             | everyone's head that most companies dont have predictable
             | and stable load.
        
               | rajamaka wrote:
               | What would be the cost/time of scaling down a rack on
               | Hetzner?
        
               | pinkgolem wrote:
               | rental period is a month you can also use hetzner cloud,
               | which is still roughly 10x less expensive then aws and
               | that does not take into account the vastly cheaper
               | traffic
        
               | hardolaf wrote:
               | Most businesses could probably know server needs 6-12
               | months out. There's a small number of businesses in the
               | world that actually need dynamic scaling.
        
             | gorm wrote:
             | One other appealing alternative for smaller startups is to
             | run Docker on one burstable vm. This is a simple setup and
             | allows you to go beyond the cpu limits and also scale up
             | the vm.
             | 
             | Might be other alternatives than using Docker so if anyone
             | has tips for something simpler or easier to maintain,
             | appreciate a comment.
        
           | pinkgolem wrote:
           | Keep in mind, there is an in between..
           | 
           | I would have a hard time doing servers as cheap as hetzner
           | for example including the routing and everything
        
             | jwr wrote:
             | I do that. In fact I've been doing it for years, because
             | every time I do the math, AWS is unreasonably expensive and
             | my solo-founder SaaS would much rather keep the extra
             | money.
             | 
             | I think there is an unreasonable fear of "doing the routing
             | and everything". I run vpncloud, my server clusters are
             | managed using ansible, and can be set up from either a list
             | of static IPs or from a terraform-prepared configuration.
             | The same code can be used to set up a cluster on bare-metal
             | hetzner servers or on cloud VMs from DigitalOcean (for
             | example).
             | 
             | I regularly compare this to AWS costs and it's not even
             | close. Don't forget that the performance of those bare-
             | metal machines is _way_ higher than of overbooked VMs.
        
               | pinkgolem wrote:
               | I was more talking about physical backbone connection
               | which hetzner does for you.
               | 
               | We are using hetzner cloud.. but we are also scaling up
               | and down a lot right now
        
               | swores wrote:
               | Could you please explain what you mean by "physical
               | backbone connection", as I can't think of a meaning that
               | fits the context.
               | 
               | If you mean dealing with the physical dedicated servers
               | that can be rented from Hetzner, that's what the person
               | you replied to was talking about being not so difficult.
               | 
               | If you mean everything else at the data centre that makes
               | having a server there worthwhile (networking, power,
               | cooling, etc.) I don't think people were suggesting doing
               | that themselves (unless you're a big enough company to
               | actually be in the data centre business), but were
               | talking about having direct control of physical servers
               | in a data centre managed by someone like Hetzner.
               | 
               | (edit: and oops sorry I just realised I accidentally
               | downvoted your comment instead of up, undone and
               | rectified now)
        
               | pinkgolem wrote:
               | With "routing" I meant the backbone connection, which is
               | included in the hetzner price.
               | 
               | Aka if I add up power (including backup) + backbone
               | connection rental + server deprication I can not do it
               | for the hetzner price..
               | 
               | That was quite imprecise, sorry about that.
        
               | DeathArrow wrote:
               | I think no one talked about having physical server on
               | their own premises but colocating servers in a data
               | center or renting servers in a data center.
        
               | swores wrote:
               | No worries, easy to not foresee every possible way in
               | which strangers could interpret a comment!
               | 
               | But I think that people (at least jwr, and probably even
               | nyc_data_geek saying "on prem") are talking about cloud
               | (like AWS) vs. renting (or buying) servers that live in a
               | data centre run by a company like Hetzner, which can be
               | considered "on prem" if you're the kind of data centre
               | client who has building access to send your own staff
               | there to manage your servers (while still leaving
               | everything else, possibly even legal ownership and
               | therefore deprecation etc. to the data centre owner).
               | 
               | What you're thinking of - literally taking responsibility
               | for running your own mini data centre - I think is hardly
               | ever considered (at least in my experience), except by
               | companies at the extremes of size. If you're as big as
               | Facebook (not sure where the line is but obviously
               | including some companies not AS big as Meta but still
               | huge) then it makes sense to run your own data centres.
               | If you're a tiny business getting less than thousands of
               | website visits a day and where the website (or whatever
               | is being hosted) isn't so important that a day of
               | downtime every now and then isn't a big deal, then it's
               | not uncommon to host from the company's office itself
               | (just using a spare old PC or second hand cheap 1U
               | server, maybe a cheap UPS, and just connected to the main
               | internet connection that people in the office use, and
               | probably managed by a single employee, or company owner,
               | who happens to be geeky enough to think it's one or both
               | of simple or fun to set up a basic LAMP server, or even a
               | Windows server for its oh-so-lovely GUI).
        
               | fgonzag wrote:
               | You usually just do colocation. The data center will give
               | you a rack (or space for one), an upstream gateway to
               | your ISP, and redundant power. You still have to manage a
               | firewall and your internal network equipment, but its not
               | really that bad. I've used PFsense firewalls, configured
               | by them for like $1500, with roaming vpn, high
               | availability, point to point vpn, and as secure as
               | reasonably possible. After that it's the same thing as
               | the cloud except its physical servers.
        
               | pinkgolem wrote:
               | i mean, yes.. but you pay for that, and colocation +
               | server deprication in the case i calculated was higher
               | then just renting the servers
        
               | DeathArrow wrote:
               | 100% agree. People still think that maintaining
               | infrastructure is very hard and requires lot of people.
               | What they disregard is that using cloud infrastructure
               | also requires people.
        
               | tormeh wrote:
               | When talking about Hetzner pricing, please don't change
               | the subject to AWS pricing. The two have nothing in
               | common, and intuition derived from one does not transfer
               | to the other.
        
               | dvfjsdhgfv wrote:
               | > please don't change the subject to AWS pricing
               | 
               | Why? The only reason I'm using Hetzner and not AWS for
               | several of my own projects (even though I know AWS much
               | better since this is what I use at work) is an enormous
               | price difference in each aspect (compute, storage,
               | traffic).
        
               | KronisLV wrote:
               | > The two have nothing in common
               | 
               | If all you need are some cloud servers, or a basic load
               | balancer, they _are_ pretty much the same.
               | 
               | If you need a plethora of managed services and don't want
               | to risk getting fired over your choice or specifics of
               | how that service is actually rendered, they are nothing
               | alike and you should go for AWS, or one of the other
               | large alternatives (GCP, Azure etc.).
               | 
               | On the flip side, if you are using AWS or one of those
               | large platforms as a glorified VPS host and you aren't
               | doing this in an enterprise environment, outside of
               | learning scenarios, you are probably doing something
               | wrong and you should look at Hetzner, Contabo, or one of
               | those other providers, though some can still be a bit
               | pricey - DigitalOcean, Vultr, Scaleway etc.
        
           | jumploops wrote:
           | Funny story time.
           | 
           | I was once part of an acquisition from a much larger
           | corporate entity. The new parent company was in the middle of
           | a huge cloud migration, and as part of our integration into
           | their org, we were required to migrate our services to the
           | cloud.
           | 
           | Our calculations said it would cost 3x as much to run our
           | infra on the cloud.
           | 
           | We pushed back, and were greenlit on creating a hybrid
           | architecture that allowed us to launch machines both on-prem
           | and in the cloud (via a direct link to the cloud datacenter).
           | This gave us the benefit of autoscaling our volatile
           | services, while maintaining our predictable services on the
           | cheap.
           | 
           | After I left, apparently my former team was strong-armed into
           | migrating everything to the cloud.
           | 
           | A few years go by, and guess who reaches out on LinkedIn?
           | 
           | The parent org was curious how we built the hybrid infra, and
           | wanted us to come back to do it again.
           | 
           | I didn't go back.
        
             | smitty1e wrote:
             | My funny story is built on the idea that AWS is Hotel
             | California for your data.
             | 
             | A customer had an interest in merging the data from an
             | older account into a new one, just to simplify matters.
             | Enterprise data. Going back years. Not even leaving the
             | region.
             | 
             | The AWS rep in the meeting kinda pauses, says: "We'll get
             | back to you on the cost to do that."
             | 
             | The sticker shock was enough that the customer simply
             | inherited the old account, rather than making things tidy.
        
               | hhsectech wrote:
               | Eh? I've never had a problem moving data out of AWS.
               | 
               | Have people lost the ability to write export and backup
               | scripts?
        
               | Draiken wrote:
               | The ingress/egress cost is ridiculously high. Some
               | companies don't care, but it is there and I've seen it
               | catch people off guard multiple times.
        
               | varjag wrote:
               | Oh come on from the description both accounts could be
               | sitting on the same datacenter LAN.
        
               | LadyCailin wrote:
               | It's the cost of data egress, which isn't free.
        
               | mciancia wrote:
               | But there is no paid egress when we are moving data
               | between account within one region, rigth?
        
               | storyinmemo wrote:
               | There is. You pay a price for any cross-VPC traffic.
        
               | CubsFan1060 wrote:
               | This isn't true, at least not anymore.
               | 
               | You can peer two vpc's and as long as you are
               | transferring within the same (real) AZ, it's free:
               | https://aws.amazon.com/about-aws/whats-
               | new/2021/05/amazon-vp...
               | 
               | Even peered VPC's only pay "normal" prices:
               | https://aws.amazon.com/ec2/pricing/on-
               | demand/#Data_Transfer
               | 
               | "Data transferred "in" to and "out" from Amazon EC2,
               | Amazon RDS, Amazon Redshift, Amazon DynamoDB Accelerator
               | (DAX), and Amazon ElastiCache instances, Elastic Network
               | Interfaces or VPC Peering connections across Availability
               | Zones in the same AWS Region is charged at $0.01/GB in
               | each direction."
        
               | interroboink wrote:
               | My (peripheral) experience is that it is much cheaper to
               | get data in than to get data out. When you have the
               | amount of data being discussed -- "Enterprise data. Going
               | back years." -- that can get very costly.
               | 
               | It's the amount of data where it makes more sense to put
               | hard drives on a truck and drive across the country
               | rather than send it over a network, where this becomes an
               | issue (actually, probably a bit _before_ then).
        
               | fcarraldo wrote:
               | AWS actually has a service for this - Snowmobile, a
               | storage datacenter inside of a shipping container, which
               | is driven to you on a semi truck.
               | https://aws.amazon.com/snowmobile/
        
               | xmcqdpt2 wrote:
               | They do not!
               | 
               | > Q: Can I export data from AWS with Snowmobile? > >
               | Snowmobile does not support data export. It is designed
               | to let you quickly, easily, and more securely migrate
               | exabytes of data to AWS. When you need to export data
               | from AWS, you can use AWS Snowball Edge to quickly export
               | up to 100TB per appliance and run multiple export jobs in
               | parallel as necessary. Visit the Snowball Edge FAQs to
               | learn more.
               | 
               | https://aws.amazon.com/snowmobile/faqs/?nc2=h_mo-lang
               | 
               | Why would they make it convenient to leave?
        
               | fcarraldo wrote:
               | Oh, TIL! Thanks for correcting me.
        
               | brickteacup wrote:
               | That's only for data into AWS though, not data out
        
               | Shorel wrote:
               | Just in network costs, there's a huge asymmetry.
               | Uploading data to AWS is free. Downloading data from
               | them, you have to pay.
               | 
               | When you have enough data, that cost is quite
               | significant.
        
               | mijoharas wrote:
               | There's a cost for data egress (but not ingress)
        
               | banku_brougham wrote:
               | Is R2 a sensible option for hosting data? I understand
               | egress is chesp.
        
               | stickfigure wrote:
               | R2 is great. Our GCS bill (almost all egress) jumped from
               | a few hundred dollars a month to a couple thousand
               | dollars a month last year due to a usage spike. We rush-
               | migrated to R2 and now that part of the bill is $0.
               | 
               | I've heard some people here on HN say that it's slow, but
               | I haven't noticed a difference. We're mainly dealing with
               | multi-megabyte image files, so YMMV if you have a
               | different workload.
        
             | hhsectech wrote:
             | There are two possible scenarios here. Firstly, they can't
             | find the talent to support what you implemented...or more
             | likely, your docs suck!
             | 
             | I've made a career out of inheriting other peoples whacky
             | setups and supporting them (as well as fixing them) and
             | almost always its documentation that has prevented the
             | client getting anywhere.
             | 
             | I personally dont care if the docs are crap because usually
             | the first thing I do is update / actually write the docs to
             | make them usable.
             | 
             | For a lot of techs though crap documentation is a deal
             | breaker.
             | 
             | Crap docs aren't always the fault of the guys implementing
             | though, sometimes there are time constraints that prevent
             | proper docs being written. Quite frequently though its
             | outsourced development agencies that refuse to write it
             | because its "out of scope" and a "billable extra". Which I
             | think is an egregious stance...doxs Should be part and
             | parcel of the project. Mandatory.
        
               | smokel wrote:
               | I agree that bad documentation is a serious problem in
               | many cases. So much so that your suggestion to write the
               | documentation after the fact can become quite impossible.
               | 
               | If there is only one thing that juniors should learn
               | about writing documentation (be it comments or design
               | documents), it is this: document _why_ something is
               | there. If resources are limited, you can safely skip
               | comments that describe _how_ something works, because
               | that information is also available in code.
               | 
               | (It might help to describe _what_ is available,
               | especially if code is spread out over multiple
               | repositories, libraries, teams, etc.)
               | 
               | (Also, I suppose the comment I'm responding to could've
               | been slightly more forgiving to GP, but that's another
               | story.)
        
               | lazyasciiart wrote:
               | Unfortunately it's also possible that e.g the company
               | switched from share point to confluence and lost half the
               | entire knowledge base because it wasn't labeled the way
               | they thought it was. Or that the docs were all purged
               | because they were part of an abandoned project.
        
               | adrianmsmith wrote:
               | > Quite frequently though its outsourced development
               | agencies that refuse to write it
               | 
               | It's also completely against their interest to write docs
               | as it makes their replacement easier.
               | 
               | That's why you need someone competent on the buying side
               | to insist on the docs.
               | 
               | A lot of companies outsource because they _don 't_ have
               | this competency themselves. So it's inevitable that this
               | sort of thing happens and companies get locked in and
               | can't replace their contractors, because they don't have
               | any docs.
        
               | thelastparadise wrote:
               | > the first thing I do is update / actually write the
               | docs to make them usable.
               | 
               | OK so the docs are in sync for a single point of time
               | when you finish. Plus you get to have the context in your
               | head (bus factor of 1, job security for you, bad for the
               | org.)
               | 
               | How about if we just write clean infra configs/code,
               | stick to well known systems like docker, ansible, k8s,
               | etc.
               | 
               | Then we can make this infra code available to an on prem
               | LLM and ask it questions as needed without it drifting
               | out of sync overtime as your docs surely will.
               | 
               | Wrong documentation is worse than no documentation.
        
               | maxrecursion wrote:
               | "Crap docs aren't always the fault of the guys
               | implementing though, sometimes there are time constraints
               | that prevent proper docs being written."
               | 
               | I can always guarantee a stream of consciousness one note
               | that should have most of the important data, and a few
               | docs about the most important parts. It's up to
               | management if they want me to spend time turning that one
               | note into actual robust documentation that is easily
               | read.
        
               | ZoomerCretin wrote:
               | Documentation? What for? It's self-documenting (to me,
               | because I wrote it)!
        
             | nyc_data_geek wrote:
             | Yes, I do believe autoscaling is actually a good use case
             | for public cloud. If you have bursty load that requires a
             | lot of resources at peak which would sit idle most of the
             | time, probably doesn't make sense to own what you need for
             | those peaks.
        
           | throwawaaarrgh wrote:
           | It's not an either/or. Many business both own and rent
           | things.
           | 
           | If price is the only factor, your business model (or
           | executives' decision-making) is questionable. Buy only the
           | cheapest shit, spend your time building your own office chair
           | rather than talking to a customer, you aren't making a
           | premium product, and that means you're not differentiated.
        
         | Scubabear68 wrote:
         | Elsewhere today I recommended RDS, but was thinking of small
         | startup cases that may lack infrastructure chops.
         | 
         | But you are totally right it can be expensive. I worked with a
         | startup that had some inefficient queries, normally it would
         | matter, but with RDS it cost $3,000 a month for a tiny user
         | base and not that much data (millions of rows at most).
        
           | rswail wrote:
           | That sounds like the app needs some serious surgery.
        
         | j16sdiz wrote:
         | In another section , they mentioned they don't have DBA, no app
         | team own the database and the infra team is overwhelmed.
         | 
         | RDS make perfect sense for them
        
         | osigurdson wrote:
         | Cloud was supposed to be a commodity. Instead it is priced like
         | at burger at the ski hill.
        
         | Sparkyte wrote:
         | Data isn't cheap never was. Paying the licensing fees on top
         | make it more expensive. It really depends on the circunstance a
         | managed database usually has exended support from the compaany
         | providing it. You have to weigh a team's expertise to manage a
         | solution on your own and ensure you spent ample time making it
         | resilient. Other half is the cost of upgrading hardware
         | sometimes it is better to just outright pay a cloud provider if
         | you business does not have enough income to outright buy
         | hardware.There is always an upfront cost.
         | 
         | Small databases or test environment databases you can also
         | leverage kubernetes to host an operator for that tiny DB. When
         | it comes to serious data and it needs a beeline recovery
         | strategy RDS.
         | 
         | Really it should be a mix self hosted for things you aren't
         | afraid to break. Hosted for the things you put at high risk.
        
         | silisili wrote:
         | Even for small workloads it's a difficult choice. I ran a small
         | but vital db, and RDS was costing us like 60 bucks a month per
         | env. That's 240/month/app.
         | 
         | DynamoDB as a replacement, pay per request, was essentially
         | free.
         | 
         | I found Dynamo foreign and rather ugly to code for initially,
         | but am happy with the performance and especially price at the
         | end.
        
         | dfgdfg34545456 wrote:
         | For big companies such as banks this cost comparison is not as
         | straight forward. They have whole data centres just sitting
         | there for disaster recovery. They periodically do switchovers
         | to test DR. All of this expense goes away when they migrate to
         | cloud.
        
           | nightfly wrote:
           | > All of this expense goes away when they migrate to cloud.
           | 
           | Just to pay someone else enough money to provide the same
           | service and make a profit while do it
        
             | dfgdfg34545456 wrote:
             | Well corporations pay printers to do their printing because
             | they don't want to be in the business of printing. It's the
             | same with infrastructure, a lot of corporations simply
             | don't want to be in the data centre business.
        
             | jabradoodle wrote:
             | That's how nearly every aspect of every business works;
             | would you you start a bakery by learning construction and
             | building it yourself?
        
               | AtlasBarfed wrote:
               | Construction is a one time cost. It infrastructure is in
               | constant use.
               | 
               | It's like accounting and finance. Yeah a lot of companies
               | use tax firms, but they all have finance and accounting
               | in-house.
        
           | graemep wrote:
           | > All of this expense goes away when they migrate to cloud.
           | 
           | They need to replicate everything in multiple availability
           | zones, which is going to be more expensive than replicating
           | data centres.
           | 
           | They still need to test their cloud infrastracuture works.
        
         | AtNightWeCode wrote:
         | In your case it sounds more viable to move to VMs instead of
         | RDS, which some cloud providers also recommend.
        
         | prisenco wrote:
         | From what I've read, a common model for mmorpg companies is to
         | use on-prem or colocated as their primary and then provision a
         | cloud service for backup or overage.
         | 
         | Seems like a solid cost effective approach for when a company
         | reaches a certain scale.
        
           | hardolaf wrote:
           | Lots of companies, like Grinding Gear Games and Square Enix,
           | just rent whole servers for a tiny fraction of the price
           | compared to what the price gouging cloud providers would
           | charge for the same resources. They get the best of both
           | worlds. They can scale up their infrastructure in hours or
           | even minutes and they can move to any other commodity
           | hardware in any other datacenter at the drop of a hat if they
           | get screwed on pricing. Migrating from one server provider
           | (such as IBM) to another (such as Hetzner) can take an
           | experienced team 1-2 weeks at most. Given that pricing
           | updates are usually given 1-3 quarters ahead at a minimum,
           | they have massive leverage over their providers because they
           | an so easily switch. Meanwhile, if AWS decides to jack up
           | their prices, well you're pretty much screwed in the short-
           | term if you designed around their cloud services.
        
         | fulafel wrote:
         | I'd add another criticism to the whole quote:
         | 
         | > Data is the most critical part of your infrastructure. You
         | lose your network: that's downtime. You lose your data: that's
         | a company ending event. The markup cost of using RDS (or any
         | managed database) is worth it.
         | 
         | You need well-run, regularly tested, air gapped or otherwise
         | immutable backups of your DB (and other critical biz data).
         | Even if RDS was perfect, it still doesn't protect you from the
         | things that backups protect you from.
         | 
         | After you have backups, the idea of paying enormous amounts for
         | RDS in order to keep your company from ending is more far
         | fetched.
        
         | afpx wrote:
         | That's the cost of two people.
        
         | raffraffraff wrote:
         | I agree that RDS is stupidly expensive and not worth it
         | provided that the company actually hires at least 2x full-time
         | database owners who monitor, configure, scale and back up
         | databases. Most startups will just save the money and let
         | developers "own" their own databases or "be responsible for"
         | uptime and backups.
        
           | rr808 wrote:
           | For a couple hundred grand you can get a team of 20 fully
           | trained people working full time in most parts of the world.
        
       | CSMastermind wrote:
       | So by and large I agree with the things in this article. It's
       | interesting that the points I disagree with the author on are all
       | SaaS products:
       | 
       | > Moving off JIRA onto linear
       | 
       | I don't get the hype. Linear is fine and all but I constantly
       | find things I either can't or don't know how to do. How do I make
       | different ticket types with different sets of fields? No clue.
       | 
       | > Not using Terraform Cloud No Regrets
       | 
       | I generally recommend Terraform Cloud - it's easy for you to grow
       | your own in house system that works fine for a few years and
       | gradually ends up costing you in the long run if you don't.
       | 
       | > GitHub actions for CI/CD Endorse-ish
       | 
       | Use Gitlab
       | 
       | > Datadog Regret
       | 
       | Strong disagree - it's easily the best monitoring/observability
       | tool on the market by a wide margin.
       | 
       | Cost is the most common complaint and it's almost always from
       | people who don't have it configured correctly (which to be fair
       | Datadog makes it far too easy to misconfigure things and blow up
       | costs).
       | 
       | > Pagerduty Endorse
       | 
       | Pagerduty charges like 10x what Opsgenie does and offers no
       | better functionality.
       | 
       | When I had a contract renewal with Pagerduty I asked the sales
       | rep what features they had that Opsgenie didn't.
       | 
       | He told me they're positioning themselves as the high end brand
       | in the market.
       | 
       | Cool so I'm okay going generic brand for my incident reporting.
       | 
       | Every CFO should use this as a litmus test to understand if their
       | CTO is financially prudent IMO.
        
         | crabmusket wrote:
         | We moved from Trello to Linear and it's been fantastic. I hope
         | to never work at an organisation large enough for JIRA to be a
         | good idea.
        
           | CSMastermind wrote:
           | To be fair Linear does strike me as everything everyone
           | always hoped Trello would be.
           | 
           | So if that's the upgrade path you're going down I'd expect it
           | to be fantastic.
        
           | cqqxo4zV46cp wrote:
           | Newer (aka next gen aka Team-managed) Jira projects are
           | pretty solid.
        
             | FridgeSeal wrote:
             | Do jira pages still take 30 seconds to load, and have all
             | the interaction speed of cold molasses? Does it have nice
             | keyboard shortcuts yet? Do I still need to perform an
             | arcane ritual of setup to get the ticket statuses to be
             | what I want?
             | 
             | Linear has been such a breath of fresh air, with such a
             | solid desktop app (on Mac OS) that I don't ever want to go
             | back. Stuff happens _instantly_ , the layout and semantics
             | are an excellent "90% good enough" that I would happily
             | relegate jira to only the most enterprise of enterprise
             | projects.
        
               | coffeebeqn wrote:
               | At one of the bigger companies I was at we had an on-prem
               | JIRA in the same office building and it was still so slow
               | that I would often forget why I was loading that page
        
               | Cacti wrote:
               | trigger warning please on the Jira stuff
        
               | crabmusket wrote:
               | Linear is making (fairly) good on the promises of local-
               | first software. As opposed to "every click is a round
               | trip to the server" software.
        
               | mjfisher wrote:
               | No, Jira loading is relatively OK and on par with other
               | SPAs. It's got a CTRL+SHIFT+P style actions menu for
               | tickets which helps cut down on point and click pain
               | (especially for linking issues etc). Setting up statues
               | and workflows and how they map to a board is relatively
               | straightforward.
               | 
               | There are lots of things where Jira falls short, but the
               | pain points on an under-resourced self hosted instance of
               | ten years ago are nothing like the ones you'll find on
               | Jira cloud today.
        
               | aniforprez wrote:
               | Does Jira still have multiple flavours of markdown for
               | different fields and editors? Last I used it, it used a
               | different flavour for creating and editing a ticket. Also
               | another flavour for bitbucket. None of these were
               | compatible and it would convert between them in the
               | backend but I was left confused every time when I would
               | have to switch formatting styles
        
               | mjfisher wrote:
               | I remember that from a while back, and getting annoyed -
               | it doesn't appear to be something that annoys me at the
               | moment so it might have been fixed, but on reflection I
               | tend to just use the default rich text editor now.
               | 
               | It takes markdownish input but converts it to rich text
               | as you type - so asterisk-space starts a bullet point
               | list, etc.
               | 
               | I actually can't remember if it has a dedicated markdown
               | mode anymore; the rich text editing supports the usual
               | shortcuts that mean I tend to stick with it.
        
         | tootie wrote:
         | Interesting. Atlassian also just launched an integration with
         | OpsGenie. I have the same opinion of JIRA. I've tried many
         | competitors (not Linear so far) and regretted it every time.
        
           | Jedd wrote:
           | > Atlassian also just launched an integration with OpsGenie.
           | 
           | Given Atlassian bought OpsGenie in 2018, this either
           | somewhere between _quite late_ and _unsurprising_.
        
             | rswail wrote:
             | Two different measurements (time and Atlassian development
             | processes) that are orthogonal.
             | 
             | Anything Atlassian does is mostly quite late and its
             | integration story is so pathetic that it's unsurprising.
             | 
             | Try to have a bitbucket pipeline that pushes to confluence.
             | Seems like a basic integration to have, after all,
             | Confluence has an API (well, actually it has 3 different
             | ones) so surely Atlassian would make a basic thing like
             | "publish a wiki page" a thing you get out of the box.
             | 
             | Nope.
        
               | Jedd wrote:
               | Oh, I am no great fan. Plus I have a nascent blog post on
               | the subject of 'can you believe ...?' items around this
               | subject.
               | 
               | I suppose it comes back to the comparative priorities (as
               | evaluated by recurrent revenue) of ticking rfq boxes vs
               | solving actual problems.
        
           | jacurtis wrote:
           | I'm not sure they just launched anything. OpsGenie has been
           | an Atlassian product for 5 or more years now. I've been using
           | it for 3-4 myself and its been integrated with Jira the whole
           | time.
           | 
           | In fact, OpsGenie has mostly been on Auto-pilot for a few
           | years now.
        
         | steveBK123 wrote:
         | Agreed on PagerDuty It doesn't really do a lot, administrating
         | it is fairly finicky, and most shops barely use half the
         | functionality it has anyway.
         | 
         | To me its whole schedule interface is atrocious for its price,
         | given from an SRE/dev perspective, that's literally its purpose
         | - scheduled escalations.
        
         | colechristensen wrote:
         | PagerDuty's cheapest plan is $21 per user month
         | 
         | OpsGenie's cheapest is $9 per user month but arbitrarily
         | crippled, the plan anybody would want to use is $19 per user
         | month
         | 
         | So instead of a factor of ten it's ten percent cheaper. And i
         | just kind of expect Atlassian to suck.
         | 
         | Datadog is ridiculously expensive and on several occasions I've
         | run into problems where an obvious cause for an incident was
         | hidden by bad behavior of datadog.
        
           | compumike wrote:
           | Heii On-Call is $32 per month total for your team -- not per
           | user. https://heiioncall.com/ (Full disclosure: part of the
           | team building it)
        
             | avemg wrote:
             | How do you pronounce that?
        
               | revscat wrote:
               | "Hey".
        
             | solatic wrote:
             | Looks super interesting, and that $3/month for hobbyists is
             | just low enough to meet my budget for hobby services, but
             | please, for on-call stuff, you gotta have alerts that make
             | phone calls. Nothing else is going to wake me in the middle
             | of the night. This is the #1 feature I expect from an on-
             | call service - you're on-call because you will be _called_.
        
               | compumike wrote:
               | Thanks for the feedback!
               | 
               | We use iOS "Critical Alerts" and similar on Android that
               | breaks through any Do-Not-Disturb settings.
               | https://heiioncall.com/blog/better-alerting-for-heii-on-
               | call... Would you be willing to give that a shot? It
               | wakes me every time :)
               | 
               | (It's configurable too; we have vibrate-only or silenced
               | modes. Think old-school beeper.)
               | 
               | In the rare case that it doesn't wake you, we have
               | configurable escalation strategies to alert someone else
               | on your team after a configurable number of minutes.
        
               | mads_quist wrote:
               | We are building a great and affordable incident
               | escalation tool as well:
               | 
               | https://allquiet.app
               | 
               | With SMS, Phone Calls and Critical Alerts / DnD override.
               | 
               | We're 5 USD/user.
               | 
               | We try to build as close to our users as possible. Happy
               | for any new try outs! :)
               | 
               | (I am co founder)
        
           | jpb0104 wrote:
           | I just started building out on-call rotation scheduling to
           | fit teams that already have an alerting solution and need
           | simple automated scheduling. I'd love to get some feedback:
           | https://majorpager.com
        
           | skrtskrt wrote:
           | Grafana OnCall can be self hosted for free or you can pay $20
           | a month, and still always have the option to migrate to self
           | hosting if you want to save money
        
         | macNchz wrote:
         | > Cost is the most common complaint and it's almost always from
         | people who don't have it configured correctly (which to be fair
         | Datadog makes it far too easy to misconfigure things and blow
         | up costs).
         | 
         | I loved Datadog 10 years ago when I joined a company that
         | already used it where I never once had to think about pricing.
         | It was at the top of my list when evaluating monitoring tools
         | for my company last year, until I got to the costs. The pricing
         | page itself made my head swim. I just couldn't get behind
         | subscribing to something with pricing that felt designed to be
         | impossible to reason about, even if the software is best in
         | class.
        
           | gen220 wrote:
           | I'm a big fan of Datadog from multiple angles.
           | 
           | Their pricing setup is evil. Breaking out by SKUs and having
           | 10+ SKUs is fine, trialing services with "spot" prices before
           | committing to reserved capacity is also fine.
           | 
           | But (for some SKUs, at least) they make it really difficult
           | to be confident that the reserved capacity you're purchasing
           | will cover your spot use cases. Then, they make you contact a
           | sales rep to lower your reserved capacity.
           | 
           | It all feels designed to get you to pay the "spot" rate for
           | as long as possible, and it's not a good look.
           | 
           | I understand the pressures on their billing and sales teams
           | that lead to these patterns, but they don't align with their
           | customers in the long term. I hope they clean up their act,
           | because I agree they're losing some set of customers over it.
        
             | viraptor wrote:
             | Another annoying thing is that the billing dashboards do
             | not map clearly to what's on the pricing pages / in the
             | contract. Good luck figuring out the extras for RUM when
             | you have multiple orgs.
             | 
             | Then they have things that I wanted to try for a long time,
             | but... support doesn't care? Repeated "would you like to
             | use this? / very likely, can we try it out? / (silence)". I
             | love their product, but they are so annoying to deal with
             | at the billing level.
        
               | iaresee wrote:
               | > Another annoying thing is that the billing dashboards
               | do not map clearly to what's on the pricing pages / in
               | the contract. Good luck figuring out the extras for RUM
               | when you have multiple orgs.
               | 
               | I, quite literally, was griping to my Datadog CSM about
               | this exact thing last week. They'll email me and be, "Oh,
               | you know you're logging volume this month put you into
               | on-demand indexing rates, right?" and my answer is
               | always, "No, because your monitoring platform makes it
               | nearly impossible for me to monitor it correctly."
               | 
               | You can't reference your contracted volume rates when
               | building monitors out and the units for the metrics you
               | need to watch don't match the units you contract with
               | them on the SKU.
               | 
               | Maddening.
        
               | Solvency wrote:
               | And why do you continue to deal with scum like this?
               | You're ultimately going to pay it and business will carry
               | on as usual for them.
        
               | kevinslin wrote:
               | > You can't reference your contracted volume rates when
               | building monitors out and the units for the metrics you
               | need to watch don't match the units you contract with
               | them on the SKU.
               | 
               | Are you referring to the
               | `datadog.estimated_usage.logs.ingested_events` metric? It
               | includes excluded events by default but you can get to
               | your indexed volume by excluding excluded logs. `sum:data
               | dog.estimated_usage.logs.ingested_events{datadog_index:*,
               | datadog_is_excluded:false}.as_count()`
        
           | jacurtis wrote:
           | > Datadog makes it far too easy to misconfigure things and
           | blow up costs
           | 
           | I'll give you a fun example. It's fresh in my mind because i
           | just got reamed out about it this week.
           | 
           | In our last contract with DataDog, they convinced us to try
           | out the CloudSIEM product, we put in a small $600/mo
           | committment to it to try it out. Well, we never really set it
           | up and it sat on autopilot for many months. We fell under our
           | contract rate for it for almost a year.
           | 
           | Then last month we had some crazy stuff happen and we were
           | spamming logs into DataDog for a variety of reasons. I knew I
           | didn't want to pay for these billions of logs to be indexed,
           | so I made an exclusion filter to keep them out of our log
           | indexes so we didn't have a crazy bill for log indexing.
           | 
           | So our rep emailed me last week and said "Hey just a heads up
           | you have $6,500 in on-demand costs for CloudSIEM, I hope that
           | was expected". No, it was NOT expected. Turns out excluding
           | logs from indexing does not exclude them from CloudSIEM. Fun
           | fact, you will not find any documented way to exclude logs
           | from CloudSIEM ingestion. It is technically possible, but
           | only through their API and it isn't documented. Anyway, I
           | didn't do or know this, so now i had $6,500 of on-demand
           | costs plus $400-500 misc on-demand costs that I had to
           | explain to the CTO.
           | 
           | I should mention my annual review/pay raise is also next week
           | (I report to the CTO), so this will now be fresh in their
           | mind for that experience.
        
             | macNchz wrote:
             | That's just the sort of hypothetical scenario that kept
             | running through my head as I tried to find a way for us to
             | use Datadog. I even particularly wanted to use the
             | CloudSIEM product. Bummer.
        
         | xtracto wrote:
         | DatDog is a freaking beast. NY wife works in workday (a huge
         | employee management system) and they have a very large number
         | of tutorials, videos, "working hours" and other tools to ensure
         | their customers are making the best use of it.
         | 
         | Datadog on the other side... their "DD University" is a shame
         | and we as paying customers are overwhelmed and with no real
         | guidance. DD should assign some time for integration for new
         | customers, even if it is proportional to what you pay annually.
         | (I think I pay around 6000 usd annually.
        
         | jacurtis wrote:
         | I mostly agreed with OP's article, but you basically nailed all
         | of the points of disagreement I did have.
         | 
         | Jira: Its overhyped and overpriced. Most HATE jira. I guess I
         | don't care enough. I've never met a ticket system that I loved.
         | Jira is fine. Its overly complex sure. But once you set it up,
         | you don't need to change it very often. I don't love it, I
         | don't hate it. No one ever got fired for choosing Jira, so it
         | gets chosen. Welcome to the tech industry.
         | 
         | Terraform Cloud: The gains for Terraform Cloud are minimal. We
         | just use Gitlab for running Terraform pipelines and have a
         | super nice custom solution that we enjoy. It wasn't that hard
         | to do either. We maintain state files remotely in S3 with
         | versioning for the rare cases when we need to restore a
         | foobar'd statefile. Honestly I like having Terraform pipelines
         | in the same place as the code and pipelines for other things.
         | 
         | GitHub Actions: Yeah switch to GitLab. I used to like Github
         | Actions until I moved to a company with Gitlab and it is best
         | in class, full stop. I could rave about Gitlab for hours. I
         | will evangelize for Gitlab anywhere I go that is using anything
         | else.
         | 
         | DataDog: As mentioned, DataDog is the best monitoring and
         | observability solution out there. The only reason NOT to use it
         | is the cost. It is absurdly expensive. Yes, truly expensive. I
         | really hate how expensive it is. But luckily I work somewhere
         | that lets us have it and its amazing.
         | 
         | Pagerduty: Agree, switch to OpsGenie. Opsgenie is considerably
         | cheaper and does all the pager stuff of Pager duty. All the
         | stuff that PagerDuty tries to tack on top to justify its cost
         | is stuff you don't need. OpsGenie does all the stuff you need.
         | Its fine. Similar to Jira, its not something anyone wants
         | anyway. No ones going to love it, no one loves being on call.
         | So just save money with OpsGenie. If you're going to fight for
         | the "brand name" of something, fight for DataDog instead, not a
         | cooler pager system.
        
           | bigstrat2003 wrote:
           | I'm right there with you on Jira. The haters are wrong - it's
           | a decent enough ticket system, no worse than anything else
           | I've used. You can definitely torture Jira into something
           | horrible, but that's not Jira's fault. Bad managers will ruin
           | _any_ ticket system if they have the customization tools to
           | do so.
        
             | Cacti wrote:
             | Using Jira feels like using IBM enterprise web software
             | from 2005, and I am simply not going to make my teams put
             | up with that amount of inanity.
        
               | rswail wrote:
               | We switched to JIRA around 2005 _away_ from IBM
               | enterprise web software, because it was a breath of fresh
               | air.
               | 
               | So on the standard tech hype cycle, that sounds about
               | right.
        
               | mixmastamyk wrote:
               | Found the person who never used _Lotus Notes_ haha.
        
               | Cacti wrote:
               | I was blown away when I found out a couple years ago that
               | there were major corporations still using that as their
               | primary communication platform.
        
               | mixmastamyk wrote:
               | Surely has improved in the last 20+ years? :hope:
        
             | matwood wrote:
             | Yeah, usually Jira hate is really convoluted company
             | process hate. Of course the Jira software isn't perfect,
             | but it's fine. Jira's strength and weakness is it's
             | flexibility.
        
         | benced wrote:
         | After their ridiculous outage, I wouldn't touch OpsGenie with a
         | 10ft pole.
        
         | mardifoufs wrote:
         | Why gitlab? GitHub actions are a mess but gitlab online's ci cd
         | is not much better at all, and for self hosted it opens a whole
         | different can of worms. At least with GitHub actions you have a
         | plugin ecosystem that makes the super janky underlying platform
         | a bit more bearable.
        
           | YoshiRulz wrote:
           | I've found GitLab CI's "DAG of jobs" model has made
           | maintenance and, crucially for us, optimisation relatively
           | easy. Then I look into GitHub Actions and... where are the
           | abstraction tools? How do I cache just part of my "workflow"?
           | Plugins be damned. GitLab CI is so good that I'm willing to
           | overlook vendor lock-in and YAML, and use it for our GitHub
           | project even without proper integration. (Frankly the rest of
           | GitLab seems to always be a couple features ahead, but no-
           | one's willing to migrate.)
        
             | mardifoufs wrote:
             | Mhmm that's actually a good point!! I didn't realize that I
             | couldn't do that with GitHub, I never really used partial
             | caching. I just had a lot (a looot) of issues with our
             | kubernetes runner (which I even made sure to be as close to
             | the vanilla docs example as possible). I guess the grass is
             | always greener on the other side :)
        
         | bilalq wrote:
         | Linear has a lot going for it. It doesn't support custom
         | fields, so if that's a critical feature for you, I can see it
         | falling short. In my experience though, custom fields just end
         | up being a mess anytime a manager changes and decides to do
         | things differently, things get moved around teams, etc.
         | 
         | - It's fast. It's wild that this is a selling point, but it's
         | actually a huge deal. JIRA and so many other tools like it are
         | as slow as molasses. Speed is honestly the biggest feature.
         | 
         | - It looks pretty. If your team is going to spend time there,
         | this will end up affecting productivity.
         | 
         | - It has a decent degree of customization and an API. We've
         | automated tickets moving across columns whenever something gets
         | started, a PR is up for review, when a change is merged, when
         | it's deployed to beta, and when it's deployed to prod. We've
         | even built our own CLI tools for being able to action on Linear
         | without leaving your shell.
         | 
         | - It has a lot of keyboard shortcuts for power users.
         | 
         | - It's well featured. You get teams, triaging, sprints
         | (cycles), backlog, project management, custom views that are
         | shareable, roadmaps, etc...
        
         | marcinzm wrote:
         | > Cost is the most common complaint and it's almost always from
         | people who don't have it configured correctly (which to be fair
         | Datadog makes it far too easy to misconfigure things and blow
         | up costs).
         | 
         | Datadog's cheapest pricing is $15/host/month. I believe that is
         | based on the largest sustained peak usage you have.
         | 
         | We run spot instances on AWS for machine learning workflows. A
         | lot of them if we're training and none otherwise. Usually we're
         | using zero. Using DataDog at it's lowest price would basically
         | double the cost of those instances.
        
         | data_maan wrote:
         | This may be a noob question - but why not use Github Projects
         | instead of Linear or Jita?
         | 
         | You're staying within an ecosystem you know and it seems to
         | offer almost all of the necessary functionality
        
         | kevinslin wrote:
         | In terms of Datadog - the per host pricing on infrastructure in
         | a k8/microservices world is perhaps the most egregious of
         | pricing models across all datadog services. Triply true if you
         | use spot instances for short lived workloads.
         | 
         | For folks running k8 at any sort of scale, I generally
         | recommend aggregating metrics BEFORE sending them to datadog,
         | either on a per deployment or per cluster level. Individual
         | host metrics tend to also matter less once you have a large
         | fleet.
         | 
         | You can use opensource tools like veneur
         | (https://github.com/stripe/veneur) to do this. And if you don't
         | want to set this up yourself, third party services like Nimbus
         | (https://nimbus.dev/) can do this for you automatically (note
         | that this is currently a preview feature). Disclaimer also that
         | I'm the founder of Nimbus (we help companies cut datadog costs
         | by over 60%) and have a dog in this fight.
        
         | lijok wrote:
         | > I generally recommend Terraform Cloud
         | 
         | I'll be dead in the ground before I use TFC. 10 cents per
         | resource per month my ass. We have around 100k~ resources at an
         | early-stage startup I'm at, our AWS bill is $50~/mo and TFC
         | wants to charge me $10k/mo for that? We can hire a senior dev
         | to maintain an in-house tool full time for that much.
        
       | 005 wrote:
       | Interesting read, I agree with adopting an identity platform but
       | this can definitely be contentious if you want to own your data.
       | 
       | I wonder how much one should pay attention to future problems at
       | the start of a startup versus "move fast and break things." Some
       | of this stuff might just put you off finishing.
        
       | sakopov wrote:
       | Who's using Pulumi here and how mature is it in comparison to
       | terraform?
        
         | jryan49 wrote:
         | I think currently under the hood it's actually still terraform.
         | I know they are working on their own native providers.
        
         | dmattia wrote:
         | I'm using Pulumi in production pretty heavily for a bunch of
         | different app types (ECS, EKS, CloudFront, CloudFlare, Vault,
         | Datadog monitors, Lambdas of all types, EC2s with ASGs, etc.),
         | it's reasonably mature enough.
         | 
         | As mentioned in the other comment, the most commonly used
         | providers for terraform are "bridged" to pulumi, so the
         | maturity is nearly identical to Terraform. I don't really use
         | Pulumi's pre-built modules (crossroads), but I don't find I've
         | ever missed them.
         | 
         | I really like both Pulumi and Terraform (which I also used in
         | production for hundreds of modules for a few years), which it
         | seems like isn't always a popular opinion on HN, but I have and
         | you absolutely can run either tool in production just fine.
         | 
         | My slight preference is for Pulumi because I get slightly more
         | willing assistance from devs on our team to reach in and change
         | something in infra-land if they need to while working on app
         | code.
         | 
         | We do still use some Pulumi and some Terraform, and they play
         | really nicely together: https://transcend.io/blog/use-
         | terraform-pulumi-together-migr...
        
         | rswail wrote:
         | IaaC is one of the worst acronyms ever.
         | 
         | Infrastructure should be _declared_ , not coded.
         | 
         | Say what you want. The tool then builds that, or changes whats
         | there to match.
         | 
         | I've tried Pulumi and understanding the bit that runs before it
         | tries to do stuff and the bit that runs after it tries to do
         | stuff and working out where the bugs are is a PITA. It lulls
         | you into a false sense of security that you can refer to your
         | own variables in code, but that doesn't get carried over to
         | when it is actually running the plan on the cloud service (ie
         | actually creating the infrastructure) because you can only
         | refer to the outputs of other infrastructure.
         | 
         | CFN is too far in the other direction, primarily because it's
         | completely invisible and hard to debug.
         | 
         | Terraform has _enough_ programmability (eg for_each, for-
         | expressions etc) that you can write  "here is what I want and
         | how the things link together" and terraform will work out how
         | to do it.
         | 
         | The language is... sometimes painful, but it works.
         | 
         | The provider support is unmatched and the modules are of
         | reasonable quality.
        
       | breckenedge wrote:
       | > There are no great FaaS options for running GPU workloads,
       | which is why we could never go fully FaaS.
       | 
       | I keep wondering when this is going to show up. We have a lot of
       | service providers, but even more frameworks, and every vendor
       | seems to have their own bespoke API.
        
         | z3ugma wrote:
         | Check out beam.cloud. They're impressing me with calling GPU
         | runtimes as a FaaS
        
         | gfodor wrote:
         | I just started playing with modal.com and so far it seems good.
         | I haven't run anything in production yet, so YMMV.
        
         | gen220 wrote:
         | I don't think anybody should go "fully FaaS", it's like saying
         | screwdrivers are useless, all you need is a hammer.
         | 
         | That being said, Cloudflare is on the path to offering a great
         | GPU FaaS system for inference.
         | 
         | I believe it's still in beta, but it's the most promising
         | option at the moment.
        
           | breckenedge wrote:
           | Right, I still find it faster to manually provision a
           | specific instance type, install PyTorch on it, and deploy a
           | little flask app for an inference server.
        
       | hermanradtke wrote:
       | Without some sort of background on cost or scale it is hard to
       | judge any of these decisions.
        
       | cratermoon wrote:
       | Even if others disagree with your endorsements or regrets, this
       | record shows you're actually aware of the important decisions you
       | made over the past four years and tracked outcomes. Did you
       | record the decisions when you made them and revisit later?
        
       | guhcampos wrote:
       | Well it's a bit unfortunate this post was published in Feb 1st,
       | it got really outdated really fast around the "choose flux for
       | gitops" part.
        
         | CoolCold wrote:
         | Mind sharing bit more of the details?
        
           | medina wrote:
           | > engineers at Weaveworks built the first version of Flux >
           | Weaveworks donated Flux and Flagger to the CNCF
           | 
           | https://fluxcd.io/blog/2022/11/flux-is-a-cncf-graduated-
           | proj...
           | 
           | > Weaveworks will be closing its doors and shutting down
           | commercial operations > Alexis Richardson, 5 Feb 2024
           | 
           | https://www.linkedin.com/posts/richardsonalexis_hi-
           | everyone-...
           | 
           | If the project has legs, it's now under CNCF.
        
         | plagiarist wrote:
         | What's the news there? I was just about to try it out this
         | weekend.
        
         | alexjurkiewicz wrote:
         | Context
         | https://www.silverliningsinfo.com/automation/weaveworks-unra...
        
         | zeeZ wrote:
         | So far it seems fine, and the maintainers seem to be doing OK
         | too.
         | 
         | Is the project future at risk?
         | https://github.com/fluxcd/flux2/discussions/4544
        
       | sroussey wrote:
       | If you are startup that can can't afford a DBA, then why why why
       | are you using Kubernetes?
        
         | jrockway wrote:
         | Why wouldn't you use Kubernetes? There are basically 3 classes
         | of deployments:
         | 
         | 1) We don't have any software, so we don't have a prod
         | environment.
         | 
         | 2) We have 1 team that makes 1 thing, so we just launch it out
         | of systemd.
         | 
         | 3) We have between 2 and 1000 teams that make things and want
         | to self-manage when stuff gets rolled out.
         | 
         | Kubernetes is case 3. Like it or not, teams that don't
         | coordinate with each other is how startups scale, just like big
         | companies. You will never find a director of engineering that
         | says "nah, let's just have one giant team and one giant
         | codebase".
        
           | otterley wrote:
           | On AWS, at least, there are alternatives such as ECS and even
           | plain old EC2 auto scaling groups. Teams can have the
           | autonomy to run their infrastructure however they like
           | (subject to whatever corporate policy and compliance regime
           | requirements they might have to adhere to).
           | 
           | Kubernetes is appealing to many, but it is not 100%
           | frictionless. There are upgrades to manage, control plane
           | limits, leaky abstractions, different APIs from your cloud
           | provider, different RBAC, and other things you might prefer
           | to avoid. It's its own little world on top of whatever world
           | you happen to be running your foundational infrastructure on.
           | 
           | Or, as someone has artistically expressed it:
           | https://blog.palark.com/wp-
           | content/uploads/2022/05/kubernete...
        
             | ezrast wrote:
             | The alternatives aren't frictionless either; many items
             | from that image are not specific to Kubernetes. I
             | personally find AWS API's frustrating to use, so even if I
             | were running a one-person shop (and was bound to AWS for
             | some reason - maybe a warlock has cursed me?) I'd lean
             | towards managing things from EKS to get an interface that
             | fits my brain better. It's just preference, though - EC2
             | auto-scaling is perfectly viable if that's your jam.
        
             | jrockway wrote:
             | The iceberg is fine, but using ECS doesn't absolve you from
             | needing to care about monitoring, affinity, audit logging,
             | OS upgrades, authentication/IAM, etc. That's generally why
             | organizations choose to have infrastructure teams, or to
             | not have infrastructure at all.
             | 
             | I have seen people rewrite Kubernetes in CloudFormation.
             | You can do it! But it certainly isn't problem-free.
        
               | otterley wrote:
               | ECS Fargate does manage the security of the node up to
               | and including the container runtime. Patches are often
               | applied behind the scenes, without many folks even
               | knowing, and for those that require interruption, a
               | restart of the task will land it on a patched node.
               | 
               | You're right that if you use a cloud provider, IAM is
               | something that has to be reckoned with. But the question
               | is, how many implementations of IAM and policy mechanisms
               | do I want to deal with?
        
             | klooney wrote:
             | K8S has a credible local development and testing story, ECS
             | and ASGs do not. The fact that there's a generic interface
             | for load-balancer like things, and then you can have a
             | different implementation on your laptop, in the datacenter,
             | and in AWS, and everything ports, is huge.
             | 
             | Also, you can bundle your load balancer config and
             | application config together. No written description of the
             | load balancer config + an RPM file to a disinterested
             | different team.
        
           | kccqzy wrote:
           | One giant codebase is fine. Monorepo is better than lots of
           | scattered repos linked together with git hashes. And it
           | doesn't really get in the way of each team managing when
           | stuff gets rolled out.
        
             | jrockway wrote:
             | I'm a big monorepo fan, but you run into that ownership
             | problem. "It's slow to clone"; which team fixes that?
        
               | Yasuraka wrote:
               | some bored guy at $trillion_dollar_company
               | 
               | https://github.com/martinvonz/jj
               | https://github.com/facebook/sapling
        
           | vander_elst wrote:
           | Google has one _giant_ codebase. I am pretty sure the aren 't
           | the only ones.
        
         | ezrast wrote:
         | Because it works, the infra folks you hired already know how to
         | use it, the API is slightly less awful than working with AWS
         | directly, and your manifests are kinda sorta portable in case
         | you need to switch hosting providers for some reason.
        
         | tomas789 wrote:
         | This is my case. I'm one man show ATM so no DBA. I'm still
         | using Kubernetes. Many things can be automated as simply as
         | helm apply. Plus you get the benefit of not having a hot mess
         | of systemd services, ad hoc tools which you don't remember how
         | you configured, plethora bash scripts to do common tasks and so
         | on.
         | 
         | I see Kubernetes as one time (mental and time) investment that
         | buys me somehow smoother sailing plus some other benefits.
         | 
         | Of course it is not all rainbows and unicorns. Having a single
         | nginx server for a single /static directory would be my dream
         | instead of MinIO and such.
        
           | sroussey wrote:
           | I don't push to implement Kubernetes until I had 100
           | engineers and a reason to use it.
        
         | maccard wrote:
         | Because I can go from main.go to a load balanced, autoscaling
         | app with rolling deploys, segeregated environments, logging &
         | monitoring in about 30 minutes, and never need to touch _any_
         | of that again. Plus, if I leave, the guy who comes after me can
         | look at a helm chart, terraform module + pipeline.yml and
         | figure out how it works. Meanwhile, our janq shell script based
         | task scheduler craps out on something new every month. What
         | started as 15 lines of "docker run X, sleep 30 docker kill x"
         | is now a polyglot monster to handle all sorts of edge cases.
         | 
         | I have spent vanishingly close to 0 hours on maintaining our
         | (managed) kubernetes clusters in work over the past 3 years,
         | and if I didn't show up tomorrow my replacement would be fine.
        
           | sroussey wrote:
           | I spent zero hours on a MySQL server on bare hardware for
           | seven years.
           | 
           | Admittedly, I was afraid of ever restarting as I wasn't sure
           | it would reboot. But still...
        
             | viraptor wrote:
             | You better invest some time in migrating away from your 5.7
             | (or earlier) in that case, because it's EOL already ;)
        
             | maccard wrote:
             | You still need to get mysql installed and configured
             | though. On AWS, it's 30 lines of terraform for RDS on an
             | internal subnet with a security group only allowing access
             | from your cluster.
             | 
             | For that, you get automated backups, very simple read
             | proxies, managed updates of you ever need them. You can
             | vertically scale down, or uo to the point of "it's cheaper
             | to hire a DBA to fix this".
        
           | yellow_lead wrote:
           | If you can do all that in 30 minutes (or even a few hours), I
           | would love to read an article/post about your setup, or any
           | resources you might recommend.
        
             | maccard wrote:
             | I've just done it a dozen times at this point. Hello world
             | from gin-gonic [0], terraform file with a DO K8s cluster
             | [1] and load balancer, and CI/CD [2] on deploy. There's
             | even time to make a cuppa when you run terraform.
             | 
             | We use this for our internal services at work, and the last
             | time I touched the infra was in 2022 according to git
             | 
             | [0] https://github.com/gin-gonic/gin
             | 
             | [1] https://gist.github.com/donalmacc/0efbb0b377533232da3f7
             | 76c60....
             | 
             | [2] https://docs.digitalocean.com/products/kubernetes/how-
             | to/dep...
        
               | yellow_lead wrote:
               | Thanks! Does DO K8s come with sufficient monitoring /
               | logging or do you add anything?
        
               | yolo3000 wrote:
               | You can just deploy other applications to Kubernetes, for
               | example you can deploy this operator https://prometheus-
               | operator.dev/ and you get Prometheus and Grafana running
               | with a bunch of dashboards already created. Then you
               | annotate your pods to tell Prometheus what to scrape, and
               | you got monitoring. It also comes with AlertManager for
               | alerting. Same for logging, you deploy Elasticsearch and
               | Kibana and you're good to go.
        
               | maccard wrote:
               | As the other commentor said, you can deploy
               | Prometheus/grafana into the k8s cluster and it pretty
               | much does what you want it to do.
        
           | flemhans wrote:
           | You'll need to touch it again. These paid services tend to
           | change all the time.
           | 
           | You also need to pay them which is an event.
        
         | kwillets wrote:
         | To make up for having a better schema in Terraform than in the
         | database.
        
         | paulgb wrote:
         | I think a lot of startups have a set of requirements that is
         | something like:
         | 
         | - I want to spin up multiple redundant instances of some set of
         | services
         | 
         | - I want to load balance over those services
         | 
         | - I want some form of rolling deploy so that I don't have
         | downtime when I deploy
         | 
         | - I want some form of declarative infrastructure, not click-ops
         | 
         | Given these requirements, I can't think of an alternative to
         | managed k8s that isn't more complex.
        
           | sroussey wrote:
           | A startup with no DBA does not need redundant anything. Too
           | small.
        
             | mardifoufs wrote:
             | Uh? Even some larger startups don't have DBAs anymore. For
             | better or for worse. Hell even the place I currently work
             | in, which is not a startup at all has basically no DBA role
             | to speak of.
        
             | slyall wrote:
             | Places get pretty big with no dedicated DBA resources these
             | days. Last place I was at was a Fintech SaaS with 50
             | engineers and half a million paying customers.
             | 
             | Running off a couple of medium ( $3k/month each range ) RDS
             | databases with failover setup. ECS for apps.
             | 
             | Databases looked after themselves. The senior people
             | probably spent 20% of a FTE on stuff like optimizing it
             | when load crept up.
             | 
             | Place before that was a similar size and no DBA either.
             | People just muddled though.
        
             | paulgb wrote:
             | This is a sweeping generalization to make, and I think you
             | underestimate how easy it is to achieve redundancy with
             | modern tools these days.
             | 
             | My company uses redundant services because we like to
             | deploy frequently, and our customers notice if our API
             | breaks while the service is restarted. Running the service
             | redundantly allows us to do rolling deploys while
             | continuing to serve our API. It's also saved us from
             | downtime when a service encounters a weird code path and
             | crashes.
        
           | fulafel wrote:
           | AWS Copilot (if you're on AWS). It's a bit like the older
           | Elastic Beanstalk for EC2.
        
         | klooney wrote:
         | Helm is the only infrastructure package manager I've ever used
         | where you could reliably get random third party things running
         | without a ton of hassle. It's a huge advantage.
        
         | lysecret wrote:
         | Because they are on AWS and can't use Cloud Run.
        
       | endisneigh wrote:
       | Great post. I do wonder - what are the simplest K8s alternatives?
       | 
       | Many say in the database world, "use Postgres", or "use sqlite."
       | Similarly there are those databases that are robust that no one
       | has heard of, but are very limited like FoundationDB. Or things
       | that are specialized and generally respected like Clickhouse.
       | 
       | What are the equivalents of above for Kubernetes?
        
         | tomas789 wrote:
         | You can always use old boring AWS EC2 and such. And sprinkle in
         | some Terraform if you feel fancy. That would be my "use sqlite"
         | 
         | Kubernetes is probably "use postgres"
        
         | marcosdumay wrote:
         | Kubernetes aren't like that.
         | 
         | It's just that, you should start with a handful of backed-up
         | pet servers. Then manually automate their deployment when you
         | need it. And only then go for a tool that abstracts the
         | automated deployment when you need it.
         | 
         | But I fear the simplest option on the Kubernetes area is
         | Kubernetes.
        
           | doctor_eval wrote:
           | I don't know that this is good advice.
           | 
           | I shunned k8s for a long time because of the complexity, but
           | the managed options are so much easier to use and deploy than
           | pet servers that I can't justify it any more. For anything
           | other than truly trivial cases, IMO kubernetes or (or
           | similar, like nomad) is easier than any alternative.
           | 
           | The stack I use is hosted Postgres and VKS from Vultr. It's
           | been rock solid for me, and the entire infrastructure can be
           | stored in code.
        
           | lucw wrote:
           | This is good advice, if you haven't experienced the pain of
           | doing it yourself, you won't know what the framework does for
           | you. There are limits to this reasoning of course, we don't
           | reimplement everything on the stack just for the learning
           | experience. But starting with just docker might be a good
           | idea.
        
         | busterarm wrote:
         | The simplest k8s alternative (that is an actual alternative) is
         | Nomad.
        
         | Too wrote:
         | It's mainly running your own control plane that is complex.
         | Managed k8s (EKS, AKS, GKE) is not difficult at all. Don't
         | listen to all the haters. It's the same crowd who think they
         | can replace systemd with self hacked init scripts written in
         | bash, because they don't trust abstractions and need to see
         | everything the computer does step-by-step.
         | 
         | I also stayed away for a long time due to all the fear spread
         | here, after taking the leap, I'm not looking back.
         | 
         | The lightweight "simpler" alternative is docker-compose. I put
         | simpler in quotes because once you factor in all the auxiliary
         | software needed to operate the compose files in a professional
         | way (IaC, Ansible, monitoring, auth, VM provisioning, ...), you
         | will accumulate the same complexity yourself, only difference
         | is you are doing it with tools that may be more familiar to
         | what you are used to. Kubernetes gives you a single point of
         | control plane for all this. Does it come with a learning curve?
         | Yes, but once you get over it there is nothing inherent about
         | it that makes it unnecessary complex. You don't _need_
         | autoscaler, replicasets and those more advanced features just
         | because you are on k8s.
         | 
         | If you want to go even simpler, the clouds have offerings to
         | just run a container, serverless, no fuzz around. I have to
         | warn everyone though that using ACI on Azure was the biggest
         | mistake of my career. Conceptually it sounds like a good idea
         | but Azures execution of it is just a joke. Updating a very
         | small container image taking upwards of 20-30 minutes, no logs
         | on startup crashes, randomly stops serving traffic, bad
         | integration with storage.
        
       | rmccue wrote:
       | > Not using Terraform Cloud
       | 
       | We adopted TFC at the start of 2023 and it was problematic right
       | from the start; stability issues, unforeseen limitations, and
       | general jankiness. I have no regrets about moving us away from
       | local execution, but Terraform Cloud was a terrible provider.
       | 
       | When they announced their pricing changes, the bill for our team
       | of 5 engineers would have been roughly 20x, and more than hiring
       | an engineer to literally sit there all day just running it
       | manually. No idea what they're thinking, apart from hoping their
       | move away from open source would lock people in?
       | 
       | We ended up moving to Scalr, and although it hasn't been a long
       | time, I can't speak highly enough of them so far. Support was
       | amazing throughout our evaluation and migration, and where we've
       | hit limits or blockers, they've worked with us to clear them very
       | quickly.
        
       | LispSporks22 wrote:
       | Can any of your engineers run the product locally and iterate
       | fast?
        
         | cissmayazz wrote:
         | Yeah typically run a single go service or use devspace to
         | combine multiple services using published containers
        
       | hintymad wrote:
       | > EKS
       | 
       | My contrarian view is that EC2 + ASG is so pleasant to use. It's
       | just conceptually simple: I launch an image into an ASG, and
       | configure my autoscale policies. There are very few things to
       | worry about. On the other hand, using k8s has always been a big
       | deal. We built a whole team to manage k8s. We introduce dozens of
       | concepts of k8s or spend person-years on "platform engineering"
       | to hide k8s concepts. We publish guidelines and sdks and all
       | kinds of validators so people can use k8s "properly". And we
       | still write 10s of thousands lines of YAML plus 10s of thousands
       | of code to implement an operator. Sometimes I wonder if k8s is
       | too intrusive.
        
         | xyzzy_plugh wrote:
         | I tend to agree that for most things on AWS, EC2 + ASG is
         | superior. It's very polished. EKS is very bare bones. I would
         | probably go so far as to just run Kubernetes on EC2 if I had to
         | go that route.
         | 
         | But in general k8s provides incredibly solid abstractions for
         | building portable, rigorously available services. Nothing quite
         | compares. It's felt very stable over the past few years.
         | 
         | Sure, EC2 is incredibly stable, but I don't always do business
         | on Amazon.
        
           | Noumenon72 wrote:
           | At first I thought your "in general" statement was
           | contradicting your preference for EC2 + ASG. I guess AWS is
           | such a large part of my world that "in general" includes AWS
           | instead of meaning everything but AWS.
        
         | cedws wrote:
         | K8S is a disastrous complexity bomb. You need millions upon
         | millions of lines of code just to build a usable platform.
         | Securing Kubernetes is a nightmare. And lock-in never really
         | went away because it's all coupled with cloud specific stuff
         | anyway.
         | 
         | Many of the core concepts of Kubernetes should be taken to
         | build a new alternative without all the footguns. Security
         | should be baked in, not an afterthought when you need
         | ISO/PCI/whatever.
        
           | xyzzy_plugh wrote:
           | This isn't my experience at all. Maybe three or four years
           | ago?
           | 
           | Who exactly needs millions of lines of code?
        
             | Spivak wrote:
             | I think they're more getting a k8s requiring a whole mess
             | of 3rd party code to actually be useful when bringing it to
             | prod. For EKS you end up having coredns, fluentbit, secrets
             | store, external dns, aws ebs csi controller, aws k8s cni,
             | etc.
             | 
             | And in the end it's hard to say if you've actually gained
             | anything except now this different code manages your AWS
             | resources like you were doing with CF or terraform.
        
               | mschuster91 wrote:
               | We have all of that neatly extracted into a Terraform
               | module. Write it once and now EKS clusters are
               | essentially disposable.
        
               | Solvency wrote:
               | You just added yet another Thing in that huge pile of
               | things representing millions of lines of code. That's the
               | point.
        
               | dvfjsdhgfv wrote:
               | Everything we run our workloads on is based on millions
               | of LoCs, whether it's in the OS, in K8S, in is built-in
               | or external kinds. If you decide to run K8S in AWS,
               | you'll be better of using Karpenter, external-secrets and
               | all these things as they will make your life easier in
               | various ways.
        
               | lijok wrote:
               | Why is that inherently a problem?
               | 
               | How many LOCs in the linux kernel again?
        
           | woleium wrote:
           | kinda like openshift?
        
           | mardifoufs wrote:
           | Millions upon millions of lines of code?! What? Can you
           | specify what you were trying to do with it?
        
             | cedws wrote:
             | Argo CD, Argo Rollouts, Vault, External Secrets, Cert
             | Manager, Envoy, Velero, plus countless operators, plus a
             | service mesh if you need it, the list goes on. If you're
             | providing Kubernetes as a platform at any sort of scale
             | you're going to need most of this stuff or some
             | alternatives. This sums up to at least multiple million
             | LOC. Then you have Kubernetes itself, containerd, etcd...
        
               | arccy wrote:
               | that's not much different from using the cloud PaaS
               | offerings besides who runs that million lines and who
               | gets the freedom/control for customization.
        
           | foofie wrote:
           | > K8S is a disastrous complexity bomb. You need millions upon
           | millions of lines of code just to build a usable platform.
           | 
           | I don't know what you have been doing with Kubernetes, but I
           | run a few web apps out of my own Kubernetes cluster and the
           | full extent of my lines of code are the two dozen or so LoC
           | kustomize scripts I use to run each app.
        
             | WildGreenLeave wrote:
             | I run my own cluster too, it is managed by one terraform
             | file which is maintained on GitHub [0]. Along with that I
             | deploy everything on here with 1 shell script and a bunch
             | of yaml manifests for my services. It's perfect for
             | projects that are managed by one person (me). Everything is
             | in git and reproducable. The only thing I am doing
             | unconventional is that I didn't want to use github actions,
             | so I use Kaniko to build my Docker containers inside my
             | cluster.
             | 
             | 0 https://github.com/kube-hetzner/terraform-hcloud-kube-
             | hetzne...
        
             | cedws wrote:
             | If you're using a K8S cluster just to deploy a few web apps
             | then it's not really a platform that you could provide to
             | an engineering team within a medium-large company. You
             | could probably run your stuff on ECS.
        
               | foofie wrote:
               | > If you're using a K8S cluster just to deploy a few web
               | apps (...)
               | 
               | It's really not about what I do and do not do with
               | Kubernetes. It's on you to justify your "millions upon
               | millions lines of code" claim because it is so outlandish
               | and detached from reality that it says more about your
               | work than about Kubernetes.
               | 
               | I repeat: I only need a few dozen lines of kustomize
               | scripts to release whole web apps. Simple code. Easy
               | peasy. What mess are you doing to require "millions upon
               | millions" lines of code?
        
               | cedws wrote:
               | You are missing the point. I recommend you look into
               | Platform Engineering and what it involves.
        
               | avbanks wrote:
               | While I love ECS you're not giving k8s enough credit.
               | Nearly every COTS (common off the self) app has a helm
               | chart, hardly any provide direct ECS support. If I want a
               | simple kafka cluster or zookeeper cluster there's a
               | supported helm chart for that, nothing is provided for
               | ECS, you have to make that yourself.
        
             | gtirloni wrote:
             | You're both using hyperboles that don't match the reality
             | of the average-sized company using Kubernetes. It's neither
             | "millions upon millions of lines of code" nor "just a few
             | dozen lines of kustomize scripts".
        
           | mise_en_place wrote:
           | kubeadm + fabric + helm got me 99% of the way there. My
           | direct report, a junior engineer, wrote the entire helm chart
           | from our docker-compose. It will not entirely replace our
           | remote environment but it is nice to have something in
           | between our SDK and remote deployed infra. Not sure what you
           | meant by security; could you elaborate? I just needed to
           | expose one port to the public internet.
        
         | foofie wrote:
         | > My contrarian view is that EC2 + ASG is so pleasant to use.
         | 
         | Sometimes I think that managed kubernetes services like EKS are
         | the epitome of "give the customers what they want", even when
         | it makes absolutely no sense at all.
         | 
         | Kubernetes is about stitching together COTS hardware to turn it
         | into a cluster where you can deploy applications. If you do not
         | need to stitch together COTS hardware, you have already far
         | better tools available to get your app running. You don't need
         | to know or care in which node your app is suppose to run and
         | not run, what's your ingress control, if you need to evict
         | nodes, etc. You have container images, you want to run
         | containers out of them, you want them to scale a certain way,
         | etc.
        
         | mr_moose wrote:
         | To me, it sounds like your company went through a complex re-
         | architecturing exercise at the same time you moved to
         | Kubernetes, and your problems have more to do with your
         | (probably flawed) migration strategy than the tool.
         | 
         | Lifting and shifting an "EC2 + ASG" set-up to Kubernetes is a
         | straightforward process unless your app is doing something very
         | non-standard. It maps to a Deployment in most cases.
         | 
         | The fact that you even implemented an operator (a very advanced
         | use-case in Kubernetes) strongly suggests to me that you're
         | doing way more than just lifting and shifting your existing
         | set-up. Is it a surprise then that you're seeing so much more
         | complexity?
        
           | krab wrote:
           | Not familiar with the OP but this may have been the pitch for
           | migration: "K8S will allow us better automation".
        
       | LispSporks22 wrote:
       | I feel like this is overkill for a startup.
       | 
       | Why not dump your application server and dependencies into rented
       | data center (or EC2 if you must) and setup a coarse DR? Maybe
       | start with a monolith in PHP or Rails.
       | 
       | None of that word salad sounds like startup to me, but then again
       | everyone loves to refer to themselves as a startup (must be a
       | recruiting tool?), so perhaps muh dude is spot on.
        
         | icameron wrote:
         | I would like to know what you're being downvoted for. It's not
         | bad advice, necessarily... this was the way 20 years ago. I
         | mean isn't hacker news running kind of like this as a monolith
         | on a single server? People might be surprised how far you can
         | get with a simple setup.
        
         | charred_patina wrote:
         | I don't want to be negative, but this post reads like a list of
         | things that I want to avoid in my career. I did a brief stint
         | in cloud stuff at a FAANG and I don't care to go back to it.
         | 
         | Right now I'm engineer No. 1 at a current startup just doing
         | DDD with a Django monolith. I'm still pretty Jr. and I'm
         | wondering if there's a way to scale without needing to get into
         | all of the things the author of this article mentions. Is it
         | possible to get to a $100M valuation without needing all of
         | this extra stuff? I realize it varies from business to
         | business, but if anyone has examples of successes where people
         | just used simple architecture's I'd appreciate it.
        
           | krmboya wrote:
           | I bet you can get pretty far with just ec2 and autoscaling,
           | or comparable tech in other cloud platforms. With a managed
           | database service.
        
             | charred_patina wrote:
             | That I'd be comfortable with.
        
           | kevinqi wrote:
           | I work at a startup and most of the stuff in the article
           | covers things we use and solve real world problems.
           | 
           | If you're looking for successful businesses, indie hackers
           | like levelsio show you how far you can get with very simple
           | architectures. But that's solo dev work - once you have a
           | team and are dealing with larger-scale data, things like
           | infrastructure as code, orchestration, and observability
           | become important. Kubernetes may or may not be essential
           | depending on what you're building; it seems good for AI
           | companies, though.
        
             | charred_patina wrote:
             | How many people if I may ask? And how many TPS for your
             | services? I am hoping I can get away with a simple monolith
             | for a very long time.
        
               | kevinqi wrote:
               | 30-40 people; not much TPS but we're not primarily
               | building a web app; we have event-driven data pipelines
               | and microservices for ML data.
               | 
               | If you're primarily building a web app, a monolith is
               | fine for quite a while, I think. But a lot of the stuff
               | in the post is still relevant even for monoliths - RDS,
               | Redis, ECR, terraform, pagerduty,
               | monitoring/observability.
        
           | AznHisoka wrote:
           | I bet Craigslist runs on much simpler infrastructure. Not
           | sure how much they're worth though
        
             | mixmastamyk wrote:
             | Stackoverflow famously grew huge for a long time on a
             | single Windows box. I don't recommend that but yeah KISS
             | rule definitely. Floss version: supabase, open telemetry,
             | etc.
        
           | singron wrote:
           | You don't need this many tools, especially really early. It
           | also depends on the particulars of your business. E.g. if you
           | are B2B SaaS, then you need a ton of stuff automatically to
           | get SOC2 and generally appease the security requirements of
           | your customers.
           | 
           | That said, anything that's set-and-forget is great to start
           | with. Anything that requires it's own care and feeding can
           | wait unless it's really critical. I think we have a project
           | each quarter to optimize our datadog costs and renegotiate
           | our contract.
           | 
           | Also if you make microservices, you are going to need a ton
           | of tools.
        
             | segfaltnh wrote:
             | Also don't make microservices if you don't have teams that
             | will independently own them.
        
           | extr wrote:
           | You can scale to any valuation with any architecture. Whether
           | or not you need sophisticated scaling solutions depends on
           | the characteristics of your product, mostly how pure of a
           | software play it is. Pure software means you will run into
           | scaling challenges quicker, since likely part of your value
           | add is in fact managing the complexity of scaling.
           | 
           | If you are running a marketplace app and collect fees you're
           | going to be able go much further on simpler architectures
           | than if you're trying to generate 10,000 AI images per
           | second.
        
           | movpasd wrote:
           | I'm currently early in my career and "the software guy" in a
           | non-software team and role, but I'm looking to move into a
           | more engineering direction. You've pretty much got my dream
           | next job at the moment -- if you don't mind me asking, how
           | did you manage to find your role, especially being "still
           | pretty Jr."?
        
             | charred_patina wrote:
             | What a coincidence! I've got my dream job too!
             | 
             | The things I did to get here are honestly kind of stupid. I
             | started out at a defense contractor after graduating and
             | left in the first six months because all the software devs
             | were jumping ship. Went to a small business defense
             | contractor (yep that's a thing) and learned to build web
             | apps with React and Django. Then the pace of business
             | slowed so after about 18 months I got on the Leetcode grind
             | and got into a FAANG. Realized I hated it, so I quit after
             | about 9 months with no job lined up.
             | 
             | While unemployed I convinced myself I was going to get a
             | job in robotics (I actually got pretty close, I had 3 final
             | level interviews with robotics companies), but the job
             | market went to shit pretty much the exact day I quit my job
             | lol. I spent about 6 months just learning ROS, Inverse
             | Kinematics, math for robotics, gradient descent and
             | optimization, localization, path planning, mapping etc. I
             | taught at a game development summer camp for a month and a
             | half, that was awesome. Working with kids is always a
             | blast. Also learned Rust and built a prototype for a
             | multiplayer browser-based coding game I had been thinking
             | about for a while. It was an excuse to make a full stack
             | application with some fun infrastructure stuff.
             | 
             | https://ai-arena.com/#/multiplayer
             | 
             | The backend is no longer running, but originally users
             | could see their territory on the galaxy grow as their code
             | won battles for them.
             | 
             | For the current role, I really just got lucky. The previous
             | engineer was on his way out for non-job related reasons. He
             | had read a lot of the books I had (Code Complete, Domain
             | Driven Design) and I think we just connected over shared
             | interests and intellectual curiosity.
             | 
             | I think that in the modern day, so many people are really
             | just in this space for the paycheck-- and that's okay!
             | Everyone needs to make a living. But I think that if you
             | have that intellectual curiosity and like making stuff,
             | people will see that and get excited. It ends up being a
             | blessing and a curse.
             | 
             | I have failed interviews because of honesty "I would Google
             | the names of books and read up on that subject" or "I think
             | if I was doing CSS then I would be in the wrong role" (I
             | realize how douchey that sounds but I just was not meant to
             | design things, I have tried). But I have also gone further
             | in interviews than I should have because I was really
             | engrossed in a particular problem like path planning or
             | inverse kinematics and I was able to talk about things in
             | plain terms.
             | 
             | I think it's easier to learn things quickly if they are
             | something you're actually interested in, it becomes
             | effortless. Basically I just try to do that so I can learn
             | optimally, then I try to get lucky.
             | 
             | EDIT: Oh I just thought of more good advice. Find senior
             | devs to learn from. They can be kind of grumpy in their
             | online presence, but they help you avoid so many tar pits.
             | I am in a Discord channel with a handful of senior
             | engineers. The best way to get feedback is to naively say
             | "I'm going to do X", they will immediately let you know why
             | X is a bad idea. A lot of their advice boils down to KISS
             | and use languages with strong typing.
        
               | daxfohl wrote:
               | I did this myself for a good 15 years or so, but
               | eventually with a family, money became a bit more of a
               | priority, and it's hard to get a good job if all you've
               | worked at is small shops. Any next role in a larger tech
               | company will likely be a downgrade until you can prove
               | yourself out, which of course you may not be able to
               | because things are so different, and motivation will run
               | low because you're being tasked with all the stuff that
               | caused you to leave big tech in the first place. It can
               | be quite miserable to be grouped with a bunch of kids
               | with 3-5 YOE that have no idea how to build something
               | from scratch, and they're outperforming you because they
               | know the system.
               | 
               | In my case it took a good five years and a couple job
               | hops to rebalance. But eventually you get back to a
               | reasonable tech leadership role and back to making some
               | of the bigger decisions to help make the junior devs'
               | lives less miserable.
               | 
               | No regrets, but the five years it takes to rebalance can
               | be pretty hard.
        
           | Arbortheus wrote:
           | Currently working at a $100M valuation tech company that
           | fundamentally is built on a Django monolith with some other
           | fluffy stuff lying around it. You can go far with a Django
           | monolith and some load balancing.
        
           | daxfohl wrote:
           | Don't need any of it. Start simple. Some may be useful
           | though. The list makes good points. Keep it around and if you
           | find yourself suffering from the lack of something, look
           | through the list and see if anything there would be good ROI.
           | But don't adopt something just because this list says you
           | should.
           | 
           | One thing though, I'd start with go. It's no more complex
           | than python, more efficient, and most importantly IMO since
           | it compiles down to binary it's easier to build, deploy,
           | share, etc. And there's less divergence in the ecosystem;
           | generally one simple way to do things like building and
           | packaging, etc. I've not had to deal with versions or tooling
           | or environmental stuff nearly as much since switching.
        
         | krmboya wrote:
         | Key term here: 'cloud native'. Which is supposedly the future
        
       | jasoneckert wrote:
       | After reading through this entire post, I'm pleasantly surprised
       | that there isn't one item where I don't mirror the same
       | endorse/regret as the author. I'm not sure if this is coincidence
       | or popular opinion.
        
       | ndjshe3838 wrote:
       | I'm imagining a developer in the 90s/00s reading this list and
       | being baffled by the complexity/terminology
        
         | SoftTalker wrote:
         | I am in 2024.
        
         | kypro wrote:
         | I thought the same reading it - is it really this hard to build
         | an app these days?
         | 
         | Things were more far more manual and much less secure, scalable
         | and reliable in the past, but they were also far far simpler.
        
           | xcrunner529 wrote:
           | Agreed. It's just ridiculous. Some just love to spend money
           | and make things more complex.
        
         | LispSporks22 wrote:
         | I agree. I'm afraid I'm one of those 00s developers and can
         | relate. Back then many startups were being launched on super
         | simple stacks.
         | 
         | With all of that complexity/word salad from TFA, where's the
         | value delivered? Presumably there's a product somewhere under
         | all that infrastructure, but damn, what's left to spend on it
         | after all the infrastructure variable costs?
         | 
         | I get it's a list of preferences, but still once you've got
         | your selection that's still a ton of crap to pay for and deal
         | with.
         | 
         | Do we ever seek simplicity in software engineering products?
        
           | bigstrat2003 wrote:
           | I think that far too many companies get sold on the vision of
           | "it just works, you don't need to hire ops people to run the
           | tools you need for your business". And that is true! And
           | while you're starting, it may be that you can't afford to
           | hire an ops guy and can't take the time to do it yourself.
           | But it doesn't take _that_ much scale before you get to the
           | point it would be cheaper to just manage your own tools.
           | 
           | Cloud and SaaS tools are very seductive, but I think they're
           | ultimately a trap. Keep your tools simple and just run them
           | yourselves, it's _not_ that hard.
        
           | TeMPOraL wrote:
           | > _Do we ever seek simplicity in software engineering
           | products?_
           | 
           | Doubtfully. Simplicity of work breakdown structure - maybe.
           | Legibility for management layers, possibly. Structural
           | integrity of your CYA armor? 100%.
           | 
           | The half-life of a software project is what now, a few years
           | at most these days? Months, in webdev? Why build something
           | that is robust, durable, efficient, make all the correct
           | engineering choices, where you can instead race ahead with a
           | series of "nobody ever got fired for using ${current hot
           | cloud thing}" choices, not worrying at all about rapidly
           | expanding pile of tech and organizational debt? If you push
           | the repayment time far back enough, your project will likely
           | be dead by then anyway (win), or acquired by a greater fool
           | (BIG WIN) - either way, you're not cleaning up anything.
           | 
           | Nobody wants to stay attached to a project these days anyway.
           | 
           | /s
           | 
           | Maybe.
        
             | dogcomplex wrote:
             | Don't worry, AI will wash all that away. Nothing says
             | simplicity like an incomprehensible black box!
        
           | izacus wrote:
           | Look, the thing is - most of infra decisions are made by
           | devops/devs that have a vested interest in this.
           | 
           | Either because they only know how to manage AWS instances (it
           | was the hotness and thats what all the blogs and YT videos
           | were about) and are now terrified from losing their jobs if
           | the companies switch stacks. Or because they needed to put
           | the new thing on their CV so they remain employable. Also
           | maybe because they had to get that promotion and bonus for
           | doing hard things and migrating things. Or because they were
           | pressured into by bean counters which were pressured by the
           | geniuses of Wall Street to move capex to opex.
           | 
           | In any case, this isn't by necessity these days. This is
           | because, for a massive amount of engineers, that's the only
           | way they know how to do things and after the gold rush of
           | high pay, there's not many engineers around that are in it to
           | learn or do things better. It's for the paycheck.
           | 
           | It is what it is. The actual reality of engineering the
           | products well doesn't come close to the work being done by
           | the people carrying that fancy superstar engineer title.
        
           | habinero wrote:
           | That's for slower projects.
           | 
           | You know the old adage "fast, cheap, good: pick two"? With
           | startups, you're forced to pick fast. You're still probably
           | not gonna make it, but if you don't build fast, you
           | definitely won't.
        
             | geraldhh wrote:
             | "That's what they want you to think"
        
         | DannyBee wrote:
         | Yeah, I read the " My general infrastructure advice is "less is
         | better".", and was like "when did this list of stuff become the
         | definition of 'less'"
        
           | segfaltnh wrote:
           | My reaction exactly. I don't know their footprint but this is
           | a long list of stuff.
        
         | occams_chainsaw wrote:
         | There's _a lot_ in the article that existed in the 00s. Now
         | imagine a programmer from the 70s...
        
           | smallnix wrote:
           | I think engineers in the 20s who were putting out quality
           | enigmas would be stunned by all the marketing lingo.
        
         | davedx wrote:
         | I've used most of these technologies and the sum value add over
         | a way simpler monolith on a single server setup is negligible.
         | It's pure insanity
        
           | _kb wrote:
           | It's a hedge.
           | 
           | There's an easy bent towards designing everything for scale.
           | It's optimistic. It's feels good. It's safe, defendable, and
           | sound to argue that this complexity, cost, and deep
           | dependency is warranted when your product is surely on the
           | verge of changing the course of humanity.
           | 
           | The reality is your SaaS platform for ethically sourced,
           | vegan dog food is below inconsequential and the few users
           | that you do have (and may positively affect) absolutely do
           | not not need this tower of abstraction to run.
        
         | timc3 wrote:
         | Couldn't agree more. What a huge amount of tech and complexity
         | just to get something off the ground
        
         | LightFog wrote:
         | The more complex you make it the better your job security eh?
         | Maybe they'll even give you a whole team to look after it all.
         | Absolute madness.
        
         | lawgimenez wrote:
         | My last web development project was in the FTP upload era.
         | Reading this, I'm kinda glad I'm not in web dev.
        
         | esskay wrote:
         | The funny thing is a lot of smaller startups are seeing just
         | how absurdly expensive these service are, and are just
         | switching back to basic bare metal server hosting.
         | 
         | For 99% of businesses it's a wasteful, massive overkill
         | expense. You dont NEED all the shiny tools they offer, they
         | don't add anything to your business but cost. Unless you're a
         | Netflix or an Apple who needs massive global content
         | distribution and processing services theres a good chance
         | you're throwing money away.
        
         | benreesman wrote:
         | We had FB up to 6 figures in servers and a billion MAUs
         | (conservatively) before even tinkering with containers.
         | 
         | The "control plane" was ZooKeeper. Everything had bindings to
         | it, Thrift/Protobuf goes in a znode fine. List of servers for
         | FooService? znode.
         | 
         | The packaging system was a little more complicated than a
         | tarball, but it was spiritually a tarball.
         | 
         | Static link everything. Dependency hell: gone. Docker:
         | redundant.
         | 
         | The deployment pipeline used hypershell to drop the packages
         | and kick the processes over.
         | 
         | There were hundreds of services and dozens of clusters of them,
         | but every single one was a service because it needed a
         | different SKU (read: instance type), or needed to be in Java or
         | C++, or some engineering reason. If it didn't have a real
         | reason, it goes in the monolith.
         | 
         | This was dramatically less painful than any of the two dozen
         | server type shops I've consulted for using kube and shit. It's
         | not that I can't use Kubernetes, I know the k9s shortcuts
         | blindfolded. But it's no fun. And pros built these deployments
         | and did it well, serious Kubernetes people can do everything
         | right and it's _complicated_.
         | 
         | After 4 years of hundreds of elite SWEs and PEs (SRE) building
         | a Borg-alike, we'd hit _parity_ with the bash and ZK stuff. And
         | it ultimately got to be a clear win.
         | 
         | But we had an _engineering_ reason to use containers: we were
         | on bare metal, containers can make a lot of sense on bare
         | metal.
         | 
         | In a hyperscaler that has a zillion SKUs on-demand?
         | Kubernetes/Docker/OCI/runc/blah is the friggin Bezos tax.
         | You're already virtualized!
         | 
         | Some of the new stuff is hot shit, I'm glad I don't ssh into
         | prod boxes anymore, let alone run a command on 10k at the same
         | time. I'm glad there are good UIs for fleet management in the
         | browser and TUI/CLI, and stuff like TailScale where mortals can
         | do some network stuff without a guaranteed zero day. I'm glad
         | there are layers on top of lock servers for service discovery
         | now. There's a lot to keep from the last ten years.
         | 
         | But this yo dawg I heard you like virtual containers in your
         | virtual machines so you can virtualize while you virtualize
         | shit is overdue for its CORBA/XML/microservice/many-many-many
         | repos moment.
         | 
         | You want reproducibility. Statically link. Save Docker for a
         | CI/CD SaaS or something.
         | 
         | You want pros handing the datacenter because pets are for
         | petting: pay the EC2 markup.
         | 
         | You can't take risks with customer data: RDS is a very sane
         | place to splurge.
         | 
         | Half this stuff is awesome, let's keep it. The other half is
         | job security and AWS profits.
        
           | geraldhh wrote:
           | > We had FB up to 6 figures in servers and a billion MAUs
           | (conservatively) before even tinkering with containers.
           | 
           | that would have been around the time when containers entered
           | the public/developer consciousness, no?
        
         | annoyingnoob wrote:
         | No, not at all. Maybe baffled by the use of expensive cloud
         | services instead of running on your own bare metal where the
         | cost is in datacenter space and bandwidth. The loss of control
         | coupled with the cost is baffling.
        
       | ehPReth wrote:
       | Okta... after everything that's happened recently with them?
        
         | deskamess wrote:
         | Yeah... this stood out! Do you have any good alternatives? I
         | wish CloudFlare would do it (IDP).
        
       | __turbobrew__ wrote:
       | It is a shame karpenter is AWS only. I was thinking about how our
       | k8s autoscaler could be better and landed on the same kind of
       | design as karpenter where you work from unschedulable pods
       | backwards. Right now we have an autoscaler which looks at
       | resource utilization of a node pool but that doesn't take into
       | account things like topology spread constraints and resource
       | fragmentation.
        
         | acedTrex wrote:
         | https://github.com/Azure/karpenter-provider-azure there is this
         | in the works for karpenter on aks
        
           | redrove wrote:
           | It's actually released in preview, they called it Node Auto
           | Provisioning. Doesn't work with Azure Linux unfortunately.
        
       | sreeramvenkat wrote:
       | Ironic that the article begins with an image of server chassis
       | with wires running around while the description is entirely about
       | cloud infra.
        
       | ChuckMcM wrote:
       | This is fabulous. I keep lists like this in my notebook(s). The
       | critical thing here is that you shouldn't dwell on your "wrong"
       | choices, instead document the choice, what you thought you were
       | getting, what you got, and what information would have been
       | helpful to know at the time of decision (or which information you
       | should have given more weight at the time of the decision.) If
       | you do this, you will consistently get better and better.
       | 
       | And by far "automate all the things" is probably my number one
       | suggestion for DevOps folks. Something that saves you 10 minutes
       | a day pays for itself in a month when you have a couple of hours
       | available to diagnose and fix a bug that just showed up. (5 days
       | a week X 4 weeks X 10 minutes = 200 minutes) The exponential
       | effect of not having to do something is much larger than most
       | people internalize (they will say, "This just takes me a couple
       | of minutes to do." when in fact it takes 20 to 30 minutes to do
       | and they have to do it repeatedly.)
        
       | janfoeh wrote:
       | Almost every time I read someone's insights who works in an
       | environment with IaaS buy-in, my takeaway is the same: oh boy,
       | what an alphabet soup.
       | 
       | The initial promise of "we'll take care of this for you, no in-
       | house knowledge needed" has not materialized. For any non-trivial
       | use case, all you do is replace transferrable, tailored knowledge
       | with vendor-specific voodoo.
       | 
       | People who are serious about selling software-based services
       | should do their own infrastructure.
        
       | cyounkins wrote:
       | I've climbed the mountain of learning the basics of kubernetes /
       | EKS, and I'm thinking we're going to switch to ECS. Kubernetes is
       | way too complicated for our needs. It wants to be in control and
       | is hard to direct with eg CloudFormation. Load balancers are
       | provisioned from the add-on, making it hard to reference them
       | outside kubernetes. Logging on EKS Fargate to Cloudwatch appears
       | broken, despite following the docs. CPU/Memory metrics don't work
       | like they do on EKS EC2, it appears to require ADOT.
       | 
       | I recreated the environment in ECS in 1/10th the time and
       | everything just worked.
        
         | jacurtis wrote:
         | I've been running ECS for about 5 years now. It has come a long
         | way from a "lightweight" orchestration tool into something
         | thats actually pretty impressive. The recent new changes to the
         | GUI are also helpful for people that don't have a ton of
         | experience with orchestration.
         | 
         | We have moved off of it though, you can eventually need more
         | features than it provides. Of course that journey always ends
         | up in Kubernetes land, so you eventually will find your way
         | back there.
         | 
         | Logging to Cloudwatch from kubernetes is good for one thing...
         | audit logs. Cloudwatch in general is a shit product compared to
         | even open source alternatives. For logging you really need to
         | look at Fluentd or Kibana or DataDog or something along those
         | lines. Trying to use Cloudwatch for logs is only going to end
         | in sadness and pain.
        
           | busterarm wrote:
           | GKE is a much better product to me still than EKS but at
           | least in the last two years or so EKS has become a usable
           | product. Back in like 2018 though? Hell no, avoid avoid
           | avoid.
        
       | amluto wrote:
       | > Zero Trust VPN
       | 
       | VPNs can be wonderful, and you can use use Tailscale or AWS VPN
       | or OpenVPN or IPSEC and you can authenticate using Okta or GSuite
       | or Auth0 or Keycloak or Authelia.
       | 
       | But since when is this Zero Trust? It takes a somewhat unusual
       | firewall scheme to make a VPN do anything that I would seriously
       | construe as Zero Trust, and getting authz on top of that is a
       | real PITA.
        
       | morsecodist wrote:
       | > Picking AWS over Google Cloud
       | 
       | I know this is an unpopular opinion but I think google cloud is
       | amazing compared to AWS. I use google cloud run and it works like
       | a dream. I have never found an easier way to get a docker
       | container running in the cloud. The services all have sensible
       | names, there are fewer more important services compared to the
       | mess of AWS services, and the UI is more intuitive. The only
       | downside I have found is the lack of community resulting in fewer
       | tutorials, difficulty finding experienced hires, and fewer third
       | party tools. I recommend trying it. I'd love to get the user base
       | to an even dozen.
       | 
       | The reasoning the author cites is that AWS has more responsive
       | customer service and maybe I am missing out but it would never
       | even occur to me to speak to someone from a cloud provider. They
       | mention having "regular cadence meetings with our AWS account
       | manager" and I am not sure what could be discussed. I must be
       | doing simper stuff.
        
         | iimblack wrote:
         | I don't have as much experience with aws but I do hate gcp. The
         | ui is slow and buggy. The way they want things to authenticate
         | is half baked and only implemented in some libraries and it
         | isn't always clear what library supports it. The gcloud command
         | line tool regularly just doesn't work; it just hangs and never
         | times out forcing you to kill it manually wondering if it did
         | anything and you'll mess something up running it again. The way
         | they update client libraries by running code generation means
         | there's tons of commits that aren't relevant to the library
         | you're actually using. Features are not available across all
         | client libraries. Documentation contradicts itself or
         | contradicts support recommendations. Core services like
         | bigquery lack any emulator or Docker image to facilitate CI or
         | testing without having to setup a separate project you have to
         | pay for.
        
           | arccy wrote:
           | aws is even worse yet somehow people love them, maybe because
           | they get to talk to a support "human" to hand-hold them
           | through all the badness
        
           | mdaniel wrote:
           | Oh, friend, you have not known UI pain until you've used
           | portal.azure.com. That piece of junk requires actual page
           | reloads to make any changes show up. That Refresh button is
           | just like the close-door elevator button: it's there for you
           | to blow off steam, but it for damn sure does not DO anything.
           | I have boundless screenshots showing when their own UI
           | actually pops up a dialog saying "ok, I did what you asked
           | but it's not going to show up in the console for 10 minutes
           | so check back later". If you forget to always reload the
           | page, and accidentally click on something that it says exists
           | but doesn't, you get the world's ugliest error message and
           | only by squinting at it do you realize it's just the 404 page
           | rendered as if the world has fallen over
           | 
           | I suspect the team that manages it was OKR-ed into using AJAX
           | but come from a classic ASP background, so don't understand
           | what all this "single page app" fad is all about and hope it
           | blows over one day
        
         | fshbbdssbbgdd wrote:
         | I have had the experience of an AWS account manager helping me
         | by getting something fixed (working at a big client). But more
         | commonly, I think the account manager's job at AWS or any cloud
         | or SAAS is to create a reality distortion field and distract
         | you from how much they are charging you.
        
           | tester457 wrote:
           | > I think the account manager's job at AWS or any cloud or
           | SAAS is to create a reality distortion field and distract you
           | from how much they are charging you.
           | 
           | How do they do this jedi mind trick?
        
           | viraptor wrote:
           | Maybe your TAM is different, but our regularly do
           | presentations about cost breakdown, future planning and
           | possible reservations. There's nothing distracting there.
        
         | piotrkaminski wrote:
         | Heartily seconded. Also don't forget the docs: Google Cloud
         | docs are generally fairly sane and often even useful, whereas
         | my stomach churns whenever I have to dive into AWS's labyrinth
         | of semi-outdated, nigh-unreadable crap.
        
           | andreif wrote:
           | To be fair there are lots of GCP docs, but I cannot say they
           | are as good as AWS. Everything is CLI-based, some things are
           | broken or hello-world-useless. Takes time to go through
           | multiple duplicate articles to find anything decent. I have
           | never had this issue with AWS.
           | 
           | GCP SDK docs must be mentioned separately as it's a bizarre
           | auto-generated nonsense. Have you seen them? How can you even
           | say that GCP docs are good after that?
        
             | arccy wrote:
             | very few things are cli only, most have multiple ways to do
             | things. and they have separate guide reference sections
             | that can easily be found. compared to aws where your best
             | bet is to hope google indexed the right page for them.
        
               | andreif wrote:
               | > few things are cli only
               | 
               | wdym? As far as I see, it's either CLI or Terraform. GCP
               | SDK is complete garbage, at least for Python compared to
               | AWS boto3. I have personally made web UI for AWS CLI man
               | pages as a fun project and can index everything myself if
               | needed. Googling works fine. If you are not happy with it
               | then ChatGPT is to the rescue. I honestly do not see any
               | problem at all.
        
         | kbar13 wrote:
         | AWS enterprise support (basically first line support that you
         | paid for) is actually really really good. they will look at
         | your metrics/logs and share with you solid insights. anything
         | more you can talk to a TAM who can then reach out to relevant
         | engineering teams
        
         | halfcat wrote:
         | > I have never found an easier way to get a docker container
         | running in the cloud
         | 
         | We started using Azure Container Apps (ACA) and it seems simple
         | enough.
         | 
         | Create ACA, point to GitHub repo, it runs.
         | 
         | Push an update to GitHub and it redeploys.
        
           | rickette wrote:
           | Azure Container Apps (ACA) and AWS AppRunner are also heavily
           | "inspired" by Google Cloud Run.
        
             | marcinzm wrote:
             | So?
        
         | darknavi wrote:
         | > I have never found an easier way to get a docker container
         | running in the cloud
         | 
         | I don't have a ton of Azure or cloud experience but I run an
         | Unraid server locally which has a decent Docker gui.
         | 
         | Getting a docker container running in Azure is so complicated.
         | I gave up after an hour of poking around.
        
           | andreif wrote:
           | Azure is a complete disaster, deserves its own garbage-
           | category, and gives people PTSD. I don't think AWS/CGP should
           | ever be compared to it at all.
        
             | jiggawatts wrote:
             | Funnily enough, I have the opposite opinion.
             | 
             | AWS has "fun" features like the ability to just lose track
             | of some resource and still be billed for it. It's in
             | here... somewhere. Not sure which region or account. I'll
             | find it one day.
             | 
             | GCP is made by Google, also known as children that forgot
             | to take their ADHD medication. Any minute now they'll just
             | casually announce that they're cancelling the cloud because
             | they're bored of it.
             | 
             | Azure is the only one I've seen with a sane management
             | interface, where you can actually see everything everywhere
             | all at once. Search, filter, query-across-resources, etc...
             | all work reasonably well.
        
               | andreif wrote:
               | I am yet to meet an IRL person who believes Azure has
               | "sane management interface". In my experience it was
               | horribly inconvenient, filled with weird anti-UX
               | solutions that were completely unnecessary. It maybe
               | shows you all at once, or at least tries to, but it's
               | such a horrible idea for a complex system. Non-
               | surprisingly it never worked properly with various
               | widgets hanging or erroring-out. It was impossible to see
               | wtf is going on, what state it is in, or how to do
               | anything about it. Azure will always be an example of a
               | web UI done horribly wrong. This does actually not
               | surprise me at all since Microsoft products are known for
               | this. Every time I need to extend my kids Xbox
               | subscriptions I have to pull my hair out to figure out
               | how to do it in their web mess.
               | 
               | How can you even compare it to AWS is a mystery to me.
               | There are pages showing all your resources, not sure why
               | you think it's a problem. Could be a problem from long
               | time ago?
        
               | arccy wrote:
               | you're lucky if azure works without errors half the
               | time...
        
           | maccard wrote:
           | Oh I disagree - we migrated from azure to AWS, and running a
           | container on Fargate is significantly more work than Azure
           | Container Apps [0]. Container Apps was basically "here's a
           | container, now go".
           | 
           | [0] https://azure.microsoft.com/en-gb/products/container-apps
        
             | mdaniel wrote:
             | Heh, your comment almost echos the positive thing I was
             | going to say, as well as highlighting half of why I loathe
             | Azure with every fiber of my being
             | 
             | https://learn.microsoft.com/en-us/azure/container-
             | instances/... is the one I was going to plug, because
             | coming from a kubernetes background it seems to damn near
             | be the PodSpec and thus both expresses a lot of my needs
             | and also is very familiar https://learn.microsoft.com/en-
             | us/azure/templates/microsoft....
             | 
             | Your link does seem to be a lot more "container, plus all
             | the surrounding stuff" in line with the "apps" part,
             | whereas mine more closely matches my actual experience of
             | what you said: container, go
             | 
             | The "what the fucking hell is wrong with you people?" part
             | is that their naming is just all over the place, and
             | changes constantly, and is almost designed to be misleading
             | in any sane conversation. I quite literally couldn't have
             | guessed whether Container Apps was a prior name of
             | Container Instances, a super set of it, subset, other? And
             | one will observe that while I said Container Instances, and
             | the URL says Container Instances, the ARM is Container
             | Groups. Are they the same? different? old? who fucking
             | knows. It's horrific
        
               | maccard wrote:
               | Oh yeah. This and resource groups are the only two things
               | that azure did well. Everything else is a disaster.
        
         | madduci wrote:
         | I share your thoughts. It looks like an entire article
         | endorsing AWS honestly
        
         | wodenokoto wrote:
         | If you are big enough to have regular meetings with AWS you are
         | big enough to have meetings with GCP.
         | 
         | I've had technicians at both GCP and Azure debug code and spend
         | hours on developing services.
        
           | marcinzm wrote:
           | > I've had technicians at both GCP and Azure debug code and
           | spend hours on developing services.
           | 
           | Almost every time Google pulled in a specialist engineer
           | working on a service/product we had issues with it was very
           | very clear the engineer had no desire to be on that call or
           | to help us. In other words they'd get no benefit from helping
           | us and it was taking away from things that would help their
           | career at Google. Sometimes they didn't even show up to the
           | first call and only did to the second after an escalation up
           | the management chain.
        
         | rswail wrote:
         | We are a reasonably large AWS customer and our account manager
         | sends out regular emails with NDA information on what's coming
         | up, we have regular meetings with them about things as wide
         | ranging as database tuning and code development/deployment
         | governance.
         | 
         | They often provide that consulting for free, and we know their
         | biases. There's nothing hidden about the fact that they will
         | push us to use AWS services.
         | 
         | On the other hand, they will also help us optimize those
         | services and save money that is directly measurable.
         | 
         | GCP might have a better API and better "naming" of their
         | services, but the breadth of AWS services, the incorporation of
         | IAM across their services, governance and automation all makes
         | it worth while.
         | 
         | Cloud has come a long way from "it's so easy to spin up a
         | VM/container/lambda".
        
           | politelemon wrote:
           | > There's nothing hidden about the fact that they will push
           | us to use AWS services.
           | 
           | Our account team don't even do that. We use a lot of AWS
           | anyway and they know it, so they're happy to help with
           | competitor offerings and integrating with our existing stack.
           | Their main push on us has been to not waste money.
        
             | bakchodi wrote:
             | When I was at AWS, I watched SAs get promoted for saving
             | customers money all the time.
             | 
             | AWS wants happy customers to stick around for a long time,
             | not one month of goosed income
        
               | deskamess wrote:
               | Yep. Pay us less every month and stick around for a long
               | time. Getting low prices makes it really difficult to
               | move away.
               | 
               | If you still decided to move away, and want to take data
               | with you, yeah... there is a cost. Heck there is a cost
               | to delete the data you have with them (like S3 content).
               | 
               | Its a good way to do business.
        
           | danpalmer wrote:
           | In a previous role I got all of these things from GCP - they
           | ran training for us, gave us early access to some alpha/beta
           | stage products (under NDA), we got direct onboarding from
           | engineers on those, they gave us consulting level support on
           | some things and offered much more of it than we took up.
        
         | simonbarker87 wrote:
         | Totally agree, GCP is far easier to work with and get things up
         | and running for how my brain works compared to AWS. Also, GCP
         | name stuff in a way that tells me what it does, AWS name things
         | like a teenage boy trying to be cool.
        
           | andreif wrote:
           | That's completely opposite to my experience. Do you have any
           | examples of AWS naming that you think is "teenage boy trying
           | to be cool"? I am genuinely curious.
        
             | alentred wrote:
             | BigQuery - Athena
             | 
             | Pub/Sub - Kinesis
             | 
             | Cloud CDN - CloudFront
             | 
             | Cloud Domains - Route 53
             | 
             | ...
        
               | andreif wrote:
               | Pub/sub is more like SNS or EventBridge Bus to me
        
               | andreif wrote:
               | I thought you meant API and parameters. Blaming them for
               | product names is weird to me.
        
               | geraldhh wrote:
               | why is that?
        
               | andreif wrote:
               | Why it's weird to blame them for product names? Because
               | their purpose slightly different. I can see where
               | negativity comes from and understand, but product name is
               | a lot less important as consistent API experience. AWS is
               | the best among big players by far, hats off and well-done
               | to their teams and leadership. I hope the others will
               | finally learn and follow.
        
               | morsecodist wrote:
               | My issue isn't just with the names themselves but they
               | are emblematic of AWS's overall mentality. They want to
               | have the AWS(TM) solution to X business case while other
               | cloud providers feel more like utilities that give you
               | building blocks. This obviously works for them and many
               | of their customers I just personally don't care for it.
               | It is probably to do with the level of complexity I am
               | working at (*which is not very complex).
               | 
               | Also, I don't think trying to emulate AWS's support and
               | consistent API makes sense as a strategy for other cloud
               | providers. They will never beat AWS at their own game, it
               | is light years ahead. If cloud providers want to survive
               | they need to fill a different niche and try different
               | things.
        
               | jgalt212 wrote:
               | It's nice when things do what they say on the tin. That
               | being said, it's hard to build a "brand" when you start
               | out with a generic name.
        
               | andreif wrote:
               | How many popular products have you named and launched?
               | Naming products is hard to meet both usability and
               | marketing objectives. This has never been as big of a
               | problem for me, as GCPs APIs for example. Those are the
               | true evil. Product names I care little for.
        
               | jgalt212 wrote:
               | > How many popular products have you named and launched?
               | 
               | One, and you often times only need one.
        
               | arccy wrote:
               | aws api and param names are stupidly long CamelCased and
               | not even consistent half the time like a leaky
               | abstraction over their underlying implementation
        
               | andreif wrote:
               | You remember any example? I don't call API directly and
               | usually use CLI/SDK/CDK that work a lot better than
               | gcloud. I did see some inconsistencies between services
               | (e.g. updating params for SQS and SNS) and that could
               | definitely be improved. But honestly, comparing to GCP
               | mess, AWS is ten times better.
        
               | simonbarker87 wrote:
               | Perfect list, also:
               | 
               | Google Cloud Run - Lambda
               | 
               | Sure I get the reference to the underlying algebraic
               | representation of coding but come on, Lambda tells us
               | nothing of what it does.
               | 
               | Products (not brands, products) should be named in a way
               | that means something to the customer afaic.
        
               | Hasu wrote:
               | > Perfect list, also:
               | 
               | > Google Cloud Run - Lambda
               | 
               | ECS is the AWS equivalent of Cloud Run. GCP Cloud
               | Functions are the equivalent of AWS Lambda.
               | 
               | ECS / Cloud Run = managed container service that
               | autoscales
               | 
               | Lambda / Cloud Functions = serverless functions as a
               | service
        
               | simonbarker87 wrote:
               | Thanks for the clarification hadn't appreciated the
               | difference. Also somewhat reiterates my point which is
               | nice as well
        
               | andreif wrote:
               | Have you named any successful product?
        
               | simonbarker87 wrote:
               | Yes, named a product and sold over 100,000 units of them.
               | Naming products is hard but not that hard.
        
         | andreif wrote:
         | GCP's SDK and documentation is a mess compared to AWS. And
         | looking at the source code I don't see how it can get better
         | any time soon. AWS seems to have proper design in mind and uses
         | less abstractions giving you freedom to build what you need.
         | AWS CDK is great for IAC.
         | 
         | The only weird part I experienced with AWS is their SNS API.
         | Maybe due to legacy reasons, but what a bizarre mess when you
         | try doing it cross-account. This one is odd.
         | 
         | I have been trying GCP for a while and DevX was horrible. The
         | only part that more-or-less works is CLI but the naming there
         | is inconsistent and not as well-done as in AWS. But it's
         | relative and subjective, so I guess someone likes it. I have
         | experienced GCP official guides that broken, untested or
         | utterly braindead hello-world-useless. And also they are
         | numerous and spread so it takes time to find anything decent.
         | 
         | No dark mode is an extra punch. Seriously. Tried to make it
         | myself with an extension but their page is Angular hell of
         | millions embedded divs. No thank you.
         | 
         | And since you mentioned Cloud Run -- it takes 3 seconds to
         | deploy a Lambda version in AWS and a minute or more for GCP
         | Could Function.
        
         | ratherbefuddled wrote:
         | We're relatively small GCP users (low six figures) and have
         | monthly cadence meetings with our Google account manager.
         | They're very accommodating, and will help with contacts, events
         | and marketing.
        
         | lysecret wrote:
         | Also much prefer GCP but gotta say their support is hot
         | steaming **. I wasted so much time for absolutely nothing with
         | them.
        
         | marcinzm wrote:
         | GCP support is atrocious. I've worked at one of their largest
         | clients and we literally had to get executives into the loop
         | (on both sides) to get things done sometimes. Multiple times
         | they broke some functionality we depended on (one time they
         | fixed it weeks later except it was still broken) or gave us bad
         | advice that cost a lot of money (which they at least refunded
         | if we did all the paperwork to document it). It was so bad that
         | my team viewed even contacting GCP as an impediment and
         | distraction to actually solving a problem they caused.
         | 
         | I also worked at a smaller company using GCP. GCP refused to do
         | a small quota increase (which AWS just does via a web form)
         | unless I got on a call with my sales representative and
         | listened to a 30 minute upsell pitch.
        
         | jq-r wrote:
         | > "regular cadence meetings with our AWS account manager" and I
         | am not sure what could be discusse.
         | 
         | As being on a number of those calls, its just a bunch of crap
         | where they talk like a scripted bot reading from corporate
         | buzzword bingo card over a slideshow. Their real intention is
         | two fold. To sell you even more AWS complexity/services, and to
         | provide "value" to their person of contact (which is person
         | working in your company).
         | 
         | We're paying north of 500K per year in AWS support (which is a
         | highway robbery), and in return you get a "team" of people
         | supposedly dedicated to you, which sounds good in theory but
         | you get a labirinth of irresponsiblity, stalling and
         | frustration in reality.
         | 
         | So even when you want to reach out to that team you have to
         | first to through L1 support which I'm sure will be replaced by
         | bots soon (and no value will be lost) which is useful in 1 out
         | of 10 cases. Then if you're not satisfied with L1's answer(s),
         | then you try to escalate to your "dedicated" support team, then
         | they schedule a call in three days time, or if that is around
         | Friday, that means Monday etc.
         | 
         | Their goal is to stall so you figure and fix stuff on your own
         | so they shield their own better quality teams. No wonder our
         | top engineers just left all AWS communication and in cases
         | where unavoidable they delegate this to junior people who still
         | think they are getting something in return.
        
           | Grimm665 wrote:
           | This rings so true from experience it hurts.
        
           | awskinda wrote:
           | > We're paying north of 500K per year in AWS support (which
           | is a highway robbery), and in return you get a "team" of
           | people supposedly dedicated to you, which sounds good in
           | theory but you get a labirinth of irresponsiblity, stalling
           | and frustration in reality.
           | 
           | I've found a lot of the time the issues we run into are self-
           | inflicted. When we call support for these, they have to
           | reverse-engineer everything which takes time.
           | 
           | However when we can pinpoint the issue to AWS services, it
           | has been really helpful to have them on the horn to confirm &
           | help us come up with a fix/workaround. These issues come up
           | more rarely, but are _extremely_ frustrating. Support is
           | almost mandated in these cases.
           | 
           | It's worth mentioning that we operate at a scale where the
           | support cost is a non-issue compared to overall engineering
           | costs. There's a balance, and we have an internal structure
           | that catches most of the first type of issue nowadays.
        
           | AtlasBarfed wrote:
           | This. This is the reality.
           | 
           | I am so tired of the support team having all the real
           | metrics, especially in io and throttling, and not surfacing
           | it to us somehow.
           | 
           | And cadence is really an opportunity for them to sell to you,
           | the parent is completely right.
        
       | jiggawatts wrote:
       | Something I've noticed with PaaS services like RDS or Azure SQL
       | is that people arguing against it are assuming that the
       | alternative is "competence".
       | 
       | Even in a startup, it's difficult to hire an expert in every
       | platform that can maintain a robust, secure system. It's
       | possible, but not guaranteed, and may require a high pay to
       | retain the right staff.
       | 
       | Many government agencies on the other hand are legally banned
       | from offering a competitive wage, so they can literally never
       | hire anyone that competent.
       | 
       | This cap on skill level means that if they do need reliable
       | platforms, the only way they can get one is by paying 10x the
       | real market rate for an over-priced cloud service.
       | 
       | These are the "whales" that are keeping the cloud vendors fat and
       | happy.
        
       | IamLoading wrote:
       | > Go is for services that are non-GPU bound.
       | 
       | What are they using for GPU bound services. Python?
        
         | cissmayazz wrote:
         | Python indeed
        
       | itpragmatik wrote:
       | Not sure the fascination about Go - one can write fully scalable
       | functional readable maintainable upgradable rest api service with
       | Java 17 and above.
        
         | MarkMarine wrote:
         | I struggle with the type system in both, but today I was going
         | through obscure go code and wishing interfaces were explicitly
         | implemented. Lack of sum types is making me sad
        
       | nickzelei wrote:
       | What are startups using for a logging tool that isn't datadog?
        
         | podoman wrote:
         | https://highlight.io
        
         | Too wrote:
         | Loki
        
         | ndr wrote:
         | https://axiom.co/
        
       | bilalq wrote:
       | I love this write-up and the way it's presented. I disagree with
       | some of the decisions and recommendations, but it's great to read
       | through the reasoning even in those cases.
       | 
       | It'd be amazing if more people published similar articles and
       | there was a way to cross-compare them. At the very least, I'm
       | inspired to write a similar article.
        
       | roughly wrote:
       | The Bazel one made me chuckle - I worked at a company with an scm
       | & build setup clearly inspired by Google's setup. As a non-ex-
       | Googler, I found it obviously insane, but there was just no way
       | to get traction on that argument. I love that the rest of this
       | list is pretty cut and dry, but Bazel is the one thing that the
       | author can't bring themself to say "don't regret" even though
       | they clearly don't regret not using it.
        
         | busterarm wrote:
         | I've seen Bazel reduce competent engineers to tears. There was
         | a famous blog post a half-decade ago called something like
         | "Bazel is the worst build system, except for all the others"
         | and this still seems to ring true for me today.
         | 
         | There are some teams I work with that we'll never bother to
         | make use Bazel because we know in advance that it would cripple
         | them.
        
           | ali_piccioni wrote:
           | Having led a successful Bazel migration, I'd still recommend
           | many projects to stick to the native or standard supported
           | toolchain until there's a good reason to migrate to a build
           | system (And I don't consider GitHub actions to be a build
           | system).
        
         | dieortin wrote:
         | I'm curious, what do you find insane about Bazel? In my
         | experience it makes plenty of sense. And after using it for
         | some months, I find more insane how build systems like CMake
         | depend on you having some stuff preinstalled in your system and
         | produce a different result depending on which environment
         | they're run on.
        
       | bayareabadboy wrote:
       | Interesting enough read. But I'm not sure he's a regretful enough
       | boy to write a blog to merit the title.
        
       | fswd wrote:
       | stuff like this makes me want to experiment with going back to
       | just one huge $100k server and running it all on one box in a
       | server rack.
        
         | sseagull wrote:
         | I am doing that. I am part of a research group, and don't have
         | the $$ or ability to pay so much for all these services.
         | 
         | So we got a $90k server with 184TB of raw storage (SAS SSD), 64
         | cores, and 1TB of memory. Put it on a 10GB line at our
         | university and it is rock solid. We probably have less downtime
         | than Github, even with reboots every few months.
         | 
         | Have some large (multi-TB) databases on it and web APIs for
         | accessing the data. Would be hugely expensive in the cloud
         | with, especially with egress costs.
         | 
         | You have to be comfortable sys-admining though. Fortunately I
         | am.
        
       | hi_hi wrote:
       | I was hoping there would be a section for Search Engines. It's
       | one of those things you tend to get locked in to, and it's hard
       | to clearly know your requirements well enough early on.
       | 
       | Any references to something like this with a Search slant would
       | be greatly appreciated.
        
       | brycelarkin wrote:
       | Awesome writeup! Just had a couple comments/questions.
       | 
       | > Not adopting an identity platform early on
       | 
       | The reason for not adopting an IDP early is because almost every
       | vendor price gouges for SAML SSO integration. Would you say it's
       | worth the cost even when you're a 3-5 person startup?
       | 
       | > Datadog
       | 
       | What would you recommend as an alternative? Cloudwatch? I love
       | everything about Datadog, except for their pricing....
       | 
       | > Nginx load balancer for EKS ingress
       | 
       | Any reason for doing this instead of an Application Load
       | Balancer? Or even HA Proxy?
        
         | kevinslin wrote:
         | For datadog, unfortunately there's no obvious altnernative
         | despite many companies trying to take marketshare. This is to
         | say, datadog both has second to none DX and a wide breadth of
         | services.
         | 
         | Grafana Labs comes closest in terms of breadth but their DX is
         | abysmal (I say this as a heavy grafana/prometheus user) Same
         | comments about new relic though they have better dx than
         | grafana. Chronosphere has some nice DX around prometheus based
         | metrics but lack the full product suite. I could go on but
         | essentially, all vendors either lack breadth, DX, or both.
        
       | hitekker wrote:
       | Props to the author for writing up the results from his exercise.
       | But I think he should focused on a few controversial ones, and
       | not the rotes ones.
       | 
       | Many of the decisions presented are not disagreeable (choosing
       | slack) and some lack framing that clarifies the associated loss
       | (Not adopting an identity platform early on). I think they're all
       | good choices worth mentioned; I would have preferred a deeper
       | look into the few that seemed easy and turned out to be hard, or
       | the ones that were hard and got even harder.
        
         | 8organicbits wrote:
         | > not the rotes ones
         | 
         | It helps to hear the validation, although I think almost every
         | decision has a dissenting voice in the HN comments.
        
       | isoprophlex wrote:
       | > There are no great FaaS options for running GPU workloads
       | 
       | This hits hard. Someone please take my (client's) money and
       | provide sane GPU FaaS. Banana.dev is cool but not really
       | enterprise ready. I wish there was a AWS/GCP/Azure analogue that
       | the penny pinchers and MBAs in charge of procurement can get
       | behind.
        
         | karbon0x wrote:
         | I am confused. Doesn't Modal Labs solve this?
        
           | isoprophlex wrote:
           | Definitely. But the sad reality is that in some corporate
           | environments (incumbent finance, government) if it's not a
           | button click in portal.azure.com away, you can spend 6-12
           | months in meetings with low energy gloomboys to get your
           | access approved.
        
             | karbon0x wrote:
             | Ah, I see. Yeah, been victim of that bureaucracy as well.
        
       | Rainymood wrote:
       | As a machine learning platform engineer these sound like
       | _technology choices_ as opposed to _infrastructure decisions_. I
       | would love to read this post but really with the infrastructure
       | trade-offs that were made. But thanks for the post.
       | 
       | Side node: There is a small typo repeated twice "Kuberentes"
        
       | davedx wrote:
       | Utter insanity. So much cost and complexity, and for what?
       | Startups don't think about costs or runway anymore, all they care
       | about is "modern infrastructure".
       | 
       | The argument for RDS seems to be "we can't automate backups".
       | What on earth?
        
         | isbvhodnvemrwvn wrote:
         | Is spending time to make it reliable worth it vs working on
         | your actual product? Databases are THE most critical things
         | your company has.
        
           | davedx wrote:
           | All that infra doesn't integrate itself. Everywhere I've
           | worked that had this kind of stack employed at least one if
           | not a team of DevOps people to maintain it all, full time,
           | the year round. Automating a database backup and testing it
           | works takes half a day unless you're doing something weird
        
             | isbvhodnvemrwvn wrote:
             | Setting up a multi-az db with automatic failover,
             | incremental backups and PiTR, automated runbooks and
             | monitoring all that doesn't take half a day, not even with
             | RDS.
        
               | davedx wrote:
               | No, but again, that sounds like a lot of complexity your
               | average startup does not need. Multi-az? Why?
        
               | marcinzm wrote:
               | Because their Enterprise client requires it on their due
               | diligence paperwork.
        
               | dvfjsdhgfv wrote:
               | Which makes little sense anyway as in practice the real
               | problems you have are from region/connectivity issues,
               | not AZ failures.
        
             | fullstackchris wrote:
             | A startup sized company using this many tools? They're for
             | sure doing something weird (and that's not a compliment :)
             | )
             | 
             | Totally on your side with this one - but alas, people
             | associate value with complexity.
        
             | ffsm8 wrote:
             | > _Automating a database backup and testing it works takes
             | half a day unless you're doing something weird_
             | 
             | True story bro
             | 
             | I'm sure that's possible if you're storing the backup on
             | the same server you're restoring on and everything is on
             | top of the line nvme storage. Otherwise your backup just
             | started to run and will need another few days to finish.
             | And that's only if you're running single master.
             | 
             | You're massively underestimating the challenge to get that
             | kind of automation done in a stable manner - and the
             | maintenance required to keep it working over the years.
        
               | davedx wrote:
               | I've implemented such a process for companies multiple
               | times, bro. I know what I'm talking about.
        
               | marcinzm wrote:
               | And that's the problem. "It's easy for me because I've
               | done it a dozen times so it's easy for everyone" is a
               | very common fallacy.
        
               | layer8 wrote:
               | What happened to having people trained by external
               | trainers for what you need? That's much cheaper than
               | having everything externally "managed" and still having
               | to integrate all of it. The number of services listed in
               | TFA is just ridiculous.
        
               | ffsm8 wrote:
               | I've done it before,too. For toy project, it's easy as
               | you said. It's not once you're at scale. It's hilarious
               | that people are down voting my comment. I guess there are
               | a lot of juniors suffering from the dunning Kruger
               | syndrome around right now
        
               | icedchai wrote:
               | I worked at a place with its own colo where they ran
               | several multi TB MySQL database servers. We did weekly
               | backups and it could take days. Our backups were stored
               | on external USB disks. The I/O performance was abysmal.
               | Taking a filesystem snapshot and copying it to USB could
               | take days. The disks would occasionally lock up and
               | someone would have to power cycle them. Total clown show.
               | 
               | I would rather pay for RDS. Databases are the one thing
               | you don't want to screw up.
        
           | Draiken wrote:
           | I see this argument a lot. Then most startups use that time
           | to create rushed half-assed features instead of spending a
           | week on their db that'll end up saving hundreds of thousands
           | of dollars. Forever.
           | 
           | For me that's short-sighted.
        
           | eptcyka wrote:
           | So investing in a critical part of my business is the bad
           | thing to do?
        
         | brightball wrote:
         | There are other providers with better value for service within
         | AWS or GCP, like Crunchy.
        
         | viraptor wrote:
         | > The argument for RDS seems to be "we can't automate backups".
         | What on earth?
         | 
         | I can automate backups and I'm extremely happy they with some
         | extra cost in RDS, I don't have to do that.
         | 
         | Also, at some size automating the database backup becomes non-
         | trivial. I mean, I can manage a replica (which needs to be
         | updated at specific times after the writer), then regularly
         | stop replication for a snapshot, which is then encrypted,
         | shipped to storage, then manage the lifecycle of that storage,
         | then setup monitoring for all of that, then... Or I can set one
         | parameter on the Aurora cluster and have all of that happen
         | automatically.
        
         | bowsamic wrote:
         | I agree but also I'm not entirely sure how much of this is
         | avoidable. Even the most simple web applications are full of
         | what feels like needless complexity, but I think actually a lot
         | of it is surprisingly essential. That said, there is definitely
         | a huge amount of "I'm using this because I'm told that we
         | should" over "I'm using this because we actually need it"
        
         | jstummbillig wrote:
         | The argument for RDS (and other services along those lines) is
         | "we can't do it as good, for less".
         | 
         | And, when factoring in _all_ costs and considering all things
         | the service takes care of, it seems like a reasonable
         | assumption that in a free market a team that specializes in
         | optimizing this entire operation will sell you a db service at
         | a better net rate than you would be able to achieve on your
         | own.
         | 
         | Which might still turn out to be false, but I don't think it's
         | obvious why.
        
         | overstay8930 wrote:
         | Everyone who says they can run a database better than Amazon is
         | probably lying or Has a story about how they had to miss a
         | family event because of an outage.
         | 
         | The point isn't that you can't do it, the point is that it's
         | less work for extremely high standards. It is not easy to
         | configure multi region failover without an entire network team
         | and database team unless you don't give a shit about it
         | actually working. Oh yea, and wait until you see how much SOC2
         | costs if you roll your own database.
        
       | tofflos wrote:
       | > Using cert-manager to manage SSL certificates
       | 
       | > Very intuitive to configure and has worked well with no issues.
       | Highly recommend using it to create your Let's Encrypt
       | certificates for Kubernetes.
       | 
       | > The only downside is we sometimes have ANCIENT (SaaS problems
       | am I right?) tech stack customers that don't trust Let's Encrypt,
       | and you need to go get a paid cert for those.
       | 
       | Cert-manager allows you to use any CA you like including paid
       | ones without automation.
        
       | kunley wrote:
       | A fallacy of a "choice" between GCP and AWS never stops to
       | entertain me
        
       | danielovichdk wrote:
       | I would have liked some data around why these technologies were
       | chosen and preferably based on loads from customers.
       | 
       | Seems like yagni to me but please prove me wrong
        
       | corentin88 wrote:
       | Curious about the mention of buying IPs. Anyone else can share
       | feedback/thoughts on this?
        
         | cissmayazz wrote:
         | This was done for multiple reasons but mainly security and to
         | allow customers to whitelist a certain ip range.
        
       | throwawaaarrgh wrote:
       | This guy gets it, I agree with it all. The exception being, use
       | Fargate without K8s and lean on Terraform and AWS services rather
       | than the K8s alternatives. When you have no choice left and you
       | have to use K8s, then I would pick it up. No sense going down
       | into the mines if you don't have to.
        
       | michidk wrote:
       | > Code is of course powerful, but I've found the restrictive
       | nature of Terraform's HCL to be a benefit with reduced
       | complexity.
       | 
       | No way. We used Terraform before and the code just got
       | unreadable. Simple things like looping can get so complex.
       | Abstraction via modules is really tedious and decreases
       | visibility. CDKTF allowed us to reduce complexity drastically
       | while keeping all the abstracted parts really visible. Best
       | choice we ever made!
        
       | opentokix wrote:
       | After working with infrastructure for 20 years, I fully endorse
       | this post.
        
       | kosolam wrote:
       | What is the cost? With 1/10th of the sum one capable engineer can
       | setup a way better infra on premise. The days of free money is
       | over, guys. Wake up!
        
       | politelemon wrote:
       | > Ubuntu for dev servers
       | 
       | I didn't understand this section. Ubuntu servers as dev
       | environment, what do you mean? As in an environment to deploy
       | things onto, or a way for developers to write code like with
       | VSCode Remote?
        
         | hahnchen wrote:
         | seems like the latter given "Originally I tried making the dev
         | servers the same base OS that our Kubernetes nodes ran on,
         | thinking this would make the development environment closer to
         | prod"
        
           | runiq wrote:
           | But I thought the whole point of the container ecosystem was
           | to abstract away the OS layer. Given that the kernel is
           | backwards compatible to a fault, shouldn't it be enough to
           | have a kernel that is as least as recent as the one on your
           | k8s platform (provided that you're running with the default
           | kernel or something close to it)?
        
         | brainzap wrote:
         | My take from this was more: being uniform reduced overhead of
         | maintaining.
         | 
         | Being able to write a bash script that runs on ever machine is
         | nice.
        
       | politelemon wrote:
       | > homebrew for Linux
       | 
       | No, just no. I see this cropping up now and then. Homebrew is
       | unsafe for Linux, and is only recommended by Mac users that don't
       | want to bother to learn about existing package management.
        
       | erostrate wrote:
       | The author leads infrastructure at Cresta. Cresta is a customer
       | service automation company. His first point is about how happy he
       | is to have picked AWS and their human-based customer service,
       | versus Google's robot-based customer service.
       | 
       | I'm not saying there's anything wrong, and I'm oversimplifying a
       | bit, but I still find this amusing.
        
         | lysecret wrote:
         | Haha very good catch. I prefer GCP but I will admit any day of
         | the week that their support is bad. Makes sense that they would
         | value good support highly.
        
           | danpalmer wrote:
           | We used to use AWS and GCP at my previous company. GCP
           | support was fine, and I never saw anything from AWS support
           | that GCP didn't also do. I've heard horror stories about
           | both, including some security support horror stories from AWS
           | that are quite troubling.
        
       | yread wrote:
       | > Ubuntu
       | 
       | we have dotnet webapp deployed on Ubuntu and it leaves a lot to
       | be desired. The package for .net6 from default repo didn't
       | recognise other dotnet components installed, net8 is not even
       | coming to 22.04 - you have to install from the ms repo. But that
       | is not compatible with the default repo's package for net6 so you
       | have to remove that first and faff around with exact versions to
       | get it installed side by side...
       | 
       | At least I don't have to deal with rhel Why is renewing a dev
       | subscription so clunky?!
        
       | f549abd0 wrote:
       | Disagree on the point and reasoning about the single database.
       | 
       | Sounds like they experienced badly managed and badly constrained
       | database. The described fks and relations: that's what the key
       | constraints and other guard rails and cascades are for - so that
       | you are able to manage a schema. That's exactly how you do it:
       | add in new tables that reference old data.
       | 
       | I think the regret is actually not managing the database, and not
       | so much about having a single database.
       | 
       | "database is used by everyone, it becomes cared for by no one".
       | How about "database is used by everyone, it becomes cared for by
       | everyone".
        
         | f549abd0 wrote:
         | Reading further
         | 
         | > Endorse-ish: Schema migration by Diff
         | 
         | Well that explains it... What a terrible approach to migrations
         | for data integrity.
        
           | cloogshicer wrote:
           | Genuinely curious (I don't have much experiences with DBs),
           | how is schema migration done 'properly' these days?
        
             | jspdown wrote:
             | Incremental forward-only migrations (non-state based).
             | Then, for the How and When, it mostly depends of your
             | constraints and sizes. There's no silver bullet, it's hard,
             | it require constant thinking, it's a slow and often multi
             | step process.
             | 
             | I never saw a successful fully automated one-way-of-doing
             | process.
        
               | from-nibly wrote:
               | Are you talking about the mechanics? Like more than just
               | run a migration script on boot?
        
           | jamescontrol wrote:
           | Can you explain? Having a tool to detect changes and create a
           | migration doesn't sound bad? In a nutshell thats how django
           | migrations work as well, which works really well.
        
         | Sankozi wrote:
         | > How about "database is used by everyone, it becomes cared for
         | by everyone".
         | 
         | So every one needs to know every use case of that database?
         | Seems very unlikely if there are multiple teams using same DB.
         | 
         | FKs? Unique constraints? Not null colums? If not added at the
         | creation of the table they will never be added - the moment DB
         | is part of a public API you cannot do a lot of things safely.
         | 
         | The only moment when you want to share DB is when you really
         | need to squeeze every last bit of performance - and even then,
         | you want to have one owner and severly limited user accounts
         | (with white list of accessible views and stored procedures).
        
           | layer8 wrote:
           | The database should never ever become part of a public API.
           | 
           | You don't share a DB for performance reasons (rather the
           | opposite), you do it to ensure data integrity and
           | consistency.
           | 
           | And no, not everyone needs to know every use case. But every
           | team needs to have _someone_ who coordinates any overlapping
           | schema concerns with the other teams. This needs to be
           | managed, but it's also not rocket science.
        
             | Sankozi wrote:
             | If database is shared it is a part of an API. If it is
             | shared between teams then it is a public API.
             | 
             | If DB is shared then data from different users is
             | entered/updated through multiple transactions. So you
             | cannot get anything better regarding consistency and
             | integrity compared to multiple DBs and distributed TXs.
             | 
             | By introducing schema change coordination you will
             | introduce enormous delays to almost any DB change. This is
             | more realistic than everyone knowing each use case but less
             | practical. Shared DB is an antipattern either way.
        
       | eadmund wrote:
       | > Startups don't have the luxury of a DBA ...
       | 
       | I understand, _but_ I think they don't have the luxury of _not_
       | having a DBA. Data is important; it's arguably more important
       | than code. Someone needs to own thinking about data, whether it
       | is stored in a hierarchical, navigation-based database such as a
       | filesystem, a key-value stored like S3 (which, sure, can emulate
       | a filesystem), or in a relational database. Or, for that matter,
       | in vendor systems such as Google Workspace email accounts or
       | Office365 OneDrive.
        
         | Draiken wrote:
         | Early on, depending on what you're building, you don't need a
         | fully fleshed DBA and can get away with at least one person
         | that knows DB fundamentals.
         | 
         | But if you only want to hire React developers (or swap for the
         | framework of the week) then you'll likely end up with zero
         | understanding of the DB. Down the line you have a mess with
         | inconsistent or corrupted data that'll come back with a
         | vengeance.
         | 
         | It's short-sighted for serious endeavors.
        
       | lysecret wrote:
       | Half the stuff is K8s related... Makes me very happy to use Cloud
       | Run.
        
       | gokhan wrote:
       | "Multiple applications sharing a database" and Kubernetes sound
       | really funny together:)
        
       | shp0ngle wrote:
       | I should _really_ learn AWS huh
        
       | knowsuchagency wrote:
       | Using k8s over ECS and raw-dogging Terraform instead of using the
       | CDK? It's no wonder you end up needing to hire entire teams of
       | people just to manage infra
        
       | BrickTamblan wrote:
       | What's the right way to manage npm installs and deploy it to an
       | AWS ec2 instance from github? Kubernetes? GitOps? EKS? I roll my
       | own solution now with cron and bash because everything seems so
       | bloated.
        
       | ildjarn wrote:
       | Reading this I couldn't help but think: yeah all of these points
       | make sense in isolation, but if you look at the big picture, this
       | is an absurd level of complexity.
       | 
       | Why do we need entire teams making 1000s of micro decisions to
       | deploy our app?
       | 
       | I'm hungry for a simpler way, and I doubt I'm alone in this.
        
         | klabb3 wrote:
         | You're not alone. There is a constant undercurrent of pushback
         | against this craziness. You see it all the time here on hacker
         | news and with people I talk to irl.
         | 
         | Does not mean each of these things don't solve problems. The
         | issue as always about complexity-utility tradeoff. Some of
         | these things have too much complexity for too little utility.
         | I'm not qualified to judge here, but if the suspects have
         | Turing-complete-yaml-templates on their hands, it probably ties
         | them to the crime scene.
        
         | Sammi wrote:
         | It smells like ZIRP is not over yet. VCs are still burning
         | money in the AWS fire pit.
        
           | kibwen wrote:
           | ZIRP was never the root problem.
           | 
           | The problem was: _too much money, too few consequences for
           | burning it_.
           | 
           | The existence of the uber-wealthy means that markets can no
           | longer function efficiently. _Every_ market remains
           | irrational longer than anyone who 's not uber-wealthy can
           | remain solvent.
           | 
           | Welcome to the new normal.
        
             | daxfohl wrote:
             | Now it's "fix it with AI". (And pay lip service to green
             | tech.)
        
       | jgalt212 wrote:
       | > We use Okta to manage our VPN access and it's been a great
       | experience.
       | 
       | I have no first hand exerience wtih Okta, but everything I read
       | about it makes me scared to use it. i.e. stability and security.
        
       | rexreed wrote:
       | Sounds like a whole lot of stuff for a startup. Maybe start with
       | a simple stack until there's market fit. Even Amazon didn't start
       | this way.
        
       | iandanforth wrote:
       | For people who enjoyed this post but want to see the other side
       | of the spectrum where self hosted is the norm I'll point to the
       | now classic series of posts on how Stack Overflow runs its infra:
       | https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
       | 
       | If anyone has newer posts like the above, please reply with links
       | as _I_ would love to read them.
        
         | alecthomas wrote:
         | https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47...
         | is another good one. There are a few different posts on it
         | scattered around:
         | 
         | https://world.hey.com/dhh/we-stand-to-save-7m-over-five-year...
         | 
         | https://world.hey.com/dhh/our-cloud-exit-has-already-yielded...
         | 
         | Related, looks like X is doing similar:
         | https://twitter.com/XEng/status/1717754398410240018
        
       | Shorel wrote:
       | I see more 'Endorse' items than 'Regret' items.
       | 
       | Anyway, amazing write up.
       | 
       | Learning about alternatives to Jira is always good.
        
       | maccard wrote:
       | I see homebrew in here as a way to distribute <stuff> internally.
       | 
       | We have non-developers (artists, designers) on our team, and
       | asking them to manage homebrew is a non-starter. We're also on
       | windows.
       | 
       | We current just shove everything (and I mean everything) in
       | perforce. Are there any better ways of distributing this for a
       | small team?
        
       | pavel_lishin wrote:
       | > _Discourage private messages and encourage public channels._
       | 
       | I wish my current company did this. It's infuriating. The other
       | day, I asked a question about how to set something up, and a
       | manager linked me to a channel where they'd discussed that very
       | topic - but it was private, and apparently I don't warrant an
       | invite, so instead I have to go bother some other engineers (one
       | of whom is on vacation.)
       | 
       | Private channels should be for sensitive topics (legal, finance,
       | etc) or for "cozy spaces" - a team should have a private channel
       | that feels like their own area, but for things like projects and
       | anything that should be searchable, please keep things public.
        
       | foxhop wrote:
       | I think kubernetes is a mistake and should have went with AWS ECS
       | (using fargate or backed by autoscaling ec2), if single change he
       | wouldn't need to even thing about a bunch of other topics on his
       | list. Something to think about, AWS Lambda first then fallback to
       | AWS ECS for everything else that needs to really be on 100% of
       | the time.
        
       | pigcat wrote:
       | > My general infrastructure advice is "less is better".
       | 
       | I found this slightly ironic given there are ~50 headers in the
       | article :)
       | 
       | I liked the format of the writeup
        
       | thesurlydev wrote:
       | I've seen a lot of comments about how bad DataDog is because of
       | cost but surprisingly I haven't seen open-source alternatives
       | like OpenTelemetry/Prometheus/Grafana/Tempa mentioned.
       | 
       | Is it because most people are willing to pay someone else to
       | manage monitoring infrastructure or other reasons?
        
         | kevinslin wrote:
         | the way I think of datadog is that datadog it provides a second
         | to none DX combined with a wide suite of product offerings that
         | is good enough for most companies most of the time. does it
         | have opaque pricing that can be 100x more expensive than
         | alternatives? absolutely! will people continue to use it? yes!
         | 
         | something to keep in mind is that most companies are not like
         | the folks in this thread. they might not have the expertise,
         | time or bandwidth to build invest in observability.
         | 
         | the vast majority of companies just want something that
         | basically works and doesn't take a lot of training to use. I
         | think of Datadog as the Apple of observability vendors - it
         | doesn't offer everything and there are real limitations (and
         | price tags) for more precise use cases but in the general case,
         | it just works (especially if you stay within its ecosystem)
        
       | data_maan wrote:
       | Noob here - all these are great... but why can't I just use
       | Heroku to radically not have to deal with a large prt if these
       | things?
        
       ___________________________________________________________________
       (page generated 2024-02-10 23:01 UTC)