[HN Gopher] Flowchart: How should I run containers on AWS?
       ___________________________________________________________________
        
       Flowchart: How should I run containers on AWS?
        
       Author : kiyanwang
       Score  : 100 points
       Date   : 2021-11-14 14:02 UTC (9 hours ago)
        
 (HTM) web link (www.vladionescu.me)
 (TXT) w3m dump (www.vladionescu.me)
        
       | pilotneko wrote:
       | If you have less than 250 engineers, this guide is not for you?
       | Strange.
       | 
       | Edit: Turns out I read it wrong, apologies. This guide is for
       | companies with less than 250 engineers.
        
         | bluehatbrit wrote:
         | I believe it's actually saying if you have more than 250
         | engineers, then the guide is not for you.
        
         | [deleted]
        
         | fivea wrote:
         | > If you have less than 250 engineers, this guide is not for
         | you? Strange.
         | 
          | I'd guess that if your org is big enough to have 250 engineers
          | on its payroll, AWS services are a waste of cash, given you can
          | deploy better and cheaper yourself. For example, Hetzner has
          | fewer than 200 employees, and it's a global cloud provider.
        
           | hhh wrote:
           | I don't think this is true at all. AWS has a lot more in an
           | org than "the bill is smaller." There's a restrictive factor
            | when hiring if you move away from Google Cloud, Azure, and
           | AWS.
        
             | fivea wrote:
             | > I don't think this is true at all. AWS has a lot more in
             | an org than "the bill is smaller."
             | 
              | The bill is not only astronomically high but also
              | unpredictable and uncontrollable. What added value do you
              | believe justifies this?
             | 
             | Meanwhile, keep in mind that the likes of Dropbox gave AWS
             | a try but in the end learned from their experience and
             | opted to migrate out after only 5 years.
             | 
              | > There's a restrictive factor when hiring if you move away
              | from Google Cloud, Azure, and AWS.
             | 
              | I disagree, considering that on AWS you either run vanilla
              | VMs on EC2 or containerized solutions on ECS/EKS. Either
              | way, you'd be better off using your own local VMs or your
              | company's Kubernetes deployment.
        
               | marcinzm wrote:
               | >The bill is not only astronomically high but it's also
               | unpredictable and uncontrollable.
               | 
                | If you've got 250 engineers then the AWS bill is going to
                | be fairly predictable and well monitored.
               | 
               | >Meanwhile, keep in mind that the likes of Dropbox gave
               | AWS a try but in the end learned from their experience
               | and opted to migrate out after only 5 years.
               | 
                | Dropbox is memorable because they migrated off, which is
                | something many other companies don't do.
               | 
                | As you can tell from the massive growth of Azure, many
                | large enterprise companies definitely see the value
                | despite already having their own datacenters.
        
               | fivea wrote:
                | > If you've got 250 engineers then the AWS bill is going
                | to be fairly predictable and well monitored.
               | 
                | This is the very first time I've heard anyone describe
                | AWS billing as "fairly predictable and well monitored",
                | because there is simply no such thing. There are
                | professional services sold at a premium whose entire
                | value proposition is making sense of AWS billing and
                | reining it in. With AWS, your company simply sets a
                | budget with significant slack and hopes for the best.
               | 
               | > Dropbox is memorable because they migrated off which
               | says what many other companies don't do.
               | 
                | Not true. Dropbox is memorable because it was depicted as
                | a poster child of AWS, yet ultimately found out it was
                | not worth it at all. And Dropbox is not alone.
               | 
               | https://www.datacenterknowledge.com/archives/2016/03/16/m
               | ovi...
               | 
               | https://www.uscloud.com/blog/3-reasons-33-companies-
               | ditching...
        
           | xyzzy123 wrote:
           | At the high end AWS was a huge win for "traditional
           | enterprises" because it broke them free from being held
           | hostage by their own internal IT bureaucracies.
           | 
            | Enterprises provided many of the things AWS provides, but
            | much, much worse.
           | 
           | I actually think the political revolution it (cloud)
           | triggered was a bigger thing than the technical and
           | engineering changes.
           | 
           | AWS is expensive compared to a machine under your desk but
           | not so much compared to the fully loaded cost of (say) a
           | VMware VM at a bank DC, which includes multi-year enterprise
           | hardware & software commitments to make it go plus staff and
           | admin costs.
           | 
           | The DC model choked off innovation that was not approved
           | "from the top" and severely damaged agility.
        
             | benjaminwootton wrote:
             | Totally agree with this and have made the same point many
             | times. The chance to reset the culture, operating model,
             | politics and suppliers is the really big opportunity for
             | big companies adopting cloud.
        
             | marcinzm wrote:
             | I think the people complaining about cloud have never had
             | to work in a large enterprise with an internal IT
             | department managing a physical datacenter. Even at Google
             | people complain about that side of things.
        
           | CSDude wrote:
            | This apples-to-oranges comparison keeps coming up. Comparing
            | AWS to a VPS/physical server provider is strange. AWS has
            | more services than Hetzner has employees. AWS bandwidth and
            | compute are hugely and unreasonably priced, but there are
            | also superb services in AWS like S3, SQS, DynamoDB, and
            | Lambda.
           | 
           | All major cloud providers have similar computation pricing.
           | As much as I love Hetzner, they are not a major cloud
           | provider.
        
             | qaq wrote:
             | DynamoDB is a horrible KV
        
           | marcinzm wrote:
           | In my experience the productivity cost on a larger company of
           | physical servers is massive. The IT department is a
           | monopolistic provider of compute services to other
           | departments. At the same time IT competes with those
           | departments for overall company budget. Needless to say the
           | result of these mismatched economic and political incentives
           | is a mess.
        
           | tgv wrote:
            | That's my guess too, but then the flowchart mentions 25
            | requests. Per day!
        
         | GavinAnderegg wrote:
         | There's an explanation about that (and the other points in the
         | flowchart) in the image shown further down in the blog post.
         | You can see it here:
         | https://www.vladionescu.me/posts/flowchart-how-should-i-run-...
        
         | OJFord wrote:
         | *more than, not less. (And then sibling comments/the annotated
         | version explain why if you still think that's strange.)
        
       | evoxmusic wrote:
       | You should take a look at qovery.com - quite simple to deploy a
       | containerized app on AWS
        
       | fivea wrote:
        | Cool link, although I think that if your deployment package is
        | <100MB, it's preferable to use AWS Lambda by simply deploying
        | the zip archive instead of pushing a container image.
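        | 
        | For reference, deploying from a zip is a one-call affair with
        | boto3. A minimal sketch (the function name, role ARN, and
        | handler path are placeholders):
        | 
        |   import boto3
        | 
        |   lam = boto3.client("lambda")
        | 
        |   # Create the function straight from a local zip archive
        |   # (no container image, no ECR push).
        |   with open("build/app.zip", "rb") as f:
        |       lam.create_function(
        |           FunctionName="my-api",
        |           Runtime="python3.9",
        |           Role="arn:aws:iam::111122223333:role/my-lambda-role",
        |           Handler="app.handler",
        |           Code={"ZipFile": f.read()},
        |           Timeout=30,
        |           MemorySize=256,
        |       )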
        
       | time0ut wrote:
       | Last time I tried it (~May 2021), Lambda containers had terrible
       | cold start times. Even a node hello world would cold start in 2-3
       | seconds. Same code packaged as a ZIP file would cold start in
        | less than 1/10th the time. Maybe it's better now?
        
         | jayar95 wrote:
         | It's been a year since I touched an AWS lambda, but I'd bet
         | money that cold starts are still an issue. There is a common
          | hack that half works: have your function run every minute (you
          | can use EventBridge rules for this); in the function handler,
          | the first thing you should check is whether it's a warming
          | event, and exit 0 if it is. Your results may vary (mine did
          | lol).
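          | 
          | Roughly, the handler check looks like this (a minimal Python
          | sketch; treating the scheduled EventBridge ping as anything
          | with source "aws.events" is the assumption here):
          | 
          |   def handler(event, context):
          |       # EventBridge scheduled rules deliver events with source
          |       # "aws.events" and detail-type "Scheduled Event"; treat
          |       # those as warm-up pings and bail out immediately.
          |       if (event.get("source") == "aws.events"
          |               and event.get("detail-type") == "Scheduled Event"):
          |           return {"warmed": True}
          | 
          |       # ... real work goes here ...
          |       return {"statusCode": 200, "body": "hello"}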
        
       | InTheArena wrote:
        | This is awful. It completely ignores the economics of leveraging
        | commodity services and re-usable skills, as well as the places
        | where you can get maximal value if you are willing to accept
        | vendor lock-in: the high-level services.
       | 
        | This is more or less the equivalent of mandating that everyone
        | use VMS once UNIX started to commoditize, or use Windows for all
        | of your servers once Linux had taken over the market. Using ECS +
        | Fargate instead of EKS + Fargate provides no savings; all it
        | serves to do is lock you into a single hyper-scaler's
        | infrastructure, at the same time as K8s is forcing the cloud
        | vendors into commoditization.
       | 
        | Want to use AWS effectively? Depend on high-level services like
        | Glue, Athena, Kinesis Firehose, and SageMaker. Want to piss away
        | any chance to run your business effectively? Leverage ECS.
       | 
       | If you are a one-man shop, and you know ECS or are willing to
       | depend on underbaked solutions because they solve a problem, more
       | power to you. I suspect that you may benefit over your career by
        | investing in more universal skill sets (alternatively, you may
        | benefit from hyper-specializing in AWS toolsets as well).
        
         | glogla wrote:
         | I would very much say it's the other way around.
         | 
         | AWS is good at giving you commodity infrastructure at scale.
         | They are really good at "stupid" services like S3 or EC2 or
         | RDS. The high-level services meanwhile? I worked with quite a
         | few of them and they're mostly shit.
         | 
          | Athena has an account-wide limit of 5 concurrent queries,
          | which cannot be increased. Even one larger dashboard will
          | overload it.
         | 
          | Redshift has a similar limit of 30 concurrent queries. That's
          | good enough for casual use but not suitable for a larger
          | company.
         | 
          | Glue Catalog does not scale at all; having more than a few
          | objects will break it, and you will very soon end up begging
          | for higher API limits due to throttling exceptions.
         | 
         | Kinesis has very strange limits (in messages per second) that
         | make it really expensive for use cases where traffic is peaky -
         | which is quite a few streaming use cases.
         | 
         | Just like you wouldn't use Amazon.com to buy high-quality,
         | important goods (because you're pretty likely to get something
          | broken or fake), don't use AWS "high level" services. Amazon is
          | a company focused on scaling commodity use cases, not on
          | engineering excellence.
        
           | pilate wrote:
           | I recently had my Athena limit increased to 25 for my
           | account, so that's no longer true.
        
             | glogla wrote:
             | If they actually can increase the concurrency now, that's
             | good news!
             | 
              | We asked for that as well, and at first support said they
              | had done it, but it turned out they had increased the
              | Athena queue length instead of the concurrency - so queries
              | wouldn't fail (until the queue was full, anyway) but the
              | throughput stayed the same.
              | 
              | Then the team of AWS account managers and solutions
              | architects (who had assured us many times that the
              | concurrency could be increased) got into an email
              | discussion with support and the Athena team, before coming
              | back and telling us it cannot be increased. No apology or
              | anything. They recommended we run Presto on EMR instead.
             | 
             | It was a sobering experience in how AWS operates.
        
           | finnh wrote:
            | > Amazon is a company focused on scaling commodity use
            | cases, not on engineering excellence.
           | 
            | I think of it slightly differently. S3 is clearly the
            | product of engineering excellence (as is DynamoDB).
           | 
           | But I think there are A teams and B teams at AWS, as you
           | would expect from any company so large. And I get the sense
           | that at least some of the newer, less proven solutions were
           | written by the B teams. This likely includes some k8s-market-
           | share-grabby offerings.
        
             | glogla wrote:
              | I heard the same story told slightly differently: for
              | services that process your requests when you go to
              | amazon.com to buy stuff, you can expect engineering
              | excellence. That would be Dynamo, S3, EC2, networking
              | (that whole side of AWS - hypervisors, networks, etc. -
              | I'd call an engineering marvel).
              | 
              | But that doesn't apply to the software they build "for
              | someone else", like Redshift, Glue, SageMaker, and a whole
              | lot of other stuff. There are a lot of half-baked services
              | among the ~200 they have.
        
       | larrymyers wrote:
       | If you have the desire to understand the individual components of
       | running infrastructure for docker containers I'd suggest the full
       | hashicorp stack.
       | 
        | Running Nomad, Vault, and Consul together isn't difficult, and
        | will give you a better understanding of how to deploy 12-factor
        | apps with good secret storage and service discovery.
       | 
       | Add in Traefik for routing and you've got an equivalent stack to
       | k8s, but you can actually understand what each piece is doing and
       | scale each component as needed.
       | 
       | If you're going to run this all on AWS you can stick to just EC2
       | and not have to drown in documentation for each new abstraction
       | AWS launches.
       | 
        | As an added bonus, Nomad can run far more than just containers,
        | so you have an on-ramp for your legacy apps.
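        | 
        | To give a feel for it, registering a Docker job against a local
        | Nomad agent is a single HTTP call. A rough sketch using Nomad's
        | JSON jobs API (the job layout below is simplified and the image
        | is a placeholder):
        | 
        |   import requests
        | 
        |   # Minimal Docker job pushed to a local Nomad agent (default
        |   # API port 4646).
        |   job = {
        |       "Job": {
        |           "ID": "web",
        |           "Name": "web",
        |           "Datacenters": ["dc1"],
        |           "TaskGroups": [{
        |               "Name": "web",
        |               "Count": 1,
        |               "Tasks": [{
        |                   "Name": "web",
        |                   "Driver": "docker",
        |                   "Config": {"image": "nginx:1.21"},
        |               }],
        |           }],
        |       }
        |   }
        | 
        |   resp = requests.put("http://localhost:4646/v1/jobs", json=job)
        |   resp.raise_for_status()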
        
         | tyingq wrote:
          | I suppose one thing to watch out for is not getting too
          | comfortable with the _"isn't difficult"_ part. That seemed to
          | be the root cause of the extended outage at Roblox: the _"it
          | all works together"_ bit lulled them into not really
          | researching the impact of changes.
        
           | amitkgupta84 wrote:
           | How do you know that? I've only seen vague stuff from Roblox
           | about the outage. Did you read between the lines in one of
           | those posts and conclude it had to do with Hashistack
           | interoperability? Would love to know what you read that
           | implied that.
        
             | tyingq wrote:
             | Educated guess, based on things like:
             | 
             | https://news.ycombinator.com/item?id=29063026
             | 
              | Then reading back their vague statements like _"A core
              | system in our infrastructure became overwhelmed, prompted
              | by a subtle bug in our backend service communications"_ and
              | seeing if that matches. It seems to.
             | 
             | But, yes, I'm guessing.
        
       | atomland wrote:
       | I run multiple small EKS clusters at a small company. It doesn't
       | cost anywhere near $1 million per year, even taking my salary
       | into account. If you don't factor in my salary, it's maybe $50k
       | per year, and that's for 4 clusters.
       | 
       | Honestly this flowchart is kind of a mess, and I certainly
       | wouldn't recommend it to anyone.
        
         | dzikimarian wrote:
          | Same here. Multiple Java/PHP applications on EKS. It got much
          | better when we found a few guys who focused on resolving the
          | issues instead of complaining about how hard Kubernetes is.
        
       | jpgvm wrote:
       | Yeah no. ECS is the worst of all worlds, Fargate made it less
       | shit, it didn't make it good.
       | 
       | If you don't have the people to do k8s then stick to Lightsail,
       | don't do containers poorly just because you can.
       | 
        | Half-assing it will just make everyone miserable and end up with
        | a mishmash of "things that can run on ECS" and "things that
        | can't because they need X", where X is really common/useful
        | stuff like stateful volumes (EBS).
        
         | DandyDev wrote:
         | My experience with ECS is quite okay. You start a cluster,
         | stick a container on it through a task and optionally a service
         | + load balancer and it just works.
         | 
         | It doesn't seem any harder than EKS, and it's mostly cheaper.
         | 
         | I also find some comments on this article about vendor lock-in
         | dubious, because in the end it's a bunch of containers created
         | from Dockerfiles, which you can easily reuse elsewhere.
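          | 
          | The whole flow is a handful of API calls. A boto3 sketch
          | (names, image, and the subnet ID are placeholders; the load
          | balancer wiring is left out):
          | 
          |   import boto3
          | 
          |   ecs = boto3.client("ecs")
          | 
          |   ecs.create_cluster(clusterName="demo")
          | 
          |   # Task definition: one small Fargate-compatible container.
          |   ecs.register_task_definition(
          |       family="web",
          |       requiresCompatibilities=["FARGATE"],
          |       networkMode="awsvpc",
          |       cpu="256",
          |       memory="512",
          |       containerDefinitions=[{
          |           "name": "web",
          |           "image": "nginx:1.21",
          |           "portMappings": [{"containerPort": 80}],
          |       }],
          |   )
          | 
          |   # A long-running service keeps one copy of the task alive.
          |   ecs.create_service(
          |       cluster="demo",
          |       serviceName="web",
          |       taskDefinition="web",
          |       desiredCount=1,
          |       launchType="FARGATE",
          |       networkConfiguration={"awsvpcConfiguration": {
          |           "subnets": ["subnet-0123456789abcdef0"],
          |           "assignPublicIp": "ENABLED",
          |       }},
          |   )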
        
           | peakaboo wrote:
           | It's much, much simpler than EKS, that's the huge selling
           | point of Fargate.
           | 
           | I've been running containers for many years on Fargate now
           | and they literally never fail or give me any problems.
           | 
           | Kubernetes on the other hand, weird issues of complexity all
           | over the place. It can do more than Fargate but do you need
           | it? If not, skip it.
        
         | dmw_ng wrote:
         | ECS can do EBS volumes via the Rex-Ray Docker plugin. Depending
         | on how fast you need new instances to come up, installation can
         | be a 5-liner in userdata
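          | 
          | Something along these lines, if memory serves (a sketch of the
          | user data baked into a launch template via boto3; the AMI ID
          | is a placeholder and the exact plugin flags should be checked
          | against the Rex-Ray docs):
          | 
          |   import base64
          |   import boto3
          | 
          |   # User data for ECS container instances: install the Rex-Ray
          |   # EBS plugin so tasks can mount EBS-backed Docker volumes.
          |   user_data = """#!/bin/bash
          |   docker plugin install rexray/ebs REXRAY_PREEMPT=true \\
          |       EBS_REGION=us-east-1 --grant-all-permissions
          |   """
          | 
          |   ec2 = boto3.client("ec2")
          |   ec2.create_launch_template(
          |       LaunchTemplateName="ecs-rexray",
          |       LaunchTemplateData={
          |           "ImageId": "ami-0123456789abcdef0",  # placeholder
          |           "InstanceType": "t3.medium",
          |           "UserData": base64.b64encode(
          |               user_data.encode()).decode(),
          |       },
          |   )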
        
       | CSDude wrote:
        | Not a flowchart, but it has additional information.
       | https://www.lastweekinaws.com/blog/the-17-ways-to-run-contai...
        
         | bilalq wrote:
         | Yeah, this article is the first thing that came to mind for me
         | as well.
         | 
         | I still need to explore how AppRunner compares to CodeBuild for
         | this purpose.
        
           | CSDude wrote:
           | CodeBuild runs and dies. AppRunner is an HTTP server with
           | autoscale based on concurrent requests.
        
             | bilalq wrote:
             | But AppRunner also scales down to zero (albeit with reduced
             | instead of zero billing).
             | 
             | For use-cases like distributed job scheduling, an
             | EventBridge event triggering either an AppRunner request or
             | a CodeBuild execution or an ECS task could work. It's still
             | a little tricky to figure out what the best choice is.
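              | 
              | For the scheduled-job case, the EventBridge side looks
              | roughly like this (a boto3 sketch targeting an ECS Fargate
              | task; the ARNs, subnet, and rule name are placeholders,
              | and a CodeBuild project ARN could be a target instead):
              | 
              |   import boto3
              | 
              |   events = boto3.client("events")
              | 
              |   # Fire every hour.
              |   events.put_rule(Name="hourly-job",
              |                   ScheduleExpression="rate(1 hour)")
              | 
              |   # Point the rule at an ECS task definition.
              |   events.put_targets(
              |       Rule="hourly-job",
              |       Targets=[{
              |           "Id": "run-task",
              |           "Arn": "arn:aws:ecs:us-east-1:111122223333:"
              |                  "cluster/jobs",
              |           "RoleArn": "arn:aws:iam::111122223333:"
              |                      "role/events-invoke-ecs",
              |           "EcsParameters": {
              |               "TaskDefinitionArn":
              |                   "arn:aws:ecs:us-east-1:111122223333:"
              |                   "task-definition/job:1",
              |               "TaskCount": 1,
              |               "LaunchType": "FARGATE",
              |               "NetworkConfiguration": {
              |                   "awsvpcConfiguration": {
              |                       "Subnets": ["subnet-0123456789abcdef0"],
              |                   },
              |               },
              |           },
              |       }],
              |   )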
        
         | DandyDev wrote:
         | Super informative article, thanks for that!
         | 
         | It comes to the same conclusions that I intuitively had as
         | well:
         | 
          | - EKS if you really have to use k8s
          | 
          | - ECS if you have a modicum of complexity but still want to
          | keep it simple
          | 
          | - AppRunner if you just want to run a container
        
       | gavinray wrote:
       | I have never heard of AppRunner, thanks for posting this!
       | 
        | I've used Fargate as I thought it was the easiest/cheapest route,
        | but according to the notes on the chart:                 "From
       | 2020-2021 the best option was/is ECS and Fargate".
       | "Maybe from 2023-ish AppRunner will become the best option. It's
       | a preview service right now but on the path to be awesome!"
       | "It may not totally take over ECS on Fargate, but it will take
       | over most common usecases."
       | 
       | And according to this chart, AppRunner apparently is the service
       | I ought to be using for most of my apps.
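        | 
        | FWIW, spinning an App Runner service up appears to be a single
        | create_service call. A boto3 sketch as I understand the API (the
        | ECR image and role ARN are placeholders):
        | 
        |   import boto3
        | 
        |   apprunner = boto3.client("apprunner")
        | 
        |   apprunner.create_service(
        |       ServiceName="my-web-app",
        |       SourceConfiguration={
        |           "ImageRepository": {
        |               "ImageIdentifier": "111122223333.dkr.ecr."
        |                                  "us-east-1.amazonaws.com"
        |                                  "/my-web-app:latest",
        |               "ImageRepositoryType": "ECR",
        |               "ImageConfiguration": {"Port": "8080"},
        |           },
        |           "AuthenticationConfiguration": {
        |               "AccessRoleArn": "arn:aws:iam::111122223333:"
        |                                "role/apprunner-ecr-access",
        |           },
        |       },
        |       InstanceConfiguration={"Cpu": "1 vCPU", "Memory": "2 GB"},
        |   )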
        
         | CSDude wrote:
          | AppRunner has not had VPC support since launch. Otherwise a
          | great service.
        
           | ignoramous wrote:
            | If one is bought into the AWS ecosystem, they could consider
            | Lightsail Containers for cheaper Fargate-like devx:
            | https://lightsail.aws.amazon.com/ls/docs/en_us/articles/amaz...
        
       | krinchan wrote:
        | Unfortunately, my enterprise-scale employer forced me onto ECS on
        | EC2 for some of my apps for a very specific reason: Reserved
        | Instance pricing. I think there's reserved instance-like pricing
        | for Fargate now. For one particular set of containers (a Java
        | application we license from a vendor and then customize somewhat
        | with plugins), the CPU and RAM requirements are fairly large, so
        | the savings of ECS on EC2 with the longest Reserved Instance
        | contract mean that I will forever be dealing with the idiocy of
        | ASG-based Capacity Providers.
       | 
        | For those not in the know, ASG-based Capacity Providers are hard
        | to work with because they are essentially immutable, so any
        | change that touches the capacity provider turns into a create-
        | then-delete. A capacity provider cannot be deleted if the ASG has
        | any instances, and many tools like Terraform's AWS provider will
        | refuse to delete the ASG until the capacity provider is deleted.
        | The Terraform provider just cannot properly reason about the
        | process of discovering and scaling in the ECS tasks on the
        | provider, scaling in the ASG, waiting for 0 instances, and then
        | deleting the Capacity Provider. It's honestly beyond what
        | providers are designed to do.
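        | 
        | The manual dance ends up being something like this (a boto3
        | sketch of the ordering Terraform can't express; the names are
        | placeholders and error handling is omitted):
        | 
        |   import time
        |   import boto3
        | 
        |   ecs = boto3.client("ecs")
        |   asg = boto3.client("autoscaling")
        | 
        |   # 1. Drain the tasks off the capacity provider.
        |   ecs.update_service(cluster="prod", service="licensed-app",
        |                      desiredCount=0)
        | 
        |   # 2. Scale the ASG to zero and wait until no instances remain.
        |   asg.update_auto_scaling_group(
        |       AutoScalingGroupName="prod-ecs-asg",
        |       MinSize=0, DesiredCapacity=0,
        |   )
        |   while asg.describe_auto_scaling_groups(
        |           AutoScalingGroupNames=["prod-ecs-asg"]
        |   )["AutoScalingGroups"][0]["Instances"]:
        |       time.sleep(15)
        | 
        |   # 3. Only now can the capacity provider be deleted (it may
        |   #    also need to be detached from the cluster first).
        |   ecs.delete_capacity_provider(capacityProvider="prod-ecs-cp")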
       | 
       | TL;DR: The flow chart is somewhat correct: Do everything in your
       | power to run on ECS Fargate. It's mature enough and has excellent
       | VPC and IAM support these days. Stay as far away from ECS on EC2
       | as you can.
       | 
       | As for EKS, I like it but this company runs on the whole
       | "everyone can do whatever" so each team would have to run its own
       | EKS cluster. If we had a centralized team providing a base k8s
       | cluster with monitoring and what not built in for us to deploy
       | on, I'd be more amenable to it. As it stands, I would have to
       | learn both the development and ops AND security sides of running
        | EKS for a handful of apps. ECS, while seeming similar on the
        | surface, is much simpler and externalizes concepts like
        | ingresses, load balancing, and persistence into the AWS concepts
        | and tooling (CDK, CloudFormation, Terraform) you already know
        | (one hopes).
        
         | DandyDev wrote:
         | Even without using reserved instance pricing, ECS on EC2 is
         | much cheaper, isn't it? At work we use Hasura, which is written
         | in Haskell and cannot be (easily?) run as a Lambda. Our
         | alternative solution is to run it as a container on ECS. Given
         | that it's a permanently running service, with Fargate we'd pay
         | just to have it sit idle for half of the time, and Fargate is
         | not cheap.
         | 
         | Even when running non-reserved EC2 instances to make up our ECS
         | cluster, it is cheaper than using Fargate.
        
         | jpgvm wrote:
          | I would go as far as to say avoid ECS as much as possible.
          | Using ECS heavily means using CloudFormation or Terraform
          | heavily, both of which are shitty tools (TF is probably the
          | best tool in its class; I haven't tried Pulumi yet, but that
          | doesn't stop TF from being shit).
         | 
          | Importantly, both are almost impossible for "normal"
          | developers to use with any level of competency. This leads to
          | two inevitable outcomes: a) snowflakes, and all the special
          | per-app ops work that entails, and b) app teams pushing
          | everything back to infrastructure teams because they either
          | don't want to work with TF or can't be granted sufficient
          | permissions to use it effectively.
         | 
         | k8s solves these challenges much more effectively assuming your
         | ops team is capable of setting it up, managing the cluster(s),
         | namespaces, RBAC, etc and any base-level services like
         | external-dns, some ingress provider, cert-manager, etc.
         | 
          | Once you do this, app teams are able to deploy directly; they
          | can use Helm (eww, but it works) to spin up whatever off-the-
          | shelf software they want, and they can write manifests in a
          | way that they can't fuck up horribly as easily.
         | 
          | Best for both teams, ops and devs. Downside? It requires a
          | competent ops team (hard to find) and also some amount of
          | taste in tooling (use things like Tanka to make manifests less
          | shit), not to mention the time to actually spin all this up in
          | peace without being pushed to do tons of ad-hoc garbage
          | continually (i.e. a competent org).
         | 
          | So in summary, k8s is generally the right solution for larger
          | orgs because it enforces a better split of responsibilities
          | and establishes a powerful, (relatively) easy-to-use API that
          | can support practically everything.
         | 
          | Also, in the future there are things like ACK
          | (https://aws.amazon.com/blogs/containers/aws-controllers-
          | for-...) coming which will further reduce the need for app
          | teams to interact with TF or CloudFormation.
        
           | twalla wrote:
           | People focus way too much on the orchestration aspect of
           | Kubernetes when the real draw for larger, more mature
           | organizations is that it provides a good API for doing pretty
           | much anything infrastructure or operations related. You can
           | take the API primitives and build whatever abstractions you
           | want on top of them to match up with how your org deploys and
           | operates software. Once you get into stuff like Open
           | Application Model and start marrying it to operators that can
           | deploy AWS resources and whatever GitOps tool you want to
           | use, you end up with a really nice, consistent interface for
           | developers that removes your infrastructure folks as a
           | bottleneck in the "Hi I want to deploy this new thing
           | everywhere" process.
        
             | jpgvm wrote:
             | Couldn't agree more. It's the model that is good and most
             | of the benefit is derived organisationally rather than
             | technically.
        
       | speedgoose wrote:
        | If I don't want vendor lock-in, or want only a minimal amount of
        | it, how should I run containers on AWS?
        
         | InTheArena wrote:
         | EKS + Fargate.
         | 
         | It's that simple. If you need extensions into the AWS
         | infrastructure, check out their CRD extensions that allow you
         | to provision all of the infrastructure using K8s.
        
         | bkanber wrote:
         | Docker or kubernetes or any other orchestration software of
         | your choice on EC2
        
         | glenjamin wrote:
         | If the containerized app you're deploying follows 12 factor
         | principles it's very unlikely that you'll be locked in due to
         | specific functionality
         | 
         | The cost to move your operations expertise to another platform
         | and learn all of its new quirks might be significant though.
        
           | speedgoose wrote:
           | Yes it's significant, that's why I don't like vendor lock-in.
           | It's very simple to use VMs on various cloud providers but as
           | soon as you use their more integrated products, it can be a
           | disaster.
        
         | atomland wrote:
         | Well, to begin with, I think people worry too much about vendor
         | lock-in. Use the tools your cloud vendor provides to make your
         | life easier. Isn't that one of the reasons you chose them?
         | 
         | That said, moving containers to another container orchestrator
         | isn't terribly difficult, so I don't personally worry about
         | vendor lock-in for containerized workloads. If your workloads
         | have dependencies on other vendor-specific services, that's a
         | different story, but basically a container is easy to move
         | elsewhere.
        
           | jayar95 wrote:
           | Vendor lock-in can be a real issue as you scale. I work at a
           | company going through a hypergrowth phase, and we're more or
           | less locked into Azure, which has been objectively awful for
           | us
        
           | speedgoose wrote:
            | My current company has a cluster on Amazon, one on Azure, and
            | used to have one at a local hosting company. So it's very
            | important not to be stuck too much with one cloud provider's
            | lock-in.
           | 
           | We also chose our current cloud providers because the
           | alternatives were worse.
        
       | lysecret wrote:
        | Our infra is running on ECS because we set it up right before
        | Docker on Lambda, haha. Now we don't have the time to switch. It
        | would be much better for us though.
        
       | sombremesa wrote:
        | You don't know about Lightsail Containers? Why make a flowchart
        | that omits one of the simpler solutions in favor of complex ones?
       | 
       | Agreed with the other commenters that this flowchart is not to be
       | recommended, for this and other reasons.
        
         | tyingq wrote:
         | I'd personally omit them, assuming they are subject to the same
         | crazy CPU throttling that Lightsail VPS servers come with. Too
         | much work to predict when you can actually use it, e.g.:
         | https://aws.amazon.com/blogs/compute/proactively-monitoring-...
         | 
         | (Look at where the top of the "sustainable zone" is...5%? Like
         | 1/20th of a vCPU?)
        
       ___________________________________________________________________
       (page generated 2021-11-14 23:02 UTC)