[HN Gopher] Flowchart: How should I run containers on AWS?
___________________________________________________________________
Flowchart: How should I run containers on AWS?
Author : kiyanwang
Score : 100 points
Date : 2021-11-14 14:02 UTC (9 hours ago)
(HTM) web link (www.vladionescu.me)
(TXT) w3m dump (www.vladionescu.me)
| pilotneko wrote:
| If you have less than 250 engineers, this guide is not for you?
| Strange.
|
| Edit: Turns out I read it wrong, apologies. This guide is for
| companies with less than 250 engineers.
| bluehatbrit wrote:
| I believe it's actually saying if you have more than 250
| engineers, then the guide is not for you.
| [deleted]
| fivea wrote:
| > If you have less than 250 engineers, this guide is not for
| you? Strange.
|
| I'd guess that if your org is big enough to have 250 engineers
| on its payroll, AWS services are a waste of cash given you can
| deploy better and cheaper. For example, Hetzner has fewer than
| 200 employees, and it's a global cloud provider.
| hhh wrote:
| I don't think this is true at all. AWS brings a lot more to an
| org than "the bill is smaller." There's a restrictive factor
| when hiring if you move away from Google Cloud, Azure, and
| AWS.
| fivea wrote:
| > I don't think this is true at all. AWS brings a lot more to
| an org than "the bill is smaller."
|
| The bill is not only astronomically high but it's also
| unpredictable and uncontrollable. What added value do you
| believe justifies this?
|
| Meanwhile, keep in mind that the likes of Dropbox gave AWS
| a try but in the end learned from their experience and
| opted to migrate out after only 5 years.
|
| > There's a restrictive factor when hiring if you move away
| from Google Cloud, Azure, and AWS.
|
| I disagree, considering that in AWS you either run vanilla
| VMs from EC2, or containerized solutions in ECS/EKS. Either
| way you are already better off using either your local VMs
| or your company's kubernetes deployment.
| marcinzm wrote:
| >The bill is not only astronomically high but it's also
| unpredictable and uncontrollable.
|
| If you've got 250 engineers then the AWS bill is going to be
| fairly predictable and well monitored.
|
| >Meanwhile, keep in mind that the likes of Dropbox gave
| AWS a try but in the end learned from their experience
| and opted to migrate out after only 5 years.
|
| Dropbox is memorable because they migrated off, which is
| something most other companies don't do.
|
| As you can tell by the massive growth of Azure, many large
| enterprise companies definitely see the value despite
| already having their own datacenters.
| fivea wrote:
| > If you've got 250 engineers then the AWS bill is going to
| be fairly predictable and well monitored.
|
| This is the very first time I've heard anyone describe an
| AWS bill as "fairly predictable and well monitored",
| because there is simply no such thing. There are
| professional services being sold at a premium whose value
| proposition is to make sense of AWS billing and rein it
| in. With AWS, your company simply sets a budget with
| significant slack and hopes for the best.
|
| > Dropbox is memorable because they migrated off, which is
| something most other companies don't do.
|
| Not true. Dropbox is memorable because it was depicted as
| a poster child of AWS, but ultimately found out it was
| not worth it at all. And Dropbox is not alone.
|
| https://www.datacenterknowledge.com/archives/2016/03/16/movi...
|
| https://www.uscloud.com/blog/3-reasons-33-companies-ditching...
| xyzzy123 wrote:
| At the high end AWS was a huge win for "traditional
| enterprises" because it broke them free from being held
| hostage by their own internal IT bureaucracies.
|
| Enterprises provided many of the things AWS provides, but
| much, much worse.
|
| I actually think the political revolution it (cloud)
| triggered was a bigger thing than the technical and
| engineering changes.
|
| AWS is expensive compared to a machine under your desk but
| not so much compared to the fully loaded cost of (say) a
| VMware VM at a bank DC, which includes multi-year enterprise
| hardware & software commitments to make it go plus staff and
| admin costs.
|
| The DC model choked off innovation that was not approved
| "from the top" and severely damaged agility.
| benjaminwootton wrote:
| Totally agree with this and have made the same point many
| times. The chance to reset the culture, operating model,
| politics and suppliers is the really big opportunity for
| big companies adopting cloud.
| marcinzm wrote:
| I think the people complaining about cloud have never had
| to work in a large enterprise with an internal IT
| department managing a physical datacenter. Even at Google
| people complain about that side of things.
| CSDude wrote:
| This apples-to-oranges comparison keeps coming up. Comparing
| AWS to a VPS/physical server provider is strange. AWS has
| more services than Hetzner has employees. AWS bandwidth and
| compute come at huge and unreasonable prices, but there are
| also superb services in AWS like S3, SQS, DynamoDB, and
| Lambda.
|
| All major cloud providers have similar computation pricing.
| As much as I love Hetzner, they are not a major cloud
| provider.
| qaq wrote:
| DynamoDB is a horrible KV
| marcinzm wrote:
| In my experience the productivity cost on a larger company of
| physical servers is massive. The IT department is a
| monopolistic provider of compute services to other
| departments. At the same time IT competes with those
| departments for overall company budget. Needless to say the
| result of these mismatched economic and political incentives
| is a mess.
| tgv wrote:
| That's my guess too, but then the flowchart mentions 25
| requests. Per day!
| GavinAnderegg wrote:
| There's an explanation about that (and the other points in the
| flowchart) in the image shown further down in the blog post.
| You can see it here:
| https://www.vladionescu.me/posts/flowchart-how-should-i-run-...
| OJFord wrote:
| *more than, not less. (And then sibling comments/the annotated
| version explain why if you still think that's strange.)
| evoxmusic wrote:
| You should take a look at qovery.com - quite simple to deploy a
| containerized app on AWS
| fivea wrote:
| Cool link, although I think that if your deployment package
| is <100MB then it's preferable to use AWS Lambda by simply
| deploying the zip file archive instead of pushing a container
| image.
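|
| A minimal sketch of the zip route with boto3 (the function
| name, role ARN, and handler below are made-up placeholders):
|
|     import boto3
|
|     lam = boto3.client("lambda")
|     with open("app.zip", "rb") as f:
|         lam.create_function(
|             FunctionName="my-service",   # placeholder name
|             Runtime="python3.9",
|             Role="arn:aws:iam::123456789012:role/lambda-exec",
|             Handler="app.handler",       # module.function
|             Code={"ZipFile": f.read()},  # zip bytes, no registry needed
|         )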
| time0ut wrote:
| Last time I tried it (~May 2021), Lambda containers had terrible
| cold start times. Even a node hello world would cold start in 2-3
| seconds. Same code packaged as a ZIP file would cold start in
| less than 1/10th the time. Maybe it's better now?
| jayar95 wrote:
| It's been a year since I touched an AWS Lambda, but I'd bet
| money that cold starts are still an issue. There is a common
| hack that half works: have your function run every minute (you
| can use EventBridge rules for this); in the function handler,
| the first thing you should check is whether it's a warming
| event, and exit 0 if it is. Your results may vary (mine did
| lol)
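|
| Roughly, the handler check looks something like this (assuming
| a Python function and a plain EventBridge scheduled rule, whose
| events arrive with source "aws.events"):
|
|     def handler(event, context):
|         # Bail out early on warming pings from the scheduled rule.
|         if event.get("source") == "aws.events":
|             return {"warmed": True}
|
|         # ... real work for actual invocations goes here ...
|         return {"statusCode": 200, "body": "hello"}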
| InTheArena wrote:
| This is awful. It completely ignores the economics of leveraging
| commodity services and re-usable skills, as well as the true
| places where you can get maximal value if you are willing to
| accept vendor lock-in - the high-level services.
|
| This is more or less the equivalent of mandating that everyone
| use VMS once UNIX started to commodify, or use Windows for all
| of your servers once Linux had taken over the market. Using ECS
| + Fargate instead of EKS + Fargate provides no savings; all it
| serves to do is lock you into a single hyper-scaler's
| infrastructure, at the same time as K8s is forcing the cloud
| vendors into commoditization.
|
| Want to use AWS effectively? Depend on high level services like
| Glue, Athena, Kinesis Firehose, Sagemaker. Want to piss away any
| chance to run your business effectively? Leverage ECS.
|
| If you are a one-man shop, and you know ECS or are willing to
| depend on underbaked solutions because they solve a problem, more
| power to you. I suspect that you may benefit over your career by
| investing in more universal skill sets (alternatively, you may
| benefit from hyper-specializing on AWS toolsets as well).
| glogla wrote:
| I would very much say it's the other way around.
|
| AWS is good at giving you commodity infrastructure at scale.
| They are really good at "stupid" services like S3 or EC2 or
| RDS. The high-level services meanwhile? I worked with quite a
| few of them and they're mostly shit.
|
| Athena has an account-wide limit of 5 concurrent queries which
| cannot be increased. Even one larger dashboard will overload
| it.
|
| Redshift has a similar limit of 30 concurrent queries. That's
| good enough for casual use but not suitable for a larger
| company.
|
| Glue Catalog does not scale at all: having more than a few
| objects will break it, and you will very soon end up begging
| for a higher API limit due to throttling exceptions.
|
| Kinesis has very strange limits (in messages per second) that
| make it really expensive for use cases where traffic is peaky -
| which is quite a few streaming use cases.
|
| Just like you wouldn't use Amazon.com to buy high-quality,
| important goods (because you're pretty likely to get something
| broken or fake), don't use AWS "high level" services. Amazon is
| a company focused on scaling commodity use cases, not on
| engineering excellence.
| pilate wrote:
| I recently had my Athena limit increased to 25 for my
| account, so that's no longer true.
| glogla wrote:
| If they actually can increase the concurrency now, that's
| good news!
|
| We asked for that as well, and at first support said they
| did it, but it turns out they increased the Athena queue
| length instead of the concurrency - so the queries wouldn't
| fail (until the queue is full, anyway) but the throughput
| was still the same.
|
| Then the team of AWS account managers and solutions
| architects (who had assured us many times that the
| concurrency could be increased) got into an email discussion
| with the support and Athena teams, before coming back and
| telling us it could not be increased. No apology or anything.
| They recommended we run Presto on EMR instead.
|
| It was a sobering experience in how AWS operates.
| finnh wrote:
| > Amazon is a company focused on scaling commodity use cases,
| not on engineering excellence.
|
| I think of it slightly differently. S3 is clearly the product
| of engineering excellence (as is DynamoDB).
|
| But I think there are A teams and B teams at AWS, as you
| would expect from any company so large. And I get the sense
| that at least some of the newer, less proven solutions were
| written by the B teams. This likely includes some k8s-market-
| share-grabby offerings.
| glogla wrote:
| I heard the same story told slightly differently - for
| services that process your requests when you go to amazon.com
| to buy stuff, you can expect engineering excellence. That
| would be Dynamo, S3, EC2, networking (that whole side of AWS
| - hypervisors, networks, etc. - I'd call an engineering
| marvel).
|
| But that does not extend to software they build "for someone
| else", like Redshift and Glue and Sagemaker and a whole lot
| of other stuff. There's a whole lot of half-baked services
| among the ~200 they have.
| larrymyers wrote:
| If you have the desire to understand the individual components of
| running infrastructure for Docker containers, I'd suggest the
| full HashiCorp stack.
|
| Running Nomad, Vault, and Consul together isn't difficult, and
| will give you a better understanding of how to deploy 12-factor
| apps with good secret storage and service discovery.
|
| Add in Traefik for routing and you've got an equivalent stack to
| k8s, but you can actually understand what each piece is doing and
| scale each component as needed.
|
| If you're going to run this all on AWS you can stick to just EC2
| and not have to drown in documentation for each new abstraction
| AWS launches.
|
| As an added bonus nomad can run far more things than just
| containers, so you have an on-ramp for your legacy apps.
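|
| To give a flavour of the "secret storage and service discovery"
| bit, an app on this stack might do something like the following
| at startup (plain HTTP against Vault's KV v2 and Consul's health
| APIs; the addresses, secret path, and token are placeholders):
|
|     import os
|     import requests
|
|     VAULT = "http://127.0.0.1:8200"   # placeholder addresses
|     CONSUL = "http://127.0.0.1:8500"
|
|     # Read a secret from Vault (KV v2 engine mounted at "secret/").
|     r = requests.get(
|         f"{VAULT}/v1/secret/data/myapp/config",
|         headers={"X-Vault-Token": os.environ["VAULT_TOKEN"]},
|     )
|     db_password = r.json()["data"]["data"]["db_password"]
|
|     # Discover healthy instances of a service via Consul.
|     r = requests.get(f"{CONSUL}/v1/health/service/postgres",
|                      params={"passing": "true"})
|     nodes = [(e["Service"]["Address"], e["Service"]["Port"])
|              for e in r.json()]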
| tyingq wrote:
| I suppose one watch-out is to not get too comfortable with the
| _"isn't difficult"_ part. That seemed to be the root cause of
| the extended outage at Roblox: the _"it all works together"_
| bit lulled them into not really researching the impact of
| changes.
| amitkgupta84 wrote:
| How do you know that? I've only seen vague stuff from Roblox
| about the outage. Did you read between the lines in one of
| those posts and conclude it had to do with Hashistack
| interoperability? Would love to know what you read that
| implied that.
| tyingq wrote:
| Educated guess, based on things like:
|
| https://news.ycombinator.com/item?id=29063026
|
| Then reading back their vague statements like _" A core
| system in our infrastructure became overwhelmed, prompted
| by a subtle bug in our backend service communications"_ and
| seeing if that matches. It seems to.
|
| But, yes, I'm guessing.
| atomland wrote:
| I run multiple small EKS clusters at a small company. It doesn't
| cost anywhere near $1 million per year, even taking my salary
| into account. If you don't factor in my salary, it's maybe $50k
| per year, and that's for 4 clusters.
|
| Honestly this flowchart is kind of a mess, and I certainly
| wouldn't recommend it to anyone.
| dzikimarian wrote:
| Same here. Multiple Java/PHP applications on EKS. It got much
| better when we found a few guys who focused on resolving the
| issues instead of complaining about how hard Kubernetes is.
| jpgvm wrote:
| Yeah no. ECS is the worst of all worlds, Fargate made it less
| shit, it didn't make it good.
|
| If you don't have the people to do k8s then stick to Lightsail,
| don't do containers poorly just because you can.
|
| Half-assing it will just make everyone miserable and end up with
| a mishmash of "things that can run on ECS" and "things that
| can't because they need X", where X is really common/useful
| stuff like stateful volumes (EBS).
| DandyDev wrote:
| My experience with ECS is quite okay. You start a cluster,
| stick a container on it through a task and optionally a service
| + load balancer and it just works.
|
| It doesn't seem any harder than EKS, and it's mostly cheaper.
|
| I also find some comments on this article about vendor lock-in
| dubious, because in the end it's a bunch of containers created
| from Dockerfiles, which you can easily reuse elsewhere.
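|
| For a sense of how little there is to it, a rough boto3 sketch
| of that flow might look like this (the cluster name, image,
| subnet, and security group are placeholders; IAM roles and the
| load balancer are left out):
|
|     import boto3
|
|     ecs = boto3.client("ecs")
|
|     # 1. A cluster (with Fargate there are no instances to manage).
|     ecs.create_cluster(clusterName="demo")
|
|     # 2. A task definition describing the container to run.
|     task = ecs.register_task_definition(
|         family="web",
|         requiresCompatibilities=["FARGATE"],
|         networkMode="awsvpc",
|         cpu="256",
|         memory="512",
|         containerDefinitions=[{
|             "name": "web",
|             "image": "nginx:latest",   # placeholder image
|             "portMappings": [{"containerPort": 80}],
|         }],
|     )
|
|     # 3. A service that keeps N copies of the task running.
|     ecs.create_service(
|         cluster="demo",
|         serviceName="web",
|         taskDefinition=task["taskDefinition"]["taskDefinitionArn"],
|         desiredCount=2,
|         launchType="FARGATE",
|         networkConfiguration={"awsvpcConfiguration": {
|             "subnets": ["subnet-aaaa1111"],        # placeholder
|             "securityGroups": ["sg-bbbb2222"],     # placeholder
|             "assignPublicIp": "ENABLED",
|         }},
|     )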
| peakaboo wrote:
| It's much, much simpler than EKS, that's the huge selling
| point of Fargate.
|
| I've been running containers for many years on Fargate now
| and they literally never fail or give me any problems.
|
| Kubernetes on the other hand, weird issues of complexity all
| over the place. It can do more than Fargate but do you need
| it? If not, skip it.
| dmw_ng wrote:
| ECS can do EBS volumes via the Rex-Ray Docker plugin. Depending
| on how fast you need new instances to come up, installation can
| be a 5-liner in userdata
| CSDude wrote:
| Not a flowchart but has additional information.
| https://www.lastweekinaws.com/blog/the-17-ways-to-run-contai...
| bilalq wrote:
| Yeah, this article is the first thing that came to mind for me
| as well.
|
| I still need to explore how AppRunner compares to CodeBuild for
| this purpose.
| CSDude wrote:
| CodeBuild runs and dies. AppRunner is an HTTP server with
| autoscale based on concurrent requests.
| bilalq wrote:
| But AppRunner also scales down to zero (albeit with reduced
| instead of zero billing).
|
| For use-cases like distributed job scheduling, an
| EventBridge event triggering either an AppRunner request or
| a CodeBuild execution or an ECS task could work. It's still
| a little tricky to figure out what the best choice is.
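|
| As a rough illustration, wiring a scheduled EventBridge rule to
| an ECS task with boto3 might look like this (the rule name,
| role, cluster, task definition, and subnet are placeholders):
|
|     import boto3
|
|     events = boto3.client("events")
|
|     # A rule that fires on a schedule (could also be an event pattern).
|     events.put_rule(Name="nightly-job",
|                     ScheduleExpression="cron(0 3 * * ? *)")
|
|     # Point the rule at an ECS task so each firing launches one run.
|     events.put_targets(
|         Rule="nightly-job",
|         Targets=[{
|             "Id": "run-task",
|             "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/jobs",
|             "RoleArn": "arn:aws:iam::123456789012:role/events-ecs",
|             "EcsParameters": {
|                 "TaskDefinitionArn":
|                     "arn:aws:ecs:us-east-1:123456789012:task-definition/batch:1",
|                 "TaskCount": 1,
|                 "LaunchType": "FARGATE",
|                 "NetworkConfiguration": {"awsvpcConfiguration": {
|                     "Subnets": ["subnet-aaaa1111"],   # placeholder
|                 }},
|             },
|         }],
|     )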
| DandyDev wrote:
| Super informative article, thanks for that!
|
| It comes to the same conclusions that I intuitively had as
| well:
|
| - EKS if you really have to use k8s
| - ECS if you have a modicum of complexity but still want to
|   keep it simple
| - AppRunner if you just want to run a container
| gavinray wrote:
| I have never heard of AppRunner, thanks for posting this!
|
| I've used Fargate as I thought it was the easiest/cheapest route,
| but according to the notes on the chart: "From 2020-2021 the
| best option was/is ECS and Fargate".
| "Maybe from 2023-ish AppRunner will become the best option. It's
| a preview service right now but on the path to be awesome!"
| "It may not totally take over ECS on Fargate, but it will take
| over most common usecases."
|
| And according to this chart, AppRunner apparently is the service
| I ought to be using for most of my apps.
| CSDude wrote:
| AppRunner still does not have VPC support (it hasn't since
| launch). Otherwise a great service.
| ignoramous wrote:
| If one is bought into the AWS ecosystem, they could consider
| Lightsail Containers for cheaper Fargate-like devx:
| https://lightsail.aws.amazon.com/ls/docs/en_us/articles/amaz...
| krinchan wrote:
| Unfortunately, my enterprise-scale employer forced me onto ECS on
| EC2 for some of my apps for a very specific reason: Reserved
| Instance pricing. I think there's reserved-instance-like pricing
| for Fargate now. For one particular set of containers (a Java
| application we license from a vendor and then customize with
| plugins somewhat), the CPU and RAM requirements are fairly large,
| so the savings of ECS on EC2 with the longest Reserved Instance
| contract mean that I will forever be dealing with the idiocy of
| ASG-based Capacity Providers.
|
| For those not in the know, ASG-based Capacity Providers are hard
| to work with because they are largely immutable, so you end up
| having to create-then-delete for any changes that touch the
| capacity provider. A capacity provider cannot be deleted if the
| ASG has any instances. Many tools like Terraform's AWS provider
| will refuse to delete the ASG until the capacity provider is
| deleted. The Terraform provider just cannot properly reason about
| the process of discovering and scaling in the ECS tasks on the
| provider, scaling in the ASG, waiting for 0 instances, and then
| deleting the Capacity Provider. It's honestly beyond how
| providers are supposed to work.
|
| TL;DR: The flow chart is somewhat correct: Do everything in your
| power to run on ECS Fargate. It's mature enough and has excellent
| VPC and IAM support these days. Stay as far away from ECS on EC2
| as you can.
|
| As for EKS, I like it but this company runs on the whole
| "everyone can do whatever" so each team would have to run its own
| EKS cluster. If we had a centralized team providing a base k8s
| cluster with monitoring and what not built in for us to deploy
| on, I'd be more amenable to it. As it stands, I would have to
| learn both the development and ops AND security sides of running
| EKS for a handful of apps. ECS, while seeming similar on the
| surface, is much simpler and externalizes concepts like
| ingresses, load balancing, and persistence into the AWS concepts
| and tooling (CDK, CloudFormation, Terraform) you already know
| (one hopes).
| DandyDev wrote:
| Even without using reserved instance pricing, ECS on EC2 is
| much cheaper, isn't it? At work we use Hasura, which is written
| in Haskell and cannot be (easily?) run as a Lambda. Our
| alternative solution is to run it as a container on ECS. Given
| that it's a permanently running service, with Fargate we'd pay
| just to have it sit idle for half of the time, and Fargate is
| not cheap.
|
| Even when running non-reserved EC2 instances to make up our ECS
| cluster, it is cheaper than using Fargate.
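|
| As a rough back-of-the-envelope (using assumed, approximate
| us-east-1 on-demand prices from around this time - check the
| current price lists before relying on them):
|
|     # Approximate us-east-1 on-demand prices (circa 2021) --
|     # assumptions for illustration, not exact figures.
|     FARGATE_VCPU_HR = 0.04048   # per vCPU-hour
|     FARGATE_GB_HR = 0.004445    # per GB-hour
|     T3_SMALL_HR = 0.0208        # 2 vCPU (burstable), 2 GB RAM
|
|     hours = 730                 # ~1 month, always on
|
|     fargate = (1 * FARGATE_VCPU_HR + 2 * FARGATE_GB_HR) * hours
|     ec2 = T3_SMALL_HR * hours
|
|     print(f"Fargate 1 vCPU / 2 GB: ~${fargate:.0f}/mo")  # ~$36/mo
|     print(f"EC2 t3.small:          ~${ec2:.0f}/mo")      # ~$15/mo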
| jpgvm wrote:
| I would go as far as to say avoid ECS as much as possible. Using
| ECS heavily means using either CloudFormation or Terraform
| heavily, both of which are shitty tools (TF is probably the
| best tool in its class - haven't tried Pulumi yet - but that
| doesn't stop it from being shit).
|
| Importantly, both of them are almost impossible for "normal"
| developers to use with any level of competency. This leads to
| two inevitable outcomes: a) snowflakes and all the special
| per-app ops work that entails, and b) app teams pushing
| everything back to infrastructure teams because they either
| don't want to work with TF or can't be granted sufficient
| permissions to use it effectively.
|
| k8s solves these challenges much more effectively assuming your
| ops team is capable of setting it up, managing the cluster(s),
| namespaces, RBAC, etc and any base-level services like
| external-dns, some ingress provider, cert-manager, etc.
|
| Once you do this, app teams are able to deploy directly: they
| can use helm (eww, but it works) to spin up whatever off-the-
| shelf software they want, and they can write manifests in a
| way that's much harder to fuck up horribly.
|
| Best for both teams, ops and devs. Downside? It requires a
| competent ops team (hard to find) and also some amount of taste
| in tooling (use things like tanka to make manifests less shit),
| not to mention the time to actually spin all this up in peace
| without being pushed to do tons of ad-hoc garbage continually
| (i.e. a competent org).
|
| So in summary, k8s is generally the right solution for larger
| orgs because it enforces a better split of responsibilities and
| establishes a powerful, (relatively) easy-to-use API that can
| support practically everything.
|
| Also, in the future there are things like ACK
| (https://aws.amazon.com/blogs/containers/aws-controllers-for-...)
| coming which will further reduce the need for app teams to
| interact with TF or CloudFormation.
| twalla wrote:
| People focus way too much on the orchestration aspect of
| Kubernetes when the real draw for larger, more mature
| organizations is that it provides a good API for doing pretty
| much anything infrastructure or operations related. You can
| take the API primitives and build whatever abstractions you
| want on top of them to match up with how your org deploys and
| operates software. Once you get into stuff like Open
| Application Model and start marrying it to operators that can
| deploy AWS resources and whatever GitOps tool you want to
| use, you end up with a really nice, consistent interface for
| developers that removes your infrastructure folks as a
| bottleneck in the "Hi I want to deploy this new thing
| everywhere" process.
| jpgvm wrote:
| Couldn't agree more. It's the model that is good and most
| of the benefit is derived organisationally rather than
| technically.
| speedgoose wrote:
| If I don't want vendor lock-in, or a minimal amount, how should I
| run containers on AWS?
| InTheArena wrote:
| EKS + Fargate.
|
| It's that simple. If you need extensions into the AWS
| infrastructure, check out their CRD extensions that allow you
| to provision all of the infrastructure using K8s.
| bkanber wrote:
| Docker or kubernetes or any other orchestration software of
| your choice on EC2
| glenjamin wrote:
| If the containerized app you're deploying follows 12 factor
| principles it's very unlikely that you'll be locked in due to
| specific functionality
|
| The cost to move your operations expertise to another platform
| and learn all of its new quirks might be significant though.
| speedgoose wrote:
| Yes it's significant, that's why I don't like vendor lock-in.
| It's very simple to use VMs on various cloud providers but as
| soon as you use their more integrated products, it can be a
| disaster.
| atomland wrote:
| Well, to begin with, I think people worry too much about vendor
| lock-in. Use the tools your cloud vendor provides to make your
| life easier. Isn't that one of the reasons you chose them?
|
| That said, moving containers to another container orchestrator
| isn't terribly difficult, so I don't personally worry about
| vendor lock-in for containerized workloads. If your workloads
| have dependencies on other vendor-specific services, that's a
| different story, but basically a container is easy to move
| elsewhere.
| jayar95 wrote:
| Vendor lock-in can be a real issue as you scale. I work at a
| company going through a hypergrowth phase, and we're more or
| less locked into Azure, which has been objectively awful for
| us
| speedgoose wrote:
| My current company has a cluster on Amazon, one on Azure, and
| used to have one at a local hosting company. So it's very
| important not to be locked in too much to one cloud provider.
|
| We also chose our current cloud providers because the
| alternatives were worse.
| lysecret wrote:
| Our infra is running on ECS because we set it up right before
| Docker on Lambda, haha. Now we don't have the time to switch. It
| would be much better for us though.
| sombremesa wrote:
| You don't know about lightsail containers? Why make a flowchart
| that omits one of the simpler solutions in favor of complex ones?
|
| Agreed with the other commenters that this flowchart is not to be
| recommended, for this and other reasons.
| tyingq wrote:
| I'd personally omit them, assuming they are subject to the same
| crazy CPU throttling that Lightsail VPS servers come with. Too
| much work to predict when you can actually use it, e.g.:
| https://aws.amazon.com/blogs/compute/proactively-monitoring-...
|
| (Look at where the top of the "sustainable zone" is...5%? Like
| 1/20th of a vCPU?)
___________________________________________________________________
(page generated 2021-11-14 23:02 UTC)