[HN Gopher] Making EC2 boot time faster
___________________________________________________________________
Making EC2 boot time faster
Author : jacobwg
Score : 160 points
Date : 2024-05-23 14:31 UTC (8 hours ago)
(HTM) web link (depot.dev)
(TXT) w3m dump (depot.dev)
| amluto wrote:
| I don't use EC2 enough to have played with this, but a big part
| here is the population of the AMI into the per-instance EBS
| volume.
|
| ISTM one could do much better with an immutable/atomic setup: set
| up an immutable read-only EBS volume, and have each instance
| share that volume and have a per-instance volume that starts out
| blank.
|
| Actually pulling this off looks like it would be limited by the
| rules of EBS Multi-Attach. One could have fun experimenting with
| an extremely minimal boot AMI that streams a squashfs or similar
| file from S3 and unpacks it.
|
| edit: contemplating a bit, unless you are willing to babysit your
| deployment and operate under serious constraints, EBS multi-
| attach looks like the wrong solution. I think the right approach
| would be to build a very, very small AMI that sets up a rootfs using
| s3fs or a similar technology and optionally puts an overlayfs on
| top. Alternatively, it could set up a block device backed by an
| S3 file and optionally use it as a base layer of a device-mapper
| stack. There's plenty of room to optimize this.
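|
| A minimal sketch of what that early-boot step could look like
| (hypothetical bucket/object names; assumes curl, squashfs and
| overlayfs support are available in the initramfs):
|
|       #!/bin/sh
|       # Stream the immutable root image from S3 and keep it in RAM.
|       mkdir -p /run/rootfs /run/rw /mnt/newroot
|       curl -sf https://example-bucket.s3.amazonaws.com/rootfs.squashfs \
|           -o /run/rootfs.squashfs
|       # Read-only squashfs below, writable tmpfs layered on top.
|       mount -t squashfs -o loop /run/rootfs.squashfs /run/rootfs
|       mount -t tmpfs tmpfs /run/rw
|       mkdir -p /run/rw/upper /run/rw/work
|       mount -t overlay overlay -o \
|           lowerdir=/run/rootfs,upperdir=/run/rw/upper,workdir=/run/rw/work \
|           /mnt/newroot
|       # Hand off to the real init on the assembled root.
|       exec switch_root /mnt/newroot /sbin/init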
| mdaniel wrote:
| I believe they addressed this in their post because one cannot
| (currently?) `aws ec2 run-instances --volume-id vol-cafebabe`,
| rather one can only tell AWS what volume parameters to use when
| they _create_ the root device. Your theory may still be sound
| about using some kind of super bare-bones AMI, but there is no
| way to say "hey, friend, use this existing EBS volume as your
| root volume; don't create a new one."
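|
| For reference, the knob that does exist is overriding the root
| volume's parameters (not its identity) at launch, roughly like
| this (placeholder IDs; the device name must match the AMI's root
| device):
|
|       aws ec2 run-instances \
|           --image-id ami-0123456789abcdef0 \
|           --instance-type m7i.large \
|           --block-device-mappings '[{
|             "DeviceName": "/dev/xvda",
|             "Ebs": {"VolumeType": "gp3", "Iops": 16000,
|                     "Throughput": 1000, "DeleteOnTermination": true}
|           }]'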
| stingraycharles wrote:
| Isn't EBS multi-attach only available for the (very expensive)
| io1 / io2 volume types?
| amluto wrote:
| Hmm, it does look like it, although one could carefully use
| large IO.
|
| But the bigger issue might be durability. Most EBS types have
| rather low quoted durability, and, for a shared volume like
| this, that's a problem. Using S3 instead would be better all
| around except for the smallish engineering effort and
| deployment effort needed.
|
| Getting a tool like mkosi to generate a boot-from-S3 setup
| should be straightforward. Converting most any bootable
| container should also be doable, even automatically.
| Converting an AMI would involve more heuristics and be more
| fragile, but it ought to work reliably with most modern Linux
| distros.
| Szpadel wrote:
| We used s3fs in production. Please don't use it: it's
| unreliable, has unpredictable failure modes, and can bring the
| whole instance down. If you really need something like that,
| use rclone mount instead.
| attentive wrote:
| That's reinventing EBS/AMI/snapshots. They already do exactly
| that, i.e. data moves lazily from S3 to EBS/EC2.
| maccard wrote:
| I don't use GHA as some of our code is stored in Perforce, but
| we've faced the same challenges with EC2 instance startup times
| on our self managed runners on a different provider.
|
| We would happily pay someone like depot for "here's the AMI I
| want to run & autoscale, can you please do it faster than AWS?"
|
| We hit this problem with containers too - we'd _love_ to just run
| all our CI on something like fargate and have it automatically
| scale and respond to our demand, but the response times and rate
| limiting are just _so slow_ that instead we just end up
| starting/stopping instances with a lambda, which feels so 2014.
| CaptainOfCoit wrote:
| > We would happily pay someone like depot for "here's the AMI I
| want to run & autoscale, can you please do it faster than AWS?"
|
| Change that to "here's the ISO/IMG I want to run & autoscale,
| can you please do it faster than AWS?" and you'll have tons of
| options. Most platforms using Firecracker would most likely be
| faster, maybe try to use that as a search vector.
| maccard wrote:
| Can you maybe share some examples? We're fine to use other
| image formats, but a lot of the value of AWS is that the
| services interact, IAM works nicely together, etc.
|
| Fly.io comes up often [0] on HN, but there's an overwhelming
| amount of "it's a nice idea, but it just doesn't work"
| feedback on it.
|
| [0] https://news.ycombinator.com/item?id=39363499
| everfrustrated wrote:
| Out of curiosity what CI system are you using with Perforce?
| maccard wrote:
| We use Buildkite with a customised version of
| https://github.com/improbable-eng/perforce-buildkite-plugin/
|
| Our game code is in P4, but our backend services are on GH.
| Having a single CI system means we get easy interop e.g. game
| updates can trigger backend pipelines and vice versa.
|
| In the past I've used TeamCity, Jenkins, and
| ElectricCommander(!)
| Szpadel wrote:
| I haven't fully investigated Fargate's limitations, but I think
| it would be possible to use any k8s-native CI on EKS + Fargate,
| maybe even use KubeVirt for VM creation? From my exploration of
| Fargate with EKS, AWS provisioned capacity in the region of ~1s.
| maccard wrote:
| > AWS offers something very similar to this approach called
| warm pools for EC2 Auto Scaling. This allows you to define a
| certain number of EC2 instances inside an autoscaling group
| that are booted once, perform initialization, then shut down,
| and the autoscaling group will pull from this pool of compute
| first when scaling up.
|
| > While this sounds like it would serve our needs,
| autoscaling groups are very slow to react to incoming
| requests to scale up. From experimentation, it appears that
| autoscaling groups may have a slow poll loop that checks if
| new instances are needed, so the delay between requesting a
| scale up and the instance starting can exceed 60 seconds. For
| us, this negates the benefit of the warm pool.
|
| I pulled this from the article, but it's the same problem.
| Technically yes, eks + fargate works. In practice the
| response times from "thing added to queue" to "node is
| responding" is minutes with that setup.
| immibis wrote:
| There's something to be said about building a tower of abstractions
| and then trying to tear it back down. We used to just run a
| compiler on a machine. Startup time: 0.001 seconds. Then we'd run
| a Docker container on a machine. Startup time: 0.01 seconds.
| Fine, if you need that abstraction. Now apparently we're booting
| full VMs to run compilers - startup time: 5 seconds. But that's
| not enough, because we're also allocating a bunch of resources in
| a distributed network - startup time: 40 seconds.
|
| Do we actually need all this stuff, or does it suffice to get one
| really powerful server (price less than $40k) and run Docker on
| it?
| cjk2 wrote:
| I'm mostly just running the (Go) compiler on my laptop which is
| considerably faster than in Docker and considerably cheaper
| than the server...
|
| I mean a low-end M3 MacBook has the same compile time as an
| i9-14900k. God knows what an equivalent Xeon/Epyc costs...
| immibis wrote:
| Maybe your container isn't set up right - Docker containers run
| directly on the host, just partitioned off from accessing
| stuff outside of themselves with the equivalent of chroot. Or
| it could be a Mac-specific thing. Docker only works that way
| on Linux, and has to emulate Linux on other platforms.
| yjftsjthsd-h wrote:
| Right, they said they're on a macbook so unless they're
| going out of their way to run Linux bare-metal it has to
| use a VM. And AIUI there are extra footguns in that
| situation, especially that mapping volumes from the host is
| slower because instead of just telling the kernel to make
| the directory visible you have to actually share from the
| host to the VM.
|
| See also: https://reece.tech/posts/osx-docker-performance/
|
| See also: https://docs.docker.com/desktop/settings/mac/
|
| > Shared folders are designed to allow application code to
| be edited on the host while being executed in containers.
| For non-code items such as cache directories or databases,
| the performance will be much better if they are stored in
| the Linux VM, using a data volume (named volume) or data
| container.
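|
| Concretely, the difference the docs are describing is bind mount
| vs named volume, e.g. (illustrative Go build; image tag and paths
| are arbitrary):
|
|       # Bind mount: every file op crosses the host<->VM boundary on
|       # macOS, so dependency-heavy builds crawl.
|       docker run --rm -v "$PWD:/src" -w /src golang:1.22 go build ./...
|
|       # Keep the hot cache in a named volume that lives inside the
|       # Linux VM instead.
|       docker volume create gomodcache
|       docker run --rm -v "$PWD:/src" -v gomodcache:/go/pkg/mod \
|           -w /src golang:1.22 go build ./...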
| cjk2 wrote:
| Why would I use docker? You don't have to use it. I'm just
| generating static binaries.
|
| Does anyone understand how to do stuff without containers
| these days?
| skydhash wrote:
| I'm using VMs these days because of conflicts and
| inconsistencies between tooling. But the VM is dedicated
| to one project and I set it up just like a real machine
| (GUI, browser, and stuff). No file sharing. It's been a
| blast.
| rfoo wrote:
| Because you just said:
|
| > which is considerably faster than on docker
|
| And we are curious why that is, because we not only
| understand how to do stuff without containers, we _also_
| understand how containers work and your claim sounds off.
| cjk2 wrote:
| I don't understand what you are saying.
|
| I'm saying it is slower on docker due to container
| startup, pulling images, overheads, working out what
| incantations to run, filesystem access, network
| weirdness, things talking to other things, configuration
| required, pull limits, API tokens, all sorts.
|
| Versus "go run"
| benwaffle wrote:
| reminds me of https://world.hey.com/dhh/we-re-moving-
| continuous-integratio...
| cjk2 wrote:
| Yep.
|
| And you usually get lumbered with some shitty thing like
| GitHub Actions, which consumes one mortal full-time to keep
| it working, goes down twice a month (wasn't it down just
| yesterday?), takes bloody forever to build anything, and is
| impossible to debug.
|
| Edit: and MORE YAML HELL!
| mike_hearn wrote:
| A really powerful server should not cost you anywhere near $40k
| unless you're renting bare metal in AWS or something like that.
|
| Getting rid of the overhead is possible but hard, unless you're
| willing to sacrifice things people really want.
|
| 1. Docker. Adds a few hundred msec of startup time to
| containers, configuration complexity, daemons, disk caches to
| manage, repositories .... a lot of stuff. In rigorously
| controlled corp environments it's not needed. You can just have
| a base OS distro that's managed centrally and tell people to
| target it. If they're building on e.g. the JVM then Docker
| isn't adding much. I don't use it on my own company's CI
| cluster, for example; it's just raw TeamCity agents on raw
| machines.
|
| 2. VMs. Clouds need them because they don't trust the Linux
| kernel to isolate customers from each other, and they want to
| buy the biggest machines possible and then subdivide them.
| That's how their business model works. You can solve this a few
| ways. One is something like Firecracker where they make a super
| bare bones VM. Another would be to make a super-hardened
| version of Linux, so hardened that people trust it to provide inter-
| tenant isolation. Another way would be a clean room kernel
| designed for security from day one (e.g. written in Rust, Java
| or C#?)
|
| 3. Drives on a distributed network. Honestly not sure why this
| is needed. For CI runners entirely ephemeral VMs running off
| read only root drive images should be fine. They could swap to
| local NVMe storage. I think the big clouds don't always like to
| offer this because they have a lot of machines with no local
| storage whatsoever, as that increases the density and allows
| storage aggregation/binpacking, which lowers their costs.
|
| Basically a big driver of overheads is that people want to be
| in the big clouds because it avoids the need to do long term
| planning or commit capital spend to CI, but the cloud is so
| popular that providers want to pack everyone in as tightly as
| possible which requires strong isolation and the need to avoid
| arbitrary boundaries caused by physical hardware shapes.
| necovek wrote:
| How do you get Docker container startup time of 0.01s with any
| real-life workload (yes, I know they are just processes, so you
| could build a simple "hello world" thing, but I'd be surprised
| if even that runs this fast)?
|
| Do you have an example image and network config that would
| demonstrate that?
|
| (I'd love to understand the performance limits of Docker
| containers, but never played with them deeply enough since they
| are usually in >1s space which is too slow for me to care)
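|
| (A rough way to put a number on it yourself -- image tag is
| arbitrary, and on macOS the VM layer will dominate:)
|
|       docker pull alpine:3.19
|       time docker run --rm alpine:3.19 true   # container overhead
|       time true                               # bare process, for contrast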
| iudqnolq wrote:
| That doesn't solve the same problem.
|
| GitHub Actions in the standard setup needs to run untrusted
| code and so you essentially need a VM.
|
| You can lock it down at the cost of sacrificing features and
| usability, but that's a tradeoff.
| develatio wrote:
| Maybe AWS should actually look into this. I know comparing
| AWS to other (smaller) cloud providers is not totally fair given
| the size of AWS, but for example creating / booting an instance
| in Hetzner takes a few seconds.
| matt-p wrote:
| What's size got to do with boot time? Serious question.
| RationPhantoms wrote:
| More eyes employed on an issue, and the ability to pay
| best-in-class engineers to take a look.
| CaptainOfCoit wrote:
| Smaller companies are faster and more nimble than larger
| corporations.
| develatio wrote:
| By "the size" I meant to say "the size of the
| infrastructure", meaning that AWS has to manage orders of
| magnitude more instances than Hetzner. This may well
| contribute to "things" being slower.
| londons_explore wrote:
| Arguably it can also make things faster. A small provider
| might need to migrate other instances around to make space
| for your new instance, whereas a big provider almost
| certainly can satisfy your request from existing free
| capacity, and it should therefore be a matter of
| milliseconds to identify the physical machine your new VM
| will run on.
| playingalong wrote:
| Likely they mean that, following Conway's law, there are more
| abstraction layers involved at AWS.
| tekla wrote:
| They have, and I know this because I've hammered them on it:
| we demand thousands of instances to autoscale very
| aggressively in 1-3 minutes. Very few people give a shit about
| initialization times. They care more about instance ready times
| which is constrained by the OS that is running.
| everfrustrated wrote:
| Hetzner does not offer network block storage comparable to
| EBS that can be used as a root (bootable) file system. AWS
| locally attached ephemeral disks are also immediately available
| but cannot be seeded with data (same as Hetzner: they are wiped
| clean ahead of boot).
| andersa wrote:
| This is an advantage. EBS is terrible! Literally orders of
| magnitude slower than modern SSDs.
| tekla wrote:
| EBS is great for workloads that don't require SSDs, which
| most don't.
|
| If it does, you can use provisioned IOPS, which will get you a
| lot more, or go with NVMe.
| Nextgrid wrote:
| Even provisioned won't get you the access times of a
| direct-attached SSD. Speed of light and all that - EBS is
| using the network under the hood, it's not a direct
| connection to the host.
| tekla wrote:
| Yes, I know, and? That's why I mentioned NVMe.
| stingraycharles wrote:
| Depends on your definition of slow. Throughput-wise, I
| think it's fairly decent -- we typically set up 4 EBS
| volumes in raid0 and get 4GB/sec for a really decent price.
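|
| (For reference, that kind of setup is just a plain stripe across
| the attached volumes -- device names and mount point below are
| assumptions; on Nitro instances EBS volumes show up as
| /dev/nvmeXn1:)
|
|       sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
|           /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
|       sudo mkfs.ext4 /dev/md0
|       sudo mount /dev/md0 /data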
| Nextgrid wrote:
| Sequential throughput _can_ be fine. Random access is
| always going to be orders of magnitude slower than a
| direct-attach disk.
|
| Remember why we switched from spinning hard drives to
| SSDs? Well EBS is like going back to a spinning drive.
| torginus wrote:
| It also takes a few seconds on AWS. The guy is comparing
| setting up a whole new machine from an image, with network and
| all, to turning on a stopped EC2 instance.
|
| The latter takes a few seconds, the former is presumably
| longer. This is the great revelation of this blog post.
| dylan604 wrote:
| wait, restarting a stopped machine is faster than launching
| an AMI from scratch is a great revelation?
|
| That's like saying waking your MacbookPro is faster than
| booting from powered off state. Of course it is, and that's
| precisely why the option exists.
| mdeeks wrote:
| If you aren't familiar with how EBS works and how volumes
| are warmed, then yes, this is an interesting blog post. Not
| everyone is an expert. They become experts by reading
| things like this and learning.
|
| If you didn't know about this EBS behavior it would be
| logical to assume that booting from scratch is roughly
| equivalent to starting/stopping/starting again.
| jpambrun wrote:
| I think this is unexpected. I expected that once created,
| my boot volume would have the same performance on the first
| boot as on the second. It's really not obvious that the
| volume starts out empty and is lazily loaded from S3. The
| proposed workaround is also a bit silly: read all blocks
| one by one even though maybe 1% of the blocks have anything in
| them on a new VM. This is actually a revelation.
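|
| (The "read all blocks" workaround is literally that -- device
| name below is an assumption:)
|
|       # Sequentially touch every block so EBS pulls it from S3.
|       sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M status=progress
|
|       # Or in parallel with fio, as the EBS docs suggest for volumes
|       # restored from snapshots.
|       sudo fio --filename=/dev/nvme0n1 --rw=read --bs=1M --iodepth=32 \
|           --ioengine=libaio --direct=1 --name=volume-initialize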
| attentive wrote:
| It depends on instance type and OS, and can be really short on
| EC2.
| everfrustrated wrote:
| It's too bad that EBS doesn't natively support Copy-On-Write.
|
| Snapshots are persisted into S3 (transparently to the user) but
| it means each new EBS volume spawned doesn't start at full IOPS
| allocation.
|
| I presume this is due to EBS volumes being AZ-specific, so to be
| able to launch an AMI-seeded EBS volume in any AZ it needs to go
| via S3 (which is multi-AZ).
| Twirrim wrote:
| EBS volumes are "expensive" compared to S3, due to the
| limitations of what you can do with live block volumes +
| replicas, vs S3. It takes more disk space to have an image be a
| provisioned volume ready to be used for copy-on-write, vs
| having it as something backed up in S3. So the incentives
| aren't there vs just trying to make the volume creation process
| as smooth and fast as possible.
|
| I'd guess it's likely that EBS is using a tiered caching
| system, where they'll keep live volumes around for Copy-on-
| write cloning for the more popular images/snapshots, with
| slightly less popular images maybe stored in an EBS cache of
| some form, before it goes all the way back to S3. You're just
| not likely to end up getting a live volume level of caching
| until you hit a certain threshold of launches.
| cmckn wrote:
| You can enable fast restore on the EBS snapshot that backs your
| AMI: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-
| sn...
|
| It's not cheap, but it speeds things up.
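|
| It's enabled per snapshot, per AZ (placeholder IDs and AZs):
|
|       aws ec2 enable-fast-snapshot-restores \
|           --availability-zones us-east-1a us-east-1b \
|           --source-snapshot-ids snap-0123456789abcdef0
|
|       # Poll until the state is "enabled" -- only then do new volumes
|       # get full performance immediately.
|       aws ec2 describe-fast-snapshot-restores \
|           --filters Name=snapshot-id,Values=snap-0123456789abcdef0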
| stingraycharles wrote:
| $540/month per EBS volume per AZ. And it's still fairly
| limited: at a maximum of 8 credits, it wouldn't come close to
| covering the use case described in the article (launching 50
| instances quickly).
| bingemaker wrote:
| Curious, how do you measure the time taken for those 4 steps
| listed in "What takes so long?" section?
| waiwai933 wrote:
| I believe this is similar to EC2 Fast Launch which is available
| for Windows AMIs, but I don't know exactly how that works under
| the hood.
|
| https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-a...
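|
| If I remember the CLI shape right, it's enabled per Windows AMI,
| something like this (placeholder ID; the exact flags may have
| drifted):
|
|       aws ec2 enable-fast-launch \
|           --image-id ami-0123456789abcdef0 \
|           --resource-type snapshot \
|           --max-parallel-launches 6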
| necovek wrote:
| From a technical perspective, Amazon has actually optimized this
| but turned that into "serverless functions": their ultra-
| optimized image paired with Firecracker achieves ultra-fast boot-
| up of virtual Linux machines. IIRC from when Firecracker was
| being introduced, they boot in sub-second times.
|
| I wonder if Amazon would ever decide to offer booting the same
| image with the same hypervisor in EC2 as they do for lambdas?
| arianvanp wrote:
| And AWS now has a product to spin up Lambdas for GitHub Actions
| CI runners
|
| https://docs.aws.amazon.com/codebuild/latest/userguide/actio...
| cr125rider wrote:
| Fargate is an alternative that runs on Firecracker as well.
| It's hidden behind ECS and EKS, however.
| 20thr wrote:
| 100% -- EC2's general purpose nature is not in my opinion the
| best fit for ephemeral use-cases. You'll be constantly fighting
| the infrastructure as the set of trade-offs and design goals
| are widely different.
|
| This is why CodeSandbox, Namespace, and even fly.io built
| special-purpose architectures to guarantee extremely fast
| start-up times.
|
| In the case of Namespace it's ~2 sec on cold boots, with a set
| of user-supplied containers and storage allocations.
|
| (Disclaimer, I'm with Namespace -- https://namespace.so)
| crohr wrote:
| > while we can boot the Actions runner within 5 seconds of a job
| starting, it can take GitHub 10+ seconds to actually deliver that
| job to the runner
|
| This. I went the same route with regards to boot time
| optimisations for [1] (cleaning up the AMI, cloud-init, etc.),
| and can boot a VM from cold in 15s (I can't rely on prewarming
| pools of machines -- even stopped -- since RunsOn doesn't share
| machines with multiple clients and this would not make sense
| economically).
|
| But the time for the official runner binary to load and then
| get assigned a job by GitHub is always around 8s, which is
| more than half of the VM boot time :( At some point it would be
| great if GitHub could give us a leaner runner binary with less
| legacy stuff, and tailored for ephemeral runners (that, or
| reverse-engineer the protocol).
|
| [1] https://runs-on.com
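|
| (For context, the ephemeral-runner flow being described is roughly
| this -- placeholder URL and token -- and the ~8s lives inside these
| two steps:)
|
|       # Register a one-shot runner that unregisters after a single job.
|       ./config.sh --url https://github.com/my-org/my-repo \
|           --token <registration-token> --ephemeral --unattended
|       # Connect to GitHub, wait for a job assignment, run it, exit.
|       ./run.sh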
| suryao wrote:
| This is a very cool optimization.
|
| I make a similar product offering fast GitHub Actions runners[1]
| and we've been down this rabbit hole of boot time optimization.
|
| Eventually, we realized that the best solution is to actually
| build scale. There are two factors in your favor then: 1) Spikes
| are less pronounced and the workloads are a lot more predictable.
| 2) The predictability means that you have a decent estimate of
| the workload to expect at any given time, within reason for
| maintaining an efficient warm pool.
|
| This enables us to simplify the stack and not have high-
| maintenance optimizations while delivering a great user experience.
|
| We have some pretty heavy use customers that enable us to do
| this.
|
| [1] https://www.warpbuild.com
| Nextgrid wrote:
| I don't get why they're using EBS here to begin with. EBS trades
| off cost and performance for durability. It's slow because it's a
| network-attached volume that's most likely also replicated under
| the hood. You use this for data that you need high durability
| for.
|
| It looks like their use-case fetches all the data it needs from
| the network (in the form of the GH Actions runner getting the job
| from GitHub, and then pulling down Docker containers, etc).
|
| What they need is a minimal Linux install (Arch Linux would be
| good for this) in a squashfs/etc and the only thing in EBS should
| be an HTTP-aware boot loader like iPXE, or a kernel+initrd capable
| of pulling down the squashfs from S3 and running it from memory.
| Local "scratch space" storage for the build jobs can be provided
| by the ephemeral NVMe drives, which are also direct-attached and
| much faster than EBS.
| jedberg wrote:
| By using EBS, they don't have to wait for the disk to fill from
| the network on second and subsequent boots.
| Nextgrid wrote:
| Ah so they are keeping the machines around? Do they need to
| do that - does the GH runner actually persist anything worth
| keeping in between runs?
| jedberg wrote:
| They keep the instances in a "stopped" state, which means
| keeping the EBS volume around (and paying for it) but not
| paying for the instance (which could be a different machine
| when you turn it back on, which is why you can't load it into
| scratch space and then stop it).
|
| What's on the EBS volume is their Docker image, so they don't have
| to load it back up again.
| Nextgrid wrote:
| Makes sense. I still think it would be cheaper to just
| reload it from S3 (straight into memory, not using EBS at
| all) on every boot. The entire OS shouldn't be more than
| a gigabyte which is quite fast to download as a bulk
| transfer straight into RAM.
| jedberg wrote:
| Yes it would be cheaper, but the whole point of this
| article is trading off cost for faster boot times. They
| address your points in the article: it's faster to
| boot off a warm EBS volume than to load from scratch.
| jedberg wrote:
| Boot time is the number one factor in your success with auto-
| scaling. The smaller your boot time, the smaller your prediction
| window needs to be. E.g., if your boot time is five minutes, you
| need to predict what your traffic will be in five minutes, but if
| you can boot in 20 seconds, you only need to predict 20 seconds
| ahead. By definition your predictions will be more accurate the
| smaller the window is.
|
| But! Autoscaling serves two purposes. One is to address load
| spikes. The other is to reduce costs with scaling down. What this
| solution does is trade off some of the cost savings by prewarming
| the EBS volumes and then paying for them.
|
| This feels like a reasonable tradeoff if you can justify the cost
| with better auto-scaling.
|
| And if you're not autoscaling, it's still worth the cost if the
| alternative is having your engineers wait around for instance
| boots.
| sfilmeyer wrote:
| >By definition your predictions will be more accurate the
| smaller the window is.
|
| Small nit, and this doesn't detract from your points. I don't
| think this is universally true by definition, even if it is
| almost always true. You could come up with some rare conditions
| where your traffic at t+5 minutes is actually easier to predict
| than at t+20 seconds. Of course, even in that case you're
| better off (ceteris paribus) being able to spin things up in 20
| seconds.
| jedberg wrote:
| I can come up with a lot of examples where it is easier to
| predict further out[0], but that also means I can predict
| them 20 seconds out. :)
|
| [0] For example I can tell you exactly when spikes will
| happen to Netflix's servers on Saturday morning (because the
| kids all get up at the same time). And I can tell you there
| will be spikes on the hour during prime time as people shift
| from linear TV to streaming (or at least they did a lot more
| 10 years ago!). I can also tell you when spikes to Alexa will
| be because I already know what times people's alarms are set
| for.
| paulddraper wrote:
| > From a billing perspective, AWS does not charge for the EC2
| instance itself when stopped, as there's no physical hardware
| being reserved; a stopped instance is just the configuration that
| will be used when the instance is started next. Note that you do
| pay for the root EBS volume though, as it's still consuming
| storage.
|
| Shut-down standbys are absolutely the way to do it.
|
| Does AWS offer anything for this? It's very tedious to set up
| yourself.
| tekla wrote:
| Warm pools
| paulddraper wrote:
| yep, that's it, thank you kind person
| mnutt wrote:
| They talk about the limitations of the EC2 autoscaler and mention
| calling LaunchInstances themselves, but are there any autoscaler
| service projects for EC2 ASGs out there? The AWS-provided one is
| slow (as they mention), annoyingly opaque, and has all kinds of
| limitations like not being able to use Warm Pools with multiple
| instance types etc.
| fduran wrote:
| So I've created ~300k EC2 instances with SadServers and my
| experience was that starting an EC2 VM from stopped took ~30
| seconds and creating one from AMI took ~50 seconds.
|
| Recently I decided to actually look at boot times since I store
| in the db when the servers are requested and when they become
| ready and it turns out for me it's really bi-modal; some take
| about 15-20s and many take about 80s, see graph
| https://x.com/sadservers_com/status/1782081065672118367
|
| Pretty baffled by this (same region, same pretty much
| everything), any idea why? Definitely going to try the trick
| in the article.
| fletchowns wrote:
| Perhaps in one case you are getting a slice of a machine that
| is already running, versus AWS powering up a machine that was
| offline and getting a slice of that one?
___________________________________________________________________
(page generated 2024-05-23 23:00 UTC)