[HN Gopher] WebSockets cost us $1M on our AWS bill
___________________________________________________________________
WebSockets cost us $1M on our AWS bill
Author : tosh
Score : 197 points
Date : 2024-11-06 18:50 UTC (4 hours ago)
(HTM) web link (www.recall.ai)
(TXT) w3m dump (www.recall.ai)
| handfuloflight wrote:
| Love the transparency here. Would also love if the same
| transparency was applied to pricing for their core product.
| Doesn't appear anywhere on the site.
| lawrenceduk wrote:
| It's ok, it's now a million dollars/year cheaper when your
| renewal comes up!
|
| Jokes aside though, some good performance sleuthing there.
| DrammBA wrote:
| I use that as a litmus test when deciding whether to use a
| service: if I can't find a prominently linked pricing page on
| the homepage, I'm out.
| hipadev23 wrote:
| what was the actual cost? cpu?
| cynicalsecurity wrote:
| They are desperately trying to blame anyone except themselves.
| cosmotic wrote:
| Why decode, only to turn around and re-encode?
| ketzo wrote:
| I had the same question, but I imagine that the "media
| pipeline" box with a line that goes directly from "compositor"
| to "encoder" is probably hiding quite a lot of complexity
|
| Recall's offering allows you to get "audio, video, transcripts,
| and metadata" from video calls -- again, total conjecture, but
| I imagine they do need to decode into raw format in order to
| split out all these end-products (and then re-encode for a
| video recording specifically.)
| pavlov wrote:
| Reading their product page, it seems like Recall captures
| meetings on whatever platform their customers are using: Zoom,
| Teams, Google Meet, etc.
|
| Since they don't have API access to all these platforms, the
| best they can do to capture the A/V streams is simply to join
| the meeting in a headless browser on a server, then capture the
| browser's output and re-encode it.
| MrBuddyCasino wrote:
| They're already hacking Chromium. If the compressed video
| data is unavailable in JS, they could change that instead.
| moogly wrote:
| They did what every other startup does: put the PoC in
| production.
| pavlov wrote:
| If you want to support every meeting platform, you can't
| really make any assumptions about the data format.
|
| To my knowledge, Zoom's web client uses a custom codec
| delivered inside a WASM blob. How would you capture that
| video data to forward it to your recording system? How do
| you decode it later?
|
| Even if the incoming streams are in a standard format,
| compositing the meeting as a post-processing operation from
| raw recorded tracks isn't simple. Video call participants
| have gaps and network issues and layer changes, you can't
| assume much of anything about the samples as you would with
| typical video files. (Coincidentally this is exactly what
| I'm working on right now at my job.)
| Szpadel wrote:
| my guess is either that the video they get uses some proprietary
| encoding format (js might do some magic on the feed) or that it's
| a latency-optimized stream that consumes a lot of
| bandwidth
| a_t48 wrote:
| Did they consider iceoryx2? From the outside, it feels like it
| fits the bill.
| ComputerGuru wrote:
| I don't mean to be dismissive, but this would have been caught
| very early on (in the planning stages) by anyone that had/has
| experience in system-level development rather than full-stack web
| js/python development. Quite an expensive lesson for them to
| learn, even though I'm assuming they _do_ have the talent
| somewhere on the team if they're able to maintain a fork of
| Chromium.
|
| (I also wouldn't be surprised if they had even more memory copies
| than they let on, marshalling between the GC-backed JS runtime
| and the GC-backed Python runtime.)
|
| I was coming back to HN to include in my comment a link to
| various high-performance IPC libraries, but another commenter
| already beat me to it by linking to iceoryx2 (though of course
| they'd need to use a python extension).
|
| SHM for IPC has been well understood as the better option for
| high-bandwidth payloads since the 1990s and is a staple of Win32
| application development for communication between services
| (daemons) and clients (guis).
| Sesse__ wrote:
| It's not even clear why they need a browser in the mix; most of
| these services have APIs you can use. (Also, why fork Chromium
| instead of using CEF?)
| CharlieDigital wrote:
| > I don't mean to be dismissive, but this would have been
| caught very early on (in the planning stages) by anyone that
| had/has experience in system-level development rather than
| full-stack web js/python development
|
| Based on their job listing[0], Recall is using Rust on the
| backend.
|
| [0] https://www.workatastartup.com/companies/recall-ai
| diroussel wrote:
| Sometimes it is more important to work on proving you have a
| viable product and market to sell it in before you optimise.
|
| On the outside we can't be sure. But it's possible that they
| took the right decision to go with a naive implementation
| first. Then profile, measure and improve later.
|
| But yes, the whole idea of running a headless web browser just to
| run JavaScript to get access to a video stream is a bit crazy.
| But I guess that's just the world we are in.
| whatever1 wrote:
| Wouldn't something like Redis also be an alternative?
| randomdata wrote:
| _> rather than full-stack web js/python development._
|
| The product is not a full-stack web application. What makes you
| think that they brought in people with that kind of experience
| just for this particular feature?
|
| Especially when they claim that they chose that route because
| it was what was most convenient. While you might argue that
| wasn't the right tradeoff, it is a common tradeoff developers
| of all kinds make. "Make It Work, Make It Right, Make It Fast"
| has become pervasive in this industry, for better or worse.
| jgalt212 wrote:
| > But it turns out that if you IPC 1TB of video per second on AWS
| it can result in enormous bills when done inefficiently.
|
| As a point of comparison, how many TB per second of video does
| Netflix stream?
| ffsm8 wrote:
| I don't think that number is as easy to figure out as most
| people think.
|
| Netflix has hardware that ISPs can get so they can serve their
| content without saturating the ISPs' lines.
|
| There is a statistic floating around that Netflix was
| responsible for 15% of global traffic in 2022/2023, and
| YouTube 12%. If that number is real... That'd be _a lot_ more
| CyberDildonics wrote:
| Actual reality beyond the fake title:
|
| "using WebSockets over loopback was ultimately costing us
| $1M/year in AWS spend"
|
| then
|
| "and the quest for an efficient high-bandwidth, low-latency IPC"
|
| Shared memory. It has been there for 50 years.
| renewiltord wrote:
| That's a good write-up with a standard solution in some other
| spaces. Shared memory buffers are very fast too. It's interesting
| to see them being used here. Nice write up. It wasn't what I
| expected: that they were doing something dumb with API Gateway
| Websockets. This is actual stuff. Nice.
| OptionOfT wrote:
| Did they originally NOT run things on the same machine? Otherwise
| the WebSocket would be local and incur no cost.
| jgauth wrote:
| Did you read the article? It is about the CPU cost of using
| WebSockets to transfer data over loopback.
| kunwon1 wrote:
| I read the entire article and that wasn't my takeaway. After
| reading, I assumed that AWS was (somehow) billing for
| loopback bandwidth; it wasn't apparent (to me) from the
| article that CPU costs were the sticking point.
| DrammBA wrote:
| > We set a goal for ourselves to cut this CPU requirement
| in half, and thereby cut our cloud compute bill in half.
|
| From the article intro before they dive into what exactly
| is using the CPU.
| magamanlegends wrote:
| our websocket traffic is roughly 40% of recall.ai and our bill
| was $150 USD this month using a high memory VPS
| nemothekid wrote:
| > _WebSocket would be local and incur no cost._
|
| The memcopies are the cost that they were paying, even if it was
| local.
| akira2501 wrote:
| > A single 1080p raw video frame would be 1080 * 1920 * 1.5 =
| 3110.4 KB in size
|
| They seem to not understand the fundamentals of what they're
| working on.
|
| > Chromium's WebSocket implementation, and the WebSocket spec in
| general, create some especially bad performance pitfalls.
|
| You're doing bulk data transfers into a multiplexed short
| messaging socket. What exactly did you expect?
|
| > However there's no standard interface for transporting data
| over shared memory.
|
| Yes there is. It's called /dev/shm. You can use shared memory
| like a filesystem, and no, you should not be worried about
| user/kernel space overhead at this point. It's the obvious
| solution to your problem.
|
| > Instead of the typical two-pointers, we have three pointers in
| our ring buffer:
|
| You can use two back-to-back mmap(2) calls to create a ring
| buffer which avoids this.
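|
| A minimal sketch of that mirrored-mapping trick (assuming Linux
| and the Rust libc crate; the names are mine, not Recall's): map
| the same memfd twice, back to back, so a write that would wrap
| past the end of the buffer simply spills into the second mapping
| of the same pages.
|
|     // Map one memfd twice, back to back, so the ring buffer
|     // appears contiguous even across the wrap-around point.
|     use std::ptr;
|
|     unsafe fn mirrored_ring(size: usize) -> *mut u8 {
|         // size must be a multiple of the page size
|         let name = b"ring\0".as_ptr() as *const libc::c_char;
|         let fd = libc::memfd_create(name, 0);
|         assert!(fd >= 0);
|         assert_eq!(libc::ftruncate(fd, size as libc::off_t), 0);
|
|         // Reserve 2*size of address space, then overlay the
|         // same fd on both halves with MAP_FIXED.
|         let base = libc::mmap(
|             ptr::null_mut(), size * 2, libc::PROT_NONE,
|             libc::MAP_PRIVATE | libc::MAP_ANONYMOUS, -1, 0);
|         assert_ne!(base, libc::MAP_FAILED);
|         for half in 0..2 {
|             let addr = (base as *mut u8).add(half * size);
|             let m = libc::mmap(
|                 addr as *mut libc::c_void, size,
|                 libc::PROT_READ | libc::PROT_WRITE,
|                 libc::MAP_SHARED | libc::MAP_FIXED, fd, 0);
|             assert_ne!(m, libc::MAP_FAILED);
|         }
|         base as *mut u8
|     }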
| Scaevolus wrote:
| It's pretty funny that they assumed that memory copying was the
| limiting factor when they're pushing a mere 150MB/s around,
| rather than the various websocket overheads, and then jumped right
| into over-engineering a zero-copy ring buffer. I get it, but
| come on!
|
| >50 GB/s of memory bandwidth is common nowadays[1], and will
| basically never be the bottleneck for 1080P encoding. Zero copy
| matters when you're doing something exotic, like Netflix
| pushing dozens of GB/s from a CDN node.
|
| [1]: https://lemire.me/blog/2024/01/18/how-much-memory-bandwidth-...
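|
| For scale (my own back-of-envelope, assuming 1080p YUV 4:2:0 at
| 30 fps): 1920 * 1080 * 1.5 bytes is about 3.1 MB per frame, so
| roughly 93 MB/s per stream. Even a few extra copies of that is a
| rounding error against 50 GB/s of memory bandwidth.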
| anonymous344 wrote:
| Well, someone will feel like an idiot after reading your facts.
| This is why education and experience are important. Not just a
| React/Rust course and then you are a full-stack senior :D
| didip wrote:
| I agree with you. The moment they said shared memory, I was
| thinking /dev/shm. Lots of programming languages have libraries
| for /dev/shm already.
|
| And since it behaves like a filesystem, you can swap it for a
| real filesystem during testing. Very convenient.
|
| I am curious if they tried this already or not and if they did,
| what problems did they encounter?
| Dylan16807 wrote:
| The title makes it sound like there was some kind of blowout, but
| really it was a tool that wasn't the best fit for this job, and
| they were using twice as much CPU as necessary, nothing crazy.
| yapyap wrote:
| > But it turns out that if you IPC 1TB of video per second on AWS
| it can result in enormous bills when done inefficiently.
|
| that's surprising to... almost no one? 1 TB/s is nothing to
| scoff at
| blibble wrote:
| in terms of IPC, DDR5 can do about 50GB/s per memory channel
|
| assuming you're only shuffling bytes around, on bare metal this
| would be ~20 DDR5 channels worth
|
| or 2 servers (12 channels/server for EPYC)
|
| you can get an awful lot of compute these days for not very
| much money
|
| (shipping your code to the compressed video instead of the
| exact opposite would probably make more sense though)
| pyrolistical wrote:
| Terabits vs gigabytes
| turtlebits wrote:
| Is this really an AWS issue? Sounds like you were just burning
| CPU cycles, which is not AWS related. WebSockets makes it sound
| like it was a data transfer or API gateway cost.
| VWWHFSfQ wrote:
| > Is this really an AWS issue?
|
| I doubt they would have even noticed this outrageous cost if
| they were running on bare-metal Xeons or Ryzen colo'd servers.
| You can rent real 44-core Xeon servers for like, $250/month.
|
| So yes, it's an AWS issue.
| JackSlateur wrote:
| You can rent real 44-core Xeon servers for like, $250/month.
|
| Where, for instance?
| Faaak wrote:
| Hetzner for example. An EPYC 48c (96t) goes for 230 euros
| dilyevsky wrote:
| Hetzner's network is a complete dog. They also sell you
| machines that should long since have been EOL'ed. No serious
| business should be using them.
| dijit wrote:
| What cpu do you think your workload is using on AWS?
|
| GCP exposes their CPU models, and they have some Haswell
| and Broadwell lithographies in service.
|
| That's a 10+ year old part, for those paying attention.
| dilyevsky wrote:
| Most of GCP and some AWS instances will migrate to
| another node when the node is faulty. Also, the disk is
| virtual. None of this applies to bare-metal Hetzner.
| dijit wrote:
| Why is that relevant to what I said?
| dilyevsky wrote:
| Only relevant if you care about reliability
| dijit wrote:
| AWS was working "fine" for about 10 years without live
| migration, and I've had several individual machines
| running without a reboot or outage for quite literally
| half a decade. Enough to hit bugs like this:
| https://support.hpe.com/hpesc/public/docDisplay?docId=a00092...
|
| Anyway, depending on individual nodes to always be up for
| reliability is incredibly foolhardy. Things can happen,
| cloud isn't magic, I've had instances become
| unrecoverable. Though it is rare.
|
| So, I still don't understand the point, that was not
| exactly relevant to what I said.
| tsimionescu wrote:
| I think they meant that Hetzner is offering specific
| machines they know to be faulty and should have EOLd to
| customers, not that they use deprecated CPUs.
| dijit wrote:
| That's scary if true, any sources? My google-fu is failing
| me. :/
| akvadrako wrote:
| It's not scary, it's part of the value proposition.
|
| I used to work for a company that rented lots of hetzner
| boxes. Consumer grade hardware with frequent disk
| failures was just what we accepted for saving a buck.
| speedgoose wrote:
| I know serious businesses using Hetzner for their
| critical workloads. I wouldn't unless money is tight, but
| it is possible. I use them for my non critical stuff, it
| costs so much less.
| blibble wrote:
| I just cat'ed /proc/cpuinfo on my Hetzner and AWS
| machines
|
| AWS: E5-2680 v4 (2016)
|
| Hetzner: Ryzen 5 (2019)
| GauntletWizard wrote:
| Hetzner: https://www.hetzner.com/dedicated-rootserver/#cores_threads_...
| VWWHFSfQ wrote:
| There are many colos that offer dedicated server
| rental/hosting. You can just google for colos in the region
| you're looking for. I found this one
|
| https://www.colocrossing.com/server/dedicated-servers/
| petcat wrote:
| I don't know anything about Colo Crossing (are they a
| reseller?) but I would bet their $60 per month 4-core
| Intel Xeons would outperform a $1,000 per month "compute
| optimized" EC2 server.
| fragmede wrote:
| What benchmark would you like to use?
| petcat wrote:
| This blog is about doing video processing on the CPU, so
| something akin to that.
| phonon wrote:
| For $1000 per month you can get a c8g.12xlarge (assuming
| you use some kind of savings plan).[0] That's 48 cores,
| 96 GB of RAM and 22.5+ Gbps networking. Of course you
| still need to pay for storage, egress etc., but you seem
| to be exaggerating a bit... they do offer a 44-core
| Broadwell/128 GB RAM option for $229 per month, so AWS is
| more like a 4x markup[1]... the C8g would likely be much
| faster at single-threaded tasks though[2][3]
|
| [0] https://instances.vantage.sh/aws/ec2/c8g.12xlarge?region=us-...
| [1] https://portal.colocrossing.com/register/order/service/480
| [2] https://browser.geekbench.com/v6/cpu/8305329
| [3] https://browser.geekbench.com/processors/intel-xeon-e5-2699-...
| petcat wrote:
| > That's 48 cores
|
| That's not dedicated 48 cores, it's 48 "vCPUs". There are
| probably 1,000 other EC2 instances running on those cores
| stealing all the CPU cycles. You might get 4 cores of
| actual compute throughput. Which is what I was saying
| phonon wrote:
| That's not how it works, sorry. (Unless you use burstable
| instances, like T4g) You can run them at 100% as long as
| you like, and it has the same performance (minus a small
| virtualization overhead).
| petcat wrote:
| Are you telling me that my virtualized EC2 server is the
| only thing running on the physical hardware/CPU? There
| are no other virtualized EC2 servers sharing time on that
| hardware/CPU?
| brazzy wrote:
| Neither the title nor the article are painting it as an AWS
| issue, but as a websocket issue, because the protocol
| implicitly requires all transferred data to be copied multiple
| times.
| turtlebits wrote:
| If you call out your vendor, the implication is usually that
| the problem lies with them or their service. The title
| obviously states AWS.
|
| If I said that "childbirth cost us 5000 on our <hospital
| name> bill", you'd assume the issue is with the hospital.
| bigiain wrote:
| I disagree. Like @turtlebits, I was waiting for the part of
| the story where websocket connections between their AWS
| resources somehow got billed at Amazon's internet data egress
| rates.
| anitil wrote:
| I didn't know this - why is this the case?
| londons_explore wrote:
| They are presumably using the GPU for video encoding....
|
| And the GPU for rendering...
|
| So they should instead just be hooking into Chromium's GPU
| process and grabbing the pre-composited tiles from the
| LayerTreeHostImpl[1] and dealing with those.
|
| [1]:
| https://source.chromium.org/chromium/chromium/src/+/main:cc/...
| isoprophlex wrote:
| You'd think so but nope, they deliberately run on CPU, as per
| the article...
| yjftsjthsd-h wrote:
| > We do our video processing on the CPU instead of on GPU, as
| GPU availability on the cloud providers has been patchy in
| the last few years.
|
| I dunno, when we're playing with millions of dollars in costs
| I hope they're at least regularly evaluating whether they
| could at least run _some_ of the workload on GPUs for better
| perf/$.
| londons_explore wrote:
| And their workload is rendering and video encoding. Using
| GPUs should have been where they started, even if it
| limits their choice of cloud providers a little.
| mbb70 wrote:
| They are very explicit in the article that they run everything
| on CPUs.
| orf wrote:
| One of the first parts of the post explains how they are using
| CPUs only
| cogman10 wrote:
| This is such a weird way to do things.
|
| Here they have a nicely compressed stream of video data, so they
| take that stream and... decode it. But they aren't processing the
| decoded data at the source of the decode, so instead they forward
| that decoded data, uncompressed(!!), to a different location for
| processing. Surprisingly, they find out that moving uncompressed
| video data from one location to another is expensive. So, they
| compress it later (Don't worry, using a GPU!)
|
| At so many levels this is just WTF. Why not forward the
| compressed video stream? Why not decompress it where you are
| processing it instead of in the browser? Why are you writing it
| without any attempt at compression? Even if you want lossless
| compression, there are well-known and fast codecs like FFV1
| for that purpose.
|
| Just weird.
| isoprophlex wrote:
| Article title should have been "our weird design cost us $1M".
|
| As it turns out, doing something in Rust does not absolve you
| of the obligation to actually think about what you are doing.
| dylan604 wrote:
| TFA opening graph "But it turns out that if you IPC 1TB of
| video per second on AWS it can result in enormous bills when
| done inefficiently. "
| rozap wrote:
| Really strange. I wonder why they omitted this. Usually you'd
| leave it compressed until the last possible moment.
| dylan604 wrote:
| > Usually you'd leave it compressed until the last possible
| moment.
|
| Context matters? As someone working in production/post, we
| want to keep it uncompressed until the last possible moment.
| At least as far as no more compression than how it was
| acquired.
| DrammBA wrote:
| > Context matters?
|
| It does, but you just removed all context from their
| comment and introduced a completely different context
| (video production/post) for seemingly no reason.
|
| Going back to the original context, which is grabbing a
| compressed video stream from a headless browser, the
| correct approach to handle that compressed stream is to
| leave it compressed until the last possible moment.
| pavlov wrote:
| Since they aim to support every meeting platform, they
| don't necessarily even have the codecs. Platforms like
| Zoom can and do use custom video formats within their web
| clients.
|
| With that constraint, letting a full browser engine
| decode and composite the participant streams is the only
| option. And it definitely is an expensive way to do it.
| tbarbugli wrote:
| Possibly because they capture the video from xvfb or similar
| (they run a headless browser to capture the video) so at that
| point the decoding already happened (webrtc?)
| bri3d wrote:
| I think the issue with compression is that they're scraping the
| online meeting services rather than actually reverse
| engineering them, so the compressed video stream is hidden
| inside some kind of black box.
|
| I'm pretty sure that feeding the browser an emulated hardware
| decoder (ie - write a VAAPI module that just copies compressed
| frame data for you) would be a good semi-universal solution to
| this, since I don't think most video chat solutions use DRM
| like Widevine, but it's not as universal as dumping the
| framebuffer output off of a browser session.
|
| They could also of course one-off reverse each meeting service
| to get at the backing stream.
|
| What's odd to me is that even with this frame buffer approach,
| why would you not just recompress the video at the edge? You
| could even do it in Javascript with WebCodecs if that was the
| layer you were living at. Even semi-expensive compression on a
| modern CPU is going to be way cheaper than copying raw video
| frames, even just in terms of CPU instruction throughput vs
| memory bandwidth with shared memory.
|
| It's easy to cast stones, but this is a weird architecture and
| making this blog post about the "solution" is even stranger to
| me.
| dbrower wrote:
| How much did the engineering time to make this optimization cost?
| thadk wrote:
| Could Arrow be a part of the shared memory solution in another
| context?
| bauruine wrote:
| FWIW: The MTU of the loopback interface on Linux is 64KB by
| default
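|
| Easy to sanity-check on a Linux box (a trivial sketch, nothing
| more):
|
|     // Prints 65536 on a stock Linux kernel.
|     fn main() {
|         let mtu = std::fs::read_to_string("/sys/class/net/lo/mtu")
|             .expect("Linux only");
|         println!("lo mtu = {}", mtu.trim());
|     }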
| beoberha wrote:
| Classic Hacker News getting hung up on the narrative framing.
| It's a cool investigation! Nice work guys!
| marcopolo wrote:
| Masking in the WebSocket protocol is kind of a funny and sad fix
| to the problem of intermediaries trying to be smart and helpful,
| but failing miserably.
|
| The linked section of the RFC is worth the read:
| https://www.rfc-editor.org/rfc/rfc6455#section-10.3
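|
| For anyone curious what masking actually involves (a rough
| sketch of RFC 6455 section 5.3, not Recall's code): the client
| XORs every payload byte with a rotating 4-byte key before
| sending, so frames pushed from the browser get one extra full
| pass over the data even on loopback.
|
|     fn mask_payload(payload: &mut [u8], key: [u8; 4]) {
|         // Mask (or unmask -- XOR is symmetric) in place.
|         for (i, byte) in payload.iter_mut().enumerate() {
|             *byte ^= key[i % 4];
|         }
|     }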
| jazzyjackson wrote:
| I for one would like to praise the company for sharing their
| failure, hopefully next time someone Googles "transport video
| over websocket" they'll find this thread.
| pier25 wrote:
| Why were they using websockets to send video in the first place?
|
| Was it because they didn't want to use some multicast video
| server?
| trollied wrote:
| >In a typical TCP/IP network connected via ethernet, the standard
| MTU (Maximum Transmission Unit) is 1500 bytes, resulting in a TCP
| MSS (Maximum Segment Size) of 1448 bytes. This is much smaller
| than our 3MB+ raw video frames.
|
| > Even the theoretical maximum size of a TCP/IP packet, 64k, is
| much smaller than the data we need to send, so there's no way for
| us to use TCP/IP without suffering from fragmentation.
|
| Just highlights that they do not have enough technical knowledge
| in house. Should spend the $1m/year saving on hiring some good
| devs.
| karamanolev wrote:
| I fail to see how TCP/IP fragmentation really affects this use
| case. I don't know why it's mentioned, or why it would cause
| issues given that there aren't multiple network devices with
| different MTUs. Am I right? Is that the lack of technical
| knowledge you're referring to, or am I missing something?
| drowsspa wrote:
| Sounds weird that apparently they expected to send 3 MB in a
| single TCP packet
| bcrl wrote:
| Modern NICs will do that for you via a feature called TSO
| -- TCP Segmentation Offload.
|
| More shocking to me is that anyone would attempt to run
| network throughput oriented software inside of Chromium.
| Look at what Cloudflare and Netflix do to get an idea what
| direction they should really be headed in.
| maxmcd wrote:
| Please explain?
| hathawsh wrote:
| Why do you say that? Their solution of using shared memory
| (structured as a ring buffer) sounds perfect for their use
| case. Bonus points for using Rust to do it. How would you do
| it?
|
| Edit: I guess perhaps you're saying that they don't know all
| the networking configuration knobs they could exercise, and
| that's probably true. However, they landed on a more optimal
| solution that avoided networking altogether, so they no longer
| had any need to research network configuration. I'd say they
| made the right choice.
| maxmcd wrote:
| Yes, maybe they're talking about this:
| https://en.wikipedia.org/wiki/TCP_window_scale_option
| adamrezich wrote:
| This reminds me of when I was first starting to learn "real
| game development" (not using someone else's engine)--I was
| using C#/MonoGame, and, while having no idea what I was doing,
| decided I wanted vector graphics. I came across libcairo,
| figured out how to use it, set it all up correctly and
| everything... and then found that, whoops, sending 1920x1080x4
| bytes to your GPU to render, 60 times a second, doesn't exactly
| work--for reasons that were incredibly obvious, in retrospect!
| At least it didn't cost me a million bucks to learn from my
| mistake.
| lttlrck wrote:
| The article reads like a personal "learn by doing" blog post.
| cperciva wrote:
| _We use atomic operations to update the pointers in a thread-safe
| manner_
|
| Are you sure about that? Atomics are not locks, and not all
| systems have strong memory ordering.
| jpc0 wrote:
| > ... update the pointers ...
|
| Pretty sure the ARM and x86 you would be seeing on AWS do have
| strong memory ordering, and have atomic operations that operate
| on something the size of a single register...
| cperciva wrote:
| Graviton has weaker memory ordering than amd64. I know this
| because FreeBSD had a ring buffer which was buggy on
| Graviton...
| Sesse__ wrote:
| Rust atomics, like C++ atomics, include memory barriers (the
| programmer chooses how strong, the compiler/CPU is free to give
| stronger).
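|
| For the curious, a minimal sketch of the usual single-producer,
| single-consumer discipline (illustrative only, not their actual
| code): the writer publishes its index with Release and the
| reader loads it with Acquire, which is the pairing that keeps
| weakly-ordered CPUs like Graviton honest.
|
|     use std::sync::atomic::{AtomicUsize, Ordering};
|
|     struct Indices {
|         write: AtomicUsize,
|         read: AtomicUsize,
|     }
|
|     impl Indices {
|         // Producer: all buffer writes happen-before this store.
|         fn publish(&self, new_write: usize) {
|             self.write.store(new_write, Ordering::Release);
|         }
|         // Consumer: pairs with the Release store above, so the
|         // bytes behind `write` are guaranteed visible here.
|         fn readable(&self) -> (usize, usize) {
|             let w = self.write.load(Ordering::Acquire);
|             let r = self.read.load(Ordering::Relaxed);
|             (r, w)
|         }
|     }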
| gwbas1c wrote:
| Classic story of a startup taking a "good enough" shortcut and
| then coming back later to optimize.
|
| ---
|
| I have a similar story: Where I work, we had a cluster of VMs
| that were always high CPU and a bit of a problem. We had a lot of
| fire drills where we'd have to bump up the size of the cluster,
| abort in-progress operations, or some combination of both.
|
| Because this cluster of VMs was doing batch processing that the
| founder believed should be CPU intense, everyone just assumed
| that increasing load came with increasing customer size; and that
| this was just an annoyance that we could get to after we made one
| more feature.
|
| But, at one point the bean counters pointed out that we spent
| disproportionately more on cloud than a normal business did.
| After one round of combining different VM clusters (that really
| didn't need to be separate servers), I decided that I could take
| some time to hook up this very CPU intense cluster up to a
| profiler.
|
| I thought I was going to be in for a 1-2 week project and would
| follow a few worms. Instead, the CPU load was because we were
| constantly loading an entire table, that we never deleted from,
| into the application's process. The table had transient data that
| should only last a few hours at most.
|
| I quickly deleted almost a decade's worth of obsolete data from
| the table. After about 15 minutes, CPU usage for this cluster
| dropped to almost nothing. The next day we made the VM cluster a
| fraction of its size, and in the next release, we got rid of the
| cluster and merged the functionality into another cluster.
|
| I also made a pull request that introduced a simple filter to the
| query to only load 3 days of data; and then introduced a
| background operation to clean out the table periodically.
| alsetmusic wrote:
| As much as you can say (perhaps not hard numbers, but as a
| percentage), what was the savings to the bottom line / cloud
| costs?
| gwbas1c wrote:
| Probably ~5% of cloud costs. Combined with the prior round of
| optimizations, it was substantial.
|
| I was really disappointed when my wife couldn't get the night
| off from work when the company took everyone out to a fancy
| steak house.
| chgs wrote:
| So you saved the company $10k a month and got a $200 meal
| in gratitude? Awesome.
| antisthenes wrote:
| 99% of the time, it's either a quadratic (or exponential)
| algorithm or a really bad DB query.
| wiml wrote:
| > One complicating factor here is that raw video is surprisingly
| high bandwidth.
|
| It's weird to be living in a world where this is a _surprise_ but
| here we are.
|
| Nice write up though. Web sockets has a number of nonsensical
| design decisions, but I wouldn't have expected that _this_ is the
| one that would be chewing up all your cpu.
| handfuloflight wrote:
| > It's weird to be living in a world where this is a surprise
| but here we are.
|
| I think it's because the cost of it is so abstracted away with
| free streaming video all across the web. Once you take a look
| at the egress and ingress sides you realize how quickly it adds
| up.
| arccy wrote:
| I think it's just rare for a lot of people to be handling raw
| video. Most people interact with highly efficient (lossy)
| codecs on the web.
| carlhjerpe wrote:
| I was surprised when calculating and sizing the shared memory
| for my Gaming VM for use with "Looking-Glass". At 165 Hz 2K HDR
| it's many gigabytes per second; that's why HDMI and DisplayPort
| are specced really high.
| sensanaty wrote:
| I always knew video was "expensive", but my mark for what
| expensive meant was a good few orders of magnitude off when I
| researched the topic for a personal project.
|
| I can easily imagine the author being in a similar boat,
| knowing that it isn't cheap, but then not realizing that
| expensive in this context truly does mean expensive until they
| actually started seeing the associated costs.
| IX-103 wrote:
| Chromium already has a zero-copy IPC mechanism that uses shared
| memory built-in. It's called Mojo. That's how the various browser
| processes talk to each other. They could just have passed
| mojo::BigBuffer messages to their custom process and not had to
| worry about platform-specific code.
|
| But writing a custom ring buffer implementation is also nice, I
| suppose...
| cyberax wrote:
| Egress fees strike again.
| sfink wrote:
| ...and this is why I will never start a successful business.
|
| The initial approach was shipping _raw video_ over a _WebSocket_.
| I could not imagine putting something like that together and
| selling it. When your first computer came with 64KB in your
| entire machine, some of which you can't use at all and some you
| can't use without bank switching tricks, it's really really hard
| to even conceive of that architecture as a possibility. It's a
| testament to the power of today's hardware that it worked at all.
|
| And yet, it did work, and it served as the basis for a successful
| product. They presumably made money from it. The inefficiency
| sounds like it didn't get in the way of developing and iterating
| on the rest of the product.
|
| I can't do it. Premature optimization may be the root of all
| evil, but I can't work without having _some_ sense for how much
| data is involved and how much moving or copying is happening to
| it. That sense would make me immediately reject that approach.
| I'd go off over-architecting something else before launching, and
| somebody would get impatient and want their money back.
| apitman wrote:
| I've been toying around with a design for a real-time chat
| protocol, and was recently in a debate of WebSockets vs HTTP long
| polling. This should give me some nice ammunition.
| pavlov wrote:
| No, this story is about interprocess communication on a single
| computer, it has practically nothing to do with WebSockets vs
| something else over an IP network.
___________________________________________________________________
(page generated 2024-11-06 23:00 UTC)