[HN Gopher] Building the heap: racking 30 petabytes of hard driv...
       ___________________________________________________________________
        
       Building the heap: racking 30 petabytes of hard drives for
       pretraining
        
       Author : nee1r
       Score  : 223 points
       Date   : 2025-10-01 15:00 UTC (7 hours ago)
        
 (HTM) web link (si.inc)
 (TXT) w3m dump (si.inc)
        
       | g413n wrote:
       | No mention of disk failure rates? curious how it's holding up
       | after a few months
        
         | ClaireBookworm wrote:
         | good point
        
         | bayindirh wrote:
          | The disk failure rates are very low compared to a decade
          | ago. I used to change more than a dozen disks every week
          | back then. Now it's an eyebrow-raising event that I seldom
          | see.
         | 
         | I think following Backblaze's hard disk stats is enough at this
         | point.
        
           | gordonhart wrote:
           | Backblaze reports an annual failure rate of 1.36% [0]. Since
           | their cluster uses 2,400 drives, they would likely see ~32
           | failures a year (extra ~$4,000 annual capex, almost
           | negligible).
           | 
           | [0] https://www.backblaze.com/cloud-storage/resources/hard-
           | drive...
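            | 
            | Rough arithmetic behind that estimate, assuming ~$125 per
            | used 12TB drive (consistent with the ~$4,000 figure):
            | 
            |   afr = 0.0136              # Backblaze annual failure rate
            |   drives = 2_400
            |   cost = 125                # USD per used 12TB drive (assumed)
            |   failures = afr * drives   # ~32.6 drives / year
            |   capex = failures * cost   # ~$4,080 / year in replacements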
        
             | joering2 wrote:
             | Their rate will probably be higher since they are utilizing
             | used drives. From the spec:
             | 
             | 2,400 drives. Mostly 12TB used enterprise drives (3/4 SATA,
             | 1/4 SAS). The JBOD DS4246s work for either.
        
               | antisthenes wrote:
               | Not necessarily, since disk failures are typically
               | U-shaped.
               | 
               | Buying used drives eliminates the high rate of early
               | failure (but does get you a bit closer to the 2nd part of
               | the U-curve).
               | 
                | Typically, most drives become obsolete before hitting
                | the high failure rate on the right side of the U-curve
                | from longevity (7+ years).
        
         | cjaackie wrote:
          | They mentioned the cluster uses used enterprise drives. I
          | can see the desire to save money, but I agree that is going
          | to be one expensive mistake down the road.
          | 
          | I should also note that, personally, for home cluster use, I
          | quickly learned that used drives didn't make sense. Too much
          | performance variability.
        
           | g413n wrote:
           | in a datacenter context failure rates are just a remote-hands
           | recurring cost so it's not too bad with front-loaders
           | 
           | e.g. have someone show up to the datacenter with a grocery
           | list of slot indices and a cart of fresh drives every few
           | months.
        
           | guywithahat wrote:
           | Used drives make sense if maintaining your home server is a
            | hobby. It's fun to diagnose and solve problems in home
           | servers, and failing drives give me a reason to work on the
           | server. (I'm only half-joking, it's kind of fun)
        
           | jms55 wrote:
           | If I remember correctly, most drives either:
           | 
           | 1. Fail in the first X amount of time
           | 
           | 2. Fail towards the end of their rated lifespan
           | 
           | So buying used drives doesn't seem like the worst idea to me.
            | You've already filtered out the drives that would fail
           | early.
           | 
           | Disclaimer: I have no idea what I'm talking about
        
             | g413n wrote:
              | we don't have perfect metrics here but this seems to match
              | our experience; a lot of failures happened shortly after
              | install, before the bulk of the data was downloaded onto
              | the heap, so actual data loss is lower than the hardware
              | failure rate
        
             | dboreham wrote:
             | Over in hardware-land we call this "the bathtub curve".
        
         | dylan604 wrote:
          | I've mentioned this story before, but we had massive drive
          | failures when bringing up multiple disk arrays. We got them
          | racked on a Friday afternoon, and then I wrote a quick and
          | dirty shell script to read/write data back and forth between
          | them over the weekend, set to kick in after they finished
          | striping the RAID arrays. By quick and dirty I mean there was
          | no logging, just a bunch of commands saved as .sh. Came in
          | on Monday to find massive failures in all of the arrays, but
          | no insight into whether they failed during the stripe or
          | while being stressed. It was close to a 50% failure rate. It
          | turned out to be a bad batch from the factory. Multiple
          | customers of our vendor were complaining. All the drives
          | were replaced by the manufacturer. It just delayed the
          | storage being available to production. After that, not one
          | of them failed in the next 12 months before I left for
          | another job.
        
           | jeffrallen wrote:
           | > next 12 months before I left for another job
           | 
           | Heh, that's a clever solution to the problem of managing
           | storage through the full 10 year disk lifecycle.
        
       | ClaireBookworm wrote:
       | great write up, really appreciate the explanations / showing the
       | process
        
       | nharada wrote:
       | So how do they get this data to the GPUs now...? Just run it over
       | the public internet to the datacenter?
        
         | bayindirh wrote:
          | They can rent dark fiber for themselves for that distance,
          | and it'll be cheap.
          | 
          | However, as they noted, they use 100 Gbps capacity from
          | their ISP.
        
           | nee1r wrote:
            | We want to get dark fiber from the datacenter to the
            | office. I love 100 Gbps
        
             | dylan604 wrote:
             | I'm now envisioning a poster with a strand of fiber wearing
             | aviators with large font size Impact font reading Dark
             | Fiber with literal laser beams coming out of the eyes.
        
           | geor9e wrote:
           | Does San Francisco really still have dark fiber? That 90s
           | bubble sure did overshoot demand.
        
             | madsushi wrote:
             | DWDM tech improvements have outpaced nearly every other
             | form of technology growth, so the same single pair of fiber
             | that used to carry 10 Mbps can now carry 20 Tbps, which is
             | a 2,000,000x multiplier. The same somewhat-fixed supply of
             | fiber can go a very long way today, so the price pressure
             | for access is less than you might expect.
        
             | dpe82 wrote:
             | I think these days folks say "dark fiber" for any kind of
             | connection you buy. It bothers me too.
        
               | bayindirh wrote:
               | I meant a "single mode, non terminated fiber optic cable
               | from point to point". In other words, your own cable
               | without any other traffic on it.
               | 
               | A shared one will be metro Ethernet in my parlance.
        
         | g413n wrote:
         | 7.5k for zayo 100gig so that's like half of the MRC
        
         | nee1r wrote:
          | yeah, exactly! we have a 100G uplink, and then we use nginx
          | secure links that we just curl from the machines over HTTP.
          | (funnily enough, HTTPS adds overhead so we just pre-sign URLs)
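          | 
          | for anyone curious what the pre-signing looks like: a minimal
          | sketch of the usual nginx secure_link recipe (the secret,
          | hostname, and exact secure_link_md5 expression here are made
          | up, not necessarily what we run):
          | 
          |   import base64, hashlib, time
          | 
          |   SECRET = "changeme"           # must match the nginx config
          |   BASE = "http://heap.example"  # hypothetical storage node
          | 
          |   def presign(uri, ttl=3600):
          |       # nginx side (example config):
          |       #   secure_link $arg_md5,$arg_expires;
          |       #   secure_link_md5 "$secure_link_expires$uri changeme";
          |       expires = int(time.time()) + ttl
          |       raw = f"{expires}{uri} {SECRET}".encode()
          |       token = base64.urlsafe_b64encode(
          |           hashlib.md5(raw).digest()).decode().rstrip("=")
          |       return f"{BASE}{uri}?md5={token}&expires={expires}"
          | 
          |   print(presign("/shard-000042.tar"))  # then just curl it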
        
       | miniman1337 wrote:
        | Used disks, no DR, not exactly a real shootout.
        
         | nee1r wrote:
         | True, though this is specifically for pretraining data (S3
         | wouldn't sell us used disk + no DR storage).
        
           | p_ing wrote:
           | You're in a seismically active part of the world. Will the
           | venture last in a total loss scenario?
        
             | nee1r wrote:
             | We're currently 1/1 for the recent 4.3 magnitude earthquake
             | (though if SF crumbles we might lose data)
        
               | p_ing wrote:
               | 4.3 is a baby quake. I'd hope that you'd be 1/1!
        
             | antonkochubey wrote:
              | They spent $300,000 on drives; with AWS they would have
              | spent 4x that PER MONTH. They're already ahead of the
              | cloud.
        
               | p_ing wrote:
                | AWS/cloud doesn't factor into my question whatsoever.
               | Loss of equipment is one thing. Loss of all data is quite
               | a different story.
        
           | Sanzig wrote:
           | I do appreciate the scrappiness of your solution. Used drives
           | for a storage cluster is like /r/homelab on steroids. And
           | since it's pretraining data, I suppose data integrity isn't
           | critical.
           | 
           | Most venture-backed startups would have just paid the AWS or
           | Cloudflare tax. I certainly hope your VCs appreciate how
           | efficient you are being with their capital :)
        
             | g413n wrote:
             | worth stressing that we literally could not afford
             | pretraining without this, approx our entire seed round
             | would go into cloud storage costs
        
       | leejaeho wrote:
       | how long do you think it'll be before you fill all of it and have
       | to build another cluster LOL
        
         | nee1r wrote:
         | Already filled up and looking to possibly copy and paste :)
        
           | giancarlostoro wrote:
            | So, others have asked, and I'm curious myself: are you
            | sourcing the videos yourselves, or from third parties?
        
             | tomas789 wrote:
             | My guess would be they are running some dummy app like
             | quote of the day or something and it records the screen at
             | 1fps or so.
        
       | not--felix wrote:
       | But where do you get 90 million hours worth of video data?
        
         | myflash13 wrote:
         | And not just any video data, they specifically mentioned screen
          | recordings for agentic computer use. A very specific kind of
         | video. My guess is they have a partnership with someone like
         | Rewind.ai
        
         | conception wrote:
         | Arrr matey
        
       | mschuster91 wrote:
       | Shows how crazy cheap on prem can be. _tips hat_
        
         | nee1r wrote:
         | _tips hat back_
        
         | stackskipton wrote:
          | Not included is the overhead of dealing with maintenance.
          | S3/R2 generally don't require a dedicated ops person for
          | care and feeding. This type of setup will likely require
          | someone to spend 5 hours a week dealing with it.
        
           | mschuster91 wrote:
            | I once had about three racks full of servers under my
            | control. Admittedly they didn't hold a ton of disks, but
            | the hardware maintenance effort was still pretty much
            | negligible over a few years (until it all went to the
            | cloud).
            | 
            | The majority of my server-wrangling time went into OS
            | updates and, most annoyingly, OpenStack. But that's
            | something you can't escape even if you run your stuff in
            | the cloud...
        
             | stackskipton wrote:
              | With S3/R2 or whatever, you do get away from it. You
              | dump a bunch of files on them and then retrieve them. OS
              | updates, disk failures, OpenStack, additional hardware?
              | Pssh, that's the S3 company's problem, not yours.
              | 
              | At $LastJob we ran a ton of Azure Web App containers, and
              | a lot of OS work no longer existed, so it's possible with
              | cloud to remove a lot of OS toil.
        
           | nee1r wrote:
           | True, this is a large reason why we chose to have the
           | datacenter a couple blocks away from the office.
        
           | hanikesn wrote:
           | Why 5h a week? Just for hardware?
        
             | datadrivenangel wrote:
             | 5h a week is basically 3 days a month. So if you have an
             | issue that takes a couple of days per month to fix, which
             | seems very fair, you're at that point.
        
           | dpe82 wrote:
           | a) 5hrs/week is negligible compared to that potential AWS
           | bill.
           | 
            | b) They seem tolerant of failures so it's not going to be
            | anything like 5hrs/week of physical maintenance. It will be
            | bursty though (e.g. box died, time to replace it...) but
            | assuming they have spares of everything sitting around /
            | already racked it shouldn't be a big deal.
        
         | buckle8017 wrote:
         | And this is actually relatively expensive.
        
       | g413n wrote:
       | the doodles are great
        
         | nee1r wrote:
         | Thanks! Lots of hard work went into them.
        
       | zparky wrote:
        | $125/disk, $12k/mo depreciation cost, which I assume means
        | disk failures, so ~100 disks/mo or 1,200/yr, which is half of
        | their disks a year - seems like a lot.
        
         | devanshp wrote:
         | no, we wanted to be conservative by depreciating somewhat more
         | aggressively than that. we have much closer to 5% yearly disk
         | failure rates.
        
         | AnotherGoodName wrote:
          | It's an accounting term. You need to report the value of
          | your company's assets each reporting cycle. This allows you
          | to report company profit more accurately, since the 2,400
          | drives likely aren't worth what the company originally paid.
          | It's stated as a tax write-off, but people get confused by
          | that term (they think X written off == X less tax paid).
          | It's better to state it as a way to more accurately report
          | profit (which may end up with less company tax paid, but
          | obviously not 1:1 since company tax is not 100%).
          | 
          | So anyway, you basically pretend you resold the drives
          | today. Here they are assuming that in 3 years' time no one
          | will pay anything for the drives. Somewhat reasonable, to be
          | honest, since the setup's bespoke and you'll only get a
          | fraction of the value of 3-year-old drives if you resold
          | them.
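          | 
          | Back-of-the-envelope, assuming 3-year straight-line over
          | just the drives (the rest of the quoted $12k/mo presumably
          | covers the servers, shelves and networking):
          | 
          |   drive_capex = 2_400 * 125          # $300,000 of drives
          |   months = 36                        # 3 years, ~zero salvage
          |   per_month = drive_capex / months   # ~$8,333 / month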
        
           | zparky wrote:
           | oh i see, thanks! i might be too used to reading backblaze
           | reports :p
        
       | ttfvjktesd wrote:
       | The biggest part that is always missing in such comparisons is
        | the employee salaries. In the calculation they give $354k of
        | total cost per year. But now add the cost of staff in SF to
       | operate that thing.
        
         | g413n wrote:
         | someone has to go and power-cycle the machines every couple
          | months, it's chill; that's the point of not using ceph
        
           | paxys wrote:
           | So the drives are never going to fail? PSUs are never going
           | to burn out? You are never going to need to procure new
           | parts? Negotiate with vendors?
        
             | buckle8017 wrote:
              | They mention data loss is acceptable, so I'm guessing
              | they're only fixing big outages.
              | 
              | Ignoring failed HDDs will likely mean very little
              | maintenance.
        
             | theideaofcoffee wrote:
             | This concern troll that everyone trots out when anyone
             | brings up running their own gear is just exhausting. The
             | hyperscalers have melted people's brains to a point where
             | they can't even fathom running shit for themselves.
             | 
             | Yes, drives are going to fail. Yes, power supplies are
             | going to burn out. Yes, god, you're going to get new parts.
             | Yes, you will have to actually talk to vendors.
             | 
             | Big. Deal. This shit is -not- hard.
             | 
             | For the amount of money you save by doing it like that, you
             | should be clamoring to do it yourself. The concern trolling
             | doesn't make any sort of argument against it, it just makes
             | you look lazy.
        
               | immibis wrote:
               | Very good point. There was something on the HN front page
               | like this about self-hosted email, too.
               | 
               | I point out to people that AWS is between _ten_ to _one
               | hundred_ times more expensive than a normal server. The
               | response is  "but what if I only need it to handle peak
               | load three hours a day?" _Then you still come out ahead
               | with your own server._
               | 
               | We have multiple colo cages. We handle enough traffic -
               | terabytes per second - that we'll never move those to
               | cloud. Yet management always wants more cloud. While
               | simultaneously complaining about how we're not making
               | enough money.
        
               | ranger_danger wrote:
                | I don't think the answer is so black-and-white. IMO this
               | only realistically applies to larger companies or ones
               | that either push lots of traffic or have a need for large
               | amounts of compute/storage/etc.
               | 
               | But for smaller groups that don't have large/sustained
               | workloads, I think they can absolutely save money
               | compared to colo/dedicated servers using one of multiple
               | different kinds of AWS services.
               | 
               | I have several customers that coast along just fine with
               | a $50/mo EC2 instance or less, compared to hundreds per
               | month for a dedicated server... I wouldn't call that "ten
               | times" by any stretch.
        
           | ttfvjktesd wrote:
           | You are under the assumption that only Ceph (and similar
           | complex software) requires staff, whereas plain 30 PB can be
           | operated basically just by rebooting from time to time.
           | 
           | I think that anyone with actual experience of operating
           | thousands of physical disks in datacenters would challenge
           | this assumption.
        
             | devanshp wrote:
             | we have 6 months of experience operating thousands of
             | physical disks in datacenters now! it's about a couple
             | hours a month of employee time in steady-state.
        
               | ttfvjktesd wrote:
                | How about all the other infrastructure? Since you are
                | obviously not using the cloud, you must have massive
                | amounts of GPUs and operating systems. All of that has
                | to work together; it's not just a matter of watching
                | the physical disks and all is set.
                | 
                | Don't get me wrong, I buy the actual numbers regarding
                | hardware costs, but presenting the rest as basically a
                | one-man show in terms of maintenance hours is the point
                | where I'm very sceptical.
        
               | g413n wrote:
               | oh we use cloud gpus, infiniband h100s absolutely aren't
               | something we want to self-host. not aws tho, they're
               | crazy overpriced; mithril and sfcompute!
               | 
               | we also use cloudflare extensively for everything that
               | isn't the core heap dataset, the convenience of buckets
               | is totally worth it for most day-to-day usage.
               | 
               | the heap is really _just_ the main pretraining corpus and
               | nothing else.
        
               | ttfvjktesd wrote:
                | How is it going to work when the GPUs are in the cloud
                | and the storage is miles away in a local colo in SF
                | down the street? I was under the impression that the
                | GPUs have to go over the training dataset multiple
                | times, which means transferring 30 PB multiple times
                | in and out of the cloud. Is the data link even fast
                | enough? How much are you charged in data transfer fees?
        
           | datadrivenangel wrote:
            | Assuming they end up hiring a full-time ops person at
            | $500k in total annual cost ($250k base for a data center
            | wizard), that's ~$42k extra a month, for a total of
            | roughly $70k per month. Still ~$200k per month lower than
            | their next best offering.
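            | 
            | Spelled out, using the $354k/year figure from the article:
            | 
            |   self_host = 354_000 / 12   # ~$29.5k / month per article
            |   ops_hire = 500_000 / 12    # ~$41.7k / month fully loaded
            |   total = self_host + ops_hire   # ~$71k / month, still
            |                                  # far below cloud quotes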
        
             | Symbiote wrote:
             | It's really not necessary.
             | 
             | I have four racks rather than ten, and less storage but
             | more compute. All purchased new from HP with warranties.
             | 
              | Ordering each year takes a couple of days' work. Racking
              | it takes one or two.
              | 
              | Initial setup (sorting out differences with a new
              | generation of server, etc., and customizing the Ubuntu
              | autoinstallation) is done in a day.
              | 
              | So that's a week per year for setup.
              | 
              | If we are really unlucky, add another week for a strange
              | failure. (This happened once in the 10 years I've been
              | doing this: a CPU needed replacement by the HP engineer.)
             | 
             | I replaced a couple of drives in July, and a network fibre
             | transceiver in May.
        
         | 827a wrote:
         | The biggest part missing from the opposing side is: Their view
         | is very much rooted in the pre-Cloud hardware infrastructure
         | world, where you'd pay sysadmins a full salary to sit in a dark
         | room to monitor these servers.
         | 
          | The reality nowadays is: the on-prem staff is covered in the
          | colo fees, which are split between everyone coloing in the
          | location and are reasonably affordable. The software-level
          | work above that has been massively simplified over the past
          | 15 years, and effectively rivals the volume of work it would
          | take to run workloads in the cloud (do you think managing
          | IAM and Terraform is free?).
        
           | ttfvjktesd wrote:
           | > do you think managing IAM and Terraform is free?
           | 
            | No, but I would argue that a SaaS offering, where the
            | whole storage system is maintained for you, actually
            | requires fewer maintenance hours than hosting 30 PB in a
            | colo.
            | 
            | In Terraform you define the S3 bucket and run terraform
            | apply. Afterwards the company's credit card is the limit.
            | Setting up and operating 30 PB yourself is an entirely
            | different story.
        
           | g413n wrote:
           | yeah colo help has been great, we had a power blip and
           | without any hassle they covered the cost and installation of
           | UPSes for every rack, without us needing to think abt it
           | outside of some email coordination.
        
         | Aurornis wrote:
         | Small startup teams can sometimes get away with datacenter
         | management being a side task that gets done on an as-needed
         | basis at first. It will come with downtime and your stability
         | won't be anywhere near as good as Cloudflare or AWS no matter
         | how well you plan, though.
         | 
         | Every real-world colocation or self-hosting project I've ever
          | been around has underestimated its downtime and rate of
         | problems by at least an order of magnitude. The amount of time
         | lost to driving to the datacenter, waiting for replacement
         | parts to arrive, and scrambling to patch over unexpected
         | failure modes is always much higher than expected.
         | 
         | There is a false sense of security that comes in the early days
         | of the project when you think you've gotten past the big issues
         | and developed a system that's reliable enough. The real test is
         | always 1-2 years later when teams have churned, systems have
         | grown, and the initial enthusiasm for playing with hardware has
         | given way to deep groans whenever the team has to draw straws
         | to see who gets to debug the self-hosted server setup this time
         | or, worse, drive to the datacenter again.
        
           | calvinmorrison wrote:
           | > The amount of time lost to driving to the datacenter,
           | waiting for replacement parts to arrive, and scrambling to
           | patch over unexpected failure modes is always much higher
           | than expected.
           | 
            | I don't have this experience at all. Our colo handled
            | almost all work. The only time I ever went to the server
            | farm was to build out whole new racks. Even replacing
            | servers, the colo handled for us at a good cost.
            | 
            | Our reliability came from software, not hardware, though
            | of course we had hundreds of spares sitting by, and
            | defense in depth (multiple datacenters, each datacenter
            | having 2 'brains' which could hotswap, each client
            | multiply backed up on 3-4 machines)...
            | 
            | Servers going down was fairly commonplace; servers dying
            | was commonplace. I think once we had a whole rack outage
            | when the switch died, and we flipped it to the backup.
           | 
           | Yes these things can be done and a lot cheaper than paying
           | AWS.
        
             | Aurornis wrote:
             | > Our reliability came from software not hardware, though
             | of course we had hundreds of spares sitting by, the defense
             | in depth (multiple datacenters, each datacenter having 2
             | 'brains' which could hotswap, each client multiply backed
             | up on 3-4 machines)...
             | 
             | Of course, but building and managing the software stack,
             | managing hundreds of spares across locations, spanning
             | across datacenters, having a hotswap backup system is not a
             | simple engineering endeavor.
             | 
             | The only way to reach this point is to invest a very large
              | amount of time into it. It requires additional headcount
              | or putting other work on pause.
             | 
             | I was trying to address the type of buildout in this
             | article: Small team, single datacenter, gets the job done
             | but comes with tradeoffs.
             | 
             | The other type of self buildout that you describe is ideal
             | when you have a larger team and extra funds to allocate to
             | putting it all together, managing it, and staffing it.
             | However, once you do that it's not fair to exclude the cost
             | of R&D and the ongoing headcount needs.
             | 
             | It's tempting to sweep it under the rug and call it part of
              | the overall engineering R&D budget, but there is no
              | question there is a large cost associated with what you
              | described, as opposed to spinning up an AWS or Cloudflare
              | account and having access to a battle-tested storage
              | system a few minutes later.
        
               | g413n wrote:
               | not caring about redundancy/reliability is really nice,
               | each healthy HDD is just the same +20TB of pretraining
               | data and every drive lost is the same marginal cost.
        
               | jeffrallen wrote:
               | When you lose 20 TB of video, where do you get 20 TB of
               | new video to replace it?
        
               | wongarsu wrote:
               | To be fair, what's described here is much more robust
               | than what you get with a simple AWS setup. At a minimum
               | that's a multi-region setup, but if the DCs have
               | different owners I'd even compare it to a multi-cloud
               | setup.
        
           | g413n wrote:
           | fwiw our first test rack has been up for about a year now and
           | the full cluster has been operational for training for the
           | past ~6 months. having it right down the block from our
           | office has been incredibly helpful, I am a bit worried abt
            | what e.g. Fremont would look like if we expand there.
           | 
           | I think another big crux here is that there isn't really any
           | notion of cluster-wide downtime, aside from e.g. a full
           | datacenter power outage (which we've had ig, and now have
           | UPSes in each rack kindly provided and installed by our
           | datacenter). On the software/network level the storage isn't
           | really coordinated in any manner, so failures of one machine
           | only reflect as a degradation to the total theoretical
           | bandwidth for training. This means that there's generally no
           | scrambling and we can just schedule maintenance at our
           | leisure. Last time I drew straws for maintenance I clocked a
           | 30min round-trip to walk over and plug a crash cart into each
            | of the 3 problematic machines to reboot and re-initialize,
            | and that was it.
           | 
           | Again having it right by the office is super nice, we'll need
           | to really trust our kvm setup before considering anything
           | offsite.
        
         | kabdib wrote:
         | I've built and maintained similar setups (10PB range).
         | Honestly, you just shove disks into it, and when they fail you
         | replace them. You need folks around to handle things like
         | controller / infrastructure failure, but hopefully you're
         | paying them to do other stuff, too.
        
       | OutOfHere wrote:
       | Is it correct that you have zero data redundancy? This may work
       | for you if you're just hoarding videos from YouTube, but not for
       | most people who require an assurance that their data is safe.
       | Even for you, it may hurt proper benchmarking, reproducibility,
       | and multi-iteration training if the parent source disappears.
        
         | nee1r wrote:
          | Definitely much less redundancy; this was a tradeoff we made
          | for pretraining data and cost.
        
           | Sanzig wrote:
           | Did you do any kind of redundancy at least (eg: putting every
           | 10 disks in RAID 5 or RAID Z1)? Or I suppose your training
           | application doesn't mind if you shed a few terabytes of data
           | every so often?
        
             | g413n wrote:
             | atm we don't and we're a bit unsure whether it's a free
             | lunch wrt adding complexity. there's a really nice property
             | of having isolated hard drives where you can take any
             | individual one and `sudo mount` it and you have a nice
             | chunk of training data, and that's something anyone can
             | feel comfortable touching without any onboarding to some
             | software stack
        
       | RagnarD wrote:
       | I love this story. This is true hacking and startup cost
       | awareness.
        
         | nee1r wrote:
         | Thanks!! :)
        
       | boulos wrote:
       | It's quite cheap to just store data at rest, but I'm pretty
       | confused by the training and networking set up here. It sounds
       | like from other comments that you're not going to put the GPUs in
       | the same location, so you'll be doing all training over X 100
       | Gbps lines between sites? Aren't you going to end up totally
       | bottlenecked during pretraining here?
        
         | g413n wrote:
         | yeah we just have the 100gig link, atm that's about all the gpu
         | clusters can pull but we'll prob expand bandwidth and storage
         | as we scale.
         | 
         | I guess worth noting that we do have a bunch of 4090s in the
         | colo and it's been super helpful for e.g. calculating
         | embeddings and such for data splits.
        
           | mwambua wrote:
           | How did you arrive at the decision of not putting the GPU
           | machines in the colo? Were the power costs going to be too
           | high? Or do you just expect to need more physical access to
           | the GPU machines vs the storage ones?
        
             | g413n wrote:
             | When I was working at sfcompute prior to this we saw
             | multiple datacenters literally catch on fire bc the
             | industry was not experienced with the power density of
             | h100s. Our training chips just aren't a standard package in
             | the way JBODs are.
        
               | Symbiote wrote:
               | Isn't the easy option to spread the computers out, i.e.
               | not fill the rack, but only half of it?
               | 
                | A GPU cluster next to my servers has done this;
                | presumably they couldn't have 64A in one rack, so
                | they've got 32A in two. (230V 3-phase.)
        
               | pixl97 wrote:
                | Rack space is typically at a premium at most data
                | centers.
        
               | Symbiote wrote:
               | I'm more surprised that a data centre will apparently
               | provide more power to a rack than is safe to use.
        
               | lemonlearnings wrote:
               | Adding the compute story would be interesting as a follow
               | up.
               | 
                | Where is that done? How many GPUs do you need to
                | crunch all that data? Etc.
               | 
               | Very interesting and refreshing read though. Feels like
               | what Silicon Valley is more about than just the usual: tf
               | apply then smile and dial.
        
       | huxley_marvit wrote:
       | damn this is cool as hell. estimate on the maintenance cost in
       | person-hours/month?
        
         | nee1r wrote:
         | Around 2-5 hours/month, mostly powercycling the servers and
         | replacing hard drives
        
           | Symbiote wrote:
           | You should be able to power cycle the servers from their
           | management interfaces.
           | 
           | (But I have the luxury of everything being bought new from
           | HP, so the interfaces are similar.)
        
       | jonas21 wrote:
       | Nice writeup. All of the technical detail is great!
       | 
       | I'm curious about the process of getting colo space. Did you use
       | a broker? Did you negotiate, and if so, how large was the
       | difference in price between what you initially were quoted and
       | what you ended up paying?
        
         | nee1r wrote:
         | We reached out to almost every colocation space in SF/some in
         | Fremont to get quotes. There wasn't a difference between the
         | quote price and what we ended up paying, though we did
         | negotiate terms + one-time costs.
        
       | archmaster wrote:
       | Had the pleasure of helping rack drives! Nothing more fun than an
       | insane amount of data :P
        
         | nee1r wrote:
         | Thanks for helping!!!
        
       | miltonlost wrote:
       | And how much did the training data cost?
        
       | jimmytucson wrote:
       | Just wanted to say, thanks for doing this! Now the old rant...
       | 
       | I started my career when on-prem was the norm and remember so
       | much trouble. When you have long-lived hardware, eventually, no
       | matter how hard you try, you just start to treat it as a pet and
       | state naturally accumulates. Then, as the hardware starts to be
       | not good enough, you need to upgrade. There's an internal team
       | that presents the "commodity" interface, so you have to pick out
       | your new hardware from their list and get the cost approved (it's
       | a lot harder to just spend a little more and get a little more).
       | Then your projects are delayed by them racking the new hardware
       | and you properly "un-petting" your pets so they can respawn on
       | the new devices, etc.
       | 
       | Anyways, when cloud came along, I was like, yeah we're switching
       | and never going back. Buuut, come to find out that's part of the
       | master plan: it's a no-brainer good deal until you and everyone
       | in your org/company/industry forgets HTF to rack their own
       | hardware, and then it starts to go from no-brainer to brainer.
       | And basically unless you start to pull back and rebuild that
       | muscle, it will go from brainer to no-brainer _bad_ deal. So
       | thanks for building this muscle!
        
         | theideaofcoffee wrote:
         | I'm not op, but thanks for this. Like I mentioned in another
         | comment, the wholesale move to the cloud has caused so many
         | skills to become atrophied. And it's good that someone is
         | starting to exercise that skill again, like you said. The
         | hyperscalers are mostly to blame for this, the marketing FUD
         | being that you can't possibly do it yourself, there are too
         | many things to keep track of, let us do it (while conveniently
         | leaving out how eye-wateringly expensive they are in
         | comparison).
        
           | tempest_ wrote:
           | The other thing the cloud does not let you do is make trade
           | offs.
           | 
            | Sometimes you can afford not to have a triple-redundant
            | 1000GB network, or a simple single machine with RAID may
            | have acceptable downtime.
        
             | g413n wrote:
             | yeah this
             | 
             | it means that even after negotiating much better terms than
             | baseline we run into the fact that cloud providers just
             | have a higher cost basis for the more premium/general
             | product.
        
         | g413n wrote:
         | we're in a pretty unique situation in that _very early on_ we
          | fundamentally can't afford the hyperscaler clouds to cover
         | operations, so we're forced to develop some expertise. turned
         | out to be reasonably chill and we'll prob stick with it for the
         | foreseeable future, but we have seen a little bit of the state-
         | creep you mention so tbd.
        
         | nodja wrote:
         | Yeah from memory on-prem was always cheaper, it just removed a
         | lot of logistic obstacles and made everything convenient under
         | one bill.
         | 
         | IIRC the wisdom of the time cloud started becoming popular was
         | to always be on-prem and use cloud to scale up when demand
         | spiked. But over time temporarily scaling up became permanent,
         | and devs became reliant on instantly spawning new machines for
         | things other than spikes in demand and now everyone defaults to
         | cloud and treats it as the baseline. In the process we lost the
         | grounding needed to assess the real cost of things and
         | predictably the cost difference between cloud and on-prem has
         | only widened.
        
           | luhn wrote:
           | > IIRC the wisdom of the time cloud started becoming popular
           | was to always be on-prem and use cloud to scale up when
           | demand spiked.
           | 
           | I've heard that before but was never able to make sense of
           | it. Overflowing into the cloud seems like a nightmare to
           | manage, wouldn't overbuilding on-prem be cheaper than paying
           | your infra team to straddle two environments?
        
             | sgarland wrote:
             | As someone with experience with a company that did hybrid,
             | I'll say: it only makes sense if your infra team deeply
             | understands computers.
             | 
             | The end state is "just some IaC," wherein it doesn't really
             | matter to anyone where the application lives, but all of
             | the underlying difficulties in getting to that state
             | necessitate that your team actually, no-shit knows how
             | distributed systems work. They're going to be doing a lot
             | of networking configuration, for one, and that's a whole
             | speciality.
        
         | ares623 wrote:
         | Wanna see us do it again?
        
         | matt-p wrote:
          | Docker is _amazing_ for forcing the machines not to be pets,
          | seriously, a racked server is just another K3s or K8s node
          | (or whatever) and doesn't get the choice or ability of being
          | petted. It's so nice. You could maybe have said the same
          | about VMs but not really, the VM just became the pet; OK,
          | you could at least image/snapshot it but it's not the same.
        
           | doublerabbit wrote:
            | I've found Docker is more of a monstrous pet.
           | 
           | Docker is a monster that you have to treat as a pet. You've
           | still got to pet it through stages of updating, monitoring,
           | snapshots and networking. When the internal system breaks
           | it's no different to a server collapsing.
           | 
           | Snapshots are a haircut for the monster, useful but can make
           | things worse.
        
             | matt-p wrote:
              | Not in my experience, it's super easy to set up a K3s
              | cluster in a single rack. Certainly less hassle than
              | VMware or Xen ever was.
        
       | pronoiac wrote:
       | I wonder if they'll go with "toploaders" - like Backblaze Storage
       | Pods - later. They have better density and faster setup, as they
       | don't have to screw in every drive.
       | 
       | They got used drives. I wonder if they did any testing? I've
       | gotten used drives that were DOA, which showed up in tests -
       | SMART tests, short and long, then writing pseudorandom data to
       | verify capacity.
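        | 
        | A rough sketch of that last step (write pseudorandom data,
        | read it back, count mismatches) - destructive, needs root, and
        | the device path/size are placeholders; in practice badblocks
        | -wsv or a long SMART self-test covers much of the same ground:
        | 
        |   import hashlib
        | 
        |   DEV = "/dev/sdX"    # placeholder: drive under test (wiped!)
        |   CHUNK = 1 << 20     # 1 MiB blocks
        |   SEED = b"burn-in"
        | 
        |   def block(i):
        |       # deterministic pseudorandom block, reproducible later
        |       h = hashlib.shake_256(SEED + i.to_bytes(8, "big"))
        |       return h.digest(CHUNK)
        | 
        |   def verify(dev, size_bytes):
        |       nblocks = size_bytes // CHUNK
        |       with open(dev, "wb", buffering=0) as f:   # write pass
        |           for i in range(nblocks):
        |               f.write(block(i))
        |       bad = 0
        |       with open(dev, "rb", buffering=0) as f:   # read-back
        |           for i in range(nblocks):
        |               if f.read(CHUNK) != block(i):
        |                   bad += 1
        |       return bad   # 0 == drive stored everything we wrote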
        
         | g413n wrote:
         | yeah we're very interested in trying toploaders, we'll do a
         | test rack next time we expand and switch to that if it goes
         | well.
         | 
         | w.r.t. testing the main thing we did was try to buy a bit from
         | each supplier a month or two ahead of time, so by the time we
         | were doing the full build that rack was a known variable. We
         | did find one drive lot which was super sketchy and just didn't
         | include it in the bulk orders later. diversity in suppliers
         | helps a lot with tail risk
        
           | joshvm wrote:
           | "don't have to screw in every drive" is relative, but at
           | least tool-less drive carriers are a thing now.
           | 
           | A lot of older toploaders from vendors like Dell are not
           | tool-free. If you bought vendor drives and one fails, you RMA
           | it and move on. However if you want to replace failed drives
           | in the field, or want to go it alone from the start with
           | refurbished drives... you'll be doing a lot of screwing.
           | They're quite fragile and the plastic snaps easily. It's
           | pretty tedious work.
        
         | tempest_ wrote:
          | Used Supermicro machines of this generation are very cheap
          | (all things considered)
         | 
         | https://www.theserverstore.com/supermicro-superstorage-ssg-6...
        
       | synack wrote:
       | IPMI is great and all, but I still prefer serial ports and remote
       | PDUs. Never met a BMC I could trust.
        
         | jeffrallen wrote:
         | Try Lenovo. Their BMCs Don't Suck (tm).
        
       | fragmede wrote:
       | My question isn't why do it yourself. A quick back of the
       | envelope math shows AWS being much more expensive. My question is
       | why San Francisco? It's one of the most expensive real estate
       | markets in the US (#2 residential, #1 commercial), and
       | electricity is _expensive_. $0.71 /KwH peak residential rate! A
       | jaunt down 280 to San Jose's gonna be cheaper, at the expense of.
       | having to take that drive to get hands on. But I'm sure you can
       | find someone who's capable of running a DC that lives in San Jose
       | and needs a job so the SF team doesn't have to commute down to
       | South Bay. Now obviously there's something to be said for having
       | the rack in the office, I know of at least two (three, now) in
       | San Francisco, it just seems like a weird decision if you're
       | already worrying about money to the point of not using AWS.
        
         | hnav wrote:
          | The article says their recurring cost is $17.5k; they'd
          | spend at least that amount in human time tending to their
          | cluster if they had to drive to it. It's also a question of
          | magnitudes: going from $0.5M/mo to $0.05M/mo (hard costs
          | plus the extra headaches of dealing with the cluster) is an
          | order of magnitude; even if you could cut another order of
          | magnitude, it wouldn't be as impactful.
        
         | renewiltord wrote:
         | Problem when you self-roll this is that you inevitably make
         | mistakes and the cycle time of going down and up ruins
         | everything. Access trumps everything.
         | 
         | You can get a DC guy but then he doesn't have much to do post
         | setup and if you contract that you're paying mondo dollars
         | anyway to get it right and it's a market for lemons (lots of
         | bullshitters out there who don't know anything).
         | 
         | Learned this lesson painfully.
        
         | g413n wrote:
         | it's not just in sf it's across the street from our office
         | 
         | this has been incredibly nice for our first hardware project,
         | if we ever expand substantially then we'd def care more about
         | the colo costs.
        
       | tarasglek wrote:
        | i am still confused what their software stack is, they don't
        | use ceph but bought netapp, so they use nfs?
        
         | OliverGuy wrote:
          | The NetApps are just disk shelves; you can plug them into a
          | SAS controller and use whatever software stack you please.
        
           | tarasglek wrote:
            | but they have multiple head nodes, so is it some
            | distributed setup or just an active/passive type thing?
        
             | hnav wrote:
             | I'm guessing the client software (outside the dc) is
             | responsible for enumerating all the nodes which all get
             | their own IP.
        
       | trebligdivad wrote:
       | The networking stuff seems....odd.
       | 
       | 'Networking was a substantial cost and required experimentation.
       | We did not use DHCP as most enterprise switches don't support it
       | and we wanted public IPs for the nodes for convenient and
       | performant access from our servers. While this is an area where
       | we would have saved time with a cloud solution, we had our
       | networking up within days and kinks ironed out within ~3 weeks.'
       | 
        | Where does the switch choice come into whether you DHCP? Wth
        | would you want public IPs?
        
         | giancarlostoro wrote:
         | > Wth would you want public IPs.
         | 
         | So anyone can download 30 PB of data with ease of course.
        
         | buzer wrote:
         | > Wth would you want public IPs.
         | 
         | Possibly to avoid needing NAT (or VPN) gateway that can handle
         | 100Gbps.
        
           | bombcar wrote:
           | I don't know what they're doing, but Mikrotik can perhaps
           | route that -
           | https://mikrotik.com/product/ccr2216_1g_12xs_2xq#fndtn-
           | testr... and is about the cost of their used thing.
           | 
           | And I think this would be a banger for IPv6 if they really
           | "need" public IPs.
        
             | dustywusty wrote:
             | Exactly what I came in to say, CCR2216 can do this for <
             | $2k, and does it well.
        
           | xp84 wrote:
           | No DHCP doesn't mean public IPs nor impact the need for NAT,
           | it just means the hosts have to be explicitly configured with
           | IP addresses, default gateways if they need egress, and DNS.
           | 
           | Those IPs you end up assigning manually could be private ones
           | or routable ones. If private, authorized traffic could be
           | bridged onto the network by anything, such as a random
           | computer with 2 NICs, one of which is connected eventually to
           | the Internet and one of which is on the local network.
           | 
           | If public, a firewall can control access just as well as
           | using NAT can.
        
             | buzer wrote:
             | I know, I was specifically answering the question of "why
             | the hell would you want public IPs".
             | 
             | I don't know why their network setup wouldn't support DHCP,
             | that's extremely common especially in "enterprise" switches
             | via DHCP forwarding.
        
         | pclmulqdq wrote:
         | They didn't seem to want to use a router. Purpose-built 100
         | Gbps routers are a bit expensive, but you can also turn a
         | computer into one.
        
           | flumpcakes wrote:
           | Many switches are L3 capable, making them in effect a router.
           | Considering their internet lines appear to be hooked up to
           | their 100 Gbps switch, I'd guess this is one of the L3 ones.
        
         | mystifyingpoi wrote:
         | It really feels like they wanted 30 PB of storage accessible
         | over HTTP and _literally nothing else_. No redundancy, no NAT,
         | dead simple nginx config + some code to track where to find
         | which file on the filesystem. I like that.
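          | 
          | The "code to track where to find which file" could plausibly
          | be as dumb as a flat index mapping object key to node and
          | mount (all names here hypothetical, the post doesn't show
          | the actual scheme):
          | 
          |   INDEX = {
          |       "shard-000042.tar": ("heap-node-07", "/mnt/disk13"),
          |       "shard-000043.tar": ("heap-node-02", "/mnt/disk05"),
          |   }
          | 
          |   def url_for(key):
          |       node, mount = INDEX[key]
          |       # nginx on each node serves its mounts read-only
          |       return f"http://{node}{mount}/{key}"
          | 
          |   print(url_for("shard-000042.tar"))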
        
         | matt-p wrote:
         | This was not written by a network person, quite clearly.
         | Hopefully it's just a misunderstanding, otherwise they do need
         | someone with literally any clue about networks.
        
           | g413n wrote:
           | yeah misunderstanding we'll update the post-- separately it's
           | true that we aren't network specialists and the network
           | wrangling was prob disproportionately hard for us/ shouldn't
           | have taken so long.
        
             | trebligdivad wrote:
             | I assume your actual training is being done somewhere else?
             | Did you try getting colocation space in the same datacentre
             | as somewhere with the compute - it would have reduced your
             | internet costs even further.
        
               | g413n wrote:
               | yeah the cost calculus is very different for gpus, it
               | absolutely makes sense for us to be using cloud there.
               | also hardly any datacenters can support the power
               | density, esp in downtown sf
        
               | trebligdivad wrote:
                | Yeh; one other thing - you list a separate management
                | network as optional - it's not optional! Under no
                | circumstances must you expose the management IPs of
                | switches or the servers to the internet; they are, on
                | average, about as secure as a drunk politician. Use a
                | separate management net, and make sure it's only
                | securely accessed.
        
               | Symbiote wrote:
               | I understood that it's optional because they can walk
               | down the road to the data center instead.
               | 
               | They mention plugging monitors in several times. I think
               | I've only done that once in the last couple of years,
               | when a firmware upgrade failed and reset the management
               | interface IP.
        
             | matt-p wrote:
              | Massive props for getting it done anyway. For others
              | reading: in general a switch should never run dhcpd, but
              | will normally/often relay it for you; your Aristas would
              | 100% have supported relaying, but in this case it sounds
              | like it might even be flat L2. Normally you'd host dhcpd
              | on a server.
             | 
              | Some general feedback in case it's helpful.. -$20K on
              | contractors seems insane if we're talking about rack and
              | stack for 10 racks. Many datacentres can be persuaded to
              | do it for free as part of you agreeing to sign their
              | contract. Your contractors should at least be using a
              | server lift of some kind, again often kindly provided by
              | the facility. If this included paying for server
              | configuration and so on, then ignore that comment
              | (bargain!).
             | 
             | -I would almost never expect to actually pay a setup fee
             | (beyond something nominal like 500 per rack) to the
             | datacentre either, certainly if you're going to be paying
             | that fee it had better include rack and stack.
             | 
              | -A crash cart should not be used for an install of this
              | size; the servers should be plugged into the network and
              | then automatically configured by a script/iPXE. It might
              | sound intimidating or hard but it's not, and doesn't
              | even require IPMI (though frankly I would strongly,
              | strongly recommend it, if you don't already have it). I
              | would use managed switches for the management network
              | too, for sure.
             | 
             | -Consider two switches, especially if they are second hand.
             | The cost of the cluster not being usable for a few days
             | while you source and install a replacement even here
             | probably is still thousands.
             | 
              | -Personally I'm not a big fan of the whole JBOD
              | architecture and would have just filled my boots with
              | single-socket 4U Supermicro chassis. To each their own,
              | but JBOD's main benefit is a very small financial saving
              | at the cost of quite a lot of drawbacks IMO. YMMV.
             | 
             | -Depending on who you use for GPUs, getting a private link
             | or 'peering' to them might save you some cost and provide
             | higher capacity.
             | 
              | -I'm kind of shocked that FMT2 didn't turn out much
              | cheaper than your current colo; I would expect less than
              | those figures, possibly _with_ the 100G DIA included
              | (normally about $3000/month, no setup).
        
         | XorNot wrote:
         | I mean generally above a certain size of deployment DHCP is
          | much more trouble than it's worth.
         | 
         | DHCP is really only worth it when your hosts are truly dynamic
         | (i.e. not controlled by you). Otherwise it's a lot easier to
         | handle IP allocation as part of the asset lifecycle process.
         | 
         | Heck even my house IoT network is all static IPs because at the
         | small scale it's much more robust to not depend on my home
         | router for address assignment - replacing a smart bulb is a big
         | enough event, so DHCP is solely for bootstrapping in that case.
         | 
          | At the enterprise level, unpacking a server and recording the
          | asset IDs etc. is the time to assign IP addresses.
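          | 
          | A minimal sketch of folding that into the inventory (the file
          | layout and addressing scheme here are made up, just to show
          | deterministic, recorded assignment rather than DHCP):
          | 
          |   # derive a static IP from an asset's rack/slot as it is
          |   # recorded in assets.csv (hypothetical: asset_id,rack,slot)
          |   import csv, ipaddress
          | 
          |   BASE = ipaddress.ip_network("10.20.0.0/16")
          | 
          |   def ip_for(rack: int, slot: int) -> str:
          |       # one /24 per rack, one host address per slot
          |       return str(BASE.network_address + rack * 256 + slot)
          | 
          |   with open("assets.csv") as f:
          |       for row in csv.DictReader(f):
          |           print(row["asset_id"],
          |                 ip_for(int(row["rack"]), int(row["slot"])))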
        
         | Symbiote wrote:
         | I have static, public IPs across 80 or so servers.
         | 
         | It gets set approximately once when the server's automated
         | Ubuntu installation runs, and I never think about it.
         | 
         | > Where does the switch choice come into whether you DHCP?
         | 
          | Perhaps from home routers, which include one.
         | 
         | > Wth would you want public IPs.
         | 
         | Why wouldn't you? They have a firewall.
        
       | OliverGuy wrote:
        | Aren't those NetApp shelves pretty old at this point? See a lot
       | of people recommending against them even for homelab type uses.
       | You can get those 60 drive SuperMicro JBODs for pretty cheap now,
       | and those aren't too old, would have been my choice.
       | 
        | Plus, the TCO is already way under the cloud equivalent, so you
        | might as well spend a little more to get something much newer
        | and more reliable.
        
         | g413n wrote:
         | yeah it's on the wishlist to try
        
       | drnick1 wrote:
       | Everyone should give AWS the middle finger and start doing this.
       | Beyond cost, it's a matter of sovereignty over one's computing
       | and data.
        
         | twoodfin wrote:
         | If this is a real market, I'd expect AWS to introduce S3
         | Junkyard with a similar durability and cost structure.
         | 
         | They probably still won't budge on the egress fees.
        
       | alchemist1e9 wrote:
       | Would have been much easier and probably cheaper to buy gear from
       | 45drives.
        
       | renewiltord wrote:
       | The cost difference is huge. Modern compute is just so much
       | bigger than one would think. Hurricane Electric is incredibly
       | cheap too. And Digital Realty in the city are pretty good. The
       | funny thing is that the Monkeybrains guys will make room for you
        | at $75/amp, but that isn't competitive when a 9654-based system
        | pulls 2+ amps at peak.
       | 
       | Still fun for someone wanting to stick a computer in a DC though.
       | 
        | Networking is surprisingly hard, but we also settled for the
        | cheapo-life QSFP instead of the upcoming Cisco switches that do
        | 800 Gbps. Great writeup.
       | 
        | One write-up that would be fun is the mechanics of layout and
        | cabling and that sort of thing. Learning all that manually was
        | a pain in the ass. It's not just written down somewhere. I
        | should have documented it while I was doing it, but I'm no
        | longer doing it and so can't provide good photos.
        
       | ThinkBeat wrote:
       | So now you have all
       | 
       | - your storage in one place
       | 
       | - you own all backup,
       | 
       | -- off site backup (hot or cold)
       | 
       | - uptime worries
       | 
       | - maintenance drives
       | 
        | -- how many can fail before it is a problem
       | 
       | - maintenance machines
       | 
        | -- how many can fail before it is a problem
       | 
       | - maintenance misc/datacenter
       | 
        | - What to do if the electricity is cut off suddenly
       | 
       | -- do you have a backup provider?
       | 
        | -- diesel generators?
       | 
       | -- giant batteries?
       | 
       | -- Will the backup power also run cooling?
       | 
        | - natural disaster
       | 
       | -- earthquake
       | 
       | -- flooding
       | 
       | -- heatwave
       | 
       | - physical security
       | 
       | - employee training / (esp. if many quit)
       | 
       | - backup for networking (and power for it)
       | 
       | - employees on call 24/7
       | 
       | - protection against hacking
       | 
       | +++++
       | 
       | I agree that a lot of cloud providers overcharge by a lot, but
       | doing it all yourself gives you a lot of headaches.
       | 
        | Co-locating would seem like a valuable partial mitigation.
        
         | pclmulqdq wrote:
         | Most of these come from your colo provider (including a good
         | backup power and networking story), and you can pay remote
         | hands for a lot of the rest.
         | 
         | Things like "protection from hacking" also don't come from AWS.
        
       | yread wrote:
        | You could get pretty close to the $1/TB/month cost using
        | Hetzner's SX135 with 8x22TB drives, so about 140TB in raidz1,
        | for 240 EUR/month. Maybe you get a better rate if you rent 200
        | of them. Someone else takes care of a lot of the risks and you
        | can sleep well at night.
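        | 
        | (Back-of-the-envelope, under those assumptions: 30 PB is about
        | 30,000 TB / 140 TB ~= 215 such servers, so roughly 215 x 240
        | EUR ~= 51,600 EUR/month before any volume discount.)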
        
         | nodja wrote:
         | I don't think Hetzner provides locations in SF. Those 100GBit
          | connections don't do much if they need to reach outside the
          | city where the rest of the equipment is, but maybe peering
          | has gotten better and my views are outdated.
        
           | fuzzylightbulb wrote:
           | You're good. The speed of light through a glass fiber is
           | still just as slow as it ever was.
        
         | g413n wrote:
         | yeah it's totally plausible that we go with something like this
         | in the future. We have similar offers where we could separate
         | out either the financing, the build-out, or both and just do
         | the software.
         | 
         | (for Hetzner in particular it was a massive pain when we were
         | trying to get CPU quotas with them for other data operations,
         | and we prob don't want to have it in Europe, but it's been
         | pretty easy to negotiate good quotes on similar deals locally
         | now that we've shown we can do it ourselves)
        
         | mx7zysuj4xew wrote:
         | You cannot use hetzner for anything serious.
         | 
         | They'd most likely claim abuse and delete your data wholesale
         | without notice
        
       | coleca wrote:
       | For a workload of that size you would be able to negotiate
       | private pricing with AWS or any cloud provider, not just
        | Cloudflare. You can get a private pricing deal on S3 with as
        | little as half a PB. Not saying that your overall expenses
        | would be cheaper w/a CSP than DIY, but it's not exactly an
        | apples-to-apples comparison of taking full retail prices for
        | the CSPs
       | against eBayed equipment and free labor (minus the cost of the
       | pizza).
        
         | g413n wrote:
         | egress costs are the crux for AWS and they didn't budge when we
          | tried to negotiate that with them; it's just entirely unusable
         | for AI training otherwise. I think the cloudflare private quote
         | is pretty representative of the cheaper end of managed object-
         | bucket storage.
         | 
         | obv as we took on this project the delta between our cluster
         | and the next-best option got smaller, in part bc the ability to
         | host it ourselves gives us negotiating leverage, but managed
         | bucket products are fundamentally overspecced for simple
         | pretraining dumps. glacier does a nice job fitting the needs of
         | archival storage for a good cost, but there's nothing similar
         | for ML needs atm.
        
       | landryraccoon wrote:
       | Their electricity costs are $10K per month or about $120K per
       | year. At an interest rate of 7% that's $1.7M of capital tied up
       | in power bills.
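        | 
        | (That's the perpetuity math: $120,000 / 0.07 ~= $1.7M, roughly
        | the lump sum you'd need invested at 7% to cover the power bill
        | indefinitely.)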
       | 
       | At that rate I wonder if it makes sense to do a massive solar
       | panel and battery installation. They're already hosting all of
       | their compute and storage on prem, so why not bring electricity
       | generation on prem as well?
        
         | moffkalast wrote:
         | Let's just say we're not seeing all of these sudden private
         | nuclear reactor investments for no reason.
        
         | datadrivenangel wrote:
         | At 120K per year over the three year accounting life of the
         | hardware, that's 360k... how do you get to 1.7M?
        
           | landryraccoon wrote:
           | It seems unlikely to me that they'll never have to retrain
           | their model to account for new data. Is the assumption that
           | their power usage drastically drops after 3 years?
           | 
           | Unless they go out of business in 3 years that seems unlikely
           | to me. Is this a one-off model where they train once and it
           | never needs to be updated?
        
       | intalentive wrote:
       | "Solve computer use" and previous work is audio conversation
       | model. How do these go together? Is the idea to replace keyboard
       | and mouse with spoken commands? a la Star Trek
        
         | g413n wrote:
         | just general research work. Once the recipes are efficient
         | enough the modality is a smaller detail.
         | 
         | On the product side we're trying to orient more towards
         | 'productive work assistant' rather than the default pull of
         | audio models towards being an 'ai friend'.
        
         | nerpderp82 wrote:
         | Make me transparent aluminum!
        
       | Onavo wrote:
       | > _We kept this obsessively simple instead of using MinIO or Ceph
       | because we didn't need any of the features they provided; it's
       | much, much simpler to debug a 200-line program than to debug
       | Ceph, and we weren't worried about redundancy or sharding. All
       | our drives were formatted with XFS._
       | 
       | What do you plan to do if you start getting corruption and
       | bitrot? The complexity of S3 comes with a lot of hard guarantees
       | for data integrity.
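        | 
        | For readers curious what a deliberately simple layer like the
        | one described can look like, here is a minimal sketch of the
        | idea (illustrative only: this is not the authors' actual
        | 200-line program, and the mount points, port and flat key
        | scheme are made up):
        | 
        |   # hash each key onto one of many independently mounted XFS
        |   # drives; no replication or erasure coding, so a dead drive
        |   # simply means those blobs are gone
        |   import hashlib, os
        |   from http.server import BaseHTTPRequestHandler, HTTPServer
        | 
        |   DRIVES = [f"/mnt/disk{i:04d}" for i in range(2400)]
        | 
        |   def path_for(key: str) -> str:
        |       h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        |       return os.path.join(DRIVES[h % len(DRIVES)], key)
        | 
        |   class Store(BaseHTTPRequestHandler):
        |       def do_PUT(self):   # keys are flat names, no slashes
        |           n = int(self.headers["Content-Length"])
        |           with open(path_for(self.path.strip("/")), "wb") as f:
        |               f.write(self.rfile.read(n))
        |           self.send_response(200); self.end_headers()
        | 
        |       def do_GET(self):
        |           try:
        |               with open(path_for(self.path.strip("/")), "rb") as f:
        |                   data = f.read()
        |           except FileNotFoundError:
        |               self.send_response(404); self.end_headers(); return
        |           self.send_response(200); self.end_headers()
        |           self.wfile.write(data)
        | 
        |   HTTPServer(("", 9000), Store).serve_forever()
        | 
        | Anything like bitrot detection would have to live above a layer
        | like this, e.g. checksums recorded alongside the blobs, which
        | is presumably the trade-off being asked about.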
        
         | g413n wrote:
         | our training stack doesn't make strong assumptions about data
         | integrity, it's chill
        
       | htrp wrote:
       | >We threw a hard drive stacking party in downtown SF and got our
       | friends to come, offering food and custom-engraved hard drives to
       | all who helped. The hard drive stacking started at 6am and
       | continued for 36 hours (with a break to sleep), and by the end of
       | that time we had 30 PB of functioning hardware racked and wired
       | up.
       | 
       | So how many actual man hours for 2400 drives?
        
         | g413n wrote:
         | around 250
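          | 
          | (So roughly 2,400 drives / 250 person-hours, i.e. about 10
          | drives racked per person-hour.)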
        
       | Havoc wrote:
       | Cool write-up.
       | 
        | I do feel sorry for the friends that got suckered into doing a
       | bunch of grunt work for free though
        
         | g413n wrote:
          | yeah that's why we started paying people near the second half
          | (not super clearly stated in the blogpost), but the novelty
         | definitely wore off with plenty of drives left to stack, so we
         | switched strategies to get it done in time.
         | 
         | I think everyone who showed up for a couple hours as part of
         | the party had a good time tho, and the engraved hard drives we
         | were giving out weren't cheap :p
        
       | pighive wrote:
        | HDDs are never one-time costs. Do datacenters also offer
       | ordering and replacing HDDs?
        
         | epistasis wrote:
         | With 30PB it's likely they will simply let capacity fall as
         | drives fail.
         | 
         | They apparently have zero need for redundancy in their use
         | case, and the failure rate won't be high enough to take out a
         | significant percentage of their capacity.
        
         | Symbiote wrote:
         | They offer replacing, yes, but normally expect you to order the
         | new one. (Usually covered by a warranty, sent next business
         | day.)
        
       | supermatt wrote:
       | Where does one get "90 million hours of video data"?
        
         | hmcamp wrote:
         | I'm also curious about this. I don't recall seeing that
         | mentioned in the article
        
       | neilv wrote:
       | As a fan of eBay for homelab gear, I appreciate the can-do
       | scrappiness of doing it for a startup.
       | 
       | To adapt the old enterprise information infrastructure saying for
       | startups:
       | 
       | "Nobody Ever Got Fired for Buying eBay"
        
       | Scramblejams wrote:
       | Fun piece, thanks to the author. But for vicarious thrills like
       | this, more pictures are always appreciated!
        
         | echelon wrote:
         | If the authors chime in, I'd like to ask what "Standard
         | Intelligence PBC" does.
         | 
         | Is it a public benefit corp?
         | 
         | What are y'all building?
        
       | ThrowawayTestr wrote:
       | DIY is always cheaper than paying someone else. Great write-up.
        
       | akreal wrote:
       | How is/was the data written to disks? Something like
       | rsync/netcat?
        
       | lucb1e wrote:
       | The linked Discord post is also interesting and fun to read. Most
       | of the post is more serious but this is one of the small gems:
       | 
       | > One thing we discovered very quickly was that [world cup] goals
       | scored showed up in our monitoring graphs. This was very cool
       | because not only is it neat to see real-world events show up in
       | your systems, but this gave our team an excuse to watch soccer
       | during meetings. We weren't "watching soccer during meetings", we
       | were "proactively monitoring our systems' performance."
       | 
       | https://discord.com/blog/how-discord-stores-trillions-of-mes...
       | 
       | It is linked as evidence for Discord using "less than a petabyte"
       | of storage for messages. My best guess is that they multiplied
        | node size and count from this post, which comes out to 708 TB
        | for the old cluster and 648 TB in the new setup (presumably it
        | also has some room to grow).
        
       ___________________________________________________________________
       (page generated 2025-10-01 23:00 UTC)