[HN Gopher] Ask HN: How would you store 10PB of data for your st...
___________________________________________________________________
Ask HN: How would you store 10PB of data for your startup today?
I'm running a startup and we're storing north of 10PB of data and
growing. We're currently on AWS and our contract is up for renewal.
I'm exploring other storage solutions. Min requirements of AWS S3
One Zone IA (https://aws.amazon.com/s3/storage-
classes/?nc=sn&loc=3). How would you store >10PB if you were in my
shoes? The thought experiment can be with or without the data
transfer cost out of our current S3 buckets. Please also mention
what your experience is based on. Ideally you store large amounts
of data yourself and can speak from first-hand experience. Thank
you for your support!! I will post a thread once we've reached a
decision on what we ended up doing. Update: I should have
mentioned earlier that the data needs to be accessible at all
times. It's user-generated data that is downloaded in the
background to a mobile phone, so super low latency is not
important, but less than 1000ms is required. The data is all
images and videos, and no queries need to be performed on the
data.
Author : philippb
Score : 154 points
Date : 2021-04-23 08:12 UTC (14 hours ago)
| philippb wrote:
| I just wanted to thank everyone for taking the time to reply.
| This has been way better input than I expected.
| louwrentius wrote:
| I hope you will share your decision with us if you can, would
| be interesting to understand.
|
| Good luck.
| tw04 wrote:
| I should preface this with: I read the question as you want
| something on-premises/in a colo. If you're talking hosted S3 by
| someone other than Amazon that's a different story.
|
| It probably depends on whether you are tied at the hip to _other_
| AWS services. If you are, then you're kind of stuck. The
| ingress/egress traffic will kill you doing anything with that
| data anywhere else.
|
| If you aren't, the major players for on-prem S3 (assuming you
| want to continue accessing the data that way) would be (in no
| specific order):
|
| Cloudian
|
| Scality
|
| NetApp Storagegrid
|
| Hitachi Vantara HCP
|
| Dell/EMC ECS
|
| There are plusses and minuses to all of them. At that capacity I
| would honestly avoid a roll-your-own unless you're on a
| shoestring budget. Any of the above will be cheaper than Amazon.
| babelfish wrote:
| I assume you're already making use of most of S3's auto-archive
| features?[0] Really it seems like this comes down to how quickly
| any of your data /needs/ to be loaded. I'd probably investigate
| after how much time a file is only ~1-10% likely to be accessed
| in the next 30 days, then auto-archive files in S3 to Glacier
| after that threshold. If you want to be a bit 'smarter' about it,
| here's an article by Dropbox[1] on how they saved $1.7M/year by
| determining which file previews actually need to be generated,
| and their strategy seems like it could be applied to your use
| case. That said, it seems like you are more likely to save money
| by going colo than by staying in the cloud.
|
| [0] https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/ [1]
| https://dropbox.tech/machine-learning/cannes--how-ml-saves-u...
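|
| (For reference, a lifecycle rule like the one described above can
| be set with boto3; in this minimal sketch the bucket name, prefix
| and 90-day threshold are placeholders, not figures from the
| thread.)
|
|     import boto3
|
|     s3 = boto3.client("s3")
|
|     # Transition objects to Glacier once they pass the age at which
|     # access becomes unlikely (90 days is a placeholder threshold).
|     s3.put_bucket_lifecycle_configuration(
|         Bucket="my-media-bucket",          # placeholder bucket name
|         LifecycleConfiguration={
|             "Rules": [{
|                 "ID": "archive-cold-media",
|                 "Filter": {"Prefix": ""},  # apply to all objects
|                 "Status": "Enabled",
|                 "Transitions": [
|                     {"Days": 90, "StorageClass": "GLACIER"},
|                 ],
|             }]
|         },
|     )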
| sparrc wrote:
| Have you tried backblaze b2 storage? Requires more work client-
| side but is around 1/4 to 1/5 the price.
|
| The only issue is whether or not you have a CDN in front of this
| data. If you do then backblaze might not be much cheaper than
| S3->Cloudfront. You'd save storage costs but easily exceed those
| savings in egress.
| user5994461 wrote:
| What if you want to move off S3? Let's do the math.
|
| * To store 10+ PB of data.
|
| * You need 15 PB of storage (running at 66% capacity)
|
| * You need 30 PB of raw disks (twice for redundancy).
|
| You're looking at buying thousands of large disks, on the order
| of a million dollars upfront. Do you have that sort of money
| available right now?
|
| Maybe you do. Then, are you ready to receive and handle entire
| pallets of hardware? That will need to go somewhere with power
| and networking. They won't show up for another 3-6 months because
| that's the lead time to receive an order like that.
|
| If you talk to Dell/HP/other, they can advise you and sell you
| large storage appliances. Problem is, the larger appliances will
| only host 1 or 2 PB. That's nowhere near enough.
|
| There is a sweet spot in moving off the cloud, if you can fit
| your entire infrastructure into one rack. You're not in that
| sweet spot.
|
| You're going to be filling multiple racks, which is a pretty
| serious issue in terms of logistics (space, power, upfront costs,
| networking).
|
| Then you're going to have to handle "sharding" on top of the
| storage because there's no filesystem that can easily address 4
| racks of disks. (Ceph/Lustre is another year long project for
| half a person).
|
| The conclusion of this story: S3 is pretty good. Your time would
| be better spent optimizing the software. What is expensive? The
| storage or the bandwidth or both?
|
| * If it's the bandwidth. You need to improve your CDN and caching
| layer.
|
| * If it's the storage. You should work on better compression for
| the images and videos. And check whether you can adjust
| retention.
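|
| (To make that sizing concrete, a quick back-of-the-envelope
| sketch in Python; the drive size and unit price are rough
| assumptions, not vendor quotes.)
|
|     # Rough sizing for moving 10 PB off S3.
|     DATA_PB = 10
|     USABLE_PB = DATA_PB / 0.66      # run the cluster at ~66% full
|     RAW_PB = USABLE_PB * 2          # 2x copies for redundancy
|
|     DRIVE_TB = 18                   # assumed large HDD
|     DRIVE_PRICE_USD = 500           # assumed street price
|
|     drives = RAW_PB * 1000 / DRIVE_TB
|     cost = drives * DRIVE_PRICE_USD
|     print(f"usable capacity: {USABLE_PB:.1f} PB")
|     print(f"raw capacity:    {RAW_PB:.1f} PB")
|     print(f"drives to buy:   {drives:.0f}")
|     print(f"disk cost alone: ${cost:,.0f}")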
| louwrentius wrote:
| Very good advice!
| gamegoblin wrote:
| FWIW you can get great redundancy with far less than 2x storage
| factor. e.g. Facebook uses a 10:14 erasure coding scheme[1] so
| they can lose up to 4 disks without losing data, and that only
| incurs a 1.4x storage factor. If one's data is cold enough, one
| can go wider than this, e.g. 50:55 or something has a 1.1x
| factor.
|
| Not that this fundamentally changes your analysis and other
| totally valid points, but the 2x bit can probably be reduced a
| lot.
|
| [1] https://engineering.fb.com/2015/05/04/core-data/under-the-
| ho...
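|
| (The storage-factor arithmetic for a k:n erasure-coded layout, as
| a quick sketch:)
|
|     # k data blocks are expanded to n total blocks; any k of them
|     # can rebuild the data, so up to n-k losses are survivable.
|     def erasure_profile(k: int, n: int) -> dict:
|         return {"storage_factor": n / k, "tolerated_failures": n - k}
|
|     print(erasure_profile(10, 14))  # 1.4x overhead, survives 4 losses
|     print(erasure_profile(50, 55))  # 1.1x overhead, survives 5 losses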
| crescentfresh wrote:
| > * To store 10+TB of data.
|
| > * You need 15 TB of storage (running at 66% capacity)
|
| > * You need 30 TB of raw disks (twice for redundancy).
|
| Did you mean PB?
| user5994461 wrote:
| Corrected.
| plasma wrote:
| Have you looked into the storage tiering (eg moving objects to
| glacier) for less active users?
|
| Perhaps it's a mix of some app pattern changes and leveraging the
| storage tier options in AWS to reduce your cost.
| JoelSchmoel wrote:
| Move to Oracle Cloud and before everybody starts hammering me
| look at this: https://www.oracle.com/cloud/economics/
|
| I am not from Oracle, and I am also running a startup with
| growing pains. Oracle is a bit late to the cloud game, so they
| are loading up on customers now; the ear-squeezing will come 3-5
| years down the road. Maybe you can take advantage of this.
| miouge wrote:
| Cloud or self-hosted will depend on your in-house expertise. For
| cloud others have already mentioned Backblaze and Wasabi, but you
| can also check Scaleway, they do 0.02 EUR/GB/mo for hot storage
| and 0.002/GB/mo for cold storage.
|
| Since we're talking about images and videos, do you already have
| different quality of each media available? Maybe thumbnail, high
| quality, and full quality. It could allow you to use cold storage
| for the full quality media, serving the high quality version
| while waiting for retrieval.
|
| If the use case is more of a backup/restore service and a restore
| typically takes longer than a cold storage retrieval (be it
| Glacier or a self-hosted tape robot), then keep just enough in S3
| to restore while you wait for the retrieval of the rest.
|
| If you go the self-hosted route, I like software that is flexible
| around hardware failures. Something that will rebalance
| automatically and reduce the total capacity of the cluster,
| rather than require you to swap the drive ASAP. That way you can
| batch all the hardware swapping/RMA once per
| week/month/quarter.
| tormeh wrote:
| I believe Scaleway costs 0.01 EUR/GB, so a bit more than half
| of S3.
| ZeroCool2u wrote:
| Take any credits you can get from a provider switch and then
| thoroughly map out your access patterns, ingestion, and egress.
| Do whatever you can to segment data by your needs for
| availability and modification.
|
| If it's all archival storage then it's pretty straightforward.
| If you're on GCP you take it all and dump it into archival single
| region DRA (Durable Reduced Availability) storage for the lowest
| costs.
|
| Otherwise, identify your segments and figure out a strategy for
| "load balancing" between standard, nearline, coldline, and
| archive storage classes. If you can figure out a chronological
| pattern, you can write a small script that uses gsutil's
| built-in rsync feature to mirror data from a higher-grade
| storage class to a lower one at the right time.
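|
| (A minimal sketch of such a script, shelling out to gsutil rsync;
| the bucket names and prefix layout are made up, and the coldline
| bucket's default storage class does the actual tiering:)
|
|     import subprocess
|
|     # Mirror an old date prefix from a standard-class bucket into a
|     # coldline-class bucket; gsutil rsync copies anything missing.
|     SRC = "gs://my-media-standard/2020/"   # hypothetical hot bucket
|     DST = "gs://my-media-coldline/2020/"   # hypothetical cold bucket
|
|     subprocess.run(["gsutil", "-m", "rsync", "-r", SRC, DST],
|                    check=True)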
|
| The strategy will probably be similar in any of the other big 3
| providers as well, but fair warning: some providers' archival-
| grade storage does not have immediate availability, last I
| checked.
|
| See: https://cloud.google.com/storage/docs/storage-classes
|
| https://cloud.google.com/storage/docs/gsutil/commands/rsync
| bushbaba wrote:
| Flip side: how much time would that migration take? As a
| startup, focusing that time on product would lead to more VC
| investment or more sales sooner, with the seed/series funding
| and sales being many multiples of the cost savings.
| oneplane wrote:
| If I were in your shoes I'd still host it on AWS, unless your
| shoes have a problem with the AWS bill, but then you run into
| other problems:
|
| - Paying for physical space and facilities
|
| - Paying people to maintain it
|
| - Paying for DRP/BCP
|
| - Paying periodically since it doesn't last forever so it'll need
| replacements
|
| But if you were to have to move out of AWS but Azure and GCP
| aren't options, you can do: Ceph and HDDs. Dual copies of files
| so you have to lose three drives before any specific file suffers
| data loss (and only those files). Does not come with versioning or
| full IAM-style access control or webservers for static files
| (which you get 'for free' with S3).
|
| HDDs don't need to be in servers, they can be in drive racks,
| connected with SAS or iSCSI to servers. This means you only need
| a few nodes to control many harddisks.
|
| A more integrated option would be (as suggested) Backblaze pod-
| style enclosures, or Storinator-type top loaders (Supermicro has
| those too). It's generally 4U rack units for 40 to 60 3.5"
| drives, which again generally comes to about 1PB per 4U. A 48U
| rack holds 11 units when using side-mounted PDUs, a single top-
| of-rack switch and no environmental monitoring in the rack (and
| no electronic access control - no space!).
|
| This means that for redundancy you'd need 3 racks of 10 units. If
| availability isn't a problem (1 rack down == entire service down)
| you can do 1 rack. If availability is important enough that you
| don't want downtime for maintenance, you need at least 2 racks.
| Cost will be about 510k USD per rack. Lifetime is about 5 to 6
| years but you'll have to replace dead drives almost every day at
| that volume, which means an additional 2000 drives over the
| lifespan, perhaps some RAM will fail too, and maybe one or two
| HBAs, NICs and a few SFPs. That's about 1,500,000 USD in spare
| parts over the life of the hardware, not including the racks
| themselves, not including power, cooling or physical facilities
| to locate them.
|
| Note: all of the figures above are 'prosumer' class and semi-DIY.
| There are vendors that will support you partially, but that is an
| additional cost.
|
| I'm probably repeating myself (and others) here, but unless you
| happen to already have most of this (say: the people, skills,
| experience, knowledge, facilities, money upfront and money during
| its lifecycle), this is a bad idea and 10PB isn't nearly enough
| to do by yourself 'for cheaper'. You'd have to get into the 100PB
| or more arena to 'start' with this stuff if you need to get all
| of those externalities covered as well (unless it happens to be
| your core business, which from the opening post it doesn't seem
| to be).
|
| A rough S3 IA 1Z calculation shows a worst-case cost of about
| 150,000 USD monthly, but at that rate you can negotiate a lot of
| cost savings, and with some smart lifecycle configuration you can
| get that down as well. That could make AWS roughly half as
| expensive as the list-price figure.
|
| Calculation as follows:
|
| DIY: at least 3 racks to match AWS IA OneZone (you'd need 3 racks
| in 3 different locations, a total of 9 racks to have 3 zones, but
| we're not doing that as per your request) which means the initial
| starting cost is a minimum of 1,530,000, and combined with a
| lifetime cost of at least 1,500,000 over 5 years, if we're
| lucky, about 606,000 per year, just for the contents of racks
| that you have to already have.
|
| Adding to this, you'd have some average colocation costs, no
| matter if you have an entire room, a private cage or a shared
| corridor. That's at least 160U and in total at least 1400VA per
| 4U (or roughly 14A at 120V). That amount of power is what a third
| of a normal rack might use on its own! Roughly, that will boil
| down to a monthly racking cost of 1,300 USD per 4U if you use one
| of those colocation facilities. That's another ~45k per month, at
| the very least.
|
| So colocating with no personnel can be done, but doing all that
| stuff 'externally' is expensive, about 95,500 every month, with no
| scalability, no real security, no web services or load balancing
| etc.
|
| That means below-par features get you a rough saving of 50k
| monthly if you didn't need any personnel and nothing breaks
| 'more' than usual. And you'd have to not use any other features
| in S3 besides storage. And if you use anything outside of the
| datacenter you're located (i.e. if you host an app in AWS EC2,
| ECS or a lambda or something) and you need a reasonable pipe
| between your storage and the app, that's a couple of K's per
| month you can add, eating into the perceived savings.
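|
| (Pulling the figures above into one place, a rough model of the
| comparison; every number is the estimate from this comment, not a
| quote:)
|
|     # Rough DIY-vs-S3 model using the figures from this comment.
|     RACKS = 3
|     RACK_COST = 510_000        # hardware per rack
|     SPARES = 1_500_000         # spare parts over the lifetime
|     YEARS = 5
|     COLO_MONTHLY = 45_000      # colocation, power, space
|
|     diy_monthly = ((RACKS * RACK_COST + SPARES) / (YEARS * 12)
|                    + COLO_MONTHLY)
|     s3_monthly = 150_000       # worst-case S3 IA 1Z list price
|
|     print(f"DIY: ~${diy_monthly:,.0f}/month")
|     print(f"AWS: ~${s3_monthly:,.0f}/month before discounts")
|     print(f"saving: ~${s3_monthly - diy_monthly:,.0f}/month")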
| thehappypm wrote:
| Strong plus-one here. Rolling your own basically means you will
| need an entire brand new business function to keep the lights
| on. That's something your entire company is going to have to
| adapt to. New staff, new ways of thinking about data, new
| problems the C-suite needs to consider. The opportunity cost
| alone can be immense here, since your engineers will need to
| spend their time working on rote data storage and not business
| problems.
| ForHackernews wrote:
| Have you considered deleting most of it?
|
| Chances are you don't need all of it. Every company today thinks
| they need "Big Data" to do their theoretical magic machine
| learning, but most of them are wrong. Hoarding petabytes of
| worthless data doesn't make you Facebook.
|
| To be a little less glib, I'd start by auditing how much of that
| 10PB actually matters to anyone.
| christophilus wrote:
| Wasabi + BunnyCDN has worked like a charm for us. We've got about
| 50TB there, if I recall. Our bill is dramatically smaller than
| when we were on AWS. Wasabi has had some issues-- notably a DNS
| snafu that took the service out for about 8 hours, if I recall.
| But overall, the savings have been worth it.
| tux wrote:
| Maybe take a look at Backblaze Storage Pods:
|
| https://www.backblaze.com/blog/open-source-data-storage-serv...
|
| Their Storage Pod 6.0 can hold up to 480TB per server.
| stlava wrote:
| If data storage isn't your startup's job then I would negotiate
| heavily on the AWS contract.
| CodesInChaos wrote:
| How much can you get the pricing reduced at AWS? At list price,
| 10PB of IA storage cost $1.5M/yr.
| cookguyruffles wrote:
| AWS will blow up your phone if they know you're interested in
| dealing. Various online forms smattered around the site will
| put you in this pipeline. Just ensure you have a competing
| quote for them to work against
| royalresolved wrote:
| I'm unsure if it's mature enough for your use right now (in
| particular, the retrieval market is undeveloped for fast access),
| but I wonder if you have looked at Filecoin?
|
| https://file.app/ https://docs.filecoin.io/build/powergate/
|
| (Disclosure: I am indirectly connected to filecoin, but
| interested in genuine answers)
| Tepix wrote:
| It seems to me like you could save a _ton_ of money by using your
| own hardware. Perhaps buy a bunch of big Synology boxes? At that
| scale you should also consider looking at technologies such as
| Ceph.
|
| We've recently switched to a setup with several Synology boxes
| for around 1PB net storage.
| faeyanpiraat wrote:
| Those boxes are slooooow; the 8-slot box has something like a
| 500MB/s read speed limit, even if you RAID0 8 SSDs and use 10Gbps
| networking. This limitation is in the product spec document,
| but in the smallest print possible.
| nobozo wrote:
| Funny you should mention this. I once worked at a startup that
| stored lots of remote sensing data. Their strategy was to put
| it on a Synology. When the Synology filled up, they bought
| another, and so forth. Only some of the Synologys were online
| at any particular time, and there was no indexing to find which
| Synology held what data.
|
| Plus, there were no backups, so if one Synology were to blow up,
| all the data on it would be lost.
|
| Since they were a small startup it made some sense to start
| this way, but they had no plans on what to do about it as they
| got bigger.
| Tepix wrote:
| Using Synologys doesn't mean that you have to be stupid about
| it :-)
| philippb wrote:
| Thank you!
| epistasis wrote:
| If you have good sysadmin/devops types, this is a few racks of
| storage in a datacenter. Ceph is pretty good at managing
| something this size, and offers an S3 interface to the data (with
| a few quirks). We were mostly storing massive keys that were many
| gigabytes each, so I'm not sure about
| performance/scaling limits with smaller keys and 10PB. I'd be
| sure to give your team a few months to build a test cluster then
| build and scale the full size cluster. And a few months to
| transfer the data...
|
| But you'll need to balance the cost of finding people with that
| level of knowledge and adaptability with the cost of bundled
| storage packages. We were running super lean, got great deals on
| bandwidth, power, and had low performance requirements. When we
| ran the numbers for all-in costs, it was less than we thought we
| could get from any other vendor. And if you commit to buying the
| server racks it will take to fit 10PB, you can probably get
| somebody like Quanta to talk to you.
| philippb wrote:
| This is amazing. Thank you. I've been looking at Backblaze
| storage pods, which seem to be designed for that use case. I've
| never rented rack space.
|
| Do you remember roughly the math on how much cheaper it was, or
| how you thought about upfront cost vs ongoing? Just order of
| magnitude would be great.
| cmeacham98 wrote:
| I've run the math on this for 1PB of similar data (all
| pictures), and for us it was about 1.5-2 orders of magnitude
| cheaper over the span of 10 years (our guess for depreciation
| on the hardware).
|
| Note that we were getting significantly cheaper bandwidth
| than S3 and similar providers, which made up over half of our
| savings.
| dangerboysteve wrote:
| If you have looked at BB storage pods, you should look at
| 45drives.com, the child of Protocase, which manufactures the BB
| pods.
| mceachen wrote:
| Roughly a decade ago S3 storage pricing had a ~10x premium
| over self-hosted. The convenience of not having to touch any
| hardware is expensive.
| sethhochberg wrote:
| It's also important to consider how often disks will fail
| when you are operating hundreds of them - it's probably more
| often than you'd think, and if you don't have someone on
| staff and nearby to your colo provider you're going to pay
| a lot in remote hands fees.
|
| Your colo facility will almost certainly have 24/7 staff on
| hand who can help you with tasks like swapping disks from a
| pile of spares, but expect to pay $300+ minimum just to get
| someone to walk over to your racks, even if the job is 10
| mins.
|
| With that said, the cost savings can still be enormous. But
| know what you're getting into.
| marcinzm wrote:
| Like another comment said, don't bother swapping out
| disks, just leave the dead ones in place and disable them
| in software. Then eventually either replace the whole
| server or get someone on site to do a mass swap of disks.
| At this scale redundancy needs to be spread between
| machines anyway so no gain in replacing disks as they
| die.
| ehosca wrote:
| https://www.aberdeeninc.com/systems/storage/san/petarack
| kfrzcode wrote:
| Totally out-of-band for this thread, but... what are the uses
| for a multi-gigabyte key?! I'm clearly unaware of some cool
| tech, any key words I can search?
| dTal wrote:
| I'm no expert but I would guess it's just a fancy word for
| "file", as in "key-value store", as opposed to a god-proof
| encryption key.
| antpls wrote:
| In this case, wouldn't the _value_ be multi-gigabyte, not
| the key?
| lokl wrote:
| You wrote, "data needs to be accessible at all time ... less than
| 1000ms" latency, but this does not tell the whole story about
| accessibility/latency. Does your use case allow you to do
| something similar to lazy loading, where you serve reduced
| quality images/video at low latency and only offer the full
| quality on demand/as needed with greater latency? For example,
| initially serve a reduced-resolution or reduced-length video
| instead of the full-res/full-length original, which you keep in
| colder storage at a reduced cost? Depending on the details of
| what is permissible and data characteristics, this approach might
| save you a lot overall by reducing warm storage costs.
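|
| (A sketch of what that tiered read path could look like; the
| storage clients below are stand-ins for whatever warm/cold stores
| are in use, not a real API:)
|
|     # Serve the warm preview immediately and trigger an async
|     # restore of the cold original; the client refreshes later.
|     class WarmStore:
|         def __init__(self):
|             self.objects = {"img1/preview"}
|         def exists(self, key):
|             return key in self.objects
|         def url(self, key):
|             return f"https://warm.example/{key}"
|
|     class ColdStore:
|         def request_restore(self, key):
|             print(f"restore requested: {key}")
|
|     def get_media(media_id, warm, cold):
|         original = f"{media_id}/original"
|         if warm.exists(original):
|             return {"url": warm.url(original), "quality": "full"}
|         cold.request_restore(original)   # async, takes minutes+
|         return {"url": warm.url(f"{media_id}/preview"),
|                 "quality": "preview"}
|
|     print(get_media("img1", WarmStore(), ColdStore()))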
| laurensr wrote:
| Also have a look at the Datahoarder community [1] on Reddit. Some
| people are storing astronomical amounts of data. [1]:
| https://www.reddit.com/r/DataHoarder/
| rvr_ wrote:
| Try https://min.io/ I would 100% go for it if my company was not
| a https://www.caringo.com/products/swarm customer
| jeffrallen wrote:
| On tape.
| helge9210 wrote:
| For on-premises storage (without managing storage racks and Ceph
| yourself) you can look at Infinibox
| (https://www.infinidat.com/en/products-technology/infinibox).
|
| (I'm not working there anymore, posting this just to help)
| plank_time wrote:
| Why do you need all 10PB accessible? Have you analyzed your usage
| pattern to see if you really need that much data accessible? This
| seems so unlikely and could solve most of your problems if you
| change the parameters.
| jkuria wrote:
| Always look to nature first. Nature never lies. DNA storage:
|
| Escherichia coli, for instance, has a storage density of about
| 10^19 bits per cubic centimeter. At that density, all the
| world's current storage needs for a year could be well met by a
| cube of DNA measuring about one meter on a side.
|
| There are several companies doing it:
| https://www.scientificamerican.com/article/dna-data-storage-...
| giantg2 wrote:
| Agree with someone else's comment questioning how the data is
| ingested and used.
|
| 10PB seems like a lot to store in S3 buckets. I assume much of
| that data is not accessed frequently or would be used in a big
| data scenario. Maybe some other services like Glacier or RedShift
| (I think).
| speedgoose wrote:
| I strongly recommend having more than one zone. A datacenter
| being offline for a while or burning down entirely is possible.
| It did happen a few weeks ago, and a lot of companies learnt the
| value of multiple zones the hard way.
| jkingsbery wrote:
| Besides what others have asked:
|
| What are your access patterns? You say "no queries need to be
| performed," but are you accessing via key-value look-ups? Or
| ranged look-ups?
|
| What do customers do with the pictures? Do customers browse
| through images and videos?
|
| You mention it's "user generated data" - how many users (order of
| magnitude)? How often is new data generated? Does the dataset
| grow, or can you evict older images/videos (so you have a moving
| window of data through time)?
|
| Besides your immediate needs, what other needs do you anticipate?
| (Will you need to do ML/Analytics work on the data in the future?
| Will you want to generate thumbnails from the existing data set?)
|
| What my experience is based on: I was formerly Senior Software
| Engineer/Principal Engineer for a team that managed reporting
| tools for internal reporting of Amazon's Retail data. The team I
| was on provides tools for accessing several years worth of
| Amazon.com's order/shipment data.
| glitchc wrote:
| Compression is always a good alternative, which is especially
| effective when modification is infrequent.
| speedgoose wrote:
| If they are a good data storage company, the data is encrypted
| so they can't compress what they already have. Perhaps they
| could compress the new incoming data client side before
| encryption to save a few bits.
| zennzei wrote:
| Wasabi storage
| ilc wrote:
| Look at the cost of moving out of the cloud carefully.
|
| Can you afford the up-front costs of the hardware needed to run
| the solutions you may want to run?
|
| Will those solutions have good enough data locality to be useful
| to you?
|
| It isn't really useful to have all your data on-site and then
| your operations in the cloud. You've introduced many new layers
| that can fail.
|
| If you go on-prem, the solution to look at is likely Ceph.
|
| Source: Storage Software Engineer, who has spoken at SNIA SDC. I
| currently maintain a "small" 1PB ceph cluster at work.
|
| Recommendation: Get someone who knows storage and systems
| engineering to work with you on the project. Even if you decide
| not to move, understanding why is the most important part.
| sgt wrote:
| Here's an unpopular answer - don't store 10PB of data. Find a way
| for your startup to work without needlessly having to store
| insane amounts of data that will likely never be needed.
| boffinism wrote:
| They're a data storage service.
| asadlionpk wrote:
| Excellent advice for a data backup startup.
| CyberDildonics wrote:
| Doesn't this imply that they started a company without
| actually having a plan for the most fundamental part of what
| they are selling?
|
| This is like an ISP asking how they can get hooked up to the
| internet.
| asadlionpk wrote:
| No. They are already managing 10PB, planning for which
| would be very stupid when just starting up.
| CyberDildonics wrote:
| Why would planning to be able to execute the single focus
| of your startup be stupid?
| lmeyerov wrote:
| A rule of thumb for performance is every 10-100X involves
| changing up your fundamentals
|
| It's a bit different nowadays, now that a lot of scaling tech
| is commoditized, but it still means things like negotiating
| new contracts, finding & fixing the odd pieces that
| weren't stressed before, etc.
|
| (congrats on hitting the new usage levels + good luck!
| we're at a much smaller scale, but trying to figure out
| some similar questions for stuff like web-scale
| publishing of data journalism without dying on egress $,
| so it's an interesting thread...)
| matwood wrote:
| I also tend to agree. I think AWS is great and use it as
| my default solution, but if I was starting a company that
| had high bandwidth and/or storage requirements, I would
| be looking for other solutions from day one.
| spookthesunset wrote:
| I tend to agree. If you are a storage company, I'd think
| that part of your secret sauce should be how to store
| tens or hundreds of petabytes of customer backups
| economically.
|
| Maybe I'm wrong though. Perhaps the real secret sauce is
| the end user experience and the kind of storage you use
| on the backend doesn't matter at all.
|
| However I bet that the "cloud storage space" is pretty
| crowded and lots of people shop on price more than
| anything. If your business model is all about price, then
| finding economical storage is critical to your company
| and needs to be part of your core competency.
|
| If price isn't that important, perhaps it doesn't
| matter... the "winners" would win no matter how expensive
| their storage solution is.
|
| But honestly.... I feel like part of your core competency
| needs to be managing the storage system.
| qeternity wrote:
| It's more like a company that has validated product fit now
| needing to figure out how to scale economically.
|
| Apple didn't start manufacturing with mega Foxconn
| contracts. They had to figure that out along the way as
| their scale demanded.
|
| However I share your sentiment: doing things the same way
| but cheaper is usually not the solution. Doing things
| differently (in-sourcing) might be the path forward.
| CyberDildonics wrote:
| Figuring out how to scale is not just a part of a storage
| startup, it's the whole thing.
|
| Apple created something people wanted and sold at a price
| that would still make money if it was assembled by hand.
| They didn't form a company around a commodity like data
| storage.
|
| Data storage is a commodity. Everyone already has some,
| online storage companies already exist. If you don't know
| how to store a lot of data and your company's whole
| purpose is to store a lot of data, it sounds like
| something that should have been worked out before making
| the company.
| philippb wrote:
| If you can solve that for me without affecting revenue I have
| $1m in cash for you right there.
|
| We are a photo/video storage service.
| dariusj18 wrote:
| I don't know if this is a crazy idea or if it creates
| scalability issues, but could you craft an algorithm to cold
| store data for users who do not show a need for instant
| access, and/or warm up the data when you predict it will be
| needed? Kind of like a physical logistics company would need
| to do with distributed warehousing.
|
| Sticking points I see are: 1. If you get it wrong you'll need
| some form of UX that keeps the users from getting too angry
| about it. 2. The cost of moving the data between hot/cold
| storage might make this prohibitive until a much larger
| scale. 3. User behaviors might not be predictable enough.
| lostcolony wrote:
| So from a completely evil (well, capitalist) perspective, do
| you have data on how often people retrieve backups, and at
| what 'age' they do so?
|
| Because there may be an inflection point that offering
| monetary compensation for data loss, rather than actually
| trying to store the data, would make more financial sense.
| I.e., "All data > than 2 years gets silently expunged, and
| anyone trying to retrieve it at that point gets $10 per gig
| in compensation for 'our mistake'".
|
| Please don't actually consider that though.
| lostcolony wrote:
| (A less evil approach that might still lead to reduced
| costs would be detecting old, unaccessed data of sufficient
| size and flagging it for users, with a small refund or
| service discount if they purge it. Though that assumes you
| have 'power users' who are storing massive amounts, to
| where the savings in storage costs would be worth it)
|
| (And if you don't already, I would also consider making it
| so items that are in the trash for some period of time, say
| 30 days, get deleted automatically as well, possibly with a
| reminder email a few days before)
|
| (And lastly, depending on user profiles and usage,
| incentives around reducing resolution/quality of photos and
| video, and automating that in the app as part of the sync
| process, might provide some opportunities to reduce costs
| of storage > the lost revenue of cheaper plans.)
| european321 wrote:
| What about storing the older things in some slower backup
| service that's slow to access but cheap? If the user
| eventually accesses those some day you would kick off some
| background job to get them fresh again. Not super UX
| friendly of course, but could reduce costs. Or is this
| already the standard thing to do?
| saalweachter wrote:
| "We used advanced machine learning algorithms to predict
| which users will need to retrieve which pieces of data in
| the future, and silently delete everything else."
| hansvm wrote:
| You could probably put a fun marketing spin on that.
|
| "We use ML to ensure we only store the highest quality data,
| freeing you from the chains of having too much worthless data
| and nothing to do with it."
| qeternity wrote:
| GPT-3 that convinces you why you don't need to store this
| thing
| saalweachter wrote:
| Image processing to label each image ("baby with spaghetti
| on head", "cat playing with string", "naked person"), and
| then only save one image with each label.
| offtop5 wrote:
| If AWS is what you know I'd stick with it.
|
| Changing that can be very very difficult for not much gain. Plus
| AWS skills are very easy to recruit for vs Google cloud.
| nikisweeting wrote:
| Backblaze B2, ingress and egress are free through Cloudflare, and
| it's S3 compatible. It's peanuts by comparison but I've been
| storing ~22TB on there for years and love it.
|
| Wasabi and Glacier would be my 2nd choices.
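|
| (Since B2 speaks the S3 API, pointing an existing S3 client at it
| is mostly a config change. A minimal sketch; the endpoint region,
| key names and bucket are placeholders for your own B2 account:)
|
|     import boto3
|
|     # B2's S3-compatible API only needs a different endpoint + keys.
|     b2 = boto3.client(
|         "s3",
|         endpoint_url="https://s3.us-west-004.backblazeb2.com",
|         aws_access_key_id="<b2-key-id>",
|         aws_secret_access_key="<b2-application-key>",
|     )
|     b2.upload_file("photo.jpg", "my-bucket", "users/123/photo.jpg")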
| philippb wrote:
| I've looked at them. Would love to talk to you about your usage
| and experience with them.
| gruez wrote:
| >Backblaze B2, ingress and egress are free through cloudflare
|
| AFAIK cloudflare ToS prohibits you from using it as a file
| hosting proxy. You might not run into issues if you're
| transferring a few gigabytes a month, but if you're
| transferring multiple terabytes it's just asking for trouble.
|
| edit:
|
| https://www.cloudflare.com/terms/ section 2.8 Limitation on
| Serving Non-HTML Content
| rewq4321 wrote:
| You can definitely serve way more than a few GB per month
| through Cloudflare on the free plan. I serve tens of
| terabytes a month for free. If OP needs to serve hundreds of
| terabytes per month they may get an email asking to upgrade,
| but the backblaze/Cloudflare setup would probably still be
| the cheapest. BunnyCDN is great too.
| intergalplan wrote:
| OTOH I've been told (by CloudFlare support, in contact with
| their engineers) that for their "for hosting game levels and
| other content" use case[1], any of their ordinary plans
| should be fine.
|
| I'm not... super confident in that answer, because despite
| that being _a use case they promote on the site_ the terms
| seem a bit murkier, and the page on that use-case doesn't
| say much about which plan(s) they expect you to use (I'd have
| expected an "enterprise" plan for serving hundreds of TB of
| transfer of game-assets per month, but they said no, any
| normal plan's fine, which... I was up front with them about
| what our usage would look like, and they held that line, but
| that seems too good to be true).
|
| I haven't tested these claims yet.
|
| [1] https://www.cloudflare.com/gaming/
| segmondy wrote:
| Definitely not backblaze. If you get a signed URL it remains
| valid for 24hrs and can be used over and over. If they are
| going through a proxy, that would be different, but I imagine
| they don't want that as that doubles bandwidth cost. You
| definitely don't want your client to be able to upload all the
| data they can in your bucket.
| PLenz wrote:
| I would consider moving to my own metal and using hadoop.
| joering2 wrote:
| It would be cool to actually have a "blockchain" for something
| like this. I know the huge amount of data to be store is a niche
| market, but hear me out:
|
| Everyone that wants to make extra money can join
|
| You join with your computer hooked up to the internet, a piece of
| software running in the background
|
| You share a % of your hard drive and limit the speed that can be
| used to upload/download
|
| When someone needs to store 100PB of data ("uploader"), they
| submit a "contract" on a blockchain - they also set the
| redundancy rate, meaning how many copies need to be spread out to
| guarantee consistency of the data as a whole
|
| The "uploader" shares a file - the file is being chop in chunks
| and each chunk being encrypted with uploader private PHP key. The
| info re chunks are uploaded to blockchain and everyone get a
| piece. In return, all parties that keep piece of uploader data
| get paid small % either via PayPal or simply in crypto.
|
| I think that would be a cool project, but someone would have to
| do back-of-napkin number crunching to see if it would be
| profitable enough for data hoarders :)
| philippb wrote:
| I was hoping that someone has experience in storing data with
| FileCoin. But I think it's just too early still to bet an
| existing business on it.
| edoceo wrote:
| Is this a case where GlusterFS and ZFS would work? I don't have
| PBs of data, but many TBs. My Gluster nodes are spread around the
| globe; I use ZFS for the "brick" and then the Gluster magic gives
| me distribute / replica.
|
| Surprised I didn't see Gluster already in this thread. Maybe it's
| not for such big scale?
|
| edit: Wikipedia says " GlusterFS to scale up to several petabytes
| on commodity hardware"
| znpy wrote:
| You can buy an appliance from Cloudian and have your S3
| on-premises, with support.
|
| They're basically 100% S3-compatible.
|
| I don't know the details of their pricing, but they're production
| grade in the real sense of the word.
|
| I am not affiliated with them in any way, but I interviewed with
| them a couple of years ago and left with a good impression.
| louwrentius wrote:
| Ceph is a beast and will require at least 2-3 technicians with
| intricate Ceph knowledge to run multiple (!) Ceph clusters in a
| business continuity responsible manner.
|
| Because you must be able to deal with Ceph quirks.
|
| If you can shard your data over multiple independent stand-alone
| ZFS boxes, that would be much simpler and more robust. But it
| might not scale like Ceph.
| skynet-9000 wrote:
| At that kind of scale, S3 makes zero sense. You should definitely
| be rolling your own.
|
| 10PB costs more than $210,000 per month at S3, or more than $12M
| after five years.
|
| RackMountPro offers a 4U server with 102 bays, similar to the
| BackBlaze servers, which fully configured with 12GB drives is
| around $11k total and stores 1.2 PB per server.
| (https://www.rackmountpro.com/product.php?pid=3154)
|
| That means that you could fit all 15TB (for erasure encoding with
| Minio) in less than two racks for around $150k up-front.
|
| Figure another $5k/mo for monthly opex as well (power, bandwidth,
| etc.)
|
| Instead of $12M spent after five years, you'd be at less than
| $500k, including traffic (also far cheaper than AWS.) Even if you
| got AWS to cut their price in half (good luck with that), you'd
| still be saving more than $5 million.
|
| Getting the data out of AWS won't be cheap, but check out the
| snowball options for that:
| https://aws.amazon.com/snowball/pricing/
| FireBeyond wrote:
| Does Snowball let you exfiltrate data from AWS? I was under the
| impression it was only for bulk ingestion.
| skynet-9000 wrote:
| First sentence on the linked page: "With AWS Snowball, you
| pay only for your use of the device and _for data transfer
| out of AWS._ "
| natch wrote:
| That wording is not inconsistent with the interpretation
| that Snowball is for in only.
| leetrout wrote:
| Wow that's up to $500,000 just to export 10PB (depending on
| region).
| canucker2016 wrote:
| According to https://aws.amazon.com/snowball/pricing/,
| egress fees depend on the region, which can range from
| $0.03/GB (North America & parts of Europe) to $0.05/GB
| (parts of Asia and Africa).
|
| So US$300K to US$500K for egress fees + cost of Snowball
| devices.
|
| The major downside of Snowball in this export case is the
| size limit of 80TB per device - from
| https://aws.amazon.com/snowball/features/ :
|
| "Snowball Edge Storage Optimized provides 80 TB of HDD
| capacity for block volumes and Amazon S3-compatible
| object storage, and 1 TB of SATA SSD for block volumes."
|
| That'd be around 125 Snowball devices to get 10PB out.
|
| If OP actually has 10PB on S3 currently, the OP may want
| to fallback to leaving the existing data on S3 and
| accessing new data in the new location.
| nicoburns wrote:
| There's also the snowmobile
| https://aws.amazon.com/snowmobile/
| canucker2016 wrote:
| The AWS Snowmobile pages only talk about migrating INTO
| AWS, not OUT OF.
|
| from https://aws.amazon.com/snowmobile/ :
|
| AWS Snowmobile is an Exabyte-scale data transfer service
| used to move extremely large amounts of data to AWS. You
| can transfer up to 100PB per Snowmobile, a 45-foot long
| ruggedized shipping container, pulled by a semi-trailer
| truck. Snowmobile makes it easy to move massive volumes
| of data to the cloud, including video libraries, image
| repositories, or even a complete data center migration.
|
| from https://aws.amazon.com/snowmobile/faqs/ :
|
| Q: What is AWS Snowmobile?
|
| AWS Snowmobile is the first exabyte-scale data migration
| service that allows you to move very large datasets from
| on-premises to AWS.
| hn_throwaway_99 wrote:
| I mean, the title on the snowmobile page says:
|
| > Migrate or transport exabyte-scale data sets into _and
| out of_ AWS
| [deleted]
| user5994461 wrote:
| You realize you can't fit 10 appliances of 4U in a rack? (A
| rack is 42U)
|
| There's network equipment and power equipment that requires
| space in the rack. There's power limitations and weight
| limitations on the rack that prevents to fill it to the brim.
| cricalix wrote:
| The thing about fitting everything in one rack, potentially, is
| vibration. There have been several studies into drive
| performance degradation from vibration, and there's noticeable
| impact in some scenarios. The Open Compute "Knox" design as
| used by Facebook spins drives up when needed, and then back
| down, though whether that's for vibration impact, I don't know
| (their cold storage use [0]).
|
| 0: https://datacenterfrontier.com/inside-facebooks-blu-ray-
| cold...
|
| https://www.dtc.umn.edu/publications/reports/2005_08.pdf
|
| https://digitalcommons.mtu.edu/cgi/viewcontent.cgi?article=1...
| pokler wrote:
| Here is Brendan Gregg showing how vibrations can affect disk
| latency:
|
| https://www.youtube.com/watch?v=tDacjrSCeq4
| darkr wrote:
| I've heard reports that minio gets slow beyond the hundreds of
| millions of objects threshold
| tinus_hn wrote:
| You are mixing up your units, with 12GB drives and 15TB in a
| rack.
| u678u wrote:
| Sounds like a standard business problem, make a spec and get the
| main 20 cloud providers to submit bids.
| ktpsns wrote:
| I have worked in HPC (academia), where cluster storage sizes have
| been measured in multiples of PB for a decade. Since latency and
| bandwidth are killer requirements there, InfiniBand (instead of
| Ethernet) is the defacto standard for connecting the storage
| pools to the computing nodes.
|
| Maintaining such a (storage) cluster requires 1-2 people on site
| who replace a few hard disks every day.
|
| Nevertheless, if I continuously needed massive amounts of
| data, I would opt to do it myself anytime instead of cloud
| services. I just know how well these clusters run, and there is
| little to no saving in outsourcing it.
| glbrew wrote:
| Since he needs 1000ms response times on storage, isn't Ethernet
| the better option? It can reach 400Gb/s on the fastest hardware
| now. I thought InfiniBand was only reasonable to use when
| machines need to quickly access other machines' primary memory.
| I would
| like to know if I'm wrong about this though.
| garciasn wrote:
| In my opinion, knowing what you're planning to do w/the data once
| it's stored is the important piece to giving you some idea of
| where to put it.
| philippb wrote:
| Good point. I updated the post with some more infos
| canadianfella wrote:
| Info
| x0x0 wrote:
| What is your loss tolerance? If a file is gone, who is
| annoyed: a free user, a $50/year customer, or a $10k/year
| customer?
|
| Are these files WORM?
| boringg wrote:
| Agreed - though I feel like every data use comes after the
| fact. Original software engineers/developers rarely have the
| foresight to know what the data scientists will need the
| information for (at least imho).
| lostcolony wrote:
| To be fair, the data scientists rarely have the foresight to
| know what the data scientists need the information for. The
| only time I've seen a data scientist correctly include all
| the data they needed (but still be wrong) was when they
| answered "All of it. We need all of the data".
| boringg wrote:
| So true. Tough to know in advance which data will hold the
| secrets.
| monkeybutton wrote:
| And you can't build a time machine to go and get it once
| you do know. Want X days of historical data for
| training/backtesting and we just implemented the metric
| this sprint? Good luck meeting your deadline!
| coverband wrote:
| Have you looked into Backblaze? They're a lot cheaper than Amazon
| and have S3-compatible APIs.
| gigatexal wrote:
| Latency being time to first byte downloaded, I'd still store this
| in the cloud somewhere so that the really "hot" images/videos
| could be cached in a CloudFront CDN or something.
|
| Also, this is a startup, no? A million or so in storage means you
| need not preoccupy your startup with having to deal with failing
| disks, disk provisioning, colocation costs, etc., not to
| mention the 11 9s of durability you get with S3. To me it just
| makes the most sense to do this in the cloud.
| howeyc wrote:
| If you want to stick with cloud, then stick with what you're
| doing or migrate to a cheaper alternative like wasabi, backblaze,
| etc.
|
| If you're not afraid of having a few operations people on staff
| and running a few racks in multiple data centers, then buy a
| bunch of drives and servers and install something to expose
| everything via S3 interface (Ceph, Minio, ...) so none of your
| tools have to change.
| segmondy wrote:
| I think they either stick to S3 or run their own DC with Minio
| in front. BB as I mentioned in another comment will be a bad
| idea due to the poor S3 compatible interface, See -
| https://www.backblaze.com/b2/docs/b2_get_upload_url.html Wasabi
| might be fine, but don't know if they can handle 10PB.
| ecesena wrote:
| S3 + Glacier. For data you're accessing via Spark/Presto/Hive I
| believe Parquet is a good format. At your scale AWS should prob
| provide discounts, worth connecting w/ an account rep.
|
| I'd recommend reaching out to some data eng in the various Bigs,
| they certainly have more clear numbers. Happy to make an intro if
| you need, feel free to dm me.
| silviot wrote:
| I think if I _had_ to decide (I'm not the best informed person on
| the matter) I'd lean towards leofs[1].
|
| I only read about it, but never used it.
|
| It advertises itself as exabyte scalable and provides s3 and nfs
| access.
|
| [1] https://leo-project.net/leofs/
| hamburga wrote:
| Meta-question: shouldn't there be a website dedicated
| specifically to reliable, crowd-sourced answers to questions like
| these? Does it really not exist? I'm thinking like StackShare,
| but you start from "What's the problem I'm trying to solve?", not
| "What products are big companies using?".
| killingtime74 wrote:
| Yes it's called hacker news
| nknealk wrote:
| How firm are your "less than 1000ms" requirements. Could you
| identify a subset of your images/videos that are very unlikely to
| ever be accessed and move those to s3 glacier and price in that
| some fractional percentage will require expedited retrieval
| costs?
| zmmmmm wrote:
| My only comment is that I have a hard time reconciling these two
| statements:
|
| > downloaded in the background to a mobile phone
|
| and
|
| > but less than 1000ms required
|
| I'm struggling to think of what kind of application needs data
| access _in the background_ with latency of less than 1000ms. That
| would normally be for interactive use of some kind.
|
| Getting to 1 min access time would get you into the S3 glacier
| territory ... you will obviously have considered this but I feel
| like some really hard scrutiny on requirements could be critical
| here. With intelligent tiering and smart software you might make
| a near order of magnitude difference in cost and lose almost no
| user-perceptible functionality.
| ignoramous wrote:
| In cloud:
|
| Wasabi's _Reserved Capacity Storage_ is likely to be the
| cheapest: https://wasabi.com/rcs/
|
| If you front it with Cloudflare, egress would be close to free
| given both these companies are part of the _Bandwidth Alliance_ :
| https://www.cloudflare.com/bandwidth-alliance/
|
| Cloudflare has an images product in closed beta, but that is
| likely unnecessary and probably expensive for your usecase:
| https://blog.cloudflare.com/announcing-cloudflare-images-bet...
|
| --
|
| If you're curious still, take a look at Facebook's F4 (generic
| blob store) and Haystack (for IO bound image workloads) designs:
| https://archive.is/49GUM
| msoad wrote:
| Having dealt with a lot of big data, I often came to the
| realization that we actually did not need most of it.
|
| Try being intentional and smart in front of your data pipeline
| and purge data that is not useful. Too many times people store
| data "just in case" and that case never happens years later.
| johngalt wrote:
| 900 LTO-U8 tapes
| exdsq wrote:
| RAM?
| anonu wrote:
| Tape is still very cost effective. Load latency might be a few
| minutes though
| blueteeth wrote:
| Lol. A stick of 256GB RAM costs ~$3000. 1TB needs 4. 1PB needs
| 4000. 10PB needs 40,000. So this would be an upfront cost of
| $120M.
|
| And this doesn't even cover how you'd fit 40,000 sticks of RAM
| together.
| bombcar wrote:
| 10PB of RAM being only $120M blows my mind, to be honest. I
| would have guessed that price was closer to the SSD cost for
| 10PB.
| nicoburns wrote:
| This made me curious about what the SSD cost would be. It
| looks like you can get a 2TB SSD for $200. So that's
| $100/TB = $1M for 10PB. Of course prices may be higher for
| enterprise SSDs and you may need redundancy. Then again,
| you could probably get a bulk discount at that scale.
| byteshock wrote:
| Wasabi is a good option. They're S3 compatible and don't charge
| any egress or ingress fees. Been using them for a few years.
| Great speeds and customer support.
| tejohnso wrote:
| I see from their home page they do not charge for egress, but
| the FAQ clarifies this is only valid if your monthly egress
| total is less than or equal to your storage total, otherwise
| they suspend your service. Should be clarified on the home page
| in my opinion. At least with an asterisk beside "No egress
| charges".
| jagger27 wrote:
| 10PB with their pricing calculator comes out to over
| $60,000/mo. Feels like a lot.
|
| edit: perhaps their RCS option would be cheaper if you know
| exactly how much data you need to store in advance.
| reversengineer wrote:
| To be fair, purchasing and hosting even the most basic
| mirrored RAID array of that scale comes to well over half a
| million for the disks alone. Then you need to manage them.
| daper wrote:
| 10x Supermicro SSG-6049P-E1CR60H servers (60 x 3,5" HDD in
| 4U enclosure) - $5k each
|
| 600x Western Digital Ultrastar DC HC550 18TB (10,800 TB, or
| 10.8 PB, in total) - $500 each
|
| ~$350k in hardware, up to 20kW energy consumption, should
| fit in two rack towers. You can host it for about $1.5k
| somewhere. All assuming no redundancy :)
| spookthesunset wrote:
| Don't forget labor. You need to find talent to manage
| your little data center. And deal with it when it shits
| the bed at 4:12am on Christmas morning.
|
| So toss in at least one SRE type person. Say $200k/year.
|
| Since you only have one, they are gonna be on call 24/7,
| so assume you'll burn them out after a year and a half
| and need to hire a new one....
|
| Since redundancy is a thing, double that $350k. And 10pb
| is what they have now so double it again for 20pb. Add in
| $10k per rack for switches, routers, wires, etc.
|
| So probably you are looking at a million dollars of
| capital plus labor to actually execute on this. And don't
| forget the lead time might be a month to get the hardware
| and a week or two to install it. Plus all the
| configuration management that needs to be built up. Not
| to mention monitoring. So maybe a quarter of work just to
| have it functional.
|
| I haven't even factored in opportunity costs. What could
| this business be doing that adds more value than building
| out a little data center?
|
| I dunno. Maybe it _does_ make sense to manage your own
| hardware. But it helps to calculate the entire cost of
| ownership, not just the cost of the servers.
| philippb wrote:
| Thanks for laying this out. Never rented rack space my
| self
| skynet-9000 wrote:
| 10 petabytes at AWS is $210,000 per month just for
| storage (even excluding AWS's very high egress and
| transaction pricing), so even $1M (which seems like a
| high estimate indeed) would be amortized in less than six
| months.
|
| Also, the hardware can be depreciated, which reduces its
| net (of taxes) cost dramatically over time.
|
| Five years (probably the useful life of the equipment in
| general) of $210,000 per month is $12.6M. That's a lot of
| savings.
| user5994461 wrote:
| At this scale you should be able to negotiate with AWS
| and get a deal better than the listed price.
|
| Regarding accounting, the AWS monthly charges are also
| net of taxes so it makes no difference.
| daper wrote:
| You're right. I wasn't really serious. Since I'm in the
| middle of calculating the costs of our own servers in rented
| racks in Poland (you're right, labor is more difficult
| than hardware), let me imagine the rest of the
| infrastructure (probably not all of it) for this "project",
| just for fun:
|
| - network switch Juniper EX4600 (10Gbps ports) + 3rd
| party optics ~$11k
|
| - cheap 1Gbps switch for management access <$1k
|
| - some router for VPN for management network - $500
|
| - 1Gbps (not guaranteed) internet access with few IPs
| ~$350 / month
|
| - 100Mbps low traffic internet access for the
| management/OOB network.
|
| Time to get the hardware - 2 months. Time to rent and
| install hardware in rack - about 1 month. I don't count
| configuring the software.
|
| This setup is full of single points of failure so I would
| consider it one "region" and use something like CEPH +
| some spare servers in each "region". That way you don't
| need to react immediately to hardware failures. Just send
| a box of hardware from time to time to the DC and use
| ~$20-40h/h remote hands service to replace the failed
| drives or whole servers. You could also buy on-site
| service from the hardware vendor for 1-3 years adding
| some cost.
|
| I think the most important thing would be to have a clever
| person who designs a fault-tolerant system, automatic
| failover, good monitoring and alerting, so that any on-
| call and maintenance job is easy and based on procedures.
| That way you could outsource it. Only then might it make
| some sense.
| Dylan16807 wrote:
| > Since you only have one, they are gonna be on call
| 24/7, so assume you'll burn them out after a year and a
| half and need to hire a new one....
|
| This person's entire job is managing a few racks of hard
| drives? How often do you think they're actually going to
| get called in?
|
| > Since redundancy is a thing, double that $350k.
|
| True, but you can do redundancy for cheaper with parity
| or tape.
|
| > And 10pb is what they have now so double it again for
| 20pb.
|
| > So probably you are looking at a million dollars of
| capital plus labor to actually execute on this.
|
| You can go a couple PB at a time if the upfront cost is
| daunting.
|
| > Add in $10k per rack for switches, routers, wires, etc.
|
| Yep, though that's not very much in comparison.
|
| > Plus all the configuration management that needs to be
| built up. Not to mention monitoring. So maybe a quarter
| of work just to have it functional.
|
| This is the one I'd really worry about.
|
| > I haven't even factored in opportunity costs. What
| could this business be doing that adds more value than
| building out a little data center?
|
| You always have to keep opportunity costs in mind, but
| something like this can pay for itself in under a year if
| there's significant bandwidth cost too, and that's an
| amazing ROI.
| user5994461 wrote:
| >>> This person's entire job is managing a few racks of
| hard drives? How often do you think they're actually
| going to get called in?
|
| How about every day?
|
| Quick guess how often disks need to be replaced when
| there are thousands of them. ;)
| Dylan16807 wrote:
| You can replace disks once a month or less. That's not an
| on-call thing, even if you do make your $200k admin do
| that grunt work.
|
| Also for one or two thousand disks I would expect less
| than one failure per week.
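|
| For illustration, a quick back-of-envelope sketch in Python;
| the 1-2% annualized failure rate is an assumed ballpark
| figure, not one quoted in this thread:
|
|     # Expected weekly disk failures for a fleet, given an
|     # assumed annualized failure rate (AFR).
|     def failures_per_week(num_disks: int, afr: float) -> float:
|         return num_disks * afr / 52
|
|     for afr in (0.01, 0.015, 0.02):
|         print(f"AFR {afr:.1%}, 2000 disks: "
|               f"{failures_per_week(2000, afr):.2f}/week")
|
| At 2,000 disks that works out to roughly 0.4-0.8 failures per
| week, in line with the estimate above.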
| spookthesunset wrote:
| > How often do you think they're actually going to get
| called in?
|
| Not often. But the server gods are a cruel mistress and
| it will definitely shit the bed when you are on your
| honeymoon, or maybe the day after your first kid is born.
| yoz-y wrote:
| > True, but you can do redundancy for cheaper with parity
| or tape.
|
| At these volumes you probably do want a carbon copy at
| another site to mitigate disasters like datacenter fires.
| hemmert wrote:
| I happen to own exa-byte.com, in case you need a domain for it
| ;-)
|
| (In 1998, in school, I looked up in our math book what would come
| after mega, giga... 20 years later, just as fresh and useless as
| on day one ;))
| verdverm wrote:
| It definitely depends on how you accumulate data and on the
| usage patterns. More clarity is needed there to make
| recommendations.
|
| As an aside, you can often get nice credits for moving off of AWS
| to Azure or GCP. I recommend the latter.
| amacneil wrote:
| At that level of data you should be negotiating with the 3
| largest cloud providers, and going with whoever gives you the
| best deal. You can negotiate the storage costs and also egress.
| brudgers wrote:
| Is the storage of the data critical to the future growth of the
| business?
| [deleted]
| nixgeek wrote:
| What happens to your business if you lose this data?
| [deleted]
| timr wrote:
| At this scale, there's no one perfect answer. You need to
| consider your usage patterns, business needs, etc.
|
| Is the data cold storage, that is rarely accessed? Is it OK to
| risk losing a percentage of it? Can you identify that percentage?
| If it's actively utilized, is it _all_ used, or just a subset?
| Which subset? How much data is added every day? How much is
| deleted? What are the I /O patterns?
|
| Etc.
|
| I have direct experience moving big cloud datasets to on-site
| storage (in my case, RAID arrays), but it was a situation where
| the data had a long-tail usage pattern, and it didn't really
| matter if some was lost. YMMV.
| xnx wrote:
| Off topic, but I'm shocked that anyone would trust uploading
| sensitive files (e.g. nudes) to this service. Photo vault type
| apps can be useful, but I would never want the content in those
| apps to upload to a small service like this based on their word
| that employees won't go through it.
| DSingularity wrote:
| Amazing how one post will tell you that, at your scale, S3 is
| stupid and other posts will tell you that at your not-small-
| enough-and-yet not-big-enough scale S3 is the only option. I say
| stick with cloud. If cost is an issue go negotiate a better
| contract -- GCP will probably give you a nice discount. Setting
| up a highly available service at that scale is not a walk in the
| park. Can you afford the distractions from your primary app while
| you figure it out?
| throwaway823882 wrote:
| 1. Shrink your data. That's just an absurd amount of data for a
| start-up. Even large organizations struggle to work with that
| much data. Resource growth directly affects system performance
| and complexity and limits what you can practically do with the
| data. You already have a million problems as a start-up; don't
| make another one for yourself by hunting for a clever solution
| when you can just get rid of the problem.
|
| 2. As a general-purpose alternative, I would use Backblaze. It's
| cheap and they know what they're doing. Here is a comparison of
| (non-personal) cloud vendor storage prices:
| https://gist.github.com/peterwwillis/83a4636476f01852dc2b670...
|
| 3. You need to know how the architecture impacts the storage
| costs. There are charges for incoming traffic, outgoing traffic,
| intra-zone traffic, storage, archival, and per-request 'access'
| (cost per GET | POST | etc). You may end up paying $500K a month
| just to serve files smaller than 1KB (see the sketch at the end
| of this comment).
|
| 4. You need to match up availability and performance requirements
| against providers' guarantees, and then measure a real-world
| performance test over a month. Some providers enforce rate
| limits, with others you might be in a shared pool of rate limits.
|
| 5. You need to verify the logistics for backup and restore. For
| 10PB you're gonna need an option to mail physical drives/tapes.
| Ensure that process works if you want to keep the data around.
|
| 6. Don't become your own storage provider. Unless you have a ton
| of time and money and engineering talent to waste and don't want
| to ship a reliable product soon.
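|
| To make point 3 concrete, here is a hedged sketch of how request
| fees can dominate when objects are tiny. The per-GET and egress
| rates are assumed ballpark list prices, not figures from this
| thread:
|
|     # Rough monthly cost of serving many small objects.
|     GET_PER_MILLION = 0.40  # USD per 1M GETs (assumed rate)
|     EGRESS_PER_GB = 0.09    # USD per GB to the internet (assumed)
|
|     def monthly_serving_cost(requests, avg_object_kb):
|         request_cost = requests / 1_000_000 * GET_PER_MILLION
|         egress_gb = requests * avg_object_kb / 1_000_000
|         return request_cost + egress_gb * EGRESS_PER_GB
|
|     # 10 billion 1 KB downloads/month: ~$4,000 in request fees
|     # vs ~$900 in egress, so the per-request line item dominates.
|     print(monthly_serving_cost(10e9, 1.0))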
| acd wrote:
| Using Erasure coding.
| Icer5k wrote:
| As others have said, it's a complicated question, but if you have
| the resources/wherewithal to run Ceph but don't want to deal with
| co-location, you can get a bunch of storage servers from Hetzner
| and get a much better handle on cost than S3.
|
| For example, at 10PB with two copies of every object (so 20 PB
| raw storage), you'd need ~90 of their SX293[1] boxes, coming out
| to around EUR30k/mo. This doesn't include the time to
| configure/maintain things on your end, but it does cover any
| costs associated with replacing failed drives.
|
| I've done similar setups for cheap video storage & CDN origin
| systems before, and it's worked fairly well if you're cost
| conscious.
|
| [1] https://www.hetzner.com/dedicated-
| rootserver/sx293/configura...
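|
| A back-of-envelope sketch of that server count; the ~220 TB of
| raw disk per box is inferred from the ~90-box figure above, not
| a spec I have verified:
|
|     import math
|
|     usable_pb = 10
|     copies = 2        # two full copies of every object
|     tb_per_box = 220  # assumed raw TB per SX293 (see note above)
|
|     raw_tb = usable_pb * 1000 * copies
|     print(math.ceil(raw_tb / tb_per_box))  # ~90 boxes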
| skynet-9000 wrote:
| Buying just one of these looks pretty challenging, let alone
| ~90. :(
| killingtime74 wrote:
| You would probably pick up the phone to order 90 of them.
| pmlnr wrote:
| Non-cloud:
|
| HPE sells their Apollo 4000[^1] line, which takes 60x3.5" drives
| - with 16TB drives that's 960TB per machine, so one rack of 10 of
| these is ~9.6PB, which nearly covers your 10PB needs. (We have
| some racks like this.) They are not cheap. (Note: Quanta makes
| servers that can take 108x3.5" drives, but they need special deep
| racks.)
|
| The problem here would be the "filesystem" (read: the distributed
| service): I don't have much experience with Ceph, and ZFS across
| multiple machines is nasty as far as I'm aware, but I could be
| wrong. HDFS would work, but the latency can be completely random
| there.
|
| [^1]: https://www.hpe.com/uk/en/storage/apollo-4000.html
|
| So unless you are desperate to save money in the long run, stick
| to the cloud, and let someone else sweat about the filesystem
| level issues :)
|
| EDIT: btw, we let the dead drives "rot": replacing them would
| cost more, and the failure rate is not that bad, so they stay in
| the machine, and we disable them in fstabs, configs, etc.
|
| EDIT2: at 10PB HDFS would be happy; buy 3 racks of those apollos,
| and you're done. We first started struggling at 1000+ nodes; now,
| with 2400 nodes, nearly 250PB raw capacity, and literally a
| billion filesystem objects, we are slow as f*, so plan carefully.
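|
| A quick sanity check of that rack math; the 3x factor is HDFS's
| default replication, an assumption about layout rather than
| something stated above:
|
|     drives_per_chassis = 60
|     tb_per_drive = 16
|     chassis_per_rack = 10
|     racks = 3
|     replication = 3  # assumed HDFS default
|
|     raw_pb_per_rack = (drives_per_chassis * tb_per_drive
|                        * chassis_per_rack) / 1000
|     usable_pb = racks * raw_pb_per_rack / replication
|     print(raw_pb_per_rack, usable_pb)  # 9.6 PB raw/rack, 9.6 usable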
| monstrado wrote:
| MinIO is an option as well and would allow you to transition
| from testing in S3 to your own MinIO cluster seamlessly.
| dividedbyzero wrote:
| Does it scale that far?
| secabeen wrote:
| You can also get units like this direct from Western
| Digital/HGST. We have a system with 3 of their 4U60 units, and
| they weren't all that expensive. Ordering direct from HGST, we
| only paid a small premium on top of the cost of the SAS drives.
| more_corn wrote:
| Don't buy HPE gear. Qualify the gear with sample units from a
| few competing vendors and you'll see why.
| siavosh wrote:
| Not sure the state of some of the decentralized solutions...
| Dylan16807 wrote:
| > Should have mentioned earlier, data needs to be accessible at
| all time. It's user generated data that is downloaded in the
| background to a mobile phone, so super low latency is not
| important, but less than 1000ms required.
|
| > The data is all images and videos, and no queries need to be
| performed on the data.
|
| Okay, this is a good start, but there are some other important
| factors.
|
| For every PB of data, how much bandwidth is used in a month, and
| what percentage of the data is actually accessed?
|
| Annoyingly, the services that have the best warm/"cold" storage
| offerings also tend to be the services that overcharge the most
| for bandwidth.
| qeternity wrote:
| Are you fundamentally a data storage business or are you another
| business that happens to store a tremendous amount of data?
|
| If it's the former, then investing in-house might make sense (a
| la Dropbox's reverse course).
| lostcolony wrote:
| He's the CTO of KeepSafe.
| qeternity wrote:
| Ok, so it seems that they are indeed a data storage company.
| philippb wrote:
| We are.
| saalweachter wrote:
| Do you want to grow your business by finding other ways
| to sell people storage, or by adding features to your app
| that may or may not require any additional storage to
| develop?
| peterthehacker wrote:
| Can you elaborate on what the >10PB of data is and why it's
| important to your startup? Is it archived customer data, like
| backups? Or is it data purchased from vendors for analysis and
| ML?
| philippb wrote:
| See updated question. Thanks for asking
| jtchang wrote:
| I would host in a datacenter of your choice and do a cross
| connect into AWS: https://aws.amazon.com/directconnect/pricing/
|
| This allows you to read the data into AWS instances at no cost
| and process it as needed since there is 0 cost for ingress into
| AWS. I have some experience with this (hosting with Equinix).
| philippb wrote:
| Thanks for the pointer. Never thought about this as an option.
| Great stuff!!!
| pickle-wizard wrote:
| I had a similar problem at a past job, though we only had a
| PB of data. We used a product called SwiftStack. It is open
| source, but they have paid support. I recommend getting
| support, as their support is really good. It is an object
| store like S3, but it has its own API, though I think they
| now have an S3-compatible gateway.
|
| We had about 25 Dell R730xd servers. When the cluster would
| start to fill up, we would just replace drives with larger
| drives. Upgrading drives with SwiftStack is a piece of cake.
| When I left we were upgrading to 10TB drives as that was the
| best pricing. We didn't buy the drives from Dell as they were
| crazy expensive. We just bought drives from Amazon/New Egg,
| and kept some spares onsite. We got a better warranty that
| way too. Dell only had a 1 year warranty, but the drives we
| were buying had a 5 year warranty.
| TechBro8615 wrote:
| I'm not an AWS pricing expert, but you should be aware you're
| still on the hook for S3 requests even if you can get out of
| paying for bandwidth. Is AWS direct connect a pure peering
| arrangement? I wonder what their requirements are for that.
| Guess I'll read the link :)
|
| Idk what your team's expertise is, but I'd advise avoiding
| the cloud as long as possible. If you can build out on-
| premises infrastructure, it will be a huge competitive
| advantage for your company because it will allow you to
| offer features that your competitors can't.
|
| Examples of this:
|
| - Cloudflare built up their own network and infrastructure
| and it's always been their biggest asset. They set the
| standard for free tier of CDN pricing, and nobody who builds
| a CDN on top of an existing cloud provider will ever beat it.
|
| - Zoom. By hosting their own servers and network, Zoom is
| similarly able to offer a free tier without free customers
| losing them money on variable bandwidth charges.
|
| - WhatsApp. They scaled to hundreds of millions of users with
| less than a dozen engineers, a few dozen (?) servers, and
| some Erlang code.
|
| IMO defaulting to the cloud is one of the worst mistakes a
| young company can make. If your app is not business critical,
| you can probably afford up to a day of downtime or even some
| data loss. And that is unlikely to happen anyway, as long as
| you've got a capable team looking after it who chooses
| standard and robust software.
| lamontcg wrote:
| > as long as you've got a capable team looking after it who
| chooses standard and robust software.
|
| And cheap.
|
| If you put people in charge who are looking for ways of
| expanding their empire and budget through spending money on
| EMC/VMWare/Oracle/etc/etc then you can quickly wind up
| spending a lot more money.
|
| Simplistic network designs, simplistic server designs,
| simplistic storage designs with mostly open source software
| used everywhere can be highly competitive with Cloud
| services.
|
| Mostly all that Amazon did to create AWS/EC2 was fire
| anyone who said words like SAN or EMC, do everything
| very cheaply with open-source software, and evolve away
| from enterprise vendors towards commodity hardware.
|
| If you make "frugality" a core competency in your
| datacenter design like Amazon did, then you can easily beat
| the cloud.
|
| You also need to have [dev]ops people who are inclined to
| say "yes" to the business, know how to debug things, and
| can operate independently without phoning up EMC.
| derefr wrote:
| > fire anyone who said words like SAN
|
| Is EBS not, itself, a SAN?
| throwaway823882 wrote:
| I run cloud infra for a living. Have been managing
| infrastructure for 20 years. I would never for one second
| consider building my own hosting for a start-up. It would
| be like a grocery delivery company starting their own farm
| because seeds are cheap.
| TechBro8615 wrote:
| Depends what you're doing I suppose. I think the three
| companies I mentioned (CloudFlare, Zoom and WhatsApp) are
| good examples of infrastructure investment as a
| competitive advantage.
| derefr wrote:
| None of those are _start-ups_ , though. They've either
| IPOed (CloudFlare, Zoom) or been acquired by publicly-
| traded companies (WhatsApp).
|
| A startup is a company that might still need to pivot to
| find its final business model, potentially shedding its
| entire existing infrastructure base in the process.
| Start-ups are why IaaS providers don't default to
| instance reservations -- because, as a startup, you might
| suddenly realize that you won't be needing that $10k/hr
| of compute, but rather $10k/hr of something else.
| throwaway823882 wrote:
| Or suppose you run the most successful/profitable Fantasy
| Sports League start-up on the internet (used to work for
| 'em) and host your own gear. Every year you have to
| analyze usage trends and predict future load, build up
| the capital needed to buy all-new racks of servers every
| 2-3 years, and pay for IT staff and datacenter costs.
|
| That was before the cloud existed. They had to poach
| experts from hosting companies to build and maintain
| their gear. They built a 24/7 NOC, did server repair,
| became network experts, storage experts, database
| experts. Besides being incredibly complex and burdensome,
| it was financially risky. If they missed their
| projections they could over-invest by 1-2 million bucks,
| or even worse, not have the capacity needed to meet
| demand.
|
| If somebody told us back then that we could pay a premium
| to be able to scale at any time as much as we needed,
| when we needed it? We would have flipped out. We had
| heard about Amazon building some kind of "grid computing"
| thing, but it seemed like a pipe dream for universities,
| like parallel computing. Turns out it was a different
| kind of grid.
| maestroia wrote:
| There are four hidden costs which not many have touched upon.
|
| 1) Staff: You'll need at least one, maybe two, to build, operate,
| and maintain any self-hosted solution. A quick peek at Glassdoor
| and Salary shows the unloaded salary for a Storage Engineer runs
| $92,000-130,000 US. Multiply by 1.25-1.4 for the loaded cost of
| an employee (things like FICA, insurance, laptop, facilities,
| etc). Storage Administrators run lower, but still around $70K US
| unloaded. Point is, you'll be paying around $100K+/year per
| storage staff position.
|
| 2) Facilities (HVAC, electrical, floor loading, etc): If you host
| on-site (not in a hosting facility), you'd better make certain your
| physical facilities can handle it. Can your HVAC handle the
| cooling, or will you need to upgrade it? What about your
| electrical? Can you get the increased electrical in your area?
| How much will your UPS and generator cost? Can the physical
| structure of the building (floor loading, etc) handle the weight
| of racks and hundreds of drives, the vibration of mechanical
| drives, the air cycling?
|
| 3) Disaster Recovery/Business Continuity: Since you're using S3
| One Zone IA, you have no multi-zone redundancy. Its use case is
| secondary backup storage, not the primary data store for running
| a startup. When there is an outage/failure (and it will happen),
| the startup may be toast, and investors none too happy. So this
| is another expense you're going to have to seriously consider,
| whether you stick with S3 or roll your own.
|
| 4) Cost of money: With rolling your own, you're going to be doing
| both CAPEX and OPEX. How much upfront and ongoing CAPEX can the
| startup handle? Would the depreciation on storage assets be
| helpful financially? You really need to talk to your CPA/finance
| person before deciding. There may be better tax and financial
| benefits to staying on S3 (pure OPEX). Or not.
|
| Good luck.
| dublin wrote:
| Actual answer: There is almost NO company that really needs that
| much data. This has mostly just become a pissing match. In
| general, companies ( _especially_ startups) are way better off
| making sure they have a small amount of high-quality, accurate
| data than a huge pile-o-dung that they think they're going to
| use magical AI/ML pixie dust to do something with.
|
| That said, if you really think you _must_ , spend effort on good
| deduping/transcoding (relatively easy with images/video), and
| consider some far lower-cost storage options than S3, which is
| pretty pricey no matter what you do. If S3 is a good fit, I hear
| good things about Wasabi, but haven't used it myself.
|
| If you have the technical ability (non-trivial, you need someone
| who _really_ understands disk and system I/O, RAID controllers,
| PCI lane optimization, SAN protocols and network performance (not
| just IP), etc.) and the wherewithal to invest, then putting this
| on good hardware with, say, ZFS at your site or a good co-lo will
| be WAY cheaper and probably offer higher performance than any
| other option, especially combined with serious deduping. (Look
| carefully at everything that comes in _once_ and you never have
| to do it again.) Also, keep in mind that even-numbered RAID
| levels can make more sense for video streaming, if that's a big
| part of the mix.
|
| The MAIN thing: Keep in mind that understanding your data flows
| is _way_ more important than just "designing for scale". And
| _really_ try to not need so much data in the first place.
|
| (Aside: I was cofounder and chief technologist of one of the
| first onsite storage service providers - we built a screamer of a
| storage system that was 3-4x as fast, and scaled 10x larger than
| IBM's fastest Shark array, at less than 10% of the cost. The bad
| news - we were planning to launch the week of 9/11 and, as self-
| funded, ran out of money before the economy came back. The system
| kicked ass, though.)
| SrslyJosh wrote:
| > no queries need to be performed on the data.
|
| cat >/dev/null, obviously. ;-)
| bonoboTP wrote:
| Check whether you really need 10 PB or you can make do with
| several orders of magnitude less. I wouldn't be surprised if it
| was some sort of perverse incentive CV building thing, like
| engineers building a Kubernetes cluster for every tiny thing. If
| you really do need 10 PB, then still you probably should check
| again because you probably don't need 10 PB.
| staticassertion wrote:
| It's going to depend entirely on a number of factors.
|
| How are you storing this data? Is it tons of small objects, or a
| smaller number of massive objects?
|
| If you can aggregate the small objects into larger ones, can you
| compress them? Is this 10PB compressed or not? If this is video
| or photo data, compression won't buy you nearly as much. If you
| have to access small bits of data, and this data isn't something
| like Parquet or JSON, S3 won't be a good fit.
|
| Will you access this data for analytics purposes? If so, S3 has
| querying functionality like Athena and S3 Select. If it's instead
| for serving small files, S3 may not be a good fit.
|
| Really, at PB scale these questions are all critically important
| and any one of them completely changes the answer. There is no
| easy "store PB of data" architecture; you're going to need to
| optimize heavily for your specific use case.
| philippb wrote:
| Great question. I updated the original post. It's user
| generated images and videos. We download those to the phones in
| the background.
|
| We don't touch the data at all.
| staticassertion wrote:
| > Update: Should have mentioned earlier, data needs to be
| accessible at all time. It's user generated data that is
| downloaded in the background to a mobile phone, so super low
| latency is not important, but less than 1000ms required.
|
| > The data is all images and videos, and no queries need to
| be performed on the data.
|
| OK, so this definitely helps a bit.
|
| At 10PB my assumption is that storage costs are the major
| thing to optimize for. Compression is an obvious must, but as
| it's image and video you're going to have some trouble there.
|
| Aggregation where you can is probably a good idea - like if a
| user has a photo album, it might make sense to store all of
| those photos together, compressed, and then store an index of
| photo ID to album. Deduplication is another thing to consider
| architecting for - if the user has the same photo across N
| albums, you should ensure it's only stored once.
| Depending on what you expect to be more or less common this
| will change your approach a lot.
|
| Of course, you want to avoid mutating objects in S3 too - so
| an external index to track all of this will be important. You
| don't want to have to pull from S3 just to determine that
| your data was never there. You can also store object metadata
| and query that first.
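|
| A minimal sketch of that dedupe-plus-index idea in Python; the
| index structures and store_blob() helper here are hypothetical
| stand-ins for whatever object store and database you end up
| using:
|
|     import hashlib
|
|     blob_index = {}    # content hash -> storage key
|     album_index = {}   # album id -> list of content hashes
|
|     def store_blob(content_hash, data):
|         # placeholder for the real upload (S3 PUT, Ceph, etc.)
|         return f"blobs/{content_hash}"
|
|     def add_photo(album_id, data):
|         h = hashlib.sha256(data).hexdigest()
|         if h not in blob_index:       # upload unseen content only
|             blob_index[h] = store_blob(h, data)
|         album_index.setdefault(album_id, []).append(h)
|         return blob_index[h]
|
| Checking the index first means a duplicate photo is never
| uploaded or stored twice, and a lookup never has to hit the
| object store just to learn that something isn't there.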
|
| AFAIK S3 is the cheapest way to store a huge amount of data
| other than running your own custom hardware. I don't think
| you're at that scale yet.
|
| Latency is probably an easy one. Just don't use Glacier,
| basically, or use it sparingly for data that is extremely
| rarely accessed, e.g. if you back up disabled user accounts
| in case they come back or something like that.
|
| I think this'll be less of a "do we use S3 or XYZ" and more
| of a "how do we organize our data so that we can compress as
| much of it together, deduplicate as much of it as possible,
| and access the least bytes necessary".
| VHRanger wrote:
| Isn't Backblaze B2 cheaper than S3?
| staticassertion wrote:
| Yeah, I guess I shouldn't say S3 is the cheapest option
| there, I was thinking 'In AWS' but Backblaze is cheaper.
| pkb wrote:
| 1) For hardware you want cheap, expendable bare metal. Look up
| posts about how Google built their own servers for reference.
|
| 2) For RAID, go with software-only RAID. You will sidestep
| problems caused by hardware RAID controllers each having a custom
| on-disk format (i.e. non-swappable across models/makes).
|
| 3) For the filesystem, look at OpenAFS. CERN is using OpenAFS to
| store petabytes of data from the LHC.
|
| 4) For the operating system, look at Debian. Coupled with FAI
| (Fully Automatic Installation), it will let you deploy multiple
| servers in an automated way to host your files.
| ufmace wrote:
| 10PB is a crazy amount of data. Far more than any normal business
| would ever have to deal with. Presuming you aren't crazy, you
| must have an unusual business plan to legitimately need to handle
| that much data. That means it's tough for us to say much - any
| assumptions we might have about it could be invalid depending on
| your actual business needs. You're just going to have to tell us
| some more about your business case before we can say anything
| useful about it.
| notyourday wrote:
| Unless they do video. In that case 10PB is not much at all.
| erulabs wrote:
| I'd go with Ceph and dedicated hardware. Something like Hetzner
| or Datapacket, or build it yourself and go big with something
| like SoftIron. We've built and maintain a number of these types
| of clusters using S3-compatible APIs (CephObjectStore).
| SoftIron is probably overkill, but good lord is it fun to play
| with that much throughput!
|
| If you're looking for a partner/consultant to get things going,
| feel free to reach out! This stuff is sort of our wheelhouse;
| my co-founder and I were previously Ops at Imgur, so you can
| imagine the kinds of image hosting problems we've seen :P
___________________________________________________________________
(page generated 2021-04-23 23:01 UTC)