https://randomoracle.wordpress.com/2019/12/07/filecoin-storj-and-the-problem-with-decentralized-storage/

Random Oracle
Building and breaking systems

Standard Disclaimer: The opinions and views expressed here are my own, and do not reflect those of my employer.

Filecoin, StorJ and the problem with decentralized storage (part I)

Blockchains for everything

Decentralized storage services such as Filecoin and StorJ seek to disrupt the data-storage industry, using blockchain tokens to create a competitive marketplace that can offer more space at lower cost. They also promise to bring a veneer of legitimacy to the Initial Coin Offering (ICO) space. At a time when ICOs were being mass-produced as thinly-veiled, speculative investment vehicles likely to run afoul of the Howey test as unregistered securities, file storage looks like a shining example of an actual utility token: one that has some utility. Instead of betting on the "greater fool" theory of offloading the token on the next person willing to pay a higher price, these tokens are good for a useful service: paying someone else to store your backups. This blog post looks at some caveats and overlooked problems in the design.
Red herring: privacy

A good starting point is to dispel the alleged privacy advantage. Decentralized storage systems often tout their privacy advantage: data is stored encrypted by its owner, such that the storage provider cannot read it even if they wanted to. That may seem like an improvement over the current low bar, which relies on service providers swearing on a stack of pre-IPO shares that, pinky-promise, they will never dip into customer data for business advantage, a promise more often honored in the breach as the examples of Facebook and Google repeatedly demonstrate. But there is no reason to fundamentally alter the data-storage model to achieve E2E security against rogue providers. While far from being the path of least resistance, there is a long history of alternative remote backup services such as tarsnap for privacy-conscious users. (All 17 of them.) Previous blog posts here have demonstrated that it is possible to implement bring-your-own-encryption with vanilla cloud storage services such as AWS, such that the cloud service is a glorified remote drive storing random noise it cannot make sense of.

These models are far more flexible than the arbitrary, one-size-fits-all encryption model hard-coded into protocols such as StorJ. Users are free to adopt their preferred scheme, compatible with their existing key management model. For example, with AWS Storage Gateway, Linux users can treat cloud storage as an iSCSI volume with LUKS encryption, while those on Windows can apply Bitlocker-To-Go to protect that volume exactly as they would encrypt a USB thumb-drive. Backing up rarely accessed data in an enterprise is even easier: nothing fancier is necessary than scripts that GPG-sign and encrypt backups before uploading them to AWS/Azure/GCP.
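To give a feel for how little is involved in the "scripts" approach, here is a minimal sketch in Python. It assumes the `gpg` and `aws` command-line tools are installed and configured; the key identifiers, bucket name and file names are hypothetical placeholders, and this is one possible arrangement rather than a recommended production setup.

```python
import subprocess
from pathlib import Path


def gpg_command(archive: Path, recipient: str, signer: str) -> list[str]:
    """Build the gpg invocation that signs and encrypts one backup archive."""
    encrypted = archive.with_name(archive.name + ".gpg")
    return [
        "gpg", "--batch", "--yes",
        "--local-user", signer,     # key that produces the signature
        "--recipient", recipient,   # key the backup is encrypted to
        "--sign", "--encrypt",
        "--output", str(encrypted),
        str(archive),
    ]


def upload_command(encrypted: Path, bucket: str) -> list[str]:
    """Build the AWS CLI invocation that copies the ciphertext to S3."""
    return ["aws", "s3", "cp", str(encrypted), f"s3://{bucket}/{encrypted.name}"]


def backup(archive: Path, recipient: str, signer: str, bucket: str) -> None:
    """Sign and encrypt locally, then upload only the ciphertext."""
    subprocess.run(gpg_command(archive, recipient, signer), check=True)
    encrypted = archive.with_name(archive.name + ".gpg")
    subprocess.run(upload_command(encrypted, bucket), check=True)
```

Calling `backup(Path("home.tar"), "backup@example.com", "me@example.com", "my-backups")` would leave the provider holding only `home.tar.gpg`. The private keys never leave the owner's machine, which is the whole point: the privacy property comes from the client, not from the storage protocol.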
Facing the competition

Once we accept the premise that privacy alone cannot be a differentiator for backup services--users can already solve that problem without depending on the service provider--the competitive landscape reverts to that of a commodity service. Roughly speaking, providers compete on three dimensions: reliability, cost and speed.

* Cost is the price paid for storing each gigabyte of data for a given period of time.
* Speed refers to how quickly that data can be downloaded when necessary and, to a lesser extent, how quickly it can be uploaded during the backup process.
* Reliability is the probability of being able to get all of your data back whenever you need it. A company that retains 99.999% of customer data while irreversibly losing the remaining 0.001% will not stay in business long. Even a 100% retention rate is not great if the service only operates from 9AM-4PM.

The economic argument against decentralized storage can be stated this way: it is very unlikely that a decentralized storage market can offer an alternative that competes against centralized providers-- AWS, Google, Azure-- when measured on any of these dimensions. (Of course nothing prevents Amazon or MSFT from participating in the decentralized marketplace to sell storage, but this would be another example of doing on a blockchain, with increased friction, something that can be done much more efficiently via existing channels.)

Among the three criteria, cost is the easiest one to forecast. Here is the pitch from the StorJ website:

"Have unused hard drive capacity and bandwidth? Storj pays you for your unused hard drive capacity and bandwidth in STORJ tokens!"

Cloud services are ruled by ruthless economies of scale. This is where Amazon, Google, MSFT and a host of other cloud providers shine, reaping the benefits of investment in data-centers and petabytes of storage capacity.
Even if we ignore the question of reliability, it is very unlikely that the hobbyist with a few spare drives sitting in their basement can achieve a lower per-gigabyte cost.

The standard response to this criticism is to point out that decentralized storage can unlock spare, unused capacity at zero marginal cost. Returning to our hypothetical hobbyist, he need not add new capacity to compete with AWS. Let us assume he already owns excess, paid-for storage that sits underutilized; there is only so much space you can take up with vacation pictures. Disks consume about the same energy whether they are 99% or 1% full. Since the user is currently getting paid exactly $0 for that spare capacity, any value above zero is a good deal, according to this logic. In that case, any non-zero price point is achievable, including one that undercuts even the most cost-effective cloud provider. Our hobbyist can temporarily boot up those ancient PCs, stash away data someone on the other side of the world is willing to pay to safeguard, and shut down the computer once the backups are written. The equipment remains unplugged from the wall until such time as the buyer comes calling for their data.

Proof-of-storage and cost of storage

The problem with this model is that decentralized storage demands much more than mere inert storage of bits. These systems must achieve reliability in the absence of the usual contractual relationship, namely, someone you can sue for damages if the data disappears. Instead the blockchain itself must enforce fairness in the transaction: the service provider gets paid only if they are actually storing the data entrusted to them for safeguarding. Otherwise the provider could pocket the payment, discard uploaded data and put that precious disk space to some other use. Solving this problem requires a cryptographic technique called proofs-of-data-possession (PDP) or alternatively proofs-of-storage.
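A toy example helps make the idea concrete. The sketch below is a Merkle-tree spot check in Python: the owner keeps only a 32-byte root hash, challenges the provider with a random block index, and the provider must answer with that block plus its sibling hashes. This is an illustrative assumption, not the scheme Filecoin or StorJ actually use; the block size, function names and the odd-level padding rule are all made up for the example. A single challenge only proves possession of one block, so confidence about the whole file comes from repeating random challenges over time.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_blocks(data: bytes) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the Merkle tree: leaf hashes first, root last."""
    level = [h(b"\x00" + b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(blocks: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Provider side: return the challenged block and its sibling path.

    Note that this reads every block to rebuild the tree, so the data
    has to be online and readable whenever a challenge arrives.
    """
    path = []
    i = index
    for level in build_tree(blocks)[:-1]:
        sibling = i ^ 1
        path.append(level[sibling] if sibling < len(level) else level[i])
        i //= 2
    return blocks[index], path


def verify(root: bytes, index: int, block: bytes, path: list[bytes]) -> bool:
    """Owner side: recompute the root from one block and its sibling path."""
    node = h(b"\x00" + block)
    i = index
    for sibling in path:
        node = h(b"\x01" + node + sibling) if i % 2 == 0 else h(b"\x01" + sibling + node)
        i //= 2
    return node == root
```

The owner computes `root = build_tree(blocks)[-1][0]` once before uploading and can then discard the file. Each audit picks an index (for instance with `secrets.randbelow(len(blocks))`), asks the provider for `prove(blocks, index)`, and checks the answer with `verify(root, index, block, path)`.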
Providers periodically run a specific computation over the data they promised to store-- a computation that is only possible if they still have 100% of that data-- and publish the results on the blockchain, which in turn facilitates payment conditional on periodic proofs. Because the data-owner can observe these proofs, they are assured that their precious data is still around. The key property is that the owner does not need access to the original file to check correctness: only a small "fingerprint" of the uploaded data is retained. That in a nutshell is the point of proof-of-storage; if the owner needed access to the entire dataset to verify the calculation, it would defeat the point of outsourcing storage.

While proofs of storage may keep service providers honest, they break one of the assumptions underlying the claimed economic advantage: leveraging idle capacity. Once we demand periodically going over the bits and running cryptographic calculations, the storage architecture cannot be an ancient PC unplugged from the wall. There is a non-zero marginal cost to implementing proof-of-storage. In fact there is an inverse relationship between latency and price: the longer a customer is willing to wait for their data, the cheaper it is to store. Tape archives sitting on a shelf have a much lower cost per gigabyte than spinning disks attached to a server. These tradeoffs are even reflected in Amazon's pricing model: AWS offers a storage tier called Glacier which is considerably cheaper than S3 but comes with significant latency-- on the order of hours-- for accessing data. Requiring periodic proof-of-storage undermines precisely the one model-- offline media gathering dust in a vault-- that has the best chance of undercutting large-scale centralized providers.

Beyond the economics, there is a more subtle problem with proof-of-storage: knowing your data is there does not mean that you can get it back when needed. This is the subject of the next blog post.

[continued]

CP
December 7, 2019, Cem Paya