https://randomoracle.wordpress.com/2019/12/07/filecoin-storj-and-the-problem-with-decentralized-storage/

Random Oracle

Building and breaking systems

Standard Disclaimer

The opinions and views expressed here are my own, and do not reflect
those of my employer.

Filecoin, StorJ and the problem with decentralized storage (part I)


Blockchains for everything

Decentralized storage services such as Filecoin and StorJ seek to
disrupt the data-storage industry, using blockchain tokens to create
a competitive marketplace that can offer more space at lower cost.
They also promise to bring a veneer of legitimacy to the Initial Coin
Offering (ICO) space. At a time when ICOs were being mass-produced as
thinly-veiled, speculative investment vehicles likely to run afoul of
the Howey test as unregistered securities, file storage looks like a
shining example of an actual utility token: one that has some
utility. Instead of betting on the "greater fool" theory of
offloading the token on the next person willing to pay a higher
price, these tokens are good for a useful service: paying someone
else to store your backups. This blog post looks at some caveats and
overlooked problems in the design.

Red-herring: privacy

A good starting point is to dispel the alleged privacy advantage.
Decentralized storage systems often tout their privacy advantage: data
is stored encrypted by its owner, such that the storage provider can
not read it even if they wanted to. That may seem like an improvement
over the current low bar, which relies on service providers swearing
on a stack of pre-IPO shares that, pinky-promise, they never dip into
customer data for business advantage, a promise more often honored in
the breach, as the examples of Facebook and Google repeatedly
demonstrate. But there is no reason to fundamentally alter the
data-storage model to achieve E2E security against rogue providers.
While far from being the path of least resistance, there is a long
history of alternative remote backup services such as tarsnap for
privacy-conscious users. (All 17 of them.) Previous blog posts here
have demonstrated that it is possible to implement
bring-your-own-encryption with vanilla cloud storage services such as
AWS, such that the cloud service is a glorified remote drive storing
random noise it can not make sense of. These models are far more
flexible than the arbitrary, one-size-fits-all encryption model
hard-coded into protocols such as StorJ. Users are free to adopt
their preferred scheme, compatible with their existing key management
model. For example, with AWS Storage Gateway, Linux users can treat
cloud storage as an iSCSI volume with LUKS encryption, while those on
Windows can apply BitLocker To Go to protect that volume exactly as
they would encrypt a USB thumb-drive. Backing up rarely accessed data
in an enterprise is even easier: nothing fancier than scripts to
GPG-sign and encrypt backups before uploading them to AWS/Azure/GCP
is necessary.

Facing the competition

Once we accept the premise that privacy alone can not be a
differentiator for backup services--users can already solve that
problem without depending on the service provider--the competitive
landscape reverts to that of a commodity service. Roughly speaking,
providers compete on three dimensions: reliability, cost and speed.

  * Cost is the price paid for storing each gigabyte of data for a
    given period of time.
  * Speed refers to how quickly that data can be downloaded when
    necessary and to a lesser extent, how quickly it can be uploaded
    during the backup process.
  * Reliability is the probability of being able to get all of your
    data back whenever you need it. A company that retains 99.999% of
    customer data while irreversibly losing the remaining 0.001% will
    not stay in business long. Even a 100% retention rate is not great
    if the service only operates from 9AM to 4PM.
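To make the reliability examples above concrete, here is a small Python calculation. The 99.999% retention rate and the 9AM-4PM operating window come from the text; the 1 TB backup size is an illustrative assumption, and the helper names are made up for this sketch.

```python
from fractions import Fraction

TB = 10**12  # decimal terabyte in bytes; an assumed, illustrative backup size

def bytes_lost(total_bytes: int, retention: Fraction) -> Fraction:
    """Bytes irreversibly lost when only `retention` of the data survives."""
    return total_bytes * (1 - retention)

def daily_availability(open_hour: int, close_hour: int) -> Fraction:
    """Fraction of each day a service is reachable if it runs open_hour..close_hour."""
    return Fraction(close_hour - open_hour, 24)

lost = bytes_lost(TB, Fraction(99_999, 100_000))  # 99.999% retention
uptime = daily_availability(9, 16)                # 9AM to 4PM
print(f"{float(lost) / 1e6:.0f} MB lost per TB stored")  # 10 MB lost per TB stored
print(f"reachable {float(uptime):.1%} of the day")       # reachable 29.2% of the day
```

Exact fractions are used so that the arithmetic is not muddied by floating-point rounding until the final printout.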

The economic argument against decentralized storage can be stated
this way: it is very unlikely that a decentralized storage market can
offer an alternative that can compete against centralized providers--
AWS, Google, Azure-- when measured on any of these dimensions. (Of
course nothing prevents Amazon or MSFT from participating in the
decentralized marketplace to sell storage, but this would be another
example of doing with increased friction something on a blockchain
that can be done much more efficiently via existing channels.)

Among the three criteria, cost is the easiest one to forecast. Here is
the pitch from the StorJ website:

    "Have unused hard drive capacity and bandwidth?
    Storj pays you for your unused hard drive capacity and bandwidth
    in STORJ tokens!"

Cloud services are ruled by ruthless economies of scale. This is
where Amazon, Google, MSFT and a host of other cloud providers shine,
reaping the benefits of investment in data-centers and petabytes of
storage capacity. Even if we ignore the question of reliability, it
is very unlikely that the hobbyist with a few spare drives sitting in
their basement can achieve a lower per-gigabyte cost.

The standard response to this criticism is pointing out that
decentralized storage can unlock spare, unused capacity at zero
marginal cost. Returning to our hypothetical hobbyist, he need not
add new capacity to compete with AWS. Let us assume he already owns
excess storage, already paid for, that sits underutilized; there is
only so much space you can take up with vacation pictures. Disks
consume about the same energy whether they are 99% or 1% full. Since
the user is currently getting paid exactly $0 for that spare
capacity, any value above zero is a good deal, according to this
logic. In that case, any non-zero price point is achievable,
including one that undercuts even the most cost-effective cloud
provider. Our hobbyist can temporarily boot up those ancient PCs,
stash away data someone on the other side of the world is willing to
pay to safeguard, and shut down the computer once the backups are
written. The equipment remains unplugged from the wall until such
time as the buyer comes calling for their data.

Proof-of-storage and cost of storage

The problem with this model is that decentralized storage demands
much more than mere inert storage of bits. It must achieve
reliability in the absence of the usual contractual relationship,
namely, someone you can sue for damages if the data disappears.
Instead the blockchain itself must enforce fairness in the
transaction: the service provider gets paid only if they are actually
storing the data entrusted for safeguarding. Otherwise the provider
could pocket the payment, discard uploaded data and put that precious
disk space to some other use. Solving this problem requires a
cryptographic technique called proofs-of-data-possession (PDP) or
alternatively proofs-of-storage. Providers periodically run a
specific computation over the data they promised to store (a
computation that is only possible if they still have 100% of that
data) and publish the results on the blockchain, which in turn
facilitates payment conditional on periodic proofs. Because the
data-owner can observe these proofs, they are assured their
precious data is still around. The key property is that the owner
does not need access to the original file to check correctness: only
a small "fingerprint" of the uploaded data is retained. That, in a
nutshell, is the point of proof-of-storage; if the owner needed access
to the entire dataset to verify the calculation, it would defeat the
point of outsourcing storage.
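A minimal Python sketch of the idea, using the simplest construction available rather than anything Filecoin or StorJ actually deploy: before uploading, the owner picks random nonces and records SHA-256(nonce || data) for each, then keeps only those short tokens as the "fingerprint." When challenged with a fresh nonce, the provider can only produce the matching digest by hashing every stored byte. The names `prepare_challenges`, `prove` and `verify` are hypothetical, and the blockchain and payment layer are left out entirely. Each token is single-use, so the owner runs out of audits after `count` challenges; published PDP schemes and Filecoin's proofs of replication and spacetime rely on more elaborate cryptography to avoid that limit.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass
class Challenge:
    nonce: bytes     # kept secret by the owner until the audit
    expected: bytes  # SHA-256(nonce || data), computed before upload

def prepare_challenges(data: bytes, count: int) -> list[Challenge]:
    """Owner side: precompute single-use audit tokens, then discard the data."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)  # unpredictable, so answers can't be precomputed
        challenges.append(Challenge(nonce, hashlib.sha256(nonce + data).digest()))
    return challenges

def prove(stored: bytes, nonce: bytes) -> bytes:
    """Provider side: answering requires hashing every byte still held."""
    return hashlib.sha256(nonce + stored).digest()

def verify(challenge: Challenge, response: bytes) -> bool:
    """Owner (or contract) side: compare against the stored token only."""
    return hmac.compare_digest(challenge.expected, response)

if __name__ == "__main__":
    backup = secrets.token_bytes(1 << 20)  # 1 MiB of stand-in backup data
    tokens = prepare_challenges(backup, count=4)
    audit = tokens.pop()  # reveal one unused nonce to the provider
    print(verify(audit, prove(backup, audit.nonce)))       # True: data intact
    print(verify(audit, prove(backup[:-1], audit.nonce)))  # False: one byte missing
```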

While proofs of storage may keep service providers honest, they break
one of the assumptions underlying the claimed economic advantage:
leveraging idle capacity. Once we demand periodically going over the
bits and running cryptographic calculations, the storage architecture
can not be an ancient PC unplugged from the wall. There is a non-zero
marginal cost to implementing proof-of-storage. In fact there is an
inverse relationship between latency and price. Tape archives sitting
on a shelf have a much lower cost per gigabyte than spinning disks
attached to a server. These tradeoffs are even reflected in the
pricing model charged by Amazon: AWS offers a storage tier called
Glacier which is considerably cheaper than S3 but comes with
significant latency, on the order of hours, for accessing data.
Requiring periodic proof-of-storage undermines precisely the one
model (offline media gathering dust in a vault) that has the best
chance of undercutting large-scale centralized providers.
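For a proof that must touch every stored byte, the marginal cost is easy to estimate. The sketch below assumes, purely for illustration, 1 TB of customer data, a sequential read speed of 200 MB/s and one proof per day; none of these numbers come from Filecoin or StorJ, and the function names are hypothetical. Sampling-based audits read far less data, but they still require the drive to be powered on and reachable.

```python
SECONDS_PER_HOUR = 3600

def full_read_hours(stored_bytes: int, read_bytes_per_sec: float) -> float:
    """Hours of disk activity needed to read every stored byte once."""
    return stored_bytes / read_bytes_per_sec / SECONDS_PER_HOUR

def monthly_busy_hours(stored_bytes: int, read_bytes_per_sec: float,
                       proofs_per_month: int) -> float:
    """Hours per month the drive spends reading data just to generate proofs."""
    return proofs_per_month * full_read_hours(stored_bytes, read_bytes_per_sec)

# Illustrative assumptions only: 1 TB stored, 200 MB/s reads, one proof per day.
per_proof = full_read_hours(10**12, 200e6)
per_month = monthly_busy_hours(10**12, 200e6, proofs_per_month=30)
print(f"{per_proof:.1f} hours per proof, {per_month:.1f} hours per month")
# prints: 1.4 hours per proof, 41.7 hours per month
```

Compare that with zero hours for a drive sitting unplugged in a drawer, which is the scenario the zero-marginal-cost argument depends on.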

Beyond the economics, there is a more subtle problem with
proof-of-storage: knowing your data is there does not mean that you
can get it back when needed. This is the subject of the next blog
post.

[continued]

CP


December 7, 2019, Cem Paya
