[HN Gopher] Cryptographers solve decades-old privacy problem
___________________________________________________________________
Cryptographers solve decades-old privacy problem
Author : Brajeshwar
Score : 226 points
Date : 2023-11-18 15:37 UTC (7 hours ago)
(HTM) web link (nautil.us)
(TXT) w3m dump (nautil.us)
| j2kun wrote:
| It's an exciting time to be working in homomorphic encryption!
| noman-land wrote:
| Homomorphic encryption and zero-knowledge proofs have been
| the most exciting technologies to me for the past several
| years (assuming they work; I'm not qualified enough to know).
|
| Having third parties compute on encrypted, private data, and
| return results without being able to know the inputs or outputs
| is pretty amazing.
| drakenot wrote:
| Can you give an example of a useful computation someone would
| do against encrypted data?
|
| Trying to understand where this would come in handy.
| hedora wrote:
| 100% of the things you use Google for (drive, docs, search,
| personalization, maps, etc).
| fidotron wrote:
| In an extreme case could Google use a mechanism like this
| to deny themselves direct access to the data they collect
| while still bidding in ad auctions based on that
| information?
| janandonly wrote:
| Proving you have enough money without divulging the exact
| amount ?
| daveguy wrote:
| It would come in handy by not requiring any trust in the
| cloud that is running your servers. For protected health
| information this could potentially be a major breakthrough
| in patient privacy.
| NortySpock wrote:
| The general concept is "I want to keep this data private
| and confidential, but I want to be able to rent elastic
| compute for it". Previously, at the end of the day, the
| processor could only do useful work on unencrypted data.
|
| But something like "analyze this list of customer records
| for fraud" or "analyze this proprietary data for trends"
| has previously either required a lot of trust that your
| cloud provider isn't going to siphon your data, or just
| required on-prem hardware that you cannot scale as easily.
|
| If "math on encrypted data" works, we could keep our data
| confidential while still sending off batches of it to be
| number-crunched at a cloud provider.
|
| Or even start talking about distributed / peer-to-peer
| computing, where a swarm of users (or, say, businesses that
| wish to cooperate) can send compute jobs to each other
| without having to trust that the other members won't go
| snooping through the data that was sent.
| alwa wrote:
| And for that matter if I were a provider of some kind of
| computational service for hire, I (and my insurers) might
| feel a great deal of relief at the idea that we're no
| longer having to sit on a big attractive ransomable pile
| of our clients' data.
| Obscurity4340 wrote:
| Has any world government implemented this in a
| similar/relevant fashion that illustrates this in action?
| fragmede wrote:
| No, because this is the very bleeding edge of technology,
| so there are only limited (but very cool) tech demos of
| it. Nothing at scale yet.
|
| https://spiralwiki.com/
| calamari4065 wrote:
| Wait, so they're doing computation on encrypted data and
| inferring some information about the relationships
| between data in the encrypted set?
|
| How is that even possible? That seems to defeat the
| purpose of encryption, or at least show that our
| encryption is severely flawed.
|
| This is the first time I've heard of this and it's kind
| of blowing my mind
| fiddlerwoaroof wrote:
| The idea is that the result is encrypted too, such that
| when the person who holds the key gets it back, they can
| decrypt it and see the result. So you can run an
| inference algorithm on data without ever having to
| decrypt it and see what the data actually is.
|
| It seems to me that this intrinsically is vulnerable to
| side-channel attacks, but it will be interesting to see
| if we can avoid those with constant-time algorithms or
| something.
| l33t7332273 wrote:
| What about this seems vulnerable to side channels?
| fiddlerwoaroof wrote:
| Because, if you have a mechanism to run arbitrary
| computations on encrypted data, I'd be a bit concerned
| about an attacker running carefully crafted computations
| on the data and deducing information about the encrypted
| data based on how the ciphertext changes or the
| time/memory usage of the computation.
|
| This isn't really a particularly well-informed suspicion:
| it's partly based on a sense that FHE is a "have your
| cake and eat it too" sort of technology that's too good
| to be true and partly based on the recent discoveries
| that fairly well-understood things like speculative
| execution in CPUs were vulnerable to serious side-channel
| attacks.
| l33t7332273 wrote:
| A homomorphism is a function f such that f(ab) = f(a)f(b).
| So you can imagine the cloud customer wants to compute
| ab, but they use a homomorphic encryption function, f, to
| ask the cloud provider to compute f(ab) given f(a) and
| f(b).
|
| Then the customer decrypts f(ab). This doesn't imply any
| weakness in encryption.
|
| FHE is a bit stronger than what I've described, but
| that's the idea.
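The multiplicative homomorphism described above can be seen in textbook (unpadded) RSA, where multiplying two ciphertexts yields a valid ciphertext of the product. A toy sketch with tiny, deliberately insecure parameters:

```python
# Toy demonstration of the multiplicative homomorphism in textbook
# (unpadded) RSA: Enc(a) * Enc(b) mod n == Enc(a*b). Tiny primes and
# no padding -- illustrative only, completely insecure.
p, q = 61, 53
n = p * q                   # public modulus, 3233
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
c = (enc(a) * enc(b)) % n   # the server multiplies ciphertexts only
assert dec(c) == a * b      # the key holder decrypts f(ab) = 42
```

As the comment notes, FHE is stronger than this: it supports both addition and multiplication, which together are enough to evaluate arbitrary circuits.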
| trompetenaccoun wrote:
| In addition to the other answers, any situation where you
| want to prove identity but want to preserve privacy at the
| same time. For example you could prove you're an adult
| citizen with ID without revealing your ID number, picture,
| or any private information whatsoever.
| azeemba wrote:
| I had written an intro-level blog post exploring image
| analysis using HME: https://azeemba.com/posts/homomorphic-
| encryption-with-images...
|
| One paper I used showed how X-rays can be analyzed without
| exposing the X-ray data itself.
|
| This can be generalized more to any situation where one
| party owns a proprietary algorithm and another party owns
| sensitive input data but they don't trust each other.
|
| Modern LLMs are actually the perfect example. You want to
| use ChatGPT, but they don't want to give you their model and
| you don't want them to see your input. If HME were more
| efficient, you could use it so that the person executing
| the model never sees your real input.
| koolba wrote:
| > One paper I used showed how X-rays can be analyzed
| without exposing the X-ray data itself.
|
| Is there any hope of getting the performance to the point
| where something like that would be feasible? I'd imagine
| the raw data itself would be so big that the performance
| for anything non trivial would be unworkable.
| wanderingbort wrote:
| If it's just that the parties don't trust each other then
| the cost of HME has to be compared to the current "state
| of the art" which is contracts and enforcement thereof.
|
| In practice, I don't think those costs are that high
| because the rate of incidents is low and the average
| damage is also low.
|
| Yes there are outlier instances of large breaches but
| these seem like high profile aircraft crashes considering
| how many entities have sensitive data.
| brendoelfrendo wrote:
| Data incidents cause more problems than can easily be
| resolved with a contract lawsuit. Perhaps the data was
| siphoned by a 3rd party that hacked your vendor, or a
| malicious insider at your vendor sold it to a competitor.
| Sure, you can recoup some losses by suing your vendor for
| breach of contract, but once the data is leaked, it's
| never secret again.
|
| And then there's the example of businesses that work with
| lots of confidential customer data, like banks or
| doctors. Again, you can sue your vendor for breach of
| contract if they behave irresponsibly with your data, but
| your customers may not care; you're going to suffer a hit
| to your reputation regardless of whether or not the
| breach was your fault.
| wanderingbort wrote:
| You can say it's insufficient but it is what it costs
| them today.
|
| I guess the better comparison is that cost in a financial
| statement plus some expected increase in revenue due to a
| "better" product.
|
| Again, I think you are correct in your analysis of the
| improvements but that contributes little to the revenue
| as explaining the benefit to most customers requires
| framing your existing product as potentially harmful to
| them. Educating them will be hard and it may result in an
| offsetting realization that they were unsafe before and
| as a result were paying too much.
| alwa wrote:
| I feel like trust is a spectrum, and the promise of these
| techniques is that they reduce the need for trust in the
| first place.
|
| We should consider what kinds of computational tasks
| today's responsible parties (or their regulators, or
| their insurers) think of as too risky to casually trust
| to third parties under the status quo. For example with
| my block storage provably unintelligible if you don't
| have the HSM I keep safely in my corporate dungeon, I'm
| comfortable not caring whose racks the encrypted blocks
| sit on. I'd have to vet those vendors a lot harder if
| they could read all my super secret diaries or whatever.
|
| And, for that matter, it's on the service provider side
| too, right? Even the contractual, spit-and-handshake
| pinky-swear-based mode of enforcement comes with
| significant compliance costs for service providers,
| especially ones operating in regulated industries.
| Perhaps it's not too much to hope that effective and
| efficient HME techniques might reduce those service
| providers' compliance costs, and lower the barrier to
| entry for new competitors.
|
| I'm reminded how even non-tech people in my life became
| much more willing to trust their credit card details to
| online retailers once they felt like a little green lock
| icon made it "safe". Of course a LOT changed over that
| same period, but still: the underlying contractual
| boundaries didn't substantially change--in the US the
| customer, then as now, has only ever been responsible for
| a certain amount of fraud/theft loss--but people's risk
| attitudes updated when the security context changed, and
| it opened up vast new efficiencies and lines of business.
| wanderingbort wrote:
| It's not too much to hope that HME reduces those
| compliance costs. However, I believe it is too much to
| assume there will be any material adoption before it can
| demonstrate that reduction.
|
| Reduction of trust is not a value add, it is a cost
| reduction. Maybe that cost is blocking a valuable
| product/service but either that product/service's value
| is less than the current cost of trust OR trust has to be
| far more costly in the context of the new
| product/service.
|
| It's only the latter that I find interesting, which is why
| I tend to be pretty hard on suggestions that this will do
| much for anything that currently exists. At best, it will
| improve profits marginally for those incumbents.
|
| What is something where the price of trust is so
| catastrophically high in modern society AND HME can
| reduce that cost by orders of magnitude? Let's talk about
| that rather than HME.
| noman-land wrote:
| There are a few that I've thought about but the first one
| that comes to mind is proving you're 21+ to an alcohol
| store cashier without handing over your ID with all your
| personal info. You could just give them a ZK proof that
| returns a true/false and they could check it against
| encrypted data (signed by the DMV).
| wanderingbort wrote:
| Why is that preferable over a message attesting "over 21"
| signed by the DMV?
|
| The hard parts here are retrofitting society to use a
| digital ID and how to prove that the human in front of
| you is attached to that digital ID.
|
| The solutions there all seem like dystopias where now
| instead of a bouncer looking at your ID for a few
| seconds, technology is taking pictures of you everywhere
| and can log that with location and time trivially.
| noman-land wrote:
| It doesn't have to be a digital ID, it can just be
| encrypted data encoded on a regular ID on a QR code.
|
| Age depends on timestamp. The encrypted data is stored on
| the ID and signed by the DMV, with a function that can be
| run by the bouncer's scanning machine that plugs in a
| now() timestamp, and receives a boolean in return. The
| DMV doesn't even need to be involved after the issuance
| of the ID and no network access is needed for this
| calculation.
|
| No one's location was tracked and no one's picture was
| taken and now a bouncer who fancies you can't turn up at
| your house after glancing at your ID.
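The offline check described above can be sketched minimally. An HMAC under a shared secret stands in for the DMV's signature (a real scheme would use an asymmetric signature so verifiers hold no secret, plus a ZK proof to hide the birth year from the scanner entirely); the key and field names are hypothetical:

```python
import hmac, hashlib, json

# Minimal sketch of the offline age check described above. An HMAC
# under a shared secret stands in for the DMV's signature; a real
# scheme would use an asymmetric signature, and this sketch still
# reveals the birth year to the scanner, which a ZK version hides.
DMV_KEY = b"demo-only-key"  # hypothetical key material

def issue_id(birth_year):
    """Payload + tag the DMV would print on the card as a QR code."""
    payload = json.dumps({"birth_year": birth_year}).encode()
    tag = hmac.new(DMV_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag

def is_over_21(payload, tag, now_year):
    """Bouncer's scanner: verify the tag, return a single boolean."""
    expected = hmac.new(DMV_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or tampered card
    return now_year - json.loads(payload)["birth_year"] >= 21

card = issue_id(1990)
assert is_over_21(*card, now_year=2023)  # no network, no DMV callback
```

The key property matches the comment: after issuance, the DMV is out of the loop and no network access is needed at the door.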
| captn3m0 wrote:
| Age verification (without leaking other PII) was the
| illustrative scenario for W3C Verifiable Credentials (it
| lets you use a validating authority to sign a specific
| subset of your schema).
|
| There's lots of other ways to solve the problem for
| verification/signing use cases tbh. Homomorphic
| encryption shines best when you are looking at more
| complex calculations than just a Boolean result - such as
| tax calculations.
|
| Can you submit your financial information and have your
| taxes calculated without revealing the amounts involved?
| Can you apply filters to an image without having the
| image readable by the server? It essentially allows us to
| "trust a remote server" in scenarios where one wouldn't
| usually.
| wanderingbort wrote:
| How do you know that the bouncer's scanning machine didn't
| log the interaction?
|
| The whole value prop is built on not trusting that
| bouncer and by extension their hardware.
|
| Everything would have to be encrypted leading to the
| bouncer also needing to establish that this opaque
| identifier actually belongs to you. This is where some
| picture or biometric comes into play and since the
| bouncer cannot evaluate it with their own wetware you are
| surrendering more data to a device you cannot trust.
|
| They also cannot trust your device. So, I don't see a
| scenario where you can prove ownership of the ID to a
| person without their device convincing them of it.
| adastra22 wrote:
| It's not. And since it is also 10000x slower than merely
| checking a signed number, nobody is interested in doing
| this.
| dimastopel wrote:
| Imagine a search engine that gives you results without
| knowing your search query.
| westurner wrote:
| Grading students' notebooks on their own computers without
| giving the answers away.
|
| https://news.ycombinator.com/item?id=37981190 :
|
| > _How can they be sure what's using their CPU?_
|
| Firefox, Chrome: <Shift>+<Escape> to open about:processes
|
| Chromebook: <Search>+<Escape> to open Task Manager
| OJFord wrote:
| All sorts - anything where you today give your data to some
| SaaS (and they encrypt it at rest, sure), and they then
| provide you some insight based on it or allow you to do
| something with it (by decrypting it, crunching numbers,
| spitting out the answer, and then encrypting the source
| data again) could potentially be done without the
| decryption (or the keys to make it possible).
|
| Concretely (and simplistically) - I give you two numbers, 2
| & 2, but they're encrypted so you don't know what they are,
| and you add them together for me, but that's also encrypted
| so you don't even know that the sum of the inputs I gave
| was 4. It's 'computation on encrypted data', basically.
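The "add two encrypted 2s" example maps directly onto the additively homomorphic Paillier cryptosystem, where multiplying two ciphertexts yields an encryption of the sum of the plaintexts. A toy sketch with insecure tiny primes:

```python
import math, random

# Toy Paillier cryptosystem: multiplying ciphertexts ADDS the
# underlying plaintexts. Tiny primes -- illustrative only, insecure.
p, q = 61, 53
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)          # Python 3.9+

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

c = (enc(2) * enc(2)) % n2  # server multiplies the two ciphertexts...
assert dec(c) == 4          # ...the key holder decrypts 2 + 2 = 4
```

Because `enc` is randomized, the server can't even tell that the two input ciphertexts encrypt the same number.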
| SilasX wrote:
| It lets you have someone add up your numbers, and give you
| the sum, without knowing the input numbers or their sum.
| Basically, any time you want someone else to host the data,
| while you can also do queries/computations on it.
|
| That now generalizes to any computation (not just
| addition), because there is a logically complete set of
| primitive operations for which fully homomorphic encryption
| (FHE) can be done.
|
| Caveats:
|
| 1) You can't actually do e.g. while loops with run time
| unknown. All such FHE computations are done by generating a
| fixed size boolean circuit for a given input size. It's
| "Turing-complete" in the sense that you can size up the
| circuit to any input size, but it wouldn't directly
| implement an unbounded while loop -- you have to generate a
| different "program" (circuit) for each input size.
|
| 2) All such computations must have a fixed output size --
| else it leaks information about the data inside. So
| allowable computations would be like, "give me the first 30
| bytes of the result of querying for names in the set that
| begin with A". If there are no results, the output still
| has to be 30 bytes.
|
| 3) For similar reasons, any FHE computation must look at
| all bits of the input (otherwise, it leaks info about
| what's in them). "See if this value is in the set" can't
| just jump to the relevant section.
|
| 4) The untrusted third party knows what computation you're
| doing (at least at a low level), just not the data it's
| being performed on or the result.
|
| 5) As you might have expected from 3), there's a massive
| blowup in resources to do it. There can be improvements,
| but some blowups are inherent to FHE.
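Caveat 3 can be sketched in plain Python: a membership test that touches every element and every bit, doing identical work regardless of the data. Here ordinary ints stand in for ciphertexts; under FHE each arithmetic operation below would be a homomorphic gate:

```python
# Sketch of caveat 3: a membership test that inspects every element
# and every bit, doing identical work no matter what the data is.
# Plain ints stand in for ciphertexts; under FHE each arithmetic
# operation below would be a homomorphic gate.
def oblivious_contains(haystack, needle_bits, bitlen=8):
    found = 0
    for item in haystack:                   # visit EVERY element
        eq = 1
        for i in range(bitlen):             # compare EVERY bit
            a = (item >> i) & 1
            eq *= 1 - (a ^ needle_bits[i])  # XNOR, arithmetically
        found = found + eq - found * eq     # OR, without branching
    return found                            # always a single 0/1

needle = [(42 >> i) & 1 for i in range(8)]
assert oblivious_contains([7, 42, 99], needle) == 1
assert oblivious_contains([7, 99], needle) == 0
```

Note there are no `if`s or early exits on the data: the control flow, work done, and output size are fixed by the input size alone, which is exactly what caveats 1-3 require.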
| podnami wrote:
| There are examples in cryptocurrency, where ZK proofs are
| all the rage. One use case is validating ownership of some
| amount of currency without knowing the actual amount, or
| verifying that a smart contract was executed correctly
| without revealing the specifics of the contract's internal
| state. Some blockchains use this in production, such as
| Polygon's zkEVM.
| littlestymaar wrote:
| Functional encryption is even cooler than FHE (but even more
| prohibitively expensive)
| noman-land wrote:
| Got any cool links to read about it?
| j2kun wrote:
| It's also not complete: only a handful of functions are
| known to have functional encryption schemes.
| collsni wrote:
| Ok
| ForkMeOnTinder wrote:
| It's a published paper about homomorphic encryption, so...
| unlikely.
| InCityDreams wrote:
| Yes, or no?
| ChrisArchitect wrote:
| The paper:
|
| _Doubly Efficient Private Information Retrieval and Fully
| Homomorphic RAM Computation from Ring LWE_
|
| https://eprint.iacr.org/2022/1703
| kelsey9876543 wrote:
| Word salad with no cryptographic proof, why is this even on HN
| l33t7332273 wrote:
| Well, for starters it won the best paper award at the ACM
| Symposium on Theory of Computing (STOC).
|
| Can you elaborate on which particular part you consider to be
| "word salad" as opposed to just unfamiliar?
| ChrisArchitect wrote:
| Original article submitted a few times weeks ago:
|
| https://news.ycombinator.com/item?id=38164732
| desdenova wrote:
| Good to see some people are actually solving this problem, unlike
| all those startups using it as a marketing buzzword.
| hanniabu wrote:
| I assume you're referring to the blockchain industry, but
| they've advanced cryptography a lot, specifically with
| zkproofs.
| lucb1e wrote:
| I know of the concept of zero-knowledge proofs, but didn't
| know that the blockchain industry advanced cryptography a lot
| in that area. What are the practical applications of those
| new things? Or which new things are there to begin with? The
| Wikipedia article on zero-knowledge proof doesn't seem to say
| dumbfounder wrote:
| With zk you can prove you own something or sign
| transactions without people tracking your entire history.
| It is also used to help scale chains by rolling up multiple
| transactions into one proof.
| hugodutka wrote:
| One of the applications is ZK-Rollups [1], which allow
| developers to move heavy computation off a blockchain. The
| blockchain receives the results and only validates proofs
| that they are valid. This is especially useful on Ethereum
| because its computational throughput is pretty low.
|
| There's also ZCash [2], which is a cryptocurrency that lets
| you make untraceable transactions. This is in stark
| contrast to Bitcoin or Ethereum, where transaction
| information is available publicly to everyone. They have a
| series of blog posts [3] on the math that actually makes it
| work under the hood.
|
| [1] https://ethereum.org/en/developers/docs/scaling/zk-
| rollups/
|
| [2] https://z.cash
|
| [3] https://electriccoin.co/blog/snark-explain/
| captn3m0 wrote:
| The first time I came across PIR was via the 2014 blog post
| from Signal on how Private Contact Discovery was a hard
| problem and how it required PIR to be solved.
| https://signal.org/blog/contact-discovery/
|
| Maybe this will help Signal get to a viable solution in a few
| years.
| lucb1e wrote:
| Note that Signal decided not to use that:
|
| > Unfortunately, for a reasonably large user base, this
| strategy doesn't work because the bloom filters themselves
| are too large to transmit to mobile clients. For 10 million
| TextSecure users, a bloom filter like this would be ~40MB,
| requested from the server 116 times a second if every client
| refreshes it once a day.
|
| They decided to run computations inside a 'secure' hardware
| environment instead (SGX specifically) so that they can't get
| access to the computation themselves but it also doesn't need
| to be run client side. I assume you meant the former thing,
| but the approach they actually use is fundamentally different
| from homomorphic encryption / PIR.
| nly wrote:
| The problem with all these privacy preserving cryptographic
| breakthroughs is they are never deployed in practice.
|
| Just look at cryptocurrency. We've known how to create a privacy
| preserving, truly distributed, cryptographic replacement to cash
| for decades, and what we end up with instead is Bitcoin and the
| like, which is only pseudonymous and ends up being
| centralised anyway to interact with the fiat world.
|
| There's no demand for this tech in current society.
|
| _Sigh_
| persnickety wrote:
| > a privacy preserving, truly distributed, cryptographic
| replacement to cash
|
| I didn't realize this was known. Could you explain or provide
| an example?
| Aerbil313 wrote:
| Monero. Disagree with the GP though, these things certainly
| weren't around or known decades ago.
| ric2b wrote:
| Monero is the common example, since it solves the pseudo-
| privacy issues that Bitcoin has while otherwise being very
| similar.
| raincom wrote:
| David Chaum [1], a famous Cryptographer, founded International
| association of Cryptologic Research (IACR). He published so
| many articles on digital-cash, anonymous cash, etc. He had
| patents on them; he even founded a company on that concept.
| However, that company failed.
|
| [1] https://en.wikipedia.org/wiki/David_Chaum
| j2kun wrote:
| South Korea used it for their COVID contact tracing system.
| thewanderer1983 wrote:
| That's not true. Sometimes these technologies get pulled up
| to the high side, or are only developed there, so the public
| doesn't hear about them. You should read the IETF paper on
| the crypto wars. Crypto and ZKPs are some of the known
| attempts to keep these technologies out of the public's
| hands.
| krater23 wrote:
| Turns out: paper reveals how they removed Google from the
| planet...
| llwj wrote:
| How efficient is it now? The last time I checked, FHE required
| minutes of computation and gigabytes of memory to store tiny
| amounts of data, and since it does not provide IND-CCA
| security, I could not find any use cases.
| sangel wrote:
| Very inefficient. Like wildly so. Specifically if you have a
| very small database and you preprocess it with their
| techniques, the resulting database is petabytes in size. But
| the results are very beautiful.
|
| There are no obvious ways to improve on this work, so it is not
| a matter of engineering. We really do need another breakthrough
| result to get us closer to practicality.
| rhindi wrote:
| FHE in general is efficient enough for many applications now.
| You can see some benchmarks here: https://docs.zama.ai/tfhe-
| rs/getting-started/benchmarks
| jquery wrote:
| 2 seconds to perform a single division on a 16-bit int?
| Am I reading that chart correctly?
| catilinam wrote:
| Isn't FHE by definition not IND-CCA? Weak objection imo
| jengels_ wrote:
| Pretty efficient! E.g. a recent paper describes a system
| that does fully private search over Common Crawl (360
| million web pages) with an end-to-end latency of 2.7
| seconds: https://dl.acm.org/doi/10.1145/3600006.3613134
| vlovich123 wrote:
| Yeah, but that's 1 request using 100% capacity of a 45
| machine cluster. Relatively efficient but cost prohibitive.
| avmich wrote:
| How does it compare to the search done without any
| encryption?
| seeknotfind wrote:
| Many many orders of magnitude. It really depends on the
| search algorithm. If you're looking up a single term,
| there are many standard ways to index this, but for
| instance using a trie or hash table, the lookup cost
| would be a number of operations proportional to the
| search term length (e.g. for "foo", a term of length 3,
| times some small number of operations for each letter).
| Plus then if you want to combine the results and sort by
| frequency, to estimate the cost of this, add up the
| number of pages from each term, p_1 + p_2 + ... = P. Then
| the number of operations might be P*log(P). This
| dominates the cost. So if your search terms appear in
| 1,000,000 of these pages, you're talking about on the
| order of ~6,000,000 small operations. An implementation
| would be able to execute this many operations on a single
| CPU in 10s or 100s of milliseconds maybe, for an
| elementary unoptimized algorithm.
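The back-of-envelope arithmetic above, spelled out (using log base 10, which is what yields the ~6,000,000 figure quoted):

```python
import math

# Rough cost model from the comment above: merging and ranking the
# P matching pages dominates, at roughly P * log(P) elementary ops.
P = 1_000_000                      # pages matching the search terms
ops = int(P * math.log10(P))       # log base 10 -> the ~6M estimate
# At ~1e8-1e9 simple ops/sec on one core, that's 10s-100s of ms.
```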
| godelski wrote:
| Not an answer, but a question that I hope someone can answer.
| Is the lack of speed because of a lack of optimization or
| compute? Is this something that could be fixed by an
| accelerator?
|
| It's often hard to compare things in an accurate way. I mean
| many current methods might already have hardware specific
| acceleration and researchers aren't always good at optimization
| (why should they be?). So is the problem of FHE a bigger lack
| of just not enough people (specialists) putting in time to
| optimize it or is it so computationally intensive that we need
| hardware to get better before it becomes practical? I mean,
| look at llama.cpp and how much faster it is, or Stable
| Diffusion? Sure, both are kinda slow (diffusion's no GAN and
| SD's not straight diffusion) but very usable for many
| applications.
| asasidh wrote:
| Would a solution like mixing address the issue with acceptable
| tradeoffs? https://arxiv.org/abs/1905.12264
| I_am_tiberius wrote:
| As most likely lots of cryptographers read this, I have a
| question. What's an efficient way to store encrypted data in a
| database (only specific table columns) and decrypt them on the
| fly while querying the data? Using postgres with
| pgp_sym_encrypt(data, key) seems very slow.
| fyokdrigd wrote:
| so, not private, nor efficient, nor a solution to homomorphic
| data.
|
| basically a polynomial-factor hash of the index keys...
| basically you will need the entire db plus the larger hashed
| keys. doesn't help at all with implementing any of the
| outlandish claims.
|
| guess the merit is in proving the polynomial factoring they
| build on is secure. but that is not a game changer, as there
| are better alternatives and nobody has solved the claimed
| problems with them.
| wolverine876 wrote:
| I understand it's inefficient, but could it be used by a well-
| resourced organization where confidentiality is extremely high-
| value and the database is small, such as intelligence, military
| plans, Apple's product plans, etc.?
| narinxas wrote:
| there are much cheaper ways, chief amongst them is to use paper
| and pencils
| godelski wrote:
| How do people with pen and paper operate on 10k pieces of
| data? Which is honestly a rather small number. There's a
| reason we use computers and why statistics accelerated after
| its invention.
| jdefr89 wrote:
| This is super misleading and sort of ambiguous. Are they claiming
| they eliminated any side-channel information that can be used? I
| am confused here. The problem is anyone can collect various side-
| channel information and use it as metadata... Fully Homomorphic
| Encryption only helps after you've established secure links, and
| to do that you will inevitably have to use a less secure link or
| system, which can in and of itself be used to make inferences
| about queries... The real issue is we don't know if the "secure"
| systems we use that we don't control actually give us the privacy
| they claim to...
| parentheses wrote:
| ++ it's unclear what data this breakthrough is helping to keep
| private and how.
| l33t7332273 wrote:
| You can use differential privacy to protect from most (timing,
| memory, etc) side channels in distributed analytics.
___________________________________________________________________
(page generated 2023-11-18 23:00 UTC)