[HN Gopher] Show HN: Read Wikipedia privately using homomorphic ...
___________________________________________________________________
Show HN: Read Wikipedia privately using homomorphic encryption
Author : blintz
Score : 193 points
Date : 2022-06-08 15:29 UTC (7 hours ago)
(HTM) web link (spiralwiki.com)
(TXT) w3m dump (spiralwiki.com)
| raxxorraxor wrote:
| Does homophobic in this case mean that I can edit the content of
| an article and the diff is directly applied to the crypt?
| teraflop wrote:
| This has nothing to do with editing Wikipedia. The problem this
| demo is solving is "private information retrieval" -- that is,
| you send a query for a particular article, and the server sends
| back a response containing that data, but the server does not
| learn which article you asked for. Homomorphic encryption is
| just one of the building blocks used to achieve this.
|
| A trivial solution to this problem would be for the client to
| just download a copy of the entire Wikipedia database and pick
| out the particular article it's interested in. With a proper
| implementation of PIR, the server still needs to scan through
| the entire encrypted dataset (this is unavoidable, otherwise
| its I/O patterns would leak information) but the amount of
| information that needs to be sent to the client is much
| smaller.
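|
| A minimal sketch of that full scan in Rust (plain data, no
| cryptography - in real PIR the selection vector would be
| encrypted): the server touches every byte of every record
| regardless of the query, so its I/O pattern is identical for
| all queries.
|
|   fn oblivious_select(db: &[Vec<u8>], sel: &[u8]) -> Vec<u8> {
|       // All records are padded to equal length.
|       let mut out = vec![0u8; db[0].len()];
|       for (record, &bit) in db.iter().zip(sel.iter()) {
|           // bit is 0 or 1; every byte of every record is read,
|           // so memory traffic reveals nothing about the query.
|           for (o, &byte) in out.iter_mut().zip(record.iter()) {
|               *o = o.wrapping_add(bit.wrapping_mul(byte));
|           }
|       }
|       out
|   }
|
|   fn main() {
|       let db = vec![b"article A...".to_vec(), b"article B...".to_vec()];
|       assert_eq!(oblivious_select(&db, &[0, 1]), db[1]);
|   }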
| calvinmorrison wrote:
| Unfortunately, it's not that trivial. I tried a few offline
| wikis with varying results, but haven't had a ton of success
| using them day to day.
| teraflop wrote:
| Oh, certainly, I didn't mean to imply that there would be
| no implementation difficulties -- just that if you're
| willing to download the entire dataset, the _privacy_
| aspect is trivially solved, without requiring any fancy
| cryptography.
| ReaLNero wrote:
| Quite the curious typo you have there.
| jesushax wrote:
| I wouldn't be surprised if they originally wrote it correctly
| and a phone's autocorrect "fixed" it for them
| iFire wrote:
| I wonder if this can be done on sqlite?
|
| http://static.wiki/
|
| See the previous news article.
| https://news.ycombinator.com/item?id=28012829
| lucb1e wrote:
| You mean distributing 43GB of data to everyone rather than
| uploading a few megabytes of data one time and getting 250KB
| answer chunks? I'm not sure it compares.
| rakoo wrote:
| If you're going the sqlite route, the much easier path is to
| distribute the whole database and do everything on the client
| 0cVlTeIATBs wrote:
| Could this be used for DNS?
| freemint wrote:
| Yes, but it would be quite expensive infrastructure-wise. Also,
| I think the number of websites grows faster than processor I/O,
| meaning one would need more and more processors over time.
| blintz wrote:
| This is a great idea, and we think it would be relatively
| practical assuming some aggressive caching. However, I couldn't
| think of a threat model where this is useful, since presumably
| your ISP can in the end always see which sites you visit by
| simply reversing the IPs you connect to.
|
| Do you think that people would want private DNS? I suppose it
| would still be an improvement over what we have today, but
| I'm not sure that it will make it meaningfully harder for ISPs
| to collect and sell data to advertisers.
| gojomo wrote:
| There are ways to hide IP endpoints from relaying nodes, as
| well.
| 0cVlTeIATBs wrote:
| On threat models, a malicious DNS server might also be one
| compromised by a party demanding wiretap access.
|
| Regardless, a person today has a choice of which DNS server
| to use but they all could track the requests made. Tracking
| site visits via IP is a different link in that chain.
|
| Would people pay? I don't know, but I could see it being a
| feature used to differentiate a VPN service from its competitors.
| blintz wrote:
| That's a good point, I could see this being a
| differentiating feature for a VPN provider. The only way to
| know if people would pay is for someone to offer it, I
| guess...
| captn3m0 wrote:
| OCSP would be a good target in a similar space:
| https://en.m.wikipedia.org/wiki/Online_Certificate_Status_Pr...
| blintz wrote:
| I do think OCSP and certificate transparency stuff is a
| clear application for PIR that makes a lot of sense. There,
| you currently basically have to volunteer your browsing
| patterns to a 3rd party (beyond your ISP).
| captn3m0 wrote:
| For SCT audits on the browser, see
| https://www.lightest.eu/static/papers/10-Revisiting%20User%2...
|
| Google's CT design docs also state PIR Audits as the long
| term goal, so this would be a good area to focus on.
| https://docs.google.com/document/d/1G1Jy8LJgSqJ-B673GnTYIG4b...
| barbazoo wrote:
| Can anyone recommend an explanation of this concept geared
| towards people with only a superficial knowledge of encryption?
|
| This seems to be some kind of search applied on an encrypted
| dataset, is that right?
| SilasX wrote:
| For homomorphic encryption in general I wrote this blog post to
| make the idea accessible:
|
| http://blog.tyrannyofthemouse.com/2013/05/i-added-your-numbe...
| Engineering-MD wrote:
| I find it deeply ironic that the page is http not https!
| extr wrote:
| It's like, I send the server an encrypted math problem. The
| server has no idea what the math problem is, but homomorphic
| encryption allows it to compute an (encrypted) result and send
| that back to me. I get the result and decrypt it for the
| correct answer. It's novel because you don't have to trust the
| server with your math problems.
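|
| A minimal sketch of that exchange (an illustration, not a real
| scheme): the simplest additively homomorphic encryption is just
| adding a secret pad mod 2^64. The server sums numbers it cannot
| read; only the client can strip the pads.
|
|   fn main() {
|       let (m1, m2) = (20u64, 22u64); // the client's secret numbers
|       let (k1, k2) = (0x9e3779b9u64, 0x85ebca6bu64); // pads only the client knows
|       // Client -> server: ciphertexts only.
|       let (c1, c2) = (m1.wrapping_add(k1), m2.wrapping_add(k2));
|       // Server: computes on the encrypted problem, learns nothing.
|       let c_sum = c1.wrapping_add(c2);
|       // Client: decrypts the result with its keys.
|       assert_eq!(c_sum.wrapping_sub(k1).wrapping_sub(k2), 42);
|   }
|
| Real homomorphic schemes (like the lattice-based one used here)
| also support multiplying by encrypted values, which is what
| makes them powerful.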
| gojomo wrote:
| Interesting! But, it'd be helpful to clarify further the strength
| of the following claim:
|
| _> This demo allows private access to 6 GB (~30%) of English
| Wikipedia. In theory, even if the server is malicious, it will be
| unable to learn which articles you request. All article title
| searches are performed locally, and no images are available._
|
| In this demo, the number of article-titles is relatively small -
| a few million - & enumerable.
|
| If the server is truly malicious, and it issues _itself_ requests
| for every known title, does it remain true that this "Private
| Information Retrieval" (PIR) scheme still gives it _no_ hints
| that subsequent requests from others for individual articles
| retrieve particular data?
|
| (Presumably: _every_ request touches every byte of the same full
| 6GB of data, and involves every such byte in constant-run-time
| calculations that vary per request, and thus have the effect of
| returning only what each request wanted - but not at all in any
| way correlatable with other requests for the exact same article,
| from the same or different clients?)
| blintz wrote:
| Indeed! The encryptions of the client queries are fully
| semantically secure - under relatively solid lattice-based
| cryptographic assumptions, a server would need to do more than
| 2^128 work to recover the plaintext of someone's query. One
| query for one item in the database is indistinguishable
| (without the client's key) from another query for the same item
| later; in other words, it's similar to something like the
| guarantee of CBC or GCM modes, where as long as you use it
| correctly, it is secure even if the attacker can see many
| encryptions of its choosing.
| danuker wrote:
| Would this be vulnerable to a side-channel attack as follows?
|
| 1. Record what item was retrieved from disk for a query
|
| 2. Run a dictionary through the query system, and see which
| item matches the record
| [deleted]
| [deleted]
| blintz wrote:
| The server processing has to scan the entire database to
| answer a query- so the entire database stays in memory, and
| access patterns tell you nothing about the query.
| gojomo wrote:
| Notably to some who may misinterpret the "stays in
| memory" aspect: even the copy in memory is fully scanned
| by every query. So it's not just _disk_ access patterns
| that don't give anything away, but _RAM_ access
| patterns.
| benlivengood wrote:
| How indistinguishable is a Not Found result by the server? It
| seems like user behavior would be to re-request the article a
| second time, so the client should probably protect the user
| against this kind of server (which could bisect on article
| popularity to find the requested article in ~20 tries) by
| throwing up a warning about an article in the index not being
| retrievable.
| blintz wrote:
| Indeed, client behavior when an article fails to be
| retrieved is very important. A malicious server could guess
| what article is being retrieved, remove it from the
| database, and see if the client 'reacts' in some observable
| way.
|
| The mitigations for this are somewhat heuristic; the client
| should perhaps 'pace'/'debounce' subsequent requests when a
| retrieval fails. For example, enforce that the re-retrieval
| will happen at a uniformly random time between 1-5 seconds
| later.
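|
| A minimal sketch of such pacing (hypothetical client code, not
| part of spiral-rs; a real client should use a proper RNG):
|
|   use std::thread::sleep;
|   use std::time::{Duration, SystemTime, UNIX_EPOCH};
|
|   // Uniform-ish delay in [1s, 5s), derived from the clock for brevity.
|   fn retry_delay() -> Duration {
|       let ns = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().subsec_nanos();
|       Duration::from_millis(1000 + (ns as u64 % 4000))
|   }
|
|   fn fetch_with_pacing(fetch: impl Fn() -> Option<Vec<u8>>) -> Option<Vec<u8>> {
|       if let Some(article) = fetch() {
|           return Some(article);
|       }
|       // Don't react instantly: the reaction timing is the observable signal.
|       sleep(retry_delay());
|       fetch()
|   }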
|
| It's good to note that this kind of attack is somewhat
| unavoidable, since denial-of-service is (generally) always
| possible. An interesting mitigation might be for the server
| to produce a ZKP that the full dot product was computed
| correctly, and have the client always check the ZKP before
| displaying the article. This is mostly a theoretical fix
| for now I believe, since ZKPs over homomorphic computation
| are in the realm of theory more than practice.
| benlivengood wrote:
| If the protocol is that the server returns a specific
| "index N deleted" result for deleted N then the client
| can at least trust a valid response from the server as
| opposed to a DDoS or unmasking attempt. Any server that
| returns something other than a valid document or "N
| deleted" should no longer be trusted or communicated
| with, but retries for communication failures or timeouts
| should still be safe.
|
| Edit: this assumes that the client gets a trusted index
| from a set of trusted servers who are at least as up to
| date as the latest index that is made available, which
| timestamped signatures or similar should suffice for. Or
| the client can supply the timestamped index signature and
| the server can reply with a different message if it's too
| old.
| throwaway2016a wrote:
| > without the client's key
|
| Thank you, these 4 words really helped with my understanding
| so I'm calling it out in case it helps others. So I was
| thinking, what prevents you from replaying the query and
| getting the same page back? But it seems the answer is: that
| would only produce a gibberish response because you don't
| have the key.
| mihaitodor wrote:
| Last year, there was a detailed presentation with several
| speakers on state of the art Secure Multi-Party Computation for
| practical applications in healthcare, fighting financial crime
| and machine learning from CWI (Centrum Wiskunde & Informatica)
| Netherlands. The recording is here (2.5h):
| https://www.youtube.com/watch?v=gE7-S1sEf6Q
| f38zf5vdt wrote:
| Extremely cool. Now we can serve content without any ability to
| observe what people are being served exactly. I was hoping that
| someday soon such technology could be used to serve search
| results and give us a _truly_ private search engine experience.
| blintz wrote:
| Hi, creator here.
|
| This is a demo of our recent work presented at Oakland (IEEE
| S&P): https://eprint.iacr.org/2022/368. The server and client
| code are written in Rust and available here:
| https://github.com/menonsamir/spiral-rs. The general aim of our
| work is to show that homomorphic encryption is practical today
| for real-world applications. The server we use to serve this
| costs $35/month!
|
| A quick overview: the client uses homomorphic encryption to
| encrypt the article number that they would like to retrieve. The
| server processes the query and produces an encrypted result
| containing the desired article, and sends this back to the
| client, who can decrypt and obtain the article. A malicious
| server is unable to determine which article the client retrieved.
| All search and autocomplete is done locally. The technical
| details are in the paper, but the high level summary is that the
| client creates a large one-hot vector of encrypted bits (0's
| except for the index of the desired article, where they place a
| 1) and then the server computes something like a 'homomorphic dot
| product' between the query and the plaintext articles.
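|
| For intuition, here is a from-scratch toy of that flow in Rust
| (not the spiral-rs code, and with parameters chosen for clarity
| rather than security): the client Regev-encrypts a one-hot
| vector, and the server scales and sums those ciphertexts
| against its plaintext records.
|
|   const N: usize = 8;       // toy LWE dimension (real schemes use ~1024+)
|   const Q: u64 = 1 << 20;   // ciphertext modulus
|   const P: u64 = 256;       // plaintext modulus (one byte per record)
|   const DELTA: u64 = Q / P; // message scaling factor
|
|   struct Rng(u64);
|   impl Rng {
|       fn next(&mut self) -> u64 { // xorshift64: a demo PRNG, not crypto
|           self.0 ^= self.0 << 13; self.0 ^= self.0 >> 7; self.0 ^= self.0 << 17;
|           self.0
|       }
|   }
|
|   struct Ct { a: [u64; N], b: u64 } // b = <a,s> + e + DELTA*m (mod Q)
|
|   fn encrypt(s: &[u64; N], m: u64, rng: &mut Rng) -> Ct {
|       let mut a = [0u64; N];
|       let mut b = rng.next() % 2 + DELTA * m; // fresh noise e in {0,1}
|       for i in 0..N { a[i] = rng.next() % Q; b = (b + a[i] * s[i]) % Q; }
|       Ct { a, b }
|   }
|
|   fn decrypt(s: &[u64; N], ct: &Ct) -> u64 {
|       let mut inner = 0u64;
|       for i in 0..N { inner = (inner + ct.a[i] * s[i]) % Q; }
|       let noisy = (ct.b + Q - inner) % Q; // = DELTA*m + accumulated noise
|       ((noisy + DELTA / 2) / DELTA) % P   // rounding removes the noise
|   }
|
|   // Server: sum_i record[i] * Enc(sel[i]); scaling a ciphertext by a
|   // plaintext byte scales the hidden message by the same amount.
|   fn pir_answer(query: &[Ct], records: &[u64]) -> Ct {
|       let mut acc = Ct { a: [0u64; N], b: 0 };
|       for (ct, &rec) in query.iter().zip(records.iter()) {
|           for i in 0..N { acc.a[i] = (acc.a[i] + rec * ct.a[i]) % Q; }
|           acc.b = (acc.b + rec * ct.b) % Q;
|       }
|       acc
|   }
|
|   fn main() {
|       let mut rng = Rng(7);
|       let mut s = [0u64; N];
|       for v in s.iter_mut() { *v = rng.next() % Q; }
|       let records = [11u64, 22, 33, 44]; // the server's plaintext "articles"
|       let want = 2;                      // the client's secret index
|       let query: Vec<Ct> = (0..records.len())
|           .map(|i| encrypt(&s, (i == want) as u64, &mut rng))
|           .collect();
|       let answer = pir_answer(&query, &records); // server never learns `want`
|       assert_eq!(decrypt(&s, &answer), 33);
|   }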
|
| I'd like to caveat that this is an in-browser demo to show it is
| practical to use homomorphic encryption at this scale. As a real
| product, you'd probably want to distribute a signed client
| executable (or Electron app) since otherwise, a malicious server
| could simply deliver bad client JS on the fly.
|
| Happy to answer any questions!
| Labo333 wrote:
| I understand that you do some kind of dot product (with two
| steps, Regev and GSW). However, it looks to me that those steps
| involve fixed dimension vectors.
|
| - How do you handle variable length data? Do you need to pad
| it?
|
| - What is the memory overhead of the storage of encrypted data?
|
| I think that at least for video data, the streaming scheme
| "leaks" the size of the encrypted data with the number of
| streaming packets.
| blintz wrote:
| Yeah, every record needs to be the same size. For the demo,
| we batch the articles into 100KB chunks. We do a good job
| packing, where we put many small articles into a 100KB chunk,
| and we split large articles into many 100KB chunks. This
| packing is pretty efficient, roughly 90% of space is not
| wasted.
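|
| A minimal sketch of that packing (hypothetical; the real layout
| in spiral-rs is more involved): greedily fill fixed-size chunks
| with small articles, splitting oversized ones.
|
|   const CHUNK: usize = 100 * 1024;
|
|   fn pack(articles: &[Vec<u8>]) -> Vec<Vec<u8>> {
|       let mut chunks: Vec<Vec<u8>> = vec![Vec::new()];
|       for art in articles {
|           for piece in art.chunks(CHUNK) { // oversized article => split
|               if chunks.last().unwrap().len() + piece.len() > CHUNK {
|                   chunks.push(Vec::new()); // start a fresh chunk
|               }
|               chunks.last_mut().unwrap().extend_from_slice(piece);
|           }
|       }
|       // Every record must end up exactly the same size.
|       for c in &mut chunks { c.resize(CHUNK, 0); }
|       chunks
|   }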
|
| The memory overhead is significant but not prohibitive... we
| have to keep something like a 5-8x larger encoded database in
| memory, but this overhead is from encoding the plaintext in a
| special format to allow a fast dot product, not from
| inefficiency in the packing.
| lucb1e wrote:
| Is it not possible to determine which article(s) the user
| downloaded based on the memory locations read? Of course,
| among multiple small articles within the same 100KB chunk you
| couldn't tell which one was wanted, but for any medium to
| large article, you'd be able to make a good guess (if there
| are a handful of articles there) or an exact match (if there
| is <=1 article in that chunk), no?
|
| Or does the server go through a large chunk of its memory
| (say, at least a quarter of all of Wikipedia) and perform
| some oblivious computation on _all of that data_ (applying
| the result modulo this 100KB return buffer)? That sounds
| very resource-intensive, at least for something large like
| Wikipedia (a doctor's office with some information pages
| of a few KB each could more easily do such a thing).
|
| In the latter case, is each request unique (does it involve
| some sort of IV that the client can xor out of the data
| again) or could an index be built similar to a list of
| hashed PIN codes mapped back to plain text numbers?
|
| Edit: I had already read some comments but just two
| comments further would have been my answer... :)
| https://news.ycombinator.com/item?id=31669630
|
| > > With a proper implementation of PIR, the server still
| needs to scan through the entire encrypted dataset (this is
| unavoidable, otherwise its I/O patterns would leak
| information)
|
| Now let's see if I can also find the answer regarding the
| ability to build a lookup table...
|
| Edit #2: found that also!
| https://news.ycombinator.com/item?id=31669924
|
| > One query for one item in the database is
| indistinguishable (without the client's key) from another
| query for the same item later; in other words, it's similar
| to something like the guarantee of CBC or GCM modes, where
| as long as you use it correctly, it is secure even if the
| attacker can see many encryptions of its choosing.
|
| That is some cool stuff indeed. I'm going to have to up my
| game when building or reviewing privacy-aware applications.
| Sure, a file sharing service is not going to practically
| allow this, but I'm sure that with this knowledge, I will
| come across places where it makes sense from both a
| usefulness (e.g. medical info) and practicality (data set
| size) perspective.
| MauranKilom wrote:
| > Sure, a file sharing service is not going to
| practically allow this
|
| Well, as the author points out here [0], it doesn't
| actually translate to exorbitant cost increases when
| almost all the cost is in bandwidth rather than compute.
| A file sharing service rather seems like an ideal example
| (assuming you can derive additional revenue from the
| privacy promises).
|
| [0]: https://news.ycombinator.com/item?id=31673122
| j2kun wrote:
| Do you have a blog or Twitter? I'd like to keep up with any
| other cool projects you're working on!
| blintz wrote:
| Not at the moment, but will probably make a blog post or
| something to explain how it all works at a slightly higher
| level than the paper.
|
| When it's done it'll be at https://samirmenon.com/.
| jl6 wrote:
| In another comment you've said:
|
| > With a proper implementation of PIR, the server still needs
| to scan through the entire encrypted dataset (this is
| unavoidable, otherwise its I/O patterns would leak information)
|
| Is this technique therefore practical only when the server side
| dataset is relatively small (or full scans for every query are
| tolerable)?
|
| (edit: sorry, misattributed the quote)
| blintz wrote:
| Wasn't me, but it was accurate!
|
| Indeed, except in some (exciting!) new theoretical
| constructions, server work is always linear in the database
| size.
|
| However, I'd emphasize that our work shows that the constant
| factor on this linear operation can be quite low. We process at
| 300MB/s to 1.9GB/s on a single core, which is fast enough for
| relatively large databases. Remember that the computation is
| embarrassingly parallel, so you can always throw more compute
| at larger databases. To summarize, we think the speed is now
| fast enough that it really can be feasible to scan the whole
| database to answer a single query.
| teraflop wrote:
| > except in some (exciting!) new theoretical constructions
|
| Sorry if this is too much of a tangent, but I would love to
| know what these are!
| sangel wrote:
| They are called PIR with sublinear online computation or
| offline/online PIR.
|
| Key idea is the client issues a query that is independent
| of what they really want. This is the "offline" query.
| This query is linear in the database size (unavoidable), and
| the client gets some hints as a result.
|
| Then later, the client can issue one or more queries for
| elements they want, and those queries can be sublinear in
| terms of computation. The client uses the hints to make that
| happen.
|
| Some papers on this are:
|
| https://eprint.iacr.org/2019/1075.pdf
|
| https://eprint.iacr.org/2021/1438
|
| https://eprint.iacr.org/2022/081.pdf
| blintz wrote:
| Sure! This paper has people really excited from a
| theoretical standpoint:
| https://eprint.iacr.org/2022/081.pdf. It's a bit far from
| practical for single-server, but I'm sure people are
| working on it.
| wbeckler wrote:
| Maybe the I/O pattern could be hideable using confidential
| computing, like with a Nitro Enclave.
| teraflop wrote:
| Maybe, but if you have a secure enclave that can be trusted
| not to leak data, then you don't really need PIR. You can
| just have clients encrypt their queries with a key that can
| only be decrypted by the code running in the enclave.
| teraflop wrote:
| Clarification: that was my comment, not OP's. I'm not a
| cryptography expert, just an interested amateur. But my
| understanding is that O(n) query times are inevitable if you
| want information-theoretic security. Maybe it's possible to
| do better with a weaker security property.
|
| And there are clever ways you can make a system like this
| "scale" even if the overall dataset size is limited. For
| instance, the authors cite another interesting paper [1] that
| uses a similar technique to build a fully-private voice chat
| system. The basic idea seems to be that you build a
| "database" consisting of the most recent snippet of audio
| from every ongoing conversation, and let each client
| privately query for the segment that's addressed to it. And
| every fraction of a second when new audio data arrives, you
| just throw away the old database and build a new one, so the
| amount of data doesn't depend on the length of the
| conversation.
|
| Even if this doesn't scale to arbitrary numbers of users, it
| could still be used to run a "cell" of a few thousand people,
| in which it's not possible for an adversary to reconstruct
| communication patterns within the cell.
|
| [1]:
| https://www.usenix.org/conference/osdi21/presentation/ahmad
| Canada wrote:
| Can this be applied usefully to non-public datasets?
|
| Would it be feasible to add some other zero knowledge proof to
| this that would confirm a user has paid a subscription for
| access? For example, if this were a news site, the user would
| have to prove a valid subscription to read articles, but the
| site would not be able to know which articles any subscriber
| decided to read?
|
| If that is possible, what could the site do to prevent a paying
| subscriber from sharing their access to an unreasonable number
| of others? Would it be possible to impose a rate limit per
| subscriber?
| phh wrote:
| I'm no cryptographer, but it seems to me you could implement
| this using the same algorithm as Cloudflare uses for Tor, which
| generates anonymous tokens from an ad-hoc webpage.
| blintz wrote:
| The simplest approach would be to just charge people per-
| query (or charge in levels, depending on the number of
| queries). This could be done in the standard way (you have to
| log in, pay the site, and then it gives you an API key or
| just session token, and logs how many queries you make). I
| think you can avoid having to use a ZKP this way, since that
| will make things much more complicated and possibly costly.
| JanisErdmanis wrote:
| > A malicious server is unable to determine which article the
| client retrieved.
|
| This sounds like magic :O. How does it behave when new articles
| (elements) are added, does it need to rebuild the whole
| database and distribute new parameters?
|
| I wonder how practical it would be for clients to synchronize
| content without the server being able to deduce the client's
| synchronization state.
| blintz wrote:
| It does sound like magic! This is what got me into this
| field; it seems like something that should intuitively be
| impossible... but it's not!
|
| Parameters only need to be changed based on the number of
| items in the database (not the content of the items). Also,
| they don't really need to be modified as long as we are
| within the same power of two number of items. So, I think
| client and server agreement on parameters seems feasible.
| Right now, it's just hard coded :)
| syrrim wrote:
| What is the maximum throughput the server can maintain? Or, in
| other words, how much does it cost per query?
| blintz wrote:
| The $35/month server uses 6 cores to answer a single query in
| ~2.5-3 seconds. So it's 0.33 QPS :-)
|
| Not high, which is why it might not be working well for folks
| right now...
|
| Time scales almost perfectly linearly with cores, since the
| computation is embarrassingly parallel.
|
| In terms of cost, we're still talking only 18 CPU-seconds
| and ~300KB of outgoing bandwidth, which is not a ton at today's
| prices.
| IshKebab wrote:
| It's embarrassingly parallel... but you also do N times more
| work than a non-homomorphic system, so that's not saying
| much!
|
| This doesn't seem like a particularly compelling
| application - can you give some practical problems that
| homomorphic encryption solves? I've always heard vote
| counting as the example.
| blintz wrote:
| Yeah, the computational overhead over no-privacy
| retrieval will always be high. Something we point out in
| the paper is that this does not necessarily translate to
| server monetary costs; for example, on AWS, where
| outgoing bandwidth is expensive, using our system to
| stream a movie is less than 2x the monetary cost of
| direct, no-privacy streaming. This is because the
| computational costs are just very small in comparison to
| the bandwidth costs.
|
| As far as more compelling applications, several folks
| have suggested DNS, OCSP, and certificate transparency as
| potential examples. Using PIR as a building block, it is
| also possible to build more complex systems, like
| metadata-private messaging (systems like 'Pung') and even
| private voice calling (systems like 'Addra').
| rkagerer wrote:
| Not able to read the full paper at the moment, and confused
| about something:
|
| If the server needs to go pull the article from Wikipedia, how
| is it blind to which one is being requested?
|
| If you've pre-seeded the server with an encrypted 30% of
| Wikipedia, how can I trust you haven't retained information
| that would enable you to derive what I requested?
|
| The only way I understand this works is if the client itself
| seeded the encrypted data in the first place (or at least an
| encrypted index if all the server pushes back is article
| numbers).
|
| Maybe I'm ignorant of something; if so thanks for ELI5.
| gojomo wrote:
| The server already has a full copy of (its subset of)
| Wikipedia.
|
| _Every_ query touches every byte of that snapshot, in the
| same way... but the homomorphic-encryption math distills that
| down to just the fragment the client requested.
|
| And it does so _without_ giving even the CPU executing that
| math any hint of what bytes/ranges/topics survive the math.
| The much-smaller - but still, at every server step, encrypted
| - result is then sent to the client, which performs the
| decryption.
|
| It's moon-math magic.
| blintz wrote:
| The server has a snapshot of Wikipedia, sitting in memory.
| The server is blind to which article is requested because it
| computes a homomorphic dot product between an encrypted one-
| hot vector (encrypted under a key that _only_ the client
| knows) and the total set of articles in plaintext. The
| encrypted query does not reveal anything about the requested
| index (ie it is semantically secure).
|
| The 'magic' of homomorphic encryption is that the server is
| able to take an encrypted one-hot vector of bits, and a
| vector of plaintext articles, compute a homomorphic dot
| product, and produce a single ciphertext encoding the single
| desired plaintext article. This single output ciphertext is
| crucially also encrypted under the _client's_ secret key, so
| it reveals nothing to the server.
| philipkglass wrote:
| Don't you have to pad the output ciphertext size to match
| the largest article you could possibly request from the set
| of articles? Or is a fixed-size output an inherent property
| of homomorphic encryption schemes? Otherwise it seems to
| reveal something to the server just by the size of the
| ciphertext (since WP articles vary in size).
| blintz wrote:
| Yes, every item needs to be the same size. We batch the
| articles into 100KB chunks, and we split articles larger
| than this into multiple chunks.
| aildours wrote:
| Wouldn't you then have to send out multiple ciphertexts
| (for articles >100 KB)? Which would leak something about
| the size of the article...
| blintz wrote:
| You would. It's important for the client to pace its
| requests in a way that does not reveal too much (for
| example, the client should not just request both chunks
| of the article at the same time). The best thing to do
| would probably be to have a 'load more' button at the
| bottom of a very long article that makes a separate
| request.
|
| If you think about it, the pacing of user queries could
| always reveal something (if there's a long gap between
| queries, perhaps it's a dense mathematics article?). So
| the best we can hope for is pacing requests either
| randomly, or for even further protection, perhaps making
| dummy requests on a fixed cadence.
| Dylan16807 wrote:
| The output is always going to be fixed size.
|
| The nicer way to look at homomorphic encryption is like
| math, but the more literal way is like a bunch of logic
| gates. There are N output wires. You control the bits on
| the input wires. Nothing you do will change the number of
| wires.
| Dylan16807 wrote:
| > If you've pre-seeded the server with an encrypted 30% of
| Wikipedia, how can I trust you haven't retained information
| that would enable you to derive what I requested?
|
| With homomorphic encryption, the client sends a series of
| encrypted numbers. Nobody can decrypt them except the client.
| The server can do arithmetic with them, making new secret
| numbers that nobody can decrypt except the client.
|
| There is no usable information _to_ retain.
|
| So the question becomes: what can you calculate using
| arithmetic on secret numbers?
|
| Well, for this demo, treat every article as a number. Then
| multiply all the articles you don't want by 0, and the
| article you want by 1, and add them all together.
|
| The server just sees that it's multiplying every article by a
| secret number. It can't tell what the number is. It can't
| tell if the output is "encrypted article" or "encrypted
| 000000..."
|
| Then the server adds them all up. If the client asked for no
| articles, the result will be "encrypted 000000..." If the
| client asked for one article, the result will be that
| article, encrypted. If the client asked for multiple, the
| result will be a garbled mush of overlapping articles,
| encrypted. The server can't tell the difference. It just
| knows it has an encrypted number.
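|
| The three cases, with the encryption stripped away so only the
| arithmetic shows (in the real protocol every value below would
| be a ciphertext):
|
|   fn main() {
|       let articles = [10u8, 20, 30];
|       let select = |sel: [u8; 3]| -> u8 {
|           articles.iter().zip(sel.iter()).map(|(&a, &s)| a.wrapping_mul(s)).sum()
|       };
|       assert_eq!(select([0, 0, 0]), 0);  // no articles: "encrypted 000..."
|       assert_eq!(select([0, 1, 0]), 20); // one article: exactly that article
|       assert_eq!(select([1, 1, 0]), 30); // several: a garbled mush (10+20)
|   }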
| barkingcat wrote:
| is revealing that you accessed something from the index
| vs. nothing a leak of information?
| dheera wrote:
| Thank you. I _really_ wish the paper started with exactly
| what you wrote as the abstract.
| jerf wrote:
| This is the first thing out of homomorphic encryption _I
| personally have seen_ that seems to be in the ballpark of
| useful for some practical use, which is impressive. Have I
| missed out on any other such things of interest?
|
| (And this is not a criticism; this is a compliment. You start
| so far behind the eight-ball with homomorphic encryption with
| regard to the resources it consumes I wasn't convinced it was
| ever going to be even remotely useful for much of anything.
| Precisely because I was so skeptical, I am that impressed to
| see something work this well. It's not the fastest Wikipedia
| mirror, but... honestly... I've been on slower websites!
| Websites with _far_ less excuse.)
| blintz wrote:
| There has recently been a lot of great work on making
| homomorphic encryption more practical for particular
| applications. We definitely stand on the shoulders of all
| that work!
|
| One reason homomorphic encryption has a reputation as
| absurdly slow is that people typically talk about "fully"
| homomorphic encryption, which essentially means that you can
| compute an arbitrarily large function on the encrypted data.
| This involves a very expensive process called bootstrapping,
| which incurs a roughly ~15 ms cost per binary operation
| evaluated. As you can imagine, that adds up.
|
| This work uses "leveled" homomorphic encryption, where we
| only perform a function of 'bounded' size (that is, the
| homomorphic dot product). So, we do not have to perform
| bootstrapping, and thus avoid some of the more extreme costs.
|
| The other reason this work is practical is that we 'tailored'
| our construction to the particular problem of private
| information retrieval. When people try to apply homomorphic
| encryption _generically_ to their problem, they typically end
| up with disappointingly slow and expensive results. Cool work
| on better 'FHE compilers' is in progress, so hopefully that
| will also help!
| feanaro wrote:
| > can compute an arbitrarily large function on the
| encrypted data
|
| What is the meaning of the word "large" here? How are we
| defining the size of a function?
| Confiks wrote:
| Either the number of Boolean logic operations or the
| number of integer additions and multiplications, as these
| are the operations FHE schemes support.
| halJordan wrote:
| Apple's Differential Privacy uses homomorphic encryption for
| parts of its functioning.
| j2kun wrote:
| Citation?
| kccqzy wrote:
| You missed the widely panned Apple iCloud child sexual abuse
| imagery detection feature. The private set intersection is
| basically doing homomorphic encryption. In raising some very
| valid policy critiques, people forget that it's actually a
| nifty piece of engineering. (This is not an endorsement of
| that feature.)
| https://www.apple.com/child-safety/pdf/Apple_PSI_System_Secu...
|
| I'm also closely working together with a team at $WORK who's
| using a protocol very similar to Apple's but not doing CSAM
| detection. We are seeing some severe pushback on this
| technology. I wouldn't be surprised if there are multiple
| homomorphic encryption based products at Big Tech that have
| yet to see the light of day.
| [deleted]
| lifthrasiir wrote:
| I think it is very counterintuitive---even to professional
| programmers---that you can compute things without letting
| the computer know them. I literally had to go through the
| entire paper to understand that this can actually work as
| long as human in the loop doesn't screw up (see my summary
| [1], which was revised multiple times at that time).
|
| [1] https://news.ycombinator.com/item?id=28223141
| ddjsn111 wrote:
| How does the server select the article in a way that we can be
| sure they don't record the article sent back? Are the articles
| encrypted on the server too?
| sp332 wrote:
| Yes, in fact the article number is not even decrypted by the
| server, so the server doesn't know which article you asked
| for!
| kaoD wrote:
| How is this not vulnerable to side-channel attacks like
| disk-access patterns?
|
| Could I, as a malicious server, request myself a target
| article and correlate that with legitimate user requests?
| sodality2 wrote:
| > With a proper implementation of PIR, the server still
| needs to scan through the entire encrypted dataset (this
| is unavoidable, otherwise its I/O patterns would leak
| information)
|
| https://news.ycombinator.com/item?id=31669370 (not
| original poster)
| dontbenebby wrote:
| This is very cool OP! I interviewed to be a privacy engineer with
| Wikimedia a while back.
|
| I suggested that my goal would be to add a v3 onion service. They
| actually had listed years of "homomorphic encryption" as a
| requirement. I phoned up the recruiter and basically said it's ok
| if there is a personality conflict, but the role as written was
| impossible to fill, and it scared me that very good suggestions
| for privacy as well as the health of the Tor network were
| discarded.
|
| (If you set up a dot onion, that frees up traffic on exit nodes,
| whose capacity is limited.)
|
| Big thanks to the OP for being willing to share this work, it's
| very cool and I'm about to read your eprint.
|
| I'm excited about the potential of homomorphic encryption, though
| I worry about things like CPU cost -- I recall when folks had to
| really be nudged not to encrypt huge blocks of data with PGP, but
| instead use it to encrypt the passphrase to a Truecrypt volume
| using a symmetric cipher like AES.
|
| (I'd love to know how we got to a point where Twitter added an
| onion service and then banned me, but Wikipedia continues to not even
| support MFA for logins -- I recently registered an account
| intending to eventually upload some art to the commons, but the
| perpetual refusal to allow folks to make healthy choices disturbs
| me.
|
| In fact, after reading articles like these[1][2], it makes
| me question the integrity of the folks I interacted with during
| the interview process.
|
| On my end, it was especially disturbing since prior to enrolling
| in my PhD, the alternative path I discussed was becoming an FBI
| agent focused on counter intelligence in the "cyber" realm.
|
| The agent I spoke with told me I'd serve "at the needs of the
| bureau", so that would mean probably not using my computer
| skills, which would then languish, then after a couple years I
| might still not get my desired position, and gave me a card,
| which I eventually lost.
|
| Years later, prior to the insurrection, I had to walk down to
| Carnegie Mellon and ask if anyone had his contact information,
| and was shocked that folks refused to even point me at a link to
| the lecture, which had been listed as open to the public.
|
| I'm someone who _reads_ Wikipedia, not really edits, but the vast
| majority of users are readers not editors, and this perpetual
| pattern of refusing to enable privacy enhancing technologies,
| paired with using privileged access to make hiring decisions
| against folks who lack the physical ability to make good privacy
| decisions offended me on a deep, personal level, and is why I
| often post in a brash, erratic manner.
|
| Because I see zero incentive to stay silent -- if I'm quiet,
| people will slowly drain my bank account.
|
| If I post, there is a chance someone will see what I say, notice
| my skills, and offer full time employment. So I have to continue
| risking offending folks until I find a full time job, which I
| have not had since I left the Center for Democracy and Technology
| under duress following a series of electronic and physical
| attacks, paired with threats and harassment by staffers in the
| organization.
|
| TL;DR: Great research, but I hope they also add an onion service
| rather than jump straight to using this :-)
|
| [1]
| https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@list...
|
| [2] https://slate.com/technology/2021/10/wikipedia-mainland-chin...
| MauranKilom wrote:
| I tried, but I simply can't follow your train of thought. You
| keep going back and forth between criticizing Wikimedia hiring
| and technology choices, advertising yourself, and deliberating
| over onion services. And it all seems extremely tangential to
| the article (which is really not about the future of Wikipedia,
| or Tor, or your career).
| badrabbit wrote:
| Very nice! Great against snoopers that lack authority, but when
| they do have some authority (bosses, government) and there is no
| plausible deniability, it can do more harm than good.
| ajconway wrote:
| Theoretically, can this scheme be turned into a generic O(N) key-
| value retrieval for non-static content (in this example --
| supporting adding, removing and replacing articles without re-
| encrypting the whole database and re-sending the client setup
| data)?
| blintz wrote:
| We never encrypt the database. Only the query is encrypted. The
| client setup data is only dependent on the client key and the
| size of the database (not the content). Adding and replacing
| articles can happen whenever the server wants, and clients do
| not need to "re-sync" or something like that.
|
| For arbitrary key-value retrieval, a hashing scheme would work
| pretty well, modulo some imbalances that will occur.
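|
| A minimal sketch of the hashing idea (hypothetical, not from
| the paper): map arbitrary keys to one of B fixed buckets, then
| PIR-fetch the bucket by index; collisions are the "imbalances"
| mentioned above.
|
|   use std::collections::hash_map::DefaultHasher;
|   use std::hash::{Hash, Hasher};
|
|   const BUCKETS: u64 = 1 << 16;
|
|   fn bucket_of(key: &str) -> u64 {
|       let mut h = DefaultHasher::new();
|       key.hash(&mut h);
|       h.finish() % BUCKETS // the client then PIR-queries this index
|   }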
___________________________________________________________________
(page generated 2022-06-08 23:00 UTC)