[HN Gopher] Building a Curve25519 Hardware Accelerator
___________________________________________________________________
Building a Curve25519 Hardware Accelerator
Author : parsecs
Score : 163 points
Date : 2021-07-15 14:37 UTC (8 hours ago)
(HTM) web link (www.bunniestudios.com)
(TXT) w3m dump (www.bunniestudios.com)
| upofadown wrote:
| >The "double ratchet" algorithm is integral to modern end-to-end-
| encrypted chat apps, such as Signal, WhatsApp, and Matrix.
|
| Signal protocol uses a hash ratchet for the message to message
| forward secrecy. So you don't need to do the expensive key
| agreement stuff unless that hash ratchet falls out of
| synchronization.
|
| It isn't really clear if any messaging application really needs
| message to message forward secrecy in the first place. You really
| only need to do that expensive key agreement operation when you
| want to get rid of some old messages that someone has captured
| off the wire. That can be reasonably done at the end of a session
| or even weekly. Very few people need messages that have not even
| scrolled off the screen to be forward secret, assuming they need
| it at all.
| tptacek wrote:
| _You really only need to do that expensive key agreement
| operation when you want to get rid of some old messages that
| someone has captured off the wire._
|
| Could you perhaps share more of your analysis here? I'd like to
| be able to better follow what you are trying to say.
| upofadown wrote:
| If you are using a temporary shared secret then after you
| forget that shared secret you will need a new secret which
| you will have to generate. I mean nothing more than that.
| tptacek wrote:
| Right, why would anyone ever be better off holding that
| temporary secret over many messages and many days?
| upofadown wrote:
| My entire point is that it is not really necessary to do
| shared secret generation at any but a very slow rate.
| Even if you don't so it the Signal protocol way it is
| still perfectly reasonable to only do it once in a while.
| That might be a short while but it could reasonably be
| quite long.
| tptacek wrote:
| Why is it reasonable to only do shared secret generation
| once in awhile? What's the win to doing that? The win to
| not doing it is that it narrows the blast radius of a
| single key compromise to a small number of messages (or,
| in full ratchets, to a single message). Why would I ever,
| ever want to _increase_ the blast radius of a key?
| admax88q wrote:
| I'm not sure you gain much by moving the forward secrecy to a
| weekly or session based level, and I think you lose a lot in
| terms of simplicity.
|
| It is simpler to design and analyze if it applies at every
| message. It's a design for the best case. No longer need to
| make judgement calls on if 1 week is enough, or maybe 2, or
| maybe some middle ground.
| toast0 wrote:
| > That can be reasonably done at the end of a session or even
| weekly.
|
| At the end of a session might be reasonable cryptographically,
| but isn't really reasonable in terms of application
| development; you don't consistently get to have use CPU and
| network when a session ends and you may not get access again
| until a new session begins. A weekly trigger is more likely to
| be practical, but that may not work consistently either. Better
| to do the ratchetting as you're doing the messaging to
| guarantee it gets done.
| sliken wrote:
| Say it takes a $5,000 machine 24 hours to break the code.
| Clearly it's much more attractive to an attacker if they can
| break a weeks worth of messages than 10 seconds worth.
| OJFord wrote:
| > It's a weird thing about academics -- they like to write papers
| and "share ideas", but it's very hard to get source code from
| them.
|
| I suppose at least part of the reason, especially for things like
| crypto and statistics, is that they don't want to get bogged down
| in plausibly-correct claims of fatal flaws, and be perhaps
| repeatedly (over years hence) put in a defensive position of
| having to prove the criticism _in_ correct or have everyone
| assume the work was wrong?
|
| I can sort of understand that. The same's true for the papers
| themselves of course, but I can see it being more annoying for
| code. ('But it's never going to actually be in that state', etc.)
| JJMcJ wrote:
| Donald Knuth once wrote a paper with some source code.
|
| He warned his readers about just copying the code, perhaps half
| jokingly.
|
| He said, approximately, "this code has been proven correct, but
| not tested."
| bunnie wrote:
| You're probably right. And assuming you are, then I think maybe
| the core problem may be that academics even tolerate the petty
| undercutting of reputation based not on the merit of the big
| ideas, but rather minor flaws in the proofs of concept.
|
| There's also a problem in allowing papers to get away with
| making arbitrary claims that are by definition irreproducible.
| Every paper always includes some section about how many gates a
| design took up and how fast it ran -- but there aren't entirely
| consistent standards for reporting these things; it's often too
| easy to hide significant setup costs when running benchmarks.
|
| This leads to an incentive to overstate results, because
| without any threat of checks it would be disadvantageous to not
| cherry-pick the absolute best numbers you could get through a
| peer review committee.
|
| I get the desire to escape petty criticism: but it seems like a
| weak argument against the myriad of reproducibility and
| integrity hazards such a lack of transparency begets,
| especially for a line of work that ostensibly distinguishes
| itself on high standards for merit and integrity.
|
| It also would have been nice to just receive a polite response
| of the form of "sorry, I don't support this anymore, I've moved
| on..." from anyone I had reached out to. Before starting on
| this project, I emailed four or five groups that had all
| published various papers on Curve25519 accelerators hoping to
| get even a broken non-compiling repo that could have helped me
| sanity-check my understanding of the thornier concepts
| presented in the papers.
|
| Not even a single response...hence the motivation to write up
| notes and share them. Seems a waste of funding to have so many
| implementations, but no way to actually use any of them...
| staticassertion wrote:
| Are you saying this based on experience? My assumption was that
| it was a number of factors.
|
| 1. Researchers are often weak coders, and they may be self
| conscious about their code.
|
| 2. My own experience is that researcher code is totally
| undocument, or broken. I've repeatedly found that the code
| linked to from a paper has no straightforward way to be built,
| or appears to have compiler errors etc.
|
| 3. It's more work, and researchers have the incentive to put
| out a compelling paper, not to put out code (because we don't
| hold them to that standard).
|
| My own experiences have led to me to mostly dismiss papers
| where I can't find the code or can't get it to build.
| kzrdude wrote:
| Given all that, it's understandable that it's not released
| and completely reasonable - a support structure is needed if
| the code should be released.
| staticassertion wrote:
| I'm setting the bar super low. A `./configure` and a readme
| with basic info about the project and how to build/run it
| would be quite exceptional. I don't need them to be
| triaging github issues and merging in PRs.
| gww wrote:
| In the biological sciences there is a huge push for the code to
| be released alongside the paper. Many publications (but not
| enough yet) include a complete pipeline in a workflow language
| that downloads, processes the data and generates the figures
| used in the paper.
| toomuchtodo wrote:
| What's the language de jour typically leveraged to declare
| these workflows?
| semi-extrinsic wrote:
| Since GP mentions biology, I would guess R is the first
| choice.
| jlrubin wrote:
| Hi Bunnie!
|
| Curious if you also have a power usage comparison, which I'm
| guessing is also relevant in the mobile context.
| korethr wrote:
| This is neat. The ever-present admonition against rolling one's
| own crypto lurks in the back of my mind, though. It is not my
| intent here to impugn Bunnie's competence, though I do wonder if
| he's run the design or implementation past any cryptographers to
| try to verify that the design or implementation are sound, and
| don't subtly break critical assumptions vital to the security of
| the algorithm.
| pinewurst wrote:
| But he's not rolling his own crypto - in an algorithm sense -
| purely trying to speed the execution of an existing algorithm
| which can be easily validated against other existing
| implementations
| korethr wrote:
| Sure, but is it not possible that a hardware acceleration, if
| not also proven correct, could subtly undermine an otherwise
| valid software implementation?
|
| I do not assert that this is the case generally, nor
| particularly here -- I would be quite happy to be wrong about
| this. But if there's one thing I've taken from my admittedly
| minor study of crypto is that the most trivial of details can
| be critically important, and failure to get a detail exactly
| correct can compromise an entire algorithm and anything built
| thereupon.
|
| I would love to see hardware acceleration for ed25519, just
| like there's been hardware acceleration for AES. It could
| help drive adoption of an algorithm that is less fiddly and
| easy to get wrong than RSA, and maybe in another 15 years,
| I'd be able to purchase IT equipment that supports ed25519
| and not just hardware accelerated RSA and AES.
| IfOnlyYouKnew wrote:
| Excellent deep dive on this beast that had me similarly question
| my intelligence in the past.
|
| As a former academic: the reason sharing source code is/was rare
| is that it tends to be extremely ugly, and that you don't get any
| credit for publishing it where it matters (I. e. tenure
| committees). To go from something that works to something that
| you would allow people to see takes about as much time as the
| original work.
|
| (As a side note: I feared for the worst when I saw, among tags
| like "hacking" and "open source", the tag "feminism" in the side
| bar. I enjoyed being wrong.)
| vlovich123 wrote:
| I don't think that's a good reason. Production code is also
| ugly.
| sowbug wrote:
| This work is associated with Precursor
| (https://www.bunniestudios.com/blog/?p=5921), which is Bunnie's
| project to build a mobile hardware platform from the ground up.
| jazzyjackson wrote:
| TIL 255 is divisible by 17
|
| Edit: thanks to my replies I know several more things now lol big
| day for me
| pavpanchekha wrote:
| 256 = 16 * 16
|
| 255 = 256 - 1 = 16 * 16 - 1 = 15 * 17
| Sesse__ wrote:
| 2^p - 1 can only be prime (a so-called Mersenne prime) if p is
| prime. 2^8 - 1 thus cannot be a prime, since 8 is composite.
|
| (Note that is it not _sufficient_ that p is prime, but it is
| necessary.)
| pfortuny wrote:
| (a^2-1)=(a+1)(a-1).
| jandrese wrote:
| It is also divisible by 5.
| ChuckMcM wrote:
| This is an excellent read[1]. It reminded me of the process I
| went through putting together an FFT engine in an FPGA although I
| think the small execution engine is just stellar, and not
| something I had considered.
|
| [1] To be fair I find most of the stuff Bunnie writes about to be
| worth reading! :-)
| staticassertion wrote:
| > Wait, what? So many articles and journals I read on the topic
| just talk about prime fields, modular reduction and blah blah
| blah like it's asking someone to buy milk and eggs at the grocery
| store.
|
| No kidding, it's so frustrating! I have to read some articles/
| papers like 20x and open a bunch of wikipedia tabs to understand
| wtf they're talking about. If they gave a simple, high level
| explanation it would save tons of time - while a wikipedia
| article is going to be very in depth, it's not tailored to the
| context of what I'm reading, so I often have to look at a whole
| bunch of things to start to build a good model in my head for wtf
| they're trying to convey.
|
| This post, on the other hand, is perfect for me. Thank you so
| much for writing it I'm learning a ton.
| dragontamer wrote:
| https://www.youtube.com/watch?v=xE4jEKx9fTM
|
| GF5 is the 3rd smallest finite field (GF2 and GF3 are smaller),
| but GF2 and GF3 are too small to really see where the beauty of
| all this math lies.
|
| ------
|
| GF5 (which is simply "arithmetic modulo 5") has all the
| properties of Real-numbers "we care about", and none of the
| hassles of decimals or partial numbers.
|
| GF5 is simply arithmetic modulo 5. Its really easy to do. 4 + 2
| == 1. (Aka: 4 + 2 mod 5 == 6 mod 5 == 1). Addition and
| multiplication are easily proven to be the same.
|
| The fun starts when you start doing roots, exponents and
| logarithms (!!!). Take the definition of addition /
| multiplication above (aka: addition / multiplication modulo 5)
| and redefine square-root, exponents, and logarithms to be
| consistent with it.
|
| That is to say: 2 * 3 == 1. Which means that 3 == (1/2) (yes,
| "one half" is now 3 in GF5). Which means 3 is 2^(-1).
|
| * 2^2 == 4.
|
| * 2^2 * (2^-1) == 4 * 3 == 12 mod 5 == 2.
|
| Which means 2^2 * 2^-1 == 2^(2-1) and look, exponents work as
| expected. Feel free to play around with all the numbers: you'll
| find that exponents work once you match this new "definition"
| of exponents.
|
| ---------
|
| Because exponents work, logarithms work. Because everything
| works, we can have complicated ellipical curves defined over
| the numbers {0, 1, 2, 3, 4} (aka: the numbers mod 5) and have
| everything work out mathematically.
|
| --------
|
| The hard part is proving that all of this works in GF(5^2) aka
| GF25. It turns out that you can't just "mod 25" the stuff, you
| need to do an extension field (and basically make a complex
| number out of things).
|
| Once you understand GF25 extension fields and its relationship
| with GF5, then it becomes easy to generalize that to GF2 and
| its extension fields GF(2^8) (aka: binary numbers of 8 bits) or
| GF(2^32) (aka: 32-bit numbers).
|
| Turns out for extension fields (like GF(5^2) aka GF25) you need
| to redefine multiplication and addition as well and then
| rebuild all the math from the ground up off of this new
| definition of addition and new definition of multiplication.
| And again, this is most important for GF(2^32), aka the 32-bit
| number field that's common on today's computers.
|
| Bonus points: GF(2^32)'s new addition operator is just an xor.
| Only multiplication is difficult.
|
| But once you do that, it all __JUST WORKS__ and is amazing.
|
| ----------
|
| Galois Fields are a way to "cheat" a definition of logarithm,
| exponents, square roots, multiplication, division, and addition
| (by redefining these operations).
|
| It turns out that these operations continue to work "as
| expected" in the Galois Field... as long as you're cool with
| the new definitions of them.
|
| ------
|
| EDIT: I'm slightly wrong. Seems like Curve25519 is a prime
| field and not an extension field. So this is more similar to
| GF5 than it is to GF(2^32). Still, other elliptical curves are
| in GF(2^32) or other extension fields, so the study of
| extension fields is useful for other curves.
|
| Curve25519 can be understood with just the GF(5) description I
| gave earlier in this post. Prime fields are really simple: just
| mod(the prime number) and you're set. There's some weird
| properties that can accelerate speed (Any division by Foo is
| equivalent to multiplying by (1/Foo), which might be faster).
| fogof wrote:
| > GF5 is the 3rd smallest finite field (GF2 and GF3 are
| smaller)
|
| There is also a field of 4 elements, since 4 is a power of a
| prime.
| dragontamer wrote:
| Fair. I guess I was implicitly staying away from extension
| fields because people's attention span will start drifing
| away the minute I start to attempt to explain how to
| perform modulo polynomial arithmetic over irreducible
| polynomials.
|
| Prime fields (GF2, GF3, GF5...) are just easier to explain
| than extension fields (GF (2^2), GF(3^2), GF(5^2)...
| GF(2^8)...)
| Kalium wrote:
| This feels, to me, like the complaint of someone with a general
| technical grounding going into a narrow specialization and
| discovering they understand none of the terminology.
|
| This is _exactly_ how I felt when I was learning Haskell and
| hearing about eta conversions and pointfreeness and so on. Each
| new bit of terminology had a precise definition, but
| specialists writing primarily for one another were not inclined
| to offer definitions each time. They didn 't need them and
| neither did their anticipated audience. I had to learn the
| terminology and jargon so I could understand what they were
| talking about, and high-level abstracted explanations were not
| going to be enough to participate usefully. I had to learn the
| basics.
|
| Similarly, Watson & Crick's seminal paper might not have been
| the best of all possible times to stop and define an X-ray or
| how crystallography worked.
|
| So yes, just instantiate the metaclass, implement the
| interface, and go buy milk. Easy. You just have to learn what
| each of those things actually means, especially if the details
| might be important (and in cryptography, that's always the
| case).
| bunnie wrote:
| Thanks! It's good to know I'm not alone in feeling this way.
|
| This is actually an abridged version, if you click into the
| link for the "datasheet" (https://ci.betrusted.io/betrusted-
| soc/doc/engine.html) you can find all the gory details down to
| the register level...
| staticassertion wrote:
| Awesome, much appreciated, I'll dig into it. I was actually
| just using the dalek crate myself quite heavily this week, so
| this is great timing.
| amirhirsch wrote:
| I liked your struggle with the corner-case :)
___________________________________________________________________
(page generated 2021-07-15 23:00 UTC)