https://lab.whitequark.org/notes/2020-04-06/synthesizing-optimal-8051-code/ * whitequark's lab notebook * * Atom feed * E-mail * Twitter * GitHub 2020-04-06 Synthesizing optimal 8051 code Tags: * software While working on an application targeting Nordic nRF24LE1, a wireless SoC with a fairly slow 8051 core, I was wondering if I can have fast, or at least not unusably slow, cryptography. Most cryptographic algorithms involve wide rotates, and the 8051 only has instructions for rotating a 8-bit accumulator by one bit at a time. In this note I explore deriving optimal code for rotating values in registers (that may be bigger than 8 bits) by multiple bits. * Introduction * Code generator * Results + 8-bit rotates + 16-bit rotates Introduction My chosen approach (thanks to John Regehr for the suggestion) is to implement an interpreter for an abstract 8051 assembly representation in Racket and then use Rosette to translate assertions about the results of interpreting an arbitrary piece of code into a query for an SMT solver. Rosette greatly simplifies this task because it lets me avoid learning anything about SMT solvers, and only requires me to understand the constraints of its symbolic execution approach. (Only a small subset of Racket is safe to use in Rosette, and functions outside of that subset are hard to use correctly without an in-depth understaning of how Rosette works.) Code generator The following code generates all possible optimal (more on that below) 8-bit and 16-bit rotates. It uses a rather hacky and complicated scheme where it runs several solvers in parallel, one per CPU, each aiming for a particular fixed number of instructions, and then picks the smallest result as the solvers finish. This is because at the time of writing it, I did not understand that Rosette allows optimizing exists-forall problems. (It is quite easy to do so, as I describe in a later note.) However, that turned out to be a blessing in disguise; when writing this note, I rewrote the query as an optimization problem for the solver, and it doesn't seem like that would work for this use case. First, of the solvers that can be used by Rosette, only Z3 supports quantified formulas, whereas Boolector had the best performance with the simpler queries. Second, even for very small programs (such as 8-bit rotates, which all fit in 4 instructions, and even restricting the usable registers to 2 out of 8), the memory footprint of Z3 grows extremely quickly, and I always ran out of memory before getting a solution. By "optimal" here I mean "optimal within the limited model being used", of course. The model I'm using specifically omits any memory access (preventing the use of the XCHD instruction among other things), and in general has a very limited number of instructions to make solver runtime manageable. It is possible (but unlikely) that some of the instructions missing in the model but present in every 8051 CPU provide a faster way to do rotates. It is possible (and fairly likely) that your specific flavor of 8051 CPU provides a faster way to do rotates that involves memory-mapped I/O; indeed, nRF24LE1 does, but I was interested in more portable code. synth51.rkt (download) 1 #lang rosette/safe 2 3 (require (only-in racket hash in-range for for/list with-handlers flush-output 4 thread thread-wait break-thread exn:break? 5 make-semaphore semaphore-wait semaphore-post call-with-semaphore/enable-break 6 processor-count)) 7 (require rosette/solver/smt/z3 8 rosette/solver/smt/boolector 9 rosette/solver/smt/yices) 10 ;(current-solver (z3 #:logic 'QF_BV #:options (hash 11 ; ':parallel.enable 'true 12 ; ':parallel.threads.max 4))) 13 ;(current-solver (yices #:logic 'QF_BV)) 14 (current-solver (boolector #:logic 'QF_BV)) 15 16 (require rosette/lib/angelic 17 rosette/lib/match) 18 (current-bitwidth 5) 19 20 ; bit operations 21 (define (rotate-right s i x) 22 (cond 23 [(= i 0) x] 24 [else (concat (extract (- i 1) 0 x) (extract (- s 1) i x))])) 25 (define (rotate-left s i x) 26 (cond 27 [(= i 0) x] 28 [else (concat (extract (- s i 1) 0 x) (extract (- s 1) (- s i) x))])) 29 30 (define (replace-bit s i x y) 31 (define m (bvshl (bv 1 s) (integer->bitvector i s))) 32 (cond 33 [(bveq y (bv 0 1)) (bvand x (bvnot m))] 34 [(bveq y (bv 1 1)) (bvor x m)] 35 [else (assert #f)])) 36 37 ; CPU state 38 (struct state (A C Rn) #:mutable #:transparent) 39 40 (define (state-Rn-ref S n) 41 (vector-ref (state-Rn S) n)) 42 (define (state-Rn-set! S n v) 43 (vector-set! (state-Rn S) n v)) 44 (define (state-R0 S) (state-Rn-ref S 0)) 45 (define (state-R1 S) (state-Rn-ref S 1)) 46 (define (state-R2 S) (state-Rn-ref S 2)) 47 (define (state-R3 S) (state-Rn-ref S 3)) 48 (define (state-R4 S) (state-Rn-ref S 4)) 49 (define (state-R5 S) (state-Rn-ref S 5)) 50 (define (state-R6 S) (state-Rn-ref S 6)) 51 (define (state-R7 S) (state-Rn-ref S 7)) 52 53 (define-symbolic A R0 R1 R2 R3 R4 R5 R6 R7 (bitvector 8)) 54 (define-symbolic C (bitvector 1)) 55 (define (make-state) 56 (state A C (vector R0 R1 R2 R3 R4 R5 R6 R7))) 57 58 ; instructions 59 (struct MOV-A-Rn (n) #:transparent) 60 (struct MOV-Rn-A (n) #:transparent) 61 (struct ANL-A-Rn (n) #:transparent) 62 (struct ORL-A-Rn (n) #:transparent) 63 (struct XRL-A-Rn (n) #:transparent) 64 (struct XCH-A-Rn (n) #:transparent) 65 (struct MOV-A-i (i) #:transparent) 66 (struct ANL-A-i (i) #:transparent) 67 (struct ORL-A-i (i) #:transparent) 68 (struct SWAP-A () #:transparent) 69 (struct CLR-C () #:transparent) 70 (struct MOV-C-An (n) #:transparent) 71 (struct MOV-An-C (n) #:transparent) 72 (struct RLC-A () #:transparent) 73 (struct RRC-A () #:transparent) 74 (struct RL-A () #:transparent) 75 (struct RR-A () #:transparent) 76 77 (define (print-insn insn) 78 (match insn 79 [(MOV-A-Rn n) (printf "MOV A, R~s~n" n)] 80 [(MOV-Rn-A n) (printf "MOV R~s, A~n" n)] 81 [(ANL-A-Rn n) (printf "ANL A, R~s~n" n)] 82 [(ORL-A-Rn n) (printf "ORL A, R~s~n" n)] 83 [(XRL-A-Rn n) (printf "XRL A, R~s~n" n)] 84 [(XCH-A-Rn n) (printf "XCH A, R~s~n" n)] 85 [(MOV-A-i i) (printf "MOV A, #0x~x~n" (bitvector->natural i))] 86 [(ANL-A-i i) (printf "ANL A, #0x~x~n" (bitvector->natural i))] 87 [(ORL-A-i i) (printf "ORL A, #0x~x~n" (bitvector->natural i))] 88 [(SWAP-A) (printf "SWAP A~n")] 89 [(CLR-C) (printf "CLR C~n")] 90 [(MOV-C-An n) (printf "MOV C, ACC.~s~n" n)] 91 [(MOV-An-C n) (printf "MOV ACC.~s, C~n" n)] 92 [(RLC-A) (printf "RLC A~n")] 93 [(RRC-A) (printf "RRC A~n")] 94 [(RL-A) (printf "RL A~n")] 95 [(RR-A) (printf "RR A~n")])) 96 97 ; sketches 98 (define (??insn) 99 (define n (choose* 0 1)); 2 3 4 5 6 7)) 100 (define-symbolic* i (bitvector 8)) 101 ;(define i (choose* (bv #xf0 8) (bv #x0f 8))) 102 (choose* (MOV-A-Rn n) 103 (MOV-Rn-A n) 104 (ANL-A-Rn n) 105 (ORL-A-Rn n) 106 (XRL-A-Rn n) 107 (XCH-A-Rn n) 108 (MOV-A-i i) 109 (ANL-A-i i) 110 (ORL-A-i i) 111 (SWAP-A) 112 (CLR-C) 113 (MOV-C-An n) 114 (MOV-An-C n) 115 (RLC-A) 116 (RRC-A) 117 (RL-A) 118 (RR-A))) 119 120 (define (??prog fuel) 121 (if (= fuel 0) null 122 (cons (??insn) (??prog (- fuel 1))))) 123 124 ; symbolic interpreter 125 (define (run-insn S insn) 126 (match insn 127 [(MOV-A-Rn n) 128 (set-state-A! S (state-Rn-ref S n))] 129 [(MOV-Rn-A n) 130 (state-Rn-set! S n (state-A S))] 131 [(ANL-A-Rn n) 132 (set-state-A! S (bvand (state-A S) (state-Rn-ref S n)))] 133 [(ORL-A-Rn n) 134 (set-state-A! S (bvor (state-A S) (state-Rn-ref S n)))] 135 [(XRL-A-Rn n) 136 (set-state-A! S (bvxor (state-A S) (state-Rn-ref S n)))] 137 [(XCH-A-Rn n) 138 (let ([A (state-A S)] [Rn (state-Rn-ref S n)]) 139 (set-state-A! S Rn) (state-Rn-set! S n A))] 140 [(MOV-A-i i) 141 (set-state-A! S i)] 142 [(ANL-A-i i) 143 (set-state-A! S (bvand (state-A S) i))] 144 [(ORL-A-i i) 145 (set-state-A! S (bvor (state-A S) i))] 146 [(SWAP-A) 147 (let ([A (state-A S)]) 148 (set-state-A! S (concat (extract 3 0 A) (extract 7 4 A))))] 149 [(CLR-C) 150 (set-state-C! S (bv 0 1))] 151 [(MOV-C-An n) 152 (set-state-C! S (extract n n (state-A S)))] 153 [(MOV-An-C n) 154 (set-state-A! S (replace-bit 8 n (state-A S) (state-C S)))] 155 [(RLC-A) 156 (let ([A (state-A S)] [C (state-C S)]) 157 (set-state-A! S (concat (extract 6 0 A) C)) 158 (set-state-C! S (extract 7 7 A)))] 159 [(RRC-A) 160 (let ([A (state-A S)] [C (state-C S)]) 161 (set-state-A! S (concat C (extract 7 1 A))) 162 (set-state-C! S (extract 0 0 A)))] 163 [(RL-A) 164 (let ([A (state-A S)]) 165 (set-state-A! S (concat (extract 6 0 A) (extract 7 7 A))))] 166 [(RR-A) 167 (let ([A (state-A S)]) 168 (set-state-A! S (concat (extract 0 0 A) (extract 7 1 A))))] 169 )) 170 171 ; program verifier 172 (define (verify-prog prog asserts) 173 (define S (make-state)) 174 (define S* (make-state)) 175 (define solution 176 (verify 177 #:guarantee 178 (begin 179 (for-each (curry run-insn S*) prog) 180 (asserts S S*)))) 181 (if (unsat? solution) #t 182 (begin 183 (displayln (evaluate S solution)) 184 (displayln (evaluate S* solution)) 185 #f))) 186 187 ; program synthesizer 188 (define (synthesize-prog sketch asserts) 189 (define S (make-state)) 190 (define S* (make-state)) 191 (define solution 192 (synthesize 193 #:forall S 194 #:guarantee 195 (begin 196 (for-each (lambda (insn) (run-insn S* insn)) sketch) 197 (asserts S S*)))) 198 (if (unsat? solution) #f 199 (evaluate sketch solution))) 200 201 (define (optimize-prog max-fuel sketch-gen asserts) 202 (define (worker fuel) 203 (define prog (synthesize-prog (sketch-gen fuel) asserts)) 204 (if prog 205 (begin 206 (eprintf "sat! ~s~n" fuel) 207 (for-each print-insn prog)) 208 (begin 209 (eprintf "unsat! ~s~n" fuel) 210 (if (>= fuel max-fuel) #f 211 (worker (+ fuel 1)))))) 212 (worker 0)) 213 214 (define (optimize-prog/parallel max-fuel sketch-gen asserts) 215 (define solved (box #f)) 216 (define solved-fuel (box 1000)) 217 (define threads (box '())) 218 (define report-sema (make-semaphore 1)) 219 (define (worker fuel) 220 (cond 221 [(or (not (unbox solved)) (< fuel (unbox solved-fuel))) 222 (define prog (synthesize-prog (sketch-gen fuel) asserts)) 223 (call-with-semaphore/enable-break report-sema 224 (lambda () 225 (if prog 226 (begin 227 (eprintf "sat! ~s~n" fuel) 228 (for-each (lambda (thd-fuel) 229 (if (> (cdr thd-fuel) fuel) 230 (break-thread (car thd-fuel)) 231 (void))) (unbox threads)) 232 (if (or (not (unbox solved)) (< fuel (unbox solved-fuel))) 233 (begin 234 (set-box! solved-fuel fuel) 235 (set-box! solved prog)) 236 (void))) 237 (eprintf "unsat! ~s~n" fuel))))])) 238 (define core-sema (make-semaphore (processor-count))) 239 (for ([fuel (in-range (add1 max-fuel))]) 240 (semaphore-wait core-sema) 241 (define thd 242 (thread (lambda () 243 (with-handlers ([exn:break? (lambda (x) (void))]) 244 (worker fuel)) 245 (semaphore-post core-sema)))) 246 (set-box! threads (cons (cons thd fuel) (unbox threads)))) 247 (for-each (lambda (thd-fuel) (thread-wait (car thd-fuel))) (unbox threads)) 248 (if (not (unbox solved)) (void) 249 (begin 250 (for-each print-insn (unbox solved)) 251 (flush-output)))) 252 253 (define (assert-preserve S S* . regs) 254 (define (assert-preserve-reg n) 255 (assert (bveq (state-Rn-ref S n) (state-Rn-ref S* n)))) 256 (for-each assert-preserve-reg regs)) 257 258 ; examples 259 (define (optimize-8b-rotate-right n) 260 (optimize-prog/parallel 261 4 262 (lambda (fuel) (??prog fuel)) 263 (lambda (S S*) 264 (assert (bveq (rotate-right 8 n (state-R0 S)) (state-R0 S*))) 265 (assert-preserve S S* 1 2 3 4 5 6 7)))) 266 (for ([n (in-range 8)]) 267 (time 268 (printf "; rotate right R0 by ~a~n" n) (flush-output) 269 (optimize-8b-rotate-right n) 270 (printf "; "))) 271 272 (define (optimize-16b-rotate-right n) 273 (optimize-prog/parallel 274 20 275 (lambda (fuel) 276 (if (= fuel 0) '() 277 (append 278 ; help the synthesizer out a bit 279 (list (MOV-A-Rn (choose* 0 1))) 280 (??prog fuel) 281 (list (MOV-Rn-A (choose* 0 1)))))) 282 (lambda (S S*) 283 (define R10 (concat (state-R1 S ) (state-R0 S ))) 284 (define R10* (concat (state-R1 S*) (state-R0 S*))) 285 (assert (bveq (rotate-right 16 n R10) R10*)) 286 (assert-preserve S S* 2 3 4 5 6 7)))) 287 (for ([n (in-range 16)]) 288 (time 289 (printf "; rotate right R1:R0 by ~a~n" n) (flush-output) 290 (optimize-16b-rotate-right n) 291 (printf "; "))) Results Generating the optimal 8-bit and 16-bit rotates took about half a day on a modern laptop (Dell XPS13 9360, using all cores and with mitigations disabled). Because of that I have not attempted generating wider ones so far. 8-bit rotates The following code lists all optimal 8-bit rotates, by 0 to 7 bits. rot8.asm (download) 1 ; rotate right R0 by 0 2 ; rotate right R0 by 1 3 MOV A, R0 4 RR A 5 MOV R0, A 6 ; rotate right R0 by 2 7 MOV A, R0 8 RR A 9 RR A 10 MOV R0, A 11 ; rotate right R0 by 3 12 XCH A, R0 13 RL A 14 SWAP A 15 MOV R0, A 16 ; rotate right R0 by 4 17 MOV A, R0 18 SWAP A 19 MOV R0, A 20 ; rotate right R0 by 5 21 MOV A, R0 22 SWAP A 23 RR A 24 XCH A, R0 25 ; rotate right R0 by 6 26 MOV A, R0 27 RL A 28 RL A 29 MOV R0, A 30 ; rotate right R0 by 7 31 MOV A, R0 32 RL A 33 MOV R0, A 16-bit rotates The following code lists all optimal 16-bit rotates, by 0 to 15 bits. I find the approach the solver used for the rotate by 10 nothing short of brilliant, and the approach it took for rotate by 3/5/11/13 pretty neat as well. rot16.asm (download) 1 ; rotate right R1:R0 by 0 2 ; rotate right R1:R0 by 1 3 MOV A, R1 4 MOV C, ACC.0 5 XCH A, R0 6 RRC A 7 XCH A, R0 8 RRC A 9 MOV R1, A 10 ; rotate right R1:R0 by 2 11 MOV A, R1 12 MOV C, ACC.0 13 XCH A, R0 14 RRC A 15 XCH A, R0 16 RRC A 17 MOV C, ACC.0 18 XCH A, R0 19 RRC A 20 XCH A, R0 21 RRC A 22 MOV R1, A 23 ; rotate right R1:R0 by 3 24 MOV A, R0 25 XRL A, R1 26 MOV R1, A 27 ANL A, #0xf8 28 XRL A, R0 29 SWAP A 30 RL A 31 XCH A, R1 32 SWAP A 33 RL A 34 XRL A, R1 35 MOV R0, A 36 ; rotate right R1:R0 by 4 37 MOV A, R0 38 XRL A, R1 39 ANL A, #0xf0 40 XCH A, R1 41 XRL A, R1 42 SWAP A 43 XCH A, R0 44 XRL A, R1 45 SWAP A 46 MOV R1, A 47 ; rotate right R1:R0 by 5 48 MOV A, R1 49 XRL A, R0 50 ANL A, #0x1f 51 XCH A, R0 52 XRL A, R0 53 RR A 54 SWAP A 55 XCH A, R0 56 XRL A, R1 57 SWAP A 58 RR A 59 MOV R1, A 60 ; rotate right R1:R0 by 6 61 MOV A, R1 62 RLC A 63 XCH A, R0 64 RLC A 65 XCH A, R1 66 RLC A 67 MOV C, ACC.7 68 XCH A, R1 69 RLC A 70 XCH A, R1 71 RLC A 72 MOV R0, A 73 ; rotate right R1:R0 by 7 74 MOV A, R0 75 MOV C, ACC.7 76 MOV A, R1 77 RLC A 78 XCH A, R0 79 RLC A 80 MOV R1, A 81 ; rotate right R1:R0 by 8 82 MOV A, R1 83 XCH A, R0 84 MOV R1, A 85 ; rotate right R1:R0 by 9 86 MOV A, R1 87 RRC A 88 MOV A, R0 89 RRC A 90 XCH A, R1 91 RRC A 92 MOV R0, A 93 ; rotate right R1:R0 by 10 94 MOV A, R1 95 XRL A, R0 96 ANL A, #0x3 97 XRL A, R1 98 RR A 99 RR A 100 XCH A, R0 101 XRL A, R1 102 RR A 103 RR A 104 XRL A, R0 105 MOV R1, A 106 ; rotate right R1:R0 by 11 107 MOV A, R1 108 XRL A, R0 109 MOV R1, A 110 ANL A, #0x7 111 XRL A, R0 112 RL A 113 SWAP A 114 XCH A, R1 115 RL A 116 SWAP A 117 XRL A, R1 118 MOV R0, A 119 ; rotate right R1:R0 by 12 120 MOV A, R0 121 XRL A, R1 122 ANL A, #0xf0 123 XCH A, R1 124 XRL A, R1 125 SWAP A 126 XCH A, R1 127 XRL A, R0 128 SWAP A 129 MOV R0, A 130 ; rotate right R1:R0 by 13 131 MOV A, R1 132 XRL A, R0 133 ANL A, #0xe0 134 XCH A, R0 135 XRL A, R0 136 SWAP A 137 RR A 138 XCH A, R0 139 XRL A, R1 140 RR A 141 SWAP A 142 MOV R1, A 143 ; rotate right R1:R0 by 14 144 MOV A, R1 145 MOV C, ACC.7 146 XCH A, R0 147 RLC A 148 XCH A, R0 149 RLC A 150 MOV C, ACC.7 151 XCH A, R0 152 RLC A 153 XCH A, R0 154 RLC A 155 MOV R1, A 156 ; rotate right R1:R0 by 15 157 MOV A, R1 158 MOV C, ACC.7 159 XCH A, R0 160 RLC A 161 XCH A, R0 162 RLC A 163 MOV R1, A --------------------------------------------------------------------- Want to discuss this note? Drop me a letter. Drafts whitequark 0 2014-2018 CC0