https://sanket.tech/posts/invert_binary_tree_assembly/ * Home * Posts * About * * * * Previous post Next post Back to top Share post Home * * * * * * * * * * Some ground rules * Pseudocode * Basic setup + Function prologue and epilogue + Stack alignment * Base case * Struct alignment + Storing local variables on the stack * Do the recursive calls * Submitting to leetcode * Conclusion * HN Front page update Inverting a binary tree using x64 assembly Sanket Shanbhag Nov 13 2022 While browsing twitter, I came across this tweet: Tweet asking you to invert merkle tree Challenge accepted Some ground rules The task is to invert a binary tree. A binary tree node is defined as follows: struct TreeNode{ int val; struct TreeNode * left; struct Treenode * right; }; This definition is from Leetcode 226. After we are done, we can submit our code there to see if it works. I will be using x64 assembly with the AT&T syntax as it is objectively superior than the Intel syntax. Technically the tweet asks to invert a merkle tree, however you can easily extend the final code to recompute the hash at each node after its two subtrees are inverted. Pseudocode The function we need to implement is as follows: struct TreeNode * invertTree(struct TreeNode * root); We will begin by writing some pseudocode 1 function invert(root) { 2 let local_root = root; 3 if local_root is null { 4 return null 5 } 6 7 let local_left = local_root.left; 8 let local_right = local_root.right; 9 10 root.right = invert(local_left); 11 root.left = invert(local_right); 12 13 return local_root; 14 } This pseudocode is intentionally verbose so we can easily track all local variables we have to create and store on the stack. We will be translating this pseudocode as it is to assembly. Basic setup 1 .globl invert_tree 2 .type invert_tree, @function 3 .section .text 4 invert_tree: 5 # Our function goes here Lines 1 and 2 tell the assembler that we are defining an invert_tree symbol of type function. The globl modifier allows our function to be called from other source files. The .text section is the Code segment where we will be writing our function. Function prologue and epilogue To define this function in assembly, we first need to understand the x86-64 calling convention. This is documented in the System V ABI available here. 1 invert_tree: 2 # Function prologue 3 push %rbp 4 mov %rsp, %rbp 5 sub $32, %rsp 6 7 # Function epilogue 8 leave 9 ret First, we push the current value of %rbp on the stack and copy %rsp to %rbp. We then subtract 32 bytes from the stack pointer to make space to store our local variables. Doing this sets up a stack frame for this function, with %rbp as the "base". We can now use offsets from %rbp to access data on the stack. There is an enter instruction which sets up the stack frame for us, but it is very slow. On modern processors enter decodes to some 10 to 20 uops, while the three instruction sequence is about 4 to 6, depending on the architecture. Source The leave instruction does exactly the opposite of lines 3 and 4. It restores %rsp and pops the value from stack into %rbp. Stack alignment In our pseudocode, there are 3 local variables local_root, local_left and local_right. Each of these is a pointer type, thus they require a total of 24 bytes to be stored on the stack (A pointer type has size 8 bytes in x64). However, we instead make space for 32 bytes to ensure our stack is "aligned" correctly. From the System V ABI document: "The stack needs to be 16 byte aligned immediately before any call instruction is executed." Base case The function accepts a single pointer as input, thus according to the calling convention, we will be receiving it as a 64 bit value in the register %rdi. To return a pointer from our function, we have to store it in the register %rax before we ret from our function. We check if the root is NULL, and if it is, we return NULL. 1 invert_tree: 2 # Function prologue 3 push %rbp 4 mov %rsp, %rbp 5 sub $32, %rsp 6 7 # --------- Base case --------- 8 # %rdi contains the input parameter (8 byte pointer to the root) 9 cmp $0, %rdi 10 # Root not NULL, continue with execution 11 jne continue 12 # Root is NULL, store NULL in %rax and return 13 xor %rax, %rax 14 jmp done 15 continue: 16 17 done: 18 # Function epilogue 19 leave 20 ret In x64, NULL is the value 0. We compare %rdi with 0 and if it succeeds, we put 0 in %rax and return from our function. To do this, using xor is faster than mov or sub. Struct alignment On most ISA's each data type in a struct has an alignment requirement. What this means is that the address of a type must start on an address which is a multiple of their size. (There are some exceptions, but this is the general rule) So int's must start on an address which is a multiple of 4, and pointers must start on an address which is a multiple of 8. Given this, our original struct is laid out in memory something like so: struct TreeNode{ int val; char pad[4]; // 4 extra bytes to align the next pointer struct TreeNode * left; struct Treenode * right; }; This wastes 4 bytes for every struct in memory. We can save this space by declaring the pointers first, however we will continue to use the original struct declaration as we want to submit this to Leetcode. Storing local variables on the stack Once we know the address offsets in our struct, we can then store them on the stack. We will store local_root, left and right pointers on the stack. Each of them has a size of 8 bytes. We store them as follows Variable location start location end local_root %rbp - 8 %rbp local_left %rbp - 16 %rbp - 8 local_right %rbp - 24 %rbp - 16 Also, to make referring to them easier, we can define these offsets as constants using the equ assembler directive. 1 # ... 2 .section .text 3 .equ local_root, -8 4 .equ local_left, -16 5 .equ local_right, -24 6 7 invert_tree: 8 # ... function here We can then use them in our code. 1 invert_tree: 2 # ... 3 continue: 4 # --------- Save local variables to stack --------- 5 # Save the root parameter as local_root on stack 6 movq %rdi, local_root(%rbp) 7 8 # Advance 8 bytes from root, to get left pointer in struct. 9 movq 8(%rdi), %rdx 10 # Store this as local_left to stack 11 movq %rdx, local_left(%rbp) 12 13 # Advance 16 bytes from root, to get to right pointer in struct. 14 movq 16(%rdi), %rdx 15 # Store this as local_right to stack 16 movq %rdx, local_right(%rbp) 17 18 done: 19 # ... Do the recursive calls At this point, looking at the pseudocode, we simply need to perform 2 recursive calls. To do this, we store the correct parameter to %rdi and use the call instruction. 1 invert_tree: 2 # Function prologue 3 push %rbp 4 mov %rsp, %rbp 5 sub $32, %rsp 6 7 # --------- Base case --------- 8 # %rdi contains the input parameter (8 byte pointer to the root) 9 cmp $0, %rdi 10 # Root not NULL, continue with execution 11 jne continue 12 # Root is NULL, store NULL in %rax and return 13 xor %rax, %rax 14 jmp done 15 continue: 16 # --------- Save local variables to stack --------- 17 # Save the root parameter as local_root on stack 18 movq %rdi, local_root(%rbp) 19 20 # Advance 8 bytes from root, to get left pointer in struct. 21 movq 8(%rdi), %rdx 22 # Store this as local_left to stack 23 movq %rdx, local_left(%rbp) 24 25 # Advance 16 bytes from root, to get to right pointer in struct. 26 movq 16(%rdi), %rdx 27 # Store this as local_right to stack 28 movq %rdx, local_right(%rbp) 29 30 # --------- Recursive calls --------- 31 # Recursively call invert_tree on local_right 32 movq local_right(%rbp), %rdi 33 call invert_tree 34 35 # Calculate address of root.left 36 movq local_root(%rbp), %rdx 37 addq $8, %rdx 38 # Assign root.left to value returned from recursive call 39 movq %rax, (%rdx) 40 41 # Recursively call invert_tree on local_left 42 movq local_left(%rbp), %rdi 43 call invert_tree 44 45 # Calculate address of root.right 46 movq local_root(%rbp), %rdx 47 addq $16, %rdx 48 # Assign root.right to value returned from recursive call 49 movq %rax, (%rdx) 50 51 # Return root 52 movq local_root(%rbp), %rax 53 54 done: 55 # Function epilogue 56 leave 57 ret To assign the results of the recursive calls, we use the %rdx register to compute the correct addresses of root.left and root.right. Most of this is a straight translation of the pseudocode. Submitting to leetcode We cannot submit assembly to Leetcode, so we will use C instead. Leetcode uses gcc to compile our code, so we can use the __asm__ directive to add our assembly. We need to also declare our function with extern before we can call it. 1 typedef struct TreeNode TreeNode; 2 extern TreeNode* invert_tree(TreeNode*); 3 4 __asm__(".global invert_tree \n\t" 5 ".type invert_tree, @function \n\t" 6 ".section .text \n\t" 7 // Rest of our code here... 8 ); 9 10 // Leetcode func 11 TreeNode* invertTree(TreeNode* root){ 12 return invert_tree(root); 13 } And we are done! [leetcode_a] The leetcode performance and memory numbers are very unreliable, so we don't really know how much faster our program is compared to other submissions, but I suspect it is pretty fast. [leetcode_s] Conclusion This was a fun excursion to brush up on my assembly skills. It wasn't difficult, but definitely was tedious (This is generally how I feel about most leetcode style questions). Max Howell rejected from Google At least now I can relate to more memes on r/ProgrammerHumor Max Howell rejected from Google HN Front page update Since a lot of people from HN and reddit are here, time for a shameless plug: I am actively looking for a python/rust/quant job. Here's my Github Email and LinkedIn. Thanks for reading. Happy Thanksgiving! * Home * Posts * About * Some ground rules * Pseudocode * Basic setup + Function prologue and epilogue + Stack alignment * Base case * Struct alignment + Storing local variables on the stack * Do the recursive calls * Submitting to leetcode * Conclusion * HN Front page update * * * * * * * * * Menu TOC share Top Copyright (c) 2022 Sanket Shanbhag * Home * Posts * About