https://sanket.tech/posts/invert_binary_tree_assembly/

   

  * Home
  * Posts
  * About


  *  
  *  
  *  
  *  

Previous post Next post Back to top Share post Home

  *  
  *  
  *  
  *  
  *  
  *  
  *  
  *  
  *  

  * Some ground rules
  * Pseudocode
  * Basic setup
      + Function prologue and epilogue
      + Stack alignment
  * Base case
  * Struct alignment
      + Storing local variables on the stack
  * Do the recursive calls
  * Submitting to leetcode
  * Conclusion
  * HN Front page update

Inverting a binary tree using x64 assembly

Sanket Shanbhag
Nov 13 2022

While browsing twitter, I came across this tweet:

Tweet asking you to invert merkle tree

Challenge accepted

Some ground rules

The task is to invert a binary tree. A binary tree node is defined as
follows:

struct TreeNode{
    int val;
    struct TreeNode * left;
    struct Treenode * right;
};

This definition is from Leetcode 226. After we are done, we can
submit our code there to see if it works.

I will be using x64 assembly with the AT&T syntax as it is
objectively superior than the Intel syntax.

Technically the tweet asks to invert a merkle tree, however you can
easily extend the final code to recompute the hash at each node after
its two subtrees are inverted.

Pseudocode

The function we need to implement is as follows:

struct TreeNode * invertTree(struct TreeNode * root);

We will begin by writing some pseudocode

 1 function invert(root) {
 2     let local_root = root;
 3     if local_root is null {
 4         return null
 5     }
 6 
 7     let local_left = local_root.left;
 8     let local_right = local_root.right;
 9 
10     root.right = invert(local_left);
11     root.left = invert(local_right);
12 
13     return local_root;
14 }

This pseudocode is intentionally verbose so we can easily track all
local variables we have to create and store on the stack. We will be
translating this pseudocode as it is to assembly.

Basic setup

1 .globl invert_tree
2 .type invert_tree, @function
3 .section .text
4 invert_tree:
5     # Our function goes here

Lines 1 and 2 tell the assembler that we are defining an invert_tree
symbol of type function. The globl modifier allows our function to be
called from other source files. The .text section is the Code segment
where we will be writing our function.

Function prologue and epilogue

To define this function in assembly, we first need to understand the
x86-64 calling convention. This is documented in the System V ABI
available here.

1 invert_tree:
2     # Function prologue
3     push %rbp
4     mov %rsp, %rbp
5     sub $32, %rsp
6 
7     # Function epilogue
8     leave
9     ret

First, we push the current value of %rbp on the stack and copy %rsp
to %rbp. We then subtract 32 bytes from the stack pointer to make
space to store our local variables.

Doing this sets up a stack frame for this function, with %rbp as the
"base". We can now use offsets from %rbp to access data on the stack.

There is an enter instruction which sets up the stack frame for us,
but it is very slow. On modern processors enter decodes to some 10 to
20 uops, while the three instruction sequence is about 4 to 6,
depending on the architecture. Source

The leave instruction does exactly the opposite of lines 3 and 4. It
restores %rsp and pops the value from stack into %rbp.

Stack alignment

In our pseudocode, there are 3 local variables local_root, local_left
and local_right. Each of these is a pointer type, thus they require a
total of 24 bytes to be stored on the stack (A pointer type has size
8 bytes in x64).

However, we instead make space for 32 bytes to ensure our stack is
"aligned" correctly. From the System V ABI document:

"The stack needs to be 16 byte aligned immediately before any call
instruction is executed."

Base case

The function accepts a single pointer as input, thus according to the
calling convention, we will be receiving it as a 64 bit value in the
register %rdi.

To return a pointer from our function, we have to store it in the
register %rax before we ret from our function.

We check if the root is NULL, and if it is, we return NULL.

 1 invert_tree:
 2     # Function prologue
 3     push %rbp
 4     mov %rsp, %rbp
 5     sub $32, %rsp
 6 
 7     # --------- Base case ---------
 8     # %rdi contains the input parameter (8 byte pointer to the root)
 9     cmp $0, %rdi
10     # Root not NULL, continue with execution
11     jne continue
12     # Root is NULL, store NULL in %rax and return
13     xor %rax, %rax
14     jmp done
15 continue:
16 
17 done:
18     # Function epilogue
19     leave
20     ret

In x64, NULL is the value 0. We compare %rdi with 0 and if it
succeeds, we put 0 in %rax and return from our function. To do this,
using xor is faster than mov or sub.

Struct alignment

On most ISA's each data type in a struct has an alignment
requirement. What this means is that the address of a type must start
on an address which is a multiple of their size. (There are some
exceptions, but this is the general rule)

So int's must start on an address which is a multiple of 4, and
pointers must start on an address which is a multiple of 8.

Given this, our original struct is laid out in memory something like
so:

struct TreeNode{
    int val;
    char pad[4];  // 4 extra bytes to align the next pointer
    struct TreeNode * left;
    struct Treenode * right;
};

This wastes 4 bytes for every struct in memory. We can save this
space by declaring the pointers first, however we will continue to
use the original struct declaration as we want to submit this to
Leetcode.

Storing local variables on the stack

Once we know the address offsets in our struct, we can then store
them on the stack. We will store local_root, left and right pointers
on the stack. Each of them has a size of 8 bytes. We store them as
follows

                Variable   location start location end
               local_root  %rbp - 8       %rbp
               local_left  %rbp - 16      %rbp - 8
               local_right %rbp - 24      %rbp - 16

Also, to make referring to them easier, we can define these offsets
as constants using the equ assembler directive.

1 # ...
2 .section .text
3 .equ local_root, -8
4 .equ local_left, -16
5 .equ local_right, -24
6 
7 invert_tree:
8 # ... function here

We can then use them in our code.

 1 invert_tree:
 2     # ...
 3 continue:
 4     # --------- Save local variables to stack ---------
 5     # Save the root parameter as local_root on stack
 6     movq %rdi, local_root(%rbp)
 7 
 8     # Advance 8 bytes from root, to get left pointer in struct.
 9     movq 8(%rdi), %rdx
10     # Store this as local_left to stack
11     movq %rdx, local_left(%rbp)
12 
13     # Advance 16 bytes from root, to get to right pointer in struct.
14     movq 16(%rdi), %rdx
15     # Store this as local_right to stack
16     movq %rdx, local_right(%rbp)
17 
18 done:
19     # ...

Do the recursive calls

At this point, looking at the pseudocode, we simply need to perform 2
recursive calls.

To do this, we store the correct parameter to %rdi and use the call
instruction.

 1 invert_tree:
 2     # Function prologue
 3     push %rbp
 4     mov %rsp, %rbp
 5     sub $32, %rsp
 6 
 7     # --------- Base case ---------
 8     # %rdi contains the input parameter (8 byte pointer to the root)
 9     cmp $0, %rdi
10     # Root not NULL, continue with execution
11     jne continue
12     # Root is NULL, store NULL in %rax and return
13     xor %rax, %rax
14     jmp done
15 continue:
16     # --------- Save local variables to stack ---------
17     # Save the root parameter as local_root on stack
18     movq %rdi, local_root(%rbp)
19 
20     # Advance 8 bytes from root, to get left pointer in struct.
21     movq 8(%rdi), %rdx
22     # Store this as local_left to stack
23     movq %rdx, local_left(%rbp)
24 
25     # Advance 16 bytes from root, to get to right pointer in struct.
26     movq 16(%rdi), %rdx
27     # Store this as local_right to stack
28     movq %rdx, local_right(%rbp)
29 
30     # --------- Recursive calls ---------
31     # Recursively call invert_tree on local_right
32     movq local_right(%rbp), %rdi
33     call invert_tree
34 
35     # Calculate address of root.left
36     movq local_root(%rbp), %rdx
37     addq $8, %rdx
38     # Assign root.left to value returned from recursive call
39     movq %rax, (%rdx)
40 
41     # Recursively call invert_tree on local_left
42     movq local_left(%rbp), %rdi
43     call invert_tree
44 
45     # Calculate address of root.right
46     movq local_root(%rbp), %rdx
47     addq $16, %rdx
48     # Assign root.right to value returned from recursive call
49     movq %rax, (%rdx)
50 
51     # Return root
52     movq local_root(%rbp), %rax
53 
54 done:
55     # Function epilogue
56     leave
57     ret

To assign the results of the recursive calls, we use the %rdx
register to compute the correct addresses of root.left and
root.right. Most of this is a straight translation of the pseudocode.

Submitting to leetcode

We cannot submit assembly to Leetcode, so we will use C instead.
Leetcode uses gcc to compile our code, so we can use the __asm__
directive to add our assembly.

We need to also declare our function with extern before we can call
it.

 1 typedef struct TreeNode TreeNode;
 2 extern TreeNode* invert_tree(TreeNode*);
 3 
 4 __asm__(".global invert_tree  \n\t"
 5 ".type invert_tree, @function \n\t"
 6 ".section .text               \n\t"
 7 // Rest of our code here...
 8 );
 9 
10 // Leetcode func
11 TreeNode* invertTree(TreeNode* root){
12     return invert_tree(root);
13 }

And we are done! [leetcode_a]

The leetcode performance and memory numbers are very unreliable, so
we don't really know how much faster our program is compared to other
submissions, but I suspect it is pretty fast. [leetcode_s]

Conclusion

This was a fun excursion to brush up on my assembly skills. It wasn't
difficult, but definitely was tedious (This is generally how I feel
about most leetcode style questions).

Max Howell rejected from Google

At least now I can relate to more memes on r/ProgrammerHumor

Max Howell rejected from Google

HN Front page update

Since a lot of people from HN and reddit are here, time for a
shameless plug:

I am actively looking for a python/rust/quant job. Here's my Github
Email and LinkedIn.

Thanks for reading. Happy Thanksgiving!

  * Home
  * Posts
  * About

  * Some ground rules
  * Pseudocode
  * Basic setup
      + Function prologue and epilogue
      + Stack alignment
  * Base case
  * Struct alignment
      + Storing local variables on the stack
  * Do the recursive calls
  * Submitting to leetcode
  * Conclusion
  * HN Front page update

  *  
  *  
  *  
  *  
  *  
  *  
  *  
  *  
  *  

Menu TOC share Top
Copyright (c) 2022 Sanket Shanbhag

  * Home
  * Posts
  * About