[HN Gopher] Inverting a binary tree using x64 assembly
___________________________________________________________________
Inverting a binary tree using x64 assembly
Author : teknas
Score : 120 points
Date : 2022-11-24 15:05 UTC (7 hours ago)
(HTM) web link (sanket.tech)
(TXT) w3m dump (sanket.tech)
| aftbit wrote:
| What does it mean to "invert" a binary tree?
| tom_ wrote:
| "Flip" or "mirror" is probably a better term. It seems the goal
| is to swap left and right:
| https://leetcode.com/problems/invert-binary-tree/
| dekhn wrote:
| Inverting a binary tree is easy; express the tree as a matrix
| (laplacian), invert the matrix, then convert that back to a
| tree. What the canonical question is asking is not inversion.
|
| Since too many people were memorizing inversion, I switched to
| asking how to evert a binary tree. This leads naturally into a
| discussion of the 1:1 relationship between complex numbers and
| cohomology sets, I figure if somebody can get that right, they
| can be a junior programmer on the team.
| raxxorraxor wrote:
| x32 converter for GNU/Linux: sed -i 's/r//g'
| invert_tree.asm > invet_tee.asm
| junon wrote:
| x16, you mean :D
| teknas wrote:
| It's not that easy, you also have to account for the pointer
| sizes (4 bytes instead of 8). Also alignment requirements are
| different on x32.
| badrabbit wrote:
| It's not only that either. OP used rdi because of calling
| convention, in x32 you would need to arguments on the stack
| instead and reference them +ebp instead of using registers if
| you want non-asm code to call your function.
| ummonk wrote:
| This doesn't actually invert a Merkle tree though, since you have
| to recompute the hashes (except the leaf hashes) when you invert
| a Merkle tree. Gonna be a no-hire evaluation from me dawg.
| hinkley wrote:
| The tweet this is based on is a joke. To invert a Merkle tree
| would mean to invert cause and effect. I'm pretty sure the
| tweet author is implying they want you to find a hash key
| collision for each node. Hope you have a couple spare universes
| in your pockets because this is gonna take a while.
| [deleted]
| danbruc wrote:
| Assume RAX points to the root node and nodes just contain two
| child pointers and everything is aligned and whatever.
| :invert cmp rax, 0 jnz swap: ret :swap
| push rax mov rax, [rax] call invert mov
| rbx,[rsp] xchg rax, [rbx+8] mov [rbx], rax call
| invert pop rax ret
|
| The last time I used an assembler was before x86-64 was invented,
| I am not even sure I ever used one in protected mode. But that
| seems a totally reasonable whiteboard interview question. Written
| in notepad, might not assemble. Might even be totally incorrect
| and I am posting it so that the internet generates the warning
| and error messages.
|
| EDIT: After reading the article now, that seems rather
| inefficient to me, to use local variables on the stack for
| everything. And why is the function returning a node if it is
| mutating the tree in place?
| pyinstallwoes wrote:
| Did you write Forth at all?
| danbruc wrote:
| No. Why do you ask?
|
| EDIT: As Wikipedia describes it as a stack-oriented language,
| because of my comment about putting everything onto the
| stack?
| adrian_b wrote:
| The variables on the stack are the most efficient after
| registers, so you are right that a local variable should be
| kept into a register if possible, otherwise in the stack, and
| only then in other places (e.g. if it is too large for the
| stack).
|
| However, when writing in assembly one must pay attention that
| at least RBX, RBP and R12 through R15 must be preserved by any
| function (on Windows also RDI and RSI must be preserved).
|
| So in your code you should not use RBX, but a volatile
| register, e.g. RDX or RCX. If you would insist on using RBX, it
| would have to be saved and restored.
| danbruc wrote:
| I know none of the calling conventions in any detail anymore
| and just used the registers in alphabetical order. Totally
| expected that this would violate something.
| PixelOfDeath wrote:
| mov cr0, ...
| fear91 wrote:
| You can use the 32bit xor to reset the register. Also TEST
| REG,REG might be better for checking if it's zero.
| badrabbit wrote:
| Is 32 bit cheaper? Still 1 cycle.
| fear91 wrote:
| Yes, it's 1 cycle but it's longer to decode and occupies more
| of l1i cache. It's not all about execution cycles.
| nixpulvis wrote:
| I was thinking of this from the perspective of CPU pipeline
| pressure, but in reality it seems prosessors are indeed smart
| enough to avoid burdoning the ALUs execution with these kinds
| of special cases.
|
| Read more here https://stackoverflow.com/questions/17981447/m
| icroarchitectu...
|
| > [...] these zeroing instructions extremely efficient, with
| a throughput of four zeroing instructons per clock cycle.
|
| Also, the xor instruction takes up the smallest amount of
| .text space (right?).
| scheme271 wrote:
| The xor reg, reg is also special cased because it's a quick
| way for compilers to reinitialize a register and indicate
| that future uses of that register don't depend on any
| previous operations. It helps the cpu to parallelize any
| future instructions that use that register since the cpu
| knows that those instructions don't depend on anything that
| happens before the the xor.
| Someone wrote:
| It's not special cased _because_ it 's a quick way for
| compilers to reinitialize a register and indicate that
| future uses of that register don't depend on any previous
| operations, it's special cased _to give_ compilers a
| quick way to reinitialize a register and indicate that
| future uses of that register don 't depend on any
| previous operations.
|
| https://randomascii.wordpress.com/2012/12/29/the-
| surprising-... is almost ten years old and thus likely
| dated, but still may be educational.
| bugfix-66 wrote:
| Being able to read and understand x86-64 assembly (or PTX/SASS
| for an Nvidia GPU) is much more important than being able to
| write it. In practice, even when you're writing assembly, you're
| looking at reference assembly generated by a compiler from C code
| you wrote.
|
| Similarly, the reality is that as a professional programmer you
| spend no time doing work like leetcode.
|
| Instead, you spend a lot of time understanding and slightly
| modifying (fixing or enhancing/extending) code.
|
| With the rise of language model code completion systems (e.g.,
| Microsoft Copilot) even more time will be spent inspecting and
| understanding code to find problems.
|
| With these facts in mind, I have been building a new form of
| leetcode:
|
| https://BUGFIX-66.com
|
| Most puzzles are interesting algorithms that you will learn
| useful techniques from, so it's never a waste of time to think
| about them. And even though the bugs are all quite trivial, I can
| see it's very challenging for many people.
|
| It's about half-way ready to launch, needing 30 more puzzles. I
| am working my way through Knuth's The Art of Programming Volume
| 4B and today I'll see if Algorithm X (Dancing Links Exact Cover
| Backtracking) can be made to fit for Bugs 38 and 39 (or whether
| it's too complicated).
| love2read wrote:
| I love seeing people solve leetcode challenges in asm, are there
| any more blogposts like this?
| chrisseaton wrote:
| The hard bit of solving them is usually the algorithm though -
| when you know that you can code it in anything.
| kjeetgill wrote:
| I'm not well versed in assembly, so learning assembly first
| would be the hard part!
|
| I'm with GP, it's fun seeing how solutions differ between
| languages as a way to peek into other language communities I
| don't spend as much time in.
| devit wrote:
| That solution is terrible, with a bad algorithm that requires
| O(tree_height) space (the optimal one involves temporarily using
| left/right pointers as a parent pointer so that you only need
| constant space) and lacking any sort of assembly optimization,
| being worse than what a compiler would produce (e.g. it's a real
| mystery how the author managed to decide that local_right should
| be spilled on the stack).
|
| Definitely not what you want to submit to someone testing your
| programming skills.
| Kranar wrote:
| You're alluding to using the Morris traversal algorithm which
| can traverse a binary tree in O(1) space, but Morris traversal
| is actually much much slower than using a stack, especially as
| is used by this algorithm. Doing a Morris traversal requires at
| a minimum twice the number of operations as using a stack, and
| due to its cache unfriendly nature will in practice be closer
| to 4x slower.
|
| You typically only use Morris traversal on exceptionally large
| trees, and by large I mean when working with data that lives on
| a disk. It's definitely the exception, not the norm.
| [deleted]
| TheRealPomax wrote:
| This is pretty damn cool, but unfortunately you failed the
| interview as you accepted the challenge to use x86 assembler, but
| solved the problem using a different programming language from
| the one we asked you to use. We'll keep your resume on file, and
| if there are any openings in the future we encourage you to apply
| for those.
| penguin_booze wrote:
| The feedback we received indicates that you used the backspace
| key twice during coding. We expect higher precision than that.
|
| Also, we'd like to remind you that you can only re-apply after
| our cool-off period, which is 25 years.
| wheelerof4te wrote:
| Oooh. So that's why people invented C.
| cgh wrote:
| > I will be using x64 assembly with the AT&T syntax as it is
| objectively superior than the Intel syntax.
|
| This made me laugh because it must be a reference to this:
| https://news.ycombinator.com/item?id=33652023
|
| > I contend that the AT&T syntax is harmful and bad, and should
| never be used, for any reason, under any circumstances, by
| anyone.
| tux3 wrote:
| >I will be using x64 assembly with the AT&T syntax as it is
| objectively superior than the Intel syntax.
|
| Them's fighting words.
| EarlKing wrote:
| I immediately stopped reading the minute I read that in the
| text. I can't take anything they say seriously after reading
| that.
| badrabbit wrote:
| AT&T really is annoying but it feels like the big vs little
| endian debate. Fairly easy to convert between the two as well.
| efficax wrote:
| except now that everything depends on the internet, and words
| that go over networks are big endian, it seems insane to
| throw away millions and millions of cpu cycles every year
| converting them to little endian to be processed by our
| little endian cpus. sure, it's a single cpu instruction, but
| between every computer in the world, almost all of them being
| little endian arm or intel, that's billions and billions and
| billions of instructions wasted.
| sedatk wrote:
| For those who don't know, that's why big and little endian
| were called that, because the debate was so frivolous. It's a
| reference to the book Gulliver's Travels by Jonathan Swift in
| which an island folk was split about from which end you
| should crack a boiled egg. (I'm a big endian for example).
| classified wrote:
| I'm firmly in the little-endian camp for eggs, but big-
| endian for CPUs.
| monocasa wrote:
| Watch as every pitchfork gets pointed at you when you
| talk about middle endian.
|
| Which is a real thing. There are systems that would
| store, say, a 32-bit word as two 16-bit words that were
| big endian relative to each other, but little endian
| within the 16-bit word.
| sedatk wrote:
| Found my archnemesis.
| nurettin wrote:
| Do you even care about CPUs anymore?
| classified wrote:
| Well, boiled eggs definitely taste better.
| dekhn wrote:
| Wait, hasn't everybody learned about inside-out endian? The
| highest order bit is in the middle, then proceeds along a
| hilbert path to the edges
| quelsolaar wrote:
| I used to be big endian. Then i changed my mind. Have a
| look at the following:
|
| -123 -12 -1
|
| Here are three numbers. If we read them from left to right,
| they all begin with 1, but we cant tell what the 1 means,
| in one case it means 100, in one it means 10 and in one it
| means 1. We need to parse the entire number to know what
| the first number means. If we parse from right to left,
| each number always means the same thing. the first is how
| many once, the second how many 10s and so on.
|
| So it makes sense to store the smallest first. In a big
| endian architecture, if i want to read an 8, 16, 32 or 64
| bit word each byte will always mean the same, if we pad out
| the type with zeros. So little endian is right, and Arabic
| numerals are wrong.
| rags2riches wrote:
| Arabic numerals are little endian when writing right-to-
| left. Like Arabic is written.
| quietbritishjim wrote:
| It looks like the comment you're replying to was talking
| about eggs. I'm little endian by the way.
___________________________________________________________________
(page generated 2022-11-24 23:00 UTC)