https://maskray.me/blog/2023-12-17-exploring-the-section-layout-in-linker-output

MaskRay

2023-12-17

Exploring the section layout in linker output

This article describes section layout and its interaction with dynamic loaders and huge pages.

Let's begin with a Linux x86-64 example involving global variables exhibiting various properties such as read-only versus writable, zero-initialized versus non-zero, and more.

    #include <stdio.h>
    const int ro = 1;
    int w0, w1 = 1;
    int *const pw0 = &w0;
    int main() {
      printf("%d %d %d %p\n", ro, w0, w1, pw0);
    }

    % clang -c -fpie a.c
    % clang -pie -fuse-ld=lld -Wl,-z,separate-loadable-segments a.o -o a
    % objdump -wt a | grep -P 'main|w[01]|ro$'
    00000000000010f0 g F .text        000000000000002e main
    0000000000003044 g O .bss         0000000000000004 w0
    0000000000003010 g O .data        0000000000000004 w1
    000000000000058c g O .rodata      0000000000000004 ro
    0000000000002010 g O .data.rel.ro 0000000000000008 pw0
    % readelf -Wl a
    ...
    Program Headers:
      Type         Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      PHDR         0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
      INTERP       0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R   0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD         0x000000 0x0000000000000000 0x0000000000000000 0x000628 0x000628 R   0x1000
      LOAD         0x001000 0x0000000000001000 0x0000000000001000 0x000180 0x000180 R E 0x1000
      LOAD         0x002000 0x0000000000002000 0x0000000000002000 0x0001e0 0x001000 RW  0x1000
      LOAD         0x003000 0x0000000000003000 0x0000000000003000 0x000040 0x000048 RW  0x1000
      DYNAMIC      0x002018 0x0000000000002018 0x0000000000002018 0x0001a0 0x0001a0 RW  0x8
      GNU_RELRO    0x002000 0x0000000000002000 0x0000000000002000 0x0001e0 0x001000 R   0x1
      GNU_EH_FRAME 0x0005a0 0x00000000000005a0 0x00000000000005a0 0x00001c 0x00001c R   0x4
      GNU_STACK    0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
      NOTE         0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000020 0x000020 R   0x4
    ...

(We will discuss -Wl,-z,separate-loadable-segments later.)

We can see that these functions and global variables are placed in different sections.

* .rodata: read-only data without dynamic relocations, constant in the link unit
* .text: functions
* .data.rel.ro: read-only data associated with dynamic relocations, constant after relocation resolving, part of the PT_GNU_RELRO segment
* .data: writable data
* .bss: writable data known to be zeros

Section and segment layout

TODO: I may write more about how linkers lay out sections and segments.

Anyhow, the linker will place .data and .bss in the same PT_LOAD program header (segment) and the rest into different PT_LOAD segments. (There are some nuances. If you use GNU ld's -z noseparate-code or lld's --no-rosegment, .rodata and .text will be placed in the same PT_LOAD segment.)

The PT_LOAD segments have different flags (p_flags): PF_R, PF_R|PF_X, PF_R|PF_W.
Subsequently, the dynamic loader, also known as the dynamic linker, will invoke mmap to map the file into memory. The memory areas have different memory permissions corresponding to segment flags. For a PT_LOAD segment, its associated memory area starts at alignDown(p_vaddr, pagesize) and ends at alignUp(p_vaddr+p_memsz, pagesize).

    Start Addr     End Addr       Size   Offset Perms objfile
    0x555555554000 0x555555555000 0x1000 0x0    r--p  /tmp/c/a
    0x555555555000 0x555555556000 0x1000 0x1000 r-xp  /tmp/c/a
    0x555555556000 0x555555557000 0x1000 0x2000 r--p  /tmp/c/a
    0x555555557000 0x555555558000 0x1000 0x3000 rw-p  /tmp/c/a

Let's assume the page size is 4096 bytes. We'll calculate the alignDown(p_vaddr, pagesize) values and display them alongside the "Start Addr" values:

    Start Addr     alignDown(p_vaddr, pagesize)
    0x555555554000 0x0000000000000000
    0x555555555000 0x0000000000001000
    0x555555556000 0x0000000000002000
    0x555555557000 0x0000000000003000

We observe that the start address equals the base address plus alignDown(p_vaddr, pagesize).

--no-rosegment

This option asks lld to combine the read-only and the RX segments. The output file will consume less address space at run-time.

    Start Addr     End Addr       Size   Offset Perms objfile
    0x555555554000 0x555555555000 0x1000 0x0    r-xp  /tmp/c/a
    0x555555555000 0x555555556000 0x1000 0x0    r--p  /tmp/c/a
    0x555555556000 0x555555557000 0x1000 0x1000 rw-p  /tmp/c/a

MAXPAGESIZE

A page serves as the granularity at which memory exhibits different permissions, and within a page, we cannot have varying permissions. Using the previous example where p_align is 4096, if the page size is larger, for example, 65536 bytes, the program might crash.

Typically, the dynamic loader allocates memory for the first PT_LOAD segment (PF_R) at a specific address allocated by the kernel. Subsequent PT_LOAD segments then overwrite the previous memory regions.
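The overlap can be made concrete with a small sketch. Assuming the four PT_LOAD segments from the first readelf output (p_align 4096), the following Python snippet computes each memory area for a given page size and checks whether consecutive areas collide. The helper names (align_down, align_up, memory_areas, overlaps) and the 64 KiB-aligned base address are made up for illustration; this mirrors the formulas in the text rather than any real loader's code.

```python
def align_down(x, a):
    """Round x down to a multiple of a (a must be a power of two)."""
    return x & ~(a - 1)


def align_up(x, a):
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


# (p_vaddr, p_memsz) of the four PT_LOAD segments linked with p_align = 4096.
LOADS = [(0x0000, 0x628), (0x1000, 0x180), (0x2000, 0x1000), (0x3000, 0x048)]


def memory_areas(base, pagesize, loads=LOADS):
    """Return [start, end) of the memory area associated with each segment."""
    return [
        (base + align_down(vaddr, pagesize), base + align_up(vaddr + memsz, pagesize))
        for vaddr, memsz in loads
    ]


def overlaps(areas):
    """True if any area extends past the start of the following one."""
    return any(
        prev_end > next_start
        for (_, prev_end), (next_start, _) in zip(areas, areas[1:])
    )


# 0x555555554000 is the base from the map above; 0x555555550000 is a
# hypothetical base for a system with 64 KiB pages.
for pagesize, base in ((4096, 0x555555554000), (65536, 0x555555550000)):
    areas = memory_areas(base, pagesize)
    print(f"page size {pagesize}: overlapping={overlaps(areas)}")
    for start, end in areas:
        print(f"  {start:#x}-{end:#x}")
```

With a 4096-byte page size the four areas are disjoint and match the map shown earlier. With 65536, alignDown(p_vaddr, pagesize) is 0 for every segment, so all four areas start at the base address and each later mapping lands on top of the earlier ones.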
Consequently, certain code pages or significant global variables might be replaced by garbage, leading to a crash.

So, how can we create a link unit that works across different page sizes? We simply determine the maximum page size, let's say, 2097152, and then pass -z max-page-size=2097152 to the linker. The linker will set p_align values of PT_LOAD segments to MAXPAGESIZE. The program headers below have p_align 0x10000, which corresponds to -z max-page-size=65536:

    Program Headers:
      Type         Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
      PHDR         0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R   0x8
      INTERP       0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R   0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD         0x000000 0x0000000000000000 0x0000000000000000 0x000628 0x000628 R   0x10000
      LOAD         0x010000 0x0000000000010000 0x0000000000010000 0x000180 0x000180 R E 0x10000
      LOAD         0x020000 0x0000000000020000 0x0000000000020000 0x0001e0 0x001000 RW  0x10000
      LOAD         0x030000 0x0000000000030000 0x0000000000030000 0x000040 0x000048 RW  0x10000
      DYNAMIC      0x020018 0x0000000000020018 0x0000000000020018 0x0001a0 0x0001a0 RW  0x8
      GNU_RELRO    0x020000 0x0000000000020000 0x0000000000020000 0x0001e0 0x001000 R   0x1
      GNU_EH_FRAME 0x0005a0 0x00000000000005a0 0x00000000000005a0 0x00001c 0x00001c R   0x4
      GNU_STACK    0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
      NOTE         0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000020 0x000020 R   0x4

In a linker script, the max-page-size can be obtained using CONSTANT(MAXPAGESIZE).

For completeness, if you need to run a prebuilt executable on a system with a larger page size, you can modify the executable by merging PT_LOAD segments and combining their permissions. It's likely there will be a sizable RWX PT_LOAD segment, reminiscent of OMAGIC.

Over-aligned segment

It is possible to increase the p_align value of one single PT_LOAD segment using an aligned attribute.
When this value exceeds the page size, the question arises: should the kernel loader or the dynamic loader determine a suitable base address to meet this alignment requirement?

In 2020, the Linux kernel loader made the decision to align the base address according to the maximum p_align. This facilitates transparent huge pages for mapped files at the cost of reduced address randomization.

    % cat align.c
    #include <stdio.h>
    __attribute__((aligned(A))) int aligned;
    int main() { printf("%p\n", &aligned); }
    % cc -DA=4096 align.c -o align && ./align
    0x55e994c13000
    % cc -DA=2097152 align.c -o align && ./align
    0x55639a400000

Should a userspace dynamic loader do the same? If it does, a variable with an alignment greater than the page size will indeed align accordingly. As of glibc 2.35, it has followed suit.

On the other hand, the traditional interpretation dictates that a variable with an alignment greater than the page size is invalid. Most other dynamic loaders do not implement this particular logic, which has some overhead.

-z separate-loadable-segments

In previous examples using -z separate-loadable-segments, the p_vaddr values of PT_LOAD segments are multiples of MAXPAGESIZE. The generic ABI says "loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size."

    p_offset - This member gives the offset from the beginning of the file at which the first byte of the segment resides.

    p_vaddr - This member gives the virtual address at which the first byte of the segment resides in memory.

This requirement is consistent with the mmap documentation. For example, Linux man-pages specifies, "offset must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE)."

The p_offset values are also multiples of MAXPAGESIZE. After laying out a PT_LOAD segment, the linker must pad the end by inserting zeros so that the next PT_LOAD segment starts at a multiple of MAXPAGESIZE.

However, the alignment padding is wasteful.
Fortunately, we can link a.o using different MAXPAGESIZE and different alignment settings: -z noseparate-code, -z separate-code, -z separate-loadable-segments.

    clang -pie -fuse-ld=lld -Wl,-z,noseparate-code a.o -o a0.4096
    clang -pie -fuse-ld=lld -Wl,-z,noseparate-code,-z,max-page-size=65536 a.o -o a0.65536
    clang -pie -fuse-ld=lld -Wl,-z,noseparate-code,-z,max-page-size=2097152 a.o -o a0.2097152

    clang -pie -fuse-ld=lld -Wl,-z,separate-code a.o -o a1.4096
    clang -pie -fuse-ld=lld -Wl,-z,separate-code,-z,max-page-size=65536 a.o -o a1.65536
    clang -pie -fuse-ld=lld -Wl,-z,separate-code,-z,max-page-size=2097152 a.o -o a1.2097152

    clang -pie -fuse-ld=lld -Wl,-z,separate-loadable-segments a.o -o a2.4096
    clang -pie -fuse-ld=lld -Wl,-z,separate-loadable-segments,-z,max-page-size=65536 a.o -o a2.65536
    clang -pie -fuse-ld=lld -Wl,-z,separate-loadable-segments,-z,max-page-size=2097152 a.o -o a2.2097152

    % stat -c %s a0.4096 a0.65536 a0.2097152
    6168
    6168
    6168
    % stat -c %s a1.4096 a1.65536 a1.2097152
    12392
    135272
    4198504
    % stat -c %s a2.4096 a2.65536 a2.2097152
    16120
    200440
    6295288

We can derive two properties:

* Under one MAXPAGESIZE, we have size(noseparate-code) < size(separate-code) < size(separate-loadable-segments).
* For -z noseparate-code, increasing MAXPAGESIZE does not change the output size.

AArch64 and PowerPC64 have a default MAXPAGESIZE of 65536. Staying with the -z noseparate-code default ensures that they will not experience unnecessary size increase.

-z noseparate-code

How does -z noseparate-code work? Let's illustrate this with an example.

At the end of the read-only PT_LOAD segment, the address is 0x628. Instead of starting the next segment at alignUp(0x628, MAXPAGESIZE) = 0x1000, we start at alignUp(0x628, MAXPAGESIZE) + 0x628 % MAXPAGESIZE = 0x1628. Since the .text section has an alignment (sh_addralign) of 16, we start at 0x1630.
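The arithmetic above can be written out as a short sketch. The function name next_segment_start is hypothetical, and the snippet only mirrors the computation described in this paragraph; it is not lld's actual implementation and ignores edge cases.

```python
def align_up(x, a):
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def next_segment_start(prev_end, maxpagesize, sh_addralign):
    """Address at which the next PT_LOAD segment starts under -z noseparate-code.

    The address moves to the next page but keeps prev_end's offset within
    the page, then is rounded up to the first section's alignment.
    """
    addr = align_up(prev_end, maxpagesize) + prev_end % maxpagesize
    return align_up(addr, sh_addralign)


# End of the read-only segment is 0x628, MAXPAGESIZE is 0x1000,
# and .text has sh_addralign 16.
print(hex(next_segment_start(0x628, 0x1000, 16)))  # 0x1630
```

Because only the page number changes, the low bits of the address stay close to the low bits of the previous segment's end.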
Although the address is advanced beyond necessity, the file offset (congruent to the address, modulo MAXPAGESIZE) can be decreased to 0x630, merely 8 bytes (due to alignment padding) after the previous section's end.

Moving forward, the end of the executable PT_LOAD segment has an address of 0x17b0. Instead of starting the next segment at alignUp(0x17b0, MAXPAGESIZE) = 0x2000, we start at alignUp(0x17b0, MAXPAGESIZE) + 0x17b0 % MAXPAGESIZE = 0x27b0. While we advance the address more than needed, the file offset can be decreased to 0x7b0, precisely at the previous section's end.

    % readelf -WSl a0.4096
    ...
      [Nr] Name           Type       Address          Off    Size   ES Flg Lk Inf Al
      [ 0]                NULL       0000000000000000 000000 000000 00      0   0  0
      [ 1] .interp        PROGBITS   00000000000002a8 0002a8 00001c 00   A  0   0  1
    ...
      [12] .eh_frame      PROGBITS   00000000000005c0 0005c0 000068 00   A  0   0  8
      [13] .text          PROGBITS   0000000000001630 000630 00011e 00  AX  0   0 16
    ...
      [16] .plt           PROGBITS   0000000000001780 000780 000030 00  AX  0   0 16
      [17] .fini_array    FINI_ARRAY 00000000000027b0 0007b0 000008 08  WA  0   0  8
    ...
      [20] .dynamic       DYNAMIC    00000000000027c8 0007c8 0001a0 10  WA  7   0  8
      [21] .got           PROGBITS   0000000000002968 000968 000028 00  WA  0   0  8
      [22] .relro_padding NOBITS     0000000000002990 000990 000670 00  WA  0   0  1
      [23] .data          PROGBITS   0000000000003990 000990 000014 00  WA  0   0  8
    ...
      [26] .bss           NOBITS     00000000000039d0 0009d0 000008 00  WA  0   0  4
    ...
      LOAD      0x000000 0x0000000000000000 0x0000000000000000 0x000628 0x000628 R   0x1000
      LOAD      0x000630 0x0000000000001630 0x0000000000001630 0x000180 0x000180 R E 0x1000
      LOAD      0x0007b0 0x00000000000027b0 0x00000000000027b0 0x0001e0 0x000850 RW  0x1000
      LOAD      0x000990 0x0000000000003990 0x0000000000003990 0x000040 0x000048 RW  0x1000
      DYNAMIC   0x0007c8 0x00000000000027c8 0x00000000000027c8 0x0001a0 0x0001a0 RW  0x8
      GNU_RELRO 0x0007b0 0x00000000000027b0 0x00000000000027b0 0x0001e0 0x000850 R   0x1

-z separate-code performs the trick when transitioning from the first RW PT_LOAD segment to the second, whereas -z separate-loadable-segments doesn't.

When MAXPAGESIZE is larger than the actual page size

Let's consider two adjacent PT_LOAD segments. The memory area associated with the first segment ends at alignUp(load[i].p_vaddr+load[i].p_memsz, pagesize) while the memory area associated with the second one starts at alignDown(load[i+1].p_vaddr, pagesize). When the actual page size equals MAXPAGESIZE, the two addresses are identical. However, if the actual page size is smaller, a gap emerges between these addresses.

A typical link unit generally presents three gaps. These gaps might either be unmapped or mapped. When mapped, they necessitate struct vm_area_struct objects within the Linux kernel. As of Linux 6.3.13, the size of struct vm_area_struct is 152 bytes. For instance, 10000 mapped object files would require 10000 * 3 * sizeof(struct vm_area_struct) = 4,560,000 bytes, signifying a considerable memory footprint. You can refer to "Extra struct vm_area_struct with ---p created when PAGE_SIZE < max-page-size".

Dynamic loaders typically invoke mmap using PROT_READ, encompassing the whole file, followed by multiple mmap calls using MAP_FIXED and the corresponding flags. When dynamic loaders, like musl, don't process gaps, the gaps retain r--p permissions.
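As a rough illustration of where the three gaps come from, the sketch below applies the two formulas to the PT_LOAD segments of the earlier program headers that have p_align 0x10000. The function names gaps and vma_overhead are made up for this example, and the 152-byte figure is the one quoted above for Linux 6.3.13.

```python
def align_down(x, a):
    """Round x down to a multiple of a (a must be a power of two)."""
    return x & ~(a - 1)


def align_up(x, a):
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


# (p_vaddr, p_memsz) of the PT_LOAD segments linked with MAXPAGESIZE = 65536.
LOADS = [(0x00000, 0x628), (0x10000, 0x180), (0x20000, 0x1000), (0x30000, 0x048)]

VM_AREA_STRUCT_SIZE = 152  # bytes, the value quoted in the text


def gaps(loads, pagesize):
    """Return [start, end) of every hole between adjacent memory areas."""
    result = []
    for (vaddr, memsz), (next_vaddr, _) in zip(loads, loads[1:]):
        area_end = align_up(vaddr + memsz, pagesize)
        next_start = align_down(next_vaddr, pagesize)
        if area_end < next_start:
            result.append((area_end, next_start))
    return result


def vma_overhead(num_objects, gaps_per_object, vma_size=VM_AREA_STRUCT_SIZE):
    """Kernel memory spent on vm_area_struct objects that cover mapped gaps."""
    return num_objects * gaps_per_object * vma_size


for pagesize in (4096, 65536):
    found = gaps(LOADS, pagesize)
    print(pagesize, [(hex(start), hex(end)) for start, end in found])
print(vma_overhead(10000, 3))
```

With a 4096-byte page size, each of the three gaps spans 0xf000 bytes; with a 65536-byte page size, adjacent areas touch and no gap remains.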
However, in glibc's elf/dl-map-segments.h, the has_holes code employs mprotect to transition permissions from r--p to ---p.

While ---p might be perceived as a security enhancement, personally, I don't believe it significantly impacts exploitability. While there might be numerous gadgets in r-xp areas, reducing gadgets in r--p areas doesn't seem notably impactful. (https://isopenbsdsecu.re/mitigations/rop_removal/)

Unmap the gap

When the Linux kernel loads the executable and its interpreter (if present) (fs/binfmt_elf.c), the gap gets unmapped, thereby freeing a struct vm_area_struct object. Implementing a similar approach in dynamic loaders could yield comparable savings. However, unmapping the gap carries the risk of an unrelated future mmap occupying the gap:

    564d8e90f000-564d8e910000 r--p 00000000 08:05 2519504 /sample/build/main
    ================ an unrelated mmap may be placed in the gap
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504 /sample/build/main

It is not clear whether the potential occurrence of an unrelated mmap is considered a regression in security. Personally, I don't think this poses a significant issue as the program does not access the gaps. This property can be guaranteed for direct access when input relocations to the linker use symbols with in-bounds addends (e.g. when x is defined relative to an input section, we know R_X86_64_PC32(x) must be in-bounds).

However, some programs may expect contiguous mapped areas of a file (such as when glibc link_map::l_contiguous is set to 1). Does this choice render the program exploitable if an attacker can ensure a map within the gap instead of outside the file? It seems to me that they could achieve everything with a map outside of the file.

Having said that, the presence of an unrelated map between maps associated with a single file descriptor remains odd, so it's preferable to avoid it if possible.

Extend the memory area to cover the gap

This appears to be the best solution.
When creating a memory area, instead of setting the end to alignUp(load[i].p_vaddr+load[i].p_memsz, pagesize), we can extend the end to min(alignDown(min(load[i+1].p_vaddr), pagesize), alignUp(file_end_addr, pagesize)).

    564d8e90f000-564d8e91f000 r--p 00000000 08:05 2519504 /sample/build/main (the end is extended)
    564d8e91f000-564d8e920000 r-xp 00010000 08:05 2519504 /sample/build/main

For the last PT_LOAD segment, we could also just use alignDown(min(load[i+1].p_vaddr), pagesize) and ignore alignUp(file_end_addr, pagesize). Accessing a byte beyond the backed file will result in a SIGBUS signal.

A new linker option?

Personally I favor the area end extending approach. I've also pondered whether this falls under the purview of linkers. Such a change seems intrusive and unsightly. If the linker extends the end of p_memsz to cover the gap, should it also extend p_filesz?

* If it doesn't, we create a PT_LOAD with p_filesz/p_memsz that is not for BSS, which is weird.
* If it does, we have an output file featuring overlapping file offset ranges, which is weird as well.

Moreover, a PT_LOAD whose end isn't backed by a section is unusual. I'm concerned that many binary manipulation tools may not handle this case correctly. Utilizing a linker script can intentionally create discontiguous address ranges. I'm concerned that the linker might not discern such cases with intelligent logic regarding p_filesz/p_memsz.

This feature request seems to be within the realm of loaders, and specific information, such as the page size, is only accessible to loaders. I believe loaders are better equipped to handle this task.

Transparent huge pages for mapped files

Some programs optimize their usage of the limited Translation Lookaside Buffer (TLB) by employing transparent huge pages. When the Linux kernel loads an executable, it takes into account the p_align field to create a memory area.
If p_align is 4096, the memory area will commence at a multiple of 4096, but not necessarily at a multiple of a huge page.

Transparent huge pages for mapped files require both the start address and the start file offset to align with a huge page. To ensure compatibility with MADV_HUGEPAGE, linking the executable using -z max-page-size= with the huge page size is recommended. However, in -z noseparate-code layouts, the file content might start somewhere at the first page, potentially wasting half a huge page on unrelated content.

Switching to -z separate-code allows reclaiming the benefits of the half huge page but increases the file size. Balancing these aspects poses a challenge. One potential solution is using fallocate(FALLOC_FL_PUNCH_HOLE), which introduces complexity into the linker. However, this approach feels like a workaround to address a kernel limitation. It would be preferable if a file-backed huge page didn't necessitate a file offset aligned to a huge page boundary.

It'd be nice if someone can share an example that remaps the ELF file with huge pages. I have made an attempt (https://mazzo.li/posts/check-huge-page.html#comment-1702884685852924765) but did not succeed (madvise returns 0, but I cannot verify that huge pages are applied).

Cost of RELRO

To accommodate PT_GNU_RELRO, the RW region will possess two permissions after the runtime linker maps the program. While GNU ld provides one RW segment split by the dynamic loader, lld employs two explicit RW PT_LOAD segments. After relocation resolving, the effects of lld and GNU ld are similar.

For those curious, explore my notes on GNU ld's file size increase due to RELRO.

Due to RELRO, covering the two RW PT_LOAD segments necessitates a minimum of 2 (huge) pages. In contrast, without RELRO, only one (huge) page is required at minimum. This means potentially wasting up to MAXPAGESIZE-1 bytes, which could otherwise be utilized to cover more data.
Nowadays, RELRO is considered a security baseline and removing it might unsettle security-minded individuals.