## title: My eBPF exploration
       ## date: "2024-01-11"
       
       /ebpf.png
 (IMG) /ebpf.png
       
       Having discovered eBPF and read a few books about it, I'm
       writing here the essentials to remember about the basics.
       It's mainly a mix of my personal notes from the books
       "Learning eBPF" by Liz Rice and "Linux Observability with
       BPF" by David Calavera. The aim is to write down the
       essentials without going into too much technical detail, a
       sort of memo.
       
       You can find my eBPF (XDP) projects at the bottom of the
       page.
       
       ## What is eBPF ?
       
       eBPF stands for extended Berkeley Packet Filter. It's a
       virtual machine with a minimalist instructions set in the
       kernel (Linux) that lets you run BPF programs from user
       space. These BPF programs are attached to objects in the
       kernel and executed when these objects are triggered by
       events.
       
       /basic_ebpf_scheme.png
 (IMG) /basic_ebpf_scheme.png
       
       eBPF mainly avoid use to rewrite the kernel source code and
       the whole process (even after writing code, waiting for
       changes can take years. It allows the kernel to be dymanic.
       
       Linux modules exist, they can be loaded dynamically, but
       there is a problem, the safety/security is too long to
       check, and it is too risky. eBPF has fixed this problem with
       the eBPF verifier.
       
       ### Important features
       
       Kernel probe is an important part of the eBPF functioning.
 (HTM) Kernel probe
       “"It enables you to dynamically break into any kernel
       routine and collect debugging and performance information
       non-disruptively."”
       
       So how can a user (user space) communicate with a BPF
       program (kernel space) ?
       
       It is possible thanks to BPF maps. A map is a key/value
       stores that resides in the kernel and that can be accessed
       from eBPF program and from the user space via bpf syscalls.
       
       ## eBPF program
       
       An eBPF program is nothing else than a set of eBPF
       instructions in a bytecode format. eBPF uses JIT compiler to
       convert the eBPF bytecode to machine code that run natively
       on the CPU.
       
       /ebpf_build_chain.png
 (IMG) /ebpf_build_chain.png
       
       eBPF program take a pointer to a context that depends of the
       type of event (defining different type of program help the
       verifier).
       
       There are a set of functions that eBPF programs can call, it
       is called bpf helper functions.
       
       Each program type cannot call every bpf helper functions,
       some are banned by the verifier. As example, an XDP program
       cannot use bpf_get_current_pid_tgid. A program type has its
       own return code meanings.
       
       BPF Kernel functions aka kfuncs allow internal kernel
       functions to be called from eBPF programs.
 (HTM) BPF Kernel functions
       
       ## BPF System call
       
       This system call allow us to perform a command on an
       extended BPF map or program.
       
       #include <linux/bpf.h>
       
       int bpf(int cmd, union bpf_attr *attr, unsigned int size);
       
       An example of the bpf system call output from strace.
       
       bpf(BPF_BTF_LOAD, ...) = 3
       bpf(BPF_MAP_CREATE,
       {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY…) = 4
       bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH...) = 5
       bpf(BPF_PROG_LOAD,
       {prog_type=BPF_PROG_TYPE_KPROBE,...prog_name="hello",...) =
       6
       bpf(BPF_MAP_UPDATE_ELEM, ...}
       ...
       
       ## BTF
       
       BTF stands for BPF Type Format, it is the metadata format
       which encodes the debug information associated with eBPF
       programs/map.
       
       “"BTF provides a standardized way to describe the data
       structures used by eBPF programs, enabling better
       interaction between user-space tools and the kernel."”
       
       The BTF is stored as a BPF map after the BPF program is
       loaded. It makes the BPF programs portable across different
       kernel versions.
       
       ## CO-RE
       
       CO-RE stands for Compile Once, Run Everywhere, the idea
       behing is to compile a program once and run it on different
       kernel version without recompiling it.
       
       We can list some CO-RE elements:
       - BTF
       - Kernel headers
         - Including individual header files
         - Generating kernel headers (vmlinux.h) with bpftool
       - Compiler support flags like -g
       - Data structure relocations support for libraries
         - Information relocation based on destination machine data
       structure difference, it is used to compensates
       - BPF skeleton
         - Generated with bpftool, it allows the programmer to call
       functions to manage the BPF program lifecycle
       
       ## eBPF verifier
       
       The verification process ensures the eBPF bytecode is safe.
       
       It tests every possible execution paths, it pushes copy of
       the regs onto a stack and explore one of the possible paths.
       
       It is optimized to avoid evaluating the instructions with
       something called state pruning, it avoids reevaluating path
       (record registers state and if it arrives on the same
       instruction with a matching state, there is no need to
       verify the rest of path).
       
       ## XDP
       
       XDP stands for eXpress Data Path, it is a programmable
       kernel-integrated packet processor in the Linux network data
       path that execute BPF programs.
       
       “"The packet processor is the in-kernel component for XDP
       programs that processes packets on the receive (RX) queue
       directly as they are presented by the NIC."”
       
       XDP programs can make decision (drop, pass, etc..) on the
       received packets.
       
       ## Important Linux concepts
       
       The capabilities are a way of dividing Linux root privileged
       into smaller "units".
       
       Seccomp means Secure Computing and is a security layer in
       Linux that allow to filter specific syscalls.
       
       ## Links
       
       Here are some of my XDP projects.
       
       https://github.com/theobori/tinyknock
       https://github.com/theobori/tinyfilter
 (HTM) https://github.com/theobori/tinyknock
 (HTM) https://github.com/theobori/tinyfilter