80x86 16-bit Compiling How-to
=============================

by Alexei A. Frounze
July the 4th, 2004

Table of Contents
=================

* Introduction
* Reviewing Memory Addressing in Real Mode of 80x86 CPU
* From 8080/8085 to 8086
* 16 or 20 Bits? Meet the Segment:Offset Pair!
* More Than 1 MB?
* Which Segment Register?
* Memory Models Employed by Realmode Compilers
* NASM, assembler
* Compiling with Open Watcom C/C++
* Important details on Open Watcom C/C++ compiler
* Calling Conventions and Register Conventions
* Work In Progress

Introduction
============

The need for making 16-bit code is primarily due to the following facts:

* An 80x86 CPU starts up in real mode, employing its 16-bit addressing scheme
* An 80x86 PC BIOS (which is what the CPU starts executing after reset/power on) is mostly 16-bit and cannot be easily used in the 32-bit protected mode of the 80386+ CPU
* To load the OS kernel from a disk (floppy or hard) it's natural to use the BIOS, when no other I/O drivers are available
* To change screen modes, perform power management, etc, it's also natural to use the BIOS functionality (for the same reason as above)

So, one would want 16-bit real mode code to run on the 80x86 PC to take advantage of the BIOS and/or to prepare to switch to the 32-bit protected mode of the CPU, as in e.g. bootloaders or OS loaders. For some purposes, pure 16-bit real mode code is enough as well. And you can compile your own ROM BIOS for an embedded x86-based system!

Reviewing Memory Addressing in Real Mode of 80x86 CPU
=====================================================

Let's review realmode 80x86 memory addressing.

From 8080/8085 to 8086
======================

The Intel 8086 CPU was derived from the Intel 8080/8085 CPU and inherited 16-bit ideas from it.
Although 16-bit and somewhat compatible with the 8080/8085, the 8086 CPU has an enhanced memory addressing mechanism that isn't limited to 16 address lines: the 8086 has a 20-line-wide address bus. So, unlike the 8080/8085 (which could address up to 2^16 = 65536 bytes of memory, i.e. 64 KB), the 8086 can address up to 2^20 = 1048576 bytes of memory, i.e. 1 MB. Now, let's see how Intel implemented memory addressing...

An 8080/8085 would access its 64 KB worth of memory using direct and indirect forms of address specification in the CPU instructions. For example:

Instruction: LDA 2050H
Action: Load A (8-bit accumulator register) with byte from memory location 2050H.

Instruction: LHLD 0A00H
Action: Load HL (16-bit register) with word from memory location 0A00H (byte at 0A00H would go to L (least significant half of HL) and byte at 0A01H would go to H (most significant half of HL)).

Instruction: MOV A, M
Action: Load A (8-bit accumulator register) with byte from memory location specified in the 16-bit register HL (M designates accessing memory indirectly through the HL register).

Instruction: LDAX B
Action: Load A (8-bit accumulator register) with byte from memory location specified in the 16-bit register BC.

Hence, it's very simple with the 8080/8085. Either the 16-bit address is a constant value encoded in the CPU instruction and the memory location is accessed directly by using the encoded address (this is direct addressing), or the 16-bit address is contained in a 16-bit register of the CPU (BC or HL in our examples) and this address is read from the register before accessing the memory location at this address (this is indirect addressing).

Now, the 8086 can do the same thing...

Instruction: MOV AL, [2050H]
Action: Load AL (least significant half of 16-bit accumulator register AX) with byte from memory location 2050H.

Instruction: MOV BX, [0A00H]
Action: Load BX (16-bit register) with word from memory location 0A00H (byte at 0A00H would go to BL (least significant half of BX) and byte at 0A01H would go to BH (most significant half of BX)).

Instruction: MOV AL, [BX]
Action: Load AL (least significant half of 16-bit accumulator register AX) with byte from memory location specified in the 16-bit register BX.

Instruction: LODSB
Action: Load AL (least significant half of 16-bit accumulator register AX) with byte from memory location specified in the 16-bit register SI (SI is then incremented or decremented by 1, depending on the direction flag).

Same thing. Almost...

16 or 20 Bits? Meet the Segment:Offset Pair!
============================================

Do you remember that the 8086 was said to have a 20-bit-wide address bus? You surely do, don't you? Then how come the four 8086 instructions above specify only 16 bits of the address? Where's the leftover, the 4 other bits to make it 20-bit? :)

The fun part is that there's one special address register involved, the DS register (data segment register). The DS register is also a 16-bit register. The value of the DS register is combined with the 16-bit address specified in the instruction. The combination is a bit tricky. The DS value is shifted left by 4 binary positions (or, equivalently, multiplied by 16) and then added to the 16-bit address specified in the instruction.

Example:

    BX=341BH
    DS=123AH
    MOV AL, [BX]

would load the AL register with a byte from memory location 123AH * 16 + 341BH = 123A0H + 341BH = 157BBH.

157BBH is the physical 20-bit address that is placed on the address bus so that the memory value at this address can be transferred to the CPU (or backward, from the CPU to memory, with e.g. MOV [BX], AL). Really simple.

An address of the form 123AH:341BH is referred to as a logical address. The part specified before the colon is referred to as the segment part of the address (or often, for short, just segment).
The part specified after the colon is referred to as the offset part of the address (for short, just offset, or sometimes displacement).

    segment:offset pair is a logical address
    segment * 16 + offset = physical address

So, with a constant value of segment (say, constant DS; there can be other segment registers used) but with different values of offset, we can address up to 2^16 = 65536 bytes = 64 KB of memory starting at the physical address equal to segment * 16. This 64 KB region of memory is also referred to as a segment. Right, the same word is often used to refer to different things, and smart guys are known to do it all the time. :) This is important to remember if you're new to this addressing stuff and its terminology. Hopefully, you'll be able to deduce from context what segment stands for.

By changing the segment value (say, the DS value) and the offset value we can generate all the physical addresses from 0 up to 2^20-1, but this is not the upper bound. Technically, if we take segment=0FFFFH and offset=0FFFFH, then we'll end up with a physical address equal to 10FFEFH, which needs 21 bits to be represented. The 8086 CPU has only 20 address lines, so such an address would lose its most significant bit and wrap around zero; in this example the 8086 CPU would access the byte at physical address 0FFEFH instead of 10FFEFH.

It is important to mention that many different logical addresses can translate to the same physical address. This is an effect of the way the segment:offset pair is transformed to the final, physical, address. Just an example:

    123AH * 16 + 341BH = 123A0H + 341BH = 157BBH
    1239H * 16 + 342BH = 12390H + 342BH = 157BBH
    143AH * 16 + 141BH = 143A0H + 141BH = 157BBH
    ...

More Than 1 MB?
===============

With the introduction of the Intel 80286 CPU, the number of address lines was extended to 24, so on the 80286 you can access memory above the 1 MB mark by using the segment:offset pair.
Only FFF0H = 65520 bytes (almost 64 KB) above 1 MB can be accessed this way. But that is only possible if you enable the A20 address line (the 8086 had only the A0 through A19 lines). For reasons of compatibility with 8086 PCs, the PC engineers added a programmable hardware mechanism on 80286+ based PCs to enable and disable the A20 address line, so that the address wraparound is possible just like on the 8086. When A20 is disabled, both the 10FFEFH and 0FFEFH physical addresses, generated by an 80286+ CPU, appear to the memory as physical address 0FFEFH, i.e. address bit 20 (A20) is always 0. We won't discuss the details of A20 enabling and disabling here because it's off-topic.

For now, let's just mention that in the protected mode of the Intel 80286 and later CPUs, it's possible to access much more memory than 1 MB. The 80286 can access up to 16 MB of memory and the 80386 and 80486 can access up to 4 GB. Later CPUs, starting with the Pentium Pro, can access even more. That's it about protected mode for now.

Which Segment Register?
=======================

OK. Let's get back to the segment registers...

In fact, the 8086 CPU always uses some segment register to read code/data from memory or write data to memory.

The instructions executed by the 8086 CPU are sequentially read from memory using the CS:IP pair of CPU registers (CS is the Code Segment register, IP is the Instruction Pointer register). After execution of an instruction has completed, IP is advanced past it so that the next instruction can be fetched and executed. IP can also be changed by the near jump, call and return instructions, i.e. control is transferred within the 64 KB segment starting at the physical address equal to CS * 16. The far jump, call and return instructions modify both IP and CS and make it possible to transfer control to any part of a program anywhere in the 1 MB of addressable memory. Interrupt and return from interrupt instructions always modify CS and IP, similarly to far call and return instructions.
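
The CS:IP pair is turned into a physical address in exactly the same way as the DS-based pairs discussed earlier. As a rough illustration only (portable C, not code from any compiler or BIOS), the segment * 16 + offset arithmetic and the 1 MB wraparound can be sketched as follows. The names phys_addr and a20_enabled are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper (name invented for this sketch): translate a
   real-mode segment:offset pair into a physical address.
   a20_enabled == 0 models an 8086, or a later PC with the A20 line
   disabled, where address bit 20 is always 0. */
static uint32_t phys_addr(uint16_t segment, uint16_t offset, int a20_enabled)
{
    /* segment * 16 + offset; the sum may need 21 bits */
    uint32_t addr = ((uint32_t)segment << 4) + offset;

    if (!a20_enabled)
        addr &= 0xFFFFFu;   /* keep only address lines A0..A19 */

    return addr;
}
```

With this sketch, phys_addr(0x123A, 0x341B, 0) yields 157BBH, phys_addr(0xFFFF, 0xFFFF, 1) yields 10FFEFH, and phys_addr(0xFFFF, 0xFFFF, 0) wraps around to 0FFEFH, matching the figures quoted in the text.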

The 8086 CPU stack is organized with the SS:SP pair of registers (SS is the Stack Segment register, SP is the Stack Pointer register). SP is decremented by 2 before a 16-bit word is stored on the stack, and conversely incremented by 2 after a 16-bit word is removed from the stack. All interrupt, call and return instructions affect SP without affecting SS.

Let's leave aside instruction fetch (with CS:IP) and stack manipulations (with SS:SP)... The interesting thing is how the 8086 CPU transfers data between itself and memory using direct and indirect addressing with registers other than IP and SP. It might look a bit complicated, but here's how it works...

The 8086 CPU registers are:

* AH/AL = AX
* BH/BL = BX
* CH/CL = CX
* DH/DL = DX
* FLAGS
* DI
* SI
* BP
* SP
* IP
* ES
* DS
* SS
* CS

Just for completeness, here is a description of the 8086 CPU registers:

Register: AX
Description: 16-bit Accumulator register, least and most significant halves (AL and AH respectively) are separately accessible. Most suited for/dedicated to the ALU operations and I/O.

Register: BX
Description: 16-bit Base register, least and most significant halves (BL and BH respectively) are separately accessible. Can be used as an indirect address register when accessing memory.

Register: CX
Description: 16-bit Counter register, least and most significant halves (CL and CH respectively) are separately accessible. Can be used to organize loops and repeat string instructions.

Register: DX
Description: 16-bit Data register, least and most significant halves (DL and DH respectively) are separately accessible. Used in some special ALU and I/O operations.

Register: FLAGS
Description: 16-bit Flags register. Contains control/status flags.

Register: IP
Description: 16-bit Instruction Pointer register. Points to the instruction to be executed.

Register: SP
Description: 16-bit Stack Pointer register. Points to the last 16-bit word pushed to the stack.

Register: BP
Description: 16-bit Base Pointer register. Can be used as an indirect address register when accessing memory (handy for stack memory accesses).

Register: SI
Description: 16-bit Source Index register. Can be used as an indirect address register when accessing memory (used by string instructions).

Register: DI
Description: 16-bit Destination Index register. Can be used as an indirect address register when accessing memory (used by string instructions).

Register: CS
Description: 16-bit Code Segment register. Selects the 64 KB region of memory from which instructions are fetched and executed by the CPU.

Register: SS
Description: 16-bit Stack Segment register. Selects the 64 KB region of memory where the CPU stack is located.

Register: DS
Description: 16-bit Data Segment register. Selects the 64 KB region of memory with which most memory reads and writes are done.

Register: ES
Description: 16-bit Extra data Segment register. Selects an additional 64 KB region of memory (additional to the one selected by DS), with which more memory reads and writes can be done. Used by string instructions that work with DI.

Now, having introduced all of the 8086 CPU registers, let's see how we can access memory using them for indirect addressing. What if I want to use, say, register SI to indirectly address memory? Which segment register will be used by default in this case?

The following table lists all possible addressing modes and the default data segment register used in each of them.

Addressing Mode                       Operand Format                                    Default Segment Register
------------------------------------  ------------------------------------------------  ------------------------------
Direct/Displacement Address           [displacement/offset/label/whatever you call it]  DS
Indirect Address                      [BX]                                              DS
Indirect Address                      [BP]                                              SS
Indirect Address                      [SI]                                              DS
Indirect Address                      [DI]                                              DS (ES for string instructions)
Indirect+Displacement Address         [BX+displacement]                                 DS
Indirect+Displacement Address         [BP+displacement]                                 SS
Indirect+Displacement Address         [SI+displacement]                                 DS
Indirect+Displacement Address         [DI+displacement]                                 DS
Double Indirect+Displacement Address  [BX][SI]+displacement                             DS
Double Indirect+Displacement Address  [BX][DI]+displacement                             DS
Double Indirect+Displacement Address  [BP][SI]+displacement                             SS
Double Indirect+Displacement Address  [BP][DI]+displacement                             SS

Notes:

* displacement is a constant 8/16-bit value.
* [reg] means that a memory location is being indirectly accessed through the register reg. The memory address (offset) is contained in the register reg.
* [reg+displacement] means that a memory location is being indirectly accessed through the register reg. The memory address (offset) is the sum of the register reg value and the displacement value.
* [reg1][reg2]+displacement means that a memory location is being indirectly accessed through the two registers reg1 and reg2.
  The memory address (offset) is the sum of the values of the registers reg1 and reg2 and the displacement value. That is, all three values are added together to form the offset.

To summarize:

* Wherever the BP register is used for indirect addressing, SS is used as the default segment register to make up the physical address
* Wherever the DI register is used by a string instruction, it's used together with the ES segment register
* In all other cases, DS is used as the default segment register for accessing data

If you need to override the use of the default segment register, you can explicitly specify the segment register to use, like so:

    MOV AL, CS:MyTable[BP][SI]

or

    MOV AL, [CS:BP+SI+MyTable]

whichever format is supported by your assembler. The prefix, consisting of the segment register name and a colon, overrides the default segment register with the one specified before the colon.

Memory Models Employed by Realmode Compilers
============================================

The sections below summarize the most common memory models employed by 16-bit realmode 80x86 compilers.

Near pointers (in real mode) are 16-bit pointers, consisting only of a 16-bit offset. The default segment register (CS for code, DS/SS for data/stack) is assumed to be constant. Near pointers are small and quick, and need less code to handle.

Far pointers (in real mode) are 32-bit pointers, consisting of both 16-bit parts, segment and offset. Far pointer increment/decrement usually doesn't affect the segment part of the far pointer. Far pointers are big and slow, and need more code to handle.

It is problematic to access objects or arrays bigger than 64 KB with both near and far pointers in HLL (C/C++) compilers because this needs a manual implementation of far pointer arithmetic.

Tiny Memory Model
=================

< 64 KB code segment size, near pointer type
< 64 KB data segment size, near pointer type

Use the tiny model for small size applications.
All four segment registers (CS, DS, ES, SS) are set to the same address, so you have a total of 64 KB for all of your code, data, and stack. Near pointers are always used. Tiny model programs can be compiled to .COM format.

SS=ES=DS=CS, always

Small Memory Model
==================

< 64 KB code segment size, near pointer type
< 64 KB data segment size, near pointer type

Use the small model for average size applications. The code and data segments are different and don't overlap, so you have 64 KB of code and 64 KB of data and stack. Near pointers are always used.

SS=DS, usually

Medium Memory Model
===================

< 1 MB code segment size, far pointer type
< 64 KB data segment size, near pointer type

The medium model is best for large programs that don't keep much data in memory. Far pointers are used for code but not for data. As a result, data plus stack are limited to 64 KB, but code can occupy up to 1 MB.

SS=DS, usually

Compact Memory Model
====================

< 64 KB code segment size, near pointer type
< 1 MB data segment size, far pointer type

Use the compact model if your code is small but you need to address a lot of data. The compact model is the opposite of the medium model: far pointers are used for data but not for code; code is then limited to 64 KB, while data has a 1 MB range. All functions are near by default and all data pointers are far by default.

SS!=DS, usually

Large Memory Model
==================

< 1 MB code segment size, far pointer type
< 1 MB data segment size, far pointer type

Use the large model for very large applications only. Far pointers are used for both code and data, giving both a 1 MB range. All functions and data pointers are far by default.

SS!=DS, usually

Huge Memory Model
=================

< 1 MB code segment size, far pointer type
< 1 MB data segment size, far pointer type

Use the huge model for very large applications only. Far pointers are used for both code and data.
Turbo C++ normally limits the size of all data to 64 KB; the huge memory model sets aside that limit, allowing data to occupy more than 64 KB. The huge model allows multiple data segments (each 64 KB in size), up to 1 MB for code, and 64 KB for stack. All functions and data pointers are assumed to be far.

SS!=DS, usually

NASM, assembler
===============

Useful options and commands:

* -f obj
  Will generate an Intel/OMF .OBJ object output file (compatible with Borland/Turbo C/C++/Pascal compilers) from the specified file.
* -F obj
  Will generate Borland debug information (useful for TD only).
* -Dmacro[=value]
  Predefines a macro.
* -Umacro
  Undefines a macro.

Compiling with Open Watcom C/C++
================================

Important details on Open Watcom C/C++ compiler
===============================================

Code is placed in the _TEXT segment, with class CODE and the USE16 attribute.

By default, the data group DGROUP consists of the CONST, CONST2, _DATA and _BSS segments. The compiler places certain types of data in each segment.

The CONST segment (of class DATA) contains constant literals that appear in your source code.

Example:

    char* birds[3] = {"robin", "finch", "wren"};
    printf ("Hello world\n");

In the above example, the strings "Hello world\n", "robin", "finch", etc. appear in the CONST segment.

The CONST2 segment (of class DATA) contains initialized read-only data.

The _DATA segment (of class DATA) contains initialized writable data.

Example:

    const int cvar = 1;
    int var = 2;
    int table[5] = {1, 2, 3, 4, 5};
    char* birds[3] = {"robin", "finch", "wren"};

In the above example, the constant variable "cvar" is placed in the CONST2 segment, while "var", "table" and "birds" are placed in the _DATA segment. Finally, the strings "robin", "finch", "wren" are placed in the CONST segment.

The _BSS segment (of class BSS) contains uninitialized data such as scalars, structures or arrays.

Example:

    int var1;
    int array1[400];

For the Tiny/.COM model/format, the _TEXT, _DATA, CONST, CONST2 and _BSS segments are grouped together into the DGROUP group. The _TEXT segment must have an "ORG 100h" or equivalent ("RESB 100h" if NASM is used) directive so that the .COM format is possible. The _TEXT segment must be the first in the DGROUP group.

For the Small/.EXE model/format, only the _DATA, CONST, CONST2 and _BSS segments are grouped together into the DGROUP group. The .EXE stack segment (named _STACK, with attribute STACK and class STACK) is either small and unused (instead, SS:SP is initialized by the application to point to the end of the data segment (DGROUP), so that SS=DS=DGROUP) or big enough to be usable (and also grouped into DGROUP, so that SS=DS=DGROUP).

Some of the arithmetic operators (long multiplication and division) are implemented as library functions, which must be additionally linked with your program.

By default, the compiler uses register-based argument passing (unlike Turbo C++). This register convention isn't covered here, but I suppose it can be deduced from the generated code and from the assembler source code of the Watcom standard C library.

By default, the compiler appends an underscore character to function and variable names when compiling C/C++ code to the object file, e.g. "void MyFunction()" would have the name "MyFunction_" in the object file. Therefore, any external assembly functions must be written with this in mind. An assembler function must have a name with a trailing underscore to be accessible from C/C++, e.g. the asm name "MyAsmFxn_" will be seen by the C/C++ code as, say, "extern void MyAsmFxn()". And of course, if MyAsmFxn() needs to call MyFunction(), it must "call MyFunction_" because in the object files the C/C++ names have the trailing underscore.

Note: the additional underscore character in function/variable names appears at different positions in Borland/Turbo C/C++ and Open Watcom C/C++.
By default Borland/Turbo uses a leading underscore, while Watcom uses a trailing underscore.

It is, however, possible to generate code with stack-based argument passing and link Watcom compiled code with code whose functions have a leading underscore in the object files. For this, there's a special reserved keyword cdecl (may also be _cdecl and __cdecl). Functions defined as, say,

    int cdecl fxn (int x);

will compile for stack-based argument passing, and the additional underscore in the name will appear in front of the C name, e.g. _fxn. This (cdecl) calling and naming convention is exactly the same as the one adopted by the Turbo C++ compiler.

Calling Conventions and Register Conventions
============================================

When calling a function, the following is pushed onto the stack, in the specified order:

* function arguments from last to first (notice the reverse order!),
* return address.

The called function never removes its arguments from the stack when returning to the caller. The caller pushes arguments onto the stack and removes them after the call.

8-bit arguments are extended to 16-bit when pushed onto the stack.

Function return values are placed into:

* AL (8-bit value) or
* AX (16-bit value) or
* both DX and AX (32-bit value, most significant half goes to DX, least significant half goes to AX).

Pointers in real mode can be 16-bit (near) or 32-bit (far). The segment part of a far pointer goes to DX, while the offset part goes to AX.

A function must preserve the values of:

* DS, SS, BP, SI, DI (remember them when writing functions in assembler).
* The direction flag (DF, in the FLAGS register) should also be preserved as 0.
* The ES register is not guaranteed to be equal to DS. Set ES to the value of DS if needed.

The reserved interrupt keyword provides additional entry and exit code for void(void) functions to make them directly usable as interrupt service routines. Their addresses can be directly stored in the interrupt vector table. Remember that the compiler is 16-bit.
The entry and exit code of interrupt functions won't save/restore the 80386+ 32-bit registers entirely; it will only save/restore AX of EAX, etc. The floating point unit state isn't saved/restored either.

Structure passing and returning by value isn't covered here. Neither is floating point data. If you feel you need this information, you may create a C function that makes use of structures or floating point types and generate assembly source code from it. You may find most of the answers to your questions by inspecting the generated assembly source code. I don't usually pass structures by value; for most things a pointer to a structure is enough. And I don't consider the floating point support in Turbo C++ serious or really helpful for the kind of stuff OS developers do in the first place.

Work In Progress
================

The work on this document is in progress. Meanwhile, try learning things from the compiler documentation and source codes provided here (already available).

If you want to contact me regarding this doc or anything else, please post a message on the usenet: news:alt.os.development

Alexei A. Frounze