Introduction
____________

If you have an 8-bit microcomputer system and want to experiment with the Motorola MC68000, this cross compiler may interest you. One of the problems an experimenter faces in using a new microprocessor is how to develop programs for it. One solution is to purchase a new computer system and a new set of software for each processor that interests you. Clearly, this is expensive. Another solution is to use an existing computer and its software to develop programs for the new one. Compilers that run on one computer and produce code for another machine are called cross compilers. Cross compilers (and cross assemblers) are available for most of the microprocessors on the market; however, they typically run on a large mainframe or minicomputer, which rules them out for many applications. It would be nice to have a simple compiler that runs on an existing microcomputer and generates code for the new processor. That is exactly the need this cross compiler for the MC68000 addresses.

I chose a subset of the FORTH language because it is powerful yet simple to implement. I wrote the compiler in FORTH because most of the routines the compiler requires are already present, which greatly reduces development time. The resulting cross compiler should be transportable to any computer that supports FORTH. I am using this compiler for scientific work where I need the speed of the MC68000, so it does not have many fancy data structures or I/O routines. These could be added, but the current configuration is adequate for my purposes.

In this article I assume the reader is familiar with the FORTH language. Those not familiar with FORTH should get a copy of the excellent book "Starting FORTH" by Leo Brodie. I used Starting FORTH as a reference when I developed this compiler, so the FORTH words that are implemented work as described there.
The detailed description of the operators supported by the compiler is given in assembly language, so understanding the design of the compiler requires familiarity with MC68000 assembly language. However, you do not need to know assembly language to use the compiler.

Memory Layout
______ ______

Before considering the compiler, it is necessary to know how the memory of the MC68000 is used. The compiler uses four separate areas of the MC68000 address space. The pointers to these areas are all kept in MC68000 registers, so the user can place each area anywhere in memory simply by setting up the registers properly; it is not necessary to recompile the program to change the memory map. The four areas are described below.

1. Code pool. Subroutine definitions place their output code in the code pool and update the code pool pointer variable in the compiler (M68PCODE in listing 2, screen 9). During execution of the resulting code, the MC68000 program counter is the pointer to this area of memory. Only relative addressing is used, so the code pool can be relocated simply by moving the code and starting the program at the proper place.

2. Variable pool. The memory used by variables and arrays is allocated relative to the variable pool pointer (register A5 of the MC68000). The compiler word M68ALLOT allocates space and maintains even address alignment. To avoid address faults, the value placed in A5 must be even. With that restriction, the variable pool may be placed anywhere in memory by setting the value of A5. With an appropriate supervisor program, it is possible to produce reentrant modules with this compiler by using A5 to assign a separate space for the local variables each time the module is called.

3. Data stack. The memory used by the data stack is pointed to by MC68000 register A6.
The stack is maintained using the auto-decrement addressing mode to store information on the stack and the auto-increment addressing mode to remove information from it. For further information on the workings of the data stack, see the stack operator section of listing 1.

4. Return stack. The hardware stack is used as the return stack because most return stack operations then become automatic. A7 is the return stack pointer. Since the MC68000 has separate supervisor and user hardware stack pointers, modules generated by this compiler can be used both as interrupt service routines and in user programs.

Compiler Description
________ ___________

The FORTH subset that I implemented was chosen for a particular hardware configuration but could be expanded if different I/O were required. I assume that you have a computer with a FORTH system running on it; in the following discussion I will refer to this computer as the host. I also assume that the MC68000 is used as a coprocessor. This assumption greatly reduces the size of the subset that must be implemented: all functions that interact with the terminal can be left out, and all interaction with the operating system and peripherals can be handled by the host. A simple bidirectional communication channel between the host and the MC68000 is all that is required.

I have chosen to implement the arithmetic, stack, memory access, and control operators found in most FORTH systems. Because the MC68000 uses memory-mapped I/O, I/O routines for transferring data between the MC68000 and the host can be written using the primitives provided. The compiler generates machine code, so there is no need for an inner interpreter. The MC68000 supports position-independent code, and all of the code produced by this compiler is position independent unless the user explicitly forces it to be otherwise.
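The data stack convention from the Memory Layout section (push through A6 with the auto-decrement mode, pop with the auto-increment mode) can be modelled in a few lines. The Python sketch below is only an illustration, not code from the compiler: it assumes 16-bit single precision cells stored high byte first, as the MC68000 stores them, and the class name DataStack and the 64-byte memory size are hypothetical. Each docstring names the MC68000 instruction the method imitates; the instructions the compiler really emits are the ones in listing 1.

```python
# Hypothetical model of the A6 data stack: 16-bit cells, stored big-endian
# as on the MC68000, in a small memory image that the stack grows down into.

class DataStack:
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.a6 = size                   # empty stack: A6 just past the top

    def _peek(self):
        return int.from_bytes(self.mem[self.a6:self.a6 + 2], "big")

    def _poke(self, value):
        self.mem[self.a6:self.a6 + 2] = (value & 0xFFFF).to_bytes(2, "big")

    def push(self, value):
        """Like MOVE.W <src>,-(A6): decrement A6 by two, then store."""
        self.a6 -= 2
        self._poke(value)

    def pop(self):
        """Like MOVE.W (A6)+,<dst>: read the top cell, then increment A6 by two."""
        value = self._peek()
        self.a6 += 2
        return value

    def dup(self):
        """Like MOVE.W (A6),-(A6): push a copy of the top cell."""
        self.push(self._peek())

    def add(self):
        """Like MOVE.W (A6)+,D0 then ADD.W D0,(A6): sum replaces the second cell."""
        d0 = self.pop()
        self._poke(self._peek() + d0)


stack = DataStack()
stack.push(21)
stack.dup()                     # 2* written as DUP +
stack.add()
print(stack.pop(), stack.a6)    # 42 64
```

Because the stack grows toward lower addresses, A6 always points at the top cell, so DUP needs only a single instruction that reads (A6) and writes -(A6).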
Since the code is kept separate from the variable and stack space, the output from the compiler can be put into ROM. It is sometimes desirable to use programs generated with this compiler in host environments other than FORTH, so I have provided a simple output scheme that allows the output code to be sent to any device supported by the host.

To avoid conflicts with the FORTH definitions in the host, most of the compiler is in a separate vocabulary (named M68K) that is accessed only through the defining words. All definitions created with the compiler are also placed in this vocabulary to prevent accidental references to them while using the host development system. Some words normally used in FORTH have been given different names to avoid conflicts with the host definitions; these differences will be discussed later.

The compiler produces two basic types of definitions: macros and subroutines. Macro definitions do not generate any output code; instead, when referenced they store the code implementing the macro in the definition currently being compiled. Subroutine definitions generate output code when defined and generate a subroutine call when referenced in another subroutine definition. A macro definition may be referenced in another macro or in a subroutine definition, but a subroutine may not be referenced in a macro definition. The macro definition is the basic building block of the compiler, so I will discuss it in detail before considering constant, variable, and array definitions.

A macro is created in the same way as a FORTH colon definition, except that :M68MAC and ;M68MAC are used in place of the : and ; of a FORTH definition. The body of the definition consists of executable MC68000 machine code or references to macros, variables, constants, and arrays. For examples of macro definitions, see screens 33-43 in listing 2 and the examples below.
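A rough Python model of the macro mechanism may make the copy-at-HERE behaviour easier to follow. The host dictionary is a byte array whose length plays the role of HERE; each macro is stored as a two-byte length field followed by its code, and referencing a macro copies that code to HERE. The FORTH header is left out because its form depends on the host FORTH. HostDictionary and its method names are hypothetical, and the encodings used for DUP (MOVE.W (A6),-(A6)) and + (MOVE.W (A6)+,D0 then ADD.W D0,(A6)) are my assumptions rather than the code in listing 1, chosen to be two and four bytes long, the sizes given in the figure 1 walk-through.

```python
# Hypothetical model of :M68MAC / ;M68MAC on the host (FORTH headers omitted).
# The DUP and + encodings below are assumptions, not copied from listing 1.

class HostDictionary:
    def __init__(self):
        self.mem = bytearray()   # the host dictionary; its length acts as HERE
        self.macros = {}         # macro name -> offset of its 2-byte length field

    @property
    def here(self):
        return len(self.mem)

    def comma(self, code):
        """Store raw machine code at HERE and advance the dictionary pointer."""
        self.mem += code

    def begin_macro(self, name):
        """Like :M68MAC: remember where the macro starts and reserve the length."""
        self.macros[name] = self.here
        self.comma(b"\x00\x00")

    def end_macro(self, name):
        """Like ;M68MAC: compute the body length and store it in the field."""
        at = self.macros[name]
        length = self.here - (at + 2)
        # Byte order of this host-side field is arbitrary in the model.
        self.mem[at:at + 2] = length.to_bytes(2, "big")
        return length

    def expand(self, name):
        """Referencing a macro copies its body to HERE."""
        at = self.macros[name]
        length = int.from_bytes(self.mem[at:at + 2], "big")
        self.comma(self.mem[at + 2:at + 2 + length])


d = HostDictionary()
d.begin_macro("DUP")
d.comma(bytes.fromhex("3D16"))        # MOVE.W (A6),-(A6)
d.end_macro("DUP")
d.begin_macro("+")
d.comma(bytes.fromhex("301E D156"))   # MOVE.W (A6)+,D0 ; ADD.W D0,(A6)
d.end_macro("+")
d.begin_macro("2*")
d.expand("DUP")                       # HERE advances by two
d.expand("+")                         # HERE advances by four
print(d.end_macro("2*"))              # 6
```

Note that 2* ends up holding its own copy of the DUP and + code, so later references to 2* copy six bytes without consulting DUP or + again.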
When a macro is defined, the M68K vocabulary is activated, a FORTH header is created in the host dictionary, and space is reserved for the code length. The host FORTH is in execution mode, so any words referenced in the body of a macro definition are executed immediately. When the macro definition is terminated, the length of the code segment is stored and the FORTH vocabulary is reactivated. Any subsequent reference to the macro copies the code contained in the macro body into the host dictionary at the location HERE, and then updates the dictionary pointer to point to the memory location following the code.

To illustrate this process, consider the definition of the macro 2* shown in figure 1. First, :M68MAC starts the definition, creates the FORTH header for 2* (I am being deliberately vague about the form of the header because it depends on the particular implementation of FORTH being used), and allocates space for the code length. Next the macro DUP is called; it copies the code for performing a DUP into the host dictionary at HERE and advances the dictionary pointer by two. Then the macro + is called; it copies its code into the dictionary and advances the dictionary pointer by four. Finally, ;M68MAC terminates the definition, computes the macro length, and stores it in the two bytes following the header. In this case the length is six bytes.

Single and double precision constants are compiled as macros containing a single MC68000 instruction that pushes the value of the constant onto the data stack. The word M68CON defines a single precision constant, as shown in figure 2. The value of the constant is taken from the host stack and stored in the macro as part of a move immediate instruction (see listing 1). The word M68DCON is the same except that it takes a double precision value.
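Assuming 16-bit single precision values and A6 as the data stack pointer, the move immediate instruction for a single precision constant would be MOVE.W #n,-(A6), and for a double precision constant MOVE.L #n,-(A6). The sketch below assembles those instructions from the standard MC68000 MOVE encoding fields. The function names m68con_code and m68dcon_code are hypothetical; listing 1 remains the authority on what M68CON and M68DCON actually emit.

```python
# Hypothetical sketch: build the MOVE-immediate instruction a constant macro
# could contain. Assumes A6 is the data stack pointer and 16-bit single cells.

SIZE_WORD, SIZE_LONG = 0b11, 0b10          # MOVE size field (bits 13-12)
MODE_PREDEC, MODE_SPECIAL = 0b100, 0b111   # -(An) and the "mode 7" group
REG_IMMEDIATE = 0b100                      # mode 7, register 4 = #immediate
A6 = 6

def move_opcode(size, dst_reg, dst_mode, src_mode, src_reg):
    """MOVE: 00 | size | dest register | dest mode | source mode | source register."""
    return (size << 12) | (dst_reg << 9) | (dst_mode << 6) | (src_mode << 3) | src_reg

def m68con_code(value):
    """MOVE.W #value,-(A6): opcode word followed by a 16-bit immediate."""
    if not -0x8000 <= value <= 0xFFFF:
        raise ValueError("single precision constant out of range")
    op = move_opcode(SIZE_WORD, A6, MODE_PREDEC, MODE_SPECIAL, REG_IMMEDIATE)
    return op.to_bytes(2, "big") + (value & 0xFFFF).to_bytes(2, "big")

def m68dcon_code(value):
    """MOVE.L #value,-(A6): opcode word followed by a 32-bit immediate."""
    if not -0x80000000 <= value <= 0xFFFFFFFF:
        raise ValueError("double precision constant out of range")
    op = move_opcode(SIZE_LONG, A6, MODE_PREDEC, MODE_SPECIAL, REG_IMMEDIATE)
    return op.to_bytes(2, "big") + (value & 0xFFFFFFFF).to_bytes(4, "big")

print(m68con_code(1000).hex())      # 3d3c03e8
print(m68dcon_code(100000).hex())   # 2d3c000186a0
```

Because the immediate data follows the opcode word, a single precision constant macro built this way is four bytes long and a double precision one is six.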
Variables are defined as single precision constants that push the variable-pool-relative address of the variable onto the stack for use with the fetch and store operations. Note that the variable-pool-relative address is a 16-bit signed integer, so the variable pool can be no longer than 32K bytes. The word M68VAR defines a single precision variable, and M68DVAR defines a double precision variable.

Arrays are defined as macros that take the index off the stack, compute the variable-pool-relative address of that element, and leave the result on the stack. The compiler supports arrays whose elements are bytes, single precision values, or double precision values; the words M68CARY, M68ARY, and M68DARY, respectively, define these array types. Figure 3 is an example of the definition of a byte array containing five elements. The variable pool pointer is shown before and after the definition of the array. Note that the compiler maintains alignment on word boundaries to avoid address exceptions when the code is executed.

When a subroutine is defined, the M68K vocabulary is activated and a FORTH header is created in the host dictionary. The code-pool-relative address of the subroutine is then stored as the first entry of the definition. Compilation then proceeds in the same way as for a macro definition, except that references to subroutine definitions are also allowed. When the definition is terminated, a return from subroutine instruction is compiled, the code pool pointer M68PCODE is updated, and the code is sent to the output file and deleted from the dictionary. This leaves only the header and the code-pool-relative address of the subroutine in the dictionary. Subsequent references to the subroutine use the code-pool-relative address to compute the relative address required in a branch to subroutine instruction.
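The address arithmetic in the last three paragraphs can be sketched on the host side as follows. This is a hypothetical Python model, not the compiler's code: VariablePool, allot, array, and bsr are illustrative names; single precision elements are assumed to be two bytes and double precision four; and every call is assumed to use the BSR form with a 16-bit displacement (opcode word 0x6100 followed by the displacement), which the MC68000 measures from the address of the BSR opcode plus two. A five-element byte array like the one in figure 3 takes five bytes, and rounding up to an even offset advances the pool pointer by six.

```python
# Hypothetical host-side model of two address calculations: A5-relative
# variable-pool allocation and code-pool-relative BSR calls. Assumes 2-byte
# single and 4-byte double precision cells and the 16-bit BSR displacement form.

ELEMENT_SIZE = {"M68CARY": 1, "M68ARY": 2, "M68DARY": 4}

class VariablePool:
    """Tracks the next free A5-relative offset, kept even (as M68ALLOT does)."""

    def __init__(self):
        self.vp = 0

    def allot(self, nbytes):
        new_vp = (self.vp + nbytes + 1) & ~1     # round up to an even offset
        if new_vp > 0x8000:                      # offsets are 16-bit signed
            raise OverflowError("variable pool larger than 32K bytes")
        base, self.vp = self.vp, new_vp
        return base

    def array(self, kind, count):
        """Reserve an array; return a function mapping index -> A5 offset."""
        size = ELEMENT_SIZE[kind]
        base = self.allot(size * count)
        return lambda index: base + index * size


def bsr(call_addr, target):
    """BSR with 16-bit displacement; both addresses are code-pool relative."""
    disp = target - (call_addr + 2)              # PC = opcode address + 2
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("call target out of 16-bit branch range")
    return (0x6100).to_bytes(2, "big") + (disp & 0xFFFF).to_bytes(2, "big")


pool = VariablePool()
five = pool.array("M68CARY", 5)     # five-element byte array
print(five(4), pool.vp)             # 4 6

# 4* occupies code-pool bytes 0-13 (two 6-byte copies of 2* plus a 2-byte RTS);
# a subroutine starting at 14 whose first instruction calls 4* would emit:
print(bsr(14, 0).hex())             # 6100fff0
```

The displacement is negative because a subroutine can only call definitions compiled before it, and the 16-bit range check is where the 32K limit on program size comes from.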
Note that because MC68000 branch instructions use a 16-bit signed displacement and all subroutine calls are backward branches (forward referencing is not supported by this compiler), your program is restricted to 32K bytes. Since the code-pool-relative address of a subroutine is stored at the start of the definition, a subroutine may call itself recursively. However, care must be taken when doing this. Subroutine calls do not create local variables, so any subroutine that stores a value in a variable may not operate properly when called recursively. I recommend keeping all variables on the stack in recursive subroutines. When you do this, make sure the data stack is large enough: there is no check for stack overflow, so something will be clobbered if the data stack space is too small.

To illustrate the process of subroutine compilation, consider the example in figure 4. The word :M68K starts the definition and creates the FORTH header for 4*; this also sets the code-pool-relative address of 4* (in this case zero). Next, the macro 2* is called twice. (The macro 2* referred to here is the one defined in figure 1, not the more efficient one actually implemented in the compiler.) Each time 2* is called, the code implementing it is stored in the host dictionary and the dictionary pointer is updated. Then ;M68K terminates the definition by compiling a subroutine return instruction, adding the length of the subroutine to the code pool pointer, copying the code to the output file, and deleting the code from the dictionary.

Installation
____________

To install the compiler you need either a fig FORTH or a FORTH-79 system. Installation on a FORTH-79 system is a little more involved, so I will cover the fig installation first. Listing 2 contains the complete source for the compiler; you must somehow get these 35 screens into your FORTH system.
For $25 I will provide an 8" single-density CP/M disk with all of the source code for the compiler and this article. The source code on the disk will be in a screen file supported by Laboratory Microsystems Z80-FORTH, and also in a text file containing a listing of the screens. By the time you read this, the source may be available from a couple of user groups and/or RCPM systems.

A few words in the compiler must be customized for your system. Screen 12 contains a word called HIGH-BYTE, which takes the top entry off the stack and returns its high byte. This definition must be replaced with code that accomplishes this task on your system; you may need to use a code definition. The word M68OUT in screen 18 must be written to send the generated code to whatever output device or file you want; refer to the note in that screen. As supplied, M68OUT prints the code on the screen. If you want to use the external reference capability, you will need to modify the definition in screen 20 to send its output where you want it. If you do not want that feature, simply delete screen 20 entirely.

The compiler is loaded in three sections. The basic compiler and error checking routines are loaded from screens 8-24. The program control and looping operations, with their associated error checking, are in screens 25-33. The macros that implement the operators supported by the compiler are in screens 34-43.

For a FORTH-79 system the above applies, but some additional work is needed. I have used the fig FORTH word ENDIF instead of the FORTH-79 word THEN, so on a FORTH-79 system the definition of ENDIF given in listing 2, screen 44 should be used. I have also used the fig word construction. See screen 44 for a discussion of a definition for