Title: Altair Assembler Part 1
Date: February 01 2020
Tags: altair programming
========================================

As I keep complaining, it's getting harder and harder to hand assemble everything. It's error prone and tedious. An assembler can handle a lot of the work for me, so I'm going to try to write one.

Two big inconveniences of hand assembling are translating all the opcodes to octal values without error, and counting out the addresses of all the statements so I can go back and fill them in at JMP, CALL, and memory references throughout the program. If I have to modify a program by inserting a statement, it shifts all the remaining statements in memory and breaks all the references to any locations after it. Which, of course, happens all the time in development.

Assemblers don't just translate opcodes from ASCII mnemonics to their binary machine code; they also provide some other convenience features. Assemblers allow you to define your own label for a statement. The assembler saves the address that the statement will be written to and replaces references to the label elsewhere in the code with that address. This way, you can CALL or JMP to a label and never have to know what the address ends up being.

Other features are the EQU and SET pseudo-opcodes, which are used like variables: they let you create a name for a value and reference that value by name elsewhere in the code. The difference between the two is just that a SET value can be changed later, but an EQU can only be defined once. Not sure why there are two options. Why not just use SET? I guess it's so developers can set constants and not accidentally redefine something they didn't mean to.

I'm loosely basing my assembler on the one described in the 8080 Programmer's Manual, which I've learned most of my 8080 programming from. There are other 8080 assemblers with different features and implementations. The one provided in the MITS Programming System and Microsoft's M80 for CP/M are well-known examples.
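To make the EQU/SET difference concrete, here is a minimal Python sketch of a symbol table that enforces it. It's only an illustration, not part of my assembler. The class and method names are hypothetical. Two behaviors are my own assumptions rather than rules taken from a specific assembler: refusing to EQU a name that already exists in any form, and refusing to SET a name that was first defined with EQU.

```python
class SymbolTable:
    """Hypothetical symbol table: EQU names are fixed, SET names may be reassigned."""

    def __init__(self):
        self.values = {}   # name -> current value
        self.kinds = {}    # name -> "EQU" or "SET"

    def define_equ(self, name, value):
        # An EQU name can only be defined once (assumption: also rejected if
        # the name already exists as a SET).
        if name in self.values:
            raise ValueError(f"{name} is already defined")
        self.values[name] = value
        self.kinds[name] = "EQU"

    def define_set(self, name, value):
        # A later SET may change a SET value, but (assumption) it must not
        # overwrite a name that was defined with EQU.
        if self.kinds.get(name) == "EQU":
            raise ValueError(f"{name} was defined with EQU and cannot be changed")
        self.values[name] = value
        self.kinds[name] = "SET"

    def lookup(self, name):
        if name not in self.values:
            raise KeyError(f"undefined symbol {name}")
        return self.values[name]


symbols = SymbolTable()
symbols.define_equ("CR", 0o15)      # carriage return, used as a constant
symbols.define_set("COUNT", 1)
symbols.define_set("COUNT", 2)      # allowed: SET values can change
print(symbols.lookup("COUNT"))      # prints 2
try:
    symbols.define_equ("CR", 0o12)  # rejected: EQU can only be defined once
except ValueError as err:
    print(err)                      # prints "CR is already defined"
```

The separate `kinds` map is what lets the table tell a constant from a variable. Without it, SET and EQU would collapse into the same plain assignment.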
# Version 1 Features #

For a first pass, I'm implementing a minimum set of features. That way, I can leverage the assembler to more easily build the next, more featureful version.

Obviously, I am going to implement the two big needs: opcode mnemonic to octal conversion and address labels.

Translating opcodes also includes parsing the expected arguments for each opcode. JMP takes a 16-bit address, which takes up two additional bytes of memory. MVI takes a register and a byte of data, using one additional byte; the register becomes part of the opcode itself. There are 7 or 8 different combinations of arguments, so it's a bit of work handling everything.

Labels are also more of a challenge than they seem at first glance. When a label is defined, I have to store the string and the address of the line it's defined on. Not too bad. Then, when a label is referenced in an opcode argument, I just look it up in the list and read the address. But what about a label that is referenced before it is defined? Like a 'JMP exit' with the exit subroutine defined at the end of the file.

Some assemblers handle this by simply not allowing it. That seems like a very crippling solution. Others are "2 pass" assemblers that read the full program and build the list of labels (and EQUs and SETs) and their values. This list is called the "symbol table", and you'll still see the term used today with higher level languages. The assembler then rereads the code, assembling it and substituting all the now-known label values.

Sounds like a good solution, except I am not reading from disk or memory, yet. I'll be passing my program in through the serial port, and it would be much more convenient if I only had to send it once. I'm planning to track undefined labels and the locations that reference them. After assembling, I'll run through that list and fill in each location with the label's defined address. If any label is still undefined at this point, the assembler can report the error. That's the current plan.
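The real assembler will be 8080 code, but the bookkeeping is easier to follow in a high-level language. Below is a minimal Python sketch of the single-pass plan: register codes are packed into the opcode bits, and forward label references are recorded and patched at the end. This technique is usually called backpatching. All names are hypothetical. Only a handful of opcodes are covered (NOP, HLT, RET, JMP, CALL, MVI, MOV). The arguments are assumed to be already split apart, with numbers already converted to integers.

```python
# 8080 register codes; each fills one octal digit of the opcode.
REGISTERS = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}
NO_ARG_OPS = {"NOP": 0o000, "HLT": 0o166, "RET": 0o311}
ADDRESS_OPS = {"JMP": 0o303, "CALL": 0o315}   # opcode followed by a 16-bit address


class Assembler:
    def __init__(self, origin=0):
        self.memory = {}   # address -> byte value
        self.pc = origin   # address where the next byte will be written
        self.labels = {}   # symbol table: label -> address
        self.fixups = []   # (operand address, label) pairs waiting on a definition

    def emit(self, byte):
        self.memory[self.pc] = byte & 0xFF
        self.pc += 1

    def emit_word(self, value):
        # The 8080 stores 16-bit values low byte first.
        self.emit(value & 0xFF)
        self.emit((value >> 8) & 0xFF)

    def emit_address(self, label):
        if label in self.labels:
            self.emit_word(self.labels[label])
        else:
            # Forward reference: remember where the address goes, leave a placeholder.
            self.fixups.append((self.pc, label))
            self.emit_word(0)

    def assemble(self, label, opcode, args):
        if label:
            if label in self.labels:
                raise ValueError(f"label {label} defined twice")
            self.labels[label] = self.pc
        if opcode in NO_ARG_OPS:
            self.emit(NO_ARG_OPS[opcode])
        elif opcode in ADDRESS_OPS:
            self.emit(ADDRESS_OPS[opcode])
            self.emit_address(args[0])
        elif opcode == "MVI":
            reg, data = args
            self.emit(0o006 | (REGISTERS[reg] << 3))   # 00 rrr 110
            self.emit(data)
        elif opcode == "MOV":
            dst, src = args
            if dst == src == "M":
                raise ValueError("MOV M,M is not valid (that bit pattern is HLT)")
            self.emit(0o100 | (REGISTERS[dst] << 3) | REGISTERS[src])   # 01 ddd sss
        else:
            raise ValueError(f"unsupported opcode {opcode}")

    def finish(self):
        # Walk only the pending references, not the whole program, and patch them.
        missing = sorted({name for _, name in self.fixups if name not in self.labels})
        if missing:
            raise ValueError("undefined labels: " + ", ".join(missing))
        for address, name in self.fixups:
            value = self.labels[name]
            self.memory[address] = value & 0xFF
            self.memory[address + 1] = (value >> 8) & 0xFF
        self.fixups.clear()
        return self.memory


asm = Assembler()
asm.assemble("", "MVI", ["A", 0o101])
asm.assemble("", "JMP", ["exit"])        # forward reference, patched in finish()
asm.assemble("loop", "MOV", ["B", "A"])
asm.assemble("exit", "HLT", [])
memory = asm.finish()
print(" ".join(f"{memory[a]:03o}" for a in sorted(memory)))
# prints: 076 101 303 006 000 107 166
```

The JMP is written with a 000 000 placeholder at addresses 3 and 4. When `finish()` runs, `exit` is known to be at address 6, so those two bytes become 006 000. Because only the fixup list is revisited, the source never has to be sent over serial a second time.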
I might have to fall back on just doing 2 passes over serial or, more defeatist, not allowing label references before definitions.

I'll also be implementing the ORG pseudo-opcode so I can put code in specific places without having to pad out from address 000000Q. The most likely use will be writing interrupt handlers for the RST instructions, which need to be at specific addresses. It will also be useful for assembling a program at a high memory location, such as a bootloader or the next version of the assembler. I plan to end up with the assembler in higher memory so I can write programs that use interrupts, whose handlers need to be at 070Q and below.

I want to have the DB, DW, and DS pseudo-opcodes to store data. DB allows you to store a byte. With more sophisticated argument handling, it could also take a list of bytes, and a list of bytes that are ASCII characters could also be written as a string. DW stores a word, and DS simply skips a number of bytes of memory so you can use that space to write data to later. I don't know if I'll get to strings, but a list of letter bytes using DB will work fine as a first version.

# Missing Features #

Since I'm currently still hand assembling, I've got to leave out some features to ease the process of development. It's also the first assembler I've ever written, and the longest 8080 program I've ever written.

For now, I'm skipping the EQU and SET pseudo-opcodes. I haven't felt like I've been missing these in my programming yet. Labels will handle just about everything I need for references.

I'm requiring a strict format for code entry. All fields must be separated by exactly one TAB character. A field that isn't required, but comes before a required field, must exist but be empty. For example, when not assigning a label, you need to start the line with a TAB so the first field is read as an empty label name and label processing is skipped. Arguments are comma separated, with no spaces. If an opcode does not have arguments, you can leave out the TAB after the opcode field.
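The upside of the strict format is that splitting a line takes almost no code. Here is a minimal Python sketch of a parser for it. `SourceLine` and `parse_line` are hypothetical names. The sketch assumes every line has an opcode: the post doesn't say what a label-only line or a comment should do, so this version rejects anything that isn't `label TAB opcode` optionally followed by `TAB args`.

```python
from typing import List, NamedTuple


class SourceLine(NamedTuple):
    label: str        # "" when the line starts with a TAB (no label)
    opcode: str
    args: List[str]   # comma-separated arguments, empty when there are none


def parse_line(text: str) -> SourceLine:
    """Split one line of the strict TAB-separated format into its fields."""
    fields = text.rstrip("\r\n").split("\t")
    # label TAB opcode [TAB args]; a doubled TAB shows up as an extra field.
    if len(fields) not in (2, 3):
        raise ValueError(f"expected label, opcode, and optional args: {text!r}")
    label, opcode = fields[0], fields[1]
    if not opcode:
        raise ValueError(f"missing opcode: {text!r}")
    # The TAB after an argument-less opcode may be present or left out.
    arg_field = fields[2] if len(fields) == 3 else ""
    if " " in arg_field:
        raise ValueError(f"arguments may not contain spaces: {text!r}")
    args = arg_field.split(",") if arg_field else []
    if "" in args:
        raise ValueError(f"empty argument: {text!r}")
    return SourceLine(label, opcode, args)


print(parse_line("\tMVI\tA,101Q"))
# prints: SourceLine(label='', opcode='MVI', args=['A', '101Q'])
print(parse_line("exit\tHLT"))
# prints: SourceLine(label='exit', opcode='HLT', args=[])
```

Nothing is trimmed or guessed, so a stray space or a doubled TAB produces an error instead of a silently misread line.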
Expressions are right out. Assemblers typically allow you to do some basic math in the arguments: stack pointer + 2, mylabel + 4, 015Q SHL 3. I don't want to write a calculator at the same time as the assembler itself.

Included in expressions is base conversion. Assemblers will allow the developer to specify data or addresses in binary, octal, hexadecimal, or decimal and convert as necessary. I'll only allow octal for now. I may just blanket switch to hexadecimal for easier 16-bit number management before implementing conversion routines.

Macros are out of scope as well. Macros allow you to name a block of code and, in some cases, parameterize it like a function. Any time the macro is referenced, it is replaced with the code, with the parameters filled in. It's nice for some often reused code, but so far I'm OK with subroutines. Macros will be nice some day.
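Base conversion is the simplest of these deferred features, so here is a minimal Python sketch of what it could look like later. It assumes the suffix convention used by Intel-style 8080 assemblers: B for binary, O or Q for octal, D or no suffix for decimal, and H for hexadecimal, with hex values required to start with a digit (as in 0FFH). Treat that convention as an assumption to check against the manual. `parse_number` is a hypothetical name, and it rejects anything that doesn't fit in 16 bits.

```python
# Suffix letter -> base (assumed Intel-style convention; no suffix means decimal).
BASES = {"B": 2, "O": 8, "Q": 8, "D": 10, "H": 16}
DIGITS = "0123456789ABCDEF"


def parse_number(token: str) -> int:
    text = token.upper()
    # A leading digit keeps hex values like 0FFH from looking like labels.
    if not text or not text[0].isdigit():
        raise ValueError(f"numbers must start with a digit: {token!r}")
    base = 10
    if text[-1] in BASES:
        base = BASES[text[-1]]
        text = text[:-1]
    # Check the digits ourselves so int() can't accept extras like a 0x prefix.
    if any(ch not in DIGITS[:base] for ch in text):
        raise ValueError(f"bad digit for base {base}: {token!r}")
    value = int(text, base)
    if value > 0xFFFF:
        raise ValueError(f"does not fit in 16 bits: {token!r}")
    return value


for token in ["015Q", "0FFH", "101B", "12"]:
    print(token, parse_number(token))
# prints: 015Q 13, 0FFH 255, 101B 5, 12 12 (one per line)
```

This also hints at why hexadecimal is tempting for 16-bit values. 0FFFFH splits directly into the bytes FFH and FFH. The same value in octal, 177777Q, is made of the bytes 377Q and 377Q, and you can't read those off the 16-bit number.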