Title: Altair Assembler Part 1
Date: February 01 2020
Tags: altair programming
========================================

As I keep complaining, it's getting harder and harder to hand assemble
everything.  It's error prone and tedious.  An assembler can handle a lot of the
work for me so I'm going to try to write one.

Two big inconveniences of hand assembling are translating all the opcodes to
octal values without error, and counting out the addresses of all the statements
so I can go back and fill them in at JMP, CALL, and memory references throughout
the program.  If I have to modify a program by inserting a statement, it shifts
all the remaining statements in memory and breaks all the references to any
locations after it.  Which, of course, happens all the time in development.

Assemblers don't just translate the opcodes from ASCII mnemonics to their binary
machine code, they also provide some other convenience features.  An assembler
lets you attach your own label to a statement.  It records the address that the
statement will be written to and replaces references to the label elsewhere in
the code with that address.  This way, you can CALL or JMP to a label and never
have to know what the address ends up being.

Other features are the EQU and SET pseudo-opcodes, which are used like
variables: they let you create a name for a value and reference that value by
the name elsewhere in the code.  The difference between the two is just that a
SET value can be changed later but an EQU can only be defined once.  Not sure
why there are two options.  Why not just use SET?  I guess so developers can set
constants and not accidentally redefine something they didn't mean to.
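
As a rough illustration of that difference, here is a small Python sketch (not
8080 code, and not part of the assembler itself).  The class and method names
are made up for the example, and the rule that a SET can't reuse an EQU name is
an assumption about how such a table would behave.

```python
class SymbolError(Exception):
    """Raised when a name is redefined in a way that is not allowed."""


class Symbols:
    def __init__(self):
        self.values = {}        # name -> current value
        self.constants = set()  # names that were defined with EQU

    def define_equ(self, name, value):
        # EQU: the name must not exist yet, and it can never change afterwards.
        if name in self.values:
            raise SymbolError(f"{name} is already defined")
        self.values[name] = value
        self.constants.add(name)

    def define_set(self, name, value):
        # SET: may be repeated, but (assumed here) not on an EQU name.
        if name in self.constants:
            raise SymbolError(f"{name} was defined with EQU")
        self.values[name] = value
```

Calling `define_set("COUNT", 1)` and then `define_set("COUNT", 2)` is fine,
while a second `define_equ` on the same name raises `SymbolError`.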

I'm loosely basing my assembler on the one described in the <i>8080 Programmer's
Manual</i> which I've learned most of my 8080 programming from.  There are other
8080 assemblers with different features and implementations.  The one provided
in the MITS Programming System and Microsoft's M80 for CP/M are well-known
examples.


# Version 1 Features #

For a first pass, I'm implementing a minimum set of features.  That way, I can
leverage the assembler to more easily build the next, more featureful version.

Obviously, I am going to implement the two big needs:  opcode mnemonics to octal
conversion and address labels.  Translating opcodes also includes parsing the
expected arguments for each opcode.  JMP takes a 16-bit address, which takes up
two additional bytes of memory.  MVI takes a register and a byte of data, using
1 additional byte.  The register becomes part of the opcode.  There are 7 or 8
different combinations of arguments so it's a bit of work handling everything.
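
To make "the register becomes part of the opcode" concrete, here is a rough
Python sketch of how those two argument shapes encode.  It is only an
illustration in Python, not the assembler's 8080 code, and the function names
are invented for the example.  The bit layouts are the standard 8080 encodings.

```python
# 3-bit register codes used inside 8080 opcodes
REGISTERS = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}


def encode_mvi(register, data):
    """MVI r,data: opcode is 00 rrr 110, followed by one data byte."""
    opcode = 0o006 | (REGISTERS[register] << 3)
    return [opcode, data & 0o377]


def encode_jmp(address):
    """JMP addr: opcode 303Q, then the low byte, then the high byte."""
    return [0o303, address & 0o377, (address >> 8) & 0o377]


def as_octal(code):
    """Format a list of bytes as 3-digit octal values."""
    return " ".join(f"{byte:03o}" for byte in code)
```

`as_octal(encode_mvi("A", 0o015))` gives `076 015`, and
`as_octal(encode_jmp(0o000400))` gives `303 000 001` because the address is
stored low byte first.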

Labels are also more of a challenge than it seems at first glance.  When a label
is defined, I have to store the string and the address of the line it's defined
on.  Not too bad.  Then when a label is referenced in an opcode argument, I just
look it up in the list and read the address.  But what about a label that is
referenced before it is defined?  Like a 'JMP exit' with the exit subroutine
defined at the end of the file.

Some assemblers handle this by simply not allowing it.  That seems like a very
crippling solution.  Others are "2 pass" assemblers that read the full program
and build the list of labels (and EQUs and SETs) and their values.  This list is
called the "symbol table", a term you'll still see today in higher-level
languages.  The assembler then rereads the code, assembling it and
substituting all the now known label values.  Sounds like a good solution except
I am not reading from disk or memory, yet.  I'll be passing my program in
through the serial port and it would be much more convenient if I only had to
send it once.  I'm planning to track undefined labels and their locations and
then just run through that list after assembling and fill in those addresses
with the defined address.  If any label is still undefined at this point, we can
report the error.

That's the current plan.  I might have to fall back on just doing 2 passes over
serial or, more defeatist, not allowing label references before definitions.
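
The single-pass plan can be sketched roughly like this in Python.  This is an
illustration under simplifying assumptions, not the real implementation: only
JMP and NOP are handled, statements arrive as already-split tuples, and the
names `assemble` and `fixups` are invented for the example.

```python
def assemble(statements):
    """Assemble (label, opcode, target) tuples in one pass with fix-ups."""
    memory = []    # assembled bytes, index = address
    symbols = {}   # label -> address
    fixups = []    # (address of the low byte to patch, label)

    for label, opcode, target in statements:
        if label:
            symbols[label] = len(memory)
        if opcode == "JMP":
            memory.append(0o303)
            if target in symbols:
                address = symbols[target]
            else:
                # Not defined yet: remember the spot and leave a placeholder.
                fixups.append((len(memory), target))
                address = 0
            memory.extend([address & 0o377, address >> 8])
        elif opcode == "NOP":
            memory.append(0o000)
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    # After the whole program is read, fill in the forward references.
    for where, label in fixups:
        if label not in symbols:
            raise ValueError(f"undefined label: {label}")
        address = symbols[label]
        memory[where] = address & 0o377
        memory[where + 1] = address >> 8
    return memory
```

For a program of `JMP exit`, `NOP`, and `exit: NOP`, the label lands at address
4, so the result is `[0o303, 4, 0, 0, 0]`.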

I'll also be implementing the ORG pseudo-opcode so I can put code in specific
places without having to pad out from address 000000Q.  The most likely example
will be for writing interrupt handlers for the RST instructions which need to be
at specific addresses.  It will also be useful for assembling a program at a
high memory location such as a bootloader or the next version of the assembler.
I plan to end up with the assembler in higher memory so I can write programs
that can use interrupts whose handlers need to be at 070Q and below.

I want to have the DB, DW, and DS pseudo-opcodes to store data.  DB allows you
to store a byte.  With more sophisticated argument handling, a list of bytes
that are ASCII characters could also be written as a string.  DW stores a word
and DS simply skips a number of bytes of memory so you can use that space to
write data to later.  I don't know if I'll get to strings, but a list of letter
bytes using DB will work fine as a first version.
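
Here is a small Python sketch of how ORG, DB, DW, and DS move a location
counter around.  It is only an illustration: the function name and the
`(opcode, value)` tuple input are assumptions for the example, and memory is
modeled as a dictionary so skipped addresses simply stay absent.

```python
def place_data(directives):
    """Apply ORG/DB/DW/DS directives; return (memory, final location)."""
    memory = {}    # address -> byte; untouched addresses are not present
    location = 0   # address the next byte will be written to

    for opcode, value in directives:
        if opcode == "ORG":
            location = value                     # jump to a new address
        elif opcode == "DB":
            memory[location] = value & 0o377     # one byte
            location += 1
        elif opcode == "DW":
            memory[location] = value & 0o377     # low byte first
            memory[location + 1] = (value >> 8) & 0o377
            location += 2
        elif opcode == "DS":
            location += value                    # reserve space, write nothing
        else:
            raise ValueError(f"unknown pseudo-opcode: {opcode}")
    return memory, location
```

With `ORG 070Q`, `DB 101Q`, `DS 2`, `DW 011064Q`, the byte goes to 070Q, two
addresses are skipped, and the word is stored as 064Q then 022Q starting at
073Q.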


# Missing Features #

Since I'm currently still hand assembling, I've got to leave out some features
to ease the process of development.  It's also the first assembler I've ever
written, and the longest 8080 program I've ever written.

For now, I'm skipping EQU and SET pseudo-opcodes.  I haven't felt like I've been
missing these in my programming yet.  Labels will handle just about everything I
need for references.

I'm requiring a strict format of code entry.  All fields must be separated by
exactly one TAB character.  A field that isn't required, but comes before a
required field, must exist, but be empty.  For example, when not assigning a
label, you need to start the line with a TAB so the first field is read as an
empty label name and label processing is skipped.  Arguments are comma
separated, no spaces.  If an opcode does not have arguments, you can leave out
the TAB after the opcode field.
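
As a sketch of what that strict format buys, here is how little parsing it needs
in Python.  The function name is invented for the example, and rejecting lines
with more than three fields is an assumption, not something decided above.

```python
def parse_line(line):
    """Split one source line into (label, opcode, arguments)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > 3:
        # Assumption: only label, opcode, and argument fields exist.
        raise ValueError("too many fields")

    label = fields[0]  # empty string when the line starts with a TAB
    opcode = fields[1] if len(fields) > 1 else ""
    # Arguments are comma separated with no spaces; the field may be missing.
    if len(fields) > 2 and fields[2]:
        arguments = fields[2].split(",")
    else:
        arguments = []
    return label, opcode, arguments
```

`parse_line("\tMVI\tA,015Q")` returns `("", "MVI", ["A", "015Q"])`, and
`parse_line("\tRET")` returns `("", "RET", [])`.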

Expressions are right out.  Assemblers typically allow you to do some basic math
in the arguments.  Stack pointer + 2.  mylabel + 4.  015Q SHL 3.  I don't want
to write a calculator at the same time as the assembler itself.

Included in expressions is base conversion.  Assemblers will allow the developer
to specify data or addresses in binary, octal, hexadecimal, or decimal and
convert as necessary.  I'll only allow octal for now.  I may just blanket switch
to hexadecimal for easier 16-bit number management before implementing
conversion routines.
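
Octal-only input keeps number parsing down to a shift-and-add loop.  A minimal
Python sketch, assuming every number is written with a trailing Q as in the
examples in this post (the function name is made up for the example):

```python
def parse_octal(text):
    """Convert a string such as '015Q' to its integer value."""
    if not text.endswith("Q"):
        raise ValueError(f"expected a trailing Q: {text}")
    digits = text[:-1]
    if not digits:
        raise ValueError("no digits before the Q")

    value = 0
    for character in digits:
        if character not in "01234567":
            raise ValueError(f"not an octal digit: {character}")
        # Each octal digit is 3 bits: shift left by 3 (times 8) and add.
        value = value * 8 + int(character)
    return value
```

`parse_octal("015Q")` is 13 and `parse_octal("000070Q")` is 56, while something
like `018Q` is rejected.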

Macros are out of scope as well.  Macros allow you to name a block of code, and
in some cases parameterize it like a function.  Any time the macro is
referenced, it is replaced with the code, with the parameters filled in.
It's nice for some often reused code but so far, I'm ok with subroutines.
Macros will be nice some day.