From : Ed Beroset                                           1:3641/1.250
To   : Bob Kohl                            
Subj : reading object code?                                                  


In a msg on <Mar 09 13:43>, to All, Bob Kohl writes:

 BK>     file for this program. In the old days, I learned how to read an
 BK>     object file. That was when I was learning to program ASM on a
 BK>     mainframe. I'm a bit out of practice, and I'm not sure the same
 BK>     method holds true for the PC.

It does, but the object module format (OMF) for 80XXX linkers is wickedly
complex.  First, a bit of background.  The OMF was first promulgated by Intel
for their own compilers and linkers.  It was created way back in the dark old
days when memory was at a premium, so all sorts of gimmicks were used to reduce
the amount of memory required to store OMF records.  Later, other companies,
including Microsoft, IBM, Borland, and Phar Lap, added to the original OMF and,
unfortunately, didn't always use the same conventions.  Microsoft's
documentation, while incomplete, is still the most comprehensive I've seen and
is an essential reference for anybody needing to understand OMF as it is
currently used.  It is available for download from Microsoft's BBS, and probably
via their ftp site as SS0288.ZIP.  It's also commonly available on programmers'
BBSs around the world.  Until you download that file, here's a rough sketch so
you'll have some idea how to read these things.  

The first thing to realize is that OMF files consist of a series of records.
Every record starts with a one-byte Record Type, followed by a word-sized Record
Length (stored low byte first).  The data bytes come next, and a checksum byte
ends the record.  The Record Length counts the data bytes and the checksum, but
not the Record Type or Record Length bytes.  The checksum byte is chosen so that
the sum of all bytes in the record (modulo 256) equals zero, although some
language products simply put a zero in this byte instead of calculating a
checksum.
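
To make that layout concrete, here is a minimal Python sketch (my own
illustration, not anything taken from Microsoft's documentation) that walks a
buffer record by record and checks each checksum.  The function name
split_records is made up for this example, and the sample bytes are the THEADR
record and the first LEDATA record from your file.

```python
def split_records(blob):
    """Yield (record_type, data_bytes, checksum_ok) for each record in blob."""
    pos = 0
    while pos < len(blob):
        if pos + 3 > len(blob):
            raise ValueError("truncated record header at offset %d" % pos)
        rec_type = blob[pos]
        # Record Length is a 16-bit word, low byte first.  It counts the
        # data bytes plus the checksum, but not the 3 header bytes.
        rec_len = blob[pos + 1] | (blob[pos + 2] << 8)
        end = pos + 3 + rec_len
        if rec_len < 1 or end > len(blob):
            raise ValueError("bad record length at offset %d" % pos)
        record = blob[pos:end]
        # Good if every byte of the record sums to 0 (mod 256); a plain 0
        # checksum byte is also accepted because some tools never compute it.
        checksum_ok = (sum(record) & 0xFF) == 0 or record[-1] == 0
        yield rec_type, record[3:-1], checksum_ok
        pos = end


theadr = bytes.fromhex("80 0E 00 0C 6B 65 79 69 6E 67 30 33 2E 61 73 6D 0D")
ledata = bytes.fromhex("A0 06 00 02 00 00 19 00 3F")
sample = theadr + ledata
records = list(split_records(sample))
for rec_type, data, ok in records:
    print("type %02Xh, %d data bytes, checksum ok: %s" % (rec_type, len(data), ok))
```

Reading a whole .OBJ file into a bytes object and passing it to the same
function should produce the record breakdown listed below.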

Given just that much information, your object file breaks into the following
records:

          0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
80 0E 00 0C 6B 65 79 69 6E 67 30 33 2E 61 73 6D 0D
96 25 00 00 06 44 47 52 4F 55 50 04 44 41 54 41 04 43 4F
         44 45 05 53 54 41 43 4B 05 5F 44 41 54 41 05 5F
         54 45 58 54 8F
98 07 00 48 21 00 07 04 01 EC
98 07 00 48 2D 00 06 03 01 E2
98 07 00 74 00 01 05 05 01 E1
9A 06 00 02 FF 02 FF 03 5B
88 04 00 00 A2 00 D2
A0 06 00 02 00 00 19 00 3F
A0 16 00 02 1B 00 4F 75 74 70 75 74 20 66 69 6C 65 6E 61
         6D 65 3A 20 24 BD
A0 25 00 01 00 00 B8 00 00 8E D8 B4 0A BA 00 00 CD 21 BA
         00 00 B4 09 CD 21 BB 01 00 BA 00 00 B4 09 CD 21
         B4 4C CD 21 42
9C 19 00 C8 01 15 01 01 C4 08 14 01 02 C4 0D 10 01 02 1B
         00 C4 17 10 01 02 02 00 99
8A 07 00 C1 00 01 01 00 00 AC

Record type 80h is THEADR (Translator HEADer Record) which contains the name of
the source module.  In this case, it reads "keying03.asm" followed by the
checksum byte 0dh.

Record type 96h is an LNAMES record, which specifies a list of names to be used
subsequently in the file.  Each name consists of a length byte, followed by the
actual name.  Null names (i.e. with zero length) are valid.  In this case,
you've got the following names:

''
'DGROUP'
'DATA'
'CODE'
'STACK'
'_DATA'
'_TEXT'
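
As a rough illustration, the name list can be pulled out of the LNAMES data
bytes with a few lines of Python.  This is only a sketch: parse_lnames is a
name I made up, and the input is the data portion of the 96h record above
(record type, length and checksum already stripped).  Later records refer to
these names by position, counting from 1.

```python
def parse_lnames(payload):
    """Return the list of names stored in an LNAMES record's data bytes."""
    names = []
    pos = 0
    while pos < len(payload):
        length = payload[pos]                       # one length byte...
        name = payload[pos + 1:pos + 1 + length]    # ...then the name itself
        names.append(name.decode("ascii"))
        pos += 1 + length
    return names


lnames_payload = bytes.fromhex(
    "00 06 44 47 52 4F 55 50 04 44 41 54 41 04 43 4F "
    "44 45 05 53 54 41 43 4B 05 5F 44 41 54 41 05 5F "
    "54 45 58 54"
)
names = parse_lnames(lnames_payload)
for index, name in enumerate(names, start=1):
    print("%d: '%s'" % (index, name))
```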

Record type 98h is a SEGDEF record.  The format for this one is quite complex,
but basically it contains fields which describe the name (by reference to the
previous LNAMES record), the size, the class, combine type and other attributes.
In your case there are three SEGDEF records which describe the following
segments:

 _TEXT           WORD  PUBLIC  Class 'CODE'     Length: 0021
 _DATA           WORD  PUBLIC  Class 'DATA'     Length: 002d
 STACK           PARA  STACK   Class 'STACK'    Length: 0100
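
For the curious, here is a hedged Python sketch of how those three lines fall
out of the SEGDEF bytes.  It assumes the usual layout as I understand it: an
attribute byte (alignment in the top three bits, combine type in the next
three, then a "big" bit), a 16-bit length, and one-byte indices for the segment
name, class name and overlay name.  It deliberately skips absolute segments,
two-byte indices and the 32-bit record variant, and the names parse_segdef,
ALIGN and COMBINE are my own.

```python
LNAMES = ["", "DGROUP", "DATA", "CODE", "STACK", "_DATA", "_TEXT"]
ALIGN = {1: "BYTE", 2: "WORD", 3: "PARA", 4: "PAGE", 5: "DWORD"}
COMBINE = {0: "PRIVATE", 2: "PUBLIC", 4: "PUBLIC", 5: "STACK",
           6: "COMMON", 7: "PUBLIC"}


def parse_segdef(payload, lnames=LNAMES):
    """Decode the data bytes of a simple 16-bit SEGDEF record."""
    attributes = payload[0]
    align = (attributes >> 5) & 0x07
    combine = (attributes >> 2) & 0x07
    if align == 0:
        # Absolute segments carry extra frame bytes; not handled here.
        raise ValueError("absolute segments are not handled in this sketch")
    length = payload[1] | (payload[2] << 8)
    if attributes & 0x02:
        length = 0x10000          # "big" bit: the segment is exactly 64K
    return {
        "name": lnames[payload[3] - 1],    # LNAMES indices start at 1
        "align": ALIGN.get(align, "?"),
        "combine": COMBINE.get(combine, "?"),
        "class": lnames[payload[4] - 1],
        "length": length,
    }


segments = [parse_segdef(bytes.fromhex(h)) for h in
            ("48 21 00 07 04 01", "48 2D 00 06 03 01", "74 00 01 05 05 01")]
for seg in segments:
    print(" %-15s %-5s %-7s Class %-9s Length: %04x" % (
        seg["name"], seg["align"], seg["combine"],
        "'" + seg["class"] + "'", seg["length"]))
```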

Record type 9Ah is a GRPDEF record, which is about as complicated as the SEGDEF
records.  This one specifies that the _DATA and STACK segments are both in the
group named DGROUP.
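
A small Python sketch of that decoding, under the assumption that the GRPDEF
data is a group name index followed by pairs of an FFh marker and a segment
index (which is what your record contains; other component types exist that I
ignore here).  The function name parse_grpdef and the two lookup lists are just
for illustration.

```python
LNAMES = ["", "DGROUP", "DATA", "CODE", "STACK", "_DATA", "_TEXT"]
SEGMENTS = ["_TEXT", "_DATA", "STACK"]     # in SEGDEF order, numbered from 1


def parse_grpdef(payload, lnames=LNAMES, segments=SEGMENTS):
    """Return (group_name, [member segment names]) for a simple GRPDEF."""
    group_name = lnames[payload[0] - 1]
    members = []
    pos = 1
    while pos < len(payload):
        if payload[pos] != 0xFF:
            # Only the "segment index" component type is handled here.
            raise ValueError("unsupported group component %02Xh" % payload[pos])
        members.append(segments[payload[pos + 1] - 1])
        pos += 2
    return group_name, members


group = parse_grpdef(bytes.fromhex("02 FF 02 FF 03"))
print("%s = %s" % (group[0], ", ".join(group[1])))
```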

Record type 88h is a COMENT record.  This particular one is an "A2h" class
comment, which is a marker indicating that a two-pass linker can stop the first
pass at this point.  All of the tables required internally by the linker (e.g.
the above lists of segment names, combine classes, etc.) have been read in by
this point, so the linker can proceed to pass two.  This record type isn't
technically required but it can theoretically speed up the link process.  I can
tell from this that you're probably using MASM 5.10 -- later versions of MASM
don't seem to generate records of this type.

Record type A0h is an LEDATA record.  LEDATA (Logical Enumerated DATA records)
and LIDATA (Logical Iterated DATA records) are where the "meat" of the program
actually lies.  For that reason, I'll describe it in a bit more detail.

A0 06 00 02 00 00 19 00 3F

A0 is the record type, and 06 00 means six bytes follow the length field.  The
02 means that this is segment #2 (the _DATA segment as described above by the
SEGDEF records).  The following word (00 00) specifies the address of this data
relative to the beginning of the segment.  Successive data records occupy higher
addresses.  In this case, it's the first data in this segment, so it starts at
0.  What follows is the actual data.  In this case, we only have two bytes, 19
00 which correspond to the following two assembly language source lines:

 BK> mlbuff  db  25               ;Maximum # of bytes to read
 BK> albuff  db  0                ;Number of bytes read by DOS

The checksum is 3Fh.

The next LEDATA record follows the same format and indicates segment #2 (_DATA),
an offset of 01Bh (leaving room for the 2 bytes of data in the previous record
plus the 25 bytes of strbuff), and 12h data bytes which correspond to the
following source line:

 BK> input lit     db  "Output filename: $" ; setup for input
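
To show how the two _DATA records fit together, here is a minimal Python sketch
that drops each LEDATA payload into a 2Dh-byte image of the segment (2Dh being
the length from the SEGDEF record).  The helper name parse_ledata is my own,
it assumes a one-byte segment index, and the inputs are the data bytes of the
two records with type, length and checksum removed.

```python
def parse_ledata(payload):
    """Split LEDATA data bytes into (segment_index, offset, data)."""
    segment_index = payload[0]
    offset = payload[1] | (payload[2] << 8)    # 16-bit offset, low byte first
    return segment_index, offset, payload[3:]


image = bytearray(0x2D)        # _DATA is 002Dh bytes long per its SEGDEF
for hexrec in (
    "02 00 00 19 00",
    "02 1B 00 4F 75 74 70 75 74 20 66 69 6C 65 6E 61 6D 65 3A 20 24",
):
    segment_index, offset, data = parse_ledata(bytes.fromhex(hexrec))
    image[offset:offset + len(data)] = data
    print("segment #%d, offset %04Xh, %02Xh data bytes"
          % (segment_index, offset, len(data)))

print(bytes(image[0x1B:]))
```

The bytes from offset 2 up to 1Bh are never supplied by any record, which is
why they stay zero in this image.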

The next LEDATA record contains the actual code for your program.  If you
generated a listing file when you assembled the file, you should be able to
match up the data bytes in this record with the code bytes generated by the
assembler.  In this case:

 BK>         mov ax,@data
 BK>         mov ds,ax
            ...

corresponds to B8 00 00 8E D8... 

Finally we get to record type 9Ch, which is a FIXUPP record.  This is one of the
most complicated records commonly used in OMF files, so I won't go into any
detail here, but in a nutshell, it specifies the locations of things that need
to be "fixed up" by the linker.  As an example, the mov ax,@data instruction is
encoded as though @data = 0.  This isn't necessarily true, since the _DATA
segment might actually be anywhere.  In this case, the first FIXUPP subrecord
specifies that at offset 0001h in the data of the preceding LEDATA record, a
logical segment base address may need a fix-up, and that its target is Group
Index #1 (the DGROUP group, which contains the _DATA segment).  (whew!)
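
If you want to pick the FIXUPP bytes apart yourself, the following Python
sketch decodes the four subrecords in your file.  Treat it as a simplified
reading of the format based on my understanding of it: it assumes every
subrecord is a plain fixup (no THREAD subrecords), that frame and target are
given explicitly by one-byte indices, and that displacements are 16 bits.  The
names parse_fixups, LOC and METHOD are made up for the example.

```python
LOC = {0: "low byte", 1: "16-bit offset", 2: "16-bit segment base",
       3: "32-bit far pointer", 4: "high byte"}
METHOD = {0: "SEGDEF", 1: "GRPDEF", 2: "EXTDEF"}


def parse_fixups(payload):
    """Decode simple FIXUP subrecords from FIXUPP data bytes."""
    fixups = []
    pos = 0
    while pos < len(payload):
        locat_hi, locat_lo, fix_data = payload[pos:pos + 3]
        pos += 3
        if not locat_hi & 0x80:
            raise ValueError("THREAD subrecords are not handled in this sketch")
        if fix_data & 0x88:
            raise ValueError("thread-based frame/target not handled")
        frame_method = (fix_data >> 4) & 0x07
        if frame_method > 2:
            raise ValueError("frame methods without an index not handled")
        frame_index, target_index = payload[pos], payload[pos + 1]
        pos += 2
        displacement = 0
        if not fix_data & 0x04:        # P bit clear: a displacement follows
            displacement = payload[pos] | (payload[pos + 1] << 8)
            pos += 2
        fixups.append({
            "segment_relative": bool(locat_hi & 0x40),
            "loc": LOC.get((locat_hi >> 2) & 0x0F, "?"),
            # 10-bit offset into the data of the preceding LEDATA record
            "offset": ((locat_hi & 0x03) << 8) | locat_lo,
            "frame": (METHOD.get(frame_method, "?"), frame_index),
            "target": (METHOD.get(fix_data & 0x03, "?"), target_index),
            "displacement": displacement,
        })
    return fixups


fixups = parse_fixups(bytes.fromhex(
    "C8 01 15 01 01 C4 08 14 01 02 "
    "C4 0D 10 01 02 1B 00 C4 17 10 01 02 02 00"))
for f in fixups:
    print("offset %04Xh: %s, frame %s #%d, target %s #%d + %04Xh" % (
        f["offset"], f["loc"], f["frame"][0], f["frame"][1],
        f["target"][0], f["target"][1], f["displacement"]))
```

The three later entries are the 16-bit offsets loaded by the BA (mov dx,...)
instructions, each pointing into segment #2 (_DATA) within group #1.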

The last record is type 8Ah, a MODEND (MODule END) record.  It is what its name
implies, specifying the end of a module.  This particular MODEND record also
indicates that this is the main module and that the starting location (relative
to the start of the _TEXT segment) is 0.
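
Here is a short Python sketch of how those two facts can be read out of the
MODEND data bytes.  It assumes, as I understand the format, that the top bit of
the first byte marks a main module, the next bit says a start address follows,
and the start address uses the same frame/target encoding as a fixup.  Only the
simple case in your file (segment frame, segment target, 16-bit displacement)
is handled, and parse_modend is a name invented for the example.

```python
def parse_modend(payload):
    """Decode the data bytes of a simple 16-bit MODEND record."""
    module_type = payload[0]
    info = {
        "main": bool(module_type & 0x80),
        "has_start": bool(module_type & 0x40),
    }
    if info["has_start"]:
        end_data = payload[1]
        frame_method = (end_data >> 4) & 0x07
        if end_data & 0x8C or frame_method > 2:
            # Thread references, missing displacement, or index-less frames.
            raise ValueError("start address form not handled in this sketch")
        info["frame_index"] = payload[2]
        info["target_index"] = payload[3]     # segment #1 is _TEXT here
        info["start_offset"] = payload[4] | (payload[5] << 8)
    return info


modend = parse_modend(bytes.fromhex("C1 00 01 01 00 00"))
print(modend)
```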

Now aren't you sorry you asked?  :-)

There used to be (and probably still is) a utility that Microsoft provided to
dump out object files.  While you're on Microsoft's BBS, you might look for such
a utility there.  There are a number of freeware utilities that do the same
thing, and the utility I use (IMHO the most complete and helpful) is one called
TDUMP that is shipped with Borland's language products (I used TDUMP's output to
help create this message).  

-> Ed <-
 
--- Squish v1.01
 * Origin: = Psychotronic BBS // 919-286-4542 // Durham, NC = (1:3641/1.250)
