Reading Records

Now we will look at reading records. This program reads each record and displays the first name stored in it.

Because names vary in length, we need a function that counts how many characters to write. Each field is padded with null characters, so we can simply count characters until we reach a null character.^[2] Note that this only works if the first-name field contains at least one null character; a name that filled the whole field would make the count run on into the data that follows it.

Here is the code. Put it in a file called count-chars.s:

#PURPOSE:  Count the characters until a null byte is reached.
#
#INPUT:    The address of the character string
#
#OUTPUT:   Returns the count in %eax
#
#PROCESS:
#  Registers used:
#    %ecx - character count
#    %al - current character
#    %edx - current character address

 .type count_chars, @function
 .globl count_chars

 #This is where our one parameter is on the stack
 .equ ST_STRING_START_ADDRESS, 8
count_chars:
 pushl %ebp
 movl  %esp, %ebp

 #Counter starts at zero
 movl  $0, %ecx
 #Starting address of data
 movl  ST_STRING_START_ADDRESS(%ebp), %edx

count_loop_begin:
 #Grab the current character
 movb  (%edx), %al
 #Is it null?
 cmpb  $0, %al
 #If yes, we're done
 je    count_loop_end
 #Otherwise, increment the counter and the pointer
 incl  %ecx
 incl  %edx
 #Go back to the beginning of the loop
 jmp   count_loop_begin

count_loop_end:
 #We're done.  Move the count into %eax
 #and return.
 movl  %ecx, %eax

 popl  %ebp
 ret

As you can see, it's a fairly straightforward function. It simply loops through the bytes, counting as it goes, until it hits a null character. Then it returns the count.
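
For readers who know C, the same loop can be sketched as follows. This is only an illustration of the logic, not part of the program; the name count_chars_sketch is hypothetical, and the NULL check is an extra that the assembly version does not perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of count_chars: count bytes until a null byte.
   Like the assembly version, it relies on the caller passing a buffer
   that contains at least one null byte. */
size_t count_chars_sketch(const char *str)
{
    assert(str != NULL);    /* extra sanity check, not in the assembly */

    size_t count = 0;       /* plays the role of %ecx */
    while (*str != '\0') {  /* movb (%edx), %al ; cmpb $0, %al */
        count++;            /* incl %ecx */
        str++;              /* incl %edx */
    }
    return count;           /* movl %ecx, %eax */
}
```

The comments on the right show which assembly instruction each C statement corresponds to.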

Our record-reading program will be fairly straightforward, too. It will do the following:

Open the file
Attempt to read a record
If we are at the end of the file, exit
Otherwise, count the characters of the first name
Write the first name to STDOUT
Write a newline to STDOUT
Go back to read another record
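
The steps above can be sketched in C before we write them in assembly. This is a rough outline under stated assumptions: print_first_names is a hypothetical name, the RECORD_SIZE and RECORD_FIRSTNAME values are placeholders (the real ones come from record-def.s), and opening the file is left to the caller, who passes in the descriptors. The bound on the counting loop is an extra guard that the assembly version does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Placeholder layout for illustration only; the real values come from record-def.s. */
enum { RECORD_SIZE = 16, RECORD_FIRSTNAME = 0 };
static_assert(RECORD_FIRSTNAME < RECORD_SIZE, "first name must lie inside the record");

/* Hypothetical helper: print the first name of every record read from in_fd.
   Returns the number of records printed. */
int print_first_names(int in_fd, int out_fd)
{
    static char record_buffer[RECORD_SIZE];
    const size_t max_len = RECORD_SIZE - RECORD_FIRSTNAME;
    int printed = 0;

    for (;;) {
        /* Attempt to read a record. */
        ssize_t got = read(in_fd, record_buffer, RECORD_SIZE);

        /* Anything other than a full record means end of file or an error. */
        if (got != RECORD_SIZE)
            break;

        /* Count the characters of the first name, stopping at the null padding.
           The max_len bound is an extra guard against a field with no null byte. */
        const char *first_name = record_buffer + RECORD_FIRSTNAME;
        size_t len = 0;
        while (len < max_len && first_name[len] != '\0')
            len++;

        /* Write the first name, then a newline. */
        if (write(out_fd, first_name, len) < 0 || write(out_fd, "\n", 1) < 0)
            break;
        printed++;
    }
    return printed;
}
```

Calling print_first_names with an open descriptor for test.dat and STDOUT_FILENO would mirror what the assembly program does with its two stack variables.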

To write this, we need one more simple function - a function to write out a newline to STDOUT. Put the following code into write-newline.s:

 .include "linux.s"
 .globl write_newline
 .type write_newline, @function
 .section .data
newline:
 .ascii "\n"
 .section .text
 .equ ST_FILEDES, 8
write_newline:
 pushl %ebp
 movl  %esp, %ebp

 movl  $SYS_WRITE, %eax
 movl  ST_FILEDES(%ebp), %ebx
 movl  $newline, %ecx
 movl  $1, %edx
 int   $LINUX_SYSCALL
 movl  %ebp, %esp
 popl  %ebp
 ret

Now we are ready to write the main program. Here is the code for read-records.s:

 .include "linux.s"
 .include "record-def.s"

 .section .data
file_name:
 .ascii "test.dat\0"

 .section .bss
 .lcomm record_buffer, RECORD_SIZE

 .section .text
 #Main program
 .globl _start
_start:
 #These are the locations on the stack where
 #we will store the input and output descriptors
 #(FYI - we could have used memory addresses in
 #a .data section instead)
 .equ ST_INPUT_DESCRIPTOR, -4
 .equ ST_OUTPUT_DESCRIPTOR, -8

 #Copy the stack pointer to %ebp
 movl %esp, %ebp
 #Allocate space to hold the file descriptors
 subl $8,  %esp

 #Open the file
 movl  $SYS_OPEN, %eax
 movl  $file_name, %ebx
 movl  $0, %ecx    #This says to open read-only
 movl  $0666, %edx #Mode bits; ignored here because no file is created
 int   $LINUX_SYSCALL

 #Save file descriptor
 movl  %eax, ST_INPUT_DESCRIPTOR(%ebp)

 #Even though it's a constant, we are
 #saving the output file descriptor in
 #a local variable so that if we later
 #decide that it isn't always going to
 #be STDOUT, we can change it easily.
 movl  $STDOUT, ST_OUTPUT_DESCRIPTOR(%ebp)

record_read_loop:
 pushl ST_INPUT_DESCRIPTOR(%ebp)
 pushl $record_buffer
 call  read_record
 addl  $8, %esp

 #Returns the number of bytes read.
 #If it isn't the same number we
 #requested, then it's either an
 #end-of-file, or an error, so we're
 #quitting
 cmpl  $RECORD_SIZE, %eax
 jne   finished_reading

 #Otherwise, print out the first name
 #but first, we must know its size
 pushl  $RECORD_FIRSTNAME + record_buffer
 call   count_chars
 addl   $4, %esp
 movl   %eax, %edx
 movl   ST_OUTPUT_DESCRIPTOR(%ebp), %ebx
 movl   $SYS_WRITE, %eax
 movl   $RECORD_FIRSTNAME + record_buffer, %ecx
 int    $LINUX_SYSCALL

 pushl  ST_OUTPUT_DESCRIPTOR(%ebp)
 call   write_newline
 addl   $4, %esp

 jmp    record_read_loop

finished_reading:
 movl   $SYS_EXIT, %eax
 movl   $0, %ebx
 int    $LINUX_SYSCALL

To build this program, we need to assemble all of the parts and link them together:

as read-record.s -o read-record.o
as count-chars.s -o count-chars.o
as write-newline.s -o write-newline.o
as read-records.s -o read-records.o
ld read-record.o count-chars.o write-newline.o \
   read-records.o -o read-records

The backslash at the end of the ld line simply means that the command continues on the next line. You can run your program by typing ./read-records.

As you can see, this program opens the file and then runs a loop of reading, checking for the end of file, and writing the first name. The one construct that might be new is the line that says:

 pushl  $RECORD_FIRSTNAME + record_buffer

It looks like we are combining an add instruction with a push instruction, but we are not. Both RECORD_FIRSTNAME and record_buffer are constants. The first is a direct constant, created through the use of a .equ directive, while the second is defined automatically by the assembler through its use as a label (its value being the address at which the data that follows it starts). Since both are constants that the assembler knows, it can add them together while it is assembling your program, so the whole instruction is a single immediate-mode push of a single constant.

The RECORD_FIRSTNAME constant is the offset, in bytes, from the beginning of a record to the first name. record_buffer is the name of our buffer for holding records. Adding them together gives the address of the first-name field of the record stored in record_buffer.
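
C has a close analogue of this kind of constant address arithmetic, which may help if you have seen static initializers before. In the sketch below, the offsets and the RECORD_LASTNAME name are assumptions chosen for illustration (the real layout is in record-def.s), and first_name_field and last_name_field are hypothetical names.

```c
#include <assert.h>

/* Placeholder layout for illustration only; the real values come from record-def.s. */
enum { RECORD_FIRSTNAME = 0, RECORD_LASTNAME = 40, RECORD_SIZE = 324 };
static_assert(RECORD_LASTNAME < RECORD_SIZE, "field must lie inside the record");

static char record_buffer[RECORD_SIZE];

/* The buffer's address and the offset are both known before the program runs,
   so each sum is an address constant and may initialize a static object.
   No addition is performed when the program executes. */
static char *const first_name_field = record_buffer + RECORD_FIRSTNAME;
static char *const last_name_field  = record_buffer + RECORD_LASTNAME;
```

As with the pushl instruction, the toolchain works out the final address, and the running program only ever sees the finished constant.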

^[2] If you have used C, this is what the strlen function does.