Other system documentation:

Data Representations
====================

- Prolog Data Structures

We describe here how the SB-Prolog system represents its data structures and
the macros that help the programmer manipulate them.  (The macros described
here are found in sim/aux.h and types are found in sim/sim.h)

A Prolog data object (a term) is always represented beginning from a single
tagged word (declared in C as type `word').  The tag is kept in the LOW-ORDER
two bits of the word.  We will call such a tagged word a P-word (for Prolog
data object word).  There are four tags:

00: (FREE)

indicates this P-word is a ``reference'' or a pointer.  (All pointers are to
four-byte boundaries, so the word itself is the address; no manipulation of
the word is needed to calculate the effective address).  If the address is of
this word itself, then this word represents a ``free'' variable.  If the
address is to some other word, then the effective value of this word is
obtained by looking at the P-word at that address.  The process of running a
list of such references until a free variable or some other (see below)
P-word is obtained is called ``dereferencing''.

There is a macro deref(op) which will dereference its argument (op).  This
macro uses a temporary variable `top' of type `pw' (*word), so top must be
declared.  It is most efficient to declare top a `register' variable.

01: (CS or CS_TAG) 

indicates this P-word represents a term that is a constant or structure
record.  For each constant symbol and structure symbol there is a record,
called the PSC-entry for that symbol.  This record contains the arity of the
symbol, a pointer to the name of the symbol, and the length of the name of
the symbol.  (It may also contain other information depending on how the
symbol is used.  See type psc_rec in sim/sim.h.)  Every time a constant or
structure symbol is read in (or otherwise created), the same PSC record is
found (by hashing) and that record is used in lieu of the character string.
This is similar to the obj-list in Lisp implementations.  A P-word with a CS
tag is a pointer to a pointer to the PSC-entry for the constant or structure
represented.  If the symbol represented by a P-word is a constant, then
(almost) all the CS P-words for that constant point to the same pointer to
that constant's PSC-entry.  (For the interested, it is the pointer used in
the hash chain.)  If the symbol is a structure symbol (i.e.  with arity Arity
> 0), then this CS P-word represents a term with the structure symbol as its
main structure symbol.  In this case the CS P-word points to Arity+1 words on
the heap: the first word is a pointer to the PSC-entry for the structure
symbol; the next Arity words are P-words for the Arity arguments of this
term.

There are several macros which can be used to access these records. Given a
P-word op, get_str_psc(op) returns a pointer to the PSC-entry for this
record; get_str_arity(op) returns the arity of the symbol; get_str_length(op)
returns the length of the name of the symbol.

110: (INT)

indicates this P-word represents a 29-bit signed integer. The integer is
obtained by (arithmetically) shifting the P-word three bits to the right.

010: (FLOAT)

indicates this P-word represents a floating-point number.  The representation
of floats is as follows: bits 0-2: tag (010); bits 3-7:  absolute value of
exponent; bits 8-29: absolute value of mantissa; bit 30:  sign of exponent
(1: negative); bit 31: sign of mantissa (1: negative).

The following macros (see sim/aux.h) allow playing with numeric P-words:

	intval(op)	returns the integer value represented by the P-word op
			(op must be a integer P-word.)

	floatval(op)	returns the float value represented by the P-word op.
			(op must be a float P-word.) (This is currently a
			 C function, not a macro.)

	numval(op)	returns the numeric value represented by the P-word
			op (op must be an integer or float P-word.)

	makeint(val) 	takes an integer val and returns the P-word to represent it.
	
	makefloat(val)	takes a float val and returns the P-word to represent it.
			(This is currently a C function, not a macro.)

	isinteger(op)	true if op is an integer P-word.
	
	isfloat(op)	true if op is a float P-word.
	
	isnum(op)	true is op is a numeric P-word, i.e. integer or float.

11: (LIST)

Since lists are so common in Prolog programs, a special representation is
used for them to conserve space (and time). A P-word with a list tag is a
pointer to a pair of words on the heap, that make up a cons node (or `.'
record structure in Prolog terms). These two words are the P-words for the
car (first) and cdr (second) fields of the cons (`.') record. One way to view
it is that a LIST P-word is similar to a CS P-word that points to a structure
record, except that the structure symbol is not represented in the record,
but in the pointer to it. This is just an implementation device to try to
save space. It has no semantic effect (and makes it harder to implement such
predicates as `functor', `arg', etc. because they now have to special-case
lists.)


General macros for manipulating tags:

There are two general macros which turn off tag bits: untag(op) which sets
the tag bits of op to 00, and untagged(op) which is a function that returns
the untagged version of op and leaves op unchanged.  There are also several
macros which can be used to test the type of a DEREFERENCED P-word:

	isatom(op) is true if op represents a symbolic constant (not integer).
	isnonvar(op) is true if op is NOT a free variable.
	isfree(op) is true if op IS a free variable.
	isinteger(op) is true if op represents an integer
	isconstr(op) is true if op is a constant or a structure-term
	islist(op) is true if op represents a list
	isnil(op) is true if op is the P-word that represents the constant [].

Switch on type of P-word:

Given a P-word, it is sometimes required to take different actions depending
on the tag.  Since the P-word must first be dereferenced, this is not JUST a
simple switch construct.  The skeleton below is the most efficient way (we
have thus far found) to handle such a case:

    newlabel: switch (op & TAGMASK) {
	case FREE:
	    nderef(op, newlabel);
	    { code to handle free variable }
	    break;
	case CS:
	    { code to handle constant or structure }
	    break;
	case NUM:
	    { code to handle number }
	    break;
	case LIST:
	    { code to handle list }
	    break;
	}

In this code op contains a P-word, newlabel is a label to branch back to if
dereferencing is necessary. (nderef also requires that `top' be declared.)

General Unification:

There is a general unification routine called unify. It is passed two P-words
and unifies them (NOT doing occur-check). It changes the global data
structures, binding variables and trailing such bindings as necessary. It
returns true if the terms unify and false if they do not. It is the
responsibility of the caller to perform the Fail if unify returns false. So
the call to unify usually occurs as:

	if (!unify(op1,op2)) {Fail0;}

Fail0 is a macro which sets the address of the next instruction to be that of
a fail instruction.  Fail0 is used in builtins and requires the brackets.
(There is also a Fail1 macro which directly changes the register version of
the program counter and is used only in main.)

Registers:

The Prolog engine has a set of registers.  Parameters are passed to invoked
procedures by putting the P-words of the N arguments in registers 1 - N.  The
registers are kept in the C simulator in an array of words called reg.
(Various machinations are done to allow fast access to this array; such as
putting its address in a register variable and accessing by displacement.
This is done only in the main routine.)  It should be noted that a register
will never contain a free variable (which would be represented by a pointer
to itself); it may contain any other form of P-word.  For writing builtins,
the macro gregc(i) is provided which returns the P-word in register i.  A
common sequence of instructions at the beginning of a builtin is:

    register word op1;
    register pw top;
    num int;

	op1 = gregc(1); /* P-word from register 1 into op1 */
	deref(op1);	/* dereference that P-word */
	num = numval(op1);	/* e.g., we know it must be a number, so get
					its value */

.