Other system documentation: Data Representations ==================== - Prolog Data Structures We describe here how the SB-Prolog system represents its data structures and the macros that help the programmer manipulate them. (The macros described here are found in sim/aux.h and types are found in sim/sim.h) A Prolog data object (a term) is always represented beginning from a single tagged word (declared in C as type `word'). The tag is kept in the LOW-ORDER two bits of the word. We will call such a tagged word a P-word (for Prolog data object word). There are four tags: 00: (FREE) indicates this P-word is a ``reference'' or a pointer. (All pointers are to four-byte boundaries, so the word itself is the address; no manipulation of the word is needed to calculate the effective address). If the address is of this word itself, then this word represents a ``free'' variable. If the address is to some other word, then the effective value of this word is obtained by looking at the P-word at that address. The process of running a list of such references until a free variable or some other (see below) P-word is obtained is called ``dereferencing''. There is a macro deref(op) which will dereference its argument (op). This macro uses a temporary variable `top' of type `pw' (*word), so top must be declared. It is most efficient to declare top a `register' variable. 01: (CS or CS_TAG) indicates this P-word represents a term that is a constant or structure record. For each constant symbol and structure symbol there is a record, called the PSC-entry for that symbol. This record contains the arity of the symbol, a pointer to the name of the symbol, and the length of the name of the symbol. (It may also contain other information depending on how the symbol is used. See type psc_rec in sim/sim.h.) Every time a constant or structure symbol is read in (or otherwise created), the same PSC record is found (by hashing) and that record is used in lieu of the character string. This is similar to the obj-list in Lisp implementations. A P-word with a CS tag is a pointer to a pointer to the PSC-entry for the constant or structure represented. If the symbol represented by a P-word is a constant, then (almost) all the CS P-words for that constant point to the same pointer to that constant's PSC-entry. (For the interested, it is the pointer used in the hash chain.) If the symbol is a structure symbol (i.e. with arity Arity > 0), then this CS P-word represents a term with the structure symbol as its main structure symbol. In this case the CS P-word points to Arity+1 words on the heap: the first word is a pointer to the PSC-entry for the structure symbol; the next Arity words are P-words for the Arity arguments of this term. There are several macros which can be used to access these records. Given a P-word op, get_str_psc(op) returns a pointer to the PSC-entry for this record; get_str_arity(op) returns the arity of the symbol; get_str_length(op) returns the length of the name of the symbol. 110: (INT) indicates this P-word represents a 29-bit signed integer. The integer is obtained by (arithmetically) shifting the P-word three bits to the right. 010: (FLOAT) indicates this P-word represents a floating-point number. The representation of floats is as follows: bits 0-2: tag (010); bits 3-7: absolute value of exponent; bits 8-29: absolute value of mantissa; bit 30: sign of exponent (1: negative); bit 31: sign of mantissa (1: negative). The following macros (see sim/aux.h) allow playing with numeric P-words: intval(op) returns the integer value represented by the P-word op (op must be a integer P-word.) floatval(op) returns the float value represented by the P-word op. (op must be a float P-word.) (This is currently a C function, not a macro.) numval(op) returns the numeric value represented by the P-word op (op must be an integer or float P-word.) makeint(val) takes an integer val and returns the P-word to represent it. makefloat(val) takes a float val and returns the P-word to represent it. (This is currently a C function, not a macro.) isinteger(op) true if op is an integer P-word. isfloat(op) true if op is a float P-word. isnum(op) true is op is a numeric P-word, i.e. integer or float. 11: (LIST) Since lists are so common in Prolog programs, a special representation is used for them to conserve space (and time). A P-word with a list tag is a pointer to a pair of words on the heap, that make up a cons node (or `.' record structure in Prolog terms). These two words are the P-words for the car (first) and cdr (second) fields of the cons (`.') record. One way to view it is that a LIST P-word is similar to a CS P-word that points to a structure record, except that the structure symbol is not represented in the record, but in the pointer to it. This is just an implementation device to try to save space. It has no semantic effect (and makes it harder to implement such predicates as `functor', `arg', etc. because they now have to special-case lists.) General macros for manipulating tags: There are two general macros which turn off tag bits: untag(op) which sets the tag bits of op to 00, and untagged(op) which is a function that returns the untagged version of op and leaves op unchanged. There are also several macros which can be used to test the type of a DEREFERENCED P-word: isatom(op) is true if op represents a symbolic constant (not integer). isnonvar(op) is true if op is NOT a free variable. isfree(op) is true if op IS a free variable. isinteger(op) is true if op represents an integer isconstr(op) is true if op is a constant or a structure-term islist(op) is true if op represents a list isnil(op) is true if op is the P-word that represents the constant []. Switch on type of P-word: Given a P-word, it is sometimes required to take different actions depending on the tag. Since the P-word must first be dereferenced, this is not JUST a simple switch construct. The skeleton below is the most efficient way (we have thus far found) to handle such a case: newlabel: switch (op & TAGMASK) { case FREE: nderef(op, newlabel); { code to handle free variable } break; case CS: { code to handle constant or structure } break; case NUM: { code to handle number } break; case LIST: { code to handle list } break; } In this code op contains a P-word, newlabel is a label to branch back to if dereferencing is necessary. (nderef also requires that `top' be declared.) General Unification: There is a general unification routine called unify. It is passed two P-words and unifies them (NOT doing occur-check). It changes the global data structures, binding variables and trailing such bindings as necessary. It returns true if the terms unify and false if they do not. It is the responsibility of the caller to perform the Fail if unify returns false. So the call to unify usually occurs as: if (!unify(op1,op2)) {Fail0;} Fail0 is a macro which sets the address of the next instruction to be that of a fail instruction. Fail0 is used in builtins and requires the brackets. (There is also a Fail1 macro which directly changes the register version of the program counter and is used only in main.) Registers: The Prolog engine has a set of registers. Parameters are passed to invoked procedures by putting the P-words of the N arguments in registers 1 - N. The registers are kept in the C simulator in an array of words called reg. (Various machinations are done to allow fast access to this array; such as putting its address in a register variable and accessing by displacement. This is done only in the main routine.) It should be noted that a register will never contain a free variable (which would be represented by a pointer to itself); it may contain any other form of P-word. For writing builtins, the macro gregc(i) is provided which returns the P-word in register i. A common sequence of instructions at the beginning of a builtin is: register word op1; register pw top; num int; op1 = gregc(1); /* P-word from register 1 into op1 */ deref(op1); /* dereference that P-word */ num = numval(op1); /* e.g., we know it must be a number, so get its value */ .