# Collapse OS usage guide

If you already know Forth, start here. Otherwise, read
doc/primer first.

We begin with a few oddities in Collapse OS compared to tradi-
tional Forths, then cover higher-level operations.

# Comments

Both () and \ comments are supported. The word "(" begins a
comment, which ends when a ")" word is read. That ")" needs to
be a word, that is, surrounded by whitespace. "\" comments out
the rest of the line.
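
As a small illustration (the word "double" is made up for this
example):

: double ( n -- n*2 ) DUP + ; \ ignored until end of line
\ "( oops)" would not end a comment: ")" must be its own word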

# Cell size and memory map

Cell size is hardcoded to 16-bit. Endian-ness is arch-dependent
and core words dealing with cells read and write them according
to native endian-ness.

Memory is divided into 4 main zones:

1. Boot binary: the binary that has to be present in memory at
   boot time. When it is, jump to the first address of this bin-
   ary to boot Collapse OS. This code is designed to be able to
   run from ROM: nothing is ever written there.
2. Work RAM: As much space as possible is given to this zone.
   This is where HERE begins.
3. SYSVARS: Hardcoded memory offsets where the core system
   stores its things. It's $80 bytes in size, or more when
   drivers need extra memory. See doc/impl for details.
4. PS+RS: Typically around $100 bytes in size. Their implemen-
   tation is entirely arch-specific. Overflows aren't checked;
   PS underflows are checked through SCNT.

Unless there are arch-related constraints, these zones are
placed in that order (boot binary at addr 0, PSP at $ffff).

# Number literals

Whenever a word is parsed in the interpreter loop, we first try
parsing the word as a number literal. There are 3 literal types.

1. A word made only of digits is parsed as decimal (12345).
2. A string starting with $ is parsed as hexadecimal ($ab12).
3. A character inside single quotes is parsed as that
   character ('A').
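
For instance, the three lines below use one literal of each
type for the same value (the ASCII code of "A"), so each of them
emits the letter A:

65 EMIT  \ decimal
$41 EMIT \ hexadecimal
'A' EMIT \ character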

# Strings and lines

Strings in Collapse OS are an array of characters in memory
associated with a length. There is no termination character.

This length, when referring to that string in the different
string handling words, is usually passed around as a separate
argument in PS. It is common to see "sa sl", "sa" being the
string's address, "sl" being its length.

How that "sl" is encoded depends on the situation. For example,
the LIT" word, which writes the enclosed string and, at runtime,
yields "sa sl", is wrapped around a branch word (so that the
string isn't evaluated by forth) followed by 2 number literals.

When we refer to a "line", it's a string that is of size LNSZ,
a constant that is always 64. It corresponds to the size of the
input buffer and to the size of a line in a Block (16 lines per
block).

Because those lines have a fixed length, we sometimes want to
know the length of the actual content in them (for example, to
EMIT it). When we do so, for example in LNLEN, we go through the
whole line and look for the last visible character, that is,
the last one that is higher than $20 (space). That's where our
line ends.

We don't use any termination character for lines because it's
too messy. Blocks might not have them, and when we want to
display lines in a visual mode (that is, always the full 64
characters on the screen), we would need complicated CR
handling. It's simpler to fill lines in blocks with spaces all
the way.

# Signed-ness

For simplicity purposes, numbers are generally considered
unsigned. For convenience, decimal parsing and formatting
support the "-" prefix, but under the hood, it's all unsigned.

This leads to some oddities. For example, "-1 0 <" is false.
To check whether a number is negative, use the "0<" word, which
is equivalent to "$7fff >".
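
A quick sketch of what this means in practice. The word "sign"
is made up for this example, and it assumes the usual
IF..ELSE..THEN structure:

: sign ( n -- ) 0< IF '-' ELSE '+' THEN EMIT ;
-1 sign    \ emits -
$ffff sign \ emits - too: same 16 bits as -1
1 sign     \ emits +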

# Branching

Branching in Collapse OS is limited to 8-bit offsets. This means
64 word references (or a bit less if there are literals and
branches) forward or backward. While this might seem a bit tight
at first, having this limit saves us a non-negligible amount of
resource usage.

The reasoning behind this intentional limit is that huge
branches are generally an indicator that the logic ought to be
simplified. So here's one more constraint to help you towards
simplicity.

# Interpreter and I/Os

Collapse OS' main I/O loop is line-based. INTERPRET calls WORD
which then iterates over the current "input buffer" (INBUF) for
characters to eat up. That input buffer is a 64-character space
in SYSVARS where typed characters are buffered from KEY, but
that's not always the case.

During a LOAD, the input buffer pointer changes and points to
one of the 16 lines of the BLK buffer. WORD eats it up just the
same, but it ain't coming from KEY anymore. When the 16th line
is read, we come back to the regular program.

Back to KEY. It always yields a character, which means it
blocks until it has one. It loops over KEY?, which returns a
flag telling us whether a key is pressed and, if there is one,
the character itself.

KEY? is an alias which points to a driver implementing this
routine. It can also be overridden at runtime for nice tricks.
For example, if you want to control your computer from RS-232,
you can do "' RX<? 'KEY? !".

Interpreter output is unbuffered and only has EMIT. This word
can also be overridden, mostly as a companion to the *raison
d'etre* of your KEY? override.

# Interpreting and compiling words

When the INTERPRET loop reads from INBUF, it splits its input
into words, that is, chunks of characters surrounded by
whitespace.

Whenever we have a word, we begin by checking if it's a number
literal with PARSE. If yes, push it on the stack and get next
word. Otherwise, check if the word exists in the dictionary.
If yes, EXECUTE. Otherwise, it's a "word not found" error.

Compiling words with ":" follows the same logic, except that
instead of putting literals on the stack, it compiles them with
LITN and instead of executing words, it writes their address
down (except immediates, which are executed).

This "PARSE then FIND" order is the opposite of what many
traditional Forths do. This is because traditional Forths often
don't have hexadecimal prefixes for their literals and the
"PARSE then FIND" order would prevent the creation of words like
"face", "beef", "cafe", etc. This is not a problem we have in
Collapse OS.

"PARSE then FIND" is faster because it saves us a dictionary
lookup when parsing a literal.
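
To illustrate why the "$" prefix removes the name clash, here is
a small sketch (the word "beef" is made up for this example):

: beef 'B' EMIT ;
beef       \ not a literal, so FIND runs the word: emits B
$beef DROP \ a literal: PARSE pushes the number, DROP discards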

# Native words

Native words are regular forth words wrapping binary executable
code.

With the proper assembler loaded in memory, you can compile
words that directly execute native code. Here's a z80 example:

CODE foo BC PUSH, BC 42 LDdi, ;CODE

See doc/asm/intro for more details.

# VALUE, TO, CONSTANT

Cell access with @ becomes heavy in cases where a cell is read
at many places in the code and seldom written to. It is also
inefficient.

Collapse OS has a special "value" word type which is very
similar to a cell, but instead of pushing the cell's address to
PS, it reads the value at that address and pushes it to PS in
a much faster and lighter way than "MYVAR @". You create such a
word with VALUE:

42 VALUE FOO
FOO . \ prints 42

Modifying that value is a bit less straightforward than with
a regular cell, but can be done with TO:

43 TO FOO
FOO . \ prints 43

To set a value in a compiled word, use [TO] instead of TO.
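
A minimal sketch of [TO] inside a definition (the names "hits"
and "hit" are made up for this example):

0 VALUE hits
: hit hits 1 + [TO] hits ;
hit hit hits . \ prints 2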

There's an additional word that facilitates the declaration of
multiple values: VALUES. You call it with the number of values
to declare and then type their names, like this:

3 VALUES FOO BAR BAZ

All values are initialized to 0.

If you don't need to modify your value, it's better to use
CONSTANT instead. It's much faster because it spits native code
to push that value to PS directly. It's faster than a literal.

42 CONSTANT foo
2 CONSTS 43 bar 44 baz

# Aliases

Sometimes, often for fulfilling protocols, we want to "plug" a
word into another, for example, we want FOO and BAR to mean the
same thing. Of course, you can do ": BAR FOO ;", but this
represents an annoying overhead, both in terms of speed and RS
space. In this case, you'll want to create an alias like this:

ALIAS FOO BAR

Which means "make BAR point to FOO". This generates a native
jump which is pretty much as low overhead as it can be.

Those aliases are read-only. Once created, they can't be
changed. If you want to use a word as an indirection, you need
to use EXECUTE like this:

: FOO ;
' FOO VALUE 'BAR
: BAR 'BAR EXECUTE ; \ BAR executes FOO
: BAZ ;
' BAZ TO 'BAR \ BAR now executes BAZ

# System aliases

Core words have 2 special aliases, which jump to an address
determined in their corresponding SYSVAR. These are EMIT and
KEY?.

Each of these system aliases has its corresponding "'" SYSVAR
address CONSTANT. You go through it to modify where the alias
jumps to. Example:

' RX<? 'KEY? !
' TX> 'EMIT !

# System values

Most SYSVARS described in doc/impl have a CONSTANT corresponding
to their absolute address. For example, you get the value of
"NL" with "NL @" and set it with "NL !".

Some SYSVARS are very often used and necessitate faster access.
These SYSVARS are split in 2 words: the accessor and the
address. For example, we have HERE and 'HERE. HERE returns
HERE's value directly and 'HERE returns HERE's address.
Therefore, you get HERE with "HERE" and set it with "'HERE !".

The list of such SYSVARS is:

HERE CURRENT IN( IN>
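
Taking HERE as an example, the accessor/address split described
above can be sketched like this. The last line is only an
illustration of writing through the address:

HERE             \ pushes the value of HERE
'HERE @          \ pushes the same value, read via its address
HERE 2 + 'HERE ! \ moves HERE 2 bytes forward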

# BEGIN..NEXT

Most traditional Forths have DO..LOOP. Collapse OS has
BEGIN..NEXT instead. It only stores one number on RS instead of
2. This number is decremented at each NEXT and the loop exits
when it reaches zero.

The initial value for this loop counter must be manually placed
on RS. Example: 42 >R BEGIN NEXT.
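
As a slightly fuller sketch, here is a word that takes its loop
count from PS (the word "stars" is made up for this example).
Following the description above, a counter starting at 5 runs
the loop body 5 times:

: stars ( n -- ) >R BEGIN '*' EMIT NEXT ;
5 stars \ emits *****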

# The A register

The A register is an out-of-stack temporary value that often
helps minimize stack juggling. Its location is arch-dependent,
but it's often in SYSVARS. On register-rich CPUs, it's a
register.

Access to it is fast, but its downside is that words using it
must be careful not to use words that also use the A register.
doc/dict indicates such words with *A*.

# Dealing with performance bottlenecks

Because Collapse OS runs on multiple CPUs, dealing with bottle-
necks is a bit tricky. In arch-independent application code
(VE, ME, assemblers, emulators), we want to avoid having to
maintain bottleneck words in all supported architectures.

The way we deal with this situation is by declaring bottleneck
words as "back-overridable" with the word ?: (instead of :).

This word creates a new word only if the specified name doesn't
already exist in the dictionary. With this, what you can do is
optionally load "speedup words" for your arch, and then load
your app. Your sped-up version will supersede the default, slow
version and your bottlenecks will be faster. Example:

\ My super app
?: slowstuff ( ... ) ;
: myapp ( ... ) slowstuff ( ... ) ;

\ My arch-specific speedup
CODE slowstuff ( ... ) ;CODE

If you load the app without loading speedups, "slowstuff" will
be slow, but will work under all arches. If you load your
speedups first, then the forth version of "slowstuff" will never
be created and "myapp" will refer to the fast "slowstuff"
instead.
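
The "only if it doesn't already exist" rule can be observed
without any assembler. In this sketch (the word "greet" is made
up for this example), a regular definition stands in for the
speedup:

: greet 'A' EMIT ;
?: greet 'B' EMIT ; \ skipped: greet already exists
greet \ emits A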

# Mass storage through disk blocks

Collapse OS can access mass storage through its BLK subsystem.
See doc/blk for more information.

# Useful little words

In Collapse OS, we try to include as few words as possible into
the cross-compiled core, making it minimally functional for
reaching its design goals.

However, in its source code, it has a section of what is called
"Useful little words" at B120 and you'll probably want to load
some of them quite regularly because they make the system more
usable.

# Contexts

B122 provides the word "context" allowing multiple dictionaries
to exist concurrently. This allows you to develop applications
without having to worry too much about name clashes because
those names exist in separate namespaces.

A context is created with a name like this:

context foo \ creates context "foo"

When a context is created, it is "branched off" CURRENT as it
was at the moment the context was created.

To activate a context, call its name (in this case, "foo"). This
will do two things:

1. Save CURRENT in the previously active context.
2. Restore CURRENT to where it was the last time "foo" was
   active (or created).

Note that creating a context doesn't automatically activate it.

# DOER and DOES>

In traditional forths, DOES> is often used with CREATE. Not in
Collapse OS. To use the DOES> word, you must pair it with DOER.
See doc/primer for details.

# Code generation

The kernel has 3 words that generate native code and although
they're there as support for defining words (:, CONSTANT, etc.),
they can be used for interesting things.

These words are JMPi! CALLi! and i>! and they all share the
signature "n a -- len".

For example, let's say that you're debugging the kernel and want
to ruthlessly patch a word with another behavior you're trying
out. You could do:

' newword ' wordtopatch JMPi! DROP

And poof! wordtopatch is now an alias to newword.