Advanced EXEC and CPU Issues

As we have seen, the pace of technology change is getting faster all the
time.  It is hoped that the Amiga and its applications will be able to keep
up with these changes as well as possible.  This, however, means that changes
will need to take place both in the Amiga's OS and in application software.

With the release of V37 and V39 EXEC, a number of new concepts have been
advanced to a stage from which a number of new technologies can be launched.
In fact, a number of the functions in V37 EXEC are required in order to make
the system work with 68040 CPUs in a consistent manner (namely the
CacheControl(), CachePreDMA(), CachePostDMA(), CacheClearU(), and
CacheClearE() calls).  With V39, the addition of private memory pools and the
cleanup of the memory allocation routines has laid the groundwork for some
more advanced memory systems.

New for V39

For V39, I had some time to clean up some of the areas of EXEC that could
not be fixed in some external manner.  The major changes were in the
semaphores, memory subsystems, and the ROM debugger.

Semaphores are the key to making a multitasking system work as one system.
EXEC has a very nice set of semaphore functions called SignalSemaphores.
Before V39, these semaphores could only be used and accessed in a
synchronous manner.  That is, you made a function call that would block
until you had obtained the semaphore.  While this is usually enough for
most people, there are times when software may need to "bid" for a
semaphore and go do something else until it obtains it.  EXEC already had
this concept in the Procure() and Vacate() functions, but these functions
were both broken and somewhat useless because they worked with a different
semaphore structure than the SignalSemaphores.  As it turns out, I was able
to reuse these two function calls; when used with SignalSemaphores, they
now work and are useful in that the same semaphore may be obtained both
synchronously and asynchronously.  See the AutoDocs on Procure() and
Vacate() for more information as to how this works.
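The difference between the blocking obtain and the asynchronous "bid" can be
pictured with a small, self-contained model.  This is only an illustration of
the bidding idea; the names SimSem, sim_procure(), and sim_vacate() are
invented here, and the real exec entry points are Procure() and Vacate()
operating on a SignalSemaphore.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a semaphore that supports an asynchronous "bid"
 * (hypothetical names; this is not the exec API). */
struct SimBid {
    struct SimBid *next;
    int granted;            /* set to 1 when the bid is satisfied   */
};

struct SimSem {
    int locked;             /* 1 while some task owns the semaphore */
    struct SimBid *head;    /* FIFO of outstanding bids             */
    struct SimBid *tail;
};

/* Bid for the semaphore.  Returns 1 if it was free and is now owned,
 * 0 if the bid was queued (the caller goes off and does other work
 * until the bid is granted). */
static int sim_procure(struct SimSem *s, struct SimBid *bid)
{
    bid->next = NULL;
    bid->granted = 0;
    if (!s->locked) {
        s->locked = 1;
        bid->granted = 1;
        return 1;
    }
    if (s->tail) s->tail->next = bid; else s->head = bid;
    s->tail = bid;
    return 0;
}

/* Release the semaphore; ownership passes to the oldest queued bid. */
static void sim_vacate(struct SimSem *s)
{
    struct SimBid *bid = s->head;
    if (bid) {
        s->head = bid->next;
        if (!s->head) s->tail = NULL;
        bid->granted = 1;   /* the bidder now owns the semaphore */
    } else {
        s->locked = 0;
    }
}
```

The point of the asynchronous form is exactly what the model shows: a failed
bid does not block, it just parks a request that is granted on a later
release.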

One of the features of the Amiga has always been the dynamic nature of its
use of memory.  This has also been one of the trickiest parts of good Amiga
programming.  Applications would like to dynamically use memory and release
it, but with more than one running at the same time, memory became fragmented
and performance suffered.  For V39, two different parts were added to the
memory subsystem: pools and memory handlers.

Memory pools are a way to help combat memory fragmentation, increase the
speed of the system, and provide a simple way to keep associated memory
together.  Because memory pools are "private" to the application, a number
of performance benefits are obtained (including not needing to go into
Forbid() during allocation or deallocation from the pool).  Since pools
give the system a simple way to keep your allocations together, they also
give the system a simple way to release your allocations by just destroying
the pool as a whole.  Also, the design was left black-box such that future
system enhancements can be made without too many problems.  (A very
important point here is that pools will be the memory interface of choice
in the future.)  For more information on the memory pools, see the AutoDocs
and the next AmigaMail.
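The central pool idea, that everything allocated from a pool can be thrown
away in one operation, can be sketched in plain C.  This is a deliberately
naive stand-in with invented names (SimPool, pool_alloc(), pool_delete()),
not the exec implementation; see the AutoDocs for the real V39 calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A pool here is just a chain of allocations; deleting the pool
 * frees them all at once.  (A toy model of the concept only.) */
struct PoolNode {
    struct PoolNode *next;
    /* user memory follows this header */
};

struct SimPool {
    struct PoolNode *first;
};

static void *pool_alloc(struct SimPool *p, size_t size)
{
    struct PoolNode *n = malloc(sizeof(struct PoolNode) + size);
    if (!n) return NULL;
    n->next = p->first;         /* remember it on the pool's chain */
    p->first = n;
    return n + 1;               /* memory just past the header     */
}

/* Destroy the pool as a whole: every allocation goes with it. */
static void pool_delete(struct SimPool *p)
{
    struct PoolNode *n = p->first;
    while (n) {
        struct PoolNode *next = n->next;
        free(n);
        n = next;
    }
    p->first = NULL;
}
```

Even this toy version shows why pools simplify cleanup: the application (or
the system, when the task exits) need not remember each allocation
individually.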

Memory handlers are an extension to the Amiga's physical memory management
system.  As you know, when a memory allocation fails, the system will
first attempt to release any resources that are no longer in use and retry
the allocation before it will truly fail the allocation.  This design
was very innovative when the Amiga first came out, but it was not complete
enough to let applications cache data as long as there was enough memory,
or to play other, more complex memory usage games.  The memory handler
system expands this feature and makes a number of performance problems much
easier to deal with (such as rasterized outline font caching or large
database RAM caching).  It also makes it possible for the caching code to
know how much, and of what type, of memory the currently failing allocation
is for, such that it can more intelligently release memory without
releasing everything.  The following is a quick overview of the design
goals behind the memory handler design.  As a side benefit of this work,
most memory allocations are over 100 cycles faster on a 68000-based machine,
since the overhead of RAMLIB's SetFunction() of AllocMem() is no longer an
issue.

    ------------

    Memory Handler - Quick Overview

    The basic design is a handler list that is called when a memory
    allocation fails.  The handler list (just like input.device, etc.) will
    contain routines that applications and libraries have added.

    Each handler in the list will be called in order until the memory
    allocation works or the handler list is completed.  Only after the
    handler list has been completely traversed will the allocation fail.

    The handler list is a standard exec-style list that is stored in
    priority order.

    RAMLIB, which currently SetFunctions the AllocMem() routine, will no
    longer do this but rather add itself to the handler list at priority 0.
    This lets applications come before and after the RAMLIB expunge.
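    Priority ordering is what lets a handler run before or after RAMLIB.
    The sketch below models an exec-style priority-ordered insert with
    invented names (SimNode, sim_enqueue()); the real list uses standard
    exec nodes and list routines.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of an exec-style priority-ordered list, as used
 * for the memory handler list.  (Invented names; a concept sketch,
 * not exec's list implementation.) */
struct SimNode {
    struct SimNode *next;
    signed char pri;        /* higher priority runs first */
};

/* Insert keeping the list sorted by descending priority; among
 * equal priorities, new nodes go after existing ones. */
static void sim_enqueue(struct SimNode **head, struct SimNode *n)
{
    while (*head && (*head)->pri >= n->pri)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}
```

    With RAMLIB's handler at priority 0, an application handler added at a
    positive priority is consulted before the library expunge, and one at a
    negative priority only after it.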

    -------------

    New functions:

    There will need to be two new functions in EXEC to deal with the
    handler list.  There will also be a new flag for AllocMem().

    The basic functions are:

    void AddMemHandler(struct Interrupt *)
                       a1

    void RemMemHandler(struct Interrupt *)
                       a1

    AddMemHandler() - This function takes the given handler and enqueues it
    onto the memory handler list.  Once on the list, the handler may be
    called at any time; it must therefore be ready to be called before this
    function even returns.

    RemMemHandler() - This function removes the handler from the list.
    This function *CAN* be called from within the handler.

    A new memory flag, MEMF_NO_EXPUNGE, will be added to exec.  This flag
    will cause the memory allocation attempt to fail without going through
    the memory handler list.  This is useful for caching systems that may
    not really need the memory but will take it if available, and it is
    also required within the handler itself so that memory can be allocated
    (or at least attempted) during the expunge cycle.  This flag will be
    ignored on systems that have no memory handler.

        BITDEF  MEM,NO_EXPUNGE,31    ;AllocMem: Do not call expunge on failure

    ------------

    The MemHandler structure:

    This structure is the data passed to a MemHandler.  This structure
    is *READ ONLY*

    struct MemHandlerData
    {
    	ULONG	memh_RequestSize;	/* Size of the requested allocation */
    	ULONG	memh_RequestFlags;	/* Flags of the requested allocation */
    	ULONG	memh_Flags;		/* Flags (see below) */
    };

    The memh_RequestSize and memh_RequestFlags are the size and flags
    arguments from the AllocMem() call that failed.

    	BITDEF	MEMH,RECYCLE,0	; Recycle

    The MEMHF_RECYCLE flag is 0 if this is the first time this handler has
    been called for this allocation failure.  If it is 1, the handler is
    being called again for the same failure.  See below about handler
    return values and recycling...

    ------------

    The Handler:

    The protocol for a MemHandler must be strictly followed.  Because the
    handlers are called on the context of the task calling AllocMem(), and
    because AllocMem() *MUST* *NOT* break a Forbid(), the handler *MUST*
    *NOT* break a Forbid().  Another issue is stack usage.  The handler
    could be running on any task in the system that calls AllocMem().  For
    this reason, the handler must keep stack usage as low as possible.
    Exact stack usage limits are not available, but a good rule is to keep
    it under 128 bytes if possible.

    The handler may call AllocMem() with the new MEMF_NO_EXPUNGE flag.
    This flag is new to the exec that has the memory handler system.
    Library expunge vectors can *not* make use of this feature.  This flag
    would let a handler move memory from one location to another.  For
    example, if the requested memory is CHIP, the handler could move any of
    its CHIP allocations that it can to FAST memory (if possible) and would
    then be able to help satisfy the MEMF_CHIP request.  Also, caching
    systems may wish to cache an item only if memory is available and would
    not want to have the system do an expunge just to cache this
    "unimportant" item.

    The handler will be called in a Forbid() state that *MUST* *NOT* be
    broken.

    A handler can RemMemHandler() itself *ONLY* if it
    returns MEM_DID_NOTHING or MEM_ALL_DONE.

    The handler code, which is in (*is_Code)() of the Interrupt structure,
    will be called as follows:

    a0=Pointer to (struct MemHandler)
    a1=Value from is_Data
    a2=Pointer to the Interrupt structure for this handler
    a6=ExecBase

    The handler must follow the standard rules about register usage: only
    d0, d1, a0, and a1 may be modified; all other registers *MUST* remain
    unchanged.

    Return results:

    d0=		MEM_DID_NOTHING
    	or	MEM_ALL_DONE
    	or	MEM_TRY_AGAIN

    MEM_DID_NOTHING	If the handler could not release any resources,
    		it should return with d0 set to this value.

    MEM_ALL_DONE	If the handler released all of its resources, it
    		should return this value in d0.

    MEM_TRY_AGAIN	If the handler released some resources in the hope
    		that this will have solved the memory problem, it
    		can return this value.  In that case, EXEC will
    		retry the allocation and, if it does not succeed,
    		will call the handler again.  Note that the
    		handler can tell whether it has already been
    		called by checking the MEMHF_RECYCLE flag, which
    		will be 0 on the first call to the handler.
    		The main use of this return value is to help
    		implement the RAMLIB handler, but it could also be
    		useful for LRU caching code or caching code that
    		tries to defragment memory during expunge in order
    		to try to satisfy the allocation request.
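    The retry protocol can be modelled in a few lines of C.  This is a
    simulation of the control flow only: a counter stands in for free
    memory, the names handler(), sim_alloc(), free_bytes, and cached are
    invented, and a single handler stands in for the whole list.  In the
    real system the list is walked by AllocMem() inside Forbid().

```c
#include <assert.h>

/* Return values and recycle flag from the protocol description. */
enum { MEM_DID_NOTHING, MEM_ALL_DONE, MEM_TRY_AGAIN };
#define MEMHF_RECYCLE 1UL

static unsigned long free_bytes = 0;  /* stand-in for free memory   */
static unsigned long cached = 64;     /* bytes our handler can free */

/* One simulated handler: releases its cache 32 bytes at a time, the
 * way an LRU cache might give memory back piecemeal. */
static int handler(unsigned long req_size, unsigned long flags)
{
    unsigned long chunk;
    (void)req_size;
    (void)flags;              /* real code could check MEMHF_RECYCLE */
    if (cached == 0)
        return MEM_DID_NOTHING;
    chunk = cached < 32 ? cached : 32;
    cached -= chunk;
    free_bytes += chunk;      /* "release" some resources            */
    return cached ? MEM_TRY_AGAIN : MEM_ALL_DONE;
}

/* Simulated allocator: try the allocation, run the handler on
 * failure, and recycle it while it returns MEM_TRY_AGAIN. */
static int sim_alloc(unsigned long size)
{
    unsigned long hflags = 0;
    for (;;) {
        if (free_bytes >= size) {
            free_bytes -= size;
            return 1;                 /* allocation succeeded       */
        }
        switch (handler(size, hflags)) {
        case MEM_TRY_AGAIN:
            hflags |= MEMHF_RECYCLE;  /* same failure, called again */
            break;
        default:                      /* DID_NOTHING or ALL_DONE    */
            if (free_bytes >= size) {
                free_bytes -= size;
                return 1;
            }
            return 0;                 /* list exhausted: real fail  */
        }
    }
}
```

    The shape to notice is that the allocation only truly fails once every
    handler has had its say, and that MEMHF_RECYCLE distinguishes the
    first call for a failure from the repeat calls.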

    -------------

    RAMLIB:

    RAMLIB will, under this system, no longer SetFunction the memory
    allocation routines but rather add a memory handler at priority 0.
    This handler is then called when an allocation fails, and RAMLIB can
    then call the library expunge vectors just as it does today.  If
    RAMLIB wishes to continue to do the 2.0 partial expunge, that is
    possible with the MEM_TRY_AGAIN return value.

    -------------

Another key point in the design of V39 EXEC was to provide a low-level
debugging core that can be used to debug rather complex problems.  This
low-level debugger, the Simple Amiga Debugging kernel (SAD), replaces
ROM-WACK from pre-V39 systems.  One of the goals of SAD was to provide near
emulator-level access for debugging the Amiga.  Due to some minor hardware
issues, this was not 100% implemented.  The goal was to use the unused NMI
interrupt to trap into the SAD kernel and then let the controlling system
talk to SAD and do whatever is needed.  By default, due to hardware
issues on certain Amiga models, SAD is not connected to the NMI vector.  It
is ready to be connected, but it is not.

The Simple Amiga Debugging kernel (SAD) is a set of very simple control
routines stored in the Kickstart ROM that let debuggers control the Amiga
development environment from the outside.  These routines make it possible
to do remote machine development and debugging via just the on-board
serial port.

This set of control routines is very simple and yet completely flexible,
thus making it possible to control the whole machine.

Technical Issues

SAD will make use of the motherboard serial port that exists in all
Amiga systems.  The connection via the serial port lets SAD execute
without needing any of the system software to be up and running.  (SAD
talks to the serial port hardware directly.)

With some minor changes to the Amiga hardware, an NMI-like line could
be hooked up to a pin on the serial port.  This would allow external
control of the machine and would let the external controller stop the
machine no matter what state it is in.  (NMI is like that.)

In order to function correctly, SAD requires that some of the EXEC
CPU control functions work and that ExecBase be valid.  Beyond that,
SAD does not require the OS to be running.


Command Overview

The basic commands needed to operate SAD are as follows:

Read and Write memory as byte, word, and longword.
Get the register frame address (contains all registers)
JSR to Address
Return to system operation  (return from interrupt)

These basic routines let the system do whatever is needed.  Since the
JSR-to-address and memory read/write routines can be used to download
small sections of code that do more complex things, this basic command
set is flexible enough even to replace itself.

Caches will automatically be flushed as needed after each write.
(A call to CacheClearU() will be made after the write and before
the command done sequence)

Technical Command Descriptions

Since communication with SAD is via a serial port, data formats have
been defined for minimum overhead while still giving reasonable data
reliability.  SAD uses the serial port at the default 9600 baud, but the
external tool can change the serial port's data rate if it wishes; it
just needs to make sure that it will be able to reconnect.  SAD sets
the baud rate to 9600 each time it is entered.  However, while within
SAD, a simple command to write a WORD to the SERPER register will
change the baud rate.  This will remain in effect until you exit and
re-enter SAD or until you change the register again.  (This can be useful
if you need to transfer a large amount of data.)

All commands have a basic format that they will follow.  All commands have
both an ACK and a completion message.

Basic command format is:

SENDER:	$AF <command byte> [<data bytes as needed by command>]

Receive:
Command ACK:  $00 <command byte>

Command Done: $1F <command byte> [<data if needed>]

Waiting: $53 $41 $44 $BF

Waiting when called from Debug():	$53 $41 $44 $3F

Waiting when in dead-end crash: 	$53 $41 $44 $21

(The bytes $53 $41 $44 spell "SAD" in ASCII; the final byte is the prompt,
where $3F is '?' and $21 is '!'.)

The data sequence is as follows: SAD emits a $BF and then waits for a
command.  If no command is received within <2> seconds, it emits $BF
again and loops back.  (This is the "heart beat" of SAD.)  When called from
Debug() rather than the NMI hook, SAD uses $3F as the "heart beat".

If SAD does not get a response after <10> heartbeats, it will return to
the system (execute an RTS or RTE as needed).  This is to prevent a full
hang.  The debugger at the other end can keep SAD happy by sending a
NO-OP command.

All I/O in SAD times out.  During the transmission of a command, if
more than 2 seconds pass between bytes of data, SAD will time out
and return to the prompt.  This is mainly to help make sure that
SAD can never get into an infinite-loop situation.
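The framing described above is simple enough to check with a couple of
helper routines.  In the sketch below, only the byte values come from the
protocol description; the helper names sad_frame() and sad_is_ack() are
invented for illustration, and a real debugger front end would of course
read and write these bytes over the serial line.

```c
#include <assert.h>
#include <stddef.h>

/* Byte values from the SAD protocol description. */
#define SAD_CMD_LEAD 0xAF   /* sender: starts every command */
#define SAD_ACK      0x00   /* SAD: command acknowledged    */
#define SAD_DONE     0x1F   /* SAD: command completed       */

/* Build a command frame: $AF <command> [<data>].  Returns the frame
 * length, or 0 if the output buffer is too small. */
static size_t sad_frame(unsigned char *out, size_t outlen,
                        unsigned char cmd,
                        const unsigned char *data, size_t datalen)
{
    size_t i;
    if (outlen < 2 + datalen)
        return 0;
    out[0] = SAD_CMD_LEAD;
    out[1] = cmd;
    for (i = 0; i < datalen; i++)
        out[2 + i] = data[i];
    return 2 + datalen;
}

/* Check a two-byte reply against the expected command byte. */
static int sad_is_ack(const unsigned char *in, unsigned char cmd)
{
    return in[0] == SAD_ACK && in[1] == cmd;
}
```

Echoing the command byte in both the ACK and the Done message is what gives
the external tool its "reasonable data reliability" on a raw serial line: a
reply for the wrong command is immediately detectable.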

Data Structure Issues

While executing in SAD, you may have full access to the machine from the
CPU standpoint.  However, this could also be a problem.  It is important to
understand that when entered via NMI, many system lists may be in an
unstable state.  (NMI can happen in the middle of the AllocMem() routine
or a task switch, etc.)

Also, since you are doing the debugging, it is up to you to determine what
operations can and cannot be done.  A good example: if you want to write
a WORD or LONG, the address will need to be even on 68000 processors.
Also, if you read or write memory that does not exist, you may get a bus
error.  Following system structures may require that you check the
pointers at each step.

When entered via Debug(), you are running as a "task", so you can assume
some things about system structures.  This means that you are not in
supervisor state and that the system is at least not between states.
However, remember that since you are debugging the system, some bad code
could have left data structures invalid.  Again, standard debugging issues
apply.  SAD just gives you the hooks to do whatever you need.

Note:  When SAD prompts with $BF, you are in a full disable/forbid
state.  When prompting with $3F, SAD only does a Forbid().  It is then
possible for you to disable interrupts as needed.  This is done so that it
is possible to "run" the system from SAD when called via Debug().

Data Frames and the Registers

SAD generates a special data frame that can be used to read and to change
the contents of the registers.  See the entry for GET_CONTEXT_FRAME for
more details.

For more information on how SAD works, please check the EXEC AutoDocs.


The Future

Now that I have talked about the major changes to EXEC for V39, we should
look into what the future may bring.  What follows is a general description
of the "vision" I have for the EXEC of the future.  This does not mean that
exactly these features, or that all or only these features, will be
implemented.  However, it does show some of the directions in which we hope
to push the operating system.

One of the future features that is already somewhat in use is CPU-specific
support libraries.  Currently, there is a 68040.library which patches itself
into EXEC and the system to provide the functions needed to make the
68040-based Amiga work.  Future processors will also need such support
libraries.  As such, a goal will be to have a different library for each of
the processor groups.  This library would take care of the various
processor-specific issues such as instruction emulation, cache control,
MMU support, and other things that are system-level and CPU-specific.
(In other words, there will be a 68060.library and maybe even a
68030.library.)

Along with the CPU-based extensions, the much-requested and very much
misunderstood feature of virtual memory would become an issue.  The design
(from the concept point of view) is rather far along at this point, and a
number of changes to the memory system now make this possible.  In order to
prevent compatibility problems and other issues, the only way to obtain
virtual memory would be to obtain a private pool with the attribute of
PAGED memory.  As a side issue, since it is only via private pools that
such memory can be allocated, it may be possible to have a form of
protected pools too.

Object oriented programming has become a major "key" word in the market
today.  While much of the OOP hype is just that, there are a number of
benefits to be had in a system that supports object oriented features.
The benefits, however, are only available if there is a good base from
which the objects are built, one that includes the core functions that
make up the basic OOP interfaces.

For 2.0, Jim Mackraz implemented an object oriented gadget/image system
for Intuition.  The Basic Object Oriented Programming System for Intuition,
or BOOPSI as we call it, was a very important improvement in dealing with
the details of user interface building and operation.  The model was
designed with those specific goals in mind, and it added a great deal to
the capability of Intuition and the user interfaces that can be built
with it.

One need, however, is for objects that are not user-interface based and
have no need for those aspects.  Example objects would be a data retrieval
object, a network link object, or even a "thread" object.  In fact, given
the correct core set of objects, a full resource tracking system could be
built just by having objects dispose of their parts when the "process" or
"task" object is disposed of.

The object support that I envision would be very low-overhead support for
runtime-"linked" objects.  (Much like shared code libraries are runtime
"linked".)  It would provide for both high-speed method inheritance and
complex multiple inheritance.  Both disk-loaded object classes and
application-embedded private object classes would be supported.

In addition, the long-awaited task-tree (child-tasking) support is once
again on the list of things to do.  This would give tasks the ability to be
notified about their child tasks (or parent task).  Some of this may be
well suited to the object-based "task" or "thread" construct.

Debugging support will also continue to get better, both via tools such
as Enforcer and via better support within the new constructs to check for
and report invalid operations.  The hardest part of developing a major
application is the verification of its quality.  The system should be
able to help here.

Conclusion

So, while much work needs to be done just to keep up with other aspects of
the Amiga OS and the hardware (including new CPUs), there are a number of
key issues/features that would make for an even better system.  Some of
these are good because they are great "PR" (everyone talks about getting
VM, even the one-floppy A1200 owner who does not even have an MMU), and
others are just very useful for system construction and simplified
application development.  (Debugging software on such a complex system can
be a pain.)
