============================================================
=  BUGS and problems in the GM driver for the SGI IRIX OS  =
============================================================

  $Id: BUGS,v 1.10 2000/12/01 02:39:30 maxstern Exp $


This is a list of the known bugs and unresolved issues in
the IRIX GM driver.  They are listed approximately in
descending order of significance and impact, as perceived
by us at Myricom.

This list is current as of gm-1.4pre6.

See also the file ./drivers/irix/gm/README.todo.


No IP support
=============
The driver's IP support does not function correctly.  We
are working with SGI to resolve this problem.  Reference
SGI Bug 808660.



Can't PIO-map in 64-bit mode
============================
When an L7/L9 card is used in 64-bit mode, a bus error occurs
on the driver's first reference to a PIO-mapped address.

This has been accepted as an IRIX bug by SGI, and recorded
as SupportFolio Case 2079933 and Bug 798276.  The fix is
scheduled for IRIX 6.5.10.

The net effect of this problem is that L7/L9 cards must be
installed only with the BAR switch in the 32-bit position.



SGI's PCI Bridge Chip Revision C doesn't support L7/L9
======================================================
We found that we were unable to pio-map more than 2MB on
certain systems.  This mapping is required for all L7 and L9
Myrinet cards.  Research revealed that the failure was due to a
known IRIX bug, number 482741.  The problem lies in the bridge
chip.  Chips with revision level 'C' or earlier have the bug,
whereas chips with revision level 'D' or later work correctly.

If you encounter this problem, the driver prints a WARNING
message at initialization time, and the Myrinet card(s)
behind the affected PCI bridge will not come up.  For additional
diagnostic information, the tests directory contains a utility
program called bridgeversion, which reports the revision level
of a given bridge (identified by a hwgraph vertex named
"controller").



No support for O2 platform
==========================
We have been unable to port GM to the O2 platform.  We ran
into various problems, mostly related to endianness.  We
will raise the priority of this port if demand develops
from users or customers.



Driver does not support unloading and reloading
===============================================
The driver fails when an unload followed by a reload is
attempted.  The most important impact of this limitation is
that the system must be rebooted to install a new driver,
which is not acceptable in some production environments.



Mapper assigns unqualified node name
====================================
When the node name is provided by the driver, the mapper
assigns an unqualified name, e.g. "octane" instead of
"octane.myri.com".  This is an inconvenience.  The fix
is known, but has not yet been implemented.



GM Warnings about sv_broadcast()
================================
During driver operation which appears otherwise normal, the
following diagnostic appears frequently on the console:

  gm: WARNING: sv_broadcast() woke 0 processes; 1 process was expected

It is not clear what this warning signifies: do we have
a problem in our sleep/interrupt/wake algorithm, or is this a
normal situation?  If the former, we need to clean up the
algorithm, since there is likely a performance impact.  If the
latter, we need to demote the warning (make it a PRINT_LEVEL_9
debugging diagnostic, instead of issuing it unconditionally).



"Device not equipped" diagnostic
================================
Every time the system reboots, the following diagnostic
appears on the console:

   lboot:WARNING:INCLUDE: myrigm; device not equipped


This diagnostic does not appear to be announcing a
substantive problem, but we have been unable to determine
what it means.  We would prefer not to release the driver
with this diagnostic.  SGI has been unable to explain this
diagnostic.



Fragmentary messages in the syslog
==================================
Fragments of gm internal diagnostics appear in the syslog.
This is due to incorrectly formatted cmn_err() calls.  The
fix will be implemented when we get a chance.


device_inventory_get_next() fails
=================================
The driver has debugging code which relies on the system
function device_inventory_get_next().  This call has never
succeeded on any platform.

SGI has been unable to help me solve this problem, other
than to suggest using an alternate method.  I have not
tried the alternate method yet.

The impact of this problem is limited to Myricom's developers
(i.e., Max Stern).


Compile errors in the make
==========================
After doing autoconf and configure for the first time, the
gmake may fail with diagnostics like this:

   cc-1020 cc: ERROR File = libgm/gm_perror.c, Line = 58
   The identifier "stderr" is undefined.


The root cause of this problem is that configure decided
that the IRIX system does not support STDC header files.
configure reaches this incorrect conclusion when the test
compile it runs to check for these header files emits any
diagnostics.  We have seen such spurious diagnostics when
the MIPSpro C compiler license has expired.

The easy workaround is to edit config.cache, changing

   ac_cv_header_stdc=${ac_cv_header_stdc=no}

to

   ac_cv_header_stdc=${ac_cv_header_stdc=yes}
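
The same edit can be scripted.  The following is only a sketch:
the function name fix_stdc_cache is hypothetical (it is not part
of GM), and it assumes the cache line appears exactly as shown
above.  It writes to a temporary file and moves it into place,
because not every sed supports in-place editing.

```shell
# Hypothetical helper (not part of GM): flip the cached STDC-header
# result from "no" to "yes" in the given cache file.
fix_stdc_cache() {
    cache=${1:-config.cache}
    # Rewrite only the exact ac_cv_header_stdc line; every other line
    # passes through unchanged.  Output goes to a temporary file that
    # replaces the original only if sed succeeds.
    sed 's/^ac_cv_header_stdc=\${ac_cv_header_stdc=no}$/ac_cv_header_stdc=${ac_cv_header_stdc=yes}/' \
        "$cache" > "$cache.tmp" && mv "$cache.tmp" "$cache"
}
```

Run "fix_stdc_cache config.cache" in the directory where
configure was run, then re-run configure so that the corrected
cached value is picked up before retrying the gmake.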



Kernel PANIC after unsuccessful gm_open
=======================================
A user program (e.g. gm_board_info) crashes with a bus error
if the port open fails (e.g., because the LANai has shut down).

The problem seems to be that the open failure is not passed back
to the user program.  See traces in ~maxstern/irix/Prob01_017/.

The following scenario is probably a manifestation of the
same problem:

   gm_allsize fails when the board it is trying to use
   cannot be opened because it did not initialize
   successfully (e.g. because of PIO-map failure, see above).
   After this, a successful test (e.g., gm_board_info) causes
   the kernel to PANIC at close time.  Various conditions
   have been reported as the cause of the PANIC, including
   a negative inode reference count and a SEGV fault.

A workaround for the above scenario is to be sure to run
gm_allsize using the -B option to point to a running board.

This problem _MAY_ have been fixed since it was originally
noted; we have not seen it for a long time.
