	     Open Fabrics Enterprise Distribution (OFED)
		    IPoIB in OFED 1.3 Release Notes
			  
			   February 2008


===============================================================================
Table of Contents
===============================================================================
1. Overview
2. New Features
3. Known Issues
4. DHCP Support of IPoIB
5. The ib-bonding driver
6. Bug Fixes and Enhancements Since OFED 1.2
7. Bug Fixes and Enhancements Since OFED 1.3

===============================================================================
1. Overview
===============================================================================
IPoIB is a network driver implementation that enables transmitting IP and ARP
protocol packets over an InfiniBand UD channel. The implementation conforms to
the relevant IETF working group's RFCs (http://www.ietf.org).


===============================================================================
2. New Features
===============================================================================
IPoIB now supports "connected mode" (RFC 4755). IPoIB CM is enabled by default
on hardware that supports the SRQ optional feature (mthca, ipath).
The maximum MTU for connected mode has been increased to 65520. By default, MTU
will be configured to this maximum value.

1. IPoIB will accept incoming connected mode connections unless connected mode
   is disabled at compile time.
2. IPoIB will use connected mode for all outgoing traffic to unicast destina-
   tions that support connected mode if and only if connected mode is enabled
   at run time.
3. For destinations that do not support connected mode, IPoIB will fall back on
   datagram mode.
4. For multicast traffic, IPoIB always uses datagram mode.
5. This version adds support for CM mode for HCAs which do not support InfiniBand
   Shared Receive Queues (SRQ). The detection is done automatically at run time. In
   this case, the driver uses RC connections between the peers for transferring IPoIB
   traffic.

   Even if the HCA does support SRQ, the administrator can force the driver to not
   use SRQs. This can be done by setting the module parameter "set_nonsrq".
6. This version of OFED introduces improvements to IPoIB by cutting the CPU
   overhead in handling both send and receive packets. This improves performance
   in the different modes of operation, but the change is most noticeable for
   small UDP messages, where the message rate has increased.

   Stateless offload features have been introduced:
   Checksum offload - generate the packet checksum on send and verify it on
   receive.
   Large Send Offload - accept a large quantity of data passed from the TCP/IP
   stack and fragment it into MSS-sized packets.
7. Support for interrupt moderation was introduced. This is done for HCAs which
   support this feature (like Mellanox's mlx4). Controlling moderation params is
   done through the ethtool interface.

   ethtool -c ib0
   will display the parameters for ib0. Two kinds of parameters are supported:
   rx-usecs/tx-usecs and rx-frames/tx-frames. The values for rx and tx are
   always identical.
   To change a value, enter:
   ethtool -C ib0 rx-usecs <new val>
   This will change both rx-usecs and tx-usecs for ib0.

   The semantics of the parameters are as follows: an EQE is pushed to the event
   queue when either condition is met since the last EQE was generated: *-usecs
   have elapsed, or *-frames were received. This helps reduce the interrupt rate
   generated by the device.
8. 4K MTU support has been introduced. To benefit from 4K MTU, the SM must
   create the broadcast group with 4K MTU. See the SM documentation for
   details.
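
The ethtool calls in item 7 can be wrapped in a small helper. This is only a
minimal sketch: the function name set_ipoib_moderation and the ETHTOOL override
variable are illustrative and not part of OFED, and it assumes the driver
accepts rx-usecs and rx-frames together in a single "ethtool -C" call.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OFED): set interrupt moderation
# on one IPoIB interface.
# ETHTOOL is an assumed override hook; set ETHTOOL=echo for a dry run.
ETHTOOL="${ETHTOOL:-ethtool}"

set_ipoib_moderation() {
    ifname="$1"   # e.g. ib0
    usecs="$2"    # microseconds before an EQE is generated
    frames="$3"   # frames received before an EQE is generated
    # rx and tx values are identical, so only the rx-* names are set here.
    $ETHTOOL -C "$ifname" rx-usecs "$usecs" rx-frames "$frames"
}
```

Setting ETHTOOL=echo before calling the helper prints the arguments instead of
applying them, which is a convenient way to check the command line first.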


Usage and configuration:
========================
1. To check the current mode used for outgoing connections, enter:
   cat /sys/class/net/ib0/mode
2. To disable IPoIB CM at compile time, enter:
   cd OFED-1.3
   export OFA_KERNEL_PARAMS="--without-ipoib-cm"
   ./install.sh
3. To change the run-time configuration for IPoIB, edit
   /etc/infiniband/openib.conf and change the following parameters:
   # Enable IPoIB Connected Mode
   SET_IPOIB_CM=yes
   # Set IPoIB MTU
   IPOIB_MTU=65520

4. You can also change the mode and MTU for a specific interface manually.
   
   To enable connected mode for interface ib0, enter:
   echo connected > /sys/class/net/ib0/mode
   
   To increase MTU, enter:
   ifconfig ib0 mtu 65520

5. Switching between CM and UD mode can be done at run time:
   echo datagram > /sys/class/net/ib0/mode sets the mode of ib0 to UD
   echo connected > /sys/class/net/ib0/mode sets the mode of ib0 to CM
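
To check the mode and MTU of several interfaces at once, the sysfs files used
above can be read in a loop. The following is a sketch under assumptions: the
function name show_ipoib_modes and the SYSFS_NET override variable are
illustrative, and interfaces are assumed to be named ib*.

```shell
#!/bin/sh
# Print "<interface> <mode> <mtu>" for every ib* interface found in sysfs.
# SYSFS_NET is an assumed override so the loop can be tried on a test tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

show_ipoib_modes() {
    for dir in "$SYSFS_NET"/ib*; do
        # Skip the unexpanded pattern (no ib* interfaces) and any entry
        # that does not expose the IPoIB "mode" file.
        [ -r "$dir/mode" ] || continue
        printf '%s %s %s\n' "${dir##*/}" "$(cat "$dir/mode")" "$(cat "$dir/mtu")"
    done
}
```

Calling show_ipoib_modes prints one line per interface, which makes it easy to
spot an interface that has fallen back to datagram mode.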


===============================================================================
3. Known Issues
===============================================================================
1. If a host has multiple interfaces and (a) each interface belongs to a
   different IP subnet, (b) they all use the same InfiniBand Partition, and (c)
   they are connected to the same IB Switch, then the host violates the IP rule
   requiring different broadcast domains. Consequently, the host may build an
   incorrect ARP table.

   The correct setting of a multi-homed IPoIB host is achieved by using a
   different PKEY for each IP subnet. If a host has multiple interfaces on the
   same IP subnet, then to prevent a peer from building an incorrect ARP entry
   (neighbor) set the net.ipv4.conf.X.arp_ignore value to 1 or 2, where X
   stands for the IPoIB (non-child) interfaces (e.g., ib0, ib1, etc). This
   causes the network stack to send ARP replies only on the interface with the
   IP address specified in the ARP request:

   sysctl -w net.ipv4.conf.ib0.arp_ignore=1
   sysctl -w net.ipv4.conf.ib1.arp_ignore=1

   Or, globally,

   sysctl -w net.ipv4.conf.all.arp_ignore=1

   To learn more about the arp_ignore parameter, see Documentation/networking/ip-sysctl.txt.
   Note that distributions have the means to make kernel parameters persistent.

2. There are IPoIB alias lines in modprobe.conf which prevent stopping/
   unloading the stack (i.e., '/etc/init.d/openibd stop' will fail). 
   These alias lines cause the drivers to be loaded again by udev scripts.

   Workaround: Set OFA_KERNEL_PARAMS="--without-modprobe" before running
   install.sh, or remove the alias lines from modprobe.conf.
   
3. On SLES 10:
   The ib1 interface uses the configuration script of ib0.

   Workaround: Invoke ifup/ifdown using both the interface name and the
   configuration script name (example: ifup ib1 ib1).

4. After a hotplug event, the IPoIB interface falls back to datagram mode, and
   MTU is reduced to 2K.
   Workaround: Re-enable connected mode and increase MTU manually:
   echo connected > /sys/class/net/ib0/mode
   ifconfig ib0 mtu 65520

5. Since the IPoIB configuration files (ifcfg-ib<n>) are installed under the
   standard networking scripts location (RedHat:/etc/sysconfig/network-scripts/
   and SuSE: /etc/sysconfig/network/), the option IPOIB_LOAD=no in openib.conf
   does not prevent the loading of IPoIB on boot.

6. On RedHat EL 4 up4, the IPOIB implementation is not spec-compliant:
   - ipoib multicast does not work
   - ipoib cannot interoperate between RHEL4U4 and other hosts. This is due to
     missing code in the kernel which was available in U3 and U5 but removed in
     U4. As a workaround, upgrade to RHEL4U5.

7. If IPoIB connected mode is enabled, it uses a large MTU for connected mode
   messages and a small MTU for datagram (in particular, multicast) messages,
   and relies on path MTU discovery to adjust MTU appropriately. Packets sent
   in the window before MTU discovery automatically reduces the MTU for a
   specific destination will be dropped, producing the following message in the
   system log:
   "packet len <actual length> (> <max allowed length>) too long to send, dropping"

   To warn about this, a message is produced in the system log each time MTU is
   set to a value higher than 2K.

8. In connected mode, TCP latency for short messages is larger by approx. 1usec
   (~5%) than in datagram mode. As a workaround, use datagram mode.

9. Single-socket TCP bandwidth for kernels < 2.6.18 is lower than with
   newer kernels. We recommend kernels from 2.6.18 and up for
   best IPoIB performance.
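
For issue 1, the per-interface arp_ignore settings can be generated rather
than typed by hand. The sketch below prints sysctl.conf-style lines for each
non-child IPoIB interface. The function name print_arp_ignore_lines and the
SYSFS_NET override are illustrative assumptions; whether your distribution
reads /etc/sysctl.conf at boot should be checked in its documentation.

```shell
#!/bin/sh
# Emit "net.ipv4.conf.<if>.arp_ignore = 1" for each non-child IPoIB interface.
# SYSFS_NET is an assumed override so the loop can be tried on a test tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

print_arp_ignore_lines() {
    for dir in "$SYSFS_NET"/ib*; do
        [ -e "$dir" ] || continue      # pattern did not match anything
        name="${dir##*/}"
        case "$name" in
            *.*) continue ;;           # child interface such as ib0.8007
        esac
        printf 'net.ipv4.conf.%s.arp_ignore = 1\n' "$name"
    done
}
```

Review the output before appending it to the file your distribution uses for
persistent kernel parameters.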

===============================================================================
4. DHCP Support of IPoIB
===============================================================================
IPoIB is configured by default to use information obtained dynamically from a
DHCP server, at driver startup time, to configure its interfaces.

Note: To use DHCP the user must apply a special patch (see "DHCP Notes" below).

DHCP Supported Operating Systems
--------------------------------
1. SLES 10
2. RHEL 5
3. All kernels from 2.6.14 and up

DHCP Unsupported Operating Systems
----------------------------------
RedHat EL 4 distributions are not supported.


DHCP Notes
----------
1. It may be required to run over different UDP ports than the well known ports
   (67 and 68). Free port numbers greater than 0x8000 must be chosen. To
   specify a server or a client port number, use the option -p <port number>.
   The client's port number must be the chosen server's port number plus one.

2. For IPoIB to use DHCP, you must patch ISC's DHCP. The patch file can be
   found under OFED-1.3/docs/dhcp after extracting the distribution file.
   (After installation it can also be found under <prefix>/docs/dhcp.) The
   patch should be applied for the server and for each client. Tests were run
   on version 3.0.4 of the DHCP package.
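
The port rule in note 1 can be checked before starting the server and clients.
This is a minimal sketch: the function name dhcp_port_pair is illustrative,
and the upper bound of 65534 is added here so that the client port (server
port plus one) still fits in the 16-bit UDP port range.

```shell
#!/bin/sh
# Hypothetical helper: validate a DHCP server port and derive the client port.
dhcp_port_pair() {
    server_port="$1"
    # Must be greater than 0x8000 (32768) and leave room for server_port + 1.
    if [ "$server_port" -le 32768 ] || [ "$server_port" -ge 65535 ]; then
        echo "server port must be in the range 32769..65534" >&2
        return 1
    fi
    # Print "<server port> <client port>".
    echo "$server_port $((server_port + 1))"
}
```

The two numbers printed are the values to pass with -p to the server and to
each client, respectively.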


===============================================================================
5. The ib-bonding driver
===============================================================================
The ib-bonding driver is a High Availability solution for IPoIB interfaces. 
It is based on the Linux Ethernet Bonding Driver and was adapted to work with
IPoIB. The ib-bonding package contains a bonding driver and a utility called 
ib-bond to manage and control the driver operation. 
The ib-bonding driver comes with the ib-bonding package (run rpm -qi ib-bonding
to get the package information).

Using the ib-bonding driver
---------------------------
The ib-bonding driver can be loaded manually or automatically.

1. Manual operation:
Use the utility ib-bond to start, query, or stop the driver. For details on this
utility, read the documentation for the ib-bonding package.

2. Automatic operation:
	a. Edit the file '/etc/infiniband/openib.conf' as follows:
			# Enable the bonding driver on startup.
			IPOIBBOND_ENABLE=yes
			# Set bond interface names
			IPOIB_BONDS=bond0,bond8007
			# Set specific bond params; address and slaves
			bond0_IP=10.10.10.1/24
			bond0_SLAVES=ib0,ib1
			bond8007_IP=20.10.10.1
			bond8007_SLAVES=ib0.8007,ib1.8007
	b. Use standard OS tools (sysconfig in SuSE and initscripts in Redhat)
to create a configuration that will come up with network restart. For details
on this, read the documentation for the ib-bonding package.

Notes:
* The ib-bonding driver does not load when the HA service is configured to load.
* If the bondX name is defined but one of bondX_SLAVES or bondX_IP is missing,
  then that specific bond will not be created.
* The bondX name must not contain characters which are disallowed for bash
  variable names such as '.' and '-'
* Using /etc/infiniband/openib.conf to create a persistent configuration is
  deprecated and not recommended. Do not use it unless you have no other
  option; prefer the standard OS tools. It is not guaranteed that the
  openib.conf method will be supported in future versions of OFED.
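
The naming and completeness rules in the notes above can be checked with a few
lines of shell. This is a sketch only and does not reproduce what the OFED
startup scripts do: the function names valid_bond_name and bond_is_complete are
illustrative, and the variables are assumed to have been set as in the
openib.conf example (bondX_IP and bondX_SLAVES).

```shell
#!/bin/sh
# Hypothetical check: can the bond name be used as a shell variable prefix?
valid_bond_name() {
    case "$1" in
        # Reject empty names, a bad first character, or any bad character
        # such as '.' or '-'.
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical check: are both <bond>_IP and <bond>_SLAVES set and non-empty?
bond_is_complete() {
    valid_bond_name "$1" || return 1
    # The name has been validated above, so it is safe to use in eval.
    eval "ip=\${${1}_IP:-}; slaves=\${${1}_SLAVES:-}"
    [ -n "$ip" ] && [ -n "$slaves" ]
}
```

With the example configuration loaded into the shell, bond_is_complete bond0
succeeds, while a name such as bond.8007 is rejected by valid_bond_name.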


===============================================================================
6. Bug Fixes and Enhancements Since OFED 1.2
===============================================================================
- Add interrupt moderation support for ipoib
- NAPI is available using a module parameter
- Fixed a leak in ipoib_transport_dev_init
- Fixed kernel oops in IPoIB download
- Fix a crasher bug in IPoIB CM by initializing RX before moving QP to RTR. 


===============================================================================
7. Bug Fixes and Enhancements Since OFED 1.3
===============================================================================
- Don't drop multicast sends when they can be queued
- Bug fix when copying small SKBs
