
The output should be a matrix that looks like this:

    6.0000   18.0000  -12.0000    3.0000
    0.6667   -9.0000   10.0000    2.0000
    0.5000   -0.8889   24.8889    1.2778
    0.3333    0.2222   -0.0089    0.5670

The pivot vector, which records the permutation, is computed but not printed.
Printing it could easily be added to the output node if desired.

The equivalent sequential algorithm is:

    int j, l, q, r;

    for(j = 0; j < n; j += nb) {                        /* loop */
        pivot(j, n, nb, a, nca, piv);
        for(l = 0; l <= ((n/nb-j/nb)-1)-1; l++) {       /* fanout */
            int jl;

            jl = (l+1)*nb+j;
            trisolve(nb, nb, &A(j,j), nca, &A(j,jl), nca);
        }
        for(q = 0; q <= ((n/nb-j/nb)-1)-1; q++) {       /* fanout */
            for(r = 0; r <= ((n/nb-j/nb)-1)-1; r++) {   /* fanout */
                int jq, jr;

                jq = (q+1)*nb+j;
                jr = (r+1)*nb+j;
                update2(nb, nb, nb, &A(jq,j),nca, &A(j,jr),nca, &A(jq,jr),nca);
            }
        }
    }
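
As a rough, self-contained sketch of the loop nest above, the following C code
supplies stand-in bodies for pivot, trisolve, and update2. Those bodies, the
row-major A(i,j) macro, the reading of nca as the row stride, the driver name
block_lu, and the helper absd are all assumptions inferred from the call sites;
they are not the routines used by the example itself. The sketch also assumes
that nb divides n and that the matrix is nonsingular.

```c
#include <assert.h>

/* Assumed layout: row-major storage, nca = distance between rows. */
#define A(i, j) a[(i) * nca + (j)]

/* Hypothetical helper: absolute value without needing libm. */
static double absd(double x) { return x < 0 ? -x : x; }

/* Stand-in for pivot(): factor rows j..n-1 of columns j..j+nb-1 with
   partial pivoting. Whole rows are swapped; piv[k] records the row
   exchanged with row k. Multipliers overwrite the eliminated entries. */
static void pivot(int j, int n, int nb, double *a, int nca, int *piv)
{
    for (int k = j; k < j + nb; k++) {
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (absd(A(i, k)) > absd(A(p, k))) p = i;
        piv[k] = p;
        if (p != k)
            for (int c = 0; c < n; c++) {
                double t = A(k, c); A(k, c) = A(p, c); A(p, c) = t;
            }
        assert(A(k, k) != 0.0);            /* nonsingular input assumed */
        for (int i = k + 1; i < n; i++) {
            A(i, k) /= A(k, k);
            for (int c = k + 1; c < j + nb; c++)
                A(i, c) -= A(i, k) * A(k, c);
        }
    }
}

/* Stand-in for trisolve(): B := inv(L) * B, where L is the m-by-m unit
   lower triangle stored at l and B is m-by-n (forward substitution). */
static void trisolve(int m, int n, const double *l, int ldl,
                     double *b, int ldb)
{
    for (int i = 1; i < m; i++)
        for (int k = 0; k < i; k++)
            for (int c = 0; c < n; c++)
                b[i * ldb + c] -= l[i * ldl + k] * b[k * ldb + c];
}

/* Stand-in for update2(): C := C - L * U, with L m-by-k, U k-by-n. */
static void update2(int m, int n, int k, const double *l, int ldl,
                    const double *u, int ldu, double *c, int ldc)
{
    for (int i = 0; i < m; i++)
        for (int p = 0; p < k; p++)
            for (int x = 0; x < n; x++)
                c[i * ldc + x] -= l[i * ldl + p] * u[p * ldu + x];
}

/* Same loop structure as the sequential algorithm, with the fanout
   bounds written directly in terms of the block offsets jl, jq, jr. */
static void block_lu(int n, int nb, double *a, int nca, int *piv)
{
    assert(nb > 0 && n % nb == 0);
    for (int j = 0; j < n; j += nb) {
        pivot(j, n, nb, a, nca, piv);
        for (int jl = j + nb; jl < n; jl += nb)
            trisolve(nb, nb, &A(j, j), nca, &A(j, jl), nca);
        for (int jq = j + nb; jq < n; jq += nb)
            for (int jr = j + nb; jr < n; jr += nb)
                update2(nb, nb, nb, &A(jq, j), nca,
                        &A(j, jr), nca, &A(jq, jr), nca);
    }
}
```

Calling block_lu(4, 2, a, 4, piv) on a 4x4 row-major array overwrites a with
the packed factors and fills piv with the row chosen at each step. With these
stand-in routines, block sizes 2 and 4 should give the same factors up to
rounding, since nb = 4 reduces to a single unblocked panel factorization.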
