ITEM: I3245L

Need help tuning TCP/IP performance on Serial Optical and Ethernet



Question:

I have a RISC System/6000 model 560 with a 128-port adapter and
numerous ttys.  I also have a model 580.  The 560 and the 580 are
connected with a Serial Optical Channel Connection (SOCC).  The 580 is
also attached to an AS/400 and a PS/2 via Ethernet.  I am running
TCP/IP on both networks.  I would like to tune my network to use the
SOCC and the Ethernet effectively.

On the 580, at 8:49 am, netstat -m shows 

777 mbufs in use
45/83 mapped pages in use
526 KBytes allocated to network (71% in use)

lowclust was 28,         we set to 75
lowmbuf was 76,          we set to 150
mb_cl_hiwat was 56,      we set to 175
thewall was 12288,       we left it as is (12MB, default is 2MB)
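
These mbuf pool parameters are changed with the no command.  As a
rough sketch using the values chosen above (remember that no settings
do not survive a reboot and are commonly re-applied from a startup
script such as /etc/rc.net):

# no -o lowclust=75
# no -o lowmbuf=150
# no -o mb_cl_hiwat=175
# no -a                      (lists all current option values for verification)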

lowclust and lowmbuf are the number of clusters (4KB each) and the
number of mbufs (256 bytes each) that the system tries to keep free
and available for network traffic.  When a burst of network activity
arrives and these buffers are used for data, the netm process is
called to allocate more memory for networking from the virtual memory
manager (VMM).  As network traffic subsides, buffers are freed, and
when the number of free clusters reaches mb_cl_hiwat, the netm process
is called to give memory back to the VMM.

mb_cl_hiwat should be at least twice lowclust.  We do not want to
allocate memory for networking, give it back too quickly, and then
need it again for the next burst of network traffic.

When lowclust is raised, lowmbuf should be raised by at least the
same amount, because each cluster is pointed to by an mbuf.

It is appropriate to raise the values of lowmbuf, lowclust, and
mb_cl_hiwat, because some additional changes we make to other
networking options (set with the no command) will tend to "open up"
bandwidth and buffer consumption by the SOCC.

Application Layer tuning

SOCC has proven to provide the best throughput when the application
uses a write buffer size of 28,672 bytes.  You were not sure whether
the buffering in the Barnett application was tunable, and you plan to
check.  If 28KB is not possible, you should try to use a buffer size
that is as close to a multiple of 4KB as possible.  This can cut down
the fragmentation within TCP that can occur with large-MTU interfaces
like SOCC.  The default MTU for SOCC is 61,428 bytes, which provides
the best throughput.  To check the MTU size, run "smitty chif" and
pick the interface whose MTU you want to check.
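
If you prefer a command-line check, the MTU of each configured
interface also shows up in standard command output.  For example (the
interface name en0 below is only an example; substitute your own):

# netstat -i                 (the Mtu column shows the MTU per interface)
# lsattr -El en0 -a mtu      (shows the mtu attribute for one interface)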

Socket Layer tuning

If the application write buffer size is relatively small, and far from
a multiple of 4KB (e.g., 1KB), TCP internal fragmentation can occur.
TCP will place each 1KB buffer into its own 4KB cluster.  With a large
MTU, TCP will try to fill the MTU size before passing the data to the
IP layer.  But after allocating 16 clusters, holding only 16KB of
data, you may well hit the maximum socket buffer size (the default
sb_max is 64KB, which is sixteen 4KB clusters).  TCP will not send
this data yet, because the 16KB you have "ready to go" does not reach
the threshold of half the window size.  (The default window size, or
tcp_sendspace / tcp_recvspace, is 16384 bytes, but the best throughput
on SOCC is obtained with a value of 57344 bytes; half of that window
is 28672 bytes, more than the 16KB of data queued.)

The answer is to increase sb_max, perhaps using a formula like

sb_max = 4096 * (window size / write buffer size)

This will allow all processes using TCP/IP to allocate more socket
buffer space if they have data to fill it.  This buffer space comes
from VMM, so paging should be reviewed after these changes.  Maximum
throughput on SOCC requires enough buffering at each end, but this has
to be balanced against memory needs of local users and their applications.

We decided to increase your sb_max from 65536 to 262144, based on your
estimate that the application write buffer size was 1024 bytes.
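
For reference, the formula above with a 57344-byte window and a
1024-byte write buffer gives 4096 * (57344 / 1024) = 229376 bytes;
262144 (256KB) is the next power of two above that.  A sketch of the
change with the no command:

# no -o sb_max=262144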

TCP Layer tuning

We increased the window size by changing

tcp_sendspace was 16384, we set to 57344
tcp_recvspace was 16384, we set to 57344
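
A sketch of the corresponding no commands (using the values above;
these defaults apply to sockets created after the change):

# no -o tcp_sendspace=57344
# no -o tcp_recvspace=57344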

Other miscellaneous checks

No problems were noted in the output of netstat -v; specifically, the
maximum transmits queued and maximum receives queued were 0-2, which
indicates that the transmit and receive queue sizes on the Ethernet
and SOCC device drivers are large enough.
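
To repeat this check later, the relevant counters can be pulled out of
the adapter statistics (the exact field names vary by adapter type, so
the grep pattern here is only an example):

# netstat -v | grep -i queue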

We also set the option rfc1323=1 with the no command.  This allows the
machine to use the RFC 1323 enhancements to TCP/IP, which are designed
specifically for high-bandwidth interfaces.  The enhancements include
window scaling, which lets larger windows be used where possible, that
is, on high-bandwidth interfaces between machines that successfully
negotiate the use of RFC 1323.  This is likely to occur between your
560 and 580 over SOCC, but is not likely to be negotiated between your
580 and the AS/400 over Ethernet.  Also included are the fast
retransmit and fast recovery algorithms (strictly speaking, these are
separate TCP enhancements rather than part of RFC 1323 itself).  They
allow the system to recover from one packet loss per window without a
transmission timeout, a drain of the pipeline, and a "slow" restart as
the pipeline is refilled.  rfc1323 is a new no command option in
AIX 3.2.5.
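
A sketch of the command, and of how to display the current setting
afterward:

# no -o rfc1323=1
# no -o rfc1323              (displays the current value)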


Summary

An application buffer size of 28672 bytes (or some multiple of 4096
bytes), socket send and receive buffer sizes of 57344 bytes, and the
default MTU of 61428 bytes give the best throughput on SOCC.

Follow up

Review network performance again during peak times, as well as paging
rates.  Generally speaking, paging rates should be 5 pages per second
or below for the best performance.
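
Paging rates can be watched with vmstat; the pi and po columns show
pages paged in and out per second (the 5-second interval below is just
an example):

# vmstat 5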

Also, at the IP layer, you might check if the IP input queue is being
overrun.  You can check this with the crash command

# crash
> knlist ipintrq
     ipintrq: 0x0149ba68  (add hex 10 to the value returned for ipintrq,
                           and use it for input with the od subcommand)
> od 0149ba78 1
0149ba78: 00000000        (if the value returned is greater than zero,
                           overflows have occurred)
> quit

You can increase the IP input queue length by setting the ipqmaxlen
option of the no command.  The default is 50.
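
If overflows are seen, a sketch of the change (the value 100 here is
only an example; choose a value appropriate to your load):

# no -o ipqmaxlen=100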

Response:

Checked back with you at about 2:10 pm; netstat -m shows

45/160 mapped pages in use
847 Kbytes allocated to network (45% in use)

so with the settings we modified, we have given you a little more
"headroom" for networking.  But with only 847K of 512MB allocated to
the network, even with the tuning we did, TCP/IP is probably not the
major consumer of resources on your system.  You mentioned that you
hadn't had any performance complaints from users, but paging levels
still seemed somewhat high, typically 20, 24, 11, 29, 2, 2, 12, 7
pages per second.

Further performance analysis suggests you are memory constrained, 
which you plan to remedy with a memory upgrade.



Support Line: Need help tuning TCP/IP performance on Serial Optical and Ethernet ITEM: I3245L
Dated: April 1994 Category: N/A