AIX Tip of the Week

AIX 4.3.3 Benchmark Settings for Oracle 8.1 Siebel Server

Audience: AIX System Administrators

Date: July 15, 2001

Although AIX requires relatively minimal tuning, there are several settings that can be modified to enhance performance. Here's a list of the AIX 4.3.3 settings I used for a recent "I/O intensive" database benchmark. The benchmark measured the throughput and response time of an Oracle 8.1 (64-bit) database server with over 20,000 simulated Siebel 6 users on a p680 server with 24 CPUs, 64 GB memory, 4 Fibre adapters and Shark Raid 5 disk.

Standard Disclaimer: these settings are provided only as a tuning guideline. Every system is different, so "your mileage may vary."

Disclaimers aside, these settings resulted in excellent benchmark performance. The p680 server delivered 2-5 times the throughput at less than half the response time of the competitor's top-end server with 54 CPUs and 56 GB memory.

Benchmark Overview

Siebel 6 is a Customer Relationship Management application. It uses a two-tier architecture consisting of "fat" PC clients that run the application and a back-end database server. The database server workload is mostly "read" queries.

The benchmark measured the database performance in terms of throughput and query response time. The load was generated by up to 800 simulated users running queries. The queries ranged in complexity from simple lookups to long-running reports requiring non-indexed table scans and up to 16-way table joins.

The server ran AIX 4.3.3 ML7 and Oracle 8.1 (64-bit). The Oracle tables resided on a JFS filesystem. The client connections were JDBC (Java 1.1.8). The server was a p680 with 24 CPUs, 64 GB memory, 100 Mbit Ethernet, and 4 Fibre adapters connected to Shark "Raid 5" disk. The Shark was configured as 8 "6+P" Raid 5 LUNs with a 32 GB cache.

 

AIX Settings

The AIX tuning for this benchmark focused on I/O settings for multiuser database access. We were concerned with two types of database workloads: simple lookup queries and long-running reporting queries.

The following tables list the AIX settings used in the benchmark. The tables are organized into CPU, memory, I/O and network settings. In a few cases a setting affects two categories (i.e., memory and I/O); in these cases, my assignment to a category was arbitrary.

CPU Settings

  Description:                Increase the maximum number of processes allowed
                              per user (for the 20,000+ connections)
  Benchmark setting:          100,000
  Command to change setting:  chdev -l sys0 -a maxuproc='100000'
  Command to view setting:    lsattr -El sys0

Memory Settings

  Description:                numfsbufs - increases filesystem buffers for high
                              I/O rates. Changed in response to a large
                              "fsbufwaitcnt" count (vmtune -a); arbitrarily set
                              to 10x the default.
  Benchmark setting:          930 (10x default)
  Command to change setting:  vmtune -b 930
  Command to view setting:    vmtune (settings), vmtune -a (counters)
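
For reference, here's a minimal sketch of how the numfsbufs check and change look from the command line. It assumes vmtune is installed in its usual AIX 4.3.3 location, /usr/samples/kernel/vmtune (from the bos.adt.samples fileset); verify the path on your system.

  # Check the count of threads that blocked waiting on a filesystem buffer.
  # A steadily growing fsbufwaitcnt suggests numfsbufs is too small.
  /usr/samples/kernel/vmtune -a | grep fsbufwait

  # Raise numfsbufs to 10x the default of 93. vmtune changes do not survive a
  # reboot, and the new value applies to filesystems mounted after the change.
  /usr/samples/kernel/vmtune -b 930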

 

Memory Comments

The benchmark system had sufficient memory to avoid paging. If paging had occurred, we would have tuned memory to favor paging out permanent storage (executables, filesystem metadata) over paging out working storage (Oracle's SGA). To do so, see the "vmtune" command and the "minperm" and "maxperm" settings.
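
As an illustration only (the benchmark never paged), here's a sketch of how that bias could be applied with vmtune. The minperm/maxperm percentages below are hypothetical; the right values depend on how much JFS file caching you want to preserve for the Oracle data files.

  # Display current settings, including minperm/maxperm and numperm (the percent
  # of real memory currently holding file pages).
  /usr/samples/kernel/vmtune

  # Hypothetical values: lowering maxperm% biases the page stealer toward file
  # pages (permanent storage) and away from working storage such as the SGA.
  # -p sets minperm%, -P sets maxperm%; settings do not survive a reboot.
  /usr/samples/kernel/vmtune -p 5 -P 20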

 

I/O Settings

  Description:                Minimize potential I/O bottlenecks by randomizing
                              data layout and sequence (see comments).
  Benchmark setting:          LVM-stripe data across the Raid 5 LUNs; randomize
                              the layout sequence of each LV.
  Command to change setting:  smit mklv
  Command to view setting:    lslv -l <lvname>

  Description:                Improve sequential read performance by increasing
                              "read ahead" (see comments).
  Benchmark setting:          maxpgahead=256
  Command to change setting:  vmtune -R 256
  Command to view setting:    vmtune

  Description:                Allocate sufficient free memory to complement the
                              "read ahead" setting: maxfree = minfree + maxpgahead.
  Benchmark setting:          maxfree=736
  Command to change setting:  vmtune -F 736
  Command to view setting:    vmtune

  Description:                Resize the JFSLOG to support large filesystems;
                              stripe it to minimize the JFSLOG bottleneck
                              (see comments).
  Benchmark setting:          Size = 256 MB, striped across 8 disks
                              (stripe size = 128k)
  Command to change setting:  See comments
  Command to view setting:    lslv -l <jfslogname>; I/O activity: filemon

  Description:                Configure and enable Asynchronous I/O servers for
                              Oracle (see comments).
  Benchmark setting:          Min = 150, Max = 300, enabled
  Command to change setting:  smit chgaio (requires reboot)
  Command to view setting:    smit chgaio; active AIO servers:
                              pstat -a | grep -c aios

 

I/O Comments

Randomize data layout: reduce possible I/O contention by maximizing the data spread across disks (mklv -e x). In addition, each LV should be laid out in a unique sequence across the disks. This minimizes the I/O bottlenecks that can occur when multiple serial reads (table scans, data loads) "tailgate" each other.

To specify a layout sequence, first create the LV with the desired disk order, then create a filesystem on the previously defined LV. Here's an example of how to create two LVs across the same disks, but with different sequencing (the logical partition count, which sets each LV's size, is shown as a placeholder):

mklv -y lv01 -e x datavg <number of LPs> hdisk1 hdisk2 hdisk3

mklv -y lv02 -e x datavg <number of LPs> hdisk3 hdisk2 hdisk1
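
For the second step, here's a minimal sketch of creating the filesystem on the pre-built LV (so AIX doesn't create its own LV); the mount point /oradata1 is hypothetical.

  # Create a JFS filesystem on the previously defined lv01 and mount it.
  # -d names the existing LV, -m is the mount point, -A yes mounts it at boot.
  crfs -v jfs -d lv01 -m /oradata1 -A yes
  mount /oradata1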

Improve sequential reads by increasing "read ahead": When AIX senses a sequential read, it attempts to improve I/O performance by increasing the number of blocks it reads on the next I/O. The vmtune "maxpgahead" parameter determines the number of blocks AIX can read ahead, and it must be a power of 2 (2, 4, 8, 16, ...).

In our benchmark, we sized "maxpgahead" so that the maximum read ahead would roughly match the size of the RAID 5 stripe across the disks (6+P). We used the following formula to calculate "maxpgahead", then rounded up to the nearest power of 2:

maxpgahead ≈ (Raid 5 stripe size) x (number of Raid 5 data disks) / (4k page size)

           ≈ (128k x 6 disks) / 4k = 192 blocks, rounded up to 256 (the nearest power of 2)
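
Tying the two read-ahead rows in the table together, here's a sketch of applying the settings. The minfree value of 480 is not stated in the table; it is inferred from the maxfree = minfree + maxpgahead relationship, so treat it as an assumption.

  # maxpgahead must be a power of 2; 256 covers the 6-disk, 128k Raid 5 stripe.
  # maxfree = minfree + maxpgahead = 480 + 256 = 736 (480 is inferred, not measured).
  /usr/samples/kernel/vmtune -R 256 -F 736

  # Confirm the new values.
  /usr/samples/kernel/vmtune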

 

JFSLOG is a logical volume that is used by AIX to journal file changes. There's at least one JFSLOG in each Volume Group containing a filesystem. The JFSLOG helps maintain data integrity, but can become a bottleneck during heavy write traffic.

The JFSLOG should be tuned for filesystem size and I/O rate. The default JFSLOG is undersized for large filesystems. The sizing rule of thumb is 2 MB of JFSLOG per GB of filesystem; to support 150 GB of filesystems, we needed a 300 MB JFSLOG. To prevent the JFSLOG from becoming an I/O bottleneck, we "Raid 0" striped it across multiple disks.

The procedure for recreating a striped JFSLOG is below. In this example, the volume group name is "datavg", the JFSLOG is named jfslog00, and we want to "Raid 0" stripe it with 128k blocks over hdisks 6, 7, 8 and 9.

    1. Unmount all filesystems in datavg
    2. Remove the JFSLOG: rmlv jfslog00
    3. Recreate the JFSLOG: mklv -y jfslog00 -t jfslog -S 128K datavg <number of LPs> hdisk6 hdisk7 hdisk8 hdisk9
    4. Format the JFSLOG: logform /dev/jfslog00
    5. Remount filesystems in datavg
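
Here are those steps consolidated into a command sequence, as a sketch. The filesystem names (/ora1, /ora2) and the logical partition count are hypothetical, and the sketch assumes jfslog00 is the only JFS log in datavg and that /etc/filesystems already points the filesystems at /dev/jfslog00; adjust before use.

  # 1. Unmount every filesystem in datavg (hypothetical mount points).
  umount /ora1
  umount /ora2

  # 2. Remove the existing log LV (-f skips the confirmation prompt).
  rmlv -f jfslog00

  # 3. Recreate it striped. Choose <LPs> so that LPs x PP size gives the
  #    desired log size (256 MB in the benchmark).
  mklv -y jfslog00 -t jfslog -S 128K datavg <LPs> hdisk6 hdisk7 hdisk8 hdisk9

  # 4. Format the new log (logform asks for confirmation; answer y).
  logform /dev/jfslog00

  # 5. Remount the filesystems.
  mount /ora1
  mount /ora2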

Asynchronous I/O improves write performance by grouping acknowledgments. Applications must be written to take advantage of AIO, as is the case with Oracle. AIO tuning involves specifying the number of AIO servers and enabling them (smit aio). We set the minimum number of AIO servers to 150, based on a SupportLine recommendation; the maximum setting of 300 is double the minimum.

During the benchmark, we noted that the minimum setting of 150 was probably too high. The number of active AIO servers should fall between the minimum and maximum settings (pstat -a | grep -c aios). Being over-configured doesn't hurt, so we didn't change the setting.
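
The same AIO configuration can also be made from the command line instead of smit chgaio. A sketch, assuming the standard aio0 device attributes:

  # Set the min/max AIO servers and make the aio0 subsystem available at boot.
  # -P records the change in the ODM only; it takes effect after the reboot.
  chdev -l aio0 -a minservers=150 -a maxservers=300 -a autoconfig=available -P

  # After the reboot, verify the attributes and count the active servers.
  lsattr -El aio0
  pstat -a | grep -c aios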

 

Network Settings

  Description:                All settings were made in advance based on
                              experience; no network bottlenecks were observed.
  Benchmark setting:          sb_max=1310720, tcp_sendspace=221184,
                              tcp_recvspace=221184
  Command to change setting:  no -o sb_max=1310720
                              no -o tcp_sendspace=221184
                              no -o tcp_recvspace=221184
  Command to view setting:    no -a

All of the network settings were changed before the start of the benchmark. The changes were based on prior experience, not on observed bottlenecks.
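
Note that "no" changes take effect immediately but are lost at reboot. A common approach on AIX 4.3.3, sketched below, is to append the same commands to /etc/rc.net so they are reapplied at boot; treat the placement and values as a guideline for your environment.

  # Apply immediately (sb_max first, since the TCP spaces cannot exceed it).
  no -o sb_max=1310720
  no -o tcp_sendspace=221184
  no -o tcp_recvspace=221184

  # Reapply at boot: append these lines to the end of /etc/rc.net.
  #   /usr/sbin/no -o sb_max=1310720
  #   /usr/sbin/no -o tcp_sendspace=221184
  #   /usr/sbin/no -o tcp_recvspace=221184

  # Verify the current values.
  no -a | egrep 'sb_max|tcp_sendspace|tcp_recvspace'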

Comments/Observations

System performance: the CPU averaged less than 30% utilization. Wait I/O averaged below 15%. Disk activity was equally distributed across all disks. No system bottlenecks were observed during the benchmark.

Read-only databases can still generate significant write activity. This benchmark exhibited periods of high "write" activity, even though the queries were read-only. In order of activity, the sources of write I/O included:

    1. The JFSLOG had the highest level of write activity. The JFSLOG was 100% utilized at the end of the benchmark when Oracle closed inactive connections. (We suspect this is a "known" Oracle bug associated with excessive fsync() calls. I understand a patch is available.) Because it occurred after the benchmark, it didn't affect the results.
    2. Table joins were the second highest source of write activity. Write I/O to Oracle's temp disk peaked at over 40 MB/sec. However, these I/Os tended to be short bursts lasting 10-20 seconds.
    3. Oracle log files accounted for up to 5% of the Wait I/O. This level of activity would not be considered a system bottleneck, but might affect Oracle performance.

 

Memory requirements: The primary users of memory were Oracle's SGA and the JDBC connections. The Oracle SGA used 4-5 GB. (It could have been larger, but was constrained to match the size used on the competitive server.) Each JDBC connection between the client and the database required roughly 3 MB of memory on the p680, so the p680 ran out of memory at about 20,000 connections (20,000 x 3 MB is roughly 60 GB of the 64 GB installed), with the excess connections being paged to disk. The large number of connections did not significantly affect database performance once the excess connections were paged out.

Data Caching: AIX and Oracle did a good job of caching data in memory. There was almost no disk I/O with 400 active users running simple queries; most of the I/O that did occur was associated with writing temp files.

Because of the low disk activity, I would recommend the default 8 GB Shark cache over the 32 GB cache used in the benchmark.

Mixing simple and complex queries required more resources than running them separately. There appeared to be an interaction between the simple lookups and the long-running queries that reduced throughput. The effect is difficult to quantify, but I'd estimate it to be on the order of 10-15% added overhead.


Bruce Spencer,
baspence@us.ibm.com