Software Related Information

ID #1017

Locating Performance Bottlenecks

Once you have brought your Dynamo system up to maximum throughput, you can look at the components of the system to determine which components are limiting factors in performance.


Monitoring System Utilization

Use a program like top (on Solaris), the Windows Performance Monitor, or a more sophisticated tool to keep track of information like:

  • CPU utilization

  • paging activity

  • disk I/O utilization

  • network I/O utilization

A well-performing site will have high CPU utilization when the site is achieving its maximum throughput and will not be doing any paging. A site with high I/O activity and low CPU utilization has some I/O bottleneck.

Bottlenecks at Low CPU Utilization

If your site has low CPU utilization when it is achieving maximum throughput, the bottleneck is likely either:

  • database limited (if database output is maxed out); see Checking for Database Bottlenecks

  • disk I/O limited (if disk I/O is maxed out); see Checking for Disk I/O Bottlenecks

  • network I/O limited (if network I/O is maxed out); see Checking for Network-Limited Problems

  • database or I/O activity in a synchronized method (if database or I/O output is not maxed out); see System Resource Bottlenecks

If your site is in this situation, CPU profiling tools like OptimizeIt are of limited use. Thread dumps taken while the system is under load give you better information. A few of them give a quick idea of which parts of your application are slowest, which helps you direct your efforts to the right place. You should be able to tell, for example, whether threads are waiting for a response from the database, a write to the client, or a read from a local file system. If many threads are waiting for the same resource, that resource is a potential bottleneck. The following sections describe what to do about bottlenecks on each kind of resource.

Checking for Database Bottlenecks

If your site has low CPU utilization at maximum throughput, check whether the database is limiting performance.

  • Get a JVM thread dump (see Getting Java VM Dumps) and examine it to see if there are many threads waiting for a response from the database (see Analyzing Java VM Dumps).

  • Check the CPU utilization and disk I/O utilization of your database server.

  • Check the network bandwidth between the Dynamo server and the database server.
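A full thread dump remains the primary tool for the first check, but a quick in-process summary can help when you are scanning for many threads stuck in the same place. The following is a minimal sketch and not part of Dynamo: the class name TopFrameSummary is hypothetical, and the code assumes a Java 8 or later JVM, which may be newer than the JVM your Dynamo installation runs on. It groups live threads by their topmost stack frame, so a pile-up in a database driver read (or any other single call) stands out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Dynamo class.
class TopFrameSummary {

    /** Counts threads by the topmost frame of each stack in the given dump. */
    static Map<String, Integer> countByTopFrame(Map<Thread, StackTraceElement[]> dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (StackTraceElement[] stack : dump.values()) {
            // Threads with no frames (for example, not yet started) share one bucket.
            String top = stack.length == 0
                    ? "<no frames>"
                    : stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() snapshots every live thread in this JVM.
        Map<String, Integer> counts = countByTopFrame(Thread.getAllStackTraces());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

To be useful, countByTopFrame(Thread.getAllStackTraces()) has to be called from code running inside the loaded server JVM. Run standalone, main only reports the threads of its own small process.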

For more information about improving database performance with Dynamo, see the Repository and Database Performance chapter.

Checking for Disk I/O Bottlenecks

Make sure that your JVM really is waiting for file I/O, not paging activity. Check for paging with your operating system's monitoring tools.

If the source of slow performance is file I/O, it will show up in JVM thread dumps. See Bottlenecks and Deadlocks. The cause could be either your own application-specific code or file I/O that Dynamo itself performs. One common place where Dynamo does file I/O on each request is the FileCache. You can check the status of your FileCache by going to the service:

http://localhost:8830/nucleus/atg/dynamo/servlet/pipeline/FileCache

The FileCache has properties for the totalSize of the cache, the number of hits, the number of misses, the ratio of hits to total requests, and the size of current entries. Make sure the hit ratio is high (assuming that you have enough memory to make it so). Make sure that your file cache is large enough to hold the frequently served content, but small enough to fit comfortably within your JVM size. See Adjust the FileCache Size.

You should also check any request logging that you are performing. By default, Dynamo is configured to log request events, content viewed events, and user events to the file system. This logging is queued to a separate thread so that it is done efficiently, but this action will consume some I/O resources of your server.

Checking for Network-Limited Problems

One way to identify network-limited performance problems is by getting your JVM to dump out stack traces while your system is under load. You can tell if your system is network limited because your thread dump will show lots of threads waiting in socket reads or writes. See Getting Java VM Dumps and Analyzing Java VM Dumps.

One sign of network-limited performance problems that may show up in a Java VM dump is threads waiting to read from the Connection Module called from the sendReply method. In this case, the DRP Server has written the data to the Connection Module. The Connection Module is in the process of writing the data to the client. When the Connection Module has finished, it sends an acknowledgement to the DRP Server that it has finished. This is likely to happen when sending a large file to a client with a slow network connection.

Some ways to address network-limited problems include:

  • Reduce the size of your HTML files by limiting comments and white space or redesigning the content of especially large pages.

  • Increase the number of request handling threads. This won't improve the latency experienced by a user who requests a large file, but it will improve total throughput.

  • Get a faster network connection.

  • Locate and correct network bottlenecks.

Bottlenecks at High CPU Utilization

If your site CPU utilization is close to 100%, you can use a tool like OptimizeIt or KL Group's JProbe Profiler to help determine slow points of your code. See Using OptimizeIt.

In some instances, OptimizeIt cannot handle large sites running under load. If so, another way to identify deadlocks and bottlenecks is to get your JVM to dump out stack traces while your system is under load. If you examine 5 or 10 of these stack traces, you can start to see a pattern and find places in your site that are consuming CPU resources or causing deadlocks. See Getting Java VM Dumps, Analyzing Java VM Dumps, and Bottlenecks and Deadlocks.

Thread Context Switching Problems

Check how many simultaneous requests are typically being handled when you have a large number of clients trying to access Dynamo. When the site is under load, go to the DRP Server page in the Dynamo administration page, at http://localhost:8830/nucleus/atg/dynamo/server/DrpServer, and see how many handlers are active.

Thread dumps can also be useful to see where these threads are waiting. If there are too many threads waiting, your site's performance may be impaired by thread context switching: you might see throughput decrease as load increases because your server is spending too much time context-switching between requests. Check the percentage of System CPU time consumed by your JVM. If this is more than 10% to 20%, it is potentially a problem. If you see several threads in a thread dump that are in a runnable state, you may want to try lowering the number of DrpServer handler threads.

You should also verify that the priorityDelta property of the /atg/dynamo/server/DrpConnectionAcceptor component is a negative number. This setting should ensure that Dynamo finishes processing any requests that have work to do before it picks up a new request, which should reduce thread context switching. However, thread context switching also depends in part on how your JVM schedules threads with different priorities.
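To get a rough count of how many threads are runnable at a given moment, a small histogram of thread states can stand in for reading a whole dump. This is a sketch under stated assumptions: ThreadStateCount is a hypothetical class, not a Dynamo component, and it needs a Java 8 or later JVM. Note that the JVM reports a thread blocked in a classic socket or file read as RUNNABLE, so a high RUNNABLE count is a prompt to look at the full dump, not proof of CPU contention.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not a Dynamo class.
class ThreadStateCount {

    /** Returns how many of the given threads are in each Thread.State. */
    static Map<Thread.State, Integer> countByState(Iterable<Thread> threads) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : threads) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Summarize every live thread in the current JVM.
        System.out.println(countByState(Thread.getAllStackTraces().keySet()));
    }
}
```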

You can also reduce overhead from thread context switching by making sure you have at least one CPU for each process involved in handling the majority of requests: one CPU for your HTTP server, one for Dynamo, one for the database server.

You might also see throughput go down as load increases when all of your DRP handler threads are busy waiting for some resource at the same time. For example, suppose you have 40 DRP handler threads and one page on your site that makes a very long-running database query. If you increase the number of clients well beyond 40, you might see all 40 threads waiting for the response to this query. At this point, your throughput will go down because your CPU is idle. You should either speed up the slow requests (perhaps by adding caching of these queries) or increase the number of DRP handler threads to increase the parallelism. Of course, at some point, the database may become the bottleneck of your site (which is likely before you have 40 simultaneous queries running).

Note that if you are using green threads rather than native threads, thread context switching shows up as user time rather than as System CPU time.

Context switching can also occur when you have a network protocol which synchronizes too often (such as sending a request and waiting for a response). For example, Dynamo synchronizes with the Connection Module on each call of out.flush().

Typically, these context switches can be overcome by increasing the parallelism in your site (increasing the number of DrpServer handler threads). If there are just too many of these synchronization points, though, this won't work. For example, if you have 40 synchronous RPC calls for each HTTP request, you'd need to context switch processes 80 times for each request if you handled one request at a time. If you handled 2 requests at a time, you'd cut the number of context switches in half. This is in addition to the number of handlers that you'd need to hide any I/O or database activity so the number can add up fast.

System Resource Bottlenecks

If your site has not maxed out CPU utilization, database server utilization, or the I/O subsystem, the problem may result from synchronized access to one of your system's resources (disk, network, database, etc.). This situation occurs when you access the resource from within a synchronized method in Java. All other requests wait for the monitor lock while you do the I/O, thus wasting both CPU and I/O resources. The only ways around this problem are to recode the Java (the right solution) or add more Dynamo instances (the wrong solution).

The easiest way to find these problems is to test your site when it is serving pages under load and get a JVM thread dump. By examining the thread dump, you may see one thread waiting for a response from the OS (database or I/O) and a set of other threads waiting on a monitor lock that the first thread holds. See Getting Java VM Dumps, Analyzing Java VM Dumps, and Bottlenecks and Deadlocks.
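The pattern described above, and one way of recoding it, can be sketched as follows. Everything here is hypothetical and for illustration only: SlowLookup stands in for a database or file read, CachedLookup is not a Dynamo class, and the lambda-friendly interface assumes Java 8 or later. The first method holds the monitor for the whole slow call; the second holds it only while touching the in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a slow database or file read. */
interface SlowLookup {
    String load(String key);
}

// Hypothetical component, not a Dynamo class.
class CachedLookup {
    private final Map<String, String> cache = new HashMap<>();
    private final SlowLookup lookup;

    CachedLookup(SlowLookup lookup) {
        this.lookup = lookup;
    }

    /** Problem pattern: the monitor is held for the duration of the slow call. */
    synchronized String getHoldingLock(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = lookup.load(key); // every other caller queues on the monitor here
            cache.put(key, value);
        }
        return value;
    }

    /** Narrower locking: the monitor guards only the in-memory map. */
    String getNarrowLock(String key) {
        synchronized (this) {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        }
        String loaded = lookup.load(key); // slow call runs without holding the monitor
        synchronized (this) {
            cache.put(key, loaded);
        }
        return loaded;
    }
}
```

The trade-off in the second version is that two threads that miss on the same key at the same time will both perform the slow load. That is usually acceptable when the load is idempotent, but it is a design choice to check against your own code.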

Lower Thread Priorities

If you have a rarely used feature that uses a lot of CPU resources, you can lower the priority of the thread that handles requests for that feature. Use the setPriority() method of java.lang.Thread to temporarily lower the thread priority. This results in higher latency for users of that expensive feature, but it prevents the feature from hurting performance for other users.
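A minimal sketch of the temporary priority change might look like the following. The LowPriority class and its method name are hypothetical, and the lambda in the usage note assumes Java 8 or later. How much effect a lower priority actually has depends on how your JVM and operating system schedule threads.

```java
// Hypothetical helper, not a Dynamo class.
class LowPriority {

    /** Runs the task one priority level lower, then restores the original priority. */
    static void runAtLowerPriority(Runnable task) {
        Thread current = Thread.currentThread();
        int original = current.getPriority();
        // Never go below the minimum priority the JVM allows.
        current.setPriority(Math.max(Thread.MIN_PRIORITY, original - 1));
        try {
            task.run();
        } finally {
            // Restore even if the task throws, so the handler thread is not left slow.
            current.setPriority(original);
        }
    }
}
```

A request handler for the expensive feature would wrap only that work, for example LowPriority.runAtLowerPriority(() -> buildExpensiveReport()), where buildExpensiveReport is a placeholder for your own code.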

TCP Wait Problem on Solaris

In some testing situations involving a very large number of requests from a single client on the Solaris platform, you may see a dramatic and periodic decline in throughput. You may be able to correct this by modifying the tcp_close_wait_interval setting in the /dev/tcp module. You can do this in two different ways:

  • Start ndd, access the /dev/tcp module, and change the value of tcp_close_wait_interval to 60000 (60 seconds).

  • Edit the /etc/init.d/inetinit file and include the following line:

    ndd -set /dev/tcp tcp_close_wait_interval 60000

Last update: 2006-12-02 12:46
Author: Oleg
Revision: 1.0
