Know yourself and know your enemy, and use your ammunition to annihilate him. So before going to war, first figure out whether you have enemies (performance issues) at all. Is your application not meeting its performance benchmark? There is always scope for improvement, albeit with diminishing returns, so your performance work will never be complete unless you set a benchmark. As a side note, by performance we mean response time and throughput here.
If all you have is a hammer, everything looks like a nail, so your next step is to look into your armory. Here you should use a divide-and-conquer strategy: divide the issues into layers.
How do you want to start?
OK, I’ll suggest, fight from the front.
So your layers could be:
The application layer has several sub-layers, namely the algorithms (the code you write), the database, the JVM, and the OS.
The front layer is easy and complicated at the same time: easy in the sense that algorithms are missing or trivial, and complicated in the sense that the code behaves differently in different browsers. But first things first: diagnose the issue. Useful diagnostic tools are HttpWatch, Firebug, YSlow and PageSpeed, depending on the browsers you support.
You need to familiarize yourself with these tools. At a high level, the optimization areas/steps are:
Note that browsers limit the number of concurrent connections per host (mostly to single digits), so reduce the number of requests by merging/concatenating resources.
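The merging step above can be as simple as concatenating small script files into one bundle at build or deploy time. Below is a minimal sketch; the class name, file names and contents are illustrative, not part of any particular build tool.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class ResourceMerger {
    // Concatenate several small scripts into one file so the browser
    // fetches a single resource instead of opening many connections.
    static void merge(List<Path> sources, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path src : sources) {
                Files.copy(src, out);
                out.write('\n'); // keep the individual scripts separated
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical inputs standing in for real JS/CSS resources.
        Path a = Files.writeString(Files.createTempFile("a", ".js"), "var a = 1;");
        Path b = Files.writeString(Files.createTempFile("b", ".js"), "var b = 2;");
        Path bundle = Files.createTempFile("bundle", ".js");
        merge(List.of(a, b), bundle);
        System.out.println(Files.readString(bundle));
    }
}
```

In practice you would also minify and gzip the bundle, but the connection-count win comes from the concatenation itself.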
As a developer, you don’t directly control this layer, so most of the time issues here are related to network bandwidth or network IO. Your best bet is to monitor the network and fix issues as you notice them. Useful tools for this layer are Wireshark, Fiddler, tcpdump/WinDump, Cain & Abel, and Netdude. These tools are based on packet analysis, which has a steep learning curve.
At a higher level, you may use network IO utilization monitoring tools like nicstat and the Windows Task Manager.
Avoid a large number of network reads and writes that each carry a small amount of data. Using non-blocking Java NIO instead of blocking sockets can improve performance. As a rule of thumb, UDP performs better than TCP, so favor UDP if you can live without transmission control. Another common issue is data download, where users in different locations see different response times. You can use Google Analytics for diagnosis, and CDNs (Content Delivery Networks) to solve the problem by distributing resources across networks; Akamai is one such service provider if you don’t have this facility available natively.
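The non-blocking NIO idea mentioned above centers on registering channels with a `Selector` so one thread can watch many connections instead of blocking per socket. A minimal sketch (the port and channel setup are illustrative, not a complete server):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class NonBlockingSketch {
    public static void main(String[] args) throws IOException {
        // One selector can multiplex many channels in a single thread.
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0)); // bind to any free port
        server.configureBlocking(false);       // the key step: non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);

        System.out.println("blocking=" + server.isBlocking());
        // selectNow() returns immediately; with no clients connecting,
        // no keys are ready.
        System.out.println("readyNow=" + selector.selectNow());

        server.close();
        selector.close();
    }
}
```

A real server would loop on `selector.select()` and dispatch accept/read/write events; the point here is only that no thread is parked per connection.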
This is the layer where you put in most of your effort. By the response time of this layer, we mean the response time of the entry (starting) method. The simplest way to measure it is to use System.currentTimeMillis() directly or via AOP. Many profiling tools, both free and paid, can measure response time at the method level in addition to other features. We will discuss Vantage Analyzer from Compuware.
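The direct System.currentTimeMillis() approach looks like the sketch below; `businessCall` is a hypothetical stand-in for your real entry-point method.

```java
public class MethodTimer {
    // Stand-in for the real entry-point method being measured.
    static void businessCall() throws InterruptedException {
        Thread.sleep(50); // simulate 50 ms of work
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        businessCall();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("businessCall took " + elapsed + " ms");
    }
}
```

With AOP you would move the two timestamp lines into an around-advice so every matched method gets timed without touching its body; for intervals this short, System.nanoTime() is also a more precise choice.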
The main concept behind this profiling is measuring a method’s response time, the number of child methods called, and the time spent in those child methods. By following the child-method call stack, you can also find the time spent in DB calls. Look at the circled area in the image, which visualizes what we have just discussed. If the method itself consumes most of the time, look into its algorithms and also pay attention to the method invocation count; otherwise move downstream to the child methods and DB calls.
Suppose the above analysis shows the bottleneck to be the DB calls. Your next step should be to figure out the issue in the DB. For Oracle, AWR (Automatic Workload Repository) and ADDM (Automatic Database Diagnostic Monitor) come to your rescue.
The AWR is used to collect performance statistics including:
The ADDM analysis includes the following.
An example from a test instance is shown below.
FINDING 1: 59% impact (944 seconds)
The buffer cache was undersized causing significant additional read I/O.
RECOMMENDATION 1: DB Configuration, 59% benefit (944 seconds)
ACTION: Increase SGA target size by increasing the value of parameter
"sga_target" by 28 M.
SYMPTOMS THAT LED TO THE FINDING:
Wait class "User I/O" was consuming significant database time. (83%
impact [1336 seconds])
The recommendations may include:
For tuning at the JVM level, you should consider the following points, which are the main spots for tuning.
If you set a large heap size, full garbage collection is slower but occurs less frequently. If you size the heap tightly to your memory needs, full garbage collection is faster but occurs more frequently.
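You set the heap bounds with the standard `-Xms` (initial) and `-Xmx` (maximum) flags, and you can check what the running JVM actually got via `Runtime`. A small sketch; the 256m/512m values in the comment are only example settings:

```java
public class HeapInfo {
    public static void main(String[] args) {
        // Example launch: java -Xms256m -Xmx512m HeapInfo
        // -Xms sets the initial heap, -Xmx the maximum; a larger -Xmx
        // trades rarer full GCs for longer individual pauses.
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap (MB):   " + rt.maxMemory() / mb);
        System.out.println("total heap (MB): " + rt.totalMemory() / mb);
        System.out.println("free heap (MB):  " + rt.freeMemory() / mb);
    }
}
```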
Depending on which JVM you use, you can choose from several garbage collection schemes to manage system memory; some schemes are more appropriate for a given type of application. So, based on your JVM, you can optimize the garbage collection configuration.
For further analysis, take a heap dump and a thread dump and analyze them with tools of your choice. You may use the IBM workbench assistant or dump-analysis plugins in Eclipse.
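For a quick look without external tools, the JDK itself can give you a programmatic thread dump via `Thread.getAllStackTraces()`; this is a lightweight sketch, not a replacement for the full dumps those analyzers consume (typically produced with `jstack` for threads and `jmap` for the heap).

```java
import java.util.Map;

public class MiniThreadDump {
    public static void main(String[] args) {
        // Snapshot of every live thread and its current stack.
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            Thread t = e.getKey();
            System.out.println("\"" + t.getName() + "\" state=" + t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Stuck or blocked threads show up immediately in the `state=` column, which is often enough to decide whether a full dump and offline analysis are warranted.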
In small applications this sub-layer is generally ignored, but in big applications it plays a crucial role. Once your CPU utilization crosses 70-80%, the system becomes unstable. Studies have shown that doubling the number of CPUs increases servlet performance by 50 to 80 percent. Hot spots for OS tuning are
You may use the NMON analyzer, which provides a huge amount of information for this purpose, presented concisely for the performance tuner to aid understanding. This includes CPU, memory, disks, adapters, networks, NFS, kernel statistics, file systems, Workload Manager (AIX), Workload Partitions (AIX) and top processes.
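You can also sample a couple of the same OS-level numbers from inside the JVM via the standard `OperatingSystemMXBean`; a minimal sketch for spot checks, not a substitute for continuous NMON-style monitoring:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OsSnapshot {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Number of processors available to the JVM.
        System.out.println("cpus=" + os.getAvailableProcessors());
        // 1-minute system load average; -1.0 where the platform
        // does not expose it (e.g. Windows).
        System.out.println("loadAvg=" + os.getSystemLoadAverage());
    }
}
```

Comparing the load average against the CPU count is a crude first check for the 70-80% utilization danger zone mentioned above.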
To win a war, all units, namely armies, navies and air forces, must work in sync to achieve the target. The same concept applies to software: don’t spend too much effort on just the JVM and algorithm optimizations; rather, take an all-inclusive approach. In many cases where the response time was 10-20 seconds, the server’s share was just 500 ms to 1 second, so even if you reduce the server time, you will not get the desired result. Also, performance tuning is a huge and complicated task, so familiarize yourself with the tools, without which you can’t do much.
So what next? In this blog we have discussed performance mainly in terms of response time, from a single-user perspective. What happens when the number of users increases? How is your system going to behave under growing load? Scalability and throughput will be the subject of our next discussion, where we will look into load balancing and clustering to cope with such situations.
An engineer by profession but a scientist by heart, ever juggling between perfection and an optimal solution.