Earning Your Performance Tester Stripes

How does a performance tester earn the gold stripes? By providing results that enable fixing underlying problems. As a customer on a recent performance testing project commented, “We aren’t done until we diagnose and fix the two issues the last test uncovered.” He was referring to two graphs from our final test that we had not yet fully explained (see fig 1 & 2).

While neither of these issues had a discernible impact on the application’s scalability, we agreed. Each graph signaled a risk that under real production conditions could have a serious impact on both system performance and availability. A deeper dive with a systems engineer and an application expert was the next step to confirm our hypotheses, identify the root cause, and fully explain and resolve each issue.

Figures 1 and 2: Failed ASP.NET Requests and Available Memory, Facets DB

As most testing professionals will agree, the hardest technical challenges of performance testing usually occur in script development, preceded by the critical step of gathering business requirements and modeling the workload. But where we as performance testers truly earn our stripes is in generating interpreted results that engage the larger technical team to wrestle issues to the mat and fix the underlying causes.

The process of getting to these two graphs and developing the engaging hypotheses takes a mix of discipline and creativity. I’ve codified this approach as CAVIAR: Collecting, Aggregating, Visualizing, Analyzing, and Reporting.

The Creative part is exactly that… creative. It comes from exploring data patterns, creating visualizations at different levels of data granularity and summarization, and matures with experience across many projects and different types of apps.

So what analysis path led us to these two graphs?

We started with Collecting. To the two standard measurements in every performance test – load and response times – we added system resource monitors to the infrastructure. In this case, the app runs in an all-Windows environment made up of two web servers (IIS), two app servers (ASP.NET), and two DB servers (SQL Server) (see fig. 3). We began with the typical operating system points: CPU, memory, disk activity, and network throughput. Next, we added our bellwether points at the system software layer. On the web/app tiers we selected IIS and ASP.NET points: queued and failed requests, and request wait time. On the DB tier we added several key SQL Server resources: full table scans, lock waits, SQL compilations, and deadlocks (see table 2).

Figure 3: Test Environment

We then ran our test against the designated workload: 144 concurrent users distributed across four key healthcare member services workflows completing 480 transactions (i.e., calls) during a peak hour (see table 1).
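As a rough check on this workload, the pacing each virtual user needs can be derived from the two totals. The sketch below assumes the 480 transactions are spread evenly across the 144 users and across the hour; the function name `pacing_seconds` is illustrative and not part of any tool used in the test.

```python
def pacing_seconds(concurrent_users: int, transactions_per_hour: int) -> float:
    """Average seconds between transaction starts for one virtual user.

    Assumes transactions are spread evenly over all users and over the hour.
    """
    if concurrent_users <= 0 or transactions_per_hour <= 0:
        raise ValueError("users and transactions must be positive")
    # Each user completes transactions_per_hour / concurrent_users
    # transactions in 3600 seconds, so the interval is the inverse of that rate.
    return 3600 * concurrent_users / transactions_per_hour


# 144 users completing 480 transactions in the peak hour.
interval = pacing_seconds(144, 480)
print(f"one transaction per user every {interval:.0f} s ({interval / 60:.0f} min)")
```

Under those assumptions each user starts a new call roughly every 18 minutes, which is the kind of figure that would be set as iteration pacing in a load testing tool.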

Table 1: Workload

# | Name | Process | Navigation | Concurrent Users | Target Transactions
1 | STR2 | Medical Claims Inquiry | The customer service rep pulls up the member information and then reviews the data on the Medical Claims tab. Call is documented. | 36 | 60
2 | STR3 | Member Demographic Changes | The customer service rep pulls up the member, changes the address and marks it as the new mailing address. Call is documented. | 36 | 60
3 | STR4 | Eligibility Inquiry | The customer service rep pulls up the member and reviews the membership data. Call is documented. | 36 | 60
4 | STR5 | Benefits Inquiry | The customer service rep pulls up the member and reviews the data within the Benefit Matrix tab. Call is documented. | 36 | 60
  | Total: |  |  | 144 | 480

Next, we moved into a cycle of Aggregating and Visualizing. Using the Analysis module in LoadRunner 11, we generated the typical initial set of graphs: network throughput, response time (at both the macro end-to-end and micro “page/user action” levels), and errors, all graphed against user load. Then we generated graphs of the monitored resources against load for all the servers: CPU utilization, disk IO, available megabytes, and network throughput. Finally, we did the same for our system software points.

Table 2: Resource Monitors

Object | Point | Description
Processor | % Processor Time/Total | Monitors the average % utilization of all CPUs on the server, and is the key indicator of processing
Physical Disk | % Disk Time/Total | Monitors the percentage of time that disks are busy doing physical IO, averaging over all disks.
 | Average Disk Queue Length/Total | Monitors the number of IO requests in the IO queue, averaging over all disks.
Memory | Available Megabytes | Monitors the memory available for use.
 | Pages/Sec | The rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays.
Network Interface | Bytes Total/sec./<on primary network interface> | Monitors the total number of bytes of data sent and received, and thus measures bandwidth utilization and system throughput.
System | Context Switches/sec. | Monitors the combined rate at which all processors on the computer are switched from one thread to another.
 | Processor Queue Length | Monitors the number of threads in the processor queue.
App Server | Requests Queued (ASP.NET) | Monitors the requests waiting to be processed.
 | Requests Failed (ASP.NET) | Monitors the total number of failed requests.
SQL Server | Full Scans / sec | Monitors how many times a query must examine every row in a table to fulfill a request.
 | Buffer cache hit ratio | Monitors the percentage of the time that a data block is found in the server data cache rather than having to do a physical IO.
 | User Connections | Monitors the number of connections to the database from the application.
 | Number of Deadlocks / sec. / Total | Monitors database transactions that cannot complete because another transaction is holding resources they need, and vice versa.
 | Lock Waits/sec | Monitors the number of lock requests that could not be satisfied immediately and required the caller to wait before being granted the lock.
 | Average Wait Time (ms) – Total | Monitors the time that transactions spend waiting for a locked resource.
 | SQL Compilations / second | Monitors the rate at which SQL or stored procedures are compiled to develop execution plans, which are then stored in cache.
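For readers who collect these points outside LoadRunner, a monitor list like this one can be written down as Windows performance counter paths. The sketch below is only an illustration: it assumes the standard `\\Computer\Object(Instance)\Counter` path syntax, covers just the operating system points plus one ASP.NET counter, and uses hypothetical host names (`APP01`, `APP02`). Exact counter names, especially for SQL Server instances, should be checked on the servers themselves.

```python
# Counter paths without a host prefix. Names follow the usual Windows
# spelling (for example "Available MBytes"), which can differ slightly
# from the labels a monitoring tool displays.
OS_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\PhysicalDisk(_Total)\% Disk Time",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Memory\Available MBytes",
    r"\Memory\Pages/sec",
    r"\System\Context Switches/sec",
    r"\System\Processor Queue Length",
]
APP_COUNTERS = [r"\ASP.NET\Requests Queued"]


def counter_paths(hosts, counters):
    """Prefix every counter with every host name: \\\\HOST\\Object\\Counter."""
    return [rf"\\{host}{counter}" for host in hosts for counter in counters]


# Hypothetical host names, for illustration only.
paths = counter_paths(["APP01", "APP02"], OS_COUNTERS + APP_COUNTERS)
for path in paths:
    print(path)
```

Keeping the list in one place makes it easy to apply the same points to every server in a tier, which matters later when comparing servers side by side.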

We noticed immediately that LoadRunner had arbitrarily aggregated all measurements over 128-second time periods. We knew from experience that for a one-hour test this is too large a time bucket, tending to “wash out” the telling high-low variations. We explored several shorter periods and settled on an averaging period of 60 seconds.

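The "wash out" effect is easy to reproduce with synthetic numbers. The sketch below is not LoadRunner's aggregation logic; it is a plain bucket average over made-up one-second CPU samples (a 5% baseline with a 10-second burst at 55%), showing how the averaged peak shrinks as the bucket grows.

```python
from statistics import mean


def bucket_average(samples, bucket_seconds):
    """Average (time_s, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start_s, mean_value), ordered by time.
    """
    buckets = {}
    for t, value in samples:
        start = int(t // bucket_seconds) * bucket_seconds
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Synthetic data: 256 one-second samples at 5% with a burst to 55%
# from t=100 s to t=109 s. The values are invented for illustration.
samples = [(t, 55 if 100 <= t < 110 else 5) for t in range(256)]

coarse = bucket_average(samples, 128)
fine = bucket_average(samples, 60)

print("peak with 128 s buckets:", max(v for _, v in coarse))
print("peak with  60 s buckets:", max(v for _, v in fine))
```

The same burst shows up as a noticeably higher peak in the 60-second view than in the 128-second view, which is the reason for trying several granularities before settling on one.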

Moving to Analyzing and Reporting, we eliminated several graphs that were uninteresting: disk IO was low on all servers, as was available memory on all but one of the DB servers. Studying the CPU graph (see fig. 4), we immediately noted that App01 averaged 5% utilization (with spikes to 15%) but that the Web02/App02 pair was nearly idle (0.4% and 2% respectively). We immediately suspected a load balancing problem.
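A comparison like this can be made mechanical by computing the ratio between the busiest and quietest server in a tier. The sketch below is an assumption-laden illustration: the server labels and the 2.0 threshold are hypothetical, and a high ratio is only a prompt to look deeper, not proof of a load balancing fault.

```python
def imbalance_ratio(avg_cpu_by_server):
    """Ratio of the highest to the lowest average CPU in one tier."""
    values = list(avg_cpu_by_server.values())
    if len(values) < 2:
        raise ValueError("need at least two servers to compare")
    low, high = min(values), max(values)
    if low <= 0:
        return float("inf")  # one server did no measurable work
    return high / low


# Average CPU % for the two app servers, using the 5% and 2% figures
# from the test. The dictionary keys are illustrative labels.
app_tier = {"app_busier": 5.0, "app_quieter": 2.0}

ratio = imbalance_ratio(app_tier)
THRESHOLD = 2.0  # hypothetical cut-off for "worth investigating"
print(f"ratio {ratio:.1f}:", "investigate" if ratio > THRESHOLD else "balanced")
```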

Figure 4: CPU Utilization over Load

To test that hypothesis, we looked more deeply, this time at services running on App02. There, we isolated ASP.NET Requests Failed (see fig. 1), which showed steady growth from zero to 54,000 from start to end of the test. Using this observation, we modified our hypothesis from “a potential load balancing issue” to “ASP services not functioning” – the first of our two observations that needed a further drill-down with the infrastructure team.
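Because Requests Failed is a running total, "steady growth" is easier to judge after converting the readings to per-interval deltas. The sketch below uses synthetic one-minute readings that climb evenly to 54,000 over a one-hour test; the real series is in fig. 1, and the helper names and the 25% tolerance are illustrative assumptions.

```python
def interval_deltas(cumulative):
    """Turn a cumulative counter series into per-interval increases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def is_steady(deltas, tolerance=0.25):
    """True when every delta is within `tolerance` of the average delta."""
    average = sum(deltas) / len(deltas)
    if average == 0:
        return all(d == 0 for d in deltas)
    return all(abs(d - average) <= tolerance * abs(average) for d in deltas)


# Synthetic readings: one per minute, rising evenly from 0 to 54,000.
readings = [900 * minute for minute in range(61)]

deltas = interval_deltas(readings)
per_second = readings[-1] / 3600  # average failures per second over the hour

print("steady growth:", is_steady(deltas))
print("average failures per second:", per_second)
```

An even climb of this size works out to 15 failed requests per second for the whole hour, which points toward something failing continuously rather than an occasional burst of errors.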

The second unexplained issue was the sudden available memory drop on the Facets SQL Server at minute 25 (see fig. 2). We had not totally exhausted memory, or the DB server would have likely crashed. We also were not able to correlate this sudden decrease to application response time or any other visible sign of trouble. This is the second observation that requires further investigation.
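A sudden drop like this can be located programmatically by scanning the averaged series for the largest decrease between consecutive points. The sketch below uses invented memory values (only the timing, a drop at minute 25, comes from the test), and `largest_drop` is an illustrative helper rather than a feature of any analysis tool.

```python
def largest_drop(values):
    """Return (index, drop) for the biggest decrease between consecutive points.

    `index` is the position of the lower reading, so with one reading per
    minute starting at minute 0 it is the minute at which the drop shows up.
    """
    if len(values) < 2:
        raise ValueError("need at least two readings")
    best_index, best_drop = 1, values[0] - values[1]
    for i in range(2, len(values)):
        drop = values[i - 1] - values[i]
        if drop > best_drop:
            best_index, best_drop = i, drop
    return best_index, best_drop


# Synthetic one-minute averages of available memory in MB: flat, then a
# step down at minute 25. The magnitudes are made up for illustration.
available_mb = [8000] * 25 + [3000] * 35

minute, drop = largest_drop(available_mb)
print(f"largest drop of {drop} MB at minute {minute}")
```

Knowing the exact minute makes it possible to line the drop up against other series, such as response time or SQL Server counters, when looking for a correlated event.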

The final analysis? I promise to reveal the conclusion (diagnosis still in process) during my Fall STPCon two-part session on Interpreting and Reporting Performance Results next month. Along, of course, with many more examples of applying CAVIAR to your performance testing matrix and helping you learn to earn the highest stripes as a performance tester.

Dan Downing, Principal Consultant, Mentora Group – Dan Downing is co-founder and Principal Consultant at Mentora, a Forsythe Company (www.mentora.com), a testing and managed hosting company. Dan is the author of the 5-Steps of Load Testing, which he taught at Mercury Education Centers, and of numerous presentations, white papers and articles on performance testing. He teaches load testing and over the past 15 years has led hundreds of performance projects on applications ranging from eCommerce to ERP and companies ranging from startups to global enterprises. He’s been a frequent presenter at STAR, HP Software Universe, and Software Test Professionals conferences, and is one of the organizers of the Workshop on Performance and Reliability (WOPR).

Come see Dan speak this spring on March 31st as he leads his workshop, Interpreting and Reporting Performance Test Results.