In the prior blog post, I provided a checklist of some metrics that would help you assemble your defense if it was asserted that your database was the source of performance problems. But those ratios and indicators are just circumstantial evidence of probable innocence. Here comes the DNA test. It's hard. It's time consuming. It's complex. But the analytical effort just might help get you out of the hot seat and properly direct a performance issue to application or networking teams...
In the prior blog post, I provided a checklist of some metrics that would help you assemble your defense if it was asserted that your database was the source of performance problems. But those ratios and indicators are just circumstantial evidence of probable innocence. Here comes the DNA test. It's hard. It's time consuming. It's complex. But the analytical effort just might help get you out of the hot seat and properly direct a performance issue to application or networking teams...
The Big Picture
We need to figure out, for a given time period, the average database transaction response time. A database transaction begins with the first SQL call and ends upon execution of a commit or rollback command. Generally, the database transaction response time mirrors the overall end user "end-to-end" response time. I readily acknowledge that some application processing and network time may occur before the first SQL call and after the commit or rollback. I my experience working with customers for the last 12 years, Database Transaction response time usually represents 95-98% of End-to-End User response time.
Within the scope of a database transaction, several time consuming events can occur. Time consumed by the database (that we can measure) includes:
The above times are often called "Inside DB2 Time". Transaction time can also be consumed outside of the database by:
The Basic Math
We begin by finding the average Database Transaction time for a given time period. From this average transaction time (TXAET), we subtract the CPU time, Sort Time, Lock Time, and I/O times (collectively the Inside DB2 Times). What is left over, the difference, if any, is time that the database cannot account for or does not own; this is Outside DB2 Time. To prove that your database is NOT GUILTY of degrading business performance, you want to see a high percentage of time Outside of the database.
Please note that the transaction event monitor reports CPU time consumed in microseconds, but, however, most other times in DB2 LUW are reported as milliseconds. For this exercise, we will convert all times to microseconds.
RUN the following SQL query against your transaction event monitor table:
select avg(bigint(timestampdiff(1,char(stop_time - uow_start_time)))) as TXAET, avg(user_cpu_time + system_cpu_time) as TXAVGCPU, avg(lock_wait_time)*1000 as TXAVGLOCK, count(*) as EVTTXCNT from classlab.xact_classlab;
The timestampdiff "1" computes the time difference between two timestamps in microseconds. If your SQL query errors out with a buffer overflow, try changing to timestampdiff "2" which finds the difference in seconds - then multiply seconds by 1,000,000. You can also try using FLOAT instead of BIGINT.
The average lock wait time is multiplied by 1,000 to convert from milliseconds to microseconds.
EVTTXCNT will tell us how many transactions completed while the event monitor was active.
Step 2: Compute SNAPTXCNT
Look at your second database snapshot, "dbsnap2.txt" Add together the number of Commit Statements Attempted plus Rollback Statements Attempted. This is SNAP2TXCNT.
Look at your first database snapshot, "dbsnap1.txt". Add together the number of Commit Statements Attempted plus Rollback Statements Attempted. This is SNAP1TXCNT.
Subtract SNAP1TXCNT from SNAP2TXCNT giving SNAPTXCNT.
If you followed the preparation instructions closely, the delta values from the two snapshots should sync up with the activity and time period reflected by the event monitor.
Your value for SNAPTXCNT should very closely match EVTTXCNT. If there is a difference of more than just a few transactions (ideally they should be equal), then your snapshot data and event monitor data is out of sync and you should start over with the preparation instructions.
Step 3: Compute TXAVGSORT, the average sort time, in microseconds, consumed by each transaction.
Step 4: Compute TXAVGBPRIO, the average bufferpool read I/O time, in microseconds, per transaction.
Step 5: Compute TXAVGBPWIO, the average bufferpool write I/O time, in microseconds, per transaction.
Step 6: Compute TXAVGDRIO, the average direct read I/O time, in microseconds, per transaction.
Step 7: Compute TXAVGDWIO, the average direct write I/O time, in microseconds, per transaction.
Step 8: Compute TXINSIDE, the amount of Inside DB2 Time (that we can reasonably account for) per transaction in microseconds
Step 9: Compute TXOUTSIDE, the amount of Outside DB2 Time (that we cannot account for) per transaction in microseconds
Step 10: Compute INSIDEPCT, the percentage of time consumed by Inside DB2 Time.
Step 11: Compute OUTSIDEPCT, the percentage of time not consumed by the DB2 database
Divide all of the computed average times by 1,000 to convert to milliseconds, or divide all of the computed average times by 1,000,000 to convert to seconds. Seconds seems to be the most popular scale amongst database professionals.
Step 20: Make it attractive for management
Congratulations! You made it to the end of the math! Now put all of your numbers into Excel or PowerPoint, or equivalent open source products, and create some pretty graphs for management. Everyone knows that management likes graphs.
DISCLAIMERS and IMPORTANT NOTES
Don't be alarmed if your Inside DB2 Time (TXINSIDE) adds up to more than, or is greater than, the Database Transaction Average Elapsed Time (TXAET). DB2 can consume CPU time, Sort time, and I/O time in parallel, so it is possible that the sum of inside time components can be greater than the database transaction response time. We love DB2 for its parallelism. Zoom zoom.
Because DB2 can consume time in parallel, the scale is potentially tipped in favor of high Inside DB2 Time. So, if you complete this analysis and find that your percentage of Inside DB2 Time is a relatively small percent (less than 30%), then you have a very strong DNA test argument in favor of "It's not your or your database's fault!"
If the percentage of Inside DB2 Time is high, then the DB2 database performance is at fault and the DBA truly does own the performance "opportunity for improvement". If this is the case, use the Inside Time percentages for CPU, Sort, Lock, and I/O to help determine where your resource bottleneck is. We'll cover next steps in future blog posts.
A Real World Example
I teach this class at IDUG, at various regional DB2 User groups, and on site at company locations. Last week at the Heart of America DB2 Users Group, one of the students had these values:
All times shown in microseconds. Percentages are rounded. It is quite common for CPU time to be the highest resource consumer thanks to large bufferpools and hidden physical design flaws resulting in scans.
Frankly, now that you know how difficult this analysis is to do by hand and the steps involved, I hope you will search the internet to find a solution that automates all of this for you...
Now that you've searched Google and discovered that only DBI provides a solution for analyzing DB2 LUW transaction response times and service level attainments, please have a closer look.
DBI's Brother-Thoroughbred® was specifically designed to help DBA teams manage performance accountability, identify ownership of issues, accurately ascertain resource bottlenecks inside the database, and monitor achievement of Service Level Agreements (SLAs).
This Brother-Thoroughbred image shows the average database transaction response time, how much time is inside the database (owned by the DBA team), and how much time is outside the database.
This Brother-Thoroughbred image shows the time distribution of time spent inside the database. The largest pie slice represents the primary resource bottleneck. Clicking on a pie slice will automatically take you to Brother-Panther's statement analysis sorted according to the resource bottleneck selected. Click, click, solved!
Just for Fun
I found a cool retirement calculator the other day and thought you might find it helpful. Check out: http://www.ing-usa.com/us/individuals/planningtools/calculatorsto ols/retirement/index.htm
Cheers,
Scott
Scott Hayes
President & CEO, DBI
IBM DB2 GOLD Consultant