Thursday, October 11, 2007

Limitations of wait event analysis?

OK, so we have come full circle to know that hit ratio tuning doesn't work (for everything), and now wait event analysis has been deemed to have limitations...but what are they? I sense a *great* interview question coming on for a senior DBA position.

Quite frankly, there is no silver bullet in tuning. All that matters is how is your application performing. Can you risk walking down the hall to the restroom without a user "accidentally" tripping you on the way? Having said that, I do think wait event analysis is an excellent place to start. Humans measure most things by time..."the system is really slow today...", so analyzing the wait event interface is a great place to at least see what is, well...waiting :)

An example will be provided for where it is of good use. User A comes into DBA's cube, and says, "We installed a new version of the application software this weekend, and now the system is crawling."

The DBA immediately goes to his handy grid control interface, and runs ADDM for the last 30 minutes. He immediately sees that I/O is consuming nearly 60% of user time (db file scattered read). He looks at the top SQL, and sees there is an index that needs to be created. Voila, problem solved.

Now let's look at a different example. While being proactive one Thursday afternoon, the DBA reviewed the top waits of the past 60 days in his database. "What's this?! We have quadrupled our waits on "direct path read" in the last 30 days. What happened?!"

Since his restroom trips have proven largely uneventful in terms of being tripped, he checks other statistics. He finds a corresponding increase in "physical reads direct (lob)" over the same time period.

He finally drills down into the historical SQL detail, and finds a new statement that begin running 30 days ago (right after a software install). He compares the time spent performing I/O by the statement and it is almost perfectly correlated to his increase in direct path read waits. He finds it does an EXISTSNODE call on a cached XMLTYPE column. Instant answer! But where is the problem? There may or may not be one.

Stay tuned for more later...

No comments:

Locations of visitors to this page