Tuesday, March 06, 2007

Rationalising ratios

Over the last couple of weeks I have participated in some threads in the OTN DB General forum which converge on a pattern. A neophyte starts a thread asking for assistance in understanding buffer cache hit ratios. I join the thread with the suggestion that there are better, more useful ways of tuning the database. The irrepressible Don Burleson weighs in with a defence of ratios and the thread turns into a exercise in Hegelian dialectics as the inestimable Mark Powell provides some contextualisation.

Thesis (me): ratios are not particularly helpful for database tuning
Antithesis (DB): ratios must be useful because Oracle still include them in Statspack and AWR
Synthesis (MP): ratios cannot be used in isolation and at best can only support more detailed diagnostic practices.

It happens here and here, and it may yet happen here.

I usually point people at Connor McDonald's helpful script which allows us to "tune" the SGA by generating enough useless activity to set the buffer cache hit ratio to as high a value as we could wish for. But I have recently discovered that Mogens Nørgaard has published an article he wrote for the UKOUG Oracle Scene magazine in 2005. In the third section of this article he compares and contrasts several different approaches to tuning, including his own MOANS strategy. This requires us to tune a process simply by focusing on the SQL statement which takes the longest chunk of elapsed time.

I recently undertook a tuning exercise which used the MOANS approach. We have a background process which runs a couple of thousand times a day. When we first deployed it the process ran in about six seconds; now the average time was approaching seventy-five seconds. A query against V$SQLAREA quickly identified the most expensive SQL statement by any number of criteria: ELAPSED_TIME, CPU_TIME, DISK_READS. It was a select statement consisting of a three table join which matched a single row in an intersection table to single rows in table #1 and table #2.

An explain plan revealed that the query was executing full table scans against both the intersection table and table #2. This was bad news as the intersection table was the largest table in the schema, with over 4.5 million rows and growing all the time. Inadequate indexing looked to be the reason. I built a composite index on the intersection and the query improved quite a bit. But it was still doing a full table scan on table #2. As it happens, this set up still uses the Rule-Based Optimizer (for sound but irrelevant reasons). The order of the tables in the FROM clause seem to be causing the query to drive off table #2 when it ought to have been driving off the intersection table. A quick re-jigging corrected this...except that now the query was doing a full table scan of the intersection table again. Grrrr. This was fixed by re-ordering the predicates in the WHERE clause.

At this point we had a process which ran in less than seven seconds. This was good enough, so according to the MOANS best practice we stopped tuning. At some point we will need to tweak this process further and it has other statements in the process which look susceptible to MOANS-style tuning. The point is, this approach was focused on identifying the statement which took the longest time to ruin and fixing that. Ratios didn't feature.

No comments: