Papa's got a brand new BAAG
"Why on earth do we still rely on guesswork in other troubleshooting tasks especially when more deterministic paths are available?"
The thread in question had been started by somebody who had a query which had started running slowly after a couple of months of satisfactory performance; furthermore the errant query required a lot of CPU. What, asked Joe Armstrong-Champ, might be the cause? Several Oracle-L regulars replied with things which could cause excessive CPU consumption. Even Cary Millsap, who through his work with Hotsos might be regarded as the Pope of evidence-based tuning, chipped in a few guesses of his own. As Alex pointed out, nobody replied with suggestions for investigating the problem in a methodical fashion.
Thinking - we've heard of it
So guessing is obviously innate. Why might that be? Well, Michael "Rands" Lopp, in a classic Rands In Repose article, has the answer:
"Why can't you think when you're busy? Because you're not thinking.. you're reacting."
Rands' argument is that in times of emergency we don't think. Rather, we flip through a mental rolodex of similar experiences and do whatever it was we did last time. In his words, "Panic is the mother of the path of least resistance". This is why the forums and listservers are the first resort of weak swimmers. They do not have a catalogue of previous experiences to draw on. So they simply post a half-baked question, probably with a generic subject - URGENT! PLZ HELP!!!!! - and a message usually lacking both useful information and any regard for mixed case. The galling thing is that we often respond to their panic with guesses, which is Teh Suck. There has to be a better way.
In his article, Rands proposes that we do our thinking at the most conducive time. Coming from a product perspective, his time for deep thinking is right after a major release, when the lessons are still fresh in everybody's mind and nobody is yet panicking about the next deliverable. Then is the time to think about what went wrong, how things could have been done better and to put a plan of action in place to avoid those things next time. Other jobs have other reflection points.
Red Five, where are you?
A couple of weeks back I got to hear Spike Jepson talk about his time as leader of the Red Arrows, the RAF's aerobatic display team. After every sortie (practice flight or real display) the team gathers for a debriefing. The team leader kicks off the session by going over his own mistakes. This creates an atmosphere in which the other team members can admit their bloomers without recrimination. This is crucial for the next step, which is to address the underlying issues and devise strategies for avoiding them next time.
The debriefing session finishes off with the team leader lobbing a problem at somebody - "Red two, we're coming out of a diagonal somerset into an obverse corkscrew over the heads of ten thousand people at Eastbourne when your engine catches fire. What do you do?" The team member then has to step through every action he would take to minimise the risks to himself, the rest of the team and the watching crowds, whilst not disrupting the display too much. In other words, rehearse the emergency procedure in a situation of calm, when no lives are at stake. Consequently, everybody knows what to do in a real emergency. Because when you're hurtling through the air at 400mph mere inches from several other airplanes there is no time for thinking, only for reacting.
Obviously we don't operate at such extremes. But the principle still holds true. We all have some form of mental toolkit which we bring to bear on tuning problems on our own databases. The most advanced will have a set of Perl utilities, tailored SQL scripts and an archive of benchmarks; the stragglers will have bookmarked some URLs on the Wait Event interface. The thing is, when we are faced with somebody else's problem it is quicker to toss out a few guesses than it is to properly engage with them. If your MO is predicated on having archived performance profiles which you have custom generated for your system it is hard to know where to start with somebody who hasn't even heard of Statspack. After all, this response is an e-mail or a forum post squeezed in during coffee time: nobody has the time to write a comprehensive explanation targeted to the original poster's specific problem.
This is where BAAG can help. It provides an obvious focal point for people who want to make things better and who have the required expertise. In the long run BAAG ought to be a compendium of resources and advice for the guess-prone. But as a starting point I think it should have a comprehensive (and comprehensible) list of questions we need to ask every poster with a performance issue. For instance:
- Which version of the database are you using?
- Which version of which operating system?
- What sort of application is it?
- Is this problem intermittent or reliably reproducible?
- Does this problem occur on all instances of your application or just in live?
- Did this query use to perform well? If so, what has changed?
- Do you have benchmarks (e.g. Statspack archives) for your system?
- Have you run 10046 and 10053 traces against this statement?
Once BAAG has such a list of questions we can respond to ill-formed posts simply by linking to it. Requiring the OP to answer these questions means we are forcing them to learn how to solve their own problems. Of course, it is incumbent on BAAG members to exercise willpower and not succumb to offering random suggestions.
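To make the idea concrete, here is a rough sketch of how such a checklist might be used to answer an ill-formed post. It is only an illustration: the question text comes from the list above, but the dictionary keys, the function names and the wording of the reply are all invented for this example, and nothing like it exists on BAAG as far as I know.

```python
# Hypothetical sketch: only the question text comes from the list above.
# The keys, function names and reply wording are made up for illustration.
BAAG_QUESTIONS = {
    "db_version": "Which version of the database are you using?",
    "os": "Which version of which operating system?",
    "app_type": "What sort of application is it?",
    "reproducible": "Is this problem intermittent or reliably reproducible?",
    "scope": "Does this problem occur on all instances of your application or just in live?",
    "history": "Did this query use to perform well? If so, what has changed?",
    "benchmarks": "Do you have benchmarks (e.g. Statspack archives) for your system?",
    "traces": "Have you run 10046 and 10053 traces against this statement?",
}


def missing_questions(answers):
    """Return the questions the poster has not answered, in checklist order.

    An answer that is absent, None or only whitespace counts as unanswered.
    """
    return [
        question
        for key, question in BAAG_QUESTIONS.items()
        if not (answers.get(key) or "").strip()
    ]


def build_reply(answers):
    """Build a reply that asks for the missing facts instead of guessing."""
    missing = missing_questions(answers)
    if not missing:
        return "Thanks, that is enough to start investigating."
    lines = ["Before anyone guesses, please answer the following:"]
    lines += ["- " + question for question in missing]
    return "\n".join(lines)


if __name__ == "__main__":
    # A typical URGENT! PLZ HELP!!!!! post: a database version and little else.
    post = {"db_version": "10.2.0.3", "os": "  "}
    print(build_reply(post))
```

The point of the sketch is that the reply contains no suggestions at all, only the questions still outstanding. The guessing is designed out rather than left to willpower.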
After all, if it is truly the Battle Against Any Guess, then it doesn't just apply to when we ourselves are in shtook. We have to fight guesswork when we're troubleshooting other people's problems too.