Monday, December 17, 2007

UKOUG 2007: Ruby Tuesday

Tuesday was a long day for me. I was awake at 5:00am. On Monday I was awake at 5:30am. If it wasn't for the fact that I'm still in the same timezone I would swear I was jet-lagged. It isn't "jet-lag" either - I mainly drank water at the bloggers' meet-up last night. Anyway, it gave me the chance to run through my presentation before breakfast.

The first session of Tuesday was Sue Harper's overview of SQL Developer's new features. These turned out to be the new features in the coming 1.5 release rather than the latest production release. The upcoming release is special because its features are ones suggested and voted on by SQL Developer users on the SQL Developer Exchange site. The team deferred the planned 2.0 release because its manifest contained no features which users were clamouring for, at least not according to the SQL Dev Exchange rankings. So if there's some functionality you want to see added to the product, suggest it. If enough other users agree with you there is a reasonable chance you will get it. Power to the people!

Most of these new features are enhancements to existing functionality - a new SQL formatter, better code insight. This is because the SQL Dev team is quite small and has only one real developer. The team piggy-backs on the work of the JDeveloper team, so to a large extent the feature-set in SQL Dev depends on what is in the other tool. That is why the next release will feature the ability to see queue tables but not to create or alter them. In the long run we can hope that the JDev modelling features will be included in SQL Dev.

Database vs middle tier


My chairing of Toon Koppelaar's session got off on the wrong foot. Of course I should have checked how to pronounce his name before I introduced him - Dutch names tend to sound like a cough - but as it turned out it was the name I thought I knew which I had in fact got wrong. Toon (pronounced tone) was generous. Apparently he gets called "Tune" quite a lot.

Toon's talk was another waltz around the question of where business logic properly belongs. Toon distinguishes three types of logic:
  • data logic - directly manipulating the data;
  • UI logic - directly affecting the user's interaction with the system;
  • business logic - everything else.
Toon rehearsed the familiar arguments. We have moved almost full-circle from text-based stateless apps in the seventies and early eighties through client server apps to the current fad for graphical stateless apps. The main difference between the green screen dumb terminals of yore and today's browser-based apps is responsiveness: the new apps do try to respond to the user in the client layer. This comes at a price: the presentation layers have a large number of moving parts, drawn from an exploding range of widget toolkits and frameworks. Keeping up with all these different bits requires a significant knowledge investment, which in turn leads to teams of specialists overseen by architects (you have to hear Toon say "architect" to realise that he means it as an insult).

But the real problems arise when the business logic is written in Java or .Net or one of these other fancy-dan languages. These applications tend to be very chatty - Toon quoted an example where a single search spawned 7000 separate calls. They also tend to date very quickly, which makes them hard to maintain and hard to migrate. Most applications are "Window on Data" - the user retrieves data and updates data. This is precisely what databases are meant to do. Consequently the underlying database design is paramount. The DBMS and PL/SQL have been the single stable constant in system architectures for more than fifteen years now. So put your business logic in the database (PL/SQL and views) and just change the front-end layer to keep up with the Next Big Thing.
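To make the "chatty" complaint concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not from Toon's talk: the customers and orders tables, their contents and both function names are invented for illustration. The first function fetches a list and then issues one more query per row, as a middle tier often does; the second asks the database to do the join in a single call.

```python
import sqlite3

def build_db():
    """Create a throwaway in-memory database with 100 customers and 100 orders (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"cust{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i, i * 10) for i in range(1, 101)])
    return conn

def chatty_search(conn):
    """One query for the orders, then one extra query per order to look up the customer."""
    calls = 1
    orders = conn.execute("SELECT customer_id, total FROM orders ORDER BY id").fetchall()
    results = []
    for customer_id, total in orders:
        (name,) = conn.execute("SELECT name FROM customers WHERE id = ?",
                               (customer_id,)).fetchone()
        calls += 1
        results.append((name, total))
    return results, calls

def single_call_search(conn):
    """The same result from one statement: the join happens inside the database."""
    rows = conn.execute(
        "SELECT c.name, o.total FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()
    return rows, 1

conn = build_db()
chatty_rows, chatty_calls = chatty_search(conn)
joined_rows, joined_calls = single_call_search(conn)
print(chatty_calls, joined_calls)
```

Both functions return the same rows; only the number of round trips differs. With an in-memory database the cost is invisible, but across a network each extra call adds latency.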

The interesting thing is that these talks - due to the nature of the conference - are usually preaching to the converted. However, I noticed an ex-colleague, who is a Java head, in the audience. I chatted to him afterwards. He was spitting feathers. Strawman arguments... business logic is object-oriented... PL/SQL is too hard to maintain. In the end I had to calm him down by proposing that he present his opinions at a SIG. He said he'd think about it. I have heard this a number of times now. Both sides try to bludgeon the other side into admitting defeat. The trouble is finding people who are equally experienced in both database and middle-tier architectures. Almost everybody is a specialist on one side of the fence or the other. There are good reasons why this tends to be the case but it does make it difficult to hold a discussion in which everybody understands the issues involved.

So I think it comes down to numbers. Front-end developers generally outnumber database engineers in any organisation, so most of the application will be written in the programming paradigm which suits them. This leads to the business logic ending up in the middle tier, because that's what they understand. Whether this is the correct solution is a different matter.

11g for developers


Well at least I managed to pronounce Tom Kyte's name correctly. In this session Tom covered the key new features for developers. As with the other session, I'll list the features he mentioned, and just expand on a couple of points which were news to me:
  • Client result cache
  • server result cache
  • PL/SQL result cache
  • virtual columns
  • the long awaited PIVOT clause
  • finer grained dependency analysis
  • Advanced compression
  • PL/Scope
  • Method 4 dynamic SQL
  • compound triggers
  • change to WHEN OTHERS exceptions (see below).
The more I hear about the result caching features the more I think my project ought to move to 11g. Currently we are on 9i with only the vaguest plans to migrate to 10g. For the last few months I have been, in effect, implementing something like result caching, only not as efficient. A crusade for the new year. Something I had not picked up on before is that the PL/SQL result cache only applies to functions, not procedures, and also doesn't work with functions which have OUT or IN OUT parameters.
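One way to picture why the cache is restricted to functions with only IN parameters: a result cache is essentially a map from the input arguments to a single returned value. The sketch below is a toy in Python, not Oracle's implementation; the ResultCache class, the prices dictionary and lookup_price are all hypothetical names made up for the illustration. It also shows the other half of the problem any hand-rolled cache has to solve, namely throwing away results when the underlying data changes.

```python
class ResultCache:
    """Toy function-result cache keyed by input arguments (illustrative only)."""

    def __init__(self, func):
        self.func = func
        self.store = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, *args):
        # The arguments are the only key, so the function must communicate
        # solely through its return value for the cached answer to be complete.
        if args in self.store:
            self.hits += 1
            return self.store[args]
        self.misses += 1
        result = self.func(*args)
        self.store[args] = result
        return result

    def invalidate(self):
        """Discard all cached results, e.g. after the underlying data changes."""
        self.store.clear()

# Stand-in for a table the function reads from (hypothetical data).
prices = {"widget": 5, "gadget": 8}

def lookup_price(item):
    return prices[item]

cached_price = ResultCache(lookup_price)
first = cached_price("widget")    # miss: runs the function
second = cached_price("widget")   # hit: served from the cache
prices["widget"] = 6              # the underlying data changes
stale = cached_price("widget")    # hit: still the old value
cached_price.invalidate()
fresh = cached_price("widget")    # miss: picks up the new value
print(first, second, stale, fresh)
```

The stale read is the part that is tedious to get right by hand, which is a fair argument for letting the database manage the cache.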

Bryn Llewellyn, the PL/SQL product manager, offered Tom the choice of three new features. Tom asked to have the WHEN OTHERS exception banned; regular readers of AskTom will know that the use of this exception without an accompanying RAISE or RAISE_APPLICATION_ERROR is one of his bugbears. In the end they compromised: WHEN OTHERS without a RAISE or RAISE_APPLICATION_ERROR will throw a compilation warning. But only if the compilation is set to the correct level. I rather suspect the sort of developers who code such silent bugs in their exception handlers probably don't use the compiler flags.

The duplicity of duplicate rows


Hugh Darwen started the third talk in his series of SQL's biggest mistakes with a discussion of the Bellman's line from the Hunting of the Snark, "What I tell you three times is true." He followed this up with Ted Codd's observation that "If something is true saying it twice doesn't make it any more true."

Duplicate rows in a resultset are duplicitous because they can represent two different problems. One is that the table contains duplicated data. The other is that the SQL has generated duplicate rows, for instance through an imperfect join. Either way it is an offense against relational theory, which requires that a relation consist of unique records. The possibility of valid SQL generating duplicate rows is a fundamental flaw in relational theory, which wasn't anticipated by Ted Codd. There are workarounds. We can introduce artificial uniqueness through sequences or on-the-fly row number generation. We can suppress duplication through the use of DISTINCT or UNION, or we can ensure we include all the key columns in our select statements and include differentiating tags in UNION ALL statements.
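The second kind of duplication, and the workarounds just listed, can be tried out with a small sketch. It uses Python's standard sqlite3 module rather than Oracle, and the dept and emp tables and their rows are invented for the purpose. Neither table contains duplicate data, yet a projection that leaves out the key columns produces a repeated row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER, job TEXT);
    INSERT INTO dept VALUES (10, 'London'), (20, 'Leeds');
    INSERT INTO emp  VALUES (1, 10, 'CLERK'), (2, 10, 'CLERK'), (3, 20, 'ANALYST');
""")

join = "FROM emp e JOIN dept d ON d.deptno = e.deptno"

# No key column in the select list: the two London clerks become identical rows.
without_key = conn.execute(
    f"SELECT d.city, e.job {join} ORDER BY d.city, e.job").fetchall()

# Workaround 1: suppress the duplicates, at the cost of extra work for the database.
with_distinct = conn.execute(
    f"SELECT DISTINCT d.city, e.job {join} ORDER BY d.city, e.job").fetchall()

# Workaround 2: include the key column, so every row is distinguishable.
with_key = conn.execute(
    f"SELECT e.empno, d.city, e.job {join} ORDER BY e.empno").fetchall()

# UNION ALL keeps repeated values; UNION removes them.
union_all = conn.execute(
    "SELECT deptno FROM dept UNION ALL SELECT deptno FROM emp ORDER BY 1").fetchall()
union = conn.execute(
    "SELECT deptno FROM dept UNION SELECT deptno FROM emp ORDER BY 1").fetchall()

print(without_key)
print(with_distinct)
print(with_key)
print(union_all, union)
```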

No solution is entirely satisfactory. Synthetic keys do not address the underlying problem of duplicate data in the table. That really needs to be tackled by a decently normalised data model and building unique constraints on the natural keys. Removing duplication from the resultset has performance implications. We wouldn't want to pay the overhead for applying DISTINCT to every query but how else can we guarantee uniqueness in the resultset? Hugh suggested this problem was down to the database vendors but I think that is a bit cheeky. DISTINCT inevitably requires additional work on the part of the database and that has to cost something. It can be optimized (with the kind of hashing algorithm Oracle introduced for GROUP BY in 10gR2) but as Scotty was wont to observe, "Ye cannae break the laws of physics".

Incidentally, Hugh owned up to introducing the NATURAL JOIN syntax into the ANSI standard. He acknowledged that it was a controversial addition. Indeed. Some people I know regard it as a mistake; perhaps not on a par with NULLs but a mistake none the less. Personally I think it's a heffalump trap. It may be a boon for the experienced SQL practitioner but it's just a bug waiting to happen for the unwary. And if my experience on the OTN forum is at all representative the clueless outnumber the clueful, and they appear to breed faster. It's not the way we'd like it to be but it's the way it is. Of course this is just the sort of pragmatic approach which often enrages the theoreticians (or at least Fabian Pascal).
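For anyone who hasn't fallen into the trap yet, here is a minimal sketch of how it is sprung. It uses Python's standard sqlite3 module rather than Oracle, and the tables, rows and the "modified" audit column are all made up for the illustration. The query text never changes; only the schema does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'SALES'), (20, 'RESEARCH');
    INSERT INTO emp  VALUES (1, 'ALLEN', 10), (2, 'SMITH', 20);
""")

natural = "SELECT ename, dname FROM emp NATURAL JOIN dept ORDER BY ename"
explicit = ("SELECT ename, dname FROM emp JOIN dept "
            "ON emp.deptno = dept.deptno ORDER BY ename")

# Today the only common column is deptno, so the natural join behaves.
before = conn.execute(natural).fetchall()

# Later somebody adds an audit column with the same name to both tables.
conn.executescript("""
    ALTER TABLE dept ADD COLUMN modified TEXT;
    ALTER TABLE emp  ADD COLUMN modified TEXT;
    UPDATE dept SET modified = '2007-12-01';
    UPDATE emp  SET modified = '2007-12-04';
""")

# The natural join now silently matches on deptno AND modified.
after = conn.execute(natural).fetchall()

# The join with an explicit condition is unaffected by the new columns.
explicit_after = conn.execute(explicit).fetchall()

print(before)
print(after)
print(explicit_after)
```

No error, no warning: the natural join simply stops returning rows, which is exactly the kind of bug that turns up in production rather than in testing.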

Hugh rounded off his presentation with the observation that fundamental mistakes are often irrevocable. Furthermore they don't often reveal themselves until it's too late. So maybe it's time for a Truly Relational Database. This call to arms was delivered with a knowing smile - most of us in the audience are Oracle wranglers first and foremost. So he asked the question of himself, why does he keep doing this? Because somebody has to, otherwise the importance of theory will disappear off the practitioners' radar altogether.

Understanding statspack


Jonathan Lewis also started his presentation with a quote:
"------ is the art of looking for trouble, finding it everywhere, diagnosing it wrongly and applying unsuitable remedies."
This turns out to be Sir Ernest Benn on politics, but as Jonathan rightly says it acts as a neat summation of the state of Oracle tuning.

All database performance problems mean one of two things:
  • the database is working too hard; or
  • it is waiting for something.
Waits are due either to the inherent slowness of a particular step or to a queue for a particular resource. There are only five issues:
  1. disk I/O;
  2. CPU consumption;
  3. network delays;
  4. application contention (locks);
  5. internal contention (latches).

It's almost always the SQL at fault:

Statspack section                Likely issue
SQL ordered by gets              CPU
SQL ordered by reads             Disk (CPU)
SQL ordered by executions        CPU (network)
SQL ordered by parse calls       CPU (internal contention)
SQL ordered by version counts    CPU, internal contention
SQL ordered by CPU time          CPU
SQL ordered by elapsed time      application contention

The problem with Statspack is that it just gives us a list of symptoms. It can be hard to distinguish the flow of the events which underlie the bald statistics and even harder to determine possible causes. What sort of database activity might generate lots of logfile sync waits? So Jonathan advises us to play with Statspack. Run a query many times; check the Statspack output. Change the statement and repeat the operation. Stress Oracle and see what happens.

The other important thing is to keep an archive of Statspack results. Quite often performance problems arise because something has changed. The current output may flag up a problem but it might not be enough to diagnose it. Having a history of Statspack output will allow us to spot trends, repeating patterns or sudden changes in behaviour.



                  occurrence
                  rare           frequent
impact   huge     easy to see    easy to see
         tiny     irrelevant     statistically visible

This tied in nicely with Robyn Sands's observation that performance tuning is about reducing variability.

Tuesday evening


In the evening we had the SIG Focus Pubs. This is supposed to be an opportunity for delegates to meet others with similar interests. I'm afraid I took it as an opportunity to find presenters for my next SIG (27th February 2008 - save the date!). I did chat with somebody who is still happily using client/server Forms 6i. Well, it's stable, quite richly featured, highly productive and it works - at least for the time being. This person expressed the hope that Forms 6i would run on Vista. Somehow I doubt it (and so did a former Forms guru of my acquaintance). Apart from anything else, archaic software like Forms is exactly the sort of thing which will display very slowly on Vista's new rendering architecture.

Afterwards a few of us went for a meal at a rather upmarket curry house called Lazeez (hence "ruby Tuesday"). The food was very nice but there were no side dishes. An innovation too far. Anyway, my thanks to Joraph, who picked up the tab.

1 comment:

Anonymous said...

I just want to say that NATURAL JOIN doesn't always suffer from the problem that some people so dislike it for. Consider:

SELECT * FROM ( SELECT A,B FROM T1) U1 NATURAL JOIN ( SELECT B, C FROM T2 ) U2

That's an example where * is 100% safe too!

With a proper relational language, by the way, one would expect to be able to tell the DBMS which columns are expected to be the common columns. Then the compiler throws an error when the common columns are not the expected ones. That was what I thought SQL's USING construct was going to do, but it turned out to be slightly different.
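The pattern in this comment can be tried out with a short sketch. It uses Python's standard sqlite3 module rather than Oracle, and the NOTE column and the sample rows are assumptions added so that T1 and T2 share a column which is not meant to take part in the join. Projecting each table down to the intended columns first leaves B as the only possible common column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (A INTEGER, B INTEGER, NOTE TEXT);
    CREATE TABLE T2 (B INTEGER, C TEXT, NOTE TEXT);
    INSERT INTO T1 VALUES (1, 10, 't1 note'), (2, 20, 't1 note');
    INSERT INTO T2 VALUES (10, 'x', 't2 note'), (20, 'y', 't2 note');
""")

# Naive form: NOTE is also a common column, so nothing matches.
naive = conn.execute("SELECT * FROM T1 NATURAL JOIN T2").fetchall()

# The commenter's form: each side is projected first, so only B is shared.
safe = conn.execute(
    "SELECT * FROM ( SELECT A,B FROM T1) U1 "
    "NATURAL JOIN ( SELECT B, C FROM T2 ) U2 ORDER BY A"
).fetchall()

# USING names the join column explicitly; other shared columns are ignored, not rejected.
using = conn.execute(
    "SELECT A, B, C FROM T1 JOIN T2 USING (B) ORDER BY A"
).fetchall()

print(naive)
print(safe)
print(using)
```

Note that USING quietly ignores the shared NOTE column instead of raising an error, which may be the "slightly different" behaviour the comment has in mind.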