Friday, December 21, 2007

OOW2K7 vs UKOUG2007

Having attended the two conferences so close together makes it easy for me to compare them. There are obvious differences. San Francisco in November is warmer, sunnier and all together more attractive than Birmingham in December. Open World is much much larger. But the UKOUG felt more tiring, at least to me. This is because the UKOUG packs a lot more into each day.

Open World is distributed across several sites. In order to give delegates sufficinet time to get from a session in the Westin St Francis to Moscone South there is a half hour gap between sessions. There are no sessions whilst the key notes are on. There are no sessions during lunch hour. The UKOUG schedules sessions with only ten minutes between them. The streams are staggered, so that there were always sessions during lunch. This obviously provides value for the delegates by offering a vast number of sessions to attend. But it also makes us feel guilty about skipping a session in order to go round the exhibition hall.

Another difference is the networking opportunities. Oracle have started putting a lot more effort into providing portals for delegates to get in contact with each other and places where people can meet and converse. This is easier because Moscone Center has some big spaces which can be used for such purposes.

Both conferences are valuable. I learned a lot from attending both. But Open World was much more relaxed. I think the fact that I have just blogged about UKOUG Wednesday more than two weeks after the end of the conferences says it all.

Thursday, December 20, 2007

UKOUG 2007: Wednesday Morning, 3 AM

Well not quite, but I was awake at 4:30am. So I ran through my presentation a couple of times and made some final tweaks to the slides. I didn't have any code demos this year, so there was no chance of me breaking anything.

Every performance problem is one of two things

I almost skipped James Morle's presentation, because I thought it would cover the same ground as Jonathan Lewis's talk on Statspack. I'm glad I didn't though, because it was completely different. But I'm even more frustrated that this talk was scheduled at the same time as Daniel Fink's talk on Why tune SQL?.

James's talk was a reassessment of the YAPP tuning method:
R = S + W

That is, response time equals service time plus wait time. Most tuning methods focus on the waits. This talk's assertion was that most fixes actually deal with skew or latency, that is, the service time. In a nutshell, Morle's Grand Unified Theory of Performance Tuning is: All problems are caused by skew, latency or both.

Bandwidth is not the same as latency: bandwidth is how much transport you have, latency is the wasted time associated with each operation. James demonstrated the difference between latency and bandwidth with some live action in which two volunteers transported bottles of beer across the auditorium, in a suitcase or by hand. The impact of Connor McDonald's infamous "squirrel sketch" is still rippling across the presentation pond. However, this example is easier to blog. Given two networks:

LAN100MB/sping = 3ms
WAN250KB/sping = 2ms

which one can transmit the most data? The answer is the WAN; the LAN's enormous bandwidth is wasted because of the additional latency in its ping speeds. Of course, the WAN is operating more-or-less at capacity and the LAN isn't.

Problems with bandwidth are easy to fix - they just require money and bandwidth is getting cheaper all the time. They are also easy to spot. Latency on the other hand is bounded by the laws of physics, which means it is not necessarily fixable to chucking money at it. Also, it can be hard to spot. Latency multiplication is a very common source of problems - 1ms per operation is a very short time until you do that operation a thousand times.

Common sources of latency:
  • nested loop joins
  • SQL*Net message to/from client
  • too many commits
  • "slow" I/O

Skew is also hard to spot because humans (and computers) tend to think linearly, and unconsciously split things into evenly-sized chunks.

Common sources of skew.:
  • distribution of column values
  • high demand for small set of resources (contention)
  • response times for component SQL statements in a transaction
So the revised version of the YAPP formula is
R = S + W

where S = (skew * service time) and W = (skew * latency + passive wait time).

Modelling on the cheap

This presentation originally started out as some jokey test data, when the UKOUG asked some of us SIG chairs to test the conference site. But the title was sufficiently compelling for me to submit it as a real paper and, more surprisingly, for it to be accepted. I think many people found the title compelling because it focuses on a real problem: how to undertake database and application modelling when the boss won't spring for an expensive CASE tool. I work for a consultancy; consultancies tend to put software licences in one of two categories - those for which the client pays and overhead. So I have an interest in tools and techniques which don't require me to get a business case approved.

I talked about low tech solutions (whiteboards, Post-It notes and index cards), desktop tools (PowerPoint and Visio) and then free modelling tools (basically Schemester). There is JDeveloper but I don't think that is really suitable for people who are developing database applications. However the rumours suggest that JDev 11g will have much more to offer on that side of things, and that means that we can hope those features will eventually be ported to SQL Developer. I feel the session went well but I shall have to wait until UKOUG publishes the evaluations.

Afterwards, somebody who introduced themselves as one of the JDev development team who worked on the original Use Case modeller said they started out modelling that tool with index cards and Post-Its. And Duncan Mills said that he didn't think the JDev features set was any worse that Schemester. It's a fair point: Schemester's big advantage is that it is a mere 295KB zip to download and has a memory footprint of ~12MB of RAM. So, unlike JDev, we can run Schemester without nadgering our desktop.

Monday, December 17, 2007

UKOUG 2007: Ruby Tuesday

Tuesday was a long day for me. I was awake at 5:00am. On Monday I was awake at 5:30am. If it wasn't for the fact that I'm still in the same timezone I would swear I was jet-lagged. It isn't "jet-lag" either - I mainly drank water at the bloggers' meet-up last night. Anyway, it gave me the chance to run through my presentation before breakfast.

The first session of Tuesday was Sue Harper's overview of SQL Developer's new features. These transpired to be the new features in the coming 1.5 release rather than the latest production release. The upcoming release is special because its features are ones suggested and voted on by SQL Developer users on the SQL Developer Exchange site. The team deferred the planned 2.0 release because its manifest contained no features which users were clamouring for, at least not according to the SQL Dev Exchange rankings. So if there's some functionality you want to see added to the product, suggest it. If enough other users agree with you there is probably a reasonable chance you might get it. Power to the people!

Most of these new features are enhancements to existing functionality - a new SQL formatter, better code insight. This is because the SQL Dev team is quite small and has only one real developer. The team piggy-backs on the work of the JDeveloper team, so to a large extent the feature-set in SQL Dev depends on what is in the other tool. That is why the next release will feature the ability to see queue tables but not to create or alter them. In the long run we can hope that the JDev modelling features will be included in SQL Dev.

Database vs middle tier

My chairing of Toon Koppelaar's session got off on the wrong foot. Of course I should have checked how to pronounce his name before I introduced him - Dutch names tend to sound like a cough - but as it turned out it was the name I thought I knew which I had in fact got wrong. Toon (pronounced tone) was generous. Apparaently he gets call "Tune" quite a lot.

Toon's talk was another waltz around the question of where business logic properly belongs. Toon distinguishes three times of logic:
  • data logic - directly manipulating the data;
  • UI logic - directly affecting the user's interaction with the system;
  • business logic - everything else.
Toon rehearsed the familiar arguments. We have moved almost full-circle from text-based stateless apps in the seventies and early eighties through client server apps to the current fad for graphical stateless apps. The main difference between the green screen dumb terminals of yore and today's browser-based apps is responsiveness: the new apps do try to respond to the user in the client layer. This comes at a price: the presentation layers have a large number of moving parts, drawn from an exploding range of widget toolkits and frameworks. Keeping up with all these different bits requires a significant knowledge investment, which in turn leads to teams of specialists overseen by architects (you have to hear Toon say "architect" to realise that he means it as an insult).

But the real problems arise when the business logic is written in Java or .Net or one of these other fancy-dan languages. These applications tend to be very chatty - Toon quoted an example where a single search spawned 7000 separate calls. They also tend to date very quickly, which makes them hard to maintain and hard to migrate. Most applications are "Window on Data" - the user retrieves data and updates data. This is precisely what databases are meant to do. Consequently the underlying database design is paramount. The DBMS and PL/SQL have been the single stable constant in system architectures for more than fifteen years now. So put your business logic in the database (PL/SQL and views) and just change the front-end layer to keep up with the Next Big Thing.

The interesting thing is that these talks - due to the nature of the conference - are usually singing to the converted. However, I noticed an ex-colleague, who is a Java head, in the audience. I chatted to him afterwards. He was spitting feathers. Strawman arguments... business logic is object-oriented....PL/SQL is too hard to maintain. In the end I had to calm him down by proposing that he present his opinions at a SIG. He said he'd think about it. I have heard this a number of times now. Both sides try to bludgeon the other side into admitting defeat. The trouble is finding people who are equally experienced in both database and middle-tier architectures. Almost everybody is a specialist on one side of the fence or the other. There are good reasons why this tends to be the case but it does make it difficult to hold a discussion in which everybody understands the issues involved.

So I think it comes down to numbers. Front-end developers generally outnumber database engineers in any organisation, so most of the application will be written in the programming paradigm which suits them. This will lead to the business logic in the middle tier, because that's what they understand. Whether this is the correct solution is a different matter.

11g for developers

Well at least I managed to pronounce Tom Kyte's name correctly. In this session Tom covered the key new features for developers. As with the other session, I'll list the features he mentioned, and just expand on a couple of points which were news to me:
  • Client result cache
  • server result cache
  • PL/SQL result cache
  • virtual columns
  • the long awaited PIVOT clause
  • finer grained dependency analysis
  • Advanced compression
  • PL/Scope
  • Method 4 dynamic SQL
  • compound triggers
  • change to WHEN OTHERS exceptions (see below).
The more I hear about the result caching features the more I think my project ought to move to 11g. Currently we are on 9i with only the vaguest plans to migrate to 10g. For the last few months I have been, in effect, implementing something like result caching, only not as efficient. A crusade for the new year. Something I had not picked up on before is that the PL/SQL result cache only applies to functions, not procedures, and also doesn't work with functions which have OUT or IN OUT parameters.

Bryn Llewellyn, the PL/SQL product manager, offered Tom the choice of three new features. Tom asked to have the WHEN OTHERS exception banned; regular readers of AskTom will know that the use of this exception without an accompanying RAISE or RAISE_APPLICATION_ERROR is one of his bugbears. In the end they compromised: WHEN OTHERS without a RAISE or RAISE_APPLICATION_ERROR will throw a compilation warning. But only if the compilation is set to the correct level. I rather suspect the sort of developers who code such silent bugs in their exception handlers probably don't use the compiler flags.

The duplicity of duplicate rows

Hugh Darwen started the third talk in his series of SQL's biggest mistakes with a discussion of the Bellman's line from the Hunting of the Snark, "What I tell you three times is true." He followed this up with Ted Codd's observtion that "If something is true saying it twice doesn't make it any more true."

Duplicate rows in a resultset are duplicitous because they can represent two different problems. One is that the table contains duplicated data. The other is that the SQL has generated duplicate rows, for instance through an imperfect join. Either way it is an offense against relational theory, which requires that a relation consist of unique records. The possibility of valid SQL generating duplicate rows is a fundamental flaw in relational theory, which wasn't anticipated by Ted Codd. There are workarounds. We can introduce artificial uniqueness through sequences or or on the fly row number generation. We can suppress duplication through the use of DISTINCT or UNION, or we can ensure we include all the key columns in our select statements and include differentiating tags in UNION ALL statements.

No solution is entirely satisfactory. Synthetic keys do not address the underlying problem of duplicate data in the table. That really needs to be tackled by a decently normalised data model and building unique constraints on the natural keys. Removing duplication from the resultset has performance implications. We wouldn't want to pay the overhead for applying DISTINCT to every query but how else can we guarantee uniqueness in the resultset. Hugh suggested this problem was down to the database vendors but I think that is a bit cheeky. DISTINCT inevitably requires additional work on the part of the database and that has to cost something. It can be optimized (with the kind of hashing algorithm Oracle introduced for GROUP BY in 10gR2) but as Scotty was wont to observe, "Ye cannae break the law of physics".

Incidentally, Hugh owned up to introducing the NATURAL JOIN syntax into the ANSI standard. He acknowledged that it was a controversial addition. Indeed. Some people I know regard it as a mistake; perhaps not on a par with NULLs but a mistake none the less. Personally I think it's a heffalump trap. It may be a boon for the experienced SQL practitioner but it's just a bug waiting to happen for the unwary. And if my experience on the OTN forum is at all representative the clueless outnumber the clueful, and they appear to breed faster. It's not the way we'd like it to be but it's the way it is. Of course this is just the sort of pragmatic approach which often enrages the theoreticians (or at least Fabian Pascal).

Hugh rounded off his presentation with the observation that fundamental mistakes are often irrevocable. Furthermore they don't often reveal themselves until it's too late. So maybe it's time for a Truly Relational Database. This call for for arms was delivered with a knowing smile - most of us in the audience are Oracle wranglers first and foremost. So he asked the question of himself, why does he keep doing this? Because somebody has to, otherwise the importance of theory will disappear off the practitioners' radar altogether.

Understanding statspack

Jonathan Lewis also started his presentation with a quote:
"------ is the art of looking for trouble, finding it everywhere, diagnosing it wrongly and applying unsuitable remedies."
This turns out to be Sir Ernest Benn on politics, but as Jonathan rightly says it acts as a neat summation of the state of Oracle tuning.

All database performance problems mean one of two things:
  • the database is working too hard; or
  • it is waiting for someting.
Waits are due either to the inherent slowness of a particular step or because there is a queue for a particular resource. There are only issues:
  1. disk I/O;
  2. CPU consumption;
  3. network delays;
  4. application contention (locks);
  5. internal contention (latches)

It's almost always the SQL at fault:

SQL ordered by getsCPU
SQL ordered by readsDisk (CPU)
SQL ordered by executionsCPU (network)
SQL ordered by parse callsCPU (internal contention)
SQL ordered by version countsCPU, internal contention
SQL ordered by CPU timeCPU
SQL ordered by elapsed timeapplication contention

The problem with Statspack is that it just gives us a list of symptoms. It can be hard to distinguish the flow of the events which underlie the bald statistics and even harder to determine possible causes. What sort of database activity might generate lots of logfile sync waits? So Jonathan advises us to play with Statspack. Run a query many times; check the Statspack output. Change the statement and repeat the operation. Stress Oracle and see what happens.

The other important thing is to keep an archive of Statspack results. Quite often performance problems arise because something has changed. The current output may flag up a problem but it might not be enough to diagnose it. Having a history of Statspack output will allow us to spot trends, repeating patterns or sudden changes in behaviour.

impacthugeeasy to seeeasy to see
.tinyirrelevantstatistically visible

This tied in nicely with Robyn Sands's observation that performance tuning is about reducing variability.

Tuesday evening

In the evening we had the SIG Focus Pubs. This is supposed to be an opportunity for delegates to meet others with similar interests. I'm afraid I took it as an opportunity to find presenters for my next SIG (27th February 2008 - save the date!) I did chat with somebody who is still happily using client/server Forms 6i. Well, it's table, quite richly featured, highly productive and it works - at least for the time being. This person expressed the hope that Forms 6i would run on Vista. Somehow I doubt it (and so did a former Forms guru of my acquaintance). Apart from anything else, archaic software like Forms is exactly the sort of thing which will display very slowly on Vista's new rendering architecture.

Afterwards a few of us went for a mealy at a rather upmarket curry house called Lazeez (hence "ruby Tuesday"). The food was very nice but there were no side dishes. An innovation too far. Anyway, my thanks to Joraph, who picked up the tab.

Thursday, December 13, 2007

In praise of the Checklist

I love reading The New Yorker magazine. Partly the it is sheer expanse of the articles, which are measured in pages rather than paragraphs. But also it's the breadth of the coverage. Okay, so I could do without the profiles of baseball coaches but pretty much every article is worth reading. Unfortunately I lack the time to read each issue, so these days I buy it when I want to pretend I am still a leisured (and cultured) person.

I have just got around to reading last week's issue. It contained a fascinating piece by Atul Gawande on the use of checklists in intensive care units. ICU staff deal with an very complicated piece of machinery (i.e. us) when it's in an extremely precarious state (hence the need for intensive care). There are thousands of different ICU procedures. Each procedure consists of multiple steps; if somebody misses or botches a step there are often terminal consequences for the patient. Furthermore each condition requires a unique combination of ICU procedures, staff and equipment. Patients in intensive care frequently die there.

In his piece Gawande talks about the efforts of a critical-care specialist named Peter Pronovost to improve survival rates by the simple expedient of checklists for a handful of common yet critical procedures. It is astonishing how such a simple thing can make such a profound difference:
"Pronovost and his colleagues monitored what happened for a year afterward. The results were so dramatic that they weren’t sure whether to believe them: the ten-day line-infection rate went from eleven per cent to zero. So they followed patients for fifteen more months. Only two line infections occurred during the entire period. They calculated that, in this one hospital, the checklist had prevented forty-three infections and eight deaths, and saved two million dollars in costs."
Less surprising but more depressing is the difficulty Pronovost experienced in persuading highly-qualified doctors to bother themselves with yet more form-filling.

Most of us in IT have similarly mundane-yet-complicated procedures. Of course, hardly any of our procedures are literally life-or-death, but there are usually penalties for getting them wrong (even if it's only digging out the manuals to refresh our memories on Flashback Database). Checklists are good because they prompt us to go through each step of a prccedure. And because the machinery we deal with is a lot more tractable than the human body we can often automate our checklists into stored procedures, shell scripts or workflow processes.

Gawande's article reminded me of a couple of things I do on an infrequent but regular basis which would benefit from being documented in a checklist. But it's also a fine and moving piece of writing and worth reading in its own right.

NOT IN, NOT EXISTS and MINUS: an aide-memoir

A colleague asked me whether NOT IN would return the same as MINUS. I said it would depend on whether the results contained nulls. I confess to not being clear as to how the results would be affected by the presence of nulls, but it's easy enough to knock up a test case.

We start with both tables containing nulls:

SQL> select id from a10
2 /


3 rows selected.

SQL> select id from a20
2 /


4 rows selected.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

no rows selected

SQL> select * from a10
2 where not exists
3 (select id from a20 where =
4 /

2 rows selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /

1 row selected.

With a null in the top table but not in the bottom table:

SQL> delete from a20 where id is null
2 /

1 row deleted.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

1 row selected.

SQL> select * from a10
2 where not exists
3 (select id from a20 where =
4 /

2 rows selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /

2 rows selected.


With a null in the bottom table but not in the top table:

SQL> delete from a10 where id is null
2 /

1 row deleted.

SQL> insert into a20 values (null)
2 /

1 row created.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

no rows selected

SQL> select * from a10
2 where not exists
3 (select id from a20 where =
4 /

1 row selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /

1 row selected.


With no nulls in either table:

SQL> delete from a20 where id is null
2 /

1 row deleted.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

1 row selected.

SQL> select * from a10
2 where not exists
3 (select id from a20 where =
4 /

1 row selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /

1 row selected.


Monday, December 10, 2007

UKOUG 2007: Presentations available.

I haven't finished writing up the conference and already the presentations are downloadable from the conference agenda page. Sigh.

The password is the usual one. If you don't know it, your membership's regular contact should be able to tell you. Otherwise e-mail the UKOUG office, which is quicker than attempting to crack it yourself (although as it's a regular dictionary word with no baroque capitalisation or numbers your rainbow tables won't break a sweat).

Not all the presentations are available. As speakers we had until Friday 7th December to upload the really final version of the slides so perhaps some people didn't make that deadline. Also, putting our slides online was not compulsory. It's a shame, because there are a couple of talks which I missed which are not there yet.

Tuesday, December 04, 2007

UKOUG 2007: Monday, Monday

The first session I chaired was Martin Widlake talking on "Avoiding Avoidable Disasters and Surviving Survivable Ones" He took as his sermon the text What could possibly go wrong? Well, for starters, there could be a problem with the USB drive so that the wireless slide controller doesn't work. Martin disappeared to try and find a replacement, which left me with the possibility of having to vamp for an hour if he didn't come back. Fortunately he did return, albeit without a replacement gadget, so his presentation was more static than useful usual [see Comments - Ed]. Fortunately the ideas were lively. I was particularly taken with the Formica Table. This would be a forum where "not bad" DBAs answered questions which fitted 95% of all scenarios; sort of an Oak Table For The Rest Of Us.

His main theme was that projects tend to fail because decision-makers aren't realistic in their risk assessments. So projects are planned on the basis of everything going right. Staff are told to work extra hours and weekends without any recognition that tired people make mistakes, and fixing mistakes costs time. Or additional people are recruited which just increases the number of communication channels, to the point that the overhead of keeping everybody in the loop becomes excessive.

Martin advises us to install disaster tolerant hardware, because it gives us more interesting problems to solve. Of course we shouldn't really switch to clustering technology just for the sake of it. But if we think we are eventually going to need RAC we should move to it now. That way we'll have learned to live with it and smoothed all the wrinkles before the technology has become critical to the system.

There were some entertaining war stories. One concerned a failed powerpack in a cluster. A sysadmin noticed and helpfully substituted the powerpack from another machine. When he connected the first node without a hitch but the second node promptly fried that power pack too. So he called an electrician. In order to get at the problem the electrician had to climb a ladder. The ladder slipped and the flailing electrician grabbed at the nearest thing to hand, the power rail, which collapsed and took out the leccy for the entire server room. We can't plan for such things, we can merely acknowledge that such stuff will happen.

The solutions are the obvious ones: realistic planning, smaller units of delivery, delivering something on a regular basis. One neat idea for solving the communication problem came from somebody who works for Unilever. They use Jabber to post small messages to a central message board, so everybody can see what everybody else is doing in real time. At last a business use for Twitter.

An Industrial Engineer's Approach to DBMS

Problems with the AV set-up seem to have become a theme in the sessions I've chaired. Cisco's Robyn Sands turned up with a Mac but without the requisite dongle which would allow her to plug it into the hall's projector. So she ended up having to drive her presentation from a PDF on a loaned Windows machine. She handled the transition to an unfamiliar OS but it was an unlucky start to her session.

Industrial engineering concerns itself with "design, improvement and installation of integrated systems of people, material, equipment and energy", which is a pretty good definition of the role of a DBA too. Industrial engineers focus on efficiency, value and methodology; they are the accountants of the engineering world. The application of IE methods to DBMS has the aim of producing a consistent application. For instance, every database in an organisation should be consistent: same installed components, same init parameters, same file locations, with transportable tablespaces and reusable code/scripts/methods. This results in safer and more flexible systems. The installation process is a flowchart; in effect the database instance is deployed as an appliance.

Another IE practice is value analysis. This says that a cost reduction adds value to a system just as much as adding a new feature. Which brings us to Statistical Process Control . Every process displays variability: there is controlled variability and uncontrolled variability. We need to use logging to track the elapsed time of our processes, and measure the degree of variability. Benchmarking is crucial because we need to define the normal system before we can spot abnormality. Abnormal variability falls into three categories:
  • special case;
  • trend;
  • excess variation.
Once we have explained the special cases we can file and forget them. Trends and excess variation both have to be investigated and fixed. The aim is achieving a consistent level of service rather than extreme performance. If you can accurately predict the Mean Response Time then you understand your system well.

Robyn described a project she had worked on which focused on systems reliability. The goal was to reduce or eliminate recurring issues with a view to reducing outages - and henceout-of-hours calls - to increase the amount of uninterrupted sleep for DBAs and developers. A worthy end. The problem is simply that a DBA or developer woken at three in the morning will apply the quickest possible fix to resolve the outage but there was no budget to fix the underlying problem when they got back into the office. Usually old code is the culprit. There's lots of kruft and multiple dependencies, which make the programs brittle. The project worked to identify the underlying causes of outages and fix them. The metric they used to monitor the project's success was the number of out-of-hours calls: over the course of the year these fell by orders of magnitude.

Robyn finished her presentation with some maxims:
  • Rules of thumbs are not heuristics.
  • Discipline and consistency lead to agility.
  • Reduce variation to improve performance.
  • No general model applies to all systems.
  • Understand what the business wants.
  • Model and benchmark your system accurately.
  • Understand the capabilities of you system.

The licensing round table

This event was billed as "Oracle's Right To Reply". Unfortunately there wasn't an Oracle representative present and even when one was rustled up they could only take away our observations to pass them on. This confirmed an observation from Rocela's Jason Pepper that Oracle employees are expressly forbidden from discussing licencing unless they are an account manager. This can lead to situations where advice from Support or Consulting leads to customers having exposure to increased licences.

The issues aired were the usual suspects. Why isn't partitioning part of the Enterprise Edition licence? Why aren't the management packs available for Standard Edition? Why isn't there a single, easily locatable document explaining pricing policy? How can we have sensible negotiations when the account managers keep changing? There was one area which was new to me. There is a recent initiative, the Customer Optimization Team, whose task is to rationalise a customer's licences. Somebody asked the pertinent question: what is the team's motivation - to get the best value for customer or to sell additional licences for all the things which the customer is using without adequate licences? Perhaps we'll get answers. I shall watch my inbox with bated breath.

Index compression

This was a bonus session I hadn't been meaning to attend but it was worthwhile. Philip Marshall from Joraph presented his research into the effects of compression, because these are not well documented in the manual. Compression works by storing the distinct values of the compressed columns and then linking to each instance of that value, which obviously imposes a small overhead per row. So the space saved on storing the compressed column is dependent on both the length of the column and the number of instances of those values. The overhead means that compressing an index with small columns which have high variability could result in very small savings or even a larger index.

Also we need to remember that the apparent space saving could be due to the act of rebuilding the index rather than compressing it. This matters because (as we all should know) the space savings from rebuilding can be quickly lost once the index is subjected to DML. Furthermore there is a cost associated with uncompressing an index when we query its table. This is can be quite expensive. The good news is that the CPU cost of uncompressing the columns is incurred by the index read only: so it is usually only a small slice of the whole query. Still it's a cost we should avoid paying if we aren't actually getting a compensating saving on space. Also compression does not result in more index blocks being cached. More blocks will be read in a single sweep, but the unused blocks will be quickly discarded.

I thought this presentation was a classic example of the Formica Table approach. A focused - dare I say compressed? - look at a feature which probably most of us have contemplated using at some time without really understanding the implications. It was the kind of presentation which might just as easily have been a white paper (I will certainly be downloading the presentation to get the two matrices Philip included) but there is already so much to read on the net that a paper would have just got lost.

11g for DBAs

This was the first of a tie-less Tom Kyte's two selections from the 11g chocolate box. I think the 11g features have been sufficiently rehearsed over the last few months that I have decided to skip the details. So here is just a list of the new features Tom thinks DBAs most need to pay the attention to.
  • Encrypted tablespaces
  • Active Dataguard
  • Real Application Testing
  • Enhancements to Data Pump (EXP and IMP are now deprecated)
  • Virtual columns
  • Enhancements to partitioning
  • Finer grained dependency tracking
  • the xml version of the alert log
  • invisible indexes
Of that list encrypted tablespaces, Active Dataguard, Real Application Testing and partitioning are (or require) chargeable extras. In the earlier round table Ronan kept reminding us that we must distinguish between licensing and pricing: we have to accept that Oracle has 47% of the database market so lots of CTOs and CFOs must think it offers value for money. Which is a fair point, but it is hard to see a compelling reason why a Standard Edition DBA would choose to upgrade. Actually the enhancements for developers are very attractive, but alas we don't carry as much sway.

One notable thing in Tom's presentation occurred when he was demonstrating the new INTERVAL operation for partitioning. The new partitions had a year of 2020, but the dates were supposed to be from this year. It turns out Tom had been tinkering with his demo code and had removed an explicit date conversation without checking the default date format. It's nice to know even the demi-gods fall prone to such things ;)

Monday, December 03, 2007

UKOUG2007: Make It Great

Most great cities have a quarter. Paris has its Latin quarter. Prague has the Jewish Quarter. Lankhmar has a Thieves Quarter. Birmingham has two quarters, the sparkling Jewellery Quarter and the slightly less sparkling Conference Quarter. No, really. It's a area dedicated to conferences, which means it has lots of hotel and lots of bars. No I don't know what bars have to do with conferences either ;)

Outside of the Conference Quarter Birmingham is gearing up for Christmas in a big way. The German market is doing a thriving trade in Gluhwein and Stollen. The central shopping area is thronged with people. Every structure is festooneed with lights and tinsel. And the weather is sufficiently chilly to make mulled wine and hot mince-pies seem like very good ideas.

Last night we had the UKOUG volunteers' meeting. Apparently this year is the biggest UKOUG conference yet: 2900 delegates registered, 600 speakers and 450 presentations (up from 350 last year). The conference is scaling out from the ICC and running some sessions in the Birmingham Rep theatre next door. But it's still relatively compact compared to OOW2K7 - I have already met several colleagues and ex-colleagues without trying, because there are fewer people - and obviously there is a greater preponderance of people from the UK than there was in San Francisco.

I have already attended the keynotes from Ian Smith and Tom Kyte. Ian was keen to stress the point that Oracle UK are responding to customer feedback, and in particular the responses to the UKOUG customer survey. He cited as evidence their initiatives for the JD Edwards and PeopleSoft communities, who felt bewildered by the sheer size of the Oracle range. Oracle is now "the world's leading enterprise software vendor", with the database just being one of its offerings (although it has 47% of the database market, more that #2 and #3 combined). Tom made a similar point. He skated through a list of twenty new or enhanced features out of the 400 in the 11g database release in thirty minutes. He said that when he joined Oracle in 1992 he could have covered the features of their entire product range in the same time. That's the scale of innovation in Oracle.

So the UKOUG conference is going to have to keep growing just to be able to keep up.

Friday, November 30, 2007

UKOUG 2007: My schedule

Although it's nice to go to both OpenWorld and the UKOUG conference it is difficult when they're both so close together. I haven't finished digesting the stuff from OOW2K7 yet. Also being part of the UKOUG committee means I have a few light pre-conference chores. Furthermore, I'm presenting, so there's all that to prepare. Plus the small matter of the day job, which doesn't really respond well to two separate absences so close together; that's why I'm not doing the full UKOUG this year (again). Anyway, enough whinging.

My standard take on the two conferences is that OpenWorld is more fun but you learn more at the UKOUG. This is because OpenWorld is usually dominated by Oracle's marketing and placed presentations (there is no open Call For Papers). The UKOUG is independent of Oracle and casts its net wider. Consequently you generally get a better range of speakers and topics, and a more honest appraisal of Oracle's products. However, OpenWorld07 was a valuable learning experience for me and - compared to previous years - quite low key in the hype department. So the UKOUG Conference has a tougher challenge this year.

One thing I do know is Birmingham in December will not be T-Shirt weather.

For those of us on the committee the conference starts at 16:15 on Sunday with the Volunteers' meeting. Followed by a few sherbets at a local watering-hole.

Monday, 3rd December 2007

12:10-13:10 - "Avoiding Avoidable Disasters and Surviving Survivable Ones" : Hall Exec 8

I felt like easing myself into the conference, so I chose my first session for its potential entertainment value (which isn't to say it won't be useful too). The speaker is the Martin Widlake and it's got another one of his arresting titles. I'm chairing this one.

13:30-14:15 - "TimesTen: Anatomy of an In-Memory Database" : Hall 11a

Having attended Susan Cheung's session at OOW2K7 I now know something about TimesTen, and its capabilities. So I know right now it's not appropriate for my current project, but I want to find out more because I think it has some interesting potential for future projects.

14:10-15:10 - "Oracle right to reply on Oracle Licensing" : Hall 1

This is a roundtable facilitated by Mr UKOUG himself, Ronan Miles. I have only just noticed that this is Oracle's response, I guess to the earlier session by from Rocela. I thought it was going to be a chance for us, the customers, to have a therapeutic bitch about Oracle's licensing policies. Apparently not. Given that it is a round table the agenda can be seized by the attendees, so ....

15:20-16:05 - "An Industrial Engineer's Approach to Oracle Management and Performance" : Hall 11b

Like most conferences these days the UKOUG is dominated by features and functionality rather than process. This session from Cisco's Robyn Sands is one of the exceptions. I'm chairing this one.

17:35-18:35 - "11g new features for DBAs" : Hall 1

I didn't get to see Tom Kyte strutting his technical stuff at OOW2K7, because I knew I was going to catch him here.

Monday rounds off with the Bloggers' drinks. Once again the redoubtable Mark Rittman has organised this and arranged for a cash injection from the UKOUG team. Nice one, Mark.

Tuesday, 4th December 2007

09:00-10:00 - "Oracle SQL Developer Latest Features" : Hall 5

I've downloaded the recent SQL Developer patch but haven't really used it much. Sue Harper should provide some inspiration. I'm chairing this one.

10:20-11:05 - "Harvesting the Advantages of a Database Centric Development Approach" : Hall 8a

I haven't heard Toon Koppelaars talk before but I have heard of him. The topic is a key area for me. I'm chairing this one.

11:15-12:15 - "11g new features for DevelopersSs" : Hall 1

A mad dash over to the main hall to see Tom Kyte give expression to the other side of his personality. I'm chairing this one.

13:30-14:15 - "The Duplicity of Duplicate Rows" : Hall 9

For the third year running Dr Hugh Darwen will be giving us a bracing dose of database theory. I'm chairing this one (which is the end of my chairing duties).

14:25-15:25 - "Understanding Statspack" : Hall 1

It's always good to hear Jonathan Lewis talk about anything.

17:30 - 20:00 - "COMMUNITY FOCUS PUBS"

I'll be reprazentin' the Development Engineering SIG at wherever we get allocated. Probably lumped in with the MAD and App Server SIGs. These are birds-of-a-feather things where you can meet people who have the same Oracle interests as you, so do come along.

Wednesday, 5th December 2007

10:40-11:25 - "Every Performance Problem Is One Of Two Things" : Hall 5

James Morle with a promising title. If it was followed by "Discuss" it could fit nicely into a Philosophy exam paper.

12:30-13:15 - "Modelling on the cheap" :

As Mike Yarwood used to, "And this is me." At the time of writing I am still working on the words but I have met all the other deadlines (just!) so I should be alright. However I am open to taking this one off-piste, so if you have some opinions or suggestions about alternatives to expensive CASE tools please bring them along.

I am leaving UKOUG on the Wednesday afternoon. I might attend another session or so, it depends how I feel.

Tuesday, November 27, 2007

UKOUG 2007: Chair registration - for real

The UKOUG conference website is now open for any delegate to volunteer as a session chair. Yes I know I said this once before but this time it's the truth. I have had the green light from Lauren so it must be okay !

Chairing sessions is not too onerous. We have to remind delegates to switch off mobile phones and fill in the critique forms. We also have to introduce the speakers. However most of them already have prepared their own introductions, so "without further ado" is a handy phrase. The key thing is to ensure the session finishes on time; this means giving the presenter discreet time signals, although most of them should be able to keep their sessions on track without prompting.

The value of volunteering is that you get a free pass to the conference when you chair six or more sessions. I know it is only a few days away, but if your boss is wavering this might just be the extra leverage you need. Sign up now!


At the time of posting the Conference site still displays a message saying you cannot register to chair sessions if you haven't had an invite from Lauren Edwards. Just ignore this and go ahead. I have e-mailed the UKOUG office and I hope the site will be amended soon.

Update again

The website has now be amended.

Wednesday, November 21, 2007

Simplicity is in the eye of the beholder

Laurent has posted a particularly succinct method for doing bitwise aggregations on his blog. I don't know what practical use it is, but that's another matter. What I liked was his cheeky sign-off, "It is that easy!". Which reminded me of a story from years ago, when I was still in the Ministry of Defence

I was sent on a CORAL 66 course. CORAL was a 3GL intended for real-time programming and it combined keywords in English with some very low level functionality, including the ability to flip individual bits. As a COBOL bunny with a History degree this was this first time I'd ever had to wrangle bitmasks and it made my brain hurt. I wasn't lucky enough to be sent on the subsequent advanced course (five weeks long, those were the days!) but these who did got to build either a missile guidance system or a safety system for a nuclear power station in their exercises. Even on the beginners' course the exercises were fairly hard going.

One involved translating Morse code into letters. I went for a data-intensive solution. I wrote an array with all the letters in the Morse alphabet. This was hard work but the actual processing was quite simple: start in the middle of the array and shift the index left or right depending upon whether the current character is a dot or a dash, halving the offset each time. When there's no further input the index points to the transmitted letter.

There was a very bright chap on the course. We often talk about junior programmers, but this guy was definitely junior, because he was a lance corporal and so everybody else on the course outranked him. Even me: as a civvy I had a notional rank of lieutenant. Most of the trainers were officers but there was a sergeant whose job it was to answer the questions of the non-comm students (they weren't allowed to ask officers questions or indeed initiate any conversation).

Anyway, this corporal's solution consist of a single very dense recursive algorithm which somehow spat out the right answer. However, his code contained just the one comment: "This algorithm is easy to understand so no further explanation is necessary." The trainers didn't understand his algorithm and he knew they wouldn't understand it and they knew that was why he had put the comment in. So they marked him down, for insubordination.

Monday, November 19, 2007

OOW2K7: Not so San Fran Psycho

Last year OpenWorld seemed crowded, manic and just generally too much. By contrast OpenWorld07 was pretty chilled. There was only one session where I had to wait in a really long queue. There was the one occasion when I was in a long line of people tramping into Moscone South as another long line of people going in the opposite direction to Moscone West. And only once was I in a queue for the restrooms - and that was during OTN Night, when the amount of beer on offer had generated, um, extra throughput. Certainly this morning, as I found myself pressed up against some bloke's wet raincoat in a crammed tube train, I really missed the light and space in San Francisco.

I think this calmness wasn't because there were fewer people - there were more - but because the OOW Team learned lots of lessons from last year. They couldn't really scale out - they are at the limits of what is possible in San Francisco - but they organised things to spread the load as evenly as possible and to avoid the need for too many people moving from building to building. You can figure out your own RAC/PQ metaphor :)

The other reason why this year was calmer was due to the lack of hype. I think many of us were expecting to be continually hit over the head with 11g. But really it was quite low key; lots of sessions on New Features but no real publicity blitz. this is partly to do with the early release of the software taking the edge off things but I also think the licensing issue has something to do with it. Many of the coolest features are chargeable extras to the Enterprise Edition. If you have already got customers who have to rip chips out of their servers to be able to afford to run Oracle it's hard to get a sympathetic audience for news about stuff that's going to cost a whole lot more.

The thing that disappointed me most this year was missing out on presenting in the Unconference. The Unconference sessions were scheduled on the hour. The nature of Unconference is to be different from the conference. But attending one Unconference session meant skipping two regular conference sessions. Wednesday was the only afternoon I had a free two-hour slot but I didn't nab the last remaining 3:00pm slot when I had the chance, so I really have only myself to blame.

The thing I liked best was the series of presentations from the Real World Performance group. I know before the conference I downplayed the significance of the sessions. So sue me. Meeting some old friends from cyberspace and putting faces to new ones was very pleasant too. And at least I didn't suffer too much from jet-lag.

The next OpenWorld is in September 2008. That's only ten months away.... I'm keeping my fingers crossed I get to go again.

OOW2K7: The Real World Performance Group

The Real World Performance Group is not Peter Gabriel jamming in a muddy field with a bunch of Senegalese drummers. It is a small unit on Oracle's development side, dedicated to understanding database server performance and investigating customers' problems. I included Andrew Holdworth's session on Database performance trends in my schedule having never heard of the group. After that presentation I went to the Group's Round Table session and Andrew's other session the following day, neither of which I had scheduled. He was that good.

The whole series of sessions switched on a number of light bulbs, joining up some of the dots from earlier presentations I had attended. The resulting illumination made me realise I had spent the a large chunk of the last six months barking up the wrong tree. Basically, the sort of session which justifies the whole idea of going to conferences. That's why I've forgone my usual pithy reports in favour of a full-blown dump of my notes from the three sessions. Where the same topic came up more than once I have aggegated all the material in the section which seems most appropriate.

Database performance trends

Andrew Holdsworth opened his session with the observation that most sub-optimal applications are running well enough, because of improvements in CPU speeds and the availability of cheap RAM, which means we can have bigger DB caches. However, in larger databases disk I/O becomes more significant: an SQL statement which requires disk reads for one percent of the data will run one hundred times slower than a query which can be wholly satisfied from memory. Interconnects to remote RAC clusters perform much better than disk I/O.

Most database applications out there are very small. Frequently there are multiple databases on a single server (the most he has seen was forty-nine databases on one host). This is not a problem because the applications are generally used quite lightly, with barely one SQL statement running on the box at any given time. The main challenge is to prevent a single rogue statement stealing all the resources from the other applications. However, there are also a relative handful of really big systems which are rigorously designed and with a single database deployed per host. These applications are owned by significant Oracle customers and have high visibility within the company. Consequently such databases drive the direction of Oracle's development program. These are the sort of databases which demand tuning, because their performance issues cannot be resolved simply by faster CPUs.

The basic practices for good performances are:
  1. Spend time on data modelling. It is easier to tune against good schemas; they are simpler and less error-prone.
  2. Pay attention to cursor and session management, Use the middle tier to protect the database.
  3. Do proper capacity planning. If possible, use intelligent extrapolation from existing systems. Either way, test and validate.
  4. Use database features correctly and appropriately. Always start with the default values and change them only with good reason. (This point was expanded in another session.
  5. Design for predictable perform. Strive for SQL plan stability, but be prepared for dealing with increasing data volumes, which may demand new execution plans.
Andrew devoted a lot of time discussing benchmarks, as the group spend a lot of time running them. Of TPC benchmarks he asked, "Who likes TPC benchmarks? Who believes them?" A vast amount of work goes into them: the time spent optimizing a server for a TPC benchmark is an order of magnitude greater than a customer would spend on tuning their database. Price performance figures are suspect. There is no redundancy (because in the context of a benchmark that inflates the cost without offering any performance advantage), so the configurations are unreliable even dangerous.

They also see a lot of client benchmarks. About nine in ten are dysfunctional: ill-conceived, incomplete and unrealistic. They are time-consuming and expensive to run, and stressful for all concerned: if you inflict a poor benchmark on the RWPG team be sure that they will respond by picking on your cluelessness. Common failings are:
  • an invalid mix of transaction types;
  • no "think time" for the emulated users;
  • poor data sets;
  • in particular, the use of random numbers to seed the data sets so the tests are not repeatable;
  • not allowing for the resources used by the application server;
  • data warehouse tests which fire off 2000 concurrent parallel queries on a twenty CPU box;
  • data warehouse tests which don't distinguish between small, medium or large queries.
His advice is to run it in-house before sharing it with Oracle Support.

His final point was about priorities. The four layers of a database system are, in increasing order of expense: server, interconnects, I/O fabric, storage. Often the cheapest element - the server - is chosen first, and insufficient attention is paid to balancing all the hardware. Storage and network technology has not kept up with CPU speeds: by comparison with them server costs are a rounding error.

The round table

Members of the audience wrote questions on index cards which were passed to the panel. The panel filtered questions according to some strict rules of engagement. The topics had to be of relevance to the largest possible chunk of the audience. On-stage debugging was expressly forbidden. There was some overlap with the preceding and subsequent presentations, so here are a few points which weren't covered in depth elsewhere.

How much SGA is too much? There is no upper bound. Just don't exceed the physical RAM. With DSS we want sorts and joins to write as little as possible to the temporary tablespace so set the PGA to a high value. Aim to have 4-8GB of SGA per core.

The 11g resultset cache uses CPU and RAM. This offers a more favourable price performance than disks, which are just getting bigger without getting better.

The AUTO_SAMPLE_SIZE option with DBMS_STATS is an iterative process. So using it is likely to produce a more accurate sample size than us just picking a percentage value, as we tend not to validate our choice.

There is a problem with bind variable peeking, if the first instances of a query happens to use unrepresentative values. We can solve this through seeding the library cache at start-up by running our key queries with representative bind values.

The answer is always in the data.

Making appropriate use of database technology

The team's original idea for this session had been to cover every database feature. Realising that would have required a four day course (at least) in the end Andrew Holdsworth decided to focus on three key areas:
Parallel Query - lots of people pay for the licences but don't use it properly.
The Optimizer - the part of the database customers find most frustrating.
Session and Connection management - a new but increasing problem.

The code for parallel query has improved considerably over the years but customers are not using it because CPUs have got so fast. The trend is to substitute processor brawn for good design. This is not a sensible strategy with DSS databases, because the chance of resource contention is much greater if the queries take too long to run.

Making Parallel Query work well places a premium on capacity planning. There has to be sufficient bandwidth to support the required throughput down the entire hardware stack. If the server has only got two 100MB HBAs PQ is impossible. Don't just focus on the server; CPUs are the cheap bit ... apart from the licences (AH's joke).

A thumb nail guide to capacity planning. To support a throughput 100MB per second per core (based on the de facto standard unit of a server with four quad-core processors):

4 CPUs x 4 cores x 100MB = 1.6GB of I/O
4 CPUs x 4 cores x 100MB = 1.6GB of interconnect
# of disks = 1.6GB/s / 20MB/s per disk = 80 disks

Note: this doesn't include redundancy for high availability (data warehouses are not allowed to be down anymore).

"Death by indexes". Indexes are great for OLTP but not for DSS which generally rely on aggregation which in turn requires large amounts of data. Indexed reads which accesses only 0.1% of the data performs worse than a full table scan. Lots of indexes will lead to low CPU utilisation. A better approach is to use PQ to haul in large amounts of data quickly and use the powerful CPUs to winnow it: 5ms is a lot of computation time. Partitioning the object to take advantage of query pruning will increase the percentage of relevant rows retrieved, which will further reduce the need for indexes. Use hash partitions for fast joining.

Designing for PQ:
  • Take the database's default for the Degree of Parallelism.
  • Use the resource manage to define the user's DoP.
  • Make the number of hash partitions a power of two.
  • Also, make it a multiple of the DoP.
  • Allow for the concomitant impact on loading, data dictionary management, stats gathering.

The Optimizer is the most maligned, most misunderstood and most misused part of the database. This is in no small part due to Oracle's failure to explain and document it properly. Consequently people tune by Google or witchcraft, which usually means setting underscore parameters. This is a bad idea. Setting init.ora parameters has a global impact: fix one query and break three others. We're setting the values blindly - often underscore parameters have magic values. Most bad execution plans are due to bad stats. The quality of the stats is under our control. We know our data. Whereas the execution of the optimizer is under Oracle's control and it is rocket science. Tweaking the settings of underscore parameters is fiddling with the bit we don't understand instead of fixing the bit we do.

Optimizer basics
  • Run with the default parameter values.
  • Don't make global changes.
  • Hints do not guarantee stability over time.
  • Get the statistics gathering right.
  • Accept change will happen.
  • TEST!
  • When upgrading, accept that your database's previous good performance may just have been luck rather than a testament to your tuning expertise ;)

Gathering stats: start with default values. AUTO everything. Remember, the aim is to get good cardinality estimates.

Table stats are easy. Don't sweat them. An increase in 10% of the rows is unlikely to mean your stats really are stale: think order of magnitude changes.

Columns stats are the key to good performance - size and number of columns, histograms, indexes. Use your knowledge of the data model to figure out whether you have uniform or skewed distributions. Columns with uniform distributions don't need histograms; watch out for bind variable peeking with skewed data. Minimum and maximum values are the ones most susceptible to change (especially for columns populated by sequences) so maybe refreshing the high and low values after each data load is sufficient.

The key to query tuning is to check the correctness of the cardinality estimates. Use DBMS_XPLAN.DISPLAY_CURSOR to see the estimated cardinality alongside the actual cardinality. No need for a 10053 trace (which doesn't give us the real cardinality values anyway).

Connections stress servers, especially SMP, causing latch contention and using lots of CPU. There is a tendency for Java and .Net programmers to be very bad at managing sessions (and cursors). This is because their memory model means they tend to take stuff and not give it back. Connection storms occur when a connection pool floods the server. Connection storms are caused by a problem in the database server which prevents it from satisfying the app server's demands for new connections The app server responds by spawning further sessions. This can escalate quickly to tens of thousands of sessions, which will kill the database server.

Best practice for connection management:
  • 5 - 10 connections per core (I/O dependent);
  • minimise logons and logoffs;
  • avoid parsing;
  • set parameter for minimum and maximum connections per server to the same value.

Final thoughts

I was a bit concerned that I found so much of value in these sessions. Perhaps I had been living in a cave for too long? However, I met up with Doug Burns afterwards and he said that he was going to have rewrite his presentations and whitepapers. As he is Mr Parallel Query I found this quite reassuring.

Andrew Holdsworth observed, when talking about the recommendations for parallel query, we can try this stuff for ourselves but they do it for a living.

Thursday, November 15, 2007

OOW2K7: Billy Joel - A clarification

An anonymous commenter, as they are entitled to, has objected to my opinions on Billy Joel. I don't think there's any point in debating musical taste. We can't blame somebody for the music they enjoy because it works too low down in the limbic system. They are not a bad person just because they like Billy Joel; I am not a bad person just because I don't.

The thing is, Billy Joel, along with Abba, dominated the airwaves in my formative years. Like the Swedes he is a consummate crafter of earworms. Simply thinking about Billy Joel means I'm going to have the line "Working too hard will give me a heart attack (ack, ack, ack)" running around my brain for ages. I'm just hoping that reciting "running around my brain" will derail Joel in favour of Dillinger. All together now: "A knife a fork a bottle and a cork/That's the way we spell New York".

Right on.

Push me pull you

One of the examples of bad design Donald Norman discusses in his seminal book The Design of Everday Objects is the door with a handle and a sign saying Push. If the door is meant to be pushed it should have a plate instead of a handle. A plate gives the affordance of pushing whilst a handle gives the affordance of pulling. Hence the need for a Push sign if the door has a handle instead of a plate.

I was reminded of this in the Lori's Diner next to the King George Hotel. Its entrance door swings both ways. Each side of the door has a handle and a Push sign. I still can't quite decide whether that's design genius or design imbecility.

Wednesday, November 14, 2007

OOW2K7: Simply cowbell

A theme emerges. Oracle announced Oracle VM Server on Monday. Today in his keynote Jonathan Schwartz announced Sun xVM Server. Do two similar products make a trend? It will have to do.

The bulk of Jonathan's keynote was devoted to the proposition that giving away things made good business sense. The argument is, "Developers don't buy things, they join things". Contributing lots of stuff to open source initiatives fosters communities which in turn leads to adoption and that in turn generates demand for the things which Sun sells. Certainly Microsoft has been very successful by treating developers well. I think the difference is that Microsoft still makes a lot of money out of software and operating systems; they really don't care that much about the hardware. Whereas Sun's main revenue stream comes from servers. Java and OpenSolaris and all the other giveaways have been taken up with great enthusiasm; you just need to look at the exhibition hall to understand that developers like free stuff. And of course Java in particularly has spurred the growth of the internet and generated demand for servers and network kit, but that demand is just as likely to be met by Dell or HP as Sun.

Jonathan had a surprize guest: Michael Dell. This was by way of announcing that Dell and Sun would now be partnering. To symbolise the partnership Michael Dell brought Jonathan a Dell T-Shirt. Afterwards, Jonathan confided that backstage Michael told him the T-Shirt was XL, because he was "a CEO now". Jonathan joked, "That's my hat-size not my T-Shirt size."

Larry's keynote was good, a polished performance. He did two short presentations. The first was about Unbreakable Linux and the new VM Server. These are both aimed at promoting Linux. There are now over fifteen hundred enterprises using Oracle's Linux support. The VM server is Linux but it's a new Linux, not the Red Hat code-base which underpins Unbreakable Linux. So perhaps this is the start of the much anticipated Oracle distro. The other presentation was about Fusion Apps. The first Fusion Apps will be released next year. From now on all new Apps will be Fusion - SOA, built-in BI and with SaaS as an option. But it's going to be an evolutionary process. There is no date for the complete Fusion-ization of E-Business Suite.

After that there was a demonstration of the first Fusion App: Sales Force Automation. This is a suite of products - Sales Prospector, Sales Reference and Sales Tools - which will make salespeople more productive. These products leverage social networking principles to facilitate the sharing of information, ideas and documents between sales staff. So the first tools in the new framework are aimed at helping sales people sell more: how very Oracle. The actual tools are fantastically slick. Total eye-candy. The Fusion developers have done a marvellous job with the AJAX widgets. In fact the whole thing looks so lovely I predict a sharp drop in productivity as the sales people spend all day stroking their way through their cache of PowerPoint slides.

Larry then took questions. There were several questions about Fusion Apps and a couple about licencing. Did you know Oracle's preferred licencing approach was to arrange unlimited usage deals with their customers? Me neither, they sure kept that one quiet. Somebody asked, apropos of the Sunday Night Live event, whether Larry had more fun as a start-up or now. The answer was that he much prefers his current situation. Which just isn't fair: he earns gobs of money and enjoys his job more than ever. He only lost his cool once, with a person who attempted to suggest that the profit from Microsoft sales went to the Gates Foundation. Larry cut him off. After acknowledging Bill's generosity he said, "The Gates Foundation is fully funded. If you buy a copy of Microsoft Word the money doesn't go to a person in Peru." Not unless Steve Ballmer has a holiday villa in Lima.

There were two odd things about Larry's session. The first was that Billy Joel came on as the warm-up man. Luckily he hadn't brought his piano - Larry was obviously itching to do his karaoke Uptown Girl. The second oddity was the thing which Larry didn't talk about. The database. Not a peep about the 11g release. So obviously he agrees with me about it being old news. No mention of the rumoured either.

Before Jonathan Schwartz's keynote there was a little piece of Web Two-Point-Noughtery. The screen posed a question and people could SMS their answers. Here are a selection of the answers to the question If I could develop the next killer app it would be:
  • Simple
  • Stable
  • Fast
  • Something to do my laundry
  • Cat cams
  • Portable
  • Mine
  • In the conservatory with a candlestick
  • Simply cowbell
  • Integrate Microsoft and Oracle well
  • Sharks with frikken lasers on their heads
  • Something that takes text messages and projects them on a screen


I have now heard that the patch is coming. Apparently its schedule got bumped to clear the path for the 11g launch. Safe harbour and all that, but it may contain backports of some (minor) 11g functionality as well as bug fixes.

Oh, and if you thought my coverage of Larry's keynote was jaundiced you really ought to read The Register's report. Slammin'!

OOW2K7: Like, er, Day Two Man

Tuesday at OpenWorld 2007 for me opened with Tom Kyte's keynote. This was something different from Tom. Instead of talking SQL he talked about innovative ideas and the creative thinking process. His key point was that we have to recognise the danger of educated incapacity: too much knowledge can blind us to the existence of certain solutions. The answer is not to know nothing but to use our knowledge wisely. In particular we need to distinguish between what we know that really is true and what we think we know is true but which is in fact assumption or prejudice on our part.

One of the examples of creative thinking he used was an example of tuning from Craig Shallamar. The problem was in a stock control application. The workers would run a report which generated a picklist; the picklist was produced on the printer next to the terminal. They took the picklist and went to the warehouse. The problem was the report took three minutes to run. Craig spent a lot of time tuning the query and shaved about ten percent off the runtime. But ten percent of three minutes still means a long time for the workers to be waiting for the picklist to be printed. Then he noticed that the printer was on a network. So he had it moved to the warehouse. The workers ran the report and crossed over to the warehouse; by the time they had arrived the picklist had been printed. That's fast!

Tom finished up with a lament about the state of developer-DBA relations. Too often DBAs seemed to spend their time trying to prevent developers from doing things whilst the developers spend their days inventing ways to circumvent the DBA. This is a terrible waste of creativity.

Ray Roccaforte opened his presentation on Business Intelligence and Data Warehousing in 11g with a similar philosophical perspective. Indeed, a cosmic question. Given Moore's Law and given the pervasiveness of computers in all aspects of our lives why isn't the quality of our lives doubling every eighteen months? More mundanely, why isn't the productivity of our IT departments doubling every eighteen months?

His answer is that data volumes are growing even faster than Moore's Law. Yahoo had 100TB of data in 2005; two years on it has 250TB. And as disk speeds are lagging behind computing is actually slowing down. Oracle's strategy to deal with this:
  • to scale to any size;
  • put analytics in the database;
  • put ELT in the database.

Incidentally that last point is not a typo: the preferred term is now Extract Load Transform because most of the transformation is done in the database.

Ray showed us a screen filled with 11g New Features for BI and DW - far more than he could cover in the session, so he just ran through some key points. There are further enhancements to partitioning, including the much sought after INTERVAL option and also the REF partition, which equi-partitions parent and child tables on the basis of foreign key values. There is the much reported SQL Resultset caching. There are enhancements to automatic statistics gathering, which will monitor the SQL statements run and gather stats on correlations between frequently queried sets of columns.

The big announcement is probably the changes to the OLAP cube. BI is dominated by aggregation queries. The classical solution is to build Materialized Views, but it is often imposible to build enough MVs, let alone manage and refresh huge numbers of them. In 11g the optimizer will treat an OLAP cube as a Materialized View, and will use query rewrite to access data in the cube, just like with MVs now. This means there is only one object to manage and the refresh window is considerably reduced. The OLAP cube can be interrogated using straight SQL statements, so advanced analytic queries can be added to any application.

Finally Oracle Warehouse Builder is included in the database at no extra cost, although one of the big draws, the Data Quality pack, will be a licenceable option.

Then it was off to the Oracle Magazine lunch. I got invited because I'm on the editorial advisory board. I felt guilty about it because I hadn't been asked to do anything in the last year but Tom Haunert was very magnanimous: it was his fault for not asking the board to do things. I was at the same table as Dr Paul Dorsey, which could have been embarrassing but I guess he doesn't read my blog. Also there was Lonneke Dikmans, the SOA Developer of the Year. Fortunately she and Paul have reached a good-tempered rapprochement on the merits or otherwise of SOA. I had an awkward moment with Tom Clark, CIO of PCA because I was out of practice at convesing with guys in suits. However, he turned out to be a really nice guy, and possibly the most unlikely fellow vegetarian I've met in a long time. I also chattered with Patrick Wolf and Jon Waldron.

Alas I missed the XML presentation but I did get to hear Donald Feinberg, VP and Distinguished Analyst at Gartner on the future challenges of OLTP and Data Warehousing for DBMS tools. This was the first time I had attended a Gartner presentation and it was really interesting. He specifically avoided mentioning any vendors because Gartner has to be neutral and this was an Oracle conference. But much of what he said chimed with other presentations I've attended: management of large data volumes, the importance of data quality, the physics of disk speeds, the importance of analytics. In fact I'm going to have to write up this one separately, once I've had some more time to think about it.

I rounded off the day by chilling out in the OTN Lounge on the third floor of Moscone West. I think the crew have finally found the right formula for the lounge as there's always lots of people there. Steven Feuerstein was doing a Meet The Experts session and I lurked at the back. I even managed to answer a question which Steven couldn't. This only happens one every other year so I claim bragging rights. Somebody asked what was the best way to pivot a table. Steven he wasn't sure but he would probably write some PL/SQL. So I pointed out that 11g had a PIVOT function. He laughed and said he didn't do SQL. Of course not, he's cured his addiction.

Then it was down off to the Thirsty Bear for the Bloggers' Meetup. As well as us old lags there were lots of new faces, which is an indication of just how the Oracle blogging community has grown over the last twelve months. Drink was taken and I think everybody had a good time. Thanks to Mark Rittman and OTN's Vikki Lira for organising it.

Tuesday, November 13, 2007

OOW2K7: Non-fake tales from San Francisco

One of the things at OpenWorld which continues to astonish me is the rudeness of people in the audience. At the UKOUG we start each session with a reminder to switch off mobile phones, or at least silence them. Here, people not only have their mobile phones on, they actually take calls whilst the presenter is talking. This is not an isolated event: it has already occurred in three out of the seven sessions I've attended so far. Also people conduct conversations during the presentation, and not in whispers either. This is particularly strange given that, outside of the conference hall, Americans are generally far more courteous and better mannered than most English people these days.

Another thing which is shocking to the out-of-town visitor is the sheer number of beggars on the street. One guy I saw yesterday was holding a sign which said I need a girlfriend. I'm not quite sure what he was expecting to be dropped in his cup. There was another guy playing a drumset improvised out of buckets and plastic containers whose sign included his e-mail address. I didn't catch whether it had a PayPal number too.

This year OpenWorld is a green conference. This primarily means that we could choose to have the conference directory as a thick book or a lightweight USB drive. The catch is we need to have a device which takes a USB drive. For those of us who left our laptop at home the dead-tree option is the appropriate technological choice. However I have seen a couple of people attempt to plug their OOW pen drive into a laptop in the Conference internet villages; unsuccessfully, because these babies are totally locked down. The other interesting green initiative is the opportunity to pre-order an Oracle-branded messenger bag which will be made out of recycled conference material. This item will be even more fashionable than that Sainsburys carrier bag because of the absolute coolness of the Oracle logo.

The OTN team are giving away T-Shirts in the OTN lounge. The front features a SQL statement:

SELECT Java, Linux, PHP, DotNet, Python, Ruby
WHERE Self BETWEEN 'Newbie AND 'Expert'

So, apart from the fact that it doesn't mention PL/SQL and the fact that the statement is invalid due to the missing quote, it's a pretty good attempt.

Gavin Newsom, the mayor of San Francisco would like us all to explore more of the city than a two-block radius of the Moscone Centre:
"You can continue on the F line to Pier 45, walk into the back and experience the fish processing, which is totally frenetic and interesting. Why not see and experience another side, which is the fact that San Francisco remains the largest fish processing center on the West Coast."

Why not indeed.

OOW2K7: Steven Feuerstein - A clarification

Whilst I'm in the mood for clarification I suppose I should make clear that Steven Feuerstein did not threaten me with a steak knife. He pretended to threaten me with a steak knife and I exaggerated for comic effect.

OOW2K7: Doug Burns - A clarification

In my post about the ACEs' dinner I described Doug Burns as being extremely jet-lagged. Some people have interpreted this as meaning Doug had taken too much drink. This is not the case. Doug really is jet-lagged. Honest he is. Anytime you see Doug with a beer in his hand you can be sure it's definitely for medicinal purposes only.

OOW2K7: More about Monday

Bryn Llewellyn's talk overran slightly - there is that much which is new about PL/SQL in 11g - so I only just made it to the Moscone Center for Nitin Vengurlekar's talk on Back-of-the-envelope Database Storage Design. The main thing I took from this talk was that we need to have big envelopes. It was a useful overview for me, as I knew nothing about this area of IT systems. For instance, I learned the difference between throughput and good-put; throughput is the total amount of network traffic whereas good-put is the amount of useful bits transferred i.e. throughput minus network protocol, retransmitted packets, etc. I now know that we can specify our requirements in either MBytes/second or IOPS but they are mutually exclusive. What I don't know is in what scenarios I should be using IOPS rather than MBytes. Transactions/s is a useless metric because it can mean many things to us database practitioners and it means nothing to storage dudes. The other particularly useful part of this presentation was a break down of all the layers of the I/O stack, identifying all of the components which could be a choke-point. A system's I/O is only as fast as its slowest component.

I had a quick wander through the Exhibition hall. I didn't pick up much in the way of loot but AccentureEDS has a really useless but potentially time-consuming geegaw: it's a sticky plasticky bundle of tentacles on a elasticated string - sort of a cross between a yoyo and a primary-coloured Cthulu.

The last session of Monday was Juan Loaiza's talk on Scalability and Performance enhancements in 11g. He opened with some recent benchmarks. He started with the bad boys - 4 million transactions per minute , 1 million physical I/O per second. He said they like these sorts of benchmarks because it's the only time they get to play with such enormous systems; the above mentioned benchmark used 2TB of RAM (which costs about $4m - still a lot of money even at current exchange rates). The interesting benchmarks are the price performance ones: the new ranges of Intel and AMD chips means that Oracle is now outperforming Microsoft in the low-end market as well as the high-end.

The point about the large system benchmarks is that they are the systems of the future. By 2010 Oracle predict there will be at least one 1 PetaByte database out there, at least one 1000 node RAC system and at least one system with 1 TeraByte of RAM. These huge systems are being driven by a number of different trends: government initiatives in healthcare and law enforcement; large internet retailers; and scientific projects in areas like genetic research and particle physics.

The challenge for Oracle is to make such huge resources work as well as smaller resources. For instance, we all know that disks are getting bigger but that their performance is not keeping up. We will soon have 1TB disks, which will be pretty much like tape drives. This is why table compression in 11g is so important. By compressing tables we effectively increase the performance of disks considerably (more data retrieved by each disk read). Furthermore the database can natively interpret the compressed data. Obviously there is a DML overhead but it is a fraction of the gains from improved reads. The problems inherent in backing up a 1PB database are obvious. This is why Oracle has written its own backup utility: the database can use its knowledge of its own state to only backup stuff that needs to be backed up. This can be a lot faster than blindly copying all the system files at the OS level.

It is also obvious why Oracle is still committed to per core licensing. The charges for a 1000 CPU cluster (Enterprise Edition plus RAC plus Partitioning plus all sorts of other 11g goodies) would add up to a hefty chunk of change.

In the Q&A somebody observed that Oracle's own cluster still uses Sun E25K and asked Juan when it is still appropriate to go for SMPs rather than blade clusters. Juan replied that it was a matter of comfort. SMP is a very mature technology but RAC is now mature; leading edge rather than bleeding edge. He did say that he didn't think Oracle would ever purchase another SMP ever again. Larry won't pay for them.

I bumped into Eddie Awad a couple of times yesterday and I kept asking him, "Are you Twittering this?" So at the OTN Night he pulled out his iPhone - Eddie just had to be an earlier adopter - and Twittered Dan Norris. Lo! a few minutes later Dan had located us outside the Jeopardy room. So it can be useful and I will stop being sceptical. OTN night was pretty much the same as last year, only with slightly fewer exotic dancing girls and the introduction of some people from the local D&D guild wandering about on stilts. I'm sure it made sense to somebody. Doug Burns introduced me to Kurt "DUDE" Van Meerbeck and his wife but it was really too noisy to have a proper conversation.

Monday, November 12, 2007

OOW2K7: First sessions

I'm afraid I skipped the opening keynote of the conference in favour of breakfast with Doug Burns and Dmitri Geilis. It was almost certainly more entertaining and more educational.

My first session of OpenWorld was Dr Paul Dorsey on the "Thick" database. That's thick as in full of business logic rather than stupid. Quite why it's not "rich" database like thin client vs rich client I don't know. Paul said his motivation was to annoy people, which was promising. Unfortunately the sort of people he wanted to annoy - OO programmers, J2EE and .NET middle tier evangelists and open source bigots - weren't in the audience. This is an Oracle conference, so pretty much everybody is going to be signed up at least to the idea of proprietary framework , and most of them will be in favour of putting most if not all the business logic in PL/SQL. Although, as he acidly observed, OTN has drunk the SOA Kool-Aid.

The point being that UI standards change far more often than the database does. So it is acually more important to be UI independent than database independent. He also said that ApEx is not - out of the box - a thick database implementation. It is a product and it is possible to build ApEx applications in a dumb fashion. However, the cost of being dumb in ApEx is less than being dumb in Java.

The second half of the presentation was devoted to an overview of techniques for building thick database APIs. This covered table() functions, Views with Instead-Of Triggers, PL/SQL collections and bulk operations. I'm afraid I found this a bit of a waste of time because surely this is all old hat? But apparently not. At least one person in the audience didn't know about bulk exception handling and so is obviously not using FORALL statements.

Funnily enough, the next session also dealt with building PL/SQL wrappers for SQL, but coming from a different angle. This was Steven Feuerstein, who was seeking to cure our addiction to writing SQL with his "code therapy". His basic arguement is that we should not proliferate SQL statements throughout our programs. Rather we should encapsulate SQL in reusable functions which have proper exception handling and well-structured naming conventions. We should think of SQL in PL/SQL functions as a service which is provided to the actual application code.

We should generate these functions because that's the only way to ensure that proper standards are enforced. Especially in naming conventions, which must be adhered to if the functions are to be reused. I would say that this presentation was an advertisement for the Quest CodeGen utility, except that they're giving it away for free. Steven had one phrase I really liked: deja vu coding. This is when we find ourselves coding the same thing we coded last week. That's a definite sign we have a piece of functionality we really ought to extract to an API for reuse. Of course, the OO crowd, especially the XP lot, have been banging this particular drum for years.

Following on from Steven was Bryn Llewellyn with his session on PL/SQL New Features in 11g. He kicked off with the fine grained dependency tracking: I'm not sure I would have started there, as it's not exactly the most exciting new feature. But I liked his explanation of why Oracle had the validation checking in the first place: it's in lieu of a make file. Real Native Compilation is much more interesting. Until 10g Oracle had "outsourced" the compilation to a C compiler on the OS, because it was easier for Oracle. Unfortunately many sites wouldn't install a C compiler on their production box, so they couldn't take advantage of the performance benefits. I currently work on such a system: the security policy forbids it. Yes this does mean our external C procs are probably sub-optimally compiled but you can't argue with the security policy. So this feature will be very useful to us.

There's also Intra-Unit inlining, which basically counteracts the performance of Feuerstein-approved modularity by duplicating the code from called functions at the compile stage. Result set caching has already been widely talked about. As Bryn says, "You can never have too many caches." His test cases went from ~1000 milliseconds to 0 milliseconds. But apparently not every query will improve by a factor of infinity. The results are guaranteed to be correct in every case but this means that the benefits of caching results may not always be high. Obviously there's a cost to invalidating the cache. What else? DBMS_SQL is here to stay, but only for Method 4 queries. That is when we don't know either the number of defines (selected columns) or the number of binds. For everything else we can now use EXECUTE IMMEDIATE. In case you're wondering, there is no Method 1, 2 or 3.

They are closing the internet village now, so it's time to sign off.

OOW2K7: What's the story?

The opening salvo in the OpenWorld 2007 campaign was the Sunday Night keynote. When I arrived at Moscone Center at 5:30 the queue for stretched around the block. Literally round the block. I'm sure I wasn't the only person asking whether we were all mad. At least it wasn't raining and at least we all got in. Actually it turned out to be a good thing to be one of the last people into Hall D, because it meant I had to sit through considerably less Billy Joel than the early birds must have had to endure.

Most of the event was devoted to Larry's account of the history of Oracle's thirty years. I'm sure he's told these stories a thousand times already but many of them were new to me. The best story was John Kemp, the first Chief Finance Officer, who used to deliver pizza to the Oracle offices but who was studying Accountancy at Berkley. Also the famous first sale to the CIA was for a product, Oracle version 2.0 (there was no version 1.0) which hadn't been written yet. How times have changed. Unfortunately the speech became less focused and eventually turned into a recitation of people who joined the company over the years.

The rest of the session consisted of some slightly desperate sketches from the cast of Saturday Night Live. There was also a worthy presentation on Oracle's global responsibility programme. There are a number of initiatives in the education arena (mainly encouraging schools and students to use Oracle software). Also, Oracle employees do lots of volunteering. The event was finished off by a pub rock band composed of Oracle employees and customers, fronted by the Senior Vice President for Alliances, playing a song about Oracle rocking for ever. The band was called Eardriver but I'm afraid Hallclearer would have been more appropriate.

Of course I had to leave anyway because of the ACEs' dinner. The Fly Trap is a nice place. I got to drink Pinot Noir from Francis Ford Coppola's vineyard and the food was good. The one problem with these occasions is that the restaurant layout prevents you circulating too much, so you end up talking to a handful of people. Eddie Awad asked my advice about user groups; I'm not sure I completely answered his question but it has stimulated some thoughts which I may blog later. Doug Burns was extremely jet-lagged; you may think that's a euphemism, I couldn't possibly comment. We all got issued with our new Oracle ACE fleeces, which are actually gilets with a large Ace of Spades symbol on the back. I don't know whether the ACE directors' fleeces have the tiara as well. Inevitably Mogens Norgaard wore his for the entire evening. Steven Feuerstein threatened me with a steak knife. These liberal peaceniks are all the same.

So, what is the story?

Is there a big theme? Beats me. One of the Oracle people I spoke to last night said they all don't know what's going on. If there is a big announcement it will be just as much of a surprise to the employees as to the rest of us. One of the ACE Directors who had spent the day in the Directors' briefing sessions said they had been told that there would be no mention of features currently under development. Apparently all that stuff is being saved up for next year's OpenWorld when Oracle are going to make a big deal out of 11gR2.

In the afternoon I was wondering through the Yerba Buena Gardens. A young chap on a bench clocked my OpenWorld pass. Jerking his head at the big tent which is blocking off Howard Street he asked me, "Do you know what the party's about?" "Oracle," I said. "Database software." He turned to his friends and told them, "Computer shit."