Wednesday, February 13, 2008

Data modelling and other dying arts

Martin Widlake sent me an e-mail after last months UKOUG Unix SIG:
"At my presentation at the UKOUG Unix SIG yesterday I suggested that formal design was almost dead, replaced with organic design and asked if anyone still used ERDs. No one did. Not one.
This kind of bothered me. Does it mean that just ERDs are dead? Or that a room full of DBAs is a room full of people who do not do systems design (I am just as shocked by that if it is true)? Or maybe formal design is a dead concept."
There's two points here. The first, the utter lack of DBAs who do data modelling tasks, doesn't surprise me in the slightest. This is the nature of the modern DBA's job. Production DBAs look after live systems: they don't design them. Increasingly people are becoming DBAs straight out of college. These guys have never worked as developers and probably never will. The older geezers, who followed the more traditional route of starting out as programmers and progressing into the DBA role, probably haven't worked on development projects in years.

Also the IT landscape has changed. Even ten years ago many organisations had one or at most a handful of databases. It was possible for a DBA to be responsible for a single database; knowing its purpose and its value to their organisation was part of the job description. These days it is not uncommon to find DBAs working in teams looking after dozens even hundreds of databases. Furthermore the production DBA may well work for a different company (i.e. an outsourcer), possibly in a different continent from the users. Their relationship with the databases they administer is mediated through SLAs and ITIL compliant procedures. So they have little incentive and even less time to appreciate the databases under their care. Indeed, given the prevalence of Sarbanes-Oxley and similar pressures, production DBAs will be increasingly encouraged to remain in ignorance. A production DBA is somebody who knows the metadata of everything and the business purpose of nothing.

Of course, there are DBAs who do work on development projects. They are often combine the role with that of being a developer, especially on smaller projects. They often get called database engineers rather than DBAs. And production DBAs tend to regard database engineers as being developers not "proper" DBAs. I have been a database engineer on sites where I wasn't allowed the SYSTEM or SYS passwords for my project's development database. I would bet that everybody who goes to the Unix SIG is a production DBA.

The second question is whether anybody uses entity relationship diagrams, or more broadly, whether anybody still does logical data modelling. I can't answer this one from personal experience. I've been on a data warehouse project for four years now: I only deal in existing schemas. Even when I have done design it has been for ETL infrastructure and similar, so I have leapt straight to physical tables. As I started out with SSADM I do feel a bit guilty about this. Although I must say I haven't exactly missed drawing Entity Life History diagrams.

Anecdotally, there does seem to be a general decline in the practice of data modelling. There were hardly any presentations on modelling at the last UKOUG conference or at Open World 2007. The Modelling and Design is one of the smaller UKOUG SIGs. The ODTUG Designer listserver has flurries of activity but since Oracle announced the death of Designer it has - understandably - experienced a major drop in traffic. There are occasional questions about data modelling in the OTN forums, but these are frequently from students rather than practitioners. It is depressing to consider that the most commonly referenced data model seems to be the fundamentally flawed Entity-Attribute-Value. My last piece of circumstantial evidence is that the Oracle blogosphere rarely features posts about data modelling. The only blog I know which regularly discusses data modelling is The Database Programmer and even Ken Downs only talks about tables.

Of course people are doing system design. There's lots of design about but I would guess that it all happens in UML. So the majority of logical data modelling these days produces class models rather than ERDs. The physical database design stage is much more likely to consist of ORM than mapping entities to tables. Now that's not the sort of party you invite a DBA to, because you just know they're going to glower in the corner, drinking heavily and muttering to themselves. So the mappings and the database design will be done by middle-tier developers. Our communal prejudices tell us this is unlikely to produce a correct and peformant database design, not least because projects which use such an approach tend to make a fetish of database agnosticism and platform independence. So in the long run we might see a resurgence in data modelling, as part of the tool set for rescuing poorly performing class models.

As a tangent, Dominic Delmolino observed in a recent blog that
"many of the people I’ve been interviewing seem to be taken aback by a few simple SQL questions, telling me that DBA’s (sic) don’t do SQL."
Again, why is this surprising? SQL knowledge is going the way of data modelling for production DBAs. There is a whole raft of GUI administration tools - Quest Spotlight, BMC Patrol, Embarcadero, OEM, etc - whose sole purpose is to allow DBAs to monitor and manage large numbers of databases without using the command line and without knowing SQL. Again this is inevitable given the landscape I described above. Old skool DBAs - the ones who started out managing a single database - will have accreted a personal library of SQL scripts, shell scripts and utilities which do all these things. But people starting out now will probably find themselves operating in shops with dozens of databases and no time to roll their own tools. If they are lucky there will be an old lag to pass on some skills and some scripts; more likely there will be a shrink-wrapped GUI tool. Besides, remember that Oracle Enterprise Manager was introduced in Oracle7: it is perfectly feasible for somebody to describe themselves as an experienced DBA who has never administered a database in any other way.

11 comments:

Nigel said...

Hi

I'm currently finalising a logical data model on a large government project. Designer still lives - yay!!

However, it should also be stated that this is (in effect) a documentation cul-de-sac. The solution is Oracle Apps plus a (quite significant) customisation; programmers are working off a UML process model combined with a physical/relational schema.

Seems in some ways like we have gone back to the 1970s....

Regards Nigel

KenDowns said...

Andrew, two things come to mind about your blog entry.

First, I believe you are observing a general trend that has been going on in computers since the beginning. Once upon a time every Computer scientist was a physicist. Then along came those "mere" programmers who keyed in machine code and didn't care about electron shells and doping, and then that awful wave of programmers who used FORTAN and learned Assembler, and so on.

I think that each successive wave, which builds on the work of the prior, is usually larger than the prior, and tends to become identified with computer science during their time in the sun. Hence the current belief among many young programmers that all the world's a web page running MySQL. Everywhere they look they see clones of themselves, and so they think that's all there is.

I think there are always the same number of people in those lower layers, we just don't get the attention anymore.

The second point is specific to the use of ERD versus direct table design. I believe (only from experience, no proof) that tables are unique in the world of computers in that they can be intelligently discussed by any participant regardless of background, and without any abstraction. This simple fact is so astonishing that it long ago drove me to the conclusion that I would never put a layer of any kind between users and the schema, since any such layer would become something I would have to teach the user without providing a benefit already provided by the tables. I know that many would disagree, but that is my own experience anyway. That is why my blog always jumps from user requirements to tables.

Also on this point, there are some who believe a logical model has the value of being able to be implemented in different worlds than the SQL/Table world. I believe this is neither true nor desirable, and that most data does fit into tables very well, so we may as well talk about it in those terms.

Noons said...

I've got a very simple message to any "dba" who wants to works for me and thinks that clicking on GUI tools is all that's needed and that SQL is a foreign language:

- With extreme ease, I'can hire 10 times more people like them, 10 times cheaper, if all I wanted was a mouse-clicker! So, start THINKING and LEARNING.

It usually works.

I've got to try that with the developers as well: after all, cut-and-paste from forums and books is not that hard as far as "coding" goes..

Anonymous said...

Agree.

But there are a few of us who cannot do their usual job without PowerDesigner or Oracle Designer or...

Cheers.

Carlos.

DomBrooks said...

Nice post.

Recently, I have had a series of clients whose databases are littered with the old EAV "design pattern". So, no constraints because of it.

When trying to get to know the database, not only is there no ERD to look through, but the database isn't even self-documenting (what I'd call a properly constrained database). No doubt the business object layer now contains the only documentation required! However, within the database you can't go from child table to parent table via the constraints because all roads lead to the EAV tables (one for character lookups and one for number lookups - great!) and you're left scratching your head wondering which particular definition of the reference value "3" is the one you're looking for.

And, nothing wrong with using a single physical entitiy to model more than one logical entity, but there are also far too many elements of the everything-is-a "design pattern" which is only a few steps away from the one table called Stuff or Thing (or "Data", I think, was another example from the "Tales of the Oak Table" book).

In terms of ERD usage, there are just far two many non-specialist, all-purpose developers who trivialise the database design but then struggle to string four tables together in a sensible way.

Most recently, and most worringly, I've found that proper database design does not fit into many companies implementation/interpretation of agile development.

Database design, I would argue, is all about big picture - which no one seems to give a stuff about anymore.

Rant on.

Cheers,
Dominic

Chris Muir said...

Hi Andrew

To sidestep the DBA angle of your post, and placing database modeling under the heading "development" my feeling is at the moment that we're seeing a "retraction" of the development market in general. We believe we're seeing a trend with our customers where they're doing less development than before. What we haven't yet worked out is why? Is it just an "Oracle" thing, are clients moving to other technologies, are they more willing to buy packages (or are packages more readily available), have we built most systems needed by our customers, or is it some other reason?

This isn't to say development isn't occurring, it certainly is, just not the scale we've seen before. This means good ol' development skills such as traditional data modeling will shrink too because there's less demand for them on a day by day basis.

My $0.02 worth.

Thanks for the interesting post :)

Cheers,

CM.

Anonymous said...

In my defense, before allowing myself to be shocked, I had found out (via someone’s earlier questions) that this room full of DBA had a good percentage of “Old Lags” like myself. It did - about 40% of the room, I think, had been a DBA for 5 years plus, and until recently you did not become a DBA before spending a few years in the development trenches or System Admin’s central command.
I would not expect the whole room or even most of the room to do design, Production DBA is a full time commitment after all, but the room had a good complement of the more mature (snaggle-toothed, world-weary) database professionals and not one did ERDs anymore, or anything replacing them.

I talked to a friend of mine who has been doing Oracle for about the same length of time as me and he backed up your point Andrew that Back Then DBAs did a lot more, partly as there was less technology you had to know, but we both thought there was still a need for the formal design. After all we figured, the DBA is supposed to be an expert on the abilities of the database technology and so should appreciate how the database design interacts with the database processes and abilities. Maybe we are just getting too old.

I can maybe understand the demise of logical design prior to physical design (though personally I think it is a shame and a missed opportunity to get the system right), but the total lack of ERDs (or a replacement for them) across the room did and still strikes me as astounding. As a couple of the comments have said, something that people often want is a diagram of the data and how it relates, but it seems to be becoming a rare item. If the RI is not there, constructing such relationships is very hard indeed (and without RI how do we know that the data is staying logically consistent? My experience is that where RI is missing, often it worryingly does not).

Adding to such systems where no overall picture is available must be so much harder and so much more reliant on existing staff having all of this in their heads or in a UML somewhere.

This problem of understanding the system also crops up when I run training courses in Oracle and MySQL. A common question asked is “how do I know what tables I have?”. In MySQL the answer is simple (“show tables”), in Oracle it needs a little script. This is usually followed by “how do I know how the tables are related to each other?”. Another script is needed for Oracle systems (to show the foreign keys). With MySQL a brief discussion on similar column names and needing to have a diagram usually ensues. This often happens in respect of Oracle too. When the final question of “where do I get the diagram from?” comes up, all I can do is suggest asking around.

My current conclusion is that database design is dying. Like many ideas in technology, it may have a resurgence, but in the meantime I will be interested to see if fan traps and chasm traps (how many of us remember what they are) continue to become more common in the databases I come across.

Gints Plivna said...

I'm working in a development side and we have logical data model (be it ER diagram or UML class diagram) for each and every our project which invloves more than 3 tables. Some of them are synchronizing physical and logical models also in later phases (production, changes) but I cannot imagine a serious project without diagrams.
Lack of logical design probably is simply in because of nature of mankind - let's start coding we don't need stinkin architecture! Let's start building tables, we don't need models. Neither system analysts nor managers see immediate value of it, so it is dropped. BTW this is with many things much closer to each programmer and system analyst than ERD. Many drink only cola and eat only fast food. There isn't immediate death after that, so no need to worry. The same is with logical modelling. We can throw it away, there isn't immediate increase of expenses or project shutdown. Even more in a short term we probably get sooner initial tables and can start real job - coding! :)

Carlovski said...

Where I work a lot of the 'old hands' got loads of training in logical and physical database design (mainly from Oracle themselves) - we used to be a lot more of an oracle shop and do a lot more development ourselves. We do a lot less these days, so no one gets any formal training in these subjects, however we still do some database design - I normally get involved in most of it, but I have never been trained in any formal data modelling techniques, it's usually try and get a handle of the data and requirements, scribble a rough design on paper and then create some tables. It usually works out ok, but I'd be wary of tackling anything very complicated, and i know I have got things wrong in the past.

Anonymous said...

With the spectacular growth of the web, new opportunities are available for business across the world. An organization can now make its presence felt to customer across continents. It has become an essential marketing strategy to hold strong web presence. India with its advantages, of low input costs and increased productivity, has become a hub for website development and data entry for companies across the world.

APC said...

>> India with its advantages ... has become a hub for website development and data entry for companies across the world.

Not to mention spam farms, alas.