Agile Databases and other paradoxes
I hope we can all agree that the problems that Agile seeks to address are valid ones. Applications are all too frequently delivered which are bug ridden, late, not what the customer wanted, not what the customer now needs, and sometimes all of the above simultaneously. Let’s disregard considerations of whether there are other approaches that can also solve these problems and take the Agile approach as a given. The question that needs to be answered is, "Can Agile techniques sensibly and profitably be applied to database development?"1
From the Agile side of the fence the answer is mostly a resounding raspberry. Agile practioners tend to regard regard databases as persistence repositories of last resort. They have a number of strategies to avoid writing SQL. Not just by deferral, using Mock Objects in the development stage, but by making deliberate architectural decisions to not use databases. These range from serialization to files through use of in-memory object stores like Prevayler to hands-off Object-Relational mapping tools like Hibernate. Last year I was talking to a UKOUG member who no longer attended the SIG I help run. He explained that his project was currently being migrated to a new version using Hibernate, Spring, Velocity, etc. I said, that sounds exciting, could someone do a presentation at the SIG? Back came the answer: No. Being Java developers they literally could not see the point in talking to a database user group. The whole point about their approach to application development was to avoid thinking about the database at all.
Some of this resistance may be down to sheer pigheadedness, but (at the risk of seeming to patronise Ron Jeffries) many of these people are actually very bright and aware people. They just don’t think databases can be Agile. Why is this? Let’s look at some of the Agile practices that are made harder by working with an RDBMS.
Agile starts with rather sparsely-defined requirements and uses techniques such as Test Driven Design to flesh them out. This is not building a prototype, this is working, production quality code. By building real code the Agile practioner can show his user a working application and get meaningful feedback that allows him to refine the requirements. Hence together they build the system that the user actually wants. The anti-pattern for this is Big Design Up Front. Here we spend ages drawing a mesh of boxes called an Entity Relationship Diagram. We show it to a bemused user who nods guardedly. We turn it into a data model but the user still isn’t as excited as we are. Finally the database gets built and then the front end gets built on it and whereupon the user wails, "But that’s not what I meant at all!" The question is, does the fault lie in the database or the process? Are databases inherently BDUF?
Obviously, we can just build one table at a time. But is that enough? Probably not. We still need to normalise our database2. Consequently, any attempt to build our database incrementally means constantly refactoring of our existing tables, to eliminate duplication. Can we do this? Yes we can! Of course it requires that we hide all our tables from public gaze behind an API - PL/SQL packages, views or types, which some people will just hate. The larger question is Tom Kyte's point about the importance of designing for performance right from the start. Does that imply BDUF? Yes, at least some. The other problem is that refactoring an object is a lot simpler than splitting one table filled with twenty million rows into two tables. Our database design can be reasonably plastic in development but once we go into production inertia kicks in. I think here is the killer for Agile practices: database refactoring is hard. Scott Ambler seems to be ploughing a lonely furrow in this area but his thoughts on The Process of Database Refactoring make interesting reading.
Agile practices tend to re-inforce each other and that’s the case with User Stories and Incremental Design. User stories describe how the customer’s business works. These stories will not mention data storage, except by implication. Customers don’t specify a database, they specify an application, that is, a front end. Customers cannot accept or validate databases. Even if we gave them TOAD and they wrote SQL queries that would not prove the database met any of their requirements, because their requirements are specified in terms of the front end. And without continuous user feedback we cannot be sure that we are coding in the right direction.
Kent Graziano has been working on building a datawarehouse using Agile techniques. He has presented on Agile Methods and DataWarehousing several times this year. When I saw him talk at Open World he said he had got some grief from Agilistas because the ETL section of datawarehouses doesn’t have users. So there cannot be any user stories. But as Kent points out, just because the business user (the customer) doesn’t know or care about the ETL process doesn’t mean that ETL has no users. On Kent's project they put the BI report writer in the user role. Which is a nimble, indeed agile, extension of the practice which doesn’t, as far as I can see, break any tenet.
Common Ownership of Code
When programming an agile practioner drives a path through the code base. If she finds an existing part of the application that blocks her implementation she can change, extend or fix that code. She can do this safely because the existing code has unit tests and the existing application has integration tests. More importantly, she can do this new work because the existing code base is in Java and she knows Java. But if this piece in the code base was in PL/SQL she would be lost: she doesn’t know PL/SQL. Even worse, suppose her change requires a new table? Then she would have to go to the DBA to find out about schemas, tablespaces, etc. This is too slow for the Agile way of working.
Teams of specialists are not Agile. We know this is true. How often have we been trying to progress a project only to be told, “We can’t do that, Gary’s on leave today.” To be fair this does not just apply to databases. I think Agile practices militates against heterogeneous programming environments and would be equally suspicious of (say) extensive use of shell programming. So, unless pretty much the whole team is competent in PL/SQL, using stored procedures just does not fit in an Agile project.
What this boils down to is the matter of where we should put the business logic. Unfortunately this frequently descends into a religious war. Either you believe it’s obvious that business rules belong in the database or it’s obvious to you that they belong in the middle tier. So let’s put it another way: in a client/server application where does the business logic belong? I think a large part of the middle tier proponents would then grudgingly accept the merits of the database as the repository of business logic. In most sites today we probably have heterogenous applications written in a variety of styles, techniques and languages, including many client/server applications. Our best hope of consolidating business rules is to use the database. But, this is an emotional debate and not one I expect either side to cede.
Object Oriented Programming
The alert amongst you will have noticed how often the term object appears in that list of persistence techniques: this is key. Agile programming languages are object-oriented languages. I was recently embroiled in a debate on the Test Driven Design list when I was told this surprising fact. Surprisng because I thought for several years I had been doing Test Driven Design by using Steven Feuerstein’s utPLSQL to build database applications. Apparently there’s more to Test First than merely writing our tests first.
Obviously an RDBMS is not object-oriented. A non-OO language like PL/SQL lacks certain key features supposedly necessary to the full panoply of Agile practices - abstraction, interfaces, reflection. Does this matter? I remain unconvinced. XP originated in the Smalltalk community and pretty much everybody you come across on Agile/Test Driven arenas seems to be an OO programmer. However, this is a self-fulfilling prophecy. Come out as a relational database developer and you get labelled as a handyman with only a hammer in your toolbox.
Use of Tools
Agile practioners make full use of utilities to improve their productivity - smart IDEs, code completion, continuous integration, automated unit testing, test coverage estimnators, refactoring browsers, etc. Compared to this panoply of riches PL/SQL looks very Spartan. There is the aforementioned utPLSQL but not much else. Of course, this is a vicious circle. PL/SQL lacks the tools to support Agile practices so people who want to be Agile don’t use PL/SQL so nobody builds the tools for PL/SQL.
Database development tools tend to focus on browsing the data dictionary and running SQL statements. There is very little around that genuinely supports the developer in building better PL/SQL or helps the DBA build better databases. This is a situation that is only going to get worse, as Oracle Designer slowly fades into the sunset. Paul Dorsey’s BRIM project (AKA The Tool) may be one to watch. Interestingly there is very little open source stuff for PL/SQL developers, compared to the wealth of freely downloadable utilities for Java heads and Pythonistas. I think this illustrates the lack of people who are competent in both an application (i.e. a UI building) language and PL/SQL.
Still, the Agile Manifesto says we have come to value individuals and interactions over processes and tools, so that’s all right then.
There is an old rule of thumb from the HCI arena that states that a user will regard as “too slow” any task that takes longer than ten seconds; if it takes longer than a minute they will start on another task and come back to the first task later. In Test Driven circles, running a suite of unit tests is such a task. The JUnit list regularly features complaints about how long it takes to run unit tests that require population of a database. This interferes with flow. Hence the popularity of Mock Objects. Hence the drive to isolate the database elements to as small a subset of the application as possible.
I think the database has to cop to this one. If your sole experience of database population is writing tedious XML files for use in DbUnit then you will naturally have a jaundiced view of RDBMS systems. But even people who know how to write SQL must admit that deleting and inserting records takes longer than instantiating some objects (once the JVM has been warmed up!). That’s the price we happily pay for rigour.
Well, database people are notoriously misanthropic. Especially DBAs. We can still talk to each other.
The Agile argument is databases are slow to build and inherently hard to understand. This is true. But it’s true because Agile is primarily a methodology targeted at building user-facing applications. Its adherents tend to work from the user interface downwards. Of course it’s crucial that we build an application that meets the user’s needs. But the user’s needs extend beyond the functional requirements of the application screens.
That doesn’t mean we cannot apply Agile techniques to database development projects. Certainly I regard Test First to be an invaluable technique in building stored procedures; I’m always nevervous when I’m not coding to a utPLSQL test case. My experience of working on DSDM projects tells me that incremental development - in particular placing working code in front of users early and often - is a very good idea. Kent’s work proves that Agile techniques can be profitably applied even to a juggernaut like a datawarehouse. The pain comes when OO-fixated bean-mongers clash with constraint obsessed databasers. Perhaps if they drank less coffee they wouldn’t be so hyped up all the time.