Wednesday, July 15, 2009

No SQL, so what?

It's been a fortnight since Log Buffer rounded up the reaction to the nascent No SQL movement. But there is a lively thread still running on Oracle-L. The entire thread is worth reading, but I was particularly struck by something Nuno Souto wrote:
"Now: the simple fact here is that folks from Google, Facebook, Myspace, Ning etcetc, and what they do as far as IT goes, are absolutely and totally irrelevant to the VAST majority of enterprise business."
This is so true. For starters, there is no SLA for users of Google's search engine. If Google doesn't include a page because it hasn't been indexed yet, well that's just the way it is. Ditto if Google returns duplicate hits because the same page has been indexed in multiple places, or returns different results to different uses because the indexes haven't been replicated across the entire estate. It doesn't really matter because Google's results are usually "good enough". Besides, it is jolly hard to spot missing hits or inconsistent results. Whereas in regular IT a similar casualness would undermine our users' faith in the system and lead to developers' heads being paraded on pikes.

It is also worth noting the major omission from the list of the usual suspects which are trotted out in these arguments: Amazon. Amazon's business model is most like regular enterprise IT - focused data retrieval, highly transactional, and with a premium on data integrity, security and performance. Consequently Amazon runs Oracle.

Why is there this widespread antipathy to SQL databases? It's not just because SQL is hard. I mean, Hibernate is complicated to understand and fiddly to implement. It goes beyond mere effort. Seth Godin wrote the following while discussing what qualities a computer game must possess in order to turn a customer into a die-hard fan:
"For World of Warcraft, [the learning curve is] huge. It's very difficult to spend just an hour or two. There's a chasm between encounter and enjoyable experience. Tetris was oriented in precisely the other way--everyone who tried it instantly became almost as smart as an expert."
I think this applies to development software too. Hibernate may be complex but it is couched in objects and Java and XML configuration files, so if you're already experienced in J2EE you already have an innate understanding of its fundamentals. You can become productive quite quickly. Many of the data storage tools present at the NoSQL briefing come with APIs in Java, Python and similar development languages. In fact, ease of use for developers is a big play; Voldemort even celebrates its mockability (which is a reference to the Mock Objects school of test driven development and not a measurement of ridiculousness).

We are all aware of the cost of context switching in Oracle. Embedding SQL in unnecessary PL/SQL constructs is less performant than using set-based SQL statements. The NoSQL movement is addressing a similar problem: concept switching. It is easier for application developers to maintain their velocity if all their work uses the same languages, approach and indeed IDE.

It is obvious what the attraction is for developers. That does not make a NoSQL product suitable for any given business. Sure, if the application is primarily concerned with the storage, retrieval and emendation of documents it probably makes more sense to use a product like CouchDB than to try to shred the document into relational tables. But if the application is highly transactional and/or handles valuable data then something like MongoDB is definitely a bad fit. To be fair MongoDB does list the applications for which it is less suited.

It comes down to understanding what is appropriate for the project in hand. That is an assessment which really belongs to the users, because they are the people who know - or at least ought to know - the value of the data to the business. The Daily WTF recently published this cautionary tale showing the consequences for a company and all its employees which entailed from underestimating the value of its data and disrespecting the importance of adequate data storage.