Thursday, October 10, 2013

T-Shirt slogans

One of the Cloudera chaps at the Oracle Big Data meetup had a T-shirt with this cool slogan:
Data is the new bacon
Even as a vegetarian I can appreciate the humour. However, I think it has a corollary, which would also make a good T-shirt:
Metadata is the new Kevin Bacon
Because metadata is the thing which connects us all.

Oracle Big Data Meetup - 09-OCT-2013

The Oracle guys running the Big Data 4 the Enterprise Meetup are always apologetic about marketing. The novelty of hosts apologising for it is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose it partly because they're self-styled hackers who like being able to play with their software any way they choose. But mainly it is because the budgetary constraints of being a start-up mean choosing between a Clerkenwell office with Aeron chairs and enterprise software licenses, and that's no choice at all.

But an Oracle Big Data meetup has a different constituency. We come from an enterprise background, we've all been using Oracle software for a long time and we know what to expect from an Oracle event. We're prepared to tolerate a certain amount of Oracle marketing because we want to hear the Oracle take on things, and we come prepared with our shields up. Apart from anything else, the Meetup sponsor is always cut some slack, in exchange for the beer'n'pizza.

Besides, the Oracle Big Data Appliance is quite an easy sell, certainly compared to the rest of the engineered systems. The Exa stack largely comprises machines which replace existing servers, whereas Big Data is a new requirement. Most Oracle shops probably don't have a pool of Linux/Java/Network hackers on hand to cobble together a parallel cluster of machines and configure them to run Hadoop. A pre-configured Exadoop appliance with Oracle's imprimatur is just what those organisations need. The thing is, it seems a bit cheeky to charge a six-figure sum for a box with a bunch of free software on it. No matter how good the box is. Particularly when it can be so hard to make the business case for a Big Data initiative.

Stephen Sheldon's presentation on Big Data Analytics As A Service addressed exactly this point. He works for Detica. They have stood up an Oracle BDA instance which they rent out for a couple of months to organisations who want to try a Big Data initiative. Detica provide a pool of data scientists and geeks to help out with the processing and analytics. At the end of the exercise the customer has a proven case showing whether Big Data can give them sufficient valuable insights into their business. This strikes me as a very neat idea, one which other companies will wish they had thought of first.

Ian Sharp (one of the apologetic Oracle guys) presented on Oracle's Advanced Analytics. The big idea here is R embedded in the database. This gives data scientists access to orders of magnitude more data than they're used to having on their desktop R instances. Quants working in FS organisations will most likely have an accident when they realise just how great an idea this is. Unfortunately, Oracle R Enterprise is part of the Advanced Analytics option, so probably only the big FS companies will go for it. But the Oracle R distro is still quite neat, and free.

Mark Sampson from Cloudera rounded off the evening with a talk on a new offering, Cloudera Search. This basically provides a mechanism for building a Google / Amazon style search facility over a Hadoop cluster. The magic here is that Apache Solr is integrated into the Hadoop architecture instead of as a separate cluster, plus a UI building tool. I spent five years on a project which basically did this with an Oracle RDBMS, hand-rolled ETL and XML generators and lots of Java code plumbing an external search engine into the front-end. It was a great system, loved by its users and well worth the effort at the time. But I expect we could do the whole thing again in a couple of months with this tool set. Which is good news for the next wave of developers.

Some people regard attending technical meetups as a bit odd. I mean, giving up your free time to listen to a bunch of presentations on work matters? But if you find this stuff interesting you can't help yourself. And if you work with Oracle tech and are interested in data, then this meetup is definitely worth a couple of hours of your free time.

Monday, July 15, 2013

PL/SQL Coding Standards, revisited

Formatting is the least important aspect of Coding Standards. Unfortunately, most sets of standards expend an inordinate number of pages on the topic. Because:

  1. The standards are old, or the person who wrote them is.
  2. Code formatting is an easy thing to codify and formalise.

Perhaps the source of most wasted energy is formatting keywords. Back in mediaeval times, when the only editors in use were vi or Notepad, or perhaps PFE, this was a pressing issue. But modern editors support syntax highlighting: now that we can have keywords in a different colour there is much less need to distinguish them with a different case.

Personally I prefer everything in lower case; I save about 23 seconds a day from not having to use the [shift] key. But other people have different preferences, and for the sake of the team it is better to have all the source code in a consistent format. But the way to achieve this is with automation, not a Word document. SQL Developer, PL/SQL Developer and TOAD all have code formatters (or beautifiers, yuck), as do other tools. Let's put the rules into the machine and move on.

What should the rules be? Well, everybody has an opinion, but here are my definitive PL/SQL Coding Standards, with an addendum of formatting guidance.

APC's Damn Fine PL/SQL Coding Standards

  1. Your code must implement the requirements correctly and completely.
  2. Your code must have a suite of unit and integration tests (preferably automated) to prove it implements the requirements correctly and completely.
  3. Your code must implement the requirements as efficiently and performantly as possible.

APC's PL/SQL Code Formatting Guidelines

  1. Case. ALL CAPS is Teh Suck! Anything else is fine.
  2. Indentation. Align consistently. Spaces not tabs. Four spaces is the Goldilocks indent.
  3. Short statements. One statement per line.
  4. Long statements. Use line breaks; don't make me scroll.
  5. Naming conventions. Use prefixes to distinguish local variables, global variables and parameters from each other and from database objects.
  6. Comments. A comment is an apology.

If you prefer something less minimal, William Robertson's PL/SQL Coding Standards remains the most complete and best annotated set on the web. Okay, so he does specify "3 spaces for each nesting level" (why? computing is all about powers of 2) but nobody's perfect.
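
To see the guidelines in action, here is a tiny procedure laid out that way: all lower case, four-space indents, one statement per line, and prefixes distinguishing parameters from local variables. (The p_/l_ prefix letters are my convention, not gospel, and the table is the venerable SCOTT.EMP.)

create or replace procedure raise_salary
    ( p_empno in emp.empno%type
    , p_pct   in number )
as
    l_new_sal emp.sal%type;
begin
    -- one long statement, broken across lines instead of scrolling off-screen
    update emp
    set    sal = sal * (1 + p_pct / 100)
    where  empno = p_empno
    returning sal into l_new_sal;
end raise_salary;
/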

Sunday, July 14, 2013

The personal is technical

On Friday evening I attended an IT Job Fair at the Amerigo Vespucci in Canary Wharf. Let me say straight away that hanging out with a random bunch of techies and recruiters would not be my first choice for a Friday evening. But, hey! I'm looking for my next role, and right now the market is too tough to turn down opportunities to find it. Besides I was interested to see whether the Meetup template would translate into a recruitment fair.

On the day the translation was a mixed success. Unlike most Meetups, which can work with any number of attendees, a job fair requires a goodly number of recruiters, and recruiters will only turn up if they think there will be enough candidates to make it worth their while. This first event didn't achieve that critical mass, and I would be quite surprised if I get an opportunity from it. Nevertheless I will try to attend the next event, whenever that may be, because pop-up job fairs in bars are a great idea. Not for the reason you're thinking (I drank cola all evening), but because it was enjoyable. I talked with some interesting people, and got a couple of email addresses as well.

But more than that, I was impressed with the concept. The informal social setting is good for understanding what a person is really like, specifically what they might be like to work with. This has to be worthwhile. The CV is a dead letter: it lists skills and accomplishments but doesn't animate or demonstrate them. The formality of the technical interview makes it hard to judge somebody as a person; anyway, it's generally aimed at establishing how much of their CV is true. The social element is often missed entirely. Big mistake. Software is like Soylent Green: it's made of people.

Personality matters: the toughest problems most projects face are political (organisational, personal) rather than technical (except for system integration - that's always going to be the biggest pain in the neck). A modern development project is a complex set of relationships. There are external relationships, with users, with senior managers, with other stakeholders (Security, Enterprise Architecture, etc), any of which can jeopardise the success of the project if handled badly. But the internal relationships - between Project Manager and staff, between developers and testers, or developers and DBAs - are just as fraught with difficulty.

The personal is technical because the team dynamic is a crucial indicator of the likely success of the project. You don't just need technical competence, you need individuals who communicate well and share a common vision; people who are (dread phrase) team players. That's why Project Managers generally like to work with people they already know, because they already know they can work with them.

For a new hire, nobody knows the answer to the burning question, "Can I stand to be in this person's company eight hours a day, five days a week, for the duration of the project?" Hence the value of chatting about work and other things in a bar on a hot summer's eve over a glass of something with ice. I'm not sure how well the model would work for recruitment agents, but I think it would suit both hirers and hirees. It's not a technique that scales, but if people made better hiring decisions perhaps that wouldn't matter?

Thursday, July 11, 2013

UKOUG Analytics Event: a semi-structured analysis

Yesterday's UKOUG Analytics event was a mixture of presentations about OBIEE with sessions on the frontiers of data analysis. I'm not going to cover everything, just dip into a few things which struck me during the day.

During the day somebody described dashboards as "Fisher Price activity centres for managers". Well, Neil Sellers showed a mobile BI app called RoamBI which is exactly that. Swipe that table, pinch that graph, twirl that pie chart! (No really, how have we survived so long with pie charts which can't be rotated?) The thing is so slick, it'll keep the boss amused for hours. Neil's theme on the importance of data visualization to convey a message or tell a story was picked up by Claudio Bastia and Nicola Sandol. Their presentation included a demo of IConsulting's Location Intelligence extension for OBIEE. The tool not only does impressive things with the display of geographic data, it also allows users to interact with the maps to refine queries and drill down into the data. This is visualization which definitely goes beyond the gimmick: it's an extremely powerful way of communicating complex data sets.

A couple of presentations quoted the statistic that 90% of our data was created in the last two years. This is a figure which has been bandied about but I've never seen a citation which explains who calculated it and what method they used (although it's supposed to have originated at IBM). It probably comes from the same place as most other statistics (and project estimates). What is the "data" the figure measures? I'm sure in some areas of human endeavour (bioinformatics, say, or CERN) the amount of data they produce has gone metastatic. And obviously digital cameras, especially on phones, are now ubiquitous, so video and photographs account for a lot of the data growth. But are selfies, instagrammed burgers and cute kittens really data? Same with other content: how much of this data explosion is mirroring, retweets, quoting, spam and AdSense farms? Not to mention the smut. Anyway, that 90% was first cited in 2012; it's now 2013 and somebody needs to derive a new figure.

The day rounded off with a panel and a user presentation. Toby Price opened the Q&A by asking Oracle's Nick Whitehead, how does Hadoop fit into an Oracle estate? It's a good question. After all, Oracle has been able to handle unstructured data, i.e. text, since the introduction of ConText in 8.0 (albeit as a chargeable extra in those days). And there's nothing special about MapReduce: PL/SQL can do that. So what's the deal with Hadoop? Here's the impertinent answer to this pertinent question: Hadoop allows us to run massively parallel jobs without paying Oracle's per processor licenses. Let's face it, not even Tony Stark could afford to run a one-thousand core database.
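
For the sceptics: here is word count, the canonical MapReduce example, as a single SQL statement against a hypothetical DOC_LINES table of (doc_id, line_text). The inline view is the "map" step (tokenise each line into its words) and the GROUP BY is the "reduce":

select lower(regexp_substr(dl.line_text, '\w+', 1, t.n)) as word
     , count(*) as occurrences
from   doc_lines dl
       join (select level as n from dual connect by level <= 100) t  -- assumes no more than 100 words per line
       on   t.n <= regexp_count(dl.line_text, '\w+')
group by lower(regexp_substr(dl.line_text, '\w+', 1, t.n))
order by occurrences desc
/

Add a PARALLEL hint and you have your massively parallel job; the catch, as ever, is what all those cores cost to license.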

The closing session was a presentation from James Wyper & Dirk Shelley about upgrading the BI architecture at John Lewis Partnership. They described it as a war story, but actually it was a report from the front lines, because the implementation is not yet finished. James and Dirk covered the products - which ones worked as advertised, and which ones gave them grief (integration was a particular culprit). They also discussed their approach to the project, relating what they did well and what they would do differently with the advantage of hindsight. This sort of session is the best part of any user group: real users sharing their experiences with the community. We need more of them.

Monday, June 24, 2013

Let me SLEEP!

DBMS_LOCK is a slightly obscure built-in package. It provides components with which we can build our own locking schemes. Its obscurity stems from the default access on the package, which is restricted to its owner SYS and the other power user accounts. This is because implementing your own locking strategy is a good way to wreck a system, unless you really know what you're doing. Besides, Oracle's existing functionality is such that there is almost no need to build something extra (especially since 11g finally made the SELECT ... FOR UPDATE SKIP LOCKED syntax legal). So it's just fine that DBMS_LOCK is private to SYS. Except ...

... except that one of the sub-programs in the package is SLEEP(). And SLEEP() is highly useful. Most PL/SQL applications of any sophistication need the ability to pause processing for a short while, either for a fixed time or perhaps while polling for a specific event. So it is normal for PL/SQL applications to need access to DBMS_LOCK.SLEEP().

Commonly this access is granted at the package level, that is, grant execute on dbms_lock to joe_dev. Truth be told, there's not much harm in that. The privilege is granted to a named account, and if somebody uses the access to implement a roll-your-own locking strategy which brings Production to its knees, well, the DBAs know who to look for.

But we can employ a schema instead. The chief virtue of a schema is managing rights on objects. So let's create a schema for mediating access to powerful SYS privileges:

create user sys_utils identified by &pw
temporary tablespace temp
/
grant create procedure, create view, create type to sys_utils
/

Note that SYS_UTILS does not get the create session privilege. Hence nobody can connect to the account, a sensible precaution for a user with potentially damaging privileges. Why bar connection in all databases and not just Production? The lessons of history tell us that developers will assume they can do in Production anything they can do in Development, and write their code accordingly.

Anyway, as well as granting privileges, the DBA user will need to build SYS_UTILS' objects on its behalf:
grant execute on dbms_lock to sys_utils
/
create or replace procedure sys_utils.sleep
    ( i_seconds in number)
as
begin
    dbms_lock.sleep(i_seconds);
end sleep;
/
create public synonym sleep for sys_utils.sleep
/
grant execute on sys_utils.sleep to joe_dev
/
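
With that in place, JOE_DEV can doze politely between polls. A sketch, in which the exit condition is a made-up function:

begin
    -- poll for a hypothetical condition, pausing between checks
    while not work_is_ready() loop   -- work_is_ready() is imaginary; substitute your own test
        sleep(0.5);                  -- via the public synonym; fractional seconds are fine
    end loop;
end;
/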

I think it's a good idea to be proactive about creating an account like this, granting it some obviously useful privileges before developers ask for them, simply because some developers won't ask. The forums occasionally throw up extremely expensive PL/SQL loops whose sole purpose is to burn CPU cycles, or wacky DBMS_JOB routines which run every second. These WTFs have their genesis in ignorance of, or lack of access to, DBMS_LOCK.SLEEP().

Oracle 10g - a time traveller's tale

Time travel sucks, especially going back in time. Nobody takes a bath, there are no anaesthetics and you can't get a decent wi-fi signal anywhere. As for killing your own grandfather, forget about it.

The same is true for going back in database versions. In 2009 I had gone straight from an Oracle 9i project to an Oracle 11g one. So when I eventually found myself on a 10g project it was rather disorientating. I would keep reaching for tools which weren't in the toolbox: LISTAGG(), preprocessor scripts for external tables, generalized invocation for objects.

I had missed out on 10g while it was shiny and new, and now it just seemed limited. Take Partitioning. Oracle 10g supported exactly the same composite partitioning methods as 9i: just Range-Hash and Range-List, whereas 11g is full of wonders like Interval-Range, Hash-Hash and the one I needed, List-List.

Faking a List-List composite partitioning scheme in 10g

Consider this example of a table with a (slightly forced) need for composite List-List partitioning. It is part of an engineering stock control system, in which PRODUCTS are grouped into LINES (Ships, Cars, Planes) and COMPONENTS are grouped into CATEGORIES (Frame, interior fittings, software, etc). We need an intersection table which links components to products.

There are hundreds of thousands of components and tens of thousands of products. But we are almost always only interested in components for a single category within a single product line (or product), so composite partitioning on (product_line, component_category) is a good scheme. In 11g the List-List method works just fine:
SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by list(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pship values ('SHIP')
 20       , partition pcar values  ('CAR')
 21       , partition pplane values ('PLANE')
 22       )
 23  /

Table created.

SQL> 

But in 10g the same statement hurls ORA-00922: missing or invalid option. The workaround is a bit of a nasty hack: replace the first List with a Range, producing a legitimate Range-List composite:
SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by range(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pcar values less than ('CAS')
 20       , partition pplane values less than ('PLANF')
 21       , partition pship values less than ('SHIQ')
 22       )
 23  /

Table created.

SQL> 

Note the wacky spellings which ensure that 'CAR' ends up in the right partition. We also have to re-order the partition clause so that the partition bounds don't raise an ORA-14037 exception. And we are left with the possibility that a rogue typo might slip records into the wrong partition, so we really ought to have a foreign key constraint on the product_line column:
alter table product_components add constraint pc_prdl_fk foreign key (product_line) 
           references product_lines (line_code)
/
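
With the constraint in place, a mistyped line code fails fast instead of quietly landing wherever the collation hack happens to file it (the values here are invented; PRODUCT_LINES holds only the three genuine codes):

insert into product_components
    (product_line, product_id, component_category, component_id)
values
    ('CAS', 101, 'BODY', 2001)
/
-- fails with ORA-02291: integrity constraint violated - parent key not found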

I described this as a nasty hack. It is not really that nasty; in fact it works very well in daily processing. But managing the table is less intuitive. Say we want to manufacture another line, rockets. We cannot just add a new partition:
SQL> alter table product_components
  2      add partition prock values less than ('ROCKEU')
  3  /
    add partition prock values less than ('ROCKEU')
        *
ERROR at line 2:
ORA-14074: partition bound must collate higher than that of the last partition


SQL> 

Instead we have to split the PSHIP partition in two:
SQL> alter table product_components split partition pship
  2     at ('ROCKEU')
  3     into (partition  prock, partition pship)
  4  /

Table altered.

SQL> 

The other snag is that, once we do get back to the future, it's a bit of a chore to convert the table to a proper List-List scheme. Probably too much of a chore to be worth the effort. Even with a time machine there are only so many hours in the day.

Friday, June 21, 2013

Where's SCOTT?

The database on the Developer Days Database App VBox appliance doesn't have the SCOTT schema. This is fair enough, as the sample schemas aren't included by default any more (for security reasons, obviously). I know the official sample schemas used in the documentation - HR, OE, and so on - are more realistic and useful for developing prototypes. But nothing beats the simplicity of SCOTT.EMP for explaining something in an online forum.

So, where is the script for building the SCOTT schema?

Back in the day it was part of the SQL*Plus build: $ORACLE_HOME/sqlplus/demo/demobld.sql (or something, I'm doing this from memory). But in 11gR2 there are no demo scripts in the sqlplus sub-directory. This was also probably the case in 10g but I never had occasion to look for it on that platform. Anyway, in 11gR2 its location is $ORACLE_HOME/rdbms/admin/utlsampl.sql.
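
So rebuilding SCOTT is a one-liner, run as a DBA (read the script first: it drops any existing SCOTT user and finishes by connecting as scott/tiger):

$ sqlplus / as sysdba
SQL> @?/rdbms/admin/utlsampl.sql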

Presumably Oracle have obscured it because they want everybody to stop using SCOTT and standardise on the modern samples. But this schema is part of the Oracle heritage. It should have Grade II listed status.

Sunday, January 20, 2013