
Thursday, July 30, 2020

Minimal declaration of foreign key columns

Here is the full declaration of an inline foreign key constraint (referencing a primary key column on a table called PARENT):
, parent_id number(12,0) constraint chd_par_fk references parent(parent_id)
But what is the fewest number of words required to implement the same constraint? Two. This does exactly the same thing:
, parent_id references parent
The neat thing about this minimalist declaration is the child column inherits the datatype of the referenced primary key column. Here's what it looks like (with an odd primary key declaration, just to prove the point):
 
SQL> create table parent1
  2  (parent_id number(15,3) primary key)
  3  /

Table PARENT1 created.

SQL> create table child1
  2  ( id        number(12,0) primary key
  3   ,parent_id references parent1) 
  4  /

Table CHILD1 created.

SQL> desc child1
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL> 
If we want to specify a name for the foreign key we need to include the constraint keyword:
 
SQL> create table parent2
  2  (parent_id number(15,3) constraint par1_pk primary key)
  3  /

Table PARENT2 created.

SQL> create table child2
  2  ( id        number(12,0) constraint chd2_pk primary key
  3   ,parent_id              constraint chd2_par2_fk references parent2) 
  4  /

Table CHILD2 created.

SQL> desc child2
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL> 
This minimal declaration always references the parent table's primary key. Suppose we want to reference a unique key rather than the primary key. (I would regard this as a data model smell, but sometimes we need to do it.) To make this work we merely need to reference the unique key column explicitly:
SQL> create table parent3
  2  ( parent_id  number(15,3)          constraint par3_pk primary key
  3   ,parent_ref varchar2(16) not null constraint par3_uk unique
  4  )
  5  /

Table PARENT3 created.

SQL> create table child3
  2  ( id         number(12,0) constraint chd3_pk primary key
  3   ,parent_ref              constraint chd3_par3_fk references parent3(parent_ref)) 
  4  /

Table CHILD3 created.

SQL> desc child3
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_REF          VARCHAR2(16) 
SQL> 
Hmmm, neat. What if we have a compound primary key? Well, that's another data model smell but it still works. Because we're constraining multiple columns we need to use a table level constraint and so the syntax becomes more verbose; we need to include the magic words foreign key:
SQL> create table parent4
  2  ( parent_id  number(15,3)   
  3   ,parent_ref varchar2(16) 
  4   ,constraint par4_pk primary key (parent_id, parent_ref)
  5  )
  6  /

Table PARENT4 created.

SQL> create table child4
  2  ( id number(12,0) constraint chd4_pk primary key
  3   ,parent_id
  4   ,parent_ref
  5   ,constraint chd4_par4_fk foreign key (parent_id, parent_ref) references parent4) 
  6  /

Table CHILD4 created.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL> 
Okay, but supposing we change the declaration of the parent column, does Oracle ripple the change to the child table?
 
SQL> alter table parent4 modify parent_ref varchar2(24);

Table PARENT4 altered.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL> 
Nope. And rightly so. This minimal syntax is a convenience when we're creating a table, but there's no object-style inheritance mechanism.
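So if the parent column does change, we have to bring the child back into line ourselves. For the example above it is a one-liner:

alter table child4 modify parent_ref varchar2(24)
/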

Generally I prefer a verbose declaration over minimalism, because clarity trumps concision. I appreciate the rigour of enforcing the same datatype on both ends of a foreign key constraint. However, I hope that in most cases our CREATE TABLE statements have been generated from a data modelling tool. So I think this syntactical brevity is a neat thing to know about, but of limited practical use.

Monday, June 24, 2013

Let me SLEEP!

DBMS_LOCK is a slightly obscure built-in package. It provides components with which we can build our own locking schemes. Its obscurity stems from the default access on the package, which is restricted to its owner SYS and the other power user accounts. That is because implementing your own locking strategy is a good way to wreck a system, unless you really know what you're doing. Besides, Oracle's existing functionality is such that there is almost no need to build something extra (especially since 11g finally made the SELECT ... FOR UPDATE SKIP LOCKED syntax legal). So it's just fine that DBMS_LOCK is private to SYS. Except ...

... except that one of the sub-programs in the package is SLEEP(). And SLEEP() is highly useful. Most PL/SQL applications of any sophistication need the ability to pause processing for a short while, either for a fixed time or perhaps while polling for a specific event. So it is normal for PL/SQL applications to need access to DBMS_LOCK.SLEEP().

Commonly this access is granted at the package level, that is grant execute on dbms_lock to joe_dev. Truth be told, there's not much harm in that. The privilege is granted to a named account, and if somebody uses the access to implement a roll-your-own locking strategy which brings Production to its knees, well, the DBAs know who to look for.

But we can employ a schema instead. The chief virtue of a schema is managing rights on objects. So let's create a schema for mediating access to powerful SYS privileges:

create user sys_utils identified by &pw
temporary tablespace temp
/
grant create procedure, create view, create type to sys_utils
/

Note that SYS_UTILS does not get the create session privilege. Hence nobody can connect to the account, a sensible precaution for a user with potentially damaging privileges. Why bar connection in all databases and not just Production? The lessons of history tell us that developers will assume they can do in Production anything they can do in Development, and write their code accordingly.

Anyway, as well as granting privileges, the DBA user will need to build SYS_UTILS' objects on its behalf:
grant execute on dbms_lock to sys_utils
/
create or replace procedure sys_utils.sleep
    ( i_seconds in number)
as
begin
    dbms_lock.sleep(i_seconds);
end sleep;
/
create public synonym sleep for sys_utils.sleep
/
grant execute on sys_utils.sleep to joe_dev
/

I think it's a good idea to be proactive about creating an account like this, granting it some obviously useful privileges before developers ask for them, simply because some developers won't ask. The forums occasionally throw up extremely expensive PL/SQL loops whose sole purpose is to burn CPU cycles, or wacky DBMS_JOB routines which run every second. These WTFs have their genesis in ignorance of, or lack of access to, DBMS_LOCK.SLEEP().
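With the grant and synonym in place, joe_dev's polling code can pause politely instead of spinning. A minimal sketch:

begin
    -- pause this session for five seconds without burning CPU
    sleep(5);
end;
/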

Oracle 10g - a time traveller's tale

Time travel sucks, especially going back in time. Nobody takes a bath, there are no anaesthetics and you can't get a decent wi-fi signal anywhere. As for killing your own grandfather, forget about it.

The same is true for going back in database versions. In 2009 I had gone straight from an Oracle 9i project to an Oracle 11g one. So when I eventually found myself on a 10g project it was rather disorientating. I would keep reaching for tools which weren't in the toolbox: LISTAGG(), preprocessor scripts for external tables, generalized invocation for objects.

I had missed out on 10g while it was shiny and new, and now it just seemed limited. Take Partitioning. Oracle 10g supported exactly the same composite partitioning methods as 9i: just Range-Hash and Range-List, whereas 11g is full of wonders like Interval-Range, Hash-Hash and the one I needed, List-List.

Faking a List-List composite partitioning scheme in 10g

Consider this example of a table with a (slightly forced) need for composite List-List partitioning. It is part of an engineering stock control system, in which PRODUCTS are grouped in LINES (Ships, Cars, Planes) and COMPONENTS are grouped into CATEGORIES (Frame, interior fittings, software, etc). We need an intersection table which links components to products.

There are hundreds of thousands of components and tens of thousands of products. But we are almost always only interested in components for a single category within a single product line (or product), so composite partitioning on (product_line, component_category) is a good scheme. In 11g the List-List method works just fine:
SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by list(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pship values ('SHIP')
 20       , partition pcar values  ('CAR')
 21       , partition pplane values ('PLANE')
 22       )
 23  /

Table created.

SQL> 

But in 10g the same statement hurls ORA-00922: missing or invalid option. The workaround is a bit of a nasty hack: replace the first List with a Range, producing a legitimate Range-List composite:
SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by range(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pcar values less than ('CAS')
 20       , partition pplane values less than ('PLANF')
 21       , partition pship values less than ('SHIQ')
 22       )
 23  /

Table created.

SQL> 

Note the wacky spellings which ensure that 'CAR' ends up in the right partition. Also we have to re-order the partition clause so that the partition bounds don't raise an ORA-14037 exception. We are also left with the possibility that a rogue typo might slip records into the wrong partition, so we really ought to have a foreign key constraint on the product_line column:
alter table product_components add constraint pc_prdl_fk foreign key (product_line) 
           references product_lines (line_code)
/

I described this as a nasty hack. It is not really that nasty, in fact it actually works very well in daily processing. But managing the table is less intuitive. Say we want to manufacture another line, rockets. We cannot just add a new partition:
SQL> alter table product_components
  2     add partition prock values less than ('ROCKEU')
  3  /
   add partition prock values less than ('ROCKEU')
   *
ERROR at line 2:
ORA-14074: partition bound must collate higher than that of the last partition


SQL> 

Instead we have to split the PSHIP partition in two:
SQL> alter table product_components split partition pship
  2     at ('ROCKEU')
  3     into (partition  prock, partition pship)
  4  /

Table altered.

SQL> 

The other snag is that once we do get back to the future it's a bit of a chore to convert the table to a proper List-List scheme. Probably too much of a chore to be worth the effort. Even with a time machine there are only so many hours in the day.
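For the record, the chore would look something like this: build the genuine List-List table, copy the data across, then re-create the constraints and indexes and swap the names. A sketch, assuming an outage window (constraints omitted for brevity):

create table product_components_ll
    (product_line       varchar2(10) not null
    ,product_id         number       not null
    ,component_category varchar2(10) not null
    ,component_id       number       not null
    )
partition by list (product_line) subpartition by list (component_category)
     subpartition template
         (subpartition sbody values ('BODY')
          , subpartition sint values ('INT')
          , subpartition selectr values ('ELECTR')
          , subpartition ssoft values ('SOFT')
         )
    (partition pship values ('SHIP')
     , partition pcar values ('CAR')
     , partition pplane values ('PLANE')
     , partition prock values ('ROCKET')
     )
/
insert /*+ append */ into product_components_ll
select * from product_components
/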

Friday, June 21, 2013

Where's SCOTT?

The database on the Developer Days Database App VBox appliance doesn't have the SCOTT schema. This is fair enough, as the sample schemas aren't included by default any more (for security reasons, obviously). I know the official sample schemas used in the documentation - HR, OE, and so on - are more realistic and useful for developing prototypes. But nothing beats the simplicity of SCOTT.EMP for explaining something in an online forum.

So, where is the script for building the SCOTT schema?

Back in the day it was part of the SQL*Plus build: $ORACLE_HOME/sqlplus/demo/demobld.sql (or something, I'm doing this from memory). But in 11gR2 there are no demo scripts in the sqlplus sub-directory. This was also probably the case in 10g but I never had occasion to look for it on that platform. Anyway, in 11gR2 its location is $ORACLE_HOME/rdbms/admin/utlsampl.sql.
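So rebuilding SCOTT on the appliance should be something like this (? is the SQL*Plus shorthand for $ORACLE_HOME; the script drops and re-creates the user, so it needs to be run as a DBA):

sqlplus / as sysdba
SQL> @?/rdbms/admin/utlsampl.sql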

Presumably Oracle have obscured it because they want everybody to stop using SCOTT and standardise on the modern samples. But this schema is part of the Oracle heritage. It should have Grade II listed status.

Monday, November 07, 2011

Wildly Over-ambitious Book Title of the Week

One of my co-workers has on his desk "Teach yourself SQL in 10 minutes". Yes, it is a SAMS book.

According to Ben Forta, the author, it is one of the best selling SQL books of all time. Not surprising: who could resist a title like that?

I like his emphasis on getting stuff done. Even so, I think ten minutes is just about long enough to decide whether to pronounce it "sequel" or "ess queue ell".

Wednesday, July 15, 2009

No SQL, so what?

It's been a fortnight since Log Buffer rounded up the reaction to the nascent No SQL movement. But there is a lively thread still running on Oracle-L. The entire thread is worth reading, but I was particularly struck by something Nuno Souto wrote:
"Now: the simple fact here is that folks from Google, Facebook, Myspace, Ning etcetc, and what they do as far as IT goes, are absolutely and totally irrelevant to the VAST majority of enterprise business."
This is so true. For starters, there is no SLA for users of Google's search engine. If Google doesn't include a page because it hasn't been indexed yet, well that's just the way it is. Ditto if Google returns duplicate hits because the same page has been indexed in multiple places, or returns different results to different users because the indexes haven't been replicated across the entire estate. It doesn't really matter because Google's results are usually "good enough". Besides, it is jolly hard to spot missing hits or inconsistent results. Whereas in regular IT a similar casualness would undermine our users' faith in the system and lead to developers' heads being paraded on pikes.

It is also worth noting the major omission from the list of the usual suspects which are trotted out in these arguments: Amazon. Amazon's business model is most like regular enterprise IT - focused data retrieval, highly transactional, and with a premium on data integrity, security and performance. Consequently Amazon runs Oracle.

Why is there this widespread antipathy to SQL databases? It's not just because SQL is hard. I mean, Hibernate is complicated to understand and fiddly to implement. It goes beyond mere effort. Seth Godin wrote the following while discussing what qualities a computer game must possess in order to turn a customer into a die-hard fan:
"For World of Warcraft, [the learning curve is] huge. It's very difficult to spend just an hour or two. There's a chasm between encounter and enjoyable experience. Tetris was oriented in precisely the other way--everyone who tried it instantly became almost as smart as an expert."
I think this applies to development software too. Hibernate may be complex but it is couched in objects and Java and XML configuration files, so if you're already experienced in J2EE you already have an innate understanding of its fundamentals. You can become productive quite quickly. Many of the data storage tools present at the NoSQL briefing come with APIs in Java, Python and similar development languages. In fact, ease of use for developers is a big play; Voldemort even celebrates its mockability (which is a reference to the Mock Objects school of test driven development and not a measurement of ridiculousness).

We are all aware of the cost of context switching in Oracle. Embedding SQL in unnecessary PL/SQL constructs is less performant than using set-based SQL statements. The NoSQL movement is addressing a similar problem: concept switching. It is easier for application developers to maintain their velocity if all their work uses the same languages, approach and indeed IDE.
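To make the cost of context switching concrete, compare these two approaches against a hypothetical ORDERS table. The loop pays for a switch between the PL/SQL and SQL engines on every row; the set-based statement pays once:

begin
    for r in (select order_id from orders where status = 'OLD') loop
        -- one engine switch per row
        update orders set status = 'ARCHIVED' where order_id = r.order_id;
    end loop;
end;
/
-- set-based equivalent: one statement, one switch
update orders set status = 'ARCHIVED' where status = 'OLD'
/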

It is obvious what the attraction is for developers. That does not make a NoSQL product suitable for any given business. Sure, if the application is primarily concerned with the storage, retrieval and emendation of documents it probably makes more sense to use a product like CouchDB than to try to shred the document into relational tables. But if the application is highly transactional and/or handles valuable data then something like MongoDB is definitely a bad fit. To be fair MongoDB does list the applications for which it is less suited.

It comes down to understanding what is appropriate for the project in hand. That is an assessment which really belongs to the users, because they are the people who know - or at least ought to know - the value of the data to the business. The Daily WTF recently published this cautionary tale showing the consequences, for a company and all its employees, of underestimating the value of its data and disrespecting the importance of adequate data storage.

Monday, March 31, 2008

Comparing CHAR values

Here is a table with a single row of data.

SQL> create table my_tab (c2 char(2))
2 /

Table created.

SQL> insert into my_tab (c2) values ('Y ')
2 /

1 row created.

SQL>

Which of the following queries will match that row?

select * from my_tab where c2 = 'Y'
/

select * from my_tab where c2 = 'Y '
/

select * from my_tab where c2 = 'Y  '
/

If you said all three you get a cigar (providing you live in a jurisdiction where such infernal devices are still permitted).

SQL> select * from my_tab where c2 = 'Y';
c2
--
Y

SQL> select * from my_tab where c2 = 'Y ';
c2
--
Y

SQL> select * from my_tab where c2 = 'Y  ';
c2
--
Y

The reason lies in Oracle's rules for comparing blank-padded datatypes. If the two values are of differing lengths, Oracle pads the shorter one with the requisite number of blank spaces before comparing. Obviously it is documented.

On the whole I think this is a boon - CHAR columns are a pain in the neck at the best of times, without having to worry unnecessarily about additional RPAD calls. This is unfortunate if you are relying on 'Y ' being different to 'Y'; but if your application depends on trailing spaces for data integrity then you probably have bigger problems.
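Note that blank-padded comparison only applies when both sides are blank-padded (CHAR, NCHAR or text literals). Repeat the experiment with a VARCHAR2 column and the trailing space suddenly matters:

create table my_vtab (v2 varchar2(2))
/
insert into my_vtab (v2) values ('Y ')
/
-- non-padded comparison: this returns no rows
select * from my_vtab where v2 = 'Y'
/
-- only the exact value, trailing space included, matches
select * from my_vtab where v2 = 'Y '
/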

Wednesday, February 13, 2008

Data modelling and other dying arts

Martin Widlake sent me an e-mail after last month's UKOUG Unix SIG:
"At my presentation at the UKOUG Unix SIG yesterday I suggested that formal design was almost dead, replaced with organic design and asked if anyone still used ERDs. No one did. Not one.
This kind of bothered me. Does it mean that just ERDs are dead? Or that a room full of DBAs is a room full of people who do not do systems design (I am just as shocked by that if it is true)? Or maybe formal design is a dead concept."
There are two points here. The first, the utter lack of DBAs who do data modelling tasks, doesn't surprise me in the slightest. This is the nature of the modern DBA's job. Production DBAs look after live systems: they don't design them. Increasingly people are becoming DBAs straight out of college. These guys have never worked as developers and probably never will. The older geezers, who followed the more traditional route of starting out as programmers and progressing into the DBA role, probably haven't worked on development projects in years.

Also the IT landscape has changed. Even ten years ago many organisations had one or at most a handful of databases. It was possible for a DBA to be responsible for a single database; knowing its purpose and its value to their organisation was part of the job description. These days it is not uncommon to find DBAs working in teams looking after dozens even hundreds of databases. Furthermore the production DBA may well work for a different company (i.e. an outsourcer), possibly in a different continent from the users. Their relationship with the databases they administer is mediated through SLAs and ITIL compliant procedures. So they have little incentive and even less time to appreciate the databases under their care. Indeed, given the prevalence of Sarbanes-Oxley and similar pressures, production DBAs will be increasingly encouraged to remain in ignorance. A production DBA is somebody who knows the metadata of everything and the business purpose of nothing.

Of course, there are DBAs who do work on development projects. They often combine the role with that of being a developer, especially on smaller projects. They often get called database engineers rather than DBAs. And production DBAs tend to regard database engineers as being developers, not "proper" DBAs. I have been a database engineer on sites where I wasn't allowed the SYSTEM or SYS passwords for my project's development database. I would bet that everybody who goes to the Unix SIG is a production DBA.

The second question is whether anybody uses entity relationship diagrams, or more broadly, whether anybody still does logical data modelling. I can't answer this one from personal experience. I've been on a data warehouse project for four years now: I only deal in existing schemas. Even when I have done design it has been for ETL infrastructure and similar, so I have leapt straight to physical tables. As I started out with SSADM I do feel a bit guilty about this. Although I must say I haven't exactly missed drawing Entity Life History diagrams.

Anecdotally, there does seem to be a general decline in the practice of data modelling. There were hardly any presentations on modelling at the last UKOUG conference or at Open World 2007. Modelling and Design is one of the smaller UKOUG SIGs. The ODTUG Designer listserver has flurries of activity but since Oracle announced the death of Designer it has - understandably - experienced a major drop in traffic. There are occasional questions about data modelling in the OTN forums, but these are frequently from students rather than practitioners. It is depressing to consider that the most commonly referenced data model seems to be the fundamentally flawed Entity-Attribute-Value. My last piece of circumstantial evidence is that the Oracle blogosphere rarely features posts about data modelling. The only blog I know which regularly discusses data modelling is The Database Programmer and even Ken Downs only talks about tables.

Of course people are doing system design. There's lots of design about but I would guess that it all happens in UML. So the majority of logical data modelling these days produces class models rather than ERDs. The physical database design stage is much more likely to consist of ORM than mapping entities to tables. Now that's not the sort of party you invite a DBA to, because you just know they're going to glower in the corner, drinking heavily and muttering to themselves. So the mappings and the database design will be done by middle-tier developers. Our communal prejudices tell us this is unlikely to produce a correct and performant database design, not least because projects which use such an approach tend to make a fetish of database agnosticism and platform independence. So in the long run we might see a resurgence in data modelling, as part of the tool set for rescuing poorly performing class models.

As a tangent, Dominic Delmolino observed in a recent blog that
"many of the people I’ve been interviewing seem to be taken aback by a few simple SQL questions, telling me that DBA’s (sic) don’t do SQL."
Again, why is this surprising? SQL knowledge is going the way of data modelling for production DBAs. There is a whole raft of GUI administration tools - Quest Spotlight, BMC Patrol, Embarcadero, OEM, etc - whose sole purpose is to allow DBAs to monitor and manage large numbers of databases without using the command line and without knowing SQL. Again this is inevitable given the landscape I described above. Old skool DBAs - the ones who started out managing a single database - will have accreted a personal library of SQL scripts, shell scripts and utilities which do all these things. But people starting out now will probably find themselves operating in shops with dozens of databases and no time to roll their own tools. If they are lucky there will be an old lag to pass on some skills and some scripts; more likely there will be a shrink-wrapped GUI tool. Besides, remember that Oracle Enterprise Manager was introduced in Oracle7: it is perfectly feasible for somebody to describe themselves as an experienced DBA who has never administered a database in any other way.

Thursday, February 07, 2008

Scoping with SQL Types

The scoping rules for function calls are quite clear. Given a package with a function which has the same name as a standalone function, another function in that package will call the packaged function not the standalone one:

SQL> create or replace function toto
2 return varchar2
3 as
4 begin
5 return 'TOOTING';
6 end toto;
7 /

Function created.

SQL> create or replace package a as
2 function toto return varchar2;
3 function tata return varchar2;
4 end a;
5 /

Package created.

SQL> create or replace package body a as
2 function toto return varchar2
3 as
4 begin
5 return 'KANSAS';
6 end toto;
7 function tata return varchar2
8 as
9 begin
10 return 'We''re not in '||toto||' anymore';
11 end tata;
12 end a;
13 /

Package body created.

SQL> select a.tata from dual
2 /
TATA
-----------------------------------
We're not in KANSAS anymore

SQL>

The rules apply the same way if we're working with an object rather than a package ....

SQL> drop package a
2 /

Package dropped.

SQL> create or replace type a as object (
2 attr1 varchar2(20)
3 , member function toto return varchar2
4 , member function tata return varchar2
5 ) NOT FINAL;
6 /

Type created.

SQL> create or replace type body a as
2 member function toto return varchar2
3 as
4 begin
5 return attr1;
6 end toto;
7 member function tata return varchar2
8 as
9 begin
10 return 'We''re not in '||toto||' anymore';
11 end tata;
12 end;
13 /

Type body created.

SQL> set serveroutput on
SQL> declare
2 my_a a := new a('KANSAS');
3 begin
4 dbms_output.put_line(my_a.tata);
5 end;
6 /
We're not in KANSAS anymore

PL/SQL procedure successfully completed.

SQL>

However, there is a gotcha: the scoping rules do not work that way when our type inherits from a super-type....
 
SQL> create or replace type b under a (
2 overriding member function tata return varchar2
3 );
4 /

Type created.

SQL> create or replace type body b as
2 overriding member function tata return varchar2
3 as
4 begin
5 return 'We''re not in '||toto||' anymore!!!';
6 end tata;
7 end;
8 /

Type body created.

SQL> declare
2 my_b b := new b('KANSAS');
3 begin
4 dbms_output.put_line(my_b.tata);
5 end;
6 /
We're not in TOOTING anymore!!!

PL/SQL procedure successfully completed.

SQL>

The solution is quite straightforward: use the SELF keyword to make the scope explicit.
 
SQL> create or replace type body b as
2 overriding member function tata return varchar2
3 as
4 begin
5 return 'We''re not in '||SELF.toto||' anymore!!!';
6 end tata;
7 end;
8 /

Type body created.

SQL>
SQL> declare
2 my_b b := new b('KANSAS');
3 begin
4 dbms_output.put_line(my_b.tata);
5 end;
6 /
We're not in KANSAS anymore!!!

PL/SQL procedure successfully completed.

SQL>

I admit I am not clear about the rules for using SELF. Sometimes it is compulsory, sometimes it is optional. So it's just easier to always include it whenever we reference anything inside a type body.

NB: I ran these tests on 9.2.0.6, if that makes any difference.

Monday, February 04, 2008

Who needs SQL injection?

I - probably like many of you - thought the prevention of SQL injection (the passing of additional SQL statements through the parameters of dynamic SQL calls) was the low hanging fruit of web app security. Not at all. This latest post from The Daily WTF really takes database (in)security to another level.

Monday, January 28, 2008

Is this string a number? Really?

Jared Still ponders this question in a recent blog post. He runs some benchmarks against the various approaches and comes to the conclusion that using TRANSLATE() is the fastest approach. Which is fine, as far as it goes. It's a solution which works for Jared's situation but is not universally applicable.

Note that I have slightly complicated Jared's approach to allow for fake_number values of varying length:

SQL> select * from detect_numeric
2 order by 1
3 /
FAKE_NUMBER
--------------------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789','||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.
SQL>
So far so good. Let's add another record....

SQL> insert into detect_numeric values ('123.45')
2 /

1 row created.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789','||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022

9 rows selected.

SQL>
Wha'pen? Isn't 123.45 a number? Well, no, not in this context. The TRANSLATE() call is only counting digits. Hmmm, obviously we need to allow for decimal points.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789.','|||||||||||')
5 order by 1
6 /
FAKE_N
------
000000
000001
000002
000010
000011
000012
000020
000021
000022
123.45

10 rows selected.

SQL>
Problem solved? Not quite. There's more to being numeric than just comprising digits and decimal points. They have to be in the right quantities and right places. Let's add an IP address to the mix....
 
SQL> insert into detect_numeric values ('127.0.0.1')
2 /

1 row created.

SQL> select fake_number
2 from detect_numeric
3 where lpad('|', length(fake_number), '|')
4 = translate(fake_number,'0123456789.','|||||||||||')
5 order by 1
6 /
FAKE_NUMBER
--------------------
000000
000001
000002
000010
000011
000012
000020
000021
000022
123.45
127.0.0.1

11 rows selected.

SQL>
And that's why we might need a function like IS_NUMERIC(), which wraps a TO_NUMBER call:

SQL> create or replace function is_numeric
2 (p_str in varchar2, p_fmt_msk in varchar2 := null)
3 return varchar2
4 as
5 return_value varchar2(5);
6 n number;
7 begin
8 begin
9 if p_fmt_msk is null then
10 n := to_number(p_str);
11 else
12 n := to_number(p_str, p_fmt_msk);
13 end if;
14 return_value := 'TRUE';
15 exception
16 when others then
17 return_value := 'FALSE';
18 end;
19 return return_value;
20 end;
21 /

Function created.

SQL> column is_numeric format a10
SQL> select fake_number, is_numeric(fake_number) is_numeric
2 from detect_numeric
3 order by 2,1
4 /
FAKE_NUMBER IS_NUMERIC
-------------------- ----------
127.0.0.1 FALSE
000000 TRUE
000001 TRUE
000002 TRUE
000010 TRUE
000011 TRUE
000012 TRUE
000020 TRUE
000021 TRUE
000022 TRUE
123.45 TRUE

11 rows selected.

SQL>

All of which underlines the importance of understanding the data with which we are working. If we just need to assert that a string consists solely of digits then a simple TRANSLATE() will suffice and will be very efficient. But if we need to assert something more precise - that the string contains a valid number - then we may require a slower but more reliable approach.

Incidentally, anyone who is interested in seeing how to use 10g's regex functionality to winnow numeric strings from non-numeric strings should read this OTN Forum post from CD.
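For completeness, a 10g regex version of the "digits with at most one decimal point" test might look like this; it passes 123.45 and rejects 127.0.0.1, though unlike IS_NUMERIC() it knows nothing of signs, group separators or format masks:

select fake_number
from   detect_numeric
where  regexp_like(fake_number, '^[0-9]+(\.[0-9]+)?$')
order  by 1
/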

Friday, January 25, 2008

How about a date, baby?

Laurent Schneider has posted another interesting insight into the limits of Oracle. I always thought that the highest date we could have in Oracle was 31-DEC-9999. Well that's certainly the highest date we can fit into the standard NLS date format. But the date buffer will actually go up to Friday 1st January 15402 A.D. Read Laurent's post to see how he does it.

Incidentally, you will probably need to tweak your NLS settings to see the results:

SQL> alter session set nls_date_format='dd-mon-yyyy';

Session altered.

SQL> select round(date '9999-01-01','CC') from dual;
ERROR:
ORA-01801: date format is too long for internal buffer



no rows selected

SQL> alter session set nls_date_format='FMDay ddth Month YYYY B.C.';

Session altered.

SQL> select round(date '9999-01-01','CC') from dual;
ROUND(DATE'9999-01-01','CC')
--------------------------------
Monday 1st January 10001 AD

SQL>

Incidentally, the Julian dates break down at the outer limits...

SQL> select to_char(round(date '9999-01-01','CC'), 'J') from dual;
TO_CHAR
-------
0000000


SQL> select to_char(trunc(date '-4712-1-1','CC'), 'J') from dual;
TO_CHAR
-------
0000000

SQL>

... and without resorting to RAW twiddling, 1st January 4800 BC is the furthest back our time machine will go....

SQL> select trunc(date '-4712-1-1','CC')-1 from dual;
select trunc(date '-4712-1-1','CC')-1 from dual
*
ERROR at line 1:
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

SQL>

Thursday, December 13, 2007

NOT IN, NOT EXISTS and MINUS: an aide-memoir

A colleague asked me whether NOT IN would return the same as MINUS. I said it would depend on whether the results contained nulls. I confess to not being clear as to how the results would be affected by the presence of nulls, but it's easy enough to knock up a test case.
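
(The test tables themselves are nothing fancy; something like this:)

create table a10 (id number)
/
create table a20 (id number)
/
insert into a10 values (1)
/
insert into a10 values (null)
/
insert into a10 values (3)
/
insert into a20 values (3)
/
insert into a20 values (2)
/
insert into a20 values (null)
/
insert into a20 values (4)
/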

We start with both tables containing nulls:

SQL> select id from a10
2 /
ID
----------
1

3

3 rows selected.

SQL> select id from a20
2 /
ID
----------
3
2

4

4 rows selected.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

no rows selected

SQL> select * from a10
2 where not exists
3 (select id from a20 where a20.id = a10.id)
4 /
ID
----------
1


2 rows selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /
ID
----------
1

1 row selected.

With a null in the top table but not in the bottom table:

SQL> delete from a20 where id is null
2 /

1 row deleted.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /
ID
----------
1

1 row selected.

SQL> select * from a10
2 where not exists
3 (select id from a20 where a20.id = a10.id)
4 /
ID
----------
1


2 rows selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /
ID
----------
1


2 rows selected.

SQL>

With a null in the bottom table but not in the top table:


SQL> delete from a10 where id is null
2 /

1 row deleted.

SQL> insert into a20 values (null)
2 /

1 row created.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /

no rows selected

SQL> select * from a10
2 where not exists
3 (select id from a20 where a20.id = a10.id)
4 /
ID
----------
1

1 row selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /
ID
----------
1

1 row selected.

SQL>

With no nulls in either table:

SQL> delete from a20 where id is null
2 /

1 row deleted.

SQL> select * from a10
2 where id not in ( select id from a20)
3 /
ID
----------
1

1 row selected.

SQL> select * from a10
2 where not exists
3 (select id from a20 where a20.id = a10.id)
4 /
ID
----------
1

1 row selected.

SQL> select * from a10
2 minus
3 select * from a20
4 /
ID
----------
1

1 row selected.

SQL>

Wednesday, October 10, 2007

xkcd on SQL injection

Top geek humour comic xkcd has just published a strip about SQL injection. Check it out.

Update


Other people enjoyed this strip too. John Emmons for instance. And Tom Kyte, of course. And Boing Boing, ditto. And Pete Finnegan. In fact I was probably the last person on the whole internet to post about it. Sigh.

Wednesday, October 03, 2007

More on DISABLE VALIDATE

Yesterday I blogged about a "bug" with relational integrity. Due to lack of time I didn't really explain why this behaviour occurs.

When we disable the constraint with the proper syntax the constraint is marked as NOT VALIDATED:

SQL> ALTER TABLE temp_child DISABLE CONSTRAINT what_the_fk;

Table altered.

SQL> SELECT status, validated FROM user_constraints u
2 WHERE table_name= 'TEMP_CHILD'
3 AND u.constraint_type ='R';
STATUS VALIDATED
-------- -------------
DISABLED NOT VALIDATED

SQL>

Whereas, when we disable it the wrong way this doesn't happen:

SQL> ALTER TABLE temp_child DISABLE VALIDATE CONSTRAINT what_the_fk;

Table altered.

SQL> SELECT status, validated FROM user_constraints u
2 WHERE table_name= 'TEMP_CHILD'
3 AND u.constraint_type ='R';
STATUS VALIDATED
-------- -------------
DISABLED VALIDATED

SQL>

The bug occurs because the database trusts the value of VALIDATED rather than actually validating the constraint. So I thought I should see whether it applies to other kinds of constraints. This is what happens with a check constraint:

SQL> create table nn (col1 number)
2 /

Table created.

SQL> alter table nn add constraint nn_ck check (col1 is not null)
2 /

Table altered.

SQL> insert into nn values (null)
2 /
insert into nn values (null)
*
ERROR at line 1:
ORA-02290: check constraint (APC.NN_CK) violated


SQL> alter table nn disable validate constraint nn_ck
2 /

Table altered.

SQL> insert into nn values (null)
2 /
insert into nn values (null)
*
ERROR at line 1:
ORA-25128: No insert/update/delete on table with constraint (APC.NN_CK)
disabled and validated


SQL>

And this with a unique key:

SQL> create table my_nu_tab (col1 number)
2 /

Table created.

SQL> alter table my_nu_tab add constraint nu_uk unique (col1)
2 /

Table altered.

SQL> insert into my_nu_tab values (1)
2 /

1 row created.

SQL> insert into my_nu_tab values (1)
2 /
insert into my_nu_tab values (1)
*
ERROR at line 1:
ORA-00001: unique constraint (APC.NU_UK) violated


SQL> alter table my_nu_tab disable validate constraint nu_uk
2 /

Table altered.

SQL> insert into my_nu_tab values (1)
2 /
insert into my_nu_tab values (1)
*
ERROR at line 1:
ORA-25128: No insert/update/delete on table with constraint (APC.NU_UK)
disabled and validated


SQL>

Interesting. So what happens when we try to insert into the child table when the foreign key is disabled yet validated?

SQL> insert into temp_child values (2, 11)
2 /
insert into temp_child values (2, 11)
*
ERROR at line 1:
ORA-25128: No insert/update/delete on table with constraint (APC.WHAT_THE_FK)
disabled and validated


SQL>

So at least the database prevents us from inserting orphaned records into the referencing table. It just doesn't protect the integrity of the referenced table. Which is fair enough I suppose - the foreign key is disabled after all.

I'm trying to think of a scenario in which we would want to disable a constraint but still keep it validated, but so far nothing has occurred to me. Obviously there must be a good reason for this. Any suggestions?

Interesting relational integrity bug

Over in the OTN forums, Wilhelm demonstrates a method for disabling foreign key constraints in such a way that we can delete the parent data and then re-enable the foreign key without throwing an ORA-02298 exception.

The proper syntax for disabling a constraint is this:

SQL> ALTER TABLE temp_child DISABLE CONSTRAINT what_the_fk;

Table altered.

SQL> delete from temp_parent;

2 rows deleted.

SQL> ALTER TABLE temp_child ENABLE VALIDATE CONSTRAINT what_the_fk;
ALTER TABLE temp_child ENABLE VALIDATE CONSTRAINT what_the_fk
*
ERROR at line 1:
ORA-02298: cannot validate (APC.WHAT_THE_FK) - parent keys not found


SQL>

But if we include the mystical keyword VALIDATE in the disabling command we can corrupt our relational integrity quite nicely:

SQL> INSERT INTO temp_parent VALUES(1);

1 row created.

SQL> INSERT INTO temp_parent VALUES(2);

1 row created.

SQL> ALTER TABLE temp_child ENABLE CONSTRAINT what_the_fk;

Table altered.

SQL> ALTER TABLE temp_child DISABLE VALIDATE CONSTRAINT what_the_fk;

Table altered.

SQL> delete from temp_parent;

2 rows deleted.

SQL> ALTER TABLE temp_child ENABLE CONSTRAINT what_the_fk;

Table altered.

SQL> select * from temp_parent;

no rows selected

SQL>

Tuesday, September 18, 2007

What's worse than finding a NULL in a NUMBER column?

Finding half a worm! Oops, wrong punchline. No, the worst thing is finding a zero. Twice in the last week I have had to rewrite some code to handle an application which was using zero as a magic value instead of populating a numeric column with null.

One of these columns was a code which described the nature of some free text held in another column. 1 through 6 were assigned to pre-determined categories but the users can enter text outside of those categories. My program did a lookup to expand those codes with the category description; what I hadn't anticipated was that the rows for extra-mural text would have a code of 0 instead of null. NO_DATA_FOUND. Irritating but simple enough to handle.

The other case was slightly more worrying. Here the column was one of two; the other column had a data type of date. One or the other column or neither could be populated, but not both. Overwhelmingly the date column was populated. But when the date column was null, roughly half the number columns were zero. Given the nature of the data it was not credible that half those rows would really have a value of zero. Most of them should have been null. However, for a handful of those records zero was probably a legitimate value. It was just impossible to tell which rows they were.

And that is the problem with "handling" nulls by using magic values instead. The magic values often need special handling themselves. Also they can subvert our data integrity. For instance, substituting a null with zero may not affect the calculation of a total, but it will change the value of an average.
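A quick illustration of the average point:

-- nulls are ignored by aggregates: the total is 30 and the average 15
with vals as (
    select 10 as n from dual
    union all select 20 from dual
    union all select null from dual
)
select sum(n) as total, avg(n) as average from vals
/
-- swap the null for a zero and the total is still 30,
-- but the average drops to 10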

Nulls are annoying and it is a good idea to avoid them in a data model. But few of us are prepared to go to 6NF. Thus we have columns to which we cannot assign values. So be it. Much better to face the fact and use nulls than to fill our columns with magic values. In a very real sense disguising nulls in such a fashion is just like having an exception handler to suppress exceptions:

...
exception
    when no_data_found then
        null;
...

We need to know.

Thursday, September 13, 2007

Stupid things to do with triggers: Pt.42 in a series of 94

As is well known, the concept of :OLD values in an INSERT trigger makes no sense. Obviously there is no OLD record in such a scenario. What is perhaps less well known is that Oracle will let us build an INSERT trigger which references an :OLD value.

SQL> create or replace trigger t1
2 before insert on t1
3 for each row
4 when ((old.col2 is null) or (new.col3 is null))
5 begin
6 if :old.col2 is null then
7 :new.col3 := 'COL2 is null';
8 elsif :new.col3 is null then
9 :new.col3 := 'COL3 is null';
10 end if;
11 end;
12 /

Trigger created.

SQL>

This means we can use the trigger to corrupt our own data without really trying...

SQL> insert into t1 (col1) values (1)
2 /

1 row created.

SQL> insert into t1 (col1,col2) values (99,56)
2 /

1 row created.

SQL> select * from t1
2 /
COL1 COL2 COL3
---------- ---------- --------------------------------
1 COL2 is null
99 56 COL2 is null

SQL>

Obviously it is unlikely that anybody would deliberately set out to do this. But it is the sort of mistake we might make by confusing the scope of variables between the INSERT and UPDATE events. Another good reason for having separate triggers for each event. Or indeed, if you are of the anti-trigger persuasion, another good reason why triggers should never have been invented in the first place.
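For the avoidance of doubt, the separate-trigger version might look something like this (the trigger names are mine; the update trigger is the only place where :OLD means anything):

create or replace trigger t1_bri
before insert on t1
for each row
when (new.col3 is null)
begin
    -- no :OLD values here: this row is brand new
    :new.col3 := 'COL3 is null';
end;
/
create or replace trigger t1_bru
before update on t1
for each row
when (old.col2 is null)
begin
    :new.col3 := 'COL2 was null';
end;
/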

Wednesday, February 07, 2007

CREATE SCHEMA: a SQL curiosity

Pete Finnegan picked up on my recent piece USER != SCHEMA and linked to his article on the CREATE SCHEMA statement. This was prescient on his part, as I had decided against discussing that statement in the article (for reasons of length). There is nothing wrong with Pete's piece but I thought expanding on CREATE SCHEMA might be helpful, as one of the other people who commented seemed confused about it.

And who wouldn't be? For a start, it's not really CREATE SCHEMA, it's CREATE SCHEMA AUTHORIZATION. ....

SQL> create schema c
2 create table t3 (col1 number, col2 number)
3 /
create schema c
*
ERROR at line 1:
ORA-02420: missing schema authorization clause

SQL>

Furthermore, we are not creating a schema, we are adding new objects to a pre-existing schema.

SQL> create schema authorization c
2 create table t3 (col1 number, col2 number)
3 /
create schema authorization c
*
ERROR at line 1:
ORA-02421: missing or invalid schema authorization identifier


SQL>

The schema authorization identifier is invalid because my database does not have a user C. So let's try again with good ol' user A, who already has some objects in his schema.

SQL> select object_type, object_name from user_objects
2 where object_type in ('TABLE', 'VIEW')
3 /
OBJECT_TYPE OBJECT_NAME
------------------ ----------------
TABLE T1
TABLE T2
VIEW V1

3 rows selected.

SQL> create schema authorization a
2 create view v2 as select * from t2
3 /

Schema created.

SQL>

A misleading response there: the schema already existed. But I suppose "Schema authorization applied" is a bit of a mouthful.

The cool thing about CREATE SCHEMA is that we can put together several CREATE statements and run them as a single transaction. So if one of the CREATE statements fails they all fail. It's the closest Oracle gets to being able to rollback DDL statements.

SQL> create schema authorization a
2 create table t3 (col1 number, col2 number)
3 create table t1 (col1 number, col2 number)
4 /
create table t1 (col1 number, col2 number)
*
ERROR at line 3:
ORA-02425: create table failed
ORA-00955: name is already used by an existing object

SQL> desc t3
ERROR:
ORA-04043: object t3 does not exist

SQL>

Unlike the CREATE TABLE element, a CREATE VIEW that fails doesn't give out the underlying error (at least in 9.2.0.6)....

SQL> create schema authorization a
2 create table t3 (col1 number, col2 number)
3 create view v1 as select * from b.t1
4 /
create schema authorization a
*
ERROR at line 1:
ORA-02427: create view failed
SQL>

As it happens we know the view already exists, so let's presume the error is ORA-955. Normally we could work around that with CREATE OR REPLACE VIEW but ...

SQL> create schema authorization a
2 create table t3 (col1 number, col2 number)
3 create or replace view v1 as select * from b.t1
4 /
create or replace view v1 as select * from b.t1
*
ERROR at line 3:
ORA-02422: missing or invalid schema element

SQL>

CREATE SCHEMA is a "contractual obligation" command: Oracle has it because the ANSI standard says it has to be there. In Oracle SQL only three statements are supported: CREATE TABLE, CREATE VIEW and GRANT. (In DB2 the statement also supports creating indexes.) It also only supports standard SQL, so some proprietary Oracle syntax will cause the statement to hurl. CREATE SCHEMA is actually very restricted in its scope and consequently of limited usefulness. I don't think it can replace a proper regression script for doing database deployments.
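For completeness, here is a statement exercising all three permitted elements in one transaction (assuming a user B to be the grantee):

create schema authorization a
    create table t4 (col1 number, col2 number)
    create view v4 as select col1 from t4
    grant select on v4 to b
/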

There is one last gotcha. Although the CREATE statements bundled with the CREATE SCHEMA statement are transactional and appear to be rollback-able, the statement itself is still plain DDL and issues the implicit commit before it executes.

SQL> insert into t1 values (566, 888)
2 /

1 row created.

SQL> create schema authorization a
2 create table t3 (col1 number, col2 number)
3 create view v1 as select * from b.t1
4 /
create schema authorization a
*
ERROR at line 1:
ORA-02427: create view failed

SQL> rollback
2 /

Rollback complete.

SQL> select * from t1
2 /
COL1 COL2
---------- ----------
566 888
1 AAAAAAAAAA

SQL>

Now, the ability to suspend the transactionality of individual DDLs would be quite helpful. My last installation (which was a change to an existing system) required the deployment of two new schemas with over a hundred tables each, plus indexes, several hundred types, procedures, packages and bodies, not to mention a number of changes to existing schemas. It would be nice to be able to rollback such gargantuan deployments if something goes wrong. But, even if the syntax allowed it, CREATE SCHEMA is not appropriate. Putting all that into a CREATE SCHEMA statement would have resulted in a command almost 2MB long. Debug that! The coming 11g Editioning feature (caveat: BETA!) strikes me as a more practical alternative.

So, CREATE SCHEMA looks like the SQL equivalent of the human appendix. It's there but it doesn't really do anything useful.

Friday, February 02, 2007

USER != SCHEMA

Most of us tend to bandy around the terms USER and SCHEMA as if they were synonyms, but in Oracle they are different objects. USER is the account name, SCHEMA is the set of objects owned by that user. True, Oracle creates the SCHEMA as part of the CREATE USER statement and the SCHEMA has the same name as the USER, but it is quite easy to demonstrate that they are different things.

SQL> conn b/b
Connected.
SQL> select * from my_tab
2 /
OBJECT_ID OBJECT_NAME
---------- ------------------------------
80116 BIG_PK
80114 BIG_TABLE
45709 BP
45710 BP_PK

SQL> grant select on my_tab to a
2 /

Grant succeeded.

SQL> conn a/a
Connected.
SQL> select * from my_tab
2 /
OBJECT_ID OBJECT_NAME
---------- ------------------------------
80404 ACTOR
80414 ACTOR_NT
45707 AP
45708 AP_PK
52765 ASSIGNMENT
49747 A_ID
52768 A_OBJTYP

7 rows selected.

SQL> alter session set current_schema=b
2 /

Session altered.

SQL> select * from my_tab
2 /
OBJECT_ID OBJECT_NAME
---------- ------------------------------
80116 BIG_PK
80114 BIG_TABLE
45709 BP
45710 BP_PK

SQL> select username, schemaname
2 from v$session
3 where sid in (select sid from v$mystat)
4 /
USERNAME SCHEMANAME
--------- -----------
A B

SQL>

The important thing to remember about alter session set current_schema is that it only changes the default schema. It does not change the privileges we have on that schema and it does not change the results when we issue queries that depend upon username, for instance against the USER_ views.

SQL> conn u2/u2
Connected.
SQL> select table_name from user_tables
2 /
TABLE_NAME
------------------------------
T1
T2
T3

SQL> select * from t1
2 /
COL1 COL2
---------- ----------
1 BBBBBBBBBB

SQL> grant select on t1 to u1
2 /

Grant succeeded.

SQL> conn u1/u1
Connected.
SQL> select table_name from user_tables
2 /
TABLE_NAME
------------------------------
T1
T2

SQL> select * from t1
2 /
COL1 COL2
---------- ----------
1 AAAAAAAAAA

SQL> alter session set current_schema=U2
2 /

Session altered.

SQL> select * from t1
2 /
COL1 COL2
---------- ----------
1 BBBBBBBBBB

SQL> select * from t2
2 /
select * from t2
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select * from u1.t2
2 /

no rows selected

SQL> select table_name from user_tables
2 /
TABLE_NAME
------------------------------
T1
T2

SQL>

Of course, the confusion stems partly from the fact that there is a one-to-one correspondence between USER and SCHEMA, and a user's schema shares its name. But the fact that people who ought to know better use them interchangeably (me included) doesn't help matters.

Update


If you found this interesting you might also want to read a piece I have published on the CREATE SCHEMA statement.