Friday, September 18, 2020

The use and misuse of %TYPE and %ROWTYPE attributes in PL/SQL APIs

PL/SQL provides two attributes which allow us to declare a data structure with its datatype derived from a database table or a previously declared variable.

We can use %type attribute for defining a constant, a variable, a collection element, record field or PL/SQL program parameters. While we can reference a previously declared variable, the most common use case is to tie the declaration to a table column. The following snippet declares a variable with the same datatype and characteristics (length, scale, precision) as the SAL column of the EMP table.

l_salary emp.sal%type;
We can use the %rowtype attribute to declare a record variable which matches the projection of a database table or view, or a cursor variable. The following snippet declares a variable with the same projection as the preceeding cursor.
cursor get_emp_dets is
  select emp.empno
         , emp.ename
         , emp.sal
         , dept.dname
  from   emp
  inner join dept on dept,deptno = emp.deptno;        
l_emp_dets get_emp_dets%rowtype;
Using these attributes is considered good practice. PL/SQL development standards will often mandate their use. They deliver these benefits:
  1. self-documenting code: if we see a variable with a definition which references emp.sal%type we can be reasonably confident this variable will be used to store data from the SALARY column of the EMP table.
  2. datatype conformance: if we change the scale or precision of the the SALARY column of the EMP table all variables which use the %type attribute will pick up the change automatically. If we add a new column to the EMP table, all variables defined with the %rowtype attribute will be able to handle that column without us needing to change those programs.
That last point comes with an amber warning: the automatic conformance only works when the %rowtype variable is populated by SELECT * FROM queries. If we are using an explicit projection with named columns then we have now broken our code and we need to fix it. More generally, this silent propagation of changes to our data structures means we need to pay more attention to impact analysis. Is it right that we can just change a column's datatype or amend a table's projection without changing the code which depends on them? Maybe it's okay, maybe not. By shielding us from the immediate impact of broken code, using these attributes also withholds the necessity to revisit our programs: so we have to remember to do it.

Overall I think the benefits listed above outweigh the risks, and I think we should always use these attributes whenever it is appropriate, for the definition of local variables and constants. However, complications arise if we use them to declare PL/SQL program parameters, specifically for procedures in package specs and standalone program units. It's not so bad if we're writing an internal API but it becomes a proper headache when we are dealing with a public API, one which will be called by programs owned by another user, one whose developers are in another team or outside our organisation, or even using Java, dotNet or whatever. So why is the use of these attributes so bad for those people?

  1. obfuscated code: these attributes are only self-documenting when we have a familiarity with the underlying schema, or have easy access to it. This will frequently not be the case for developers in other teams (or outside the organisation) who need to call our API. They may be able to guess at the datatype of SALARY or HIREDATE, but they really shouldn't have to. And, of course, a reference to emp%rowtype is about as unhelpful as it could be. Particularly when we consider ...
  2. loss of encapsulation: one purpose of an API is to shield consumers of our application from the gnarly details of its implementation. However, the use of %type and %rowtype is actually exposing those details. Furthermore, a calling program cannot define their own variables using these attributes unless we grant them SELECT on the tables. Otherwise the declaration will hurl PLS-00201. This is particularly problematic for handling %rowtype, because we need to define a record variable which matches the row structure.
  3. breaking the contract: an interface is an agreement between the provider and the calling program. The API defines input criteria and in return guarantees outcomes. It forms a contract, which allows the consumer to write code against stable definitions. Automatically propagating changes in the underlying data structures to parameter definitions creates unstable dependencies. It is not simply that the use of %type and %rowtype attributes will cause the interface to change automatically, the issue is that there is no mechanism for signalling the change to an API's consumers. Interfaces demand stable dependencies: we must manage any changes to our schema in a way which ideally allows the consumers to continue to use the interface without needing to change their code, but at the very least tells them that the interface has changed.

Defining parameters for public APIs

The simplest solution is to use PL/SQL datatypes in procedural signatures. These seem straightforward. Anybody can look at this function and understand that input parameter is numeric and the returned value is a string.
function get_dept_manager (p_deptno in number) return varchar2;
So clear but not safe. How long is the returned string? The calling program needs to know, so it can define an appropriately sized variable to receive it. Likewise, in this call, how long is can a message be?
procedure log_message (p_text in varchar2);
Notoriously we cannot specify length, scale or precision for PL/SQL parameters. But the calling code and the called code will write values to concretely defined types. The interface needs to communicate those definitions. Fortunately PL/SQL offers a solution: subtypes. Here we have a substype which explicitly defines the datatype to be used for passing messages:
subtype st_message_text is varchar2(256);

procedure log_message (p_text in st_message_text);
Now the calling program knows the maximum permitted length of a message and can trim its value accordingly. (Incidentally, the parameter is still not constrained in the called program so we can pass a larger value to the log_message() procedure: the declared length is only enforced when we assign the parameter to something concrete such as a local variable.)

We can replace %rowtype definitions with explicit RECORD defintions. So a function which retrieves the employee records for a department will look something like this:

subtype st_deptno is number(2,0);

type r_emp is record(
  empno          number(4,0),
  ename          varchar2(10),
  job            varchar2(9),
  mgr            number(4,0),
  hiredate       date
  sal            number(7,2),
  comm           number(7,2),
  deptno         st_deptno
);
  
type t_emp is table of r_emp;  

function get_dept_employees (p_deptno in st_deptno) return t_emp;
We do this for all our public functions.
subtype st_manager_name is varchar2(30);

function get_dept_manager (p_deptno in st_deptno) return st_manager_name;
Now the API clearly documents the datatypes which calling programs need to pass and which they will receive as output. Crucially, this approach offers stability: the datatype of a parameter cannot be changed invisibly, as any change must be implemented in a new version of the publicly available package specification. Inevitably this imposes a brake on our ability to change the API but we ought not to be changing public APIs frequently. Any such change should arise from either new knowledge about the requirements or a bug in the data model. Wherever possible we should try to handle bugs internally within the schema. But if we have to alter the signature of a procedure we need to communicate the change to our consumers as far ahead of time as possible. Ideally we should shield them from the need to change their code at all. One way to achieve that is Edition-Based Redefinition. Other ways would be to deploy the change with overloaded procedures or even using a different procedure name, and deprecate the old procedure. Occasionally we might have no choice but to apply the change and break the API: sometimes with public interfaces the best we can do is try to annoy the fewest number of people.

Transitioning from a private to a public interface

There is a difference between internal and public packages. When we have procedures which are intended for internal usage (i.e. only called by other programs in the same schema) we can define their parameters with %type and %rowtype attributes. We have access and - it is to be hoped! - familiarity with the schema's objects, so the datatype anchoring supports safer coding. But what happens when we have a package which we wrote as an internal package but now we need to expose its functionality to a wider audience? Should we re-write the spec to use subtypes instead?

No. The correct thing to do is to write a wrapper package which acts as a facade over the internal one, and grant EXECUTE privileges on the wrapper. The wrapper package will obviously have the requisite subtype definitions in the spec, and procedures declared with those subtypes. The package body will likely consist of nothing more than those procedures, which simply call their equivalents in the internal package. There may be some affordances for translating data structures, such as populating a table %rowtype variable from the public record type, but those will usually be necessary only for the purposes of documentation (this publicly defined subtype maps to this internally defined table column). There is an obvious overhead to writing another package, especially one which is really just a pass-through to the real functionality, but there are clear benefits which justify the overhead:

  • Stability. Not re-writing an existing package is always a good thing. Even if we are mechanically just replacing one set of datatype definitions with a different set which have the same characteristics we are still changing the core system, and that's a chunk of regression testing we've just added to the task.
  • Least privilege escalation. Even if the internal package has been written with a firm eye on the SOLID principles, the chances are it contains more functionality than we need to expose to other consumers. Writing a wrapper package gives us the opportunity to grant access to only the required procedures.
  • Composition. It is also likely that the internal package doesn't have the exact procedure the other team needs. Perhaps there are actually two procedures they need to call, or there's one procedure but it has some confusing internal flags in its signature. Instead of violating the Law of Demeter we can define one simple procedure in the wrapper package spec and handle the internal complexity in the body.
  • Future proofing. Writing a wrapper package gives us an affordance where we can handle subsequent changes in the internal data model or functionality without affecting other consumers. By definition a violation of YAGNI, but as it's not the main reason why we're doing this I'm allowing this as a benefit.

Design is always a trade off

The use of these attributes is an example of the nuances which Coding Standards often lack. In many situations their use is good practice, and we should employ them in those cases. But we also need to know when their use is a bad practice, and why, so we can do something better instead.

Part of the Designing PL/SQL Programs series

Wednesday, September 09, 2020

Ten character classes your project team needs

A dungeon-crawling party requires a good mix of character classes to be successful. If everyone is a wizard there's nobody who can fight off the orc warband. Similarly, a software development team needs a range of character traits and aspects to successfully deliver working software which meets the project's goals. Here's my take.

Scavenger

The Scavenger understands the importance of not re-inventing the wheel. To this end they acquire an encyclopaedic understanding of our languages' built-in libraries, the existing features of our system and other systems in the wider organisation, and open-source libraries.

Unless given a precise list of the project's wants a Scavenger will become a Mutant Renegade, scouring the post-atomic wasteland for useless relics, which are broken or undocumented or both.

Aspect: "Here's one we made earlier"
Traits: Focused laziness, Unfocused research, GitHub

Rat King

The Rat King understands that software development is a communal task. Consequently they work to forge a collection of disparate individuals into a team. Their remit includes facilitating meetings and arranging after-work socials. Despite their fearsome appearance and collectivist instincts the Rat King is extremely sensitive to what each person brings to the party, and strives to ensure that introverts and teetotallers are included without feeling pressurized.

Unless met with a smidgeon of friendly scepticism a Rat King will become a Facebook.

Aspect: "We must hang together or we will surely hang separately"
Traits: Teamwork, Communications, Contacts

Paladin

The Paladin is the defender of the project but is also committed to the ideal of a project which is worth defending. They ensure everybody follows best practice, adheres to coding standards and observes the agile ceremonies.

Unless there's a Rogue to balance them a Paladin will become a Grand Inquisitor (although maybe without the thumbscrews).

Aspect: "Just do it right"
Traits: Rigour, Weird inner light

Rogue

The Rogue is pragmatic where the Paladin is dogmatic. They have a swashbuckling approach to getting things done. They understand the concept of technical debt, they just tip the trade-off toward delivering stuff over following the rules. Very fond of observing that there's no such thing as "best practice".

Unless kept in line a Rogue will become a Cowboy. Yee-hah!

Aspect: "Let's do the show right here"
Traits: Resourcefulness, Acute bullshit detector, Cynefin

Mad Scientist

The Mad Scientist has a deep technical understanding of software development, both practice and theory. They are obsessed with innovative and extremely clever solutions to business problems.

Unless your business problem actually requires an extremely clever solution a Mad Scientist will become an Evil Supervillain, who will derail the project (but, to be fair, will not destroy the entire planet. Probably).

Aspect: "My monster lives!"
Traits: Single-mindedness, Visionary

Lab Assistant

The Lab Assistant is vital to delivering the work of a Mad Scientist. They document APIs on wiki pages, they write build scripts and unit tests, they productionize the PoC code. In short, they undertake all the tedious essential tasks which would distract a Mad Scientist from their creation.

A Lab Assistant to an Evil Supervillain is still a Lab Assistant, but the wiki pages are half-complete, the unit tests don't run and the code isn't fit to be checked into source control.

Aspect: "Here is the brain you wanted"
Traits: Flexibility, Service to the higher cause

Major-domo

The Major-domo helps the project run smoothly by taking care of all the little things everybody else forgets. They clean the whiteboard before a meeting starts, they bring Sharpies and Post-It notes to the retrospective, they write Jiras for the stories we just agreed we needed and they circulate minutes after decision-making meetings.

Unless other people occasionally do some of these tasks a Major-domo will become a Resentful Skivvy.

Aspect: "I'll add that to my To-Do list"
Traits: Well-stocked stationary cupboard, Scrivener

Court Jester

The Court Jester says out loud the things everybody else is thinking. They aren't afraid to appear ridiculous in order to make a point. Their role is to speak truth to power.

If they go too far a Jester becomes an Angry Ranter, ignored and shunned by everybody.

Aspect: "The true fool stays silent in the face of foolishness"
Traits: Humour, Insight, Lack of inhibition

Bounty Hunter

The Bounty Hunter lives for finding and fixing bugs. They are never happier when writing test cases to reproduce a bug or stepping through lines of code in debug mode. They understand that fixing production code is more important than delivering a new feature.

Unless kept on a tight leash a Bounty Hunter will become a Mindless Delver or a Tinkerer.

Aspect: “To defeat the bug, we must understand the bug”
Traits: Sense of purpose, Perseverance, Debugging

Druid

The Druid has an understanding of ecosystem beyond our project's bounded context. They know what the business seeks to achieve and how our project furthers those goals. They also know about other projects in the organisation, and work to ensure our project integrates with them harmoniously.

Unless given a clear sense of our project's priorities and direction a Druid is still a Druid, just servicing the needs of other projects.

Aspect: "Listen to the trees, dude"
Traits: Awareness, Empathy, Balance

Multi-faceted characters

Obviously these aren't main character classes. A project team comprises base classes such as Developers, Testers, Analysts, Architects, heck maybe even a Project Manager. What I list here are ancillary classes, which modify a base class. An Architect, a Developer, an Analyst or a Project Manager can benefit from having a touch of the Druid about them. Any Developer should spend some time being a Bounty Hunter or Scavenger. Different circumstances demand different class behaviours. When there's a major outage in Production we need Rogues to fix it, not Paladins muttering about process and sign-off. But after Production is back it is the Paladins who make sure the problem and its resolution are properly documented, and appropriate preventative measures put in place. Most people on the team will flow through several of these class behaviours, even over the course a single sprint.

When we're forming a new team to deliver some piece of software we focus on the hard skills. the main character classes. We need this many Developers, this many Analysts, a UX expert, an SEO specialist, and so forth. These are the easy things to define. But the success of the project will in large part depend on the soft skills and temperaments of the individuals in the team. This is a lot harder to measure. It's why personality tests like Myers-Briggs and Insights exist: some people think they're hokum but they provide a framework for assessing the make-up of a team in an age when we're uncomfortable casting horoscopes or taking auguries from the liver of a freshly-slaughtered goose. Using RPG character classes as metaphors for desirable behaviours has the advantage of jokiness. There is a categorical absence of pyschological research underpinning this article. Also it doesn't require us to obtain live waterfowl.

One last thing. The next time you find yourself at a retrospective with no marker pens and nothing to write on, look around for a Major-domo. And if you can't spot one why not appoint yourself to the role?

Epilogue

That final paragraph makes me sad for the times when retrospectives happened in a room with other people, with a whiteboard covered in post-It notes. Let's hope we can do them like that again.

Tuesday, August 25, 2020

How PL/SQL Development Standards work

I have been gigging at a place which has documented PL/SQL Development Standards. This is not so unusual: most Oracle shops have such a document. What makes it unusual is that they enforce the standards. With code reviews. And I mean properly enforce: programs fail QA for egregious breaches of the standards or a sufficient accumulation of minor breaches. This is less common than it ought to be.

Many coders are sceptical about development standards; I have been in the past. Standards generally focus on things which are easy to standardise (indentation, case, naming conventions) rather functional correctness or design principles. They frequently codify arbitrary or outdated practices (mandating explicit cursors is a particular bugbear of mine). They either go into so much detail that they are unreadably long (and dull) or are so sketchy that they operate as easy-to-ignore guidelines. But I think many experienced developers' objections boil down to: I don't like being told how to write my code; my style is the best style; my code is clean, clear and readable.

The catch is, readability is not simply a function of personal style: it emerges from consistency across the entire codebase. Just because I find my personal coding style clear doesn't mean everbody else will. At the very least a colleague reading my program will have to invest time in understanding how I name my variables, how I use table aliases, and a dozen other things, none of them important individually but all together adding friction to the crucial task of understanding how a program works (or does not work).

This particular set of standards certainly had a lot to say about layout. Many strictures fitted with my natural coding style (all lower case, one column per line in a SQL projection, comma before the column name rather than after it). Others were rather tiresome: the rules for clause alignment entail a lot of spacing and backspacing to ensure elements line up. There are a few strictures I actively disagree with (notably mandatory use of SQL-89 syntax i.e. impicit joins). But here's the rub: I didn't get to pick and choose which of the standards I followed. I just had to knuckle down and follow them all. Because the discipline of the code review meant my programs failed QA when I hadn't applied the standards.

There's more to consistency than just layout and naming conventions. There's also functional consistency: use of SQL and PL/SQL idioms, how to organise programs within a package, and so forth. Too many things to cover in a single document. But again, code reviews enforce standardisation of these aspects, by applying undocumented conventions with the same rigour as documented standards. A couple of times I tripped over such an undocumented convention and it didn't feel fair: my code failed the review because I wrote something which was wrong even though not explicitly covered by the standards. One of these times it was something awry in the layout. "That's wrong", the reviewer said. It was a difference I hadn't even noticed, and probably you wouldn't have noticed either, and even if I had have noticed it I wouldn't have thought it was wrong. But it was different from what everybody else was doing. That made it wrong.

Everybody undertakes code reviews and everybody's code is reviewed. Thus code reviews shape the codebase, by enforcing documented standards and undocumented conventions. As a result this is the most readable codebase I have ever worked on. It's almost impossible to tell who wrote any given program, because all programs look the same. It's easy to reason about a piece of code because it follows rigorous naming conventions and consistent architectural principles. The code is habitable. A colleague can read a program I wrote and feel comfortable doing so. The layout, the naming conventions, the consistent selection of one approach in situations where PL/SQL offers more than one way of doing something, all these factors mean my program looks just like the program anybody else would have written. So the reader is freer to understand what the program actually does and how it works. Standardisation reduces friction.

It is a virtuous circle. Code reviews enforce a consistent programming style, which eliminates trivial (i.e. non-functional) differences in the program. In turn this makes the program easier to review: all the programs look basically the same which highlights the things which need to be different, the business logic and the data structures.

Readability is a feature

Readility is a feature. It's a feature our code must have. We all know readability makes code easier to maintain, easier to re-use, easier to debug. Yet still many developers bridle at the suggestion that their PL/SQL must look like everybody else's PL/SQL. I get this. It's not that I think the way I write PL/SQL is intrinsically correct, it just looks the way I have evolved to write it over the years. A new set of coding standards, rigorously applied, disrupts my flow. I must slow down to correct the variable names or fix the layout. It's tedious.

Tedious but also necessary. A sofware system is a shared enterprise. It's not "my" code, it's the project's code; I am just the person checking it into source control. As a discipline, programming is a craft not an art. PL/SQL is simply a device for turning data into business value. It's more important that other people on the team can work with our code than that it has our signature style. So let's not be precious about appearance. We must follow the rules, and save our self-expression for our poems and our tweets.

Above all, know this: there are no development standards without code reviews.

Thursday, July 30, 2020

Minimal declaration of foreign key columns

Here is the full declaration of an inline foreign key constraint (referencing a primary key column on a table called PARENT):
, parent_id number(12,0) constraint chd_par_fk foreign key references parent(parent_id)
But what is the fewest number of words required to implement the same constraint? Two. This does exactly the same thing:
, parent_id references parent
The neat thing about this minimalist declaration is the child column inherits the datatype of the referenced primary key column. Here's what it looks like (with an odd primary key declaration, just to prove the point):
 
SQL> create table parent1
  2  (parent_id number(15,3) primary key)
  3  /

Table PARENT1 created.

SQL> create table child1
  2  ( id        number(12,0) primary key
  3   ,parent_id references parent1) 
  4  /

Table CHILD1 created.

SQL> desc child1
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL> 
If we want to specify a name for the foreign key we need to include the constraint keyword:
 
SQL> create table parent2
  2  (parent_id number(15,3) constraint par1_pk primary key)
  3  /

Table PARENT2 created.

SQL> create table child2
  2  ( id        number(12,0) constraint chd2_pk primary key
  3   ,parent_id              constraint chd2_par2_fk references parent2) 
  4  /

Table CHILD2 created.

SQL> desc child2
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL> 
This minimal declaration always references the parent table's primary key. Suppose we want to reference a unique key rather than the primary key. (I would regard this as a data model smell, but sometimes we need to do it.) To make this work we need merely explicitly reference the unique key column:
SQL> create table parent3
  2  ( parent_id  number(15,3)          constraint par3_pk primary key
  3   ,parent_ref varchar2(16) not null constraint par3_uk unique
  4  )
  5  /

Table PARENT3 created.

SQL> create table child3
  2  ( id         number(12,0) constraint chd3_pk primary key
  3   ,parent_ref              constraint chd3_par3_fk references parent3(ref)) 
  4  /

Table CHILD3 created.

SQL> desc child3
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_REF          VARCHAR2(16) 
SQL> 
Hmmm, neat. What if we have a compound primary key? Well, that's another data model smell but it still works. Because we're constraining multiple columns we need to use a table level constraint and so the syntax becomes more verbose; we need to include the magic words foreign key:
SQL> create table parent4
  2  ( parent_id  number(15,3)   
  3   ,parent_ref varchar2(16) 
  4   ,constraint par4_pk primary key (id, ref)
  5  )
  6  /

Table PARENT4 created.

SQL> create table child4
  2  ( id number(12,0) constraint chd4_pk primary key
  3   ,parent_id
  4   ,parent_ref
  5   ,constraint chd4_par4_fk foreign key (parent_id, parent_ref) references parent4) 
  6  /

Table CHILD4 created.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL> 
Okay, but supposing we change the declaration of the parent column, does Oracle ripple the change to the child table?
 
SQL> alter table parent4 modify parent_ref varchar2(24);

Table PARENT4 altered.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL> 
Nope. And rightly so. This minimal syntax is a convenience when we're creating a table, but there's no object-style inheritance mechanism.

Generally I prefer a verbose declaration over minimalism, because clarity trumps concision. I appreciate the rigour of enforcing the same datatype on both ends of a foreign key constraint. However, I hope that in most cases our CREATE TABLE statements have been generated from a data modelling tool. So I think this syntactical brevity is a neat thing to know about, but of limited practical use.