Friday, September 18, 2020

The use and misuse of %TYPE and %ROWTYPE attributes in PL/SQL APIs

PL/SQL provides two attributes which allow us to declare a data structure with its datatype derived from a database table or a previously declared variable.

We can use %type attribute for defining a constant, a variable, a collection element, record field or PL/SQL program parameters. While we can reference a previously declared variable, the most common use case is to tie the declaration to a table column. The following snippet declares a variable with the same datatype and characteristics (length, scale, precision) as the SAL column of the EMP table.

l_salary emp.sal%type;
We can use the %rowtype attribute to declare a record variable which matches the projection of a database table or view, or a cursor variable. The following snippet declares a variable with the same projection as the preceeding cursor.
cursor get_emp_dets is
  select emp.empno
         , emp.ename
         , emp.sal
         , dept.dname
  from   emp
  inner join dept on dept,deptno = emp.deptno;        
l_emp_dets get_emp_dets%rowtype;
Using these attributes is considered good practice. PL/SQL development standards will often mandate their use. They deliver these benefits:
  1. self-documenting code: if we see a variable with a definition which references emp.sal%type we can be reasonably confident this variable will be used to store data from the SALARY column of the EMP table.
  2. datatype conformance: if we change the scale or precision of the the SALARY column of the EMP table all variables which use the %type attribute will pick up the change automatically. If we add a new column to the EMP table, all variables defined with the %rowtype attribute will be able to handle that column without us needing to change those programs.
That last point comes with an amber warning: the automatic conformance only works when the %rowtype variable is populated by SELECT * FROM queries. If we are using an explicit projection with named columns then we have now broken our code and we need to fix it. More generally, this silent propagation of changes to our data structures means we need to pay more attention to impact analysis. Is it right that we can just change a column's datatype or amend a table's projection without changing the code which depends on them? Maybe it's okay, maybe not. By shielding us from the immediate impact of broken code, using these attributes also withholds the necessity to revisit our programs: so we have to remember to do it.

Overall I think the benefits listed above outweigh the risks, and I think we should always use these attributes whenever it is appropriate, for the definition of local variables and constants. However, complications arise if we use them to declare PL/SQL program parameters, specifically for procedures in package specs and standalone program units. It's not so bad if we're writing an internal API but it becomes a proper headache when we are dealing with a public API, one which will be called by programs owned by another user, one whose developers are in another team or outside our organisation, or even using Java, dotNet or whatever. So why is the use of these attributes so bad for those people?

  1. obfuscated code: these attributes are only self-documenting when we have a familiarity with the underlying schema, or have easy access to it. This will frequently not be the case for developers in other teams (or outside the organisation) who need to call our API. They may be able to guess at the datatype of SALARY or HIREDATE, but they really shouldn't have to. And, of course, a reference to emp%rowtype is about as unhelpful as it could be. Particularly when we consider ...
  2. loss of encapsulation: one purpose of an API is to shield consumers of our application from the gnarly details of its implementation. However, the use of %type and %rowtype is actually exposing those details. Furthermore, a calling program cannot define their own variables using these attributes unless we grant them SELECT on the tables. Otherwise the declaration will hurl PLS-00201. This is particularly problematic for handling %rowtype, because we need to define a record variable which matches the row structure.
  3. breaking the contract: an interface is an agreement between the provider and the calling program. The API defines input criteria and in return guarantees outcomes. It forms a contract, which allows the consumer to write code against stable definitions. Automatically propagating changes in the underlying data structures to parameter definitions creates unstable dependencies. It is not simply that the use of %type and %rowtype attributes will cause the interface to change automatically, the issue is that there is no mechanism for signalling the change to an API's consumers. Interfaces demand stable dependencies: we must manage any changes to our schema in a way which ideally allows the consumers to continue to use the interface without needing to change their code, but at the very least tells them that the interface has changed.

Defining parameters for public APIs

The simplest solution is to use PL/SQL datatypes in procedural signatures. These seem straightforward. Anybody can look at this function and understand that input parameter is numeric and the returned value is a string.
function get_dept_manager (p_deptno in number) return varchar2;
So clear but not safe. How long is the returned string? The calling program needs to know, so it can define an appropriately sized variable to receive it. Likewise, in this call, how long is can a message be?
procedure log_message (p_text in varchar2);
Notoriously we cannot specify length, scale or precision for PL/SQL parameters. But the calling code and the called code will write values to concretely defined types. The interface needs to communicate those definitions. Fortunately PL/SQL offers a solution: subtypes. Here we have a substype which explicitly defines the datatype to be used for passing messages:
subtype st_message_text is varchar2(256);

procedure log_message (p_text in st_message_text);
Now the calling program knows the maximum permitted length of a message and can trim its value accordingly. (Incidentally, the parameter is still not constrained in the called program so we can pass a larger value to the log_message() procedure: the declared length is only enforced when we assign the parameter to something concrete such as a local variable.)

We can replace %rowtype definitions with explicit RECORD defintions. So a function which retrieves the employee records for a department will look something like this:

subtype st_deptno is number(2,0);

type r_emp is record(
  empno          number(4,0),
  ename          varchar2(10),
  job            varchar2(9),
  mgr            number(4,0),
  hiredate       date
  sal            number(7,2),
  comm           number(7,2),
  deptno         st_deptno
);
  
type t_emp is table of r_emp;  

function get_dept_employees (p_deptno in st_deptno) return t_emp;
We do this for all our public functions.
subtype st_manager_name is varchar2(30);

function get_dept_manager (p_deptno in st_deptno) return st_manager_name;
Now the API clearly documents the datatypes which calling programs need to pass and which they will receive as output. Crucially, this approach offers stability: the datatype of a parameter cannot be changed invisibly, as any change must be implemented in a new version of the publicly available package specification. Inevitably this imposes a brake on our ability to change the API but we ought not to be changing public APIs frequently. Any such change should arise from either new knowledge about the requirements or a bug in the data model. Wherever possible we should try to handle bugs internally within the schema. But if we have to alter the signature of a procedure we need to communicate the change to our consumers as far ahead of time as possible. Ideally we should shield them from the need to change their code at all. One way to achieve that is Edition-Based Redefinition. Other ways would be to deploy the change with overloaded procedures or even using a different procedure name, and deprecate the old procedure. Occasionally we might have no choice but to apply the change and break the API: sometimes with public interfaces the best we can do is try to annoy the fewest number of people.

Transitioning from a private to a public interface

There is a difference between internal and public packages. When we have procedures which are intended for internal usage (i.e. only called by other programs in the same schema) we can define their parameters with %type and %rowtype attributes. We have access and - it is to be hoped! - familiarity with the schema's objects, so the datatype anchoring supports safer coding. But what happens when we have a package which we wrote as an internal package but now we need to expose its functionality to a wider audience? Should we re-write the spec to use subtypes instead?

No. The correct thing to do is to write a wrapper package which acts as a facade over the internal one, and grant EXECUTE privileges on the wrapper. The wrapper package will obviously have the requisite subtype definitions in the spec, and procedures declared with those subtypes. The package body will likely consist of nothing more than those procedures, which simply call their equivalents in the internal package. There may be some affordances for translating data structures, such as populating a table %rowtype variable from the public record type, but those will usually be necessary only for the purposes of documentation (this publicly defined subtype maps to this internally defined table column). There is an obvious overhead to writing another package, especially one which is really just a pass-through to the real functionality, but there are clear benefits which justify the overhead:

  • Stability. Not re-writing an existing package is always a good thing. Even if we are mechanically just replacing one set of datatype definitions with a different set which have the same characteristics we are still changing the core system, and that's a chunk of regression testing we've just added to the task.
  • Least privilege escalation. Even if the internal package has been written with a firm eye on the SOLID principles, the chances are it contains more functionality than we need to expose to other consumers. Writing a wrapper package gives us the opportunity to grant access to only the required procedures.
  • Composition. It is also likely that the internal package doesn't have the exact procedure the other team needs. Perhaps there are actually two procedures they need to call, or there's one procedure but it has some confusing internal flags in its signature. Instead of violating the Law of Demeter we can define one simple procedure in the wrapper package spec and handle the internal complexity in the body.
  • Future proofing. Writing a wrapper package gives us an affordance where we can handle subsequent changes in the internal data model or functionality without affecting other consumers. By definition a violation of YAGNI, but as it's not the main reason why we're doing this I'm allowing this as a benefit.

Design is always a trade off

The use of these attributes is an example of the nuances which Coding Standards often lack. In many situations their use is good practice, and we should employ them in those cases. But we also need to know when their use is a bad practice, and why, so we can do something better instead.

Part of the Designing PL/SQL Programs series

No comments: