Radio Free Tooting: 2017

Sunday, December 31, 2017

Data Access Layer vs Table APIs

One of the underlying benefits of PL/SQL APIs is the enabling of data governance. Table owners can shield their tables behind a layer of PL/SQL. Other users have no access to the tables directly but only through stored procedures. This confers many benefits:

Calling programs code against a programmatic interface. This frees the table owner to change the table's structure whenever it's necessary without affecting its consumers.
Likewise the calling programs get access to the data they need without having to know the details of the table structure, such as technical keys.
The table owner can use code to enforce complicated business rules when data is changed.
The table owner can enforce sophisticated data access policies (especially for applications using Standard Edition without DBMS_RLS).

So naturally the question arises, is this the same as Table APIs?

Table APIs used to be a popular approach to encapsulating tables. The typical Table API comprised two packages per table; one package provided methods for inserting, updating and deleting records, and the other package provided query methods. The big attraction of Table APIs was that they could be 100% generated from the data dictionary - both Oracle Designer and Steven Feuerstein's QNXO library provided TAPI generators. And they felt like good practice because, y'know, access to the tables was shielded by a PL/SQL layer.

But there are several problems with Table APIs.

The first is that they entrench row-by-agonising-row processing. Table APIs have their roots in early versions of Oracle so the DML methods only worked with a single record. Even after Oracle 8 introduced PL/SQL collection types TAPI code in the wild tended to be RBAR: there seems to something in the brain of the average programmer which predisposes them to prefer loops executing procedural code rather than set operations.

The second is that they prevent SQL joins. Individual records have to be selected from one table to provide keys for looking up records in a second table. Quite often this leads to loops within loops. So-called PL/SQL joins prevent the optimizer from choosing good access paths when handling larger amounts of data.

The third issue is that it is pretty hard to generate methods for all conceivable access paths. Consequently the generated packages had a few standard access paths (primary key, indexed columns) and provided an dynamic SQL method which accepted a free text WHERE clause. Besides opening the package to SQL injection this also broke the Law of Demeter: in order to pass a dynamic WHERE clause the calling program needed to know the structure of the underlying table, which defeats the whole objective of encapsulation.

Which leads on to the fourth, more philosophical problem with Table APIs: there is minimal abstraction. Each package is generated so it fits very closely to the structure of the Table. If the table structure changes we have to regenerate the TAPI packages: the fact that this can be done automatically is scant recompense for the tight coupling between the Table and the API.

So although Table APIs could be mistaken for good practice in actuality they provide no real benefit. The interface is 1:1 with the table structure so it has no advantage over granting privileges on the table. Combined with the impact of RBAR processing and PL/SQL joins on performance and the net effect of Table APIs is disastrous.

We cannot generate good Data Access APIs: we need to write them. This is because the APIs should be built around business functions rather than tables. The API packages granted to other users should comprise procedures for executing transactions. A Unit Of Work is likely to touch more than one table. These have to be written by domain experts who understand the data model and the business rules.

Part of the Designing PL/SQL Programs series

Friday, December 29, 2017

On hitting 100K on StackOverflow

100,000 is just another number. It's one more than 99,999. And yet, and yet. We live in a decimal cultural. We love to see those zeroes roll up. Order of magnitude baby! It's the excitement of being a child, going on a journey in the family car when the odometer reads 99994. knowing you'll see 100000. Of course everybody got distracted by the journey and next time you look at the dial it reads 100002.

Earlier this year my StackOverflow reputation passed 100,000. Like the car journey I missed the actual moment. My rep had been 99,986 when I last checked the previous evening and 100,011 the next day. Hey ho.

Reputation is a big deal on StackOverflow because it is the prime measure of contribution. As a Q&A site (not a forum - that confuses a lot of people) it needs content, it needs good questions and good answers. Reputation points are the reward for good posts. In this context good is determined democratically: people vote up good questions and good answers, and - crucially - vote down poor questions and answers. Votes are the main way of gaining reputation points: +5 for an upvoted question, +10 for an upvoted answer and +15 for an accepted answer. (There are other ways of gaining - and losing - rep) but posting is the main one.

"Reputation is a rough measurement of how much the community trusts you; it is earned by convincing your peers that you know what you’re talking about." Meta Stack Exchange FAQ

So is reputation just a way of keeping score? Nope: it is gamification but there is more to it than that. Reputation means points and what do points make? ~~Prizes~~ Privileges. StackOverflow is largely a self-policing community. There are full-on (elected) moderators but most moderation is actually carried out by regular SO users with sufficient rep. Somebody has asked an unclear question: once you have 50 rep you can post a comment asking for clarification. Got a user who doesn't know how to turn off the CAPSLOCK key? With 2000 rep you can just edit their post and apply sentence case. And so on.

Hmmm, so StackOverflow rewards its keenest contributors by allowing them to do chores around the site. Yes and it works. One of the big problems with forums is other users. Not griefers as such but there are a lot of low-level irritations: users who don't know how to search the site, or how to format their posts, or just generally fail to understand etiquette. Granting increasing moderation privileges at reputation milestones allows committed users to smooth away soem of those irritations.

But still, getting to 100,000 took eight years and almost 3000 answers. Was it worth it? Of course. It's nice to give back to the community. We are here to help: upvotes and accepted answers provide a nice feedback that we've succeeded. Downvotes also provide a necessary corrective (even if it is annoying when some rando dings you on an answer from five years back without leaving comment). And while there are no prizes, when you get to 100,000 you do get swag. A big box of swag:

Here is the box with a standard reference pear so you can see just how big it is.

Inside there is - a pen ....

Some stickers ....

A StackOverflow T-shirt (I have negotiated with my better half to keep this one) ...

And an over-sized coffee mug...

One more thing. There are also badges. Badges are nudges to encourage desirable behaviour such as editing posts, voting in moderator elections, reviewing posts, offering bounties, being awesome. Because let's face it, badges are cool. More badges = more flair. And who doesn't want more flair? Got flair? Heck yeah!

Wednesday, May 31, 2017

Avoiding Coincidental Cohesion

Given that Coincidental Cohesion is bad for our code base so obviously we want to avoid writing utilities packages. Fortunately it is mostly quite easy to do so. It requires vigilance on our part. Utilities packages are rarely planned. More often we are writing a piece of business functionality when we find ourselves in need of some low level functionality. It doesn't fit in the application package we're working on, perhaps we suspect that it might be more generally useful, so we need somewhere to put it.

The important thing is to recognise and resist the temptation of the Utilities package. The name itself (and similarly vague synonyms like helper or utils) should be a red flag. When we find ourselves about to type create or replace package utilities we need to stop and think: what would be a better name for this package? Consider whether there are related functions we might end up needing? Suppose we're about to write a function to convert a date into Unix epoch string. It doesn't take much imagine to think we might need a similar function to convert a Unix timestamp into a date. We don't need to write that function now but let's start a package dedicated to Time functions instead of a miscellaneous utils package.

Looking closely at the programs which comprise the DBMS_UTILITY package it is obviously unfair to describe them as a random selection. In fact that there seven or eight groups of related procedures.

DB Info

INSTANCE_RECORD Record Type
DBLINK_ARRAY Table Type
INSTANCE_TABLE Table Type
ACTIVE_INSTANCES Procedure
CURRENT_INSTANCE Function
DATA_BLOCK_ADDRESS_BLOCK Function
DATA_BLOCK_ADDRESS_FILE Function
DB_VERSION Procedure
GET_ENDIANNESS Function
GET_PARAMETER_VALUE Function
IS_CLUSTER_DATABASE Function
MAKE_DATA_BLOCK_ADDRESS Function
PORT_STRING Function

Runtime Messages

FORMAT_CALL_STACK Function
FORMAT_ERROR_BACKTRACE Function
FORMAT_ERROR_STACK Function

Object Management

COMMA_TO_TABLE Procedures
COMPILE_SCHEMA Procedure
CREATE_ALTER_TYPE_ERROR_TABLE Procedure
INVALIDATE Procedure
TABLE_TO_COMMA Procedures
VALIDATE Procedure

Object Info (Object Management?)

INDEX_TABLE_TYPE Table Type
LNAME_ARRAY Table Type
NAME_ARRAY Table Type
NUMBER_ARRAY Table Type
UNCL_ARRAY Table Type
CANONICALIZE Procedure
GET_DEPENDENCY Procedure
NAME_RESOLVE Procedure
NAME_TOKENIZE Procedure

Session Info

OLD_CURRENT_SCHEMA Function
OLD_CURRENT_USER Function

SQL Manipulation

EXPAND_SQL_TEXT Procedure
GET_SQL_HASH Function
SQLID_TO_SQLHASH Function

Statistics (deprecated))

ANALYZE_DATABASE Procedure
ANALYZE_PART_OBJECT Procedure
ANALYZE_SCHEMA Procedure

Time

GET_CPU_TIME Function
GET_TIME Function
GET_TZ_TRANSITIONS Procedure

Unclassified

WAIT_ON_PENDING_DML Function
EXEC_DDL_STATEMENT Procedure
GET_HASH_VALUE Function
IS_BIT_SET Function

We can see an alternative PL/SQL code suite, with several highly cohesive packages. But there will be some procedures which are genuinely unrelated to anything else. The four procedures in the Unclassified section above are examples. But writing a miscellaneous utils package for these programs is still wrong. There are better options.

Find a home. It's worth considering whether we already have a package which would fit the new function. Perhaps WAIT_ON_PENDING_DML() should have gone in DBMS_TRANSACTION; perhaps IS_BIT_SET() properly belongs in UTL_RAW.
A package of their own. Why not? It may seem extravagant to have a package with a single procedure but consider DBMS_DG with its lone procedure INITIATE_FS_FAILOVER(). The package delivers the usual architectural benefits plus it provides a natural home for related procedures we might discover a need for in the future.
Standalone procedure. Again, why not? We are so conditioned to think of a PL/SQL program as a package that we forget it can be just a Procedure or Function. Some programs are suited to standalone implementation.

So avoiding the Utilities package requires vigilance. Code reviews can help here. Preventing the Utilities package becoming entrenched is crucial: once we have a number of packages dependent on a Utilities package it is pretty hard to get rid of it. And once it becomes a fixture in the code base developers will consider it more acceptable to add procedures to it.

Part of the Designing PL/SQL Programs series

Utilities - the Coincidental Cohesion anti-pattern

One way to understand the importance of cohesion is to examine an example of a non-cohesive package, one exhibiting a random level of cohesion. The poster child for Coincidental Cohesion is the utility or helper package. Most applications will have one or more of these, and Oracle's PL/SQL library is no exception. DBMS_UTILITY has 37 distinct procedures and functions (i.e. not counting overloaded signatures) in 11gR2 and 38 in 12cR1 (and R2). Does DBMS_UTILITY deliver any of the benefits the PL/SQL Reference says packages deliver?

Easier Application Design?

One of the characteristics of utilities packages is that they aren't designed in advance. They are the place where functionality ends up because there is no apparently better place for it. Utilities occur when we are working on some other piece of application code; we discover a gap in the available functionality such as hashing a string. When this happens we generally need the functionality now: there's little benefit to deferring the implementation until later. So we write a GET_HASH_VALUE() function,x stick it in our utilities package and proceed with the task at hand.

The benefit of this approach is we keep our focus on the main job, delivering business functionality. The problem is, we never go back and re-evaluate the utilities. Indeed, now there is business functionality which depends on them: refactoring utilities introduces risk. Thus the size of the utilities package slowing increases, one tactical implementation at a time.

Hidden Implementation Details?

Another characteristic of utility functions is that they tend not to share concrete implementations. Often a utilities package beyond a certain size will have groups of procedures with related functionality. It seems probable that DBMS_UTILITY.ANALYZE_DATABASE(), DBMS_UTILITY.ANALYZE_PART_OBJECT() and DBMS_UTILITY.ANALYZE_SCHEMA() share some code. So there are benefits to co-locating them in the same package. But it is unlikely that CANONICALIZE() , CREATE_ALTER_TYPE_ERROR_TABLE() and GET_CPU_TIME() have much code in common.

Added Functionality?

Utility functions are rarely part of a specific business process. They are usually called on a one-off basis rather than being chained together. So there is no state to be maintained across different function calls.

Better Performance?

For the same reason there is no performance benefit from a utilities package. Quite the opposite. When there is no relationship between the functions we cannot make predictions about usage. We are not likely to call EXPAND_SQL_TEXT() right after calling PORT_STRING(). So there is no benefit in loading the former into memory when we call the latter. In fact the performance of EXPAND_SQL_TEXT() is impaired because we have to load the whole DBMS_UTILITY package into the shared pool, plus it uses up a larger chunk of memory until it gets aged out. Although to be fair, in these days of abundant RAM, some unused code in the library cache need not be our greatest concern. But whichever way we bounce it, it's not a boon.

Grants?

Privileges on utility packages is a neutral concern. Often utilities won't be used outside the owning schema. In cases where we do need to make them more widely available we're probably granting access on some procedures that the grantee will never use.

Modularity?

From an architectural perspective, modularity is the prime benefit of cohesion. A well-designed library should be frictionless and painless to navigate. The problem with random assemblages like DBMS_UTILITY is that it's not obvious what functions it may contain. Sometimes we write a piece of code we didn't need to.

The costs of utility packages

Perhaps your PL/SQL code base has a procedure like this:

create or replace procedure run_ddl
  ( p_stmt in varchar2)
is
  pragma autonomous_transaction;
  v_cursor number := dbms_sql.open_cursor;
  n pls_integer;
begin
  dbms_sql.parse(v_cursor, p_stmt, dbms_sql.native);
  n := dbms_sql.execute(v_cursor);
  dbms_sql.close_cursor(v_cursor);
exception
  when others then
    if dbms_sql.is_open(v_cursor) then
      dbms_sql.close_cursor(v_cursor);
    end if;
    raise;
end run_ddl;
/

It is a nice piece of code for executing DDL statements. The autonomous_transaction pragma prevents the execution of arbitrary DML statements (by throwing ORA-06519), so it's quite safe. The only problem is, it re-implements DBMS_UTILITY.EXEC_DDL_STATEMENT().

Code duplication like this is a common side effect of utility packages. Discovery is hard because their program units are clumped together accidentally. Nobody sets out to deliberately re-write DBMS_UTILITY.EXEC_DDL_STATEMENT(), it happens because not enough people know to look in that package before they start coding a helper function. Redundant code is a nasty cost of Coincidental Cohesion. Besides the initial wasted effort of writing an unnecessary program there are the incurred costs of maintaining it, testing it, the risk of introducing bugs or security holes. Plus each additional duplicated program makes our code base a little harder to navigate.

Fortunately there are tactics for avoiding or dealing with this. Find out more.