Here is a simplified version of an ETL framework which uses SQL Types in a similar fashion to the approach described in my blog post here. The loading process is defined using an abstract non-instantiable Type like this:
create or replace type load_t force as object ( txn_date date , tgt_name varchar2(30) , member function load return number , final member function get_tgt return varchar2 ) not final not instantiable; / create or replace type body load_t as member function load return number is begin return 0; end load; final member function get_tgt return varchar2 is begin return self.tgt_name; end get_tgt; end; /
The concrete behaviour for each target table in the ABC feed is defined by sub-types like this:
create or replace type load_tgt1_t under load_t ( overriding member function load return number , constructor function load_tgt1_t (self in out nocopy load_tgt1_t , txn_date date) return self as result ) ; / create or replace type body load_tgt1_t as overriding member function load return number is begin insert into tgt1 (col1, col2) select to_number(col_a), col_b from stg_abc stg where stg.txn_date = self.txn_date; return sql%rowcount; end load; constructor function load_tgt1_t (self in out nocopy load_tgt1_t , txn_date date) return self as result is begin self.txn_date := txn_date; self.tgt_name := 'TGT1'; return; end load_tgt1_t; end; /This approach is neat because ETL is a fairly generic process: the mappings and behaviour for a particular target table are specific but the shape of the loading process is the same for any and all target tables. So we can build a generic PL/SQL procedure to handle them. This simplistic example does some logging, loops through a set of generic objects and, through the magic of polymorphism, calls a generic method which executes specific code for each target table:
procedure load (p_txn_date in date , p_load_set in sys_refcursor) is type loadset_r is record ( tgtset load_t ); lrecs loadset_r; load_count number; begin logger.logm('LOAD START::txn_date='||to_char(p_txn_date,'YYYY-MM-DD')); loop fetch p_load_set into lrecs; exit when p_load_set%notfound; logger.logm(lrecs.tgtset.get_tgt()||' start'); load_count := lrecs.tgtset.load(); logger.logm(lrecs.tgtset.get_tgt()||' loaded='||to_char(load_count)); end loop; logger.logm('LOAD FINISH'); end load;So far, so abstract. The catch is the procedure which instantiates the objects:
procedure load_abc_from_stg (p_txn_date in date) is rc sys_refcursor; begin open rc for select load_tgt1_t(p_txn_date) from dual union all select load_tgt2_t(p_txn_date) from dual; load(p_txn_date, rc); end load_abc_from_stg;On casual inspection it doesn't seem problematic but the call to the
load()
procedure gives the game away. Both procedures are in the same package:
create or replace package loader as procedure load (p_txn_date in date , p_load_set in sys_refcursor); procedure load_abc_from_stg (p_txn_date in date); end loader; /So the package mixes generic and concrete functionality. What makes this a problem? After all, it's all ETL so doesn't the package follow the Single Responsibility Principle? Well, up to a point. But if we want to add a new table to the ABC feed we need to update the LOADER package. Likewise if we want to add a new feed, DEF, we need to update the LOADER package. So it breaks the Stable Abstractions principle. It also creates dependency problems, because the abstract
load()
process has dependencies on higher level modules. We can't deploy the LOADER package without deploying objects for all the feeds.
Applying the Dependency Inversion Principle.
The solution is to extract theload_abc()
procedure into a concrete package of its own. To make this work we need to improve the interface between the load()
procedure and programs which call it. Both sides of the interface should depend on a shared abstraction.
The LOADER package is now properly generic:
create or replace package loader as type loadset_r is record ( tgtset load_t ); type loadset_rc is ref cursor return loadset_r; procedure load (p_txn_date in date , p_load_set in loadset_rc) authid current_user ; end loader; /The
loadset_r
type has moved into the package specification, and defines a strongly-typed ref cursor. The load()
procedure uses the strongly-typed ref cursor.
Similarly the LOAD_ABC package is wholly concrete:
create or replace package loader_abc as procedure load_from_stg (p_txn_date in date); end loader_abc; / create or replace package body loader_abc as procedure load_from_stg (p_txn_date in date) is rc loader.loadset_rc; begin open rc for select load_tgt1_t(p_txn_date) from dual union all select load_tgt2_t(p_txn_date) from dual; loader.load(p_txn_date, rc); end load_from_stg; end loader_abc; /Both package bodies now depend on abstractions: the strongly-typed ref cursor in the LOADER specification and the LOADER_T SQL Type. These should change much less frequently than the tables in the feed or even the loading process itself. This is the Dependency Inversion Principle in action.
Separating generic and concrete functionality into separate packages produces a more stable application. Users of a feed package are shielded from changes in other feeds. The LOADER package relies on strongly-typed abstractions. Consequently we can code a new feed package which can call
loader.load()
without peeking into that procedure's implementation to see what it's expecting.
No comments:
Post a Comment