Sunday, January 4, 2015

Continuing the story

One of my 2015 new year's resolutions, was to finish the story I started on this blog. The story being a talk that I had delivered a couple of times 3-4 years ago on various Oracle/usergroup events. The talk is about why I think triggers should *not* be considered harmful. There is one specific use-case for triggers -- implementing validation code for what's called a multi-row constraint -- that I will treat in-depth on this blog.

So here goes.

Let's first recap what we did sofar:
  • In november 2011 I started this blog and set the scene. I talked about DML-triggers and pointed out some of their features and restrictions. Obviously I introduced the infamous 'mutating table' error ORA-04091 that exists for row-level triggers (for each row). Everybody that starts out using triggers to implement a multi-row constraint, will hit this error.
  • Then in december 2011 I explained in detail what that error is all about, and why it is your friend and not your enemy. I also showed that the common workarounds for this error really aren't workarounds and should be avoided.
  • The month thereafter I demonstrated that using a statement trigger (instead of a row trigger) as a workaround has a serious performance disadvantage.
  • I then took a step back and discussed various other use-cases for DML-triggers. For these other use-cases I agree with the common consensus that triggers are to be considered harmful.
  • Here I specified one use-case for triggers being the implementation of multi-row constraints. And explained how the lack of SQL assertions, forces us to employ triggers to implement these kinds of constraints.
  • Next I discussed the broader topic of constraints, and how they can be meaningfully classified: multi-row constraints being one of the classes.
  • I showed how two other constraint classes (attribute and tuple constraints) ideally should be implemented using a declarative CHECK, and not via a trigger.
  • In the following post I discussed different implementation strategies for multi-row constraints, the vast majority of which we cannot simply declare to the DBMS. We are required to write our own procedural validation code to implement these constraints. One implementation strategy however comes very close to being a declarative one: the use of materialized views that fail to refresh in case a violation for the constraint exists.
  • I then discussed the materialized view strategy in some detail in this post.
And that's where the development of my story is right now. Let's continue. And we will do so by discussing the correct workaround for the mutating table error when you try to implement a multi-row constraint. Recall we have looked at the following workarounds sofar:
  • Workaround 1a: a row-trigger where we wrapped the query accessing the mutating table in an autonomous transaction (thereby preventing ORA-04091).
  • Workaround1b: a row-trigger where we queried the mutating table via a loopback database-link, which also prevents the error.
  • Workaround 2: a statement trigger, which was not an acceptable solution performance-wise. It had to always check (not just when a manager got inserted), and it had to check all departments (not just the one that was affected by the DML-statement).
To implement multi-row constraints correctly and efficiently we need to build a combination of a row-trigger and a statement-trigger. The task of the row-trigger is to convey the information about which rows were affected by the DML-statement, to the after-statement-trigger. This statement-trigger can then use this information and query the mutating table (which it is allowed to do) in an efficient manner.


We can employ three 'container'-types to convey this information:
  1. Use a persistent packaged (table) variable
  2. Use compound triggers and its state variables
  3. Use a global temporary table (gtt)
In all three cases the row-trigger would populate the container with the relevant information to be conveyed to the statement-trigger. And the statement-trigger would inspect the container. Note that in the cases of a packaged variable (1) and a gtt (3) you are to make sure that the container is empty when a new DML-statement starts. The state variables of a compound trigger are initialized (by the DBMS) each time the compound trigger fires, even when the triggering statement is interrupted and restarted. Packaged variables do not rollback when the triggering statement is interrupted. The contents of a gtt do of course. Both packaged and state variables live in the PGA of the session and since they involve memory manipulation are likely to be 'faster' to populate and inspect, than the gtt container is.

However, I personally prefer to use a gtt. Perhaps the only reason being that it supports DML-statements that affect many, many rows. With state and package variables you might run the risk of blowing up the PGA in such a case. And with the advent of Oracle12c the overhead of using a gtt is further reduced with the new temporary-undo feature, which causes no redo to be generated for undo of temporary tables (see the temp_undo_enabled spfile parameter).


Now given our earlier attempts to code validation logic for the example constraint (All managers must have a clerk in the same department), we really only need to convey the number of the department for which we need to validate the constraint. And, we only need to convey this when a manager is indeed inserted. However we anticipate more than just (this) one multi-row constraint in which the EMP table is involved. And since we do not want to create a gtt per constraint, but rather one per (application) table, it is necessary to insert the values of all involved columns of the multi-row constraint. And to insert them always. I.e. not just when a manager is inserted, for we can have another multi-row constraint on the EMP table that is interested in other job values (not just managers). We then move the logic of when and for which department the constraint needs to be validated to the statement trigger. This also has makes the code more elegant: all validation logic for the constraint is located in one piece of code.

Here is the validation logic sofar for our constraint then.


Note that all the row-trigger does, is keep track of the relevant column values that are being inserted. The real validation logic for our constraint sits in the statement-trigger. For every department into which a manager just got inserted (we conveniently query our gtt for this) , it calls out to the (earlier created) p_check_for_clerk procedure. A few remarks on this implementation:

  • We query distinct deptno's from the gtt, to prevent checking a department more than once, in case an insert statement just inserted more than one manager into a department.
  • The query that access the EMP table inside the p_check_for_clerk procedure, will not hit the mutating table anymore, as at 'after statement' time the EMP table isn't in a mutating state anymore.
  • That query will also be immune for any order into which a multi-row insert statement will have inserted new EMP rows and it will see these rows: i.e. the issues of the previous two 'workarounds' (autonomous TX and loopback db-link) do not exist now.

Here's a test of this solution, using the same insert from the EMP_STAGE table we used earlier.


So, we now have a trigger based solution to enforce our constraint. But we are of course not done yet. We have only tackled the inserting of managers now. There are however other DML-statements that can potentially violate this constraint, and for which we will need to develop (more) triggers. Can you guess them?

And there is more after that... In a future post we will also discuss the need to serialize multiple transactions for correct enforcement of constraints.

Thanks for tuning in again.

Friday, March 8, 2013

The materialized view approach for implementing a table constraint

In yesterdays post I announced that I'd spent a separate post on how we can use materialized views to enforce table constraints. So here goes.

The high-level cookbook for this approach is as follows:
  1. We create a materialized view that refreshes on commit,
  2. The materialized view is defined in such a manner that it will hold no rows when the table constraint is adhered to by the current transaction trying to commit,
  3. And it is defined such that it will hold (at least) one row when the table constraint is violated by the current transaction trying to commit,
  4. We devise a construct such that on-commit refresh of the materialized view *always* fails whenever (at least) one row is materialized in the view. This can be done in two manners:
    1) we add a check constraint on the underlying table of the materialized view that always fails, or
    2) we add a before-insert row-trigger on the underlying table of the materialized view that always fails.
Here's the slide on this from my 'harmful triggers' presentation:


So let's try this with our example constraint (managers require a clerk in same department). The assertion for this constraint was:
not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )

With this assertion we can now mechanically generate a materialized view for our constraint, using the DUAL table. Note: we negate the assertion so that the materialized view ends up having characteristics 2 and 3 from our cookbook above. So the 'not exists' turns into an 'exists'.

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from dual
where exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
/

Note: we explicitly want this materialized view to be "fast refreshable", meaning that Oracle will use intelligence to minimize the work required to refresh this view. In order for Oracle to be able to do so, we would also need to create a materialized view log on the table involved, which is EMP (and DUAL?) in this case. Creating the MV-log is left for the reader.

And finally we add a CHECK clause to the underlying table segment of this materialized view (whose name is the same as the materialized view). This CHECK clause is such that it always evaluates to FALSE.

alter table managers_need_clerk add CHECK( 0 = 1 )
/

The way this now should work is that whenever a transaction introduces a department that has a manager, but no clerk, and tries to commit, this on-commit refresh materialized view will produce a single row to be inserted into the underlying table segment. This triggers validation of our CHECK clause, which will always fail. Which in turn causes the commit to fail, thereby preventing this transaction to successfully complete.

All seems very well, until you now hit the following error:

ORA-12052: cannot fast refresh materialized view [owner].[mat.view]

There are still various restrictions imposed upon materialized views for them to be fast refreshable. See the Oracle documentation for this. Sometimes you might be surprised though that a bit of rewriting a materialized view could end up such that the materialized view becomes fast refreshable. Rewriting them into straight joins is a strategy that might work here. For instance our materialized view above can be rewritten into this:

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from (select c.job
      from emp m
          ,emp c
      where m.job = 'MANAGER'
        and m.deptno = c.deptno (+)
        and 'CLERK' = c.job (+))
where job is NULL
/

I haven't tested above alternative: with the appropriate materialized view logs, it could well be fast refreshable now...

On final comment on this approach for implementing table constraints: Oracle (must and) will serialize refreshes of the materialized view among simultaneously executing (or rather, committing) transactions. Rob van Wijk has a nice blogpost on this here.