Friday, March 8, 2013

The materialized view approach for implementing a table constraint

In yesterdays post I announced that I'd spent a separate post on how we can use materialized views to enforce table constraints. So here goes.

The high-level cookbook for this approach is as follows:
  1. We create a materialized view that refreshes on commit,
  2. The materialized view is defined in such a manner that it will hold no rows when the table constraint is adhered to by the current transaction trying to commit,
  3. And it is defined such that it will hold (at least) one row when the table constraint is violated by the current transaction trying to commit,
  4. We devise a construct such that on-commit refresh of the materialized view *always* fails whenever (at least) one row is materialized in the view. This can be done in two manners:
    1) we add a check constraint on the underlying table of the materialized view that always fails, or
    2) we add a before-insert row-trigger on the underlying table of the materialized view that always fails.
Here's the slide on this from my 'harmful triggers' presentation:


So let's try this with our example constraint (managers require a clerk in same department). The assertion for this constraint was:
not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )

With this assertion we can now mechanically generate a materialized view for our constraint, using the DUAL table. Note: we negate the assertion so that the materialized view ends up having characteristics 2 and 3 from our cookbook above. So the 'not exists' turns into an 'exists'.

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from dual
where exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
/

Note: we explicitly want this materialized view to be "fast refreshable", meaning that Oracle will use intelligence to minimize the work required to refresh this view. In order for Oracle to be able to do so, we would also need to create a materialized view log on the table involved, which is EMP (and DUAL?) in this case. Creating the MV-log is left for the reader.

And finally we add a CHECK clause to the underlying table segment of this materialized view (whose name is the same as the materialized view). This CHECK clause is such that it always evaluates to FALSE.

alter table managers_need_clerk add CHECK( 0 = 1 )
/

The way this now should work is that whenever a transaction introduces a department that has a manager, but no clerk, and tries to commit, this on-commit refresh materialized view will produce a single row to be inserted into the underlying table segment. This triggers validation of our CHECK clause, which will always fail. Which in turn causes the commit to fail, thereby preventing this transaction to successfully complete.

All seems very well, until you now hit the following error:

ORA-12052: cannot fast refresh materialized view [owner].[mat.view]

There are still various restrictions imposed upon materialized views for them to be fast refreshable. See the Oracle documentation for this. Sometimes you might be surprised though that a bit of rewriting a materialized view could end up such that the materialized view becomes fast refreshable. Rewriting them into straight joins is a strategy that might work here. For instance our materialized view above can be rewritten into this:

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from (select c.job
      from emp m
          ,emp c
      where m.job = 'MANAGER'
        and m.deptno = c.deptno (+)
        and 'CLERK' = c.job (+))
where job is NULL
/

I haven't tested above alternative: with the appropriate materialized view logs, it could well be fast refreshable now...

On final comment on this approach for implementing table constraints: Oracle (must and) will serialize refreshes of the materialized view among simultaneously executing (or rather, committing) transactions. Rob van Wijk has a nice blogpost on this here.

Thursday, March 7, 2013

And what about table constraints?

In a previous post we've introduced a classification scheme for constraints:
  • attribute constraints
  • tuple constraints
  • table constraints
  • database constraints
  • dynamic constraints
And talked a bit about how we could implement the first two classes here. In today's post we will make a start talking about how we can implement table constraints using triggers. But before we do that we will offer some thoughts on how the ideal world with regards to this subject would look like.

Long ago, in a galaxy far away, an ANSI/ISO SQL standardization committee came up with the ground breaking concept of a SQL ASSERTION. SQL assertions would make our lives real easy when it comes down to implementing table constraints. The example constraint "we cannot have a manager without a clerk in the same department" could be implemented as:

create assertion managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Presto. Done.
It would then be up to the DBMS to maintain this constraint. Of course we require the DBMS to do that in an efficient manner. This will all become clear during the course of the next couple of posts.

By the way you may think, why not just do this with a CHECK constraint?

alter table emp add constraint managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Well that's because CHECK constraints do not allow sub-queries. They don't allow that for the same reason as why we still do not have support for assertions. Both require the DBMS vendor to produce some seriously complex piece of software that can accept an arbitrary complex boolean SQL-expression and compute from that the most efficient way to maintain that boolean expression inside all concurrently running transactions. The research and development effort for this still needs to be done.


So, we have to resort to some other means when it comes to implementing table constraints. Here's a list of possible implementation strategies.


So we've discussed the first one already: it's highly preferred, but unfortunately only a very partial solution. The only table constraints that we can deal with declaratively are:

  • Keys (be them primary or unique), and
  • Foreign keys, in case the FK refers back to another column in the same table (in which case the foreign key is a table constraint, and not a database constraint).
The trigger approach is what the rest of this blog will be all about. Every table constraint can be implemented using triggers. Contrary to popular belief this is doable. But it *is* rather complex. The fact that this is a full solution for the table constraint class, is a big pro though.

Then there is the API-approach. This is the approach where you encapsulate all DML statements inside stored procedures, and disallow any direct DML access to your tables. The only way to modify your tables is through the stored procedure API-layer. And inside this API-layer, you deal with constraint validation, just before or right after you issue the DML statements. To me this is a fundamentally flawed solution, since it will always lead to constraint enforcing code duplication. And since it is not a "once and for all" solution. Every time you maintain your application and need to introduce new transactions, you'll have to take care of constraint enforcement again. Many people also tend to completely disregard the complexities involved. To a certain extent, they are the exact same complexities as are involved in the trigger approach: you'll have to take care of serialization and efficiency (all explained in future posts) in this approach too. And finally, in practice it is very difficult to maintain the enforcement of only allowing access to you tables via the API-layer. At some point people will find ways to go around the API-layer, and thereby be able to introduce corrupt data.


And there is the function based index trick. This is where we can use unique, function-based, indexes to implement keys across a subset of rows in a table. This too is a very partial solution for implementing table constraints. I'm assuming, since you read this blog, you are familiar with this trick, otherwise let me know in a comment and I'll provide you with an example.


Another approach is to employ materialized views in an ingenious manner. Like triggers, this is in theory a full solution, but in practice only a very partial one. It all boils down to the same research and development effort mentioned above when we discussed assertions, not having been done yet. We will discuss the materialized view approach in our next post.

Stay tuned.

Sunday, March 3, 2013

Implementing attribute and tuple constraints

In our previous post we have introduced a classification scheme for data integrity constraints. In todays post we will present thoughts & guidelines around how to implement the first two classes:  attribute and tuple constraints.

The examples given in the previous post were:

Attribute constraints
  • Salary must be a positive integer between 750 and 14000.
  • Job must be one of the following list: MANAGER, CLERK, SALESMAN, etc.
Tuple constraints
  • A manager cannot have a salary less than 5000.
  • An employee working in department 10, cannot be a salesman.

We could implement these constraints using a trigger approach. Let's go down that road first since this blog is about triggers. We will have to figure out then which trigger types (we have 12 of them, remember) are best fit to implement these attribute and tuple level constraints?
  • Row triggers or statement triggers?
  • Firing before or after the DML statement?
  • Firing when? On insert, update or delete?

A bit of thought quickly leads to:
  • Row triggers, since they have the :old and :new variables that enable us to easily inspect the column value(s) involved in the constraint;
  • Before triggers seem obvious since we can prevent the rows actually being processed when they are in violation with one of these constraints.
  • And finally, the delete event is of no concern. We only require validation during insert and update. And even then *only* when columns involved in any of the constraints are affected.

Here is the trigger code that could implement the first attribute constraint example mentioned above. Not too difficult.
create or replace trigger emp_aiur1
before insert or update on EMP
for each row
begin
  --
  if INSERTING or UPDATING('SALARY')
  then
     --
     if :new.SALARY < 750 or :new.SALARY > 14000
     then
       --
       raise_application_error(-20000,'Salary must be between 750 and 14000.');
       --
     end if;
     --
  end if;
end;
/

Note that this isn't actually an example of our fourth use-case: that use-case required triggers to execute queries. But as it so happens to be, using a trigger for implementing attribute (or tuple as we'll see hereafter) constraints doesn't require us to code any queries.

Similarly we can code a trigger to implement a tuple constraint. Here's the trigger for the first tuple constraint example given above:

create or replace trigger emp_aiur2
before insert or update on EMP
for each row
begin
  --
  if INSERTING or UPDATING('JOB') or UPDATING('SALARY')
  then
     --
     if :new.JOB = 'MANAGER' and :new.SALARY < 5000
     then
       --
       raise_application_error(-20000,'A manager cannot earn less than 5000.');
       --
     end if;
     --
  end if;
end;
/

Note the numbering that was introduced in the trigger names above. Going down the road of using triggers to implement attribute and tuple constraints, also requires us to think about this. Do we combine all code for these constraints into one "after insert or update for each row" trigger per table? Seems plausible: for one we then control the order in which the validations occur. Yes, I'm aware of the FOLLOWS construct, we could use that also and keep the separate triggers. But it doesn't make the solution more elegant nor better maintainable, does it?

Instead of the procedural approach above, a much, much better approach for implementing attribute and tuple constraints is of course to use the declarative CHECK syntax. Here's the equivalent for above two triggers:

alter table emp add constraint EMP_SAL1
check(SALARY between 750 and 14000)
/

alter table emp add constraint EMP_JOB_SAL1
check(JOB != 'MANAGER' or SALARY >=5000)
/

The second CHECK clause might be difficult for you at first sight. That depends whether you know the important rewrite rule that we have in formal logic: how to rewrite an implication (IF...THEN...) into a disjunction (...OR...). Whenever a predicate A implies a predicate B, written as 'A implies B', then we can rewrite that as 'not A or B'. And the tuple constraint at hand is indeed an implication (as so many tuple rules are btw): if job is MANAGER then salary must be greater than or equal to 5000.

The same question also pops up when using CHECK constraints: do we combine them all into one CHECK constraint (by logically AND-ing them all), or have separate CHECK constraints? My strong preference is to implement them as separate CHECK constraints. Granted you do not control the order in which they 'fire' in this case, but you do get better (ie. more detailed) errors.

In our next post we will proceed to guidelines/thoughts around implementing multi-row (table and database) constraints.

Thanks for tuning in.

Data integrity constraint classification

Before we start investigating the complexities involved in implementing data integrity constraints using database triggers, we will first introduce a classification schema for data integrity constraints. Agreeing upon a classification scheme for constraints, helps us a lot in implementing them: for we can then approach the problem area on a class-by-class basis.

A classification scheme needs to have a few properties:
  1. The constraint classes should be mutually exclusive: any given constraint should clearly fall into one, and only one, of the classes;
  2. The scheme should cover all types of constraints: i.e. it cannot be that we can think of a constraint that doesn't fall into one of the defined classes;
  3. It should be practical: and by this we mean it should help us when implementing constraints. Or, put in another way, the issues we have when implementing constraints of the same class should be similar for these constraints, and the issues we have for implementing constraints of a different class, should be different.
A commonly used classification scheme for constraints, that has all above properties, is one that takes the increasing scope that a constraint can have in a database design, as the driver for defining classes.

Class 1: Attribute constraints.
These are constraints that can be specified (and thus checked) by referring to (or inspecting) only one column value of a tuple (ie. row).

Class 2: Tuple constraints.

These are constraints that can be specified (and thus checked) by referring to (or inspecting), at least two column values of a tuple. This class can also be called 'multi attribute' constraints.

Class 3: Table constraints.
These are constraints that can be specified (and thus checked) by referring to (or inspecting), at least two tuples of a table. This class can also be called 'multi tuple' constraints.

Class 4: Database constraints.
These are constraints that can be specified (and thus checked) by referring to (or inspecting), at least two tables. This class can also be called 'multi table' constraints.

In case the above scheme is totally new for you, here are a few examples:

Attribute constraints
  • Salary must be a positive integer between 750 and 14000.
  • Job must be one of the following list: MANAGER, CLERK, SALESMAN, etc.
Tuple constraints
  • A manager cannot have a salary less than 5000.
  • An employee working in department 10, cannot be a salesman.
Table constraints
  • There cannot be two employees with the same employee number.
  • A manager must always be accompanied by a clerk in the same department (hmm, sounds familiar this one...)
Database constraints
  • Every employee must work in a known department.
  • No (non SF-based) department can have a higher salary budget then any San Francisco (SF) based department.
Did you spot the primary/unique key? And the foreign key?

Just verify for yourself that this classification scheme covers all constraints. Think of a few constraints in your database design, and see in what classes they fall. It could be that you find a constraint that's not covered yet. That would be because above four classes jointly only cover  static constraints. These are constraints that can be verified by looking at a database state (or snapshot) in rest. There is another class in itself for what are called dynamic constraints. Dynamic constraints, sometimes referred to as transaction or database-state change constraints, cannot be verified by looking at a database state at rest. That's because the specification of these constraints require the context of a transaction. Examples of dynamic constraints are:

  • Cannot change the salary of an employee without the same transaction also writing a log-record in [some table] that details the user who performed the change and the time when this was done;
  • We register new hires at most one month before they join the company. Or, put in another way: the hiredate of an employee cannot be more than one month in the future;
  • (a classic one) Salary can only increase, never decrease.

The four static constraint classes plus the dynamic constraint class are,

  • all mutually exclusive, and
  • together cover all possible constraints.

And as it so turns out this classification scheme is also a practical one. Similar issues arise when implementing constraints of the same class. We'll further investigate that in our next post, where we will start with implementation guidelines for constraints on a class-by-class basis.