Thursday, March 7, 2013

And what about table constraints?

In a previous post we've introduced a classification scheme for constraints:
  • attribute constraints
  • tuple constraints
  • table constraints
  • database constraints
  • dynamic constraints
And talked a bit about how we could implement the first two classes here. In today's post we will make a start talking about how we can implement table constraints using triggers. But before we do that we will offer some thoughts on how the ideal world with regards to this subject would look like.

Long ago, in a galaxy far away, an ANSI/ISO SQL standardization committee came up with the ground breaking concept of a SQL ASSERTION. SQL assertions would make our lives real easy when it comes down to implementing table constraints. The example constraint "we cannot have a manager without a clerk in the same department" could be implemented as:

create assertion managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Presto. Done.
It would then be up to the DBMS to maintain this constraint. Of course we require the DBMS to do that in an efficient manner. This will all become clear during the course of the next couple of posts.

By the way you may think, why not just do this with a CHECK constraint?

alter table emp add constraint managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Well that's because CHECK constraints do not allow sub-queries. They don't allow that for the same reason as why we still do not have support for assertions. Both require the DBMS vendor to produce some seriously complex piece of software that can accept an arbitrary complex boolean SQL-expression and compute from that the most efficient way to maintain that boolean expression inside all concurrently running transactions. The research and development effort for this still needs to be done.


So, we have to resort to some other means when it comes to implementing table constraints. Here's a list of possible implementation strategies.


So we've discussed the first one already: it's highly preferred, but unfortunately only a very partial solution. The only table constraints that we can deal with declaratively are:

  • Keys (be them primary or unique), and
  • Foreign keys, in case the FK refers back to another column in the same table (in which case the foreign key is a table constraint, and not a database constraint).
The trigger approach is what the rest of this blog will be all about. Every table constraint can be implemented using triggers. Contrary to popular belief this is doable. But it *is* rather complex. The fact that this is a full solution for the table constraint class, is a big pro though.

Then there is the API-approach. This is the approach where you encapsulate all DML statements inside stored procedures, and disallow any direct DML access to your tables. The only way to modify your tables is through the stored procedure API-layer. And inside this API-layer, you deal with constraint validation, just before or right after you issue the DML statements. To me this is a fundamentally flawed solution, since it will always lead to constraint enforcing code duplication. And since it is not a "once and for all" solution. Every time you maintain your application and need to introduce new transactions, you'll have to take care of constraint enforcement again. Many people also tend to completely disregard the complexities involved. To a certain extent, they are the exact same complexities as are involved in the trigger approach: you'll have to take care of serialization and efficiency (all explained in future posts) in this approach too. And finally, in practice it is very difficult to maintain the enforcement of only allowing access to you tables via the API-layer. At some point people will find ways to go around the API-layer, and thereby be able to introduce corrupt data.


And there is the function based index trick. This is where we can use unique, function-based, indexes to implement keys across a subset of rows in a table. This too is a very partial solution for implementing table constraints. I'm assuming, since you read this blog, you are familiar with this trick, otherwise let me know in a comment and I'll provide you with an example.


Another approach is to employ materialized views in an ingenious manner. Like triggers, this is in theory a full solution, but in practice only a very partial one. It all boils down to the same research and development effort mentioned above when we discussed assertions, not having been done yet. We will discuss the materialized view approach in our next post.

Stay tuned.

10 comments:

  1. "... At some point people will find ways to go around the API-layer, and thereby be able to introduce corrupt data. ..."

    don't know what you mean by that, can you give an example.
    At some point people can always introduce corrupt data for example - to take it to the extremes - by editing binary data_files.

    secondly, yes, please point me to / give an example of "the function based index trick"

    Thanks !

    ReplyDelete
    Replies
    1. As an example of using a function based index to implement a key in a subset of rows of a table. Suppose we have the following orders table.

      create table orders
      (orderid number primary key
      ,customerid number not null
      ,status number not null
      check(status in ('New','Processing','Done'))
      ,...other columns...
      )
      /

      And we need to implement the business rule that we can only have one order per customer in status 'Processing'. So in the subset of rows where status equals 'Processing', columns {customerid,status} are UNIQUE.
      We can have multiple 'New' or 'Done' orders for a given customer, just not multiple 'Processing' orders. This constraint can be dealt with using a smartly devised function based unique index.

      create unique index my_business_rule
      on orders(case status
      when 'Processing' then customerid
      else null
      end)
      /

      Delete
    2. What I meant by "at some point ... go around the API layer... etc.", was that there will be times when data needs to be modified (maybe just once, due to some incident/problem) in a manner that is currently not supported by the API-layer. Or maybe the API-layer does support it, but only in a slow-by-slow manner (row-by-row), which will not do since the update needs to 'fit' inside a maintenance window. Instead of developing a dedicated API-procedure, that has all the required constraint checking code, you'll see that a multi-row update statement will be developed and run.

      Delete
    3. Thanks, got it ( and agree ) !

      Delete
  2. Why do you think the API approach always leads to constraint enforcing code duplication?

    ReplyDelete
    Replies
    1. I think I'm going to dedicate a separate blogpost on this one, as the question keeps returning on me.

      Delete
  3. I've used the function-based index trick for some kinds of table constraints. See my blog post, No Duplicates within Details

    ReplyDelete
  4. Toon your killing me. Such an interesting approach to integrity constraints and you have yet to reach the table, database, and dynamic constraints. But seriously, I appreciate the time and effort you've taken to make these concepts straight forward. I was wondering if you've written any papers on this subject?

    ReplyDelete
    Replies
    1. Darren,

      I've co-authored the book "Applied Mathematics for Database Professionals", which you might find interesting. Contains in-depth treatment of the "interesting approach". ;-)

      Toon

      Delete