Triggers Considered Harmful, Considered Harmful: 2012

Monday, March 26, 2012

The fourth use-case for triggers

In our previous post we talked about three of the four use-cases we introduced. Triggers can:

Assign/modify (row) column values.
Execute insert/update/delete statements.
Execute non-transactional stuff.

We've also discussed why indeed using triggers for above use-cases, should be considered harmful. Their automagic side-effects are often not expected by other programmers / users of the database application.

In this post we continue with use-case four. Triggers can:

Execute select statements.

This use-case, where we have the need to execute queries inside triggers, is a very common use-case, and it is one that always (re)surfaces, in almost every project. And there's a reason for this: a very important database-concept requires you to query other data from within triggers.

This concept is called: data integrity constraints. And it's exactly this use-case where, in my opinion, triggers form the means to an end.

Now beware though, and we give you this warning beforehand: using triggers to implement data integrity constraints, is by far not easy. In fact it's extremely complex (as we will detail in future posts). But to me that is no reason to not use triggers. And this use-case will not suffer from the automagic side-effects the other three use-cases had. So I disagree here when Tom Kyte says that even for this use-case 'thou shall not use triggers'.

Recall the 'business rule' that we were trying to implement using triggers: a department cannot employ a manager without a clerk in the same department. That rule is in fact a data integrity constraint. Now, few people know of the documented feature in the SQL standard called: assertions. SQL assertions have been part of the SQL standard since 1992. See this link for the syntax/grammar (it's the eighth 'key sql statement' from the top of the list). Had database vendors supplied us with support for the two-decades old SQL assertion feature, then we could have straightforwardly implemented the data integrity constraint with one CREATE ASSERTION statement:

Translating the above assertion into somewhat awkward english, it goes like this: there cannot exist a department such that this department employs a manager, and such that this department does not employ a clerk.

Once the assertion is created, it's up to the DBMS to enforce the integrity constraint in a correct and (hopefully) efficient manner. Just like you expect the DBMS to correctly and efficiently enforce a uniqueness constraint, or a foreign key. Conceptually these constraints are no different than assertions: it's just that these two represent (constraint) patterns that occur so frequently in every database design, that we've been provided with dedicated SQL language constructs to declare them. But theoretically the UNIQUE / PRIMARY KEY and FOREIGN KEY language constructs are redundant, when assertions are available: both can be rewritten using a CREATE ASSERTION statement.

Would you use assertions, if Oracle provided them? I certainly would, and you would probably too. You are using the other declarative means to implement data integrity (check constraints, primary key, foreign key) right now too, aren't you?

Implementing data integrity constraints in a declarative manner, enables a kind of separation of concerns: you implement your integrity constraints once-and-forall while you create your tables. And then you build business logic on top of the database design without having to worry about validating integrity constraints: all you need to cater for is error-handling, in case your business logic tries to modify/store data in such a way that it violates one or more of the declared integrity constraints. If you want to read up on this way of implementing data-centric applications theHelsinkiDeclaration.blogspot.com is a good starting point.

By the way, there's a reason why DBMS vendors have yet to supply us with support for the CREATE ASSERTION statement:

Developing a correct and efficient implementation for an arbitrary complex assertion (which is what we're asking for), is very hard. By 'correct' we mean, the implementation must properly deal with concurrent transactions manipulating data that is involved in the assertion. By 'efficient' we mean, the implementation must not just run the SQL-predicate that was asserted in the create assertion command, but it must be able to a) detect when a transaction might potentially violate the assertion, and then b) run a minimal check to ensure the continued validity of the assertion.

As far as I know the problem areas described above, haven't been fully researched yet by the scientific community: for one I'm unable to find this research, and had it been researched fully, it would have been fairly easy for a database vendor like Oracle to 'stand on the shoulders' of those researchers and provide us support for create assertion.

So there we are: the fourth use-case for triggers being implementing data integrity constraints.

To be continued...

Monday, February 6, 2012

Where TK agrees with TK (or: why are triggers harmful)

Sofar we've explored playing around with a few triggers to implement a business rule. In this post I'd like to step back a bit and take a birds-eye view at the common use-cases that I see brought up for introducing triggers into the app-development picture.

The first use-case is: we use triggers to assign or modify column values of mutating rows (that is, rows currently being inserted, updated, or deleted). Here's a few typical examples of this use case that, I'm sure, must look familiar to you too.

For this use-case we always use a row-level trigger (for each row), as this type of trigger allows us to manipulate :new.[column-name] variables. We can re-assign supplied (by the trigger DML-statement) column values, or initialize ones that were not supplied. So for instance the use enters all column-values but the Created and By_User columns, which are initialized by a row-trigger we wrote.

The second use-case is: we use triggers to execute insert/update/delete statements. So we perform some DML against a table. This table has some trigger, in which we perform more DML.

Of course this can lead to a cascading effect: the DML that we perform from within our trigger body might be against a table that too has triggers. In which there might be more DML-statements performed, and so on and so forth. Typical examples of why we perform DML-statements inside triggers, are: we want to audit changes against the table, or we want to maintain redundant data stored elsewhere in the database design.

The third use-case is: we use triggers to execute non-transactional stuff. And by "non-transactional" I mean stuff that, once executed inside our transaction, does not participate in a rollback, in case we (the code-path) decides to perform one.

The most beautiful example of the last decade for this is, we call a web-service inside our trigger body. In case this is a web-service that actually 'does something' (changes/transacts information at/behind the endpoint of the service), then obviously when we rollback our triggering DML-statement, whatever was changed by the web-service, remains changed. Other examples of this use-case are: we send email from our trigger body, or perform file I/O. We can also perform an autonomous transaction (one that actually performs some DML which given it's executed inside the autonomous TX, cannot be rolled back by the parent transaction). This is actually a blend of this use-case with the previous one.

So what's harmful about all these use-cases? Well, they make stuff happen automagically (copyright the other TK).

What we mean by this is, that instead of only getting the effect of the triggering DML-statement, you also get the effect of whatever is being performed inside the trigger body. And exactly that is the harmful bit.

In a SQL DBMS, the insert, update and delete 'operators', constitute three primitive operators that we can use to manipulate the data held inside the database (tables). Primitive here is used in the sense of primary or basic. Sort of like Inc(rement) and Dec(rement) are primitives that can be used in machine language to increase or decrease a register's value. These primitive operators have well understood and universally known change-semantics. Say you start with an empty EMP table, you then insert a employee row, you will end up with a change of the EMP table: it now has a single row in it. And all other bits and pieces of the database remain unchanged.

Triggers executing automagic stuff change these universally well-understood change-semantics of our three primitive operators.

And this is harmful: because nobody (after you've gone) expects insert, update or delete to quietly behave differently. For those of you who have done some machine programming in a distant past: it's like increasing one of your registers, and suddenly (as an automagic side-effect) the top of the execution stack is increased also and has some subroutine's address pushed on it. Not something you'd expect...

The TCO (total cost of ownership) of an application sits not in initially building the thing. It sits in the years, sometimes decades, the thing lives. And you, but more probably others, maintain it. And more often than not these developers expect their available primitive operators to follow the primitive change-semantics.

I'll finish this post with the following thought:

If you agree on the three use-cases discussed being harmful, do you then also agree with me that:

The cascade delete option of a foreign key, and
The default clause of a table column, and
The char (fixed-length) datatype

are to be considered harmful too.

Aren't they conceptually about the exact same problem? Introducing different change-semantics to our three primitive operators?

I'd be interested to hear of any other 'features' that change the way insert, update or delete behave.

Sunday, January 29, 2012

Statement level constraint consistency

In the past week I've been investigating how Oracle fires triggers with the Merge and Multi-Table-Insert statements. Also took a look at 'statement-level constraint consistency' with these two types of statements. My findings are here: http://rulegen.blogspot.com/2012/01/statement-level-constraint-validation.html

Normal transmission on harmful triggers should resume shortly.

Wednesday, January 4, 2012

Workaround 2: don't use the row-trigger

So where were we?

We wanted to implement the following business rule: "Cannot have a manager without a clerk in the same department." At the end of this post, we tried implementing that using a straight-forward row-trigger, which introduced us to the mutating table error. And we've explained thereafter (here and here) that this error is there for a very good reason.

We've also clarified that the two common 'workarounds' 1a and 1b really should not be used.

So using a row-trigger to check for a clerk whenever a manager is inserted will not work. Remember the reason we wanted to use a row-trigger is because in it we have available :new.deptno which represents the department into which a manager just got inserted. This enabled us to efficiently check only that department for the necessary clerk.

So what about a statement trigger then? Obviously it forces us to validate all departments, since within the context of the after insert statement trigger, we do not know anything about the actual insert(s), other than that some insert took place.

Above is an attempt to implement our rule using the after insert statement trigger. As you can see we validate all departments. This is not a very efficient solution to implement our business rule. Not only do we validate all departments, we also validate them on all types of inserts. Obviously when we insert, for instance, a trainer, there is no need at all to validate this rule (assuming the rule was fine, before this insert), since inserting a trainer cannot violate this rule: such an insert doesn't 'touch' the rule.

We will explain in detail how to a) workaround the mutating table issue + b) get an efficient (and correct) implementation for our rule using a statement trigger in multiple future posts. But before we do that, we'd like to spend some time on common use-cases of triggers, and why they gave rise to the general consensus out there of triggers being harmful.