Friday, December 30, 2011

Look mom: a mutating table error without a trigger!

Did you ever wonder why it says 'trigger/function' in the error message of ORA-04091?


ORA-04091: table ... is mutating, trigger/function may not see it


We know (and understand) by now that a row trigger cannot read a mutating table, but what's the /function all about in above message text? Well there is a completely different scenario that has nothing to do with triggers, where you can run into this error. I thought to spend a short post on that first, so that you really fully understand that ORA-04091 is your friend. And again in this scenario it prevents you from coding logic that might work differently tomorrow than from what it did today.

Let's quickly show you this scenario. We create the EMP table again and insert some test data into it.


drop table EMP;
create table EMP
(EMPNO    number(3,0)  not null primary key
,ENAME    varchar2(20) not null
,SAL      number(4,0)  not null)
/
insert into emp(empno,ename,sal) values(100,'Toon',4000);
insert into emp(empno,ename,sal) values(101,'Izaak',5000);
insert into emp(empno,ename,sal) values(102,'Marcel',7000);
insert into emp(empno,ename,sal) values(103,'Rene',8000);
commit;


Our HR department decided that above salaries should be aligned a bit more in the year 2012. And in their wisdom they decided that the following update should be run at the end of the year:


update EMP e1 set e1.SAL =
 e1.SAL + ((select avg(e2.SAL) from EMP e2) - e1.SAL)/2
/

This update brings everybody's salary a bit more towards the current average salary of all employees. The dba does a testrun of the script in the acceptance instance:



It runs well, HR confirms that it does what it was intended to do, and dba is fine with this in production.

Here come the QA department though: they need to 'ok' the running of this script in production too. So they inspect the script with all QC guidelines in mind. Result of their review is that they won't allow this. There is distinct 'business logic' in there that needs to be removed from the statement and implemented in a stored function, so that it 'can be reused sometime in the future' (I've seen this happen in real companies...). They mandate that a function be implemented such that the update statement will be of the following form:


update EMP e set e.SAL = f_new_sal(e.SAL)
/

So the developers pump out the following function for this:


create or replace function f_new_sal
(p_current_sal in number) return number as
--
pl_avg_sal number;
--
begin
  --
  select avg(SAL) into pl_avg_sal
    from EMP;
  --
  return p_current_sal + (pl_avg_sal - p_current_sal)/2;
  --
end;
/


Everybody is happy. The dba executes the testrun again:



WTF? Oracle throws a mutating table error? But there's no trigger involved here...

Why is this happening?

If you understood the explanation of this error in my previous posts sofar, you should be able to figure this one out. Oracle again detects that we seem to be selecting from a table that's currently mutating. Think about what is happening here:

  • The update starts identifying the rows that need their SAL column value changed;
  • It does this in some undetermined order (!);
  • When the first rows is found, it invokes our function to determine the new SAL column value for it;
  • Our function queries the EMP table, which is currently mutating: bingo we hit the error.

When Oracle steps out of the SQL-engine that's currently executing the update statement, and invokes the function, then this function -- just like an after row update trigger would -- sees the intermediate states of EMP as they exist during execution of the update statement. This implies that the return value of our function invocations heavily depend on the order in which the rows happen to be updated.

Wanna see this happening? Let's use our database link "workaround" to prevent the mutating error. We create a loopback database link and change our function accordingly:

create or replace function f_new_sal
(p_current_sal in number) return number as
--
pl_avg_sal number;
--
begin
  --
  select avg(SAL) into pl_avg_sal
  from EMP@loopback;                -- Here: added db-link.
  --
  return p_current_sal + (pl_avg_sal - p_current_sal)/2;
  --
end;
/

Dba can now run the update without hitting the error:



But look at the SAL values, only Toon's salary is updated correctly. These are not accepted by HR...

And these values will be different if we force a different order of updating the rows, for instance like this:

update (select * from EMP order by EMPNO desc) e 
  set e.SAL = f_new_sal(e.SAL)
/

The proof is in eating the pudding:



Now Rene's salary is the only one being updated correctly.

So again, don't forget: ORA-04091 is your friend.

Note that in the original version of the update statement (without the function), Oracle ensures that the repeated 'invocations' of the subquery (select AVG(e2.SAL) etc.) all see the same snapshot of the database: the one that existed at the start of the update statement. So there is no indeterministic behavior in that case.

A fix for the issue demonstrated could be to just add the autonomous transaction pragma to the function. We would then get the same (deterministic) behavior as the original version of the update statement.
A better fix in this case though would be to not have the function at all, and just go with the original update statement.

Friday, December 23, 2011

The mutating table error prevents non-deterministic behavior of your code

A short recap of workarounds 1a and 1b from our previous post.
  • In workaround 1a we ensured that the query that was selecting from the mutating table, was being executed from within an autonomous transaction.
  • In workaround 1b we ensured that this query was being executed via a loopback database link.
We also discussed the major difference between the two workarounds:
  • 1a: the query sees the mutating table as it existed prior to the start of the main transaction.
  • 1b: the query sees the mutating table as it is being mutated by the triggering DML statement. Essentially this workaround behaves as if Oracle did not have a mutating table error at all.
Please never implement these workarounds. Especially 1b for the reason I will demonstrate in this post. And do you really want the overhead of going through a database link every time the row-trigger fires? And potentially have twice the number of database sessions in your database instance? I hope not.

Apart from this, you will have planted a booby trap in your application code: your code can now behave non-deterministically.


To discuss this we will investigate how our row-trigger behaves not in the case of a single row insert, but a multi-row insert. Given our business rule, no manager can be without a clerk in the same department, a multi-row insert that inserts both the manager as well as the necessary clerk as the first two employees for a new department in one go, is a valid insert statement that our trigger-based rule-enforcement code should allow.

Let's look at the behavior of our row-trigger using workaround 1b (loopback db-link) first. Suppose we have a table called EMP_STAGE. This (staging) table is holding employee-rows that are to be inserted into the real EMP table. Table EMP_STAGE currently holds two employees: a manager and a clerk both working in department 42.

Now see what our row-trigger does.


Our row-trigger does not allow this insert. Why? Well think about it. How is the statement 'insert into EMP select * from EMP_STAGE' executed? Particularly in what order are the selected from EMP_STAGE and then inserted into EMP? Apparently in the case above the manager was the first one that was selected and inserted into EMP. Which would then fire our row-trigger, which would then see the mutating table (via the loopback db-link) and not find the required clerk in department 24. Hence our row-trigger raised the business rule error. And the multi-row insert statement never reaches inserting the second row (the clerk), it is cancelled by the raise_application_error and rolled back.

But what if the rows happened to be processed in a different order? We can force this by simply adding an order-by clause to the subquery in the insert statement (we re-execute the insert and have added 'order by empno desc'):


Our two-row insert statement now executes succesfully!

So what happened in this case? Due to the order by clause we've ensured that employee 111 (the clerk) got inserted first. The row-trigger fires, sees it's not a manager and therefore nothing needs to be done. Next employee 110 got inserted. The row-trigger fires again, sees it's a manager, and executes the business rule check by calling our procedure. This procedure now queries the EMP table (via the loopback db-link) and it actually sees (!) the clerk that just got inserted by the same statement, and therefore executes successfully.

Now the order in which rows get inserted by a multi-row insert statement is (and rightfully should be) of no concern to you. Think of the subquery instead of being a simple 'select * from emp_stage', being a complex multi-table join: depending on current contents/sizes of the tables involved, statistics might differ which in turn might cause different execution plans for the insert statement, which in turn might change the order in which rows get inserted by a multi-row insert statement. If your code is not immune to this order then in effect your code executes in a non-deterministic fashion. And this is the essence of why ORA-04091 is your friend. It protects you against developing software that might work today, and not work tomorrow.

So how does this all work with the autonomous transaction workaround? Here's the same two-row insert for that scenario, executed without and with the forced order (EMP again has no employees in department 42).


As you can see both versions of the insert statements fail with this workaround. The explanation for this is simple. Since the query that checks for the necessary clerk is run from inside an autonomous transaction, all it sees is the EMP table as it existed prior to start of the insert statement: in which there were no employees for department 42. So in this case the 'workaround' is not a workaround at all: yes it prevents the mutating table error, but no it does not allow a valid insert-statement to execute successfully. Note that it even wouldn't allow two subsequent single row-inserts: first insert a clerk, then insert a manager. With this workaround you are forced to commit the clerk first, and only then insert the manager.

So, in summary:


Again, forget about these two red herrings: don't use them.

In the following posts we will start developing how to correctly work around the mutating table error. And as many of you probably know, it involves a combination of row and statement triggers.

In the meantime happy holidays!

Tuesday, December 13, 2011

"Workarounds" for ORA-04091

In the previous post, we demonstrated running into the mutating table error. The example involved the well-known EMP table and a business rule that we are trying to implement: all departments with a Manager should also employ a Clerk. We started our implementation for that rule by writing a straight-forward 'after-row-insert' trigger on the EMP table.


  • The row trigger detects whenever a Manager is inserted (remember, we can only detect this with a row trigger, not with a statement trigger);
  • It then calls a stored procedure to which it supplies the department number (deptno-column value) of the inserted Manager;
  • The procedure then queries the EMP table to see if a Clerk exists in this department;
    • If it finds no such Clerk, the procedure raises an application error, which causes the after-insert-row trigger to fail. This in turn prevents the insert of the Manager, since it would violate our business rule (the Manager insert will undergo a statement-level rollback due to the row-trigger failing);
    • If it finds such a Clerk, then the procedure finishes successfully, which causes the after-insert-row trigger to finish successfully, which causes the Manager insert to execute successfully.
A straight-forward 'use-case' of using a (row) trigger, right? Of course this setup didn't work, since the cursor in the stored procedure will always hit the mutating table error, thus preventing a Manager insert even if the necessary Clerk would indeed be present.

So, what's the workaround for this? Well there are a two I would like to discuss in this post: one of them is very 'popular', the other one is -- I guess -- not so well-known.

I cannot stress enough upfront right now, that both 'workarounds' are red herrings.

Workaround 1a: use the autonomous_transaction pragma inside the stored procedure.

If we add just one line at the top of the stored procedure, we can fix this baby (see red-arrow line below).


By ensuring that all code inside the P_CHECK_FOR_CLERK procedure will run as an autonomous transaction, we can prevent the mutating table error. Why? Because it causes Oracle to 'step out' of the current transaction context, in which EMP is currently 'mutating', and open a new transaction in which EMP is not mutating, and therefor can be queried without hitting the mutating table error. So all we need to do is add the PRAGMA line highlighted above. We do not need to add an explicit COMMIT (or ROLLBACK) at the end of the procedure, which is usually required to end an autonomous transaction, since the procedure doesn't really start a transaction at all: is just issues a query.

This is the popular workaround. Now for the lesser known workaround.

Workaround 1b: use a loopback db-link to query the mutating table.

By querying the EMP table to look for a Clerk, via a loopback database link, we can fix this baby too.


So we create a loopback database link (which we call loopback above) which connects back to our application owner schema. We then add the text '@loopback' to the table name (EMP) inside the FROM-clause of our query that was hitting the mutating table (second red arrow above). Done. We'll demonstrate in a minute that this 'fixes' our issue as well.

But before we do that you really need to understand a fundamental difference between these two workarounds.


In the preliminaries that we discussed in the previous post, one of the observations that was made about an autonomous transaction was that all code in such transaction cannot see any (uncommitted) changes made by the other (parent) transaction. So what this means for our example is that the cursor in the stored procedure (the one that tries to find a Clerk) which is executed by an 'after-row' trigger, will see the EMP table without the Manager that was inserted by our triggering insert statement.

Yet.

The cursor that uses the loopback database link, which is executed from within a new session created by the database link, *will* see the mutating table as it exists at that point-in-time in the transaction of the 'original' session (the one that issued the Manager insert). Why? Because Oracle detects that both sessions are now part of a distributed transaction, and the database link session will see the same database snapshot as the other session. Now this is pretty radical, I think. When I discovered this behavior I was stunned for a few moments: here we have a database session (OK, it's one that was created by a database link, but still) a session that is able to see uncommitted changes of a different session! I will show you a demo that proves this behavior in a minute.

Now wouldn't that be an in-depth (trick) question for some fancy certification exam:

"When can a database session see uncommitted changes of another session?"
a) Never.
b) Always, this is default behavior.
c) If it's using an autonomous transaction.
d) If the session was opened by a database link.

;-)

Now let's test both workarounds. Remember we are trying to insert a manager into department 42, in which no other employee works yet.



Et voilá: no more mutating table error. Our row-trigger + procedure work, they give us the business rule error.

So what's wrong with both approaches? Can you guess it? I'll give you a hint: think about multi-row inserts. Will they work given workaround 1a (the autonomous transaction) or workaround 1b (the db-link trick)?

I'll discuss this in the sequel, which should follow promptly.

Friday, November 25, 2011

Some preliminaries

There are a few remarks to be made on DML event triggers. Just to be sure we are all at the same level on the playing field. The first one is that since these triggers fire as part of the execution of a DML statement, the trigger code is not allowed to execute any DDL (which causes an implicit commit) or (explicit) transactional control statements, such as commit, rollback, savepoint or rollback to savepoint. This restriction is true for both statement level as well as row level DML event triggers.


If you try to do so, Oracle will throw an ORA-04092 at you. A common "workaround" of developers who really think that they need to do this kind of stuff from within trigger code, is to wrap the offending code inside an autonomous transaction, thereby not affecting, transactionally, the main transaction which is currently executing the DML statement (that caused the trigger to fire). There is an important observation to be made at this time: the SQL-code inside such autonomous code block, cannot see the changes that are being made by the DML statement that is executing. Nor can this SQL-code see the effect of any other DML statements that might have been executed earlier inside the main transaction. Oracle treats the autonomous code block as if it were executed inside a different session. And sessions never see changes made by other sessions that have not yet committed these changes.

Remember this one. I'll refer back to this observation in the next post when I discuss common "workarounds" for the infamous mutating table error.

The second remark:


If a trigger fails, due to an exception being raised (and left unhandled), then the triggering DML statement (and all that might have been done by the trigger code self), will be rolled back. Note that this is a statement level rollback: so any prior changes executed in the transaction, are still left "posted" (not rolled back). You can view this statement level rollback as a 'rollback to savepoint', where the savepoint was set (by Oracle) just prior to starting the DML statement execution.

Third (an final) remark in this post: row level triggers are not allowed to read what's called the mutating table.


If your row level trigger code queries the table that's being affected by the triggering DML statement, then Oracle will throw an ORA-04091 at you and cause the above mentioned statement level rollback. If you've ever played around with (row) triggers, you must have encountered this issue. I think my first encounter was in 1993. And there is a very clear cause why you will encounter this issue: the kind of stuff you would like to do in a row-trigger, often, no very often, requires you to query the mutating table. I'll come back to this later.


One of my goals is to explain to you that this error is not your enemy. No, it's your friend. It prevents you to develop software, that might work today, but no longer tomorrow. I'll demonstrate that in the next post. For now we'll close this post by quickly building an example that gives rise to the mutating table.

Suppose we need to implement the requirement that every employee whose job is Manager, must be accompanied by another employee whose job is Clerk and works in the same department (as the manager does). Well, we can implement that easily, all it requires is a row trigger, right?


We build an after insert row trigger on the EMP table, which will verify, whenever a Manager is inserted, that a Clerk exists in the same department. Here's the code for that procedure you see called in above trigger body:


And as you can see, this code queries the mutating table. So let's test this trigger. Here's the contents of our EMP table, and an insert statement that attempts to insert an Manager into department 42 (which has no Clerk).


And as expected, Oracle throws the mutating table at us.

In the next installment I'll discuss common "workarounds" for this issue. They really aren't workarounds, but everyone calls them that...

Tuesday, November 22, 2011

So what triggers are we talking about?

Here's an overview of the talk.


This post will deal with the first bullet: the scope, what triggers are we talking about? And what triggers are we not talking about. Then there will probably be a few posts on 'properties' of the triggers, most notably I will spend some time on explaining the infamous mutating table error. Next we move on to a high level classification of use-cases of triggers. And talk a bit about why some of these might be considered harmful. Finally we will explain, in detail, the one use case where triggers are the perfect means to achieve the end.


The most common types of triggers, the ones everybody probably used at some time in their pl/sql programming career, are the "DML event" triggers. As above slide shows, there are twelve of such triggers: four each for every type of DML statement, Insert, Update and Delete. These triggers will be fired by the DBMS before a DML statement, after a DML statement, and before/after each affected row of the triggering DML statement. Stuff you all know right? The big difference between the statement-level and the row-level triggers, is that the latter ones can inspect (and change) the column-values of the current row that is being inserted/deleted/updated.


So here's an example. Suppose we have an EMP table that holds employees, and we want to execute an update statement that will increase the salary of all clerks (see update statement above). This will for the given table affect three rows. The before update statement trigger will then fire once. Next for each affected row the before and after row triggers will fire. And finally the after statement trigger will fire.

So if we create the four update triggers on the EMP table as follows:


We will get the following output (given we have set serveroutput to on).


Nothing new so far, I hope. Before we continue I just wanted to mention that as of Oracle11G we have the compound trigger feature.


 A compound trigger enables us to create the four update triggers above all in one go as follows:


Now, do you know why Oracle introduced compound triggers? I'll talk about that in a later post. What I'll say now is this: compound triggers are the answer of an enhancement request made by you (the pl/sql community) a long time ago. Because you have always hit a certain programming pattern with regards to triggers, when using them for a certain use case. Again I'll explain this in more detail in a future post.

So these are the triggers that are in-scope of this blog: DML event triggers, be them created individually or four in one go using the compound trigger mechanism.

Oracle DBMS offers us with many more triggers:


All of which will not be the matter of subject for this blog.

Stay tuned.


Starting this blog

So I finally was able to find some time to start this blog. I mentioned before that I might do this in the asktom discussion here. This blog will be about database triggers. Oracle database triggers that is. I'm a strong advocate of database triggers. Now hold your horses... I have been using database triggers ever since Oracle7 in the early nineties brought them to us. But I've been using them for a very specific reason, and have been presenting about this ever since. I will blog about this in a very similar way as I blogged about TheHelsinkiDeclaration, by documenting the presentation I have frequently given now at various Oracle related seminars and user group meetings.

Let me present exhibit A, showing that I have been presenting about a particular use-case for using database triggers for quite some time now. Here's the front cover of the 1994 International Oracle User Week (that's how Oracle OpenWorld used to be called in the nineties) conference.

And here's a part of the contents.


Paper #10, "Business Rules: Classification and Implementation (...)". And as you can see I was employed by Oracle back then. Note that Steve (Feuerstein) was already presenting in those days too. Now some of you can already guess what my talk was about. The use-case for triggers that I'm advocating, that is. For those of you who don't, stay tuned, I will explain the use-case and my arguments for it, in the posts that will follow.

So that was (almost) two centuries ago, business rules presentations by T(oon) K(oppelaars)...

Fast forward to this millenium. Here's another TK on triggers in the database:

From http://tkyte.blogspot.com/2006/09/classic-example-of-why-i-despise.html
"Triggers are so abused and so used inappropriately, I'd rather live without them."
"There are no times triggers cannot be avoided. They are purely a convenience that is overused, abused, and improperly used."

And from: http://tkyte.blogspot.com/2007/03/dreaded-others-then-null-strikes-again.html
[on why he doesn't like triggers] "Because I hate being surprised or tricked. And triggers are all about trickery and surprises."

The other "TK" of course is Tom Kyte from asktom.oracle.com. Since you are reading this blog, I"m sure Tom Kyte needs no further introduction. He makes no secret of the fact that he has strong reservations about the database feature called "triggers". And this opinion is wholeheartedly embraced by the PL/SQL community at large. The default response whenever a question on database triggers pops up on  PL/SQL OTN forum is:

"Triggers should be avoided as much as possible."
"Don't use them, they are bad."
"Triggers are considered harmful."

Well excuse me, but I find this general consensus harmful.

At this time I should disclose that I run a company called RuleGen, which is also the name of a product that generates Oracle trigger code to implement business rules.

A year ago I decided to present on why I find this general consensus harmful. And this resulted in a presentation at Hotsos 2011. The title of this presentation is: "Triggers considered harmful, considered harmful". And that's not a typo, no. I'm using the "considered harmful" concept which was introduced by Edsger Dijkstra when he published a paper called "Goto statement considered harmful". And as you can read here, it gave cause to quite some 'follow up' papers with the same "typo" in their titles.

Recently I also delivered this presentation at the BGOUG. And it has been submitted for presentation at the upcoming ODTUG. I'll let you know if it got accepted, so you can attend ODTUG and hear a live version of this blog.

So much for the introduction. Future posts we will go into the details of why I find triggers not harmful at all given they are used for a very specific use-case.