Friday, December 30, 2011

Look mom: a mutating table error without a trigger!

Did you ever wonder why it says 'trigger/function' in the error message of ORA-04091?


ORA-04091: table ... is mutating, trigger/function may not see it


We know (and understand) by now that a row trigger cannot read a mutating table, but what's the /function all about in above message text? Well there is a completely different scenario that has nothing to do with triggers, where you can run into this error. I thought to spend a short post on that first, so that you really fully understand that ORA-04091 is your friend. And again in this scenario it prevents you from coding logic that might work differently tomorrow than from what it did today.

Let's quickly show you this scenario. We create the EMP table again and insert some test data into it.


drop table EMP;
create table EMP
(EMPNO    number(3,0)  not null primary key
,ENAME    varchar2(20) not null
,SAL      number(4,0)  not null)
/
insert into emp(empno,ename,sal) values(100,'Toon',4000);
insert into emp(empno,ename,sal) values(101,'Izaak',5000);
insert into emp(empno,ename,sal) values(102,'Marcel',7000);
insert into emp(empno,ename,sal) values(103,'Rene',8000);
commit;


Our HR department decided that above salaries should be aligned a bit more in the year 2012. And in their wisdom they decided that the following update should be run at the end of the year:


update EMP e1 set e1.SAL =
 e1.SAL + ((select avg(e2.SAL) from EMP e2) - e1.SAL)/2
/

This update brings everybody's salary a bit more towards the current average salary of all employees. The dba does a testrun of the script in the acceptance instance:



It runs well, HR confirms that it does what it was intended to do, and dba is fine with this in production.

Here come the QA department though: they need to 'ok' the running of this script in production too. So they inspect the script with all QC guidelines in mind. Result of their review is that they won't allow this. There is distinct 'business logic' in there that needs to be removed from the statement and implemented in a stored function, so that it 'can be reused sometime in the future' (I've seen this happen in real companies...). They mandate that a function be implemented such that the update statement will be of the following form:


update EMP e set e.SAL = f_new_sal(e.SAL)
/

So the developers pump out the following function for this:


create or replace function f_new_sal
(p_current_sal in number) return number as
--
pl_avg_sal number;
--
begin
  --
  select avg(SAL) into pl_avg_sal
    from EMP;
  --
  return p_current_sal + (pl_avg_sal - p_current_sal)/2;
  --
end;
/


Everybody is happy. The dba executes the testrun again:



WTF? Oracle throws a mutating table error? But there's no trigger involved here...

Why is this happening?

If you understood the explanation of this error in my previous posts sofar, you should be able to figure this one out. Oracle again detects that we seem to be selecting from a table that's currently mutating. Think about what is happening here:

  • The update starts identifying the rows that need their SAL column value changed;
  • It does this in some undetermined order (!);
  • When the first rows is found, it invokes our function to determine the new SAL column value for it;
  • Our function queries the EMP table, which is currently mutating: bingo we hit the error.

When Oracle steps out of the SQL-engine that's currently executing the update statement, and invokes the function, then this function -- just like an after row update trigger would -- sees the intermediate states of EMP as they exist during execution of the update statement. This implies that the return value of our function invocations heavily depend on the order in which the rows happen to be updated.

Wanna see this happening? Let's use our database link "workaround" to prevent the mutating error. We create a loopback database link and change our function accordingly:

create or replace function f_new_sal
(p_current_sal in number) return number as
--
pl_avg_sal number;
--
begin
  --
  select avg(SAL) into pl_avg_sal
  from EMP@loopback;                -- Here: added db-link.
  --
  return p_current_sal + (pl_avg_sal - p_current_sal)/2;
  --
end;
/

Dba can now run the update without hitting the error:



But look at the SAL values, only Toon's salary is updated correctly. These are not accepted by HR...

And these values will be different if we force a different order of updating the rows, for instance like this:

update (select * from EMP order by EMPNO desc) e 
  set e.SAL = f_new_sal(e.SAL)
/

The proof is in eating the pudding:



Now Rene's salary is the only one being updated correctly.

So again, don't forget: ORA-04091 is your friend.

Note that in the original version of the update statement (without the function), Oracle ensures that the repeated 'invocations' of the subquery (select AVG(e2.SAL) etc.) all see the same snapshot of the database: the one that existed at the start of the update statement. So there is no indeterministic behavior in that case.

A fix for the issue demonstrated could be to just add the autonomous transaction pragma to the function. We would then get the same (deterministic) behavior as the original version of the update statement.
A better fix in this case though would be to not have the function at all, and just go with the original update statement.

4 comments:

  1. Toon,

    What [formal prescription] is it that causes the "SELECT AVG(...) ..." to be evaluated only once in the original SQL ?

    ReplyDelete
  2. Erwin,

    Did I say it was evaluated once? It might not be. Oracle ensures what is called "statement-level read consistency", which would cause repeated executions of that subquery to see the same snapshot of the database. It seems that Oracle does not consider the statements executed in the function as being an integral part of the update-statement. For if it did, these should also see the same snapshot of the database (the one that existed just prior to start of the update statement execution).

    From the documentation at: http://docs.oracle.com/cd/B19306_01/server.102/b14220/intro.htm

    "[read consistency] Guarantees that the set of data seen by a statement is consistent with respect to a single point in time and does not change during statement execution (statement-level read consistency)"

    I'm not sure how to answer your question using a 'formal prescription.'

    One could argue why Oracle doesn't ensure this 'statement-level read consistency' for repeated function calls that appear inside a SQL statement. It could be considered a bug as far as I'm concerned. But this is the way it currently works.

    Toon

    ReplyDelete
  3. Okay. Is "statement-level read consistency prescribed by the standard" ? (that's what I meant by "formal prescription".)

    I guess my main point is actually this : while the title claims the talk is about triggers (SQL triggers), your proof/pudding seems to be about _Oracle_ SQL triggers. Not same thing !!!

    ReplyDelete
  4. I've modified the description under the title. It now reflects that I'm talking about Oracle.

    ReplyDelete