19 March 2012

If exists update else insert

A frequent occurrence when writing database procedures is to handle a scenario where given a set of fields, for example a new employee record, update the existing employee record if it exists otherwise create it.

Often this problem is solved with a select statement and then an IF statement, eg:

declare
  v_exists varchar2(1) := 'F';
begin
  begin
    select 'T'
    into v_exists
    from employee
    where employee_id := i_employee_id;
  exception
    when no_data_found then
      null;
  end;
  if v_exists = 'T' then
    update employee
    set ...
    where employee_id = i_employee_id
  else
    insert ( ... ) 
    values( ... )
  end if;
end;

One Too Many SQL Queries

The approach above works fine, except that it requires one too many SQL queries. It violates the mantra of performant database programming, which says:

Do it in as few SQL queries as possible

There are two ways of solving the requirements that usually eliminate the extra select.

Optimise For The Most Common

When coding a procedure, you should try and get an idea of how the procedure will be used. In this case, the question to ask whether the procedure will mostly be used to update existing employee records, or insert new ones. Maybe there will be no clear winner, but there often is.

Primarily Insert

If the procedure is expected to mostly insert new records and rarely update existing ones, then use the following pattern (assuming there are unique constraints on the database to prevent duplicate employees being created):

begin
  begin
    insert into employees ( ... )
    values ( ... )
  exception
    when dup_val_on_index then
      update employees 
      set  ....
      where employee_id = i_employee_id
  end;
end;

This code relies of the database to tell you the record already exists based on the integrity constraints on the table, which is much more efficient and less error prone that attempting to do it yourself.

Primarily Update

If the more likely case is that existing records will be updated, the code below is better:

begin
  update employees
  set .... 
  where employee_id = i_employee_id;

  if sql%rowcount = 0 then
    -- no rows were updated, so the record does not exist
    insert into employees (  ... )
    values ( .... );
  end if;
end;

When any SQL statement is executed in PLSQL, the SQL%ROWCOUNT variable will contain the number of rows affected (in this case updated) by the most recent query. In this case, if it contain zero, it means the update failed to find any rows to update and therefore the record needs to be inserted instead.

Use Merge

You can get away with even less code that this. Oracle 9i introduced the MERGE statement. The MERGE statement takes a list of records which are usually in a staging table, and adds them to a master table. If the record exists in the master table, it should be updated with the new values in the staging table, otherwise insert the record from the staging table.

Sounds pretty similar the problem outlined above, except that merge wants the new records to be in a staging table. Luckily enough, we can fake a staging table using DUAL:

create table employees (
  employee_id integer not null,
  employee_name varchar2(100) not null);

alter table employees add constraint employee_pk
  primary key (employee_id);

create or replace procedure merge_employee(
  i_emp_id in integer,
  i_emp_name in varchar2
)
is
begin
  merge into employees e
  using (select i_emp_id id, i_emp_name name from dual) s
  on (e.employee_id = s.id)
  when matched then update set employee_name = s.name
  when not matched then insert (employee_id, employee_name) 
    values (s.id, s.name);
end;
/

With the merge statement, we now have a single more complex query instead of the 3 we started with, or the two of the refined approach.

However, for the case where inserts are most popular, I believe the merge will perform slightly worse than the code in the Primarily Inserts section, as it still checks if the record exists before attempting to merge it, as the explain plan below shows:

------------------------------------------------------
| Id  | Operation                      | Nam         |
------------------------------------------------------
|   0 | MERGE STATEMENT                |             |
|   1 |  MERGE                         | EMPLOYEES   |
|   2 |   VIEW                         |             |
|   3 |    NESTED LOOPS OUTER          |             |
|   4 |     FAST DUAL                  |             |
|   5 |     TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |
|*  6 |      INDEX UNIQUE SCAN         | EMPLOYEE_PK |
------------------------------------------------------

However, there is a chance that it is better optimized inside Oracle, which leaves an interesting theory to investigate.

This post is part of a series:

Previous in series | Series Index | No further articles

Better At Oracle