A collection of tutorials, code and tools to help you get better using Oracle
10 January 2013
I thought I had figured out all the pitfalls of changing stored procedures and functions on-line, and then it happened again:
ORA-04068: existing state of packages has been discarded
This occurred when upgrading a single stored procedure - no packages, no complex interdependencies and on Oracle 11.2.0.3. The procedure I changed was not referenced by any other procedures, and it was being invoked from Java on a fairly busy system.
I am pretty sure this shouldn't happen, so I decided to create a test case to try and prove it one way or another.
First of all I need a procedure to play with - pretty much the simplest procedure possible (p1.sql):
create or replace procedure p1
as
begin
null;
end;
/
And I need a second procedure to change on-line (p1_1.sql):
create or replace procedure p1
as
begin
null;
null;
end;
/
Notice the second 'NULL;' call in the second procedure, which makes it different from the first version. Then I need some Java code to call this procedure in a loop - again I am going for the most simple Java code I can:
import java.sql.*;
import oracle.jdbc.*;
public class SimpleProc {
private static Connection conn;
public static void main(String[] args)
throws ClassNotFoundException, SQLException, InterruptedException
{
connect();
CallableStatement stmt = conn.prepareCall("begin p1(); end;");
int count = 0;
while (true) {
stmt.execute();
count ++;
System.out.println ("Made procedure call number "+ count);
Thread.sleep(1000);
}
}
public static void connect()
throws ClassNotFoundException, SQLException
{
DriverManager.registerDriver
(new oracle.jdbc.driver.OracleDriver());
String url = "jdbc:oracle:thin:@//localhost/local11gr2.world";
// jdbc:oracle:thin:@//host:port/service
conn = DriverManager.getConnection(url,"hotplsql","hotplsql");
conn.setAutoCommit(false);
}
}
This code prepares a call to the procedure P1, and then executes it in a loop, printing out the execution count after each execution. Then it sleeps for a second before executing the procedure again, over and over until the process is killed. One thing to note is that the call to the procedure is only prepared one time, but the procedure is executed many times, which is a common pattern in a well written application. If I start the Java running, and open a SQLPLUS session, I find that I can change between the two version of my stored procedure without causing any errors in the Java code:
SQL> @p1
Procedure created.
SQL> @p1_1
Procedure created.
SQL> @p1
Procedure created.
SQL> @p1_1
Procedure created.
SQL> @p1
Procedure created.
So this looks good, it would seem I can recreate the procedure with no problems, which is what I thought when I suggested changing this procedure on our production system.
It turns out not to be that simple. If I make a small change to the Java code, and remove the Thread.sleep(1000) line, things change dramatically. As soon as I compile the new version of my procedure, I get this error:
Made procedure call number 21368
Made procedure call number 21369
Made procedure call number 21370
Made procedure call number 21371
Exception in thread "main" java.sql.SQLException: ORA-04068: existing state of packages has been discarded
ORA-04065: not executed, altered or dropped stored procedure "HOTPLSQL.P1"
ORA-06508: PL/SQL: could not find program unit being called: "HOTPLSQL.P1"
ORA-06512: at line 1
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:204)
at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:1007)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1315)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3677)
at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:4694)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1086)
at SimpleProc.main(SimpleProc.java:16)
Which is exactly what happened when I put this procedure into production. The error isn't a total disaster, as if I had my call wrapped in a try-catch block, then on the next execution it will work correctly, but it still means some transactions will fail.
In my opinion there is something strange going on here - when there is a bit of a delay between executions Oracle can handle the procedure recompiling, but if the execution happens too quickly after the recompile, it fails. Tracing the session reveals some interesting information. With a delay in the Java code, the trace file looks like this:
PARSING IN CURSOR #390309088 len=16 dep=0 uid=223 oct=47 lid=223 tim=264320251394 hv=3303240157 ad='7ff1fa66878' sqlid='36ytppb2f6wfx'
begin p1(); end;
END OF STMT
EXEC #390309088:c=0,e=197,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264320251389
WAIT #390309088: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264320251979
*** 2013-01-10 10:21:00.532
WAIT #390309088: nam='SQL*Net message from client' ela= 997998 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264321250098
EXEC #390309088:c=0,e=84,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264321250370
WAIT #390309088: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264321250444
*** 2013-01-10 10:21:01.532
WAIT #390309088: nam='SQL*Net message from client' ela= 999933 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264322250439
EXEC #390309088:c=0,e=179,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264322251647
WAIT #390309088: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264322251825
<snip>
*** 2013-01-10 10:21:11.531
WAIT #390309088: nam='SQL*Net message from client' ela= 998423 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264332250242
**> EXEC #390309088:c=0,e=3213,p=0,cr=0,cu=0,mis=1,r=1,dep=0,og=1,plh=0,tim=264332253903
WAIT #390309088: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264332254126
*** 2013-01-10 10:21:12.531
WAIT #390309088: nam='SQL*Net message from client' ela= 996845 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264333251088
EXEC #390309088:c=0,e=182,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264333251703
WAIT #390309088: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264333251880
Notice that the cursor is parsed and then is executed many times with no further parsing (mis=0 in the EXEC lines). Then after I recompile the procedure, we get a EXEC line with mis=1 before it goes back to executing without any misses. So it appears that Oracle is smart enough to notice the procedure has changed, and then gracefully reparse and continue on without an error.
If I remove the sleep command from the Java code the trace file looks different:
PARSING IN CURSOR #221881568 len=16 dep=0 uid=223 oct=47 lid=223 tim=264706629638 hv=3303240157 ad='7ff1fa66878' sqlid='36ytppb2f6wfx'
begin p1(); end;
END OF STMT
EXEC #221881568:c=0,e=88,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264706629636
WAIT #221881568: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264706629773
<snip>
EXEC #221881568:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264712941846
WAIT #221881568: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712941882
WAIT #221881568: nam='SQL*Net message from client' ela= 149 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712942058
EXEC #221881568:c=0,e=33,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=264712942135
WAIT #221881568: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712942188
WAIT #221881568: nam='SQL*Net message from client' ela= 1626 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=264712943853
**> WAIT #221881568: nam='library cache pin' ela= 10196 handle address=8792328936552 pin address=8792292571936 100*mode+namespace=842320396222466 obj#=-1 tim=264712954206
**> EXEC #221881568:c=0,e=10380,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=264712954356
**> ERROR #3:err=4068 tim=264712954394
In this trace file, we can see the process has waited on 'library cache pin', which didn't happen before. This blocks the execution for a short time and is caused by my other session recompiling the stored procedure. You cannot execute a procedure while it is being compiled, and you cannot compile a procedure while it is being executed either, which is what this pin enforces. As soon as the recompile completes, the library cache pin is released and execution can continue. However this time, instead of the EXEC reporting a library cache miss, it attempts to execute the procedure and fails.
This seems like a bug - maybe the cache pin is released too soon, before all the old cursor resources are cleaned up, or maybe once a process gets as far as being blocked by a cache pin, it has passed the point where it can figure out it needs to reparse the cursor. Either way it is disappointing as it means you cannot safely recompile a procedure in a busy live system (unless of course you use Edition Based Redefinition) without risking some errors.