19 July 2011

Where are the DDL and Build Scripts?

Boss: Bad news guys - someone dropped the wrong database last night, the development database you have been using is gone.

Team: Eh, just how gone is gone?

Boss: Totally vaporised, there wasn't even any backups - sure the last round of cost cutting left us with no cash to buy the hardware

Team: But all our code was on there - that's a months work, we can never release next week now!

Boss: Don't worry, Jim has already provisioned a new database, it's there ready for you to build your application in ...

Team: Build?! I don't think you understand, the code was ONLY in the database, we assumed it was safe, backed up, safe!

Boss: I hope you're joking ...

This may seem like a joke, but I have seen this happen, and trust me when I say it will only happen to you once! I think I know why too - Database IDEs (Integrated Development Environment) such as Toad and SQL Developer. These are great tools, probably essential if you want to be productive, but they are easily abused.

The DDL Wizard

The biggest crime is the GUI for creating new database objects. All good IDEs have a 'create table' button that pops up a wizard. Creating a table is as easy as typing in some column names, selecting a column type and clicking a few check boxes, same goes for an index, sequence and so on.

Now comes the problem - you have used a tool that generated some code and ran it for you behind the scenes, but you don't have that code. It isn't saved anywhere on your file system and if someone runs the wrong command and drops your table, then it's gone for good.

The only place that DDL code exists is inside the database you are using. If you are disciplined, then you might generate the DDL for your table and save it away somewhere, but it's easy to forget.

Creating one table isn't all that difficult, and you can easily do it again, but creating 20 tables, with all their constraints, indexes and foreign keys - that could take a few hours, at least.

Moving Code Onward

At some point, it will come time to move all your fancy database changes to a new environment, maybe another development environment, test or even production, which is where things get really interesting, because you have no code, and do you remember which objects you altered since the last release?

The IDEs have a really great feature - schema diffs! All you need to do is point it at your development instance where all the changes are safely stored, and point it at the environment you want to changes applied to and as quickly as you can say Bad Practice, it will generate you a nice script with all the DDL code required to move the changes to the new environment.

This almost seems like a good idea, until we dig deeper ...

First, you are unlikely to be allowed to 'diff' against production, so you need to be sure the target environment looks exactly like production, and sometimes it is not.

Second, if your application or team is any size, then each release will consist of many changes, and you can almost guarantee not all of them will make it to production as the delivery managers and test teams do their thing - how to you unpick a change if it is delivered like this, or even know what the change is associated with what?

Thirdly, when your DBA messes up and drops your database, or the server implodes, or your network to that data centre drops out, (or ... you get the picture) you are totally hosed, and guaranteed to be up all night trying to piece things back together from memory.

DDL is Code

DDL is code, plain and simple. When you run it in Oracle, it creates a database schema which serves a purpose, not unlike compiling a C program. If you work as a developer then Best Practices 101 states all that C code goes into a version control system, it doesn't matter which one, but Subversion and Git are good, and free.

If I had my way, I would remove all the buttons from the GUI that allow you to change the schema, and force everyone to write the DDL code in a file they save on disk and check into version control at the end of each day.

Forget the DDL wizards, write DDL scripts to create and change your schema, and check them into version control

The Database Build Script

Given an empty database, could you run a single command (or less than 5 commands) to build the entire database, and be confident it would work? For most teams the answer is no.

However, if you worked on a C, Java, or .net project and asked the same question, 9 times out of 10 the answer would be yes.

This is confusing, as a database build script is easy compared to compiling and building a complex application, but is something that is missing from a lot of projects.

Consider a new application. On day 1, there is no database, so you code up a script to create all the tables for version 1:

001_initial_db_build.sql

Then, as time goes on some changes are required - a database is not like application code. A new version of application code basically destroys the old version, if you need to change a procedure that already exists, you just change the original file and recompile - simple.

If you need to change the definition of a database table, you cannot just change the original file that created it - you need to migrate the table to a new version, so you create a new script that contains just the changes:

002_version_2_changes.sql

If the database is at version 1, then this new script will move it to version 2. If you have an empty database, running each script in turn is your build script.

Migrations

By managing the database as a series of migrations, you solve two problems at once - any version of the schema can be moved to any other version of the schema easily. Given an empty database, it is easy to build the application by running each migration in turn.

If you are really clever, you will have one migration per bug fix or new feature - then when the project manager decides to pull that part out of the release it is simple to exclude that migration!

Database schemas migrate from one version to the next, and by managing the DDL scripts each as separate migrations, releases and builds become simple.

Stored Procedures

I cannot argue that a good IDE makes developing stored procedure code much easier, and few people would work without one, but if you're not careful, the IDE can cause some problems.

When a developer needs to change an existing stored procedure, a typical work flow is:

Using the schema browser, locate the package you want to change
Double click on it to open it in the editor - the latest version will be loaded from the database.
Make changes and compile as required
Go home ...

This works fine, most of the time, but again all the code and changes are not stored in a file anywhere on your machine, only in the often shared database. All the problems around the database going away are still present, putting your code at risk.

If your development environment is shared among many developers, the chances of losing your code are even greater. Imagine this scenario:

Jim: Arrives in work and immediately opens a procedure in his IDE

Tom: Arrives in work later, and opens the same procedure, but before making any changes gets called into an all day meeting

Jim: Makes lots of changes all day, and heads home happy they are all working.

Tom: Returns from his meeting after Jim has gone home makes a few changes and compiles the procedure

How do you think Jim's changes will look when he returns tomorrow?

As Tom opened the procedure before Jim made any changes, he had the original code open in his IDE. Then Jim make lots of changes and went home. Tom then made a simple change and wiped the changes Jim made from the system when he compiled - all the work Jim did is lost forever! If you work this way, whoever complies last wins, which is not good.

It is easy for the IDE to encourage a bad work flow, and unless you are aware of it, you may fall into this trap. What Jim and Tom should have done is locate the code in the version control system, check it out change it, save it locally and then compile it into the database.

Working this way, the worst case is that they each have their own changes stored on their local system, and may even find out they are both working on the same code when they attempt to check the code in.

Using Version Control Badly

The IDE work flow also encourages bad practices when using version control. I have seen teams work for weeks on code, loading it from the database, changing and recompiling without ever saving it locally or checking it in.

Come release time, all the code is exported from the database as a massive exercise, and checked into version control. Doing this, you lose many of the benefits of the version control system, such as rolling back small changes, detecting conflicts early and simply having a backup of the code.

Sandboxes

When developing Java or C or .net code, each developer tends to have their own development sandbox. You can change or break anything you like in the sandbox, as nobody else can see it. Most large scale database projects share a single development database between many developers, making development much more difficult.

There can be various reasons for this, but the common one is that developers find it too difficult to set-up Oracle on their local systems, and it is not practical to give every developer their own Oracle instance due to memory and management constraints.

One solution to this problem is to have one Oracle instance, but give each developer their own schema. For this to work, the application must be designed to run in any schema, which is a good thing anyway, but it has a side effect of enforcing migrations, build scripts and correct use of version control.

Each developer can work in isolation, but is forced to write migrations to move their code into the main shared development schema where changes are merged and tested. An easy build mechanism is also essential to set up individual development environments.

Summary

Like many things, doing things the correct way seems like a lot of work at first, but it isn't really. Just losing one days work for a team is much more painful than coding and versioning DDL migration scripts, and if done carefully there are powerful side effects, such as selectively removing parts of a release, developer sandboxes and eventually continuous integration, automated testing and all that other good stuff the rest of the software development world enjoys.

This post is part of a series:

Previous in series | Series Index | Next in series

Better At Oracle