06 July 2011

Referential Integrity - Primary and Foreign Keys

Sally's schema design

The Customer Table

Having tackled the Products table, it is time to move onto the Customer table. You can see it is going to be more of the same, but there are some interesting problems.

Primary Keys

Each customer in the table has an identifier, called the Customer_ID. This ID is what the application will use to refer to the customer, so it must be unique, but there is slightly more to say that than.

Remember back to the Products table - it had a productid column too, but we ignored it and created a unique index on the ProductName column.

For the products table, it has two unique identifiers - the ProductID, which will generally be used to look up specific products and the ProductName, which is a human readable version of the Product ID, with a business rule that states it must be unique.

In all three cases (customerid, productname, and product_id) a unique index along with a not null constraint is a perfectly valid way to ensure the data is correct, but there is another type of constraint called a Primary Key.

A Primary key behaves just like a unique index, but it has a few more rules:

No columns in the Primary Key can ever contain null values
A table can have at most 1 Primary key.

According to the purists, all database tables should have a primary key defined, and it is the preferred method of accessing a table. In Oracle, behind a Primary Key is always an index (usually a unique index which is exactly the same as any other unique index) so accessing a table via the Primary Key and a unique index is just as efficient. In many ways, a Primary Key is not strictly necessary, but it is best to have one on each table if possible.

Looking at the other fields in the Customer table, it is reasonable to assume that forename, surname, address_1 and zip code all must have values, which can be easily solved with a few not null constraints.

The comment Sally made about the email column is interesting, but that problem will have to wait until Tom comes back from lunch so lets create the table with what we know so far:

ORA11G> create table customer (
  customer_id integer        not null,
  forename    varchar2(255)  not null,
  surname     varchar2(255)  not null,
  email       varchar2(255),
  address_1   varchar2(255)  not null,
  address_2   varchar2(255),
  address_3   varchar2(255),
  address_4   varchar2(255),
  address_5   varchar2(255),
  zipcode     varchar2(10)   not null
);

SQL11G> create unique index customer_uk on customer (customer_id);

SQL11G> alter table customer add constraint customer_pk
          primary key (customer_id) using index;

Notice how a unique index was created, and then the primary key was added to the table - this is just good practice as it allows you to name the index the Primary Key uses behind the scenes - if you don't do it this way, Oracle will create an index quietly behind your back, and it will have a cryptic name such as SYS4343287655!

Regular Expressions

Sally's note states that the email field can be null, but if it exists it should look like a valid email address.

You: Tom, glad to see you are back from lunch - I am stuck once again!

Tom: Well that's not surprising! Only joking, what is giving you problems?

You: Sally wants me to make sure that any values put into the email column actually look like an email address. I take it we have finally found something that I have to do in Java?

Tom: Well, probably not actually - how would you check an email address looks valid in Java?

You: Using regular expressions of course, they rock!

Tom: I agree, they do rock, and Oracle knows this and has supported them from version 10g too! What you want is a check constraint that uses a regular expression.

You: There really is nothing this database cannot do ... Thanks Tom.

Tom: No problem, and good work figuring out all that stuff about Primary Keys!

So what makes a valid email address?

For this example, lets keep it simple, an email address should have the format of:

<one or more letters>@<one or more letters>

Written as a regular expression, this would be:

/^[a-zA-Z]+@[a-zA-Z]+$/

There are regular expressions that validate real email addresses, and the one above will fail for emails containing digits, dots, underscores and various characters, but this is not a regular expressions class, so please forgive the simplification.

So how do we make Oracle reject any email addresses that don't match the format? According to Tom, we can use a regular expression in a check constraint to do just that. To complicate things slightly, Oracle has a slightly different regex syntax to the one give about, but it can be translated pretty simply:

SQL11G> alter table customer add constraint customer_email_ck
  check (regexp_like(email,'^[[:alpha:]]+@[[:alpha:]]+$'))

Have a go at inserting some rows with email addresses that meet and do not meet the format to see what happens.

The Orders Table

So far, all the constraints discussed have involved checks within the same table - checking values of columns, uniqueness, data formats etc. However, for the Orders table, Sally has stated that each row in the Orders table must contain a valid customer.

In relational databases, it is a pretty common requirement that a row in one table references a row in another table. In this case, an Order is not valid unless it has a Customer_ID associated with it. Your first thought may be to create a not null constraint on the column. That would ensure a value is entered into the column, but this could be any numerical value - it doesn't fulfill Sally's requirement of ensuring an order contains a valid customer.

A valid customer can be defined as one that exists in the Customer table. So, for each row that is inserted into the Orders table it needs to be checked to ensure there is a corresponding row in the Customer table, with the two rows linked by the Customer_ID.

Luckily Oracle has a solution for this - Foreign Key constraints.

Lets create the orders table, and then create a foreign key to link it to the Customer table:

ORA11G> create table orders (
          order_id     integer     not null,
          customer_id  integer     not null,
          order_date   date        not null,
          order_status varchar2(9) default 'new' not null
);

-- We need the primary key to uniquely identify orders
ORA11G> create unique index orders_uk1 on orders (order_id)

ORA11G> alter table orders add constraint orders_pk
          primary key (order_id) using index;

-- Now add the foreign key, notice how it references the customer table
SQL11G> alter table orders add constraint orders_customer_fk
          foreign key (customer_id) references customer(customer_id);

The foreign key definition clearly states which table and columns it references. Have a go at inserting some Order records with a Customer_ID that does not exist in the Customer table to ensure Oracle rejects them.

Check or Foreign Key?

There is one more item of note on the Orders table. The Order_Status column is only allowed to contain one of two values, 'new' or 'completed'.

For a small set of values, the versatile check constraint is perfect to enforce this rule:

SQL11G> alter table orders add constraint orders_status_ck
  check (order_status in ('new', 'completed'));

Can you see the similarity between checking a valid customer exists, and check the Order_Status is in a list of values?

Both checks are against a list of allowed values, but the list of allowed customers is large and very dynamic - new customers will be created all the time, so enforcing a valid customer with a check constraint is not practical. However, for two status values, a check constraint is ideal.

Depending on the requirements, it may make sense to have another table, Order_Statuses, that contains the list of allowed status values for an order, and link it back to the Orders table with a foreign key - either method will work well.

The Order Items Table

Phew, finally we come to the last table! All that is left is to consider the Order_Items table. There is nothing new here - so this should be a quick one.

An Order_Item record should always be linked to an Order that exists, which is another Foreign Key.

An Order_Item record should only be linked to Products that exist - again a foreign key.

SQL11G> create table order_items (
          order_item_id integer not null,
          product_id    integer not null,
          order_id      integer not null
);

SQL11G> create unique index order_items_uk on order_items (order_item_id);

SQL11G> alter table order_items add constraint order_items_pk
          primary key (order_item_id) using index;

SQL11G> alter table order_items add constraint order_items_product_fk
          foreign key (product_id) references products(product_id);

ERROR at line 2:
ORA-02270: no matching unique or primary key for this column-list

This error occurred because we didn't create a primary key on the products table, and it makes a lot of sense. For each row inserted into the OrderItems table, Oracle will have to check if the ProductID exists in the Products table. If there is no index on that column, then that check will not be efficient and so Oracle refuses to enable the foreign key constraint. Adding a primary key to Products will soon sort things out:

SQL11G> create unique index products_uk2 on products(product_id);

SQL11G> alter table products add constraint products_pk
          primary key (product_id);

Sql11g> alter table order_items add constraint order_items_orders_fk
          foreign key (order_id) references orders(order_id);

SQL11G> alter table order_items add constraint order_items_product_fk
          foreign key (product_id) references products(product_id);

Summary

Finally, Sally's database design is complete, and for this company, invalid Orders will soon be a thing of the past. Even in this simple example, we have touched upon just about all of the RI constraints available in Oracle:

A unique index is used to ensure a column or set of columns is unique within the table.
Watch out for NULLs in unique indexes, and in general.
The Primary Key is a slightly more strict unique index, and ideally every table should have one.
Not Null constraints are used to enforce the presence of a value.
Check constraints are very versatile, and can be used to compare values in the same row, check the format of a value or ensure a value is within a list of allowed values.
Foreign Key constraints are used when a row in one table requires the presence of a related row in another table.

All that is left is to dispel some performance myths, but that can wait until next time!

This post is part of a series:

Previous in series | Series Index | Next in series

Better At Oracle