What Is a Key?

Date last modified: 09 March, 2008

Back to home page.

I find every year that some students have difficulty fully understanding the relational database concept of key.  Here are some notes that might help if you are among those students.

In order to explain key, I need to introduce the term relation variable, normally abbreviated to relvarM359 unfortunately does not use this term, which was introduced in the 1990s by Chris Date and myself, not for a new concept but in order to make clear an important distinction that the previous literature had failed to make clear.

A relation is not something that can vary over time, any more than the number 3 (for example), is such a thing.  But a relational database appears to consist of relations that do vary over time.  Those objects that make up a database are not really relations; rather, they are variables, each of which, so long as it exists, contains a relation—not always the same relation, but always a relation with the same heading (see below).  Because such a variable always contains a relation, it is a relation variable, just as a variable that is always assigned an integer is an integer variable.  It is because "relation variable" is a bit of a mouthful, and because the construct needs to be referred to frequently, that we use the abbreviation relvar.

Now I can explain what a key is, for it is a property of a relvar.  (Where the literature refers to it as a property of a relation, it really means relvar.  And that goes for the M359 course text too.)

Now, a relvar has a heading: a set of attributes.  You can see examples of headings in your M359 Relational headings summary card.  For example, look at the following entry, in the middle of the section headed The university relational headings:

Tutors (Studentid, CourseCode, StaffNumber)

I explain this entry as follows

(a)   Tutors is the name of one of the relvars in the University database.

(b)   The attributes of Tutors are named StudentId, CourseCode, and StaffNumber.  This means that every relation that is ever assigned to Tutors must be a relation whose heading consists of those attributes.

(c)   The attribute names StudentId and CourseCode are underlined. This means that Tutors has a key consisting of those two attributes.

Now I can explain what a key is (and what it is not!):

1.     A key is not an attribute!  Rather, it is a set of attributes.  Each attribute of a key is of course also an attribute of the relvar to which the key "belongs".  Put more concisely, in mathematical parlance, a key of relvar rv is a subset of rv's heading (not necessarily a proper subset—a key can be, and sometimes is, the entire heading).

2.     Consider our example, Tutors, with its key {StudentId, CourseCode}.  The declaration of that key (to the DBMS) means that at all times, wherever the relation currently assigned to Tutors has two more more tuples having the same value for StudentId, those tuples must have different values for CourseCode.  Equivalently, wherever it has two more more tuples having the same value for CourseCode, those tuples must have different values for StudentId.  This reflects the real world rule that a student gets no more than one tutor for each course on which he or she is enrolled.

3.     Let us examine various other subsets of the heading of Tutors, to see if that relvar might have any other keys:

(a)     {StudentId}:  No. If that were a key for Tutors, then it would be impossible to record more than one tutor assignment for each student.  Effectively, nobody could be enrolled on more than one course (unless we permitted them to take a course without having a tutor!). 

(b)    {CourseCode}: No. If that were a key for Tutors, then it would be impossible to record more than one tutor assignment for each course.  Effectively, no more than one student enrolled on a particular course could have a tutor.

(c)    {StaffNumber}: No. If that were a key for Tutors, then no tutor's group on M359, for example, could have more than one student in it.  Moreover, no tutor could be a tutor for more than one course.

(d)    {StaffNumber, CourseCode}:  No.  That would allow a tutor to be a tutor on more than one course but still there could be no more than one student per group.

(e)    {StaffNumber, StudentId}:  No.  That would allow a tutor's group to have several students but the tutor would not be allowed to tutor more than one course.

(f)    {StaffNumber, StudentId, CourseCode} (the entire heading): No.  That would permit s student to have several tutors for the same course, contrary to the university's rules.

(g)    {}: The empty set!  It might come as a surprise to you to hear that the empty set can be a key (M359, like many texts on the subject, fails to recognise this, probably because Ted Codd himself, the original proponent of relational databases, failed to recognise it.)  You might be able to work out the logic for yourself: a relvar with the empty set as key can never contain more than one tuple.  So, although it's a logical respectable key, it's not a terribly useful one in practice.

4.     In general, then, if k (a set of attributes) is declared to be a key for relvar rv, then if two or more tuples in rv agree in value on all the attributes of k bar one, then they must not agree in value on that remaining attribute.  (A key consisting of a single attribute is a degenerate case of this rule: no two tuples can have the same value for that single attribute.)

5.     Also in general, a relvar can have several keys, but we choose just one for underlining and call that one the primary key.  The choice is arbitrary, so the concept of primary is not really very important from a logical point of view.  The general concept of key, however, is very important!  The term candidate key means exactly the same as key (i.e., the addition of candidate has no real significance—it was proposed by Ted Codd because he regarded each key as a candidate for being nominated as the primary key).

6.     SQL has a construct called the table, similar to the relvar in many respects.  (Regrettably, the term table is used also for what is assigned to it, which is something like a relation.) Where a relvar has attributes, a table has columns.  SQL allows a subset of a table's columns to be declared as a key for that table.  It also allows one of them to be nominated as the primary key.  Specifying a key to be primary makes for a certain amount of convenience in connection with other constraints that might be needed, as you learn when you study Block 3 of M359.

Back to home page.    Back to top of this page.