Constraints

Date last modified: 05 February, 2007

Many M359 students find constraints one of the most difficult subjects covered on the course.
Here I try to pinpoint the specific causes of difficulty and address each one separately.
I assume you have studied the Block 2 material on the subject.
I reference the University conceptual data model.

Here are what I perceive to be common causes of difficulty—decide which ones apply to you and in each case follow the link to see what I have to say about it:

not sure what is really meant by the term "constraint", in an E-R model
not sure what is really meant by the term "constraint", in a relational design
in an E-R diagram, not sure what a "blob" (as opposed to a circle) really means
don't understand how foreign keys relate to the E-R model
don't understand how alternate keys relate to the E-R model
not sure in which relation to "post" a foreign key
don't understand "relation for relationship"
don't understand how to represent blobs in the E-R diagram as constraints in the relational model
don't understand constraint ... is empty expressions at all
understand constraint ... is empty expressions, but have great difficulty writing the relational algebra expression that goes in between constraint and is empty

Constraints on an E-R model

A constraint in general is an expression of a rule that restricts what is permissible. An E-R model is a representation of an enterprise. A constraint on an E-R model expresses a rule representing some restriction on what is permissible in that enterprise.

In general, a constraint on an E-R model can be expressed by writing an entry in the section headed Additional Constraints. The entry should consist of a single, precise sentence, clearly referencing entity types and relationships shown in the diagram, such as:

A student who is enrolled on course c must have successfully taken all the prerequisite courses for c.
The tutor assigned to an enrolment must be in the same region as the enrolled student.
The submission date of ab assignment other than the first must be later than that of the immediately preceding assignment on the same course.

However, some constraints should not be expressed this way, because they can be expressed more simply, more clearly and more precisely by use of special notation in the diagram or the Entity Types section. Examples:

No two students can have the same student identifier. Instead, underline the StudentId attribute of entity type Student.
No two enrolments can be for the same student on the same course. Instead, underline the StudentId and CourseCode attributes of entity type Enrolment.
A single enrolment cannot involve more than one student. Instead, omit the crow's foot at the Student end of the EnrolledIn relationship line connecting Student and Enrolment.
A single enrolment cannot be on more than one course. Instead, omit the crow's foot at the Course end of the StudiedBy relationship line connecting Course and Enrolment.
Every enrolment must involve a student. Instead, put a blob at the Enrolment end of the EnrolledIn relationship line connecting Student and Enrolment.
Every enrolment must be on some course. Instead, put a blob at the Enrolment end of the StudiedBy relationship line connecting Course and Enrolment.
Every overseas student is a student. Instead, draw entity type boxes for OverseasStudent and DomesticStudent inside the entity type box for Student. (Always remember to draw at least two "subtype" boxes if you draw any at all. If you drew just the OverseasStudent box inside Student, you would be saying, pointlessly, that every student is an overseas student.)

What a blob means

A relationship line between two entity types in an E-R diagram means that the model admits that it is meaningful to talk about the possibility that an entity of one of those types is related to an entity of the other type in a certain specific way. A blob at an end of that line simply indicates that, in fact, every entity of the type at that end does participate in that relationship with at least one entity of the other type.

For example, consider the entity types Course and Staff, connected by the line labelled TelephoneTutoredBy. That line has a blob at the Course end and a circle at the Staff end. The blob at the Course end indicates that every course is telephone-tutored by some member of staff. The circle at the Staff end, on the other hand, means that it is permissible for some member of staff to exist that does not telephone-tutor on any course.

Constraints on a relational model

A constraint in general is a rule that restricts what is permissible. A relational model is a definition of a relational database in terms of its structure and the constraints on the permissible contents of that structure.

In general, a constraint on a relational design is expressed (in Block 2 notation) by including a constraint ... is empty expression in that design, where the gap between constraint and is empty must be filled by a relation expression. The meaning of a constraint ... is empty expression is that it is never permissible for the relation denoted by that relation expression to contain any tuples. Unfortunately, this means that we sometimes have to grapple with double negatives in order to get the required constraint ... is empty expression right.

Consider, for example, a rule to the effect that every assignment mark must be between 0 and 100, inclusive. To write that as a constraint ... is empty expression, we have to recast the rule as "no assignment mark is permitted to be not between 0 and 100". Luckily we have comparison operators for both "less than" and "greater than", so we can dispose of the double negative by observing that if every assignment mark must be greater than or equal to zero, then that is the same as saying that no exam mark can be less than zero; similarly, if every assignment mark must be less than or equal to 100, then no assignment mark can be greater than 100. So, if we can successfully apply the mental gymnastics I have just alluded to, we will arrive at this:

constraint (select Assignment where Mark < 0 or Mark > 100) is empty

However, thankfully, not all constraints have to be written using constraint ... is empty. In fact, even the example I have just shown doesn't have to be written that way. Instead, it can be written as a a constraint ... expression (note, without the is empty bit) inside the definition of relation Assignment, as follows:

constraint Mark >= 0 and Mark <= 100

You can read this shorthand as "every mark in the Assignment relation must be greater than or equal to zero and less than or equal to 100". The expression following the word constraint is a condition, exactly as can be written after the word where in a select expression. Note how, in the "double negation" recasting using is empty, >= and <= become < and >, respectively, and the word and becomes or. The inverse would apply too: < and > would change to >= and <=, and or would become and. Make sure you understand why these observations apply.

Now, you might ask, given the availability of the shorthand that avoids double negation, why do we ever need the general constraint ... is empty construct? The answer is that it is not always possible to express a constraint in terms of a simple condition that must be satisfied by every tuple of a particular relation that is defined in a relational model.

Consider, for example, a rule to the effect that no student can be both an overseas student and a domestic student, assuming that relations OverseasStudent and DomesticStudent are both defined in the relational design. Clearly we need a constraint that references both of these relations. Equally clearly, that constraint cannot be expressed as a simple condition on just OverseasStudent or just DomesticStudent. Assuming that each of these two relations has StudentId as its primary key, we need a constraint to the effect that no tuple must ever appear in the join of OverseasStudent and DomesticStudent:

constraint (OverseasSudent join DomesticStudent) is empty

Finally, for certain very special constraints we have the shorthands primary key, alternate key and foreign key. Most people seem to be able to understand these shorthands better than they understand constraint expressions, but nevertheless alternate key and foreign key are both pervasively troublesome to some extent.

Writing relational algebra expressions

All I can do here is to draw your attention to some important points that seem to be frequently overlooked:

join, select and intersection all mean "and", whereas union means "or" and difference means "and not".
The operands of union, intersection and difference must be union compatible, meaning that they must have the same heading.
The attributes referred to in the where condition of a select must be attributes of the relation being selected (i.e., the relation denoted by the expression written immediately after the word select). If you feel a need to reference attributes in another relation, you must use join.
The heading of the result of project consists only of the attributes listed after the word over. If you invoke select on a project, the where condition of the select cannot reference any attributes not listed in that over clause.
where is used only in select; over is used only in project and divide.
Beware of using divide. It is only rarely needed and is quite difficult to understand. Usually there is at most one question in a TMA or the exam whose solution involves the use of divide.

Foreign keys and how they relate to the E-R model

Consider writing a foreign key specification whenever you see a relationship that is either of:

many-to-one with mandatory participation (a blob) at the "many" end, or
one-to-one with mandatory participation at either end.

(You might also consider it in cases where the participation at the many end or at both ends is optional, but in those cases there is a better way of doing it, which I describe later.)

Always write the foreign key specification in the relation definition representing an entity type with mandatory participation in the relationship in question. If the relationship is many-to-one, this must also be the "many" end (i.e., the end at which the crow's foot is placed). Then the relation specified in the references clause is that representing the other entity type, which must be at a "one" end and usually has optional participation.

For example:

relation Enrolment
CourseCode: CourseCodes
StudentId: StudentIds
TutorNo: StaffNumbers
primary key CourseCode, StudentId
foreign key CourseCode references Course
foreign key StudentId references Student
foreign key TutorNo references Staff

Picture a diagram with four entity type boxes, labelled Enrolment, Student, Course and Staff. The Enrolment box has three lines coming out of it, connecting it to each of the other three boxes, labelled (say) IsOn, Involves, and IsTutorFor. At the Enrolment end of each line is a blob and a crow's foot; at the other end is a circle and no crow's foot. All of that can be deduced from the above relation definition.

To explain what a foreign key specification is shorthand for, I take the first of the three shown. It could alternatively be expressed thus:

constraint (project Enrolment over CourseCode) difference (project Course over CourseCode) is empty

Translated, this says that every course code appearing in Enrolment must also appear in Course. If there was some course code in Enrolment but not in Course, then it would appear in the result of the difference.

Note that each of these foreign keys expresses mandatory participation of the entity type corresponding to the relation in which they are placed.

What if mandatory participation is at the "wrong" end?

For example, suppose there is a rule to the effect that every student must be enrolled on some course. In other words, there is a blob at the Student end of the Involves relationship line as well as at the Enrolment end. In that case we cannot enforce the mandatory participation of Student by placing a foreign key declaration in either the Student relation or the Enrolment relation. Fortunately, that longhand I showed you, for which foreign key is shorthand, does not depend in any way on the degree of the relationship it represents. Therefore it can be used in cases like the one at hand, which would be written thus:

constraint (project Student over StudentId) difference (project Enrolment over StudentId) is empty

Translated, this says that every StudentId appearing in Student must also appear in Enrolment. If there was some StudentId in Student but not in Enrolment, then it would appear in the result of the difference.

Representing relationships when foreign key placement is insufficient

Consider:

foreign key TutorNo references Staff

as declared in the Enrolment relation. It means that every Enrolment tuple has a TutorNo value that matches the StaffNo value in some Staff tuple. Thus, it implements an assumed mandatory participation (of Enrolment in IsTutorFor).

In practice it might be that Enrolment's participation in IsTutorFor is optional, not mandatory, to cater for an enrolment that has been registered but for which no tutor has yet been appointed. In that case we cannot represent the relationship simply by placing a foreign key in Enrolment. Instead, we must decompose Enrolment into two relations, following the "relation for relationship" principle described in Block 2, like this:

relation Enrolment
CourseCode: CourseCodes
StudentId: StudentIds
primary key CourseCode, StudentId
foreign key CourseCode references Course
foreign key StudentId references Student

relation TutorOn
CourseCode: CourseCodes
StudentId: StudentIds
TutorNo: StaffNumbers
primary key CourseCode, StudentId
foreign key CourseCode, StudentId references Enrolment
foreign key TutorNo references Staff not allowed null

Alternate keys and how they relate to the E-R model

An alternate key of a relation is just a key that is not declared as the primary key of that relation. Consider the relation Enrolment(StudentId, CourseCode, TutorNo), with primary key (StudentId, CourseCode). The primary key specification is shorthand for the following constraint ... is empty expression:

constraint (select (Enrolment join (Enrolment rename (TutorNo as TN))) where not (TutorNo = TN)) is empty

This is saying that whenever two tuples in Enrolment have the same student identifier and the same course code, they must also have the same tutor number—in other words, they must be the same tuple (in other words, no two distinct tuples in Enrolment can ever have the same student identifier and the same course code).

Now, suppose there is a rule to the effect that no tutor is permitted to be a tutor on more than one course. We could express that rule as follows:

constraint (select ( Enrolment join ( Enrolment rename (CourseCode as CC))) where not (CourseCode = CC)) is empty

Do you notice the similarity between that and the longhand expression for the primary key? In fact this is saying that whenever two tuples in Enrolment have the same student identifier and the same tutor number, they must also have the same course code; in other words, they must be the same tuple; in other words, no two distinct tuples in Enrolment can ever have the same student identifier and the same tutor number. The similarity between the two longhands is such that they can both be expressed by almost the same shorthand. They can't both be expressed using primary key because by definition a relation has no more than one primary key. Here is the shorthand for this key that is not the primary key:

alternate key StudentId, TutorNo

How might this constraint relate to the E-R model? Well, in this particular case the corresponding constraint in the E-R model is one for which there is no shorthand that can be used in the diagram or the Entity Types section. Therefore, it would have to relate to a sentence in the Additional Constraints section, such as "A staff member cannot be a tutor on more than one course". But suppose the rule is even stricter than that, saying not only that nobody can tutor on more than one course but also that every course has exactly one tutor. In that case there is a 1:1 relationship between entity types Staff and Course, with mandatory participation for Course and optional for Staff. That relationship can be completely expressed in the relational design as follows:

relation Staff
StaffNo: StaffNumbers
...
primary key StaffNo

relation Course
CourseCode: CourseCodes
...
TutorNo: StaffNumbers
primary key CourseCode
alternate key TutorNo
foreign key TutorNo references Staff

The alternate key specification says that no two distinct tuples in Course can have the same tutor number, just as the primary key specification says that no two distinct tuples can have the same course code. That the Staff end of the Tutors relationship line lacks a crow's foot is expressed by the primary key specification for Staff, implying that no Course tuple can be related (via TutorNo) to more than one Staff tuple. That the Course end of the same line lacks a crow's foot is expressed by the alternate key specification for Course, implying that no two distinct Course tuples can be related (via TutorNo) to any given Staff tuple.

Note that the choice as to which of CourseCode and TutorNo is the primary key is arbitrary, from a strictly logical point of view. We choose CourseCode for purely psychological reasons (which I hope you agree are obvious).