Response to A critical reading of the Third Manifesto by Maurice Gittens

Hugh Darwen <hd@thethirdmanifesto.com>

1st October 2004

For further information relating to this issue, visit Maurice Gittens's web site.

Maurice Gittens's article A critical reading of the Third Manifesto appeared in the September, 2004 issue of Database Magazine.  The article criticises certain aspects of the book Foundation for Future Database Systems: The Third Manifesto by C.J. Date and Hugh Darwen (2nd edition, Addison-Wesley, 2000).  At the invitation of the editor, Hugh Darwen has written this response, which has been carefully reviewed by Chris Date.

The response might appear to be rather pernickety in places.  We believe that in that respect it matches the vein of some parts of the article we are responding to.  We take no offence from pernicketiness and we hope none is given by this response.

This is a general response, attempting to cover the most important points without giving a blow-by-blow commentary.  The response's structure is only very loosely keyed to that of the article.  Appendix A gives a blow-by-blow commentary in the form of a copy of the article with embedded annotations by Hugh Darwen.

Overall Assessment

The article does not live up to its title.  A genuine "critical reading" of The Third Manifesto would mention each of its numbered Prescriptions, Proscriptions and Very Strong Suggestions, or at least those that the author takes issue with.  The article does not specifically mention any of them.  It makes an incorrect assumption in what appears to be an argument to the effect that relation variables are types.  In what appears to be an argument against our rejection of pointers, it defines the term "identity" to refer to a concept we believe we fully embrace and claims that we reject identity.  To justify its possible rejection of our rejection of something, it gives some hints about a proposed operator without showing in what way The Third Manifesto does not allow that operator to be supported.  It claims a certain equivalence between tuples and relations that is based on at least one demonstrably incorrect assumption.  In view of these findings, we obviously reject all of its conclusions.  Specific criticisms follow.

Introduction

The introductory section refers to our stated maxim, All Logical Differences Are Big Differences.  It attempts to claim that some of our work is inconsistent with this maxim and its corollary, All Logical Mistakes Are Big Mistakes.  We recognize that stating our maxim up front exposes us to ridicule by anybody who finds logical inconsistency in our work.  By the same token anybody who tries to throw our maxim back at us in this manner is similarly exposed.  Gittens has taken that risk.  We leave it for others to judge on whose face the egg is.

By the way, our corollary (originally proposed by Darwen) has been claimed to be a logical mistake!  It was unkindly pointed out to us that a small basketball player isn't necessarily a small person.  In other words, smallness and bigness are "overloaded" for different types and the "big" in "big difference" is not necessarily the same "big" as the one in "big mistake".  (This kind of overloading, by the way, is not supported by The Third Manifesto's proposed model of type inheritance.)

Logical Consistency

Gittens appears to think we are guilty of logical inconsistency in The Third Manifesto.  According to our understanding, a theory T is logically inconsistent if the set of propositions that can be concluded to be true under T includes a proposition p such that ¬p is also a member of that set; otherwise T is consistent.  It is not clear to us which statements of ours are claimed by Gittens to lead to inconsistency.
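The definition of inconsistency just given can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the conclusions of a theory are taken to be an explicitly listed set of literals written as strings, and the helper names (negate, is_inconsistent) are hypothetical. A real theory would first have to be closed under deduction.

```python
def negate(literal):
    """Return the negation of a literal written as 'p' or '¬p'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def is_inconsistent(conclusions):
    """True if some proposition p and its negation ¬p are both concluded."""
    return any(negate(p) in conclusions for p in conclusions)


# Assumed example sets of conclusions, purely for illustration.
consistent_theory = {"p", "q", "¬r"}
inconsistent_theory = {"p", "q", "¬p"}

print(is_inconsistent(consistent_theory))
print(is_inconsistent(inconsistent_theory))
```

On this reading, showing inconsistency requires exhibiting at least one such p; a claim that the argumentation is merely weak does not do so.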

The First Great Blunder

Gittens appears to be disputing our claim that it is a blunder to equate relation variables (relvars) and object classes.  (Our justification for this claim is that a type is neither a relation nor a variable.)  In what we take to be an attempt to bolster his argument, Gittens makes a correct statement about relation types, where we are expecting him to say something about relation variables.  But of course a relation type is a type!  We cannot conclude from that fact that a value of that type is a type; it follows a fortiori that we cannot conclude that a variable of that type is a type.  But in any case, Gittens does not explain why he thinks The Third Manifesto is weakened by its rejection of "the wrong equation", if indeed he does think that.  (And if he doesn't, why bother to quibble with our argument?)
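The three-way distinction drawn here (relation type, relation value, relation variable) can be illustrated with a rough Python analogy. It is a sketch only: Python's type hints are not a D, and the names SupplierTuple, SupplierRelation, r1 and suppliers are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class SupplierTuple(NamedTuple):
    """Hypothetical tuple type with attributes sno and city."""
    sno: str
    city: str


# A relation TYPE (by analogy): the type of all sets of SupplierTuple values.
SupplierRelation = FrozenSet[SupplierTuple]

# A relation VALUE of that type: it is a value, not a type.
r1: SupplierRelation = frozenset({SupplierTuple("S1", "London")})

# A relation VARIABLE (relvar, by analogy): a name whose current value
# can be replaced by another value of the same type.
suppliers: SupplierRelation = r1
suppliers = suppliers | {SupplierTuple("S2", "Paris")}

print(len(r1), len(suppliers))
```

After the assignment the variable suppliers holds a different value, while the value r1 is unchanged. Neither the value nor the variable is itself a type, which is the point of the paragraph above.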

The Second Great Blunder

Gittens claims that we "reject identity" but offers a definition of that term that appears to make it refer, not to the pointers that are the subject of The Second Great Blunder, but to a concept we fully and earnestly embrace!  But in any case, Gittens's explanation of why he thinks The Third Manifesto is weakened by its rejection of pointers concerns a certain operator (called foreach) that he would like to see in his ideal D but believes would be prohibited under the terms of The Third Manifesto.  He does not tell us exactly which Prescriptions or Proscriptions he thinks militate against inclusion of that operator.  From the little we can see at the moment, we have no reason to suppose that something like foreach could not be supported if desired.  Indeed, it is surely made reasonably clear on pages 200-201 that we do not wish to preclude such operators.

Identity

Gittens writes, "Identity is a fundamental property of all things by which they can be counted.  If the elements of mathematical sets did not have identity they would not be countable."  We wouldn't argue with this, but we note that in most textbooks on logic, the term identity refers to a predicate.  This does not necessarily conflict with Gittens's view.  The following extract from Wilfrid Hodges's eminently approachable Logic, published by Penguin Books Ltd. in 1977, might illuminate:

One particularly important predicate is the 2-place predicate

x1 is one and the same thing as x2

This predicate is called identity; in symbols it's written 'x1 = x2', and the symbol '=' is read 'equals'.  A sentence got by putting designators in place of 'x1' and 'x2' in [the identity predicate] is called an equation.

Various English phrases can be paraphrased by means of identity.  For example:

Everest is the highest mountain in the world.
Everest = the highest mountain in the world

Cassius Clay and Muhammed Ali are the same person.
Cassius Clay = Muhammed Ali.

This is none other than the lost city.
This = the lost city.

Two plus two equals four.
Two plus two = four.

The word identical is normally used in English to express close similarity rather than identity.  For example, identical twins are not one and the same twin and two women wearing identical dresses are not wearing one and the same dress.

The Third Manifesto is in complete accord with Hodges here.  Note that Hodges does not refer to counting, though we would have to agree that it is not possible to count things we cannot distinguish from each other.  (By the way, the elements of the "mathematical set" of real numbers cannot be counted.  Perhaps this is because in Gittens's view they do not all have identity?)

More importantly, note Hodges's use of the term designator for terms that can be substituted for the free variables in a predicate to yield a proposition.  His examples of instantiations of the identity predicate show how different designators can refer to the same thing.  Does Gittens's "identity" refer to some method of referring to something other than by using a designator?  We reject any such notion: Whereof we cannot speak, thereof we must remain silent!  Or does Gittens think that for every object in the domain of discourse there must be some special designator for that object, which he calls its identity?  We reject that notion, too, in general.  The same number is designated by 2 in decimal notation and 10 in binary; we have no reason to pick out either of those, or any other possible designator of the second counting number, as being special.
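The remark about 2 in decimal and 10 in binary can be checked directly. In this minimal Python sketch the two strings play the role of designators and the integer they are parsed to plays the role of the thing designated; the variable names are assumptions made for illustration.

```python
# Two different designators (numerals) ...
decimal_designator = "2"
binary_designator = "10"

# ... interpreted in their respective notations.
from_decimal = int(decimal_designator, 10)
from_binary = int(binary_designator, 2)

# The identity predicate holds of the things designated,
# even though the designators themselves differ.
same_thing = from_decimal == from_binary
different_designators = decimal_designator != binary_designator

print(same_thing, different_designators)
```

Neither numeral has any claim to be the special designator of the number; each designates it equally well under its own notation.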

In Codd's model, attribute values are to be interpreted as designators.  Furthermore, it was essential to his model that they represent what we have seen referred to as rigid designators.  An occurrence of a designator is rigid if it refers to the same thing in all possible situations.  Terms like "two", "Everest", and "Cassius Clay" are normally rigid in the sense that "the highest mountain in the world", "the heavyweight champion of the world", and "the president of the USA" are not normally rigid (they all potentially refer to different things at different times or in different contexts).  Rigidity depends on the use:  "The president of the USA" is nonrigid in "Hugh Darwen is the president of the USA" but rigid in "The president of the USA is the person most recently elected to that post".  (The truth or falsehood of that proposition is, of course, irrelevant here.)

Variable names, pointers, and object identifiers (in the O-O sense of that term) are all nonrigid designators, precisely because they designate variables.  Contrary to Gittens's often-repeated claim, we do not reject the logical concept of identity—on the contrary, we wholeheartedly embrace it.  But we firmly reject the use of nonrigid designators in relations.
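The nonrigidity of designators that designate variables can be illustrated in Python, whose `is` operator compares object identity while `==` compares values. This is a sketch under assumptions: the Cell class is a hypothetical stand-in for a variable, and a Python reference stands in for a pointer or object identifier.

```python
class Cell:
    """Hypothetical mutable variable holding a single value."""

    def __init__(self, value):
        self.value = value


a = Cell(5)
b = Cell(5)
reference_to_a = a  # designates the variable a, not the value 5

values_equal = a.value == b.value  # equality of the values held
same_variable = a is b             # identity of the variables themselves

before = reference_to_a.value
a.value = 7                        # update the variable
after = reference_to_a.value       # same reference, different value now

print(values_equal, same_variable, before, after)
```

The literal 5 designates the same value in every situation, whereas what the reference yields depends on when it is dereferenced. That dependence is what makes such a designator nonrigid in the sense used above.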

Gittens correctly mentions a secondary reason we have found for rejecting object identifiers:

[They] describe a problem with object identity and their inheritance model. It is a fallacy to assume that this problem would exist with some other inheritance model.

But we do not make that assumption.  We refer to the work of Zdonik and Maier in which they present four "desiderata" (for a type system), namely: substitutability; static type checking; mutability; and "specialisation via constraint".  (We put that last desideratum in quotes because the authors appear to mean simply "type constraints", but that is not important—specialisation by constraint as we mean it implies the existence of type constraints.)  Zdonik and Maier conjecture that it is possible to support any three of these desiderata but not all four together.  In Part IV of the book we believe we refute this conjecture by defining a model of type inheritance in which all four desiderata are supported.  We go on to explain, in Appendix G, that it appears that Zdonik and Maier are tacitly assuming that a fifth desideratum (for object identifiers) is always supported.  We find that support for object identifiers cannot coexist with specialisation by constraint.  We observe that object identifiers had already been rejected on our behalf, so to speak, by E.F. Codd himself.  Therefore there is no reason for The Third Manifesto to reject specialisation by constraint, and it doesn't (nor does it require it, by the way, though it can only be omitted if type inheritance is omitted altogether).

Types, Domains and Object Classes

With reference to Gittens's Section 3.2, Introduction to predicate logical models, we observe that the domain of discourse of a database, under the interpretation intended by The Third Manifesto, is the set consisting of every value that could legally appear as an attribute value of some tuple of some relation derivable from that database by evaluation of an expression in whatever D is being used to access it.  That domain of discourse is partitioned into subsets called types.  (Codd called them domains, and thus generated a certain amount of confusion, which is why we no longer use that term for the concept in question.)

Given that attribute values represent designators whereas tuples represent propositions, and given that logicians clearly use the terms designator and proposition to refer to importantly different concepts, we think we are justified in claiming that there is a logical difference between the concept of an attribute value and the concept of a tuple (even though some attribute values happen to be tuples).  Now, the set of permissible values for a given attribute is called a type, whereas the body of a relation is a set of tuples.  To equate type and relation, therefore, is to equate two logically different concepts.  To equate two logically different concepts has to be a logical mistake.  Moreover, the logical mistake in question is compounded by the fact that it is not actually types and relations that are being equated, but types and relation variables.  So we have two logical mistakes here.  We feel fully justified in calling that a blunder, and a great one at that.  We are not moved by Gittens to retreat from that position.

Predicate Constants

Gittens expresses a desire for predicate constants to be able to appear as values.  In other words, he wants D to support a type, or perhaps a type generator, whose values can be operated on by whatever operators he would like to be available on predicate constants.  He does not tell us what those operators are, so we cannot tell if such a type or type generator would be in contravention of The Third Manifesto.  In his examples, "Mark loves to love" and "Jane loves to miss", it seems clear that the noun phrases (hence designators) "to love" and "to miss" do not refer to the propositional value of the predicates x loves y and x misses y.  They are rigid designators.

Regarding operators on predicate constants, what operators does Gittens expect to be available on, for example, the predicate constant of the triadic predicate a + b = c?  In any case, what exactly is that predicate constant?  Surely not what remains when we strike out the variables, for that would yield "+ =", which would also be the predicate constant of the dyadic predicate a + b = a.  (According to the definition given in http://en.wikipedia.org/wiki/First-order_predicate_calculus, a + b = c includes an appearance of the predicate constant = and an appearance of the function constant +.  It is not clear from this that every predicate has exactly one predicate constant.  For example, a = b ∧ b = c has two appearances of =.)  [Added 22nd November, 2004: further thoughts on predicate constants are given in Appendix A.]
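The observation about striking out variables can be reproduced mechanically. The following Python sketch assumes that variables are single lower-case letters; the function name strike_variables is hypothetical.

```python
import re


def strike_variables(predicate):
    """Delete single-letter variables, then normalise the remaining spacing."""
    without_variables = re.sub(r"\b[a-z]\b", "", predicate)
    return " ".join(without_variables.split())


triadic = strike_variables("a + b = c")  # three distinct variables
dyadic = strike_variables("a + b = a")   # two distinct variables

print(repr(triadic), repr(dyadic), triadic == dyadic)
```

Both predicates leave the same residue, so the residue alone cannot serve to identify a predicate: it does not even determine the number of parameters.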

The Proposed foreach operator

The Third Manifesto demands the existence in D of certain operators.  For example, RM Prescription 8 mandates support for equals and RM Prescription 18 mandates support for "the usual operators of the relational algebra".  The RM Proscriptions mention certain kinds of operator that D is expressly forbidden to support, but we cannot find any that would clearly militate against inclusion of foreach.  Whether foreach is really a good idea or not, we cannot judge on the evidence available.  We remark that it seems peculiar to have an operator that sometimes returns a value when it is invoked and sometimes does not, and whose operands are, even in the cases where a value is returned, required to be variable references in particular and not expressions of arbitrary complexity in general.  Thus, some of the varieties of foreach might be in contravention of The Third Manifesto's definition of read-only operator.  But that is not a reason given by Gittens for foreach being in contravention of The Third Manifesto.

The Expressive Equivalence of Relation Values and Tuple Values

The relevance of Gittens's Section 4 on this subject is not clear to us, nor is the importance, if any, that he attaches to its conclusions.  But we reject it anyway because it appears to contain a logical mistake.  In Section 4.4, his proposed ordered triple representation of the body of a relation appears to allow two or more tuples in the same body to have the same value.
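The kind of mistake described here can be illustrated in general terms. The Python sketch below does not reproduce the representation proposed in Section 4.4 of the article; it only contrasts, under assumed example data, an ordered sequence of tuples (which admits repeats) with a set of tuples (which does not).

```python
# An ordered sequence can hold the same tuple twice.
tuples_as_sequence = (("S1", "London"), ("S1", "London"))

# The body of a relation is a set of tuples, so repeats collapse.
body_as_set = frozenset(tuples_as_sequence)

print(len(tuples_as_sequence), len(body_as_set))
```

Any encoding of a relation body that can distinguish the first of these from a one-element sequence is representing something other than a set of tuples.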

A Footnote on "A Codd inspired amendment ...".

[This footnote does not appear in the response published in Database Magazine]

In "A Codd inspired amendment to my critical reading of The Third Manifesto", Gittens claims that a certain amount of support for his position is expressed in E.F. Codd's "Extending The Relational Model to Capture More Meaning" (1979).  I have never expressed any support for the referenced work.  Indeed, I recoiled from it at the time, precisely because I had always thought, as I still do, that the strength of the Relational Model, like that of logic, lies partly in its disregard for meaning.  Much as we admire Codd's original work, Date and I have found ourselves in disagreement with him on a number of issues that he subsequently addressed.   

Regarding the First Great Blunder, it is not clear to me that Codd can be interpreted as having expressed support for the equation relvar = class.  In any case I continue to think that equation to be a grave error.

Regarding the Second Great Blunder, I do think that Codd might have overemphasised the importance of surrogate keys, but I do not accept that surrogate key values are object identifiers in a different guise.  Any distinction between surrogate keys and nonsurrogate keys is not a logical difference.  There is a logical difference between key values and object identifiers.  Apart from anything else, an object identifier in general identifies a variable; a key value certainly does no such thing, and Codd would certainly never have proposed or condoned the possible existence of variables other than relvars in a relational database.

Appendix A: imbedded commentary

HD: My comments are imbedded in this style.  Any that look like questions can be regarded as rhetorical. The text in which they are imbedded was copied from this PDF file, sent to me by the editor of Database magazine.  With the author's permission, I have corrected two or three awkward typographical errors in that draft.

A critical reading of the Third Manifesto

HD: This title is misleading.  The Third Manifesto per se (Chapter 3 of the book) consists of six sections containing in all 58 numbered points.  These 58 points form the basis of the dissertation.  One would expect a critical reading to refer explicitly to some or all of these 58 points.  This paper refers to none of them, but only to some of the book's introductory material.

Maurice Gittens <maurice at gittens dot nl>

14th July 2003

Abstract

According to the authors, Hugh Darwen and C.J. Date, of the book entitled "Foundation for Future Database Systems: The Third Manifesto" the maxim: All logical differences are big differences and its corollary All logical mistakes are big mistakes has been central to their work on this book. Respecting the standard set by this maxim and its corollary, this paper will proceed to identify a number of issues with the logical consistency of the dissertation presented in The Third Manifesto, using maxims such as: logical conclusions should only be drawn from premises which are both valid and relevant.

HD: It is not exactly incorrect to place the maxim at the centre of the work, but the maxim is only that: a maxim.  To suggest that the work has a mere maxim at its core without mentioning its solid technical basis (The Relational Model of Data) might give the impression that the work is somewhat frivolous in nature.  We could delete the maxim and all references to it without altering the substance of the book.

The copyright of this document belongs to its author. Making complete and unmodified copies of this document is allowed.

Status: draft

Revision history

· July 14 2003; Fix typo in the title of the document

· April 7 2003; More cleanups

· February 26 2003; Based on comments by Hugh Darwen I reworded a few sentences which seemed to cause confusion; I also fixed a few typographical errors

· January 8 2003; Rene Jansen made me aware of another reason for the dismissal of ObjectIDs provided by The Third Manifesto. Add this to the section about the alleged second great blunder. Thanks Rene.

· January 6 2003; A first draft of this document

Contents

1 Introduction

1.1 Background information

1.2 On a personal note

2 About the claims made by the author

2.1 Regarding the first great blunder

2.2 Regarding the second great blunder

3 About Predicates, Relations and their identity

3.1 Some examples

3.2 Introduction to predicate logical models

3.3 Why is identity deemed a necessity?

3.4 Summary

4 On the expressive equivalence of relation values and tuple values

4.1 Introduction

4.2 Defining tuple values

4.3 Defining relation values

4.4 Showing that all relation values are tuple values

4.5 Showing that all tuple values are relation values

4.6 Summary

5 Conclusions

1 Introduction

1.1 Background information

A web page at http://www.gittens.nl/OOR.html raised a number of issues the author found with the logical consistency of the dissertation presented in the second edition of the book "Foundation for Future Database Systems"[1] by C.J. Date and Hugh Darwen.  In a personal communication Mr. Hugh Darwen, requested I clarify my use of certain English words and also that I be more specific as to the issues I found with the dissertation presented in The Third Manifesto. This paper is written as an attempt to comply with the request of Mr. Darwen.

The main issue and my primary claim

The main issue I perceive, with the logical consistency of the dissertation presented in the Manifesto, follows from the maxim Date and Darwen presented as central to their work in The Third Manifesto. Date and Darwen presented the maxim: All logical differences are big differences and its corollary All logical mistakes are big mistakes as a guiding principle in their work on the Third Manifesto. The Third Manifesto proceeded to identify what it refers to as the Two Great Blunders:

HD: If the dissertation suffers from logical inconsistency, then it must be possible to discover some proposition p such that both p and ¬p can be concluded from the dissertation.  Gittens does not actually show such a proposition, so I would argue that the claim that the dissertation is logically inconsistent is not justified by this paper.

· Equating relvars and classes

· Mixing pointers and relations (or more specifically allowing database relvars to contain object IDs)

However, in my humble opinion, the argumentation used to substantiate the claim that the alleged blunders are truly to be viewed as blunders is somewhat weak.[2]  This opinion is based on the following maxim: logical conclusions should only be drawn from premises which are both logically valid[3] and relevant. HD: We would clarify "valid" as "agreed to be true".  We don't see the relevance of "relevant" here: surely one can draw a valid conclusion from an irrelevant premise? Put another way, keeping in mind the maxim, All logical differences are big differences, logically valid conclusions are conclusions based on premises free of fallacies, including fallacies of relevance.[4]  HD: Why fallacies?  Why not just falsehoods?

I think it important to state explicitly, that my claim in this regard, is not that Date and Darwen are wrong in their opinions. My claim, in this regard, is that the substantiation they provide as justification for their statement that the alleged great blunders are indeed great blunders is rather weak, relative to the high standard they claim to be central to their work.

HD: There's a big difference—a logical one, indeed!—between logical inconsistency and "rather weak" substantiation.  Regardless of which of those two Gittens really means, it is not clear, yet, whether he

· agrees that the Great Blunders are indeed blunders but seeks a stronger justification for the appellation; or

· agrees that the characteristics referred to by the Great Blunders are indeed undesirable, but thinks that inclusion of those characteristics is not so undesirable as to merit the term "blunder"; or

· disagrees that the characteristics referred to are undesirable (though perhaps this possibility is supposed to be ruled out by his assurance that he does not claim we "are wrong in [our] opinions").

It seems likely that the second reading is the intended one, in which case we question whether the paper really is raising a big issue with our work.  It seems more like a quibble.

Secondary issues and my secondary claims

In this regard the question is asked whether or not the alleged two great blunders are indeed blunders. This subject matter will be addressed in subsequent sections of this paper.

1.2 On a personal note

I think it appropriate to state my appreciation for the fact that Mr. Hugh Darwen has thought it appropriate to spend time communicating with me about the issues I raised.

2 About the claims made by the author

2.1 Regarding the first great blunder

The first alleged great blunder identified in The Third Manifesto follows:

Equating relvars and classes

Now please consider the question: What arguments that adhere to the strict discipline of logic does The Third Manifesto provide for the claim that this equation is indeed a blunder? In considering an answer to this question it is noted that this claim[5] is first made on page 15 of second edition of the Third Manifesto, which by its own admission (on page 14) is informal in nature. Still lacking a mathematically sound definition of what an object class is, page 21 decides that object classes and domains are, I quote: "the same thing".  HD: Agreed that this claim by us is weakened by the lack of a rigorous (never mind "mathematically sound") definition of object class.  So we are merely arguing that they are the same thing, rather than rigorously proving it.  It is not clear whether Gittens would argue differently.  Since the statement is made in an informal context, one wonders if classes and domains are informally the same thing or also formally[6] the same thing. The reason given to substantiate the claim that object classes and domains are determined to be the same thing is presented as the fact that for both, domains and object classes, it holds that their values are manipulated by operators defined for the type in question. However, the same argument can be made for relations. Is it not the case that relations values are manipulated by a set of operators defined specifically for their types. HD: Yes. Actually, each of the operators of the relational algebra is defined for all relation types.  Yes, relation types have a set of pre-defined operators, does this make them logically different from a specific class of domains which have a pre designated set of operators? No, it does not! HD: We definitely agree that relation types are types.  We make the point very strongly in the book.  OK, since this argumentation is said to be informal, the author proceeded to seek out, the formal arguments presented, for labeling the first great blunder as such. 
Apart from reiterations of the alleged first great blunder, no such argument has currently been found by the author.[7] Additionally, Date and Darwen, seem to assume[8] that there is one so-called right way, in which objects and relations should be integrated. Does not the discipline of logic dictate that one must prove, that there exists only one right way, before one could even claim to provide the one right way? Is it not possible that there are different ways, each with its own merit, to achieve the integration between objects and relations?

HD: Gittens has not even questioned the correctness of The First Great Blunder here.  He agrees with us that a relation type is a type, but The First Great Blunder is to regard a relation variable as a type.  Does Gittens wonder if it's reasonable to regard a variable as a type?  A relation as a type?  We do not think either is reasonable or desirable, for reasons stated in the book.

2.2 Regarding the second great blunder

The Third Manifesto identifies the second great blunder as:

Mixing pointers and relations (or more specifically allowing database relvars to contain object IDs).

Using the index of The Third Manifesto I have found the following reasons why object IDs or references[9] are unwanted by Date and Darwen.

1. Codd's information principle: All information in the database at any time must be cast explicitly in terms of values in relations and in no other way or All interrelating between different parts of a database must be achieved by comparison of value.

2. The reason Codd removed pointers from the relational model is stated as: It is safe to assume that all kinds of users [including end users in particular] understand the act of comparing values, but that relatively few understand the complexities of pointers [including the complexities of referencing and dereferencing in particular]. The relational model is based on this fundamental principle... [The] manipulation of pointers is more bug-prone than is the act of comparing values, even if the user happens to understand the complexities of pointers.

3. On page 417 of the second edition, the paragraph entitled: "OBJECT IDS UNDERMINE INHERITANCE".

Concerning the first two points I ask the question: Of what logical value are these arguments?[10] The fact that Codd rejects pointers on the grounds that they are "difficult to understand" and "bug-prone" is of no logical value, and as such in the context of providing a justification for the so-called second great blunder, these arguments represent a fallacy of relevance. This is not to say that the statement is not true by some measure. It is only to say that such statements do not provide logically valid grounds for the dismissal of object-IDs.  HD: We wouldn't argue with Gittens on this point.  Do we claim anywhere in the book that we have a rigorous proof of the fact that The Second Great Blunder is a blunder?  Anyway, perhaps we should have added that according to our understanding the mathematical theory of relations requires the designators represented by attribute values to be purely referential (Hodges).  Pointers certainly are not purely referential.  Concerning the third point, the following issues:

· First, a fallacy is exposed by this quote from the page referenced: "Pointers can lead to a serious problem if type inheritance is also supported". This provides the insight that Date and Darwen confuse object identity and pointers. HD: Actually, we claim that an object identifier (as defined in typical OO programming languages) has to all intents and purposes the same behaviour as a pointer.  It is not clear to us whether Gittens's concept of object identity is this same OO programming language concept.

· Second, they proceed to describe a problem with object identity and their inheritance model. It is a fallacy to assume that this problem would exist with some other inheritance model. HD: We make no such assumption.  We refer to the work of Zdonik and Maier in which they present four "desiderata" (for a type system), namely, substitutability, static type checking, mutability, and specialisation by constraint.  They conjecture that it is possible to support any three of these desiderata but not all four.  In Part IV of the book we believe we refute this conjecture by defining a model of type inheritance in which all four desiderata are supported.  We go on to explain, in Appendix G, that it appears that Zdonik and Maier are tacitly assuming that a fifth desideratum (for object identifiers) is always supported.  We find that support for object identifiers cannot coexist with specialisation by constraint.  We observe that object identifiers had already been rejected for us, so to speak, by E.F. Codd himself.  Therefore there is no reason for The Third Manifesto to reject specialisation by constraint, and it doesn't (nor does it require it, by the way, unless some kind of type inheritance is supported).

These two points identify the third reason supplied for the dismissal of ObjectIDs as a fallacy.  HD: We do not understand the point being made by this sentence.

More on identity

Please consider the position of Hugh Darwen on identity as it was presented in a personal communication [ref 3].

We do not recognize any concept of identity of a value v other than v itself. A truth-valued expression of the form x = y is true if and only if the values denoted by the expressions x and y are identical, that is, are in fact one and the same value. Given equality, we do not need any other concept to do with distinction of values. In case the distinction you are referring to is the one found in some OO programming languages, I remark that in such languages equality is as in our definition (though "=" is sometimes sacrificed, with unpleasant consequences, in favor of an operator with the same name but meaning "approximately equal to"), whereas what you call identity is equality of pointers (usually called object identifiers), and a pointer points to a variable, not a value. As you note in your very next section, we do not admit pointers.

Identity is a fundamental property of all things by which they can be counted.  If elements of mathematical sets did not have identity they would not be countable.[11]  To put this another way, identity can be viewed as a property of an element of a set.  HD: Is this relevant to the real issue at hand (The Second Great Blunder)?  Anyway, we remark that the set of real numbers is not countable.  Fortunately, The Third Manifesto does not deal with such sets; as far as the sets it does deal with are concerned, we agree with Gittens.   Equality, on the other hand, is a correspondence between two or more elements of a set. For example, let v1; v2; ...; vn represent a set of relation variables. Each variable in this set has a distinct identity, otherwise it would not be possible to distinguish it from other variables in this example set. The identity of these variables is orthogonal to the issue of whether or not some of these relation variables are equal. HD: We agree with that, too.  We require the existence of a comparison operator (called "=") to determine whether two expressions, including in particular two relvar references, denote the same value.  We do not require the existence of an operator for the specific purpose of determining if two expressions denote the same variable.  This would require the existence of a type each of whose values is a possible variable name.  We do not require the existence of such a type and in fact we explicitly prohibit it if the operators for that type were to include what is commonly called "dereferencing".  And for the sake of completeness I wish to state that it should be evident that the fundamental concept of identity has nothing to do with the concept of pointers as they are known in different programming languages.[12]  HD: Yes, that is evident.
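The difference between comparing values and comparing variables can be illustrated with a minimal Python sketch. It is only an analogy and is not taken from the article or from The Third Manifesto: Python's `==` stands in for the "=" operator on values, `is` stands in for the pointer comparison found in some OO languages, and the names rv1 and rv2 are hypothetical.

```python
# Two distinct variables that currently hold equal values.
rv1 = [("Mark", "Jane")]
rv2 = [("Mark", "Jane")]

# Value equality: do the two expressions denote the same value?
same_value = (rv1 == rv2)

# OO-style "identity": do the two names refer to the same object (variable)?
same_object = (rv1 is rv2)

# Updating one variable leaves the other untouched, although they were equal.
rv2.append(("Jane", "Mark"))
still_equal = (rv1 == rv2)
```

Here the two variables are distinguishable throughout, yet the only comparison applied to their contents is value equality.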

3 About Predicates, Relations and their identity

This section presents what is in the opinion of the author a logically sound motivation for the support of identity in future databases. To comply with Hugh Darwen's request for specific examples I start with some examples.

3.1 Some examples

Consider some example functionality. Let rv1; rv2; ...; rvk be relation variables of the same relation type. Let v1; v2; ...; vk be the relation values of rv1; rv2; ...; rvk. I would like to be able to ask the equivalent of the following questions in the query language of the database: Which set of relation variables has the value v2? Or which relation variable has the greatest number of tuples with a particular property? The database system would in turn respond with a properly typed set of identities corresponding with the result of the query. The catalog of common SQL-databases might be used for such purposes; however, in current relational database systems the types of the objects returned would, as you know, be incorrect. This forces people working on business-repositories, data-mining applications etc, to build much logic into their applications which, in my opinion, should be gracefully handled by databases of the future. If the result of a query can be an entity representing a relation variable, or a type, or a tuple variable etc, the logical expressiveness[13] of the database is increased. If the information in the catalog of relational databases were properly typed much of the necessary machinery would be present in database systems.  HD: We do not deny that these are interesting requirements, but nor are we aware of having written anything in The Third Manifesto to militate against their fulfilment.  Perhaps Gittens believes that OO Proscription 2 is the obstacle in question.  In that case, we would draw his attention to the discussion of that Proscription in Chapter 9, on pages 198-201.

A new operator

HD: It would be helpful if Gittens could show exactly which Prescription or Proscription of The Third Manifesto militates against the existence in D of the proposed operator, and why.  From the evidence given here, we have no reason to suppose that the operator is prohibited.

An operator which in my opinion is necessary, in one form or the other, in future databases is the foreach operator. This would be the database counterpart of the universal quantifier operator known from predicate logic.[14]  Hopefully self-explanatory, informal examples, using this operator in an SQL-like language, follow:[15]

Example statement                                                  Description

foreach relation r select * from r;                                select all tuples in the default schema
foreach relation r in schema example_schema delete from r;         remove all tuples from relation variables in a schema
foreach relation r in schema example_schema delete r;              drop all relations from a schema
foreach schema s foreach relation r in s select * from r           select all tuples in the database
foreach relation r select * from attributes(r)                     select the attributes of all relations
foreach relation r where r.someProperty() == true select * from r  select all attributes of relations with some property

It is important to note that the type of r in a statement like: foreach relation r .... is a relation type. HD: We note that the term relation appears to stand for relation variable in each of these examples.  It seems that the type of r is actually several relation types, on account of the fact that several relvars are typically not of the same type.  Also, it is not clear that what is returned, in the first example at least, is a relation.  (If it is claimed that it is a relation, then what is its heading?)  We cannot comment further on these suggestions without seeing them fleshed out.  The logical variable r is said to be bound to a predicate constant, representing the identity but not the propositional value of the predicate[16]. 
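The question about the heading of the result can be made concrete with a small Python sketch. Everything in it is hypothetical: the toy database, the relvar names EMP and DEPT, and the function name are assumptions made for illustration, and the sketch is only one possible reading of "foreach relation r select * from r".

```python
# Hypothetical toy database: relvar name -> (heading, body).
# A heading is a frozenset of (attribute name, type name) pairs;
# a body is a set of tuples, each a frozenset of (attribute name, value) pairs.
database = {
    "EMP": (
        frozenset({("ename", "str"), ("salary", "int")}),
        {frozenset({("ename", "Jane"), ("salary", 100)})},
    ),
    "DEPT": (
        frozenset({("dname", "str")}),
        {frozenset({("dname", "Sales")})},
    ),
}

def foreach_relation_select_all(db):
    """Sketch of 'foreach relation r select * from r': one result per relvar."""
    return {name: body for name, (heading, body) in db.items()}

results = foreach_relation_select_all(database)

# The relvars do not share a heading, so their tuples cannot all be
# placed in one relation with a single heading.
distinct_headings = {heading for heading, _ in database.values()}
```

Under this reading the result is a collection of relations, one per relvar, rather than a single relation.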

Please note that, by supporting the foreach operator, SQL statements like ALTER and DROP may be replaced by appropriate uses of UPDATE and DELETE statements, thus showing these and similar statements to be redundant.  HD: It is not clear why "delete r" should be interpreted as "drop r", nor why DROP is made redundant.  It seems that DROP has merely undergone a change in spelling, to DELETE (without the FROM).  The Third Manifesto has a Prescription (RM Prescription 25) to support DROP and ALTER via DELETE and UPDATE on catalog tables.
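The idea of dropping a relvar by deleting from the catalog can be sketched in Python as follows. This is a toy model under stated assumptions, not the catalog structure of the book: the class ToyDatabase, its single-attribute catalog and its method names are all hypothetical.

```python
class ToyDatabase:
    """Hypothetical database whose catalog is itself a set of tuples."""

    def __init__(self):
        self.catalog = set()   # one (relvar_name,) tuple per relvar
        self.bodies = {}       # relvar_name -> set of tuples

    def insert_into_catalog(self, name):
        # Inserting a catalog tuple has the effect of creating a relvar.
        self.catalog.add((name,))
        self.bodies[name] = set()

    def delete_from_catalog(self, predicate):
        # Deleting catalog tuples has the effect of dropping the relvars
        # they describe.
        doomed = {row for row in self.catalog if predicate(row)}
        self.catalog -= doomed
        for (name,) in doomed:
            del self.bodies[name]

db = ToyDatabase()
db.insert_into_catalog("EMP")
db.insert_into_catalog("DEPT")
db.delete_from_catalog(lambda row: row[0] == "DEPT")
```

In this sketch no separate DROP operation is needed; an ordinary delete on the catalog relvar does the work.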

3.2 Introduction to predicate logical models

A model M for a first order predicate logical language L is a pair (D; I) such that:

·         D represents the domain of discourse of the model M. This is the set of objects which can be bound to variables in L. In relational systems, objects in the domain of discourse may be viewed as domain values. In relational systems, domains partition the domain of discourse into a set of disjoint subsets, such that the union of the set of all domain values in a RM database is exactly equal to the set of objects in D.  HD: Agreed.

·         I represents the interpretation function of the model M. Since I is a mathematical function it by definition has a domain and a co-domain, denoted dom(I) and codom(I) respectively.  HD: Agreed, though reference to the concept as a function is a new idea for me.

In first order predicate logic each object d in the domain of discourse D has an associated constant c in the predicate logical language L which represents it in the language L.  Using the interpretation function I, each predicate P of arity n in the predicate logical language L assigns the property represented by P to a set of n tuples HD: n-tuples?  {t1; ...; tk} where each ti (1 ≤ i ≤ k) can be written as ti = (d1; ...; dn) where each dj is an object in the domain of discourse D. As an example let us consider a model M for a predicate logical language L with constants {a; b; c; Mark; Jane} and predicates {odd; love; miss; rich}.  In this example the domain of discourse D of M is D = {1; 2; 3; "Mark"; "Jane"}, while an example interpretation function I for M is presented in the following table.

dom(I)            codom(I)

a                 1
b                 2
c                 3
Mark              "Mark"
Jane              "Jane"
love              {("Jane"; "Jane"); ("Jane"; "Mark")}
miss              {("Mark"; "Jane")}
rich              {"Jane"; "Mark"}
odd               {1; 3}

This example represents statements like:

·         Jane loves both herself and Mark

·         Mark misses Jane

·         Jane and Mark are both rich

HD: And, crucially, what statements are represented by the first five lines in the table?

The identity of the predicate love is captured by the predicate constant love which, in the example above, appears in the domain of the interpretation function I. The propositional value or the value of the predicate love in this example, is the set of tuples {("Jane"; "Jane"); ("Jane"; "Mark")}. When the Third Manifesto speaks of the relation value it is referring to the propositional value of a predicate. HD: Correct. In this example love and miss are binary predicates, so the interpretation function I maps them to sets of binary tuples. The interpretation function I maps the constants in the language L to elements of D and unary predicates are mapped to subsets of D. The information contained in the interpretation function of a predicate logical model can be viewed as the predicate logical equivalent of a database. Relational algebra can thus be viewed as an algebra defining operations on a subset of the co-domain of interpretation functions of predicate logical models; more specifically, relational algebra defines a number of operations on the propositional value of predicates.  HD: We would not dispute any of this, even if we don't use such terminology ourselves.
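The example model can be written down directly as a minimal Python sketch. The representation of I as a dictionary and the helper function holds are assumptions made for illustration only; they appear in neither the article nor the book.

```python
# Domain of discourse D from the example.
D = {1, 2, 3, "Mark", "Jane"}

# Interpretation function I as a dict: constants map to elements of D,
# unary predicates to subsets of D, binary predicates to sets of pairs.
I = {
    "a": 1, "b": 2, "c": 3,
    "Mark": "Mark", "Jane": "Jane",
    "love": {("Jane", "Jane"), ("Jane", "Mark")},
    "miss": {("Mark", "Jane")},
    "rich": {"Jane", "Mark"},
    "odd": {1, 3},
}

def holds(predicate, *constants):
    """Truth value of predicate(constants...) under I."""
    args = tuple(I[c] for c in constants)
    extension = I[predicate]
    # Unary predicates are stored as sets of objects, not of 1-tuples.
    return (args[0] if len(args) == 1 else args) in extension

jane_loves_mark = holds("love", "Jane", "Mark")
mark_loves_jane = holds("love", "Mark", "Jane")
a_is_odd = holds("odd", "a")
b_is_odd = holds("odd", "b")
```

The keys of the dictionary play the role of dom(I) and its values the role of codom(I); a relation value corresponds to one of the set-valued entries.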

3.3 Why is identity deemed a necessity?

The predicate logical language L in the previous section was based on the object constants {a; b; c; Mark; Jane} and the predicate constants {odd; love; miss; rich}. Now please notice that the codomain of the interpretation function I in the previous example contains no appearances of either object constants or predicate constants. Put another way, there are no appearances of elements of dom(I) in codom(I). The reason for this is quite simply that First Order predicate logic[17] does not allow object constants and predicate constants to be part of the domain of discourse D. HD: Agreed. As far as my understanding reaches, Codd's information principle is, at least in spirit, referring to this fact. HD: I never thought of it that way.

When value substitution is not enough

Now please consider a modified interpretation function as an extension of the previous example. This example will attempt to illustrate that by allowing so-called predicate constants to appear in the co-domain of the interpretation function, more sophisticated HD: i.e., second-order? and higher? logical statements can be made[18].

dom(I)            codom(I)

a                 1
b                 2
c                 3
Mark              "Mark"
Jane              "Jane"
love              {("Jane"; miss); ("Mark"; love)}
miss              {("Mark"; "Jane")}
rich              {"Jane"; "Mark"}
odd               {1; 3}

In this example the predicate love is used to make the statement that Mark loves to love and also the statement Jane loves to miss. Notice that it would be incorrect to substitute {("Mark"; "Jane")}, which is the propositional value of the miss relation, for the predicate constant miss in this example. Such a substitution would represent the claim Jane loves the set {("Mark", "Jane")}, which is clearly a different statement than the statement Jane loves to miss. HD: Agreed, but don't see where this is going.  We are still in first order.

Of course one could argue that such expressiveness is not necessary. HD: We wouldn't argue that way, because you get it anyway with relational completeness.  Or do we misunderstand something? This however, does not seem prudent when the purpose is to define a foundation for future databases. By allowing predicate constants into the domain of discourse it now becomes possible to ask questions like: What does Mark love to do? Or Select all the people who like to love people or miss people and also What do people love to do? I would hope that models for future databases, however they are called, would at least define operators which allow the manipulation of, and access to, objects in the domain dom(I) of the interpretation function I. It is also desired that predicate constants are added to the domain of discourse.[19] HD: Again, surely this "expressiveness" comes with relational completeness.  The terms presented to us as standing for predicate constants don't seem to have anything special about them.  If the system is aware that they stand for predicate constants, then we need to know what operators are envisaged to operate on values of type PREDICATE_CONSTANT.

Generic data-mining applications which search for "trends" in databases, generic business repositories, generic database applications, which automatically generate user interfaces allowing user friendly access to databases, intelligent agents which master the art of speech, etc. are examples of applications which would benefit from this.
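The three example questions in this subsection can be sketched in Python under one loudly stated assumption: the predicate names love and miss are held as ordinary string values. With that assumption the questions reduce to ordinary restriction and projection; the sketch says nothing about any special operators that might be envisaged for a type such as PREDICATE_CONSTANT.

```python
# Modified example: the second component of each 'love' pair is the
# name of a predicate, held here as an ordinary string value.
love = {("Jane", "miss"), ("Mark", "love")}

# "What does Mark love to do?"  (restriction followed by projection)
mark_loves_to = {what for (who, what) in love if who == "Mark"}

# "Select all the people who like to love people or miss people."
lovers_or_missers = {who for (who, what) in love if what in {"love", "miss"}}

# "What do people love to do?"  (projection)
loved_activities = {what for (_, what) in love}
```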

3.4 Summary

The fact that The Third Manifesto rejects constants representing the identity of objects in databases is in my opinion a logical error and as a consequence a big mistake. This rejection of identity is a logical error on the following counts:  HD: Gittens has failed to explain what "constants representing the identity of objects" are, and why he thinks we reject them.

·        The Third Manifesto rejects identity on grounds which are not relevant in mathematical logic[20] HD: We do not "equate identity to pointers".  We equate object identifiers (as found in languages like Java and C++) to pointers.

·        Key concepts like relation variables and candidate keys, are not recognized within the relational algebra of The Third Manifesto. Since these concepts are, according to The Third Manifesto, required in future databases, it is an error to not give them a sound mathematical foundation[21].  HD: It doesn't make sense to us for an algebra to "recognize" variables, nor does it make sense for a relational algebra in particular to "recognize" candidate keys.  By the way, it is not clear what Gittens means by the relational algebra of The Third Manifesto.  RM Pre 18 requires "the usual operators of the relational algebra (or some logical equivalent thereof)" and lists some specific operators that are required to be supported "without excessive circumlocution".

·        In the definition of a tuple value, it is evident that tuple values include an object identifier called an attribute name. HD: Now it is clear that what Gittens means by object identifier is not what we mean! Contrary to what is claimed by Date and Darwen[22], the value of a triple, or tuple with an arity of three, representing an attribute does not define its identity.  HD: We do not refer to the concept of defining something's identity. The object identifier attribute name defines the identity of this triple in a tuple because it is the attribute name that must be unique.[23]  HD: Agreed that the attribute name uniquely identifies the triple within a given tuple.  Disagree that this contradicts anything else we have written.

Adding insult to injury, the rejection of identity also limits the logical expressiveness of the algebra upon which future databases might be based. HD: We reject that we are "guilty" of rejecting "identity". This opinion has been substantiated by illustrating the correspondence between relational and predicate logical knowledge representation models. In terms of relational database systems the following suggestions are made in this regard:

·        Allow for a properly typed equivalent of predicate constants, representing the identity of a predicate.  HD: Consider the predicate a + b = c.  According to my recent reading on the subject, this predicate contains an appearance of the predicate constant = and also an appearance of the function constant +.  I see no predicate constant representing the identity of the predicate.  That said, I think it's a nice idea to consider a relvar name as being a special case of a predicate constant, and in this special case the predicate constant can perhaps be considered to represent the identity of the predicate.  But if we want special operators for operating on relvar names, and I'm right in guessing that such operators can't be extended to apply to predicate constants in general, why not just call the operands relvar names? Properly typed object identifiers serve this purpose well.  HD: We reject OO-style object identifiers for the reasons given in the book.  In any case, I don't see how every oid can be considered to represent the identity of a predicate.  In Java, the invocation "new point(1.0, 1.0)" returns the oid of a point object (variable) that has been initialized to the indicated value.  In what sense is that oid a predicate constant?

·        Allow for operators which provide access to, and the manipulation of, the equivalent of the domain of predicate logical interpretation functions[24].  HD: We subdivide this domain into subsets called types.  The operators defined for values and variables of those types provide "access to" every value in each type and hence every element in the domain of discourse.

4 On the expressive equivalence of relation values and tuple values

HD: We do not understand the purpose of this Section.  It seems to lead nowhere.  We do not understand "expressive equivalence".

4.1 Introduction

This section will show that every tuple value has a corresponding representation as a relation value. Conversely every relation value will be shown to have a corresponding representation as a tuple value. This exercise will be performed using liberties allowed by The Third Manifesto.

HD: The demonstration that follows is merely a feat of prestidigitation.  We reject it because, even if we accept its validity, it is only a demonstration of structural similarity.  For language design purposes we distinguish things by their "behaviour" (i.e., the operators defined for them), not their perceived structure.  We define structure in order to be able to define operators.

4.2 Defining tuple values

Let us consider tuple values and relation values as they are defined in chapter 3 of The Third Manifesto. A tuple t is defined as a set of ordered triples (I; T; V) called attributes. Such that:

·        I is an identifier called the name of an attribute. No two attributes in t share a common name.

·        T is an identifier representing the type of an attribute.

·        V is a value of type T, called the attribute value.

The set of pairs obtained by eliminating the attribute value from triples in t is called the heading of t. The heading of a tuple t will be denoted: heading(t). When the purpose is to show that Relation values and Domain values are basically appearances of one and the same thing, one is inclined to demonstrate that any relation value can also be represented by a set of triples. So, please read on...
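The definition of a tuple as a set of (I; T; V) triples, together with its heading, can be sketched in Python. The sketch makes assumptions not found in the source: Python classes such as str and int stand in for the type T, and the function names and sample attribute names are hypothetical.

```python
def is_valid_tuple(triples):
    """True if no two attributes share a name and each value has its declared type."""
    names = [name for (name, _type, _value) in triples]
    names_unique = len(names) == len(set(names))
    values_typed = all(isinstance(value, type_) for (_n, type_, value) in triples)
    return names_unique and values_typed

def heading(triples):
    """The set of (name, type) pairs left after dropping each attribute value."""
    return frozenset((name, type_) for (name, type_, _value) in triples)

t = frozenset({("ename", str, "Jane"), ("salary", int, 100)})
bad = frozenset({("ename", str, "Jane"), ("ename", str, "Mark")})
```

The second sample is a perfectly good set of triples but not a tuple, because two of its triples share an attribute name.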

4.3 Defining relation values

The Third Manifesto defines a relation r as a pair (h; b) where:

·        h represents the heading of r. The heading h is defined to be a tuple heading.

·        b represents the body of r. b is a set of tuples all conforming to the heading h.

In the following it will be demonstrated that every relation value[25] can be represented by a mathematically equivalent tuple value[26].

4.4 Showing that all relation values are tuple values

The purpose of this section is to illustrate that, by the liberties provided by The Third Manifesto, all relation values are tuple values. Let r = (h; b) be a relation value with heading h and body b. Since tuple values are sets of ordered triples it becomes necessary to demonstrate that all relation values are similarly representable as sets of ordered triples. The body b of the relation r will now be defined as a set ts of ordered triples (I; T; V), such that:

·        I is an identifier called an object identifier HD: This construct appears to have no counterpart in The Third Manifesto.  Therefore we reject it.

·        T is an identifier representing name of a type.

·        V is a value of type T.

To ensure that this set of triples ts forms a valid tuple the following conditions must hold for all triples (I; T; V) in ts:

·        no two object identifiers in ts are equal

·        the type T is defined to have the same type as the heading h of r, which is to say: T = heading(r).

·        V is a tuple value of type T.

HD: This 3-part definition does not accord with The Third Manifesto and we reject it out of hand.  It permits two or more tuples to have the same V, contrary to RM Proscription 3.  Furthermore, the first listed component, I, is something we do not recognize and would in any case be redundant.

When these conditions hold, ts will be a valid tuple, containing the same information found in r. Can this exercise not be performed for any and every relation value?  HD: No, not for any of them!
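The objection concerning duplicates can be illustrated with a small Python sketch. The identifiers oid1 and oid2, the placeholder name H for heading(r), and the sample tuple are all hypothetical; the sketch only shows that the three listed conditions do not by themselves exclude a repeated V.

```python
# One tuple value, written as a frozenset of (name, value) pairs.
v = frozenset({("ename", "Jane")})
heading_name = "H"  # stands for heading(r); hypothetical placeholder

# The proposed set ts of (I, T, V) triples: the identifiers differ,
# so the set-of-triples conditions hold even though V is repeated.
ts = {("oid1", heading_name, v), ("oid2", heading_name, v)}
identifiers = [i for (i, _t, _v) in ts]
identifiers_unique = len(identifiers) == len(set(identifiers))

# A relation body is a set of tuples, so it can hold v only once.
body = {value for (_i, _t, value) in ts}
```

The set ts has two members while the corresponding body has one, so the two structures do not carry the same information in this case.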

4.5 Showing that all tuple values are relation values

Let t be a tuple value the heading of which is denoted heading(t). A relation value r = (heading(t); t) is quickly recognized as a relation value which is in no logically significant way different from t.  HD: We disagree. r is something that can be operated on by the relational restriction operator, whereas t is not.  Furthermore, the union of tuples t1 and t2 is called "join" defined to return a tuple whose heading is possibly different from that of t1, that of t2, or both; the union of relations r1 and r2 is called "union" (not "join") and is defined to return a relation whose heading is the same as that of both r1 and r2.
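The behavioural difference between the two unions can be sketched in Python. The function names and representations are hypothetical, and the rule that tuples being joined must agree on any common attributes is an assumption of this sketch rather than a statement of the book's definition.

```python
def tuple_join(t1, t2):
    """Union of two tuples, each a dict of attribute name -> value.

    Assumed defined only when common attributes agree in value; the
    result's heading is the union of the two headings.
    """
    for name in t1.keys() & t2.keys():
        if t1[name] != t2[name]:
            raise ValueError("tuples disagree on attribute " + name)
    return {**t1, **t2}

def relation_union(r1, r2):
    """Union of two relations, each a (heading, body) pair with equal headings."""
    (h1, b1), (h2, b2) = r1, r2
    if h1 != h2:
        raise ValueError("union requires relations of the same type")
    return (h1, b1 | b2)

joined = tuple_join({"ename": "Jane"}, {"salary": 100})

h = frozenset({"ename"})
r1 = (h, {frozenset({("ename", "Jane")})})
r2 = (h, {frozenset({("ename", "Mark")})})
united = relation_union(r1, r2)
```

The joined tuple has a heading that neither operand had, whereas the united relation keeps the heading common to both operands.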

4.6 Summary

This section illustrated that tuples and relations as defined by The Third Manifesto are appearances of one and the same thing. This implies that from a logical point of view, only one of the two concepts is a necessity. Noticing that user defined types in principle, allow types of arbitrary complexity supporting a diverse set of operators, domains, HD: No, domains are not operators.  Or is this some kind of typographical error? as defined by the Third Manifesto would seem to be the most general of the types supported by The Third Manifesto. Domains have been equated to object classes by The Third Manifesto; it is interesting to contemplate the logical implications of these findings.  HD: We do not understand the point being made by this final sentence.  As the section is headed "Summary", it should refer to something written in the preceding subsections of Section 4.  If instead it is referring to a candidate area for subsequent investigation, it should say so.  In any case, we would say that much of the material in the book is the result of our contemplation of the logical consequences in question.  We should add, though, that "equate", if meant literally, is too strong.  The classes of Java and C++ are OO counterparts of our types but are not logically equivalent to them.

5 Conclusions

With regard to the subject matter of this article the following conclusions are drawn:

·        The Third Manifesto, has provided no logically valid substantiation for the claim that the alleged first great blunder is indeed a blunder.  HD: We would agree with this claim if it can be shown that our substantiation includes a contradiction.  We have not been shown anything written in The Third Manifesto that we would accept as a contradiction.  If somebody who properly understands our substantiation wishes to claim that it is weak, we would merely disagree with that person on that particular point and there is not much more to be said.  If somebody who seems not to properly understand our substantiation claims it to be weak, we would try to point out the apparent misunderstandings and ask that person to reconsider.

·        The Third Manifesto, has provided no logically valid substantiation for the claim that the alleged second great blunder is indeed a blunder. HD: Our response to the first bullet can stand for this one too.

·        From the perspective of the relational algebra presented in The Third Manifesto, the requirement that each relation variable must have at least one candidate key, is an arbitrary one.  HD: First, that each relvar has at least one candidate key is not a requirement; it is an observed property of every relation, and therefore of every relation that might ever be the value of a relvar (which is what we mean by the candidate key of a relvar).  Anything that does not have a candidate key is not a relvar!  Second, as already noted, candidate keys have nothing to do with relational algebra.  Third, this conclusion appears to come right out of the blue—it does not seem to follow from anything that has been written elsewhere in the paper.

·        In the relational algebra of the Third Manifesto identity is rejected by Date and Darwen, while it is reified, as a requirement, in the form of candidate keys in a context foreign to this relational algebra. This fact makes the rejection of identity in relational algebra, an arbitrary one.  HD: Candidate keys certainly do not represent a reification of identity.  A candidate key of a relation is an observed property of that relation (like parity being an observed property of an integer) and a candidate key of a relvar is a constraint, restricting the values that might be assigned to that relvar.

·        Domains, which have been equated to object classes by The Third Manifesto have been established to represent a more general class of types than relation types.  HD: We call them types, not domains.  The set of all types is obviously a proper superset of the set of all relation types; is it really worth mentioning as a conclusion?  If so, why?  (It is unfortunate that Gittens keeps on using the term "domains" for what The Third Manifesto calls "types".  In Section 3, he uses the terms "domain" and "co-domain" in their usual mathematical sense, which is not exactly the sense intended by E.F. Codd when he used the term in his Relational Model of Data.)

·        By rejecting identity in the algebra of future database systems The Third Manifesto also limits the logical expressiveness of future databases.  HD: We disagree without further comment, for reasons already given.
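The remark in the conclusions above that a candidate key is an observed property of a relation can be illustrated with a Python sketch that simply inspects a relation value. The function names and the sample data are hypothetical, and the body is assumed to contain no duplicate tuples, as any relation body must.

```python
from itertools import combinations

def is_superkey(body, attrs):
    """True if no two tuples in body agree on all attributes in attrs."""
    projections = [tuple(t[a] for a in attrs) for t in body]
    return len(projections) == len(set(projections))

def candidate_keys(heading, body):
    """All irreducible superkeys of the given relation value."""
    keys = []
    for size in range(len(heading) + 1):
        for attrs in combinations(sorted(heading), size):
            # Irreducible: no already-found key is a subset of attrs.
            if is_superkey(body, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

heading = ["ename", "dept"]
body = [
    {"ename": "Jane", "dept": "Sales"},
    {"ename": "Mark", "dept": "Sales"},
]
keys = candidate_keys(heading, body)
whole_heading_is_superkey = is_superkey(body, tuple(heading))
```

Because a body never contains duplicate tuples, the full heading is always a superkey, so at least one candidate key can always be observed. Declaring a key for a relvar, by contrast, constrains every value that may later be assigned to it.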

References

[1]           C.J. Date, Hugh Darwen [2000] Foundation for Future Database Systems: The Third Manifesto, Addison-Wesley Publishing Company.

[2]           J. van Eijck, E. Thijsse [1989] Logica voor alfa's en informatici, Academic Service.

[3]           Hugh Darwen [2002] "Gittens000.pdf", a personal communication.

[4]           Maurice Gittens [2002] An anatomy of knowledge representation and a theory of meaning, a document available at http://www.gittens.nl



[1] This paper will refer to this book as "The Third Manifesto".

[2] I invite the reader to verify for herself or himself, that The Third Manifesto provides no logically valid arguments to justify labeling the presented propositions as "blunders".

[3] A logically valid premise is one that is substantiated in terms of mathematical logic.

[4] or layering violations, if you prefer.

[5] Also much of the argumentation which supports this claim is in this section of the book

[6] as in mathematically equivalent abstractions

[7] I will gracefully, acknowledge being in error if such arguments do exist.

[8] on page 14 of the second edition of the Third Manifesto

[9] There seems to be some confusion that object IDs, references and pointers represent one and the same thing. It should for example, be recognized that identity is a property of every element of a mathematical set. Confusing the identity of an object and the notion of a pointer is a logical error

[10] The reference to logical value of an argument refers to the degree in which an argument can be used to draw logically valid conclusions

[11] Similarly, symbols in mathematical strings also have identity. In the string ltaaalt there are three instances of the symbol a each with their own identity. The fact that the symbols are all the same, is not relevant to their identity in this string.

[12] Of course, based on the identity of objects, pointers can distinguish between them. But this does not equate pointers to identity.

[13] In this regard a formalism f1 is said to be more expressive than a formalism f2 when the set of statements that can be represented using f1 is a super set of the statements that can be represented using f2.

[14] In the context of databases I would suggest this operator be used for quantifying objects which are elements of the domain of predicate logical interpretation functions.

[15] and thus appealing to the goodwill of the reader

[16] The following section will elaborate, so that the distinction becomes clear

[17] The same is true for higher order predicate logic

[18] A superset of higher order logic called extensional type logic is based on allowing predicate constants in the domain of discourse.

[19] This is to say that in my opinion future databases should be firmly based on extensional type or intensional type logic. At least by supplying the necessary primitives that allow extensional and intensional phenomena to be captured.

[20] For some reason unknown to the author, Date and Darwen equate identity to pointers.

[21] Otherwise, many could claim that The Third Manifesto judges Object Oriented Systems by different standards than relational ones

[22] See quote in section 2.2

[23] Also, please see the next section for an illustration that, given the liberties provided by The Third Manifesto, tuple values and relation values are appearances of one and the same thing

[24] No, the catalog of commercial relational databases does not get it right. Have you ever noticed that, given the operators of relational algebra, it is impossible to perform a trivial operation like selecting every tuple in a relational database?

[25] under the definition of relation values dictated by The Third Manifesto

[26] under the definition of tuple values dictated by The Third Manifesto