Feedback on CS252 exam

[From Meurig Beynon, who marked the exam, but didn't make up the questions!]

The feedback below is based on my preliminary marking of the exam papers. The scripts will be reviewed by another examiner before final marks are awarded. Overall, question 1 was answered best, and questions 2, 3 and 4 were progressively less well-answered. This may reflect the fact that by and large students tackled the questions in their order of appearance, and may have had less time for their last question. Question 2 had a relatively easy first component, but not so many students did as well in the later parts. There were many excellent answers to Question 3, but the overall average was brought down by students who gave token answers. Question 4 needed quite a lot of time and SQL expertise to do well, and not many students achieved a good mark on it. More detailed discussion of each question is given below.

Question 1

(a)(i) Most students recognised that were two statements in the script by counting the semi-colons. Surprisingly few showed that they understood the implications of this in the later parts of the question - parts (e) and (f).
(ii) Not all students paid attention to exactly what was asked here. For instance, they omitted to say what type of value an operator returned (viz a relation or a truth value), and they also referred to operators (like UNION) that weren't in the last line.
(iii) This was KEY(x) - and not the "(r5 where y>3 and z>y)" cited by some. Good practice here is to recognise constraints as assertions with logical values that have no side-effects (i.e. don't change the contents of relations etc).
(v) The English in this part of the question was tough to understand - the key point is that the "parameters subject to update" were xy and r5.
(b) This wasn't a particularly easy predicate to understand informally - it meant "<this many> people chose <this number>". It then makes sense to state that "a people chose b" and also to state that "b people chose a". Several people failed to recognise UNION as "or" rather than "and" though.
(c) Many students got this wrong - but it should be obvious from the update operation: "r4 := xy MATCHING ...." that the type of r4 is that of xy.
(e) and (f) went together, and most students went wrong here. Most said that you couldn't say what the value of COUNT(r4) was, which should have alerted them to the need to think carefully. (In general, it's unlikely that the correct answer to a "what is the value ...?" question would be - "you can't tell what the value is" - unless perhaps it's as a follow-up to a more straightforward "what is the value ...?" question that has a more obvious answer.) In fact, this was a big hint that you needed to interpret the execution of the "two statements" identified in part (a) correctly. With the correct interpretation, the MATCHING operation that defines r4 works only with the single tuple {x 1, y 2} in the relation xy, so yields an empty result. The other tuples only get inserted into xy after this definition of r4 has been made.
(g) Doing this part correctly depended on recognising that not all the updates to xy get executed at the same time - the update of r4 comes between the initial assignment to xy and the subsequent ones. Note also that an update to a relvar means that you have first to clear any existing value by using an SQL truncate() command.

Question 2

The first three parts of this question were typically done pretty well. Most students had difficulty with the other parts. It is always difficult to specify queries accurately when you are doing this on paper in the absence of the DB environment. General principles to observe are: make sure that your answer matches the category of answer expected. For instance, determine whether you think the answer is a constraint (as it was in part (d)), and at least give an answer that is a constraint (e.g. begins with "NOT EXISTS" etc). It's also important to ensure that you refer to the appropriate tables in your answers - students often give answers that clearly don't refer to a table that is critically important in getting the right result! (Ideally, it might be that - in setting questions - we should actually request that you specify which tables you need to answer a specific query, but this would make long questions even longer.) Many students gave fragmentary answers some of which perhaps deserved credit ... to maximise the chance of getting credit where you don't completely specify a query, you should try to annotate what you do by way of developing fragments. For instance, organise your partial answers, and explain what each is intended to express (without this information the examiner can't easily reward your unfinished work). Another important issue in translating from Tutorial D to SQL is to remember that SELECT DISTINCT is appropriate because of the poor semantics of SQL.

Question 3

If you understood the topic, this was a relatively straightforward question, and very neatly structured to guide you. One danger in setting with questions with many parts is that if students answer the first parts of the question incorrectly, the later parts don't really make much sense. This is good reason why it sometimes pays off to read the whole question through before you answer any parts - the chances are that the later parts give hints as to what is expected before.
(a) Most people remembered this definition, but as usual some wrote a garbled version of something half-recalled and perhaps not well-understood. The definition of BCNF has the logical form: "if A → B is a non-trivial FD, then A is a superkey". Some students seemed to think that this was equivalent to asserting that "A → B is a non-trivial FD and A is a superkey", which has no force at all as a definition.
(b) The point of the FDs in this part of the question was to give you some sample FDs - some correct, some spurious - to test your understanding of the basic concept and notation. The biggest error that you could make here was to regard the FDs FD1 - FD6 as integral to the rest of the question - these 'see-what-you-think-of-these' possible FDs weren't meant to be referred to again in parts (c) and (e) - though some students did interpret in this way. Sadly, this made their answers to (c) and (e) absurd. As far as the answer to (b) is concerned, some students interpreted A → B the wrong way round - it should be read as "A determines B" and not as "B determines A". Note also that {A,B} → {C} is not the same as {A} → {C} and {B} → {C}, whereas {A} → {B,C} is the same as {A} → {B} and {A} → {C}. When you give your answer to a question like this, it's OK to give the answer in cryptic form provided that it's correct (as in "FD1, FD4"), but in general it pays to explain the reasoning behind your answer, since it's hard for the examiner to give you credit for an answer like "FD1, FD5" which seems to be partially correct, but may actually just be a guess on your part, and doesn't reveal your thought processes.
(c) Most students correctly interpreted the business rules as FDs, but didn't look hard enough at the conditions specified (e.g. "the number of FDs is the minimum required") - in particular if a customer determines a unique salesperson (BR3), and each salesperson is in exactly one region (BR5), then every customer is in exactly one region (BR2).
(d) The only key for the relation REP_ASSIGNMENT is C# - a point that you missed if you didn't spot that a given customer determines a specific salesperson and a region etc. Many people suggested S# as a key - a notion that is intuitively clearly absurd: "how can selecting a particular salesperson determine a specific customer?". It always pays to think about the real meaning as well as following standard syntactic rules when interpreting formal expressions.
The rest of the question was generally quite well answered by those students who read (f) carefully. The key point here was to note that you were asked for "an equivalent pair of relvars", which was a well-conceived way to put you back on track even if you'd got confused earlier in the question. There wasn't much hope for students who interpreted "a pair of relvars" as three or more however! A less serious error that several students made - perhaps following the ridiculous sample relvar included in (f) simply to remind you of the form your answer should take - was to specify more than one key, as in KEY{C#}, KEY{C#,CN}. Of course, by the very definition of keys, one key cannot contain another. In part (i), there were obvious constraints in the form of foreign key requirements, but you also needed to add a constraint to ensure that a customer was in the same region as the salesperson allocated to them (if any).

Question 4

This question was not a joy to mark, as students didn't typically give correct complete answers nor did they provide well-documented answers for which marks could be awarded even though they were inaccurate or incomplete. It's good to bear the examiner's perspective in mind when you write exams - more often that not, examiners are keen to reward effort, but it's not realistic to expect them to interpret fragmentary efforts at framing an SQL query or highly convoluted natural language interpretations of an SQL query in order to give some credit.
(a) looked straightforward, but in fact only one student answered in quite the way that I had expected. Students, bars and beers are normally conceptually quite distinct, at least until you've had a skinful of the latter, so you need to rename the fields s, ba and be (or exploit SQL's diabolically loose type-checking!) if you are to make a single relation that lists all data of all three kinds in a single column. You otherwise had to select s, ba and be from a product of the three tables to construct a plausible answer (for which you were awarded full marks).
(b) was an exercise in using aggregate operators in SQL which students familiar with these operations generally did quite well.
(c) required you to realise that the s-fields in 'likes' and 'visits' in the DB were the only source of students, and that if you removed those students who visited bars, you would be left only with those (if any) who featured in the 'likes' table but not in the 'visits' table. Some SQL expressions for this set of students were exceedingly convoluted, given that operators such as "MINUS" and constructions such as "NOT IN" are available. Note the benefit of looking at the question as a whole in this context - you could then see how to use the constructs required from the code given in (d) parts (i) and (ii).
(d) The answers to this part ranged from explicit translations of the SQL into an operational account - "Select distinct students from the 'likes' table eliminating the distinct students from the 'likes' table renamed as l where the beer that they like is in the beers selected from 'likes' renamed as l2 where the student s in table l is the same as the student s in the table l2 and ..." to what looked like outright guesses. If an operational account of SQL queries is for you "normal English", you may be having some difficulty communicating - though you should be able to identify some companions facing a similar difficulty who are also doing the module. If you make outright guesses, it is a good thing to ensure that your answer at least belongs to a plausible category - and returns 'students' for example, rather than 'beers'. Obviously neither kind of answer can atract a lot of marks. As in other questions, it is important to document what understanding you've got even if that falls short of a complete understanding and a definitive answer. If you can just give a natural language interpretation of one clause within a complex SQL expression that is helpful and earns you credit. Incidentally, there was an unfortunate side-effect of the choice of a Courier-style font in this question, which led a few students to misread l2 ("el-2") as 12 ("twelve"). For the record, the answers to parts (i) and (ii) had nothing to do with twelve students, twelve beers or twelve bars.
(e) This query was actually one for which you could have got a lot of guidance had you correctly interpreted part (d) (ii) - again something that underlines the value of looking at the question as a whole.
As noted in Question 2, an important consideration in formulating answers to queries is to ensure that you at least refer to all the relevant tables - for instance, it's expecting a lot to give a valid answer to parts (e) and (f) that doesn't refer to all three tables, since the verbs 'like', 'visit' and 'serve' all occur in both parts.