Subj : Re: Polymorphism sucks [Was: Paradigms which way to go?]
To   : comp.programming,comp.object
From : cb
Date : Fri Sep 23 2005 11:19 am

In article , Chris Sonnack wrote:
>Ah, at last, an intelligent answer from someone who appears to know
>what they're talking about! Cool!!
>
>Christian Brunschen writes:
>
>>>> The term "relational" has nothing to do with keys and their
>>>> referencing stuff. I used to think it did, but like you now,
>>>> I was wrong. You lost this one.
>>>
>>> EVERY resource I checked agrees. You have yet to produce anything
>>> other than arm waving to support your point. If you are right and
>>> I'm wrong, where's the proof? Should be easy enough to find.
>>
>> Let me quote from "An Introduction to Database Systems, Volume 1, Fifth
>> Edition" by C. J. Date (Addison-Wesley, 1990) -
>>
>> At this point, the reader may be wondering why relational databases are
>> called "relational" anyway. The answer is simple: "Relation" is just a
>> mathematical term for a table (to be precise, a table of a cedrtain
>> specific kind - details to follow in Chapter 11).
>
>Ah ha. A certain specific KIND of table. Okay, let's skip to Ch.11....

Yes, a specific kind of table. I don't have the C. J. Date book at hand
at the moment, but some of the requirements a table must meet to be a
relation are:

1) it contains tuples whose column values are taken from domains
2) there are no duplicate tuples

We need go no further than this to see that most 'relational database'
systems actually aren't. Many 'relational' database systems have no
concept of 'domains' from which the columns' values are taken, and in
most of them it is trivial to have duplicate rows in a table.

In other words, almost nothing actually implements the relational model
_fully_. Indeed, anything that uses SQL is basically forced by the SQL
standard(s) to implement something that in some respects departs from
the pure relational model.
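
As a rough illustration of those two requirements (this is not from
Date's book), here is a minimal Python sketch. The function name
`is_relation`, the use of finite Python sets as 'domains', and the
sample rows are all assumptions made for the example:

```python
# Sketch only: `is_relation`, the finite-set "domains" and the sample rows
# are assumptions made for this example, not taken from Date or Codd.

def is_relation(rows, domains):
    """Check the two requirements: values come from domains, no duplicate tuples."""
    tuples = [tuple(row) for row in rows]
    for t in tuples:
        # 1) one value per column, each drawn from that column's domain
        if len(t) != len(domains):
            return False
        if any(value not in domain for value, domain in zip(t, domains)):
            return False
    # 2) no duplicate tuples: converting to a set must not lose any rows
    return len(set(tuples)) == len(tuples)


NAMES = {"Alice", "Bob", "Carol"}
COLORS = {"Red", "Green", "Blue"}

clean = [("Alice", "Blue"), ("Bob", "Red")]
with_duplicate = clean + [("Bob", "Red")]
outside_domain = [("Alice", "Purple")]

print(is_relation(clean, [NAMES, COLORS]))           # True
print(is_relation(with_duplicate, [NAMES, COLORS]))  # False
print(is_relation(outside_domain, [NAMES, COLORS]))  # False
```

Note that nothing in the check mentions keys or references to other
tables; a single standalone table can pass it.
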

Yet we talk about 'relational databases' nonetheless, and we use
relational theory, and it all works. That's because in most cases the
differences between a 'table' and a 'relation' are too small to matter
in practice. So in most contexts, saying that 'relation == table' is
sufficiently accurate.

Specifically, in response to the assertion

>>>> The term "relational" has nothing to do with keys and their
>>>> referencing stuff. I used to think it did, but like you now,
>>>> I was wrong. You lost this one.

you replied

>>> EVERY resource I checked agrees. You have yet to produce anything
>>> other than arm waving to support your point. If you are right and
>>> I'm wrong, where's the proof? Should be easy enough to find.

In other words, you were asserting that the term 'relational' _does_
have something to do with keys and their referencing stuff - which it
doesn't. A database can be relational without having any keys or any
references - just a single table. And I'm using the term 'table'
deliberately, because in practice, from the point of view of the user,
the difference is almost completely insignificant.

>(Given what you say far below, it appears the next bit isn't from the
>vaulted Chapter 11, so I'm trimming without mercy here....)

'vaulted'? *looks for the nearest vault* [ methinks you want an 'n' in
place of that 'l' ] ;-)

Indeed, the next bit was still from the same chapter 4.2 as the
preceding bit.

>> The principles of the relational model were originally laid down by one
>> man, Dr. E. F. Codd, at the time a member of the IBM San Jose Research
>> Laboratory [ ... ] . it was late in 1968 that Codd, a mathematiccian
>> by training, first realized that the discipline of mathematics could be
>> used to inject some solid principles and rigor into a field - database
>> management - [...]. Codd's ideas were first published in a now classic
>> paper, "A Relational Model of Data for Large Shared Data Banks") [ ...
]
>
>You know, maybe we should just go to the source:
>
> http://www.acm.org/classics/nov95/toc.html

Maybe we should. Incidentally, why didn't you do just that before my
posting? Or indeed, before making your inaccurate assertion - then this
whole flamewar would have been avoided.

>> The formal theory therefore does not use the term "record"at all;
>> instead, it uses the term "tuple" (short for "n-tuple"), which was
>> given a precise definition by Codd when he first introduced it. [...]
>> it is sufficient to say that the term "tuple" corresponds approximately
>> to the notion of a flat record instance (just
>> as the term "relation" corresponds approximately to the notion of
>> a table).
>
>You notice the use of "approximately" twice?

Yes. But the approximation is 'close enough for government work', as the
phrase goes. In spite of all these 'approximately's, the book continues
to use 'table' and 'relation' as essentially synonyms throughout - only
pointing out differences where it is *specifically* necessary to do so.

>> In Chapter 11, 'Relation Structure', the book introduces formal definition
>> for each term, which I will not quite here;
>
>Aw, why not?

Because they are *long*, and I only had a limited amount of time
available when composing my post.

>> but I will quote form the introduction to the chapter, on page 249:
>
>Hey, better'n nothing!

A *lot* better, because rather than getting stuck in minute details
which matter only in specific cases, the introduction covers the
fundamental, important bits and gives a general understanding (with the
usual caveat that a general understanding does not equate to an in-depth
one).

>>
>> [ ... ]
>> * A _relation_ corresponds to what so far in this book we have generally
>> been calling a table.
>
>Ahem, "have GENERALLY been calling a table."

The word 'generally' does not change the substance of the sentence:
"A _relation_ corresponds to [ ... ] a table".
Yes, there are specific differences, but as a generalization, 'a
relation corresponds to a table'.

>> * A _tuple_ corresponds to a row of such a table [...]
>>
>
>And the meat of this lies in the definition of a tuple. And, after reading
>Codd's paper (link above), I'll agree that his definition of a "relation"
>does show a single table that--from his example in section 1.3--containing
>records not **obviously** linked. However, I suspect this is similar to
>a code fragment and only shows a partial picture of the whole. After all,
>I don't know any suppliers named "1", "2" or "4".

You can search-and-replace the supplier identifiers with supplier names,
if you prefer; the gist of the example remains the same, and requires no
'linking'.

The term 'relation' is used, as Codd writes, 'in its accepted
mathematical sense', which Wikipedia describes as follows:

  In mathematics, an n-ary relation (or n-place relation or often simply
  relation) is a generalization of binary relations such as "=" and "<"
  which occur in statements such as "5 < 6" or "2 + 2 = 4". It is the
  fundamental notion in the relational model for databases.

  Formally, a relation over the sets X1, ..., Xn is an (n + 1)-tuple
  R=(X1, ..., Xn, G(R)) where G(R) is a subset of X1 × ... × Xn (the
  Cartesian product of these sets). If X=X1=X2=...=Xn, R is simply
  called a relation over X. G(R) is called the graph of R and, similar
  to the case of binary relations, R is often identified with its graph.

  An n-ary predicate is a truth-valued function of n variables.

  Because a relation as above defines uniquely an n-ary predicate that
  holds for x1, ..., xn if (x1, ..., xn) is in G(R), and vice versa, the
  relation and the predicate are often denoted with the same symbol. So,
  for example, the following two statements are considered to be
  equivalent:

    (x1, x2, ...) member-of G(R)   [*]
    R(x1, x2, ...)

  Relations are classified according to the number of sets in the
  Cartesian product; in other words the number of terms in the
  expression:

  - unary relation or property: R(x)
  - binary relation: R(x, y) or x R y
  - ternary relation: R(x, y, z)
  - quaternary relation: R(x, y, z, w)

  Relations with more than 4 terms are usually called n-ary; for example
  "a 5-ary relation".

( [*] There is no 'member-of' character in ASCII, so I had to spell it
out. The appropriate character is used on the wikipedia page. )

Look - nothing here about keys or references to anything, inside or
outside the relation itself. You'll eventually have to accept that the
term 'relational' has nothing to do with keys, or references, or
similar; just with each individual _relation_ itself.

>(A few paragraphs later he talks about "relationships", so I think there's
>still some wiggle room here! :-)

Not really - Codd's definition of a 'relationship' is the same as that
of a 'relation', with one exception: the columns of a relation are
identified by their position (so the order of the columns is
significant), whereas the columns of a 'relationship' are identified by
name, so their order is irrelevant. As a consequence, column names need
not be unique in a relation, but they _do_ need to be unique in a
relationship. Or to quote Codd, "In mathematical terms, a relationship
is an equivalence class of those relations that are equivalent under
permutation of domains [ ... ]."

>> So, [...] it should be fairly clear that the term 'Relational' in
>> 'Relational Database' has _nothing_ to do with relationships between
>> different tables:
>
>Which I never claimed it was.

From your own quote below, it certainly appears that you did:

>I had to go read the back thread to recall
>how this started, but it began over a disagreement whether SQL was a
>"relational language".
>I suggested it would work perfectly well in a
>non-relational database containing a single (non-related) flat table.

The point that you missed is that even a single table is still a
relation in the 'relational database' sense. The fact that there is only
one table, or that there are no foreign keys, does not make the database
any less 'relational'.

>And turned into a mini-war from there.

Fanned on by the fact that neither side apparently bothered to research
the issue properly.

>I stand by the original seed: SQL is not a "relational language".
>I also stand by the idea that "relational", in "Relational Database"
>is about *relationships* between records. I claim that if there are no
>relationships, there's no Relational.

And that claim is wrong, as I have demonstrated. The term 'relational'
comes from the mathematical definition of 'relation', which is simply a
more precise way of saying 'table'.

>> The term 'Relation' is instead a different, more precise term for
>> the type of tables used in Relational databases.
>
>Exactly. Pay attention to the words: "...the TYPE of tables." Not just
>you plain old tables, but (as quoted above), " table of a cedrtain [sic]
>specific kind".

Specifically: tables whose columns take their values from domains, whose
rows are tuples of those values, and which contain no duplicate rows,
yes. But even a single 'flat table' that fulfils those requirements is
in fact a relation, and even completely standalone it can be considered
a 'relational database'.

A database with a single table that doesn't refer to anything else is
*just as* relational as a database with lots of tables that refer to
each other using foreign keys. It's that simple.

>VERY important distinction.

Not so important in effect, it turns out.

>Here's what I wrote over a month ago:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Here's a table (first row headers):
>
>NAME   UID   COLOR  DATE       TYPE
>Alice  -     Blue   5-3-1948   BG45
>Bob    -     Red    5-3-1948   XL220
>Carol  3345  Red    12-7-1955  E30F
>Dave   7709  Green  9-14-2000  -
>
>I dispute your "relational==table" definition, so I see not
>one thing that makes this table "relational". It's just a
>(what I'd call) "flat table".

Well, guess what, a 'flat table' can still be a 'relation'.

[ It also may not be, if it contains duplicate rows or breaks one of the
other requirements (none of which are particularly onerous); but the
above table doesn't, so it certainly looks like it would qualify as a
'relation'. ]

And, if we keep in mind that almost no relational database product on
the market implements the full relational model with all its
requirements on tables, the above table is certainly *just as
relational* as a collection of inter-connected tables would be.

In other words, how 'relational' a database is does not depend on the
number of tables in it, or on the number of foreign keys or other
references between tables.

You may not recognise the 'relational-ness' of the table you have
sketched above, but it's there.

>Given something containing this table that also supported
>SQL, you could write:
>
> Select NAME,COLOR from TABLE order by NAME

That statement essentially performs a 'projection' operation on the
'table' relation (see section 2.1.2 of Codd's paper):

  "Suppose now we select certain columns of a relation (striking out the
  others) and then remove from the resulting array any duplication in
  the rows. The final array represents a relation which is said to be a
  _projection_ of the given relation."
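
To make Codd's definition concrete, here is a small Python sketch of
projection applied to the table quoted above. The helper name `project`
and the choice to model a relation as a Python set of tuples are
assumptions for illustration only; the '-' entries are copied from the
posted table as plain strings:

```python
# Sketch only: `project` and the set-of-tuples representation are
# assumptions for this example; the data is the table quoted in the post.

HEADER = ("NAME", "UID", "COLOR", "DATE", "TYPE")
TABLE = {
    ("Alice", "-", "Blue", "5-3-1948", "BG45"),
    ("Bob", "-", "Red", "5-3-1948", "XL220"),
    ("Carol", "3345", "Red", "12-7-1955", "E30F"),
    ("Dave", "7709", "Green", "9-14-2000", "-"),
}


def project(header, rows, columns):
    """Keep only `columns`; building a set removes rows that become identical."""
    positions = [header.index(name) for name in columns]
    return {tuple(row[i] for i in positions) for row in rows}


name_color = project(HEADER, TABLE, ["NAME", "COLOR"])
colors = project(HEADER, TABLE, ["COLOR"])

print(len(name_color))  # 4
print(sorted(colors))   # [('Blue',), ('Green',), ('Red',)]
```

Projecting on COLOR alone yields three tuples from four rows, because
the two 'Red' rows collapse into one, and the result has no inherent
order. The SQL statement quoted above approximates this operation.
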

There are two differences though:

* SQL would not actually remove any duplicate rows by default - you'd
  have to use 'select distinct'
* You have introduced an ordering (by name) on the output, whereas the
  result of a projection on a relation is another relation, and
  relations are by definition unordered

But those differences apply to SQL in general, not just to 'SQL on a
flat table'. Indeed, any database that uses SQL is forced, by the
definition of SQL, to not implement the full relational model. Yet we
still refer to them as relational databases, because fundamentally, in
spite of a few differences, the model used by those databases and
exposed to the user is the relational one.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>> So, even a database with only a single table, is a relational
>> database.
>
>Only if it's a certain *specific* *type* of table. (-:

Your table above, which you attempted to use as a counterexample,
appears to qualify as a relation. Of course, if you were to introduce
multiple identical rows, then you'd no longer have a relation - but that
would be a deliberate choice, and it would be no different if this table
were one of several that referenced each other.

The end result is that your original assertion remains incorrect, and
that the general statement 'relation == table' _is_ substantially
correct, your protestations to the contrary notwithstanding.

Best wishes,

// Christian Brunschen