[HN Gopher] Databases, Types, and the Relational Model: The Thir...
       ___________________________________________________________________
        
       Databases, Types, and the Relational Model: The Third Manifesto
       [pdf]
        
       Author : fipar
       Score  : 150 points
       Date   : 2021-10-04 10:28 UTC (12 hours ago)
        
 (HTM) web link (www.dcs.warwick.ac.uk)
 (TXT) w3m dump (www.dcs.warwick.ac.uk)
        
       | garmaine wrote:
       | I really wish Date had any understanding of modern type theory,
       | since that's the basis of his desired reform of SQL. His typing
       | model is 50 years out of date.
        
         | blacktriangle wrote:
          | Heck, I wish he even had any understanding of the use of NULL
          | in real-world databases. He spends books arguing, very
          | compellingly I might add, against the use of NULL in SQL dbs,
          | but when asked to come up with an alternative to NULL, his
          | suggestions are sentinel values and splitting out every
          | nullable attribute into its own table. Both of these solutions
          | are trivially rejectable.
        
           | Joker_vD wrote:
            | Splitting every nullable attribute into its own table is not
            | "trivially rejectable": you absolutely can do this to
            | represent sum types (and it doesn't mean that those should
            | actually be stored separately: the relational model is a
            | _logical_ model, so it doesn't matter whether a table is
            | implemented with one B-tree, with two, or with something
            | else entirely).
            | 
            | The main problem is that you have to manually type JOINs for
            | all such sum-typed properties to do any useful
            | queries/updates -- and there really is no way around writing
            | them anew by hand every time, because relational theory is
            | just the intuitionistic _first_-order predicate calculus.
            | You'd really want some _second_-order tools to group those
            | tables together... but then you essentially end up with
            | tables with sum-typed columns, and since the main idea of
            | the relational model was to never step out of the first-
            | order predicate calculus, the reasonable solution is
            | probably to just introduce sum types as actual types.
        
             | blacktriangle wrote:
              | Maybe trivial was the wrong word, but your second paragraph
              | is exactly why I reject the approach of splitting out all
              | nullable values into their own relation. Yes, from a data
              | modeling theory perspective it works, but from a pragmatic
              | perspective, having to join everything back together again
              | is a worse situation than just thinking through the
              | behavior of null in that situation.
              | 
              | But even then things are awful. What do I do when I want to
              | include the value of the nullable table in my result, but I
              | want all of my records, not just the ones with non-null
              | values? So even though you've purged null from your data
              | model, you're still bringing it back with an outer join.
        
               | Joker_vD wrote:
                | Well, what's the semantics of your result? Say you have
                | 
                |     TABLE UsersWithKnownAge(id ID, age INTEGER)
                |     TABLE UsersWithUnknownAge(id ID)
                | 
                |     CONSTRAINT UsersWithKnownAge JOIN UsersWithUnknownAge IS EMPTY
                | 
                | with the obvious predicates: "User with id ID is known to
                | be AGE years old" and "User with id ID has an unknown
                | age". You join them and get a relation with the predicate
                | "User with id ID..." How do you continue?
               | 
               | "We know AGE_KNOWLEDGE about the age of the user with id
               | ID", where AGE_KNOWLEDGE is Option<Integer> that has
               | semantics "number is unknown" or "number is known to be
               | NUM". Okay. What information about the whole dataset can
               | you extract from this join?
               | 
               | If your query needs to know the users' ages, you can't
               | execute it unless you restrict yourself to querying about
               | UsersWithKnownAge, in which case, just run it on
               | UsersWithKnownAge.
               | 
               | If your query doesn't need to know users' ages, you can
               | run it on "UsersWithKnownAge(id) UNION
               | UsersWithUnknownAge(id)", not on an OUTER JOIN.
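                | 
                | A rough PostgreSQL sketch of that split (table and column
                | names are mine, just for illustration); the UNION answers
                | age-independent questions without any OUTER JOIN:
                | 
                |     -- users whose age we know
                |     CREATE TABLE users_with_known_age (
                |         id  bigint PRIMARY KEY,
                |         age integer NOT NULL
                |     );
                |     -- users whose age we don't know
                |     CREATE TABLE users_with_unknown_age (
                |         id bigint PRIMARY KEY
                |     );
                | 
                |     -- "how many users are there?" needs no ages at all
                |     SELECT count(*) FROM (
                |         SELECT id FROM users_with_known_age
                |         UNION
                |         SELECT id FROM users_with_unknown_age
                |     ) AS all_users;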
        
               | talaketu wrote:
               | Nice toy example.
               | 
               | > What information about the whole dataset can you
               | extract from this join?
               | 
               | You could count the number of users.
        
               | j-pb wrote:
                | Joining the thing back together again is not so bad with
                | recent multi-way join algorithms.
                | 
                | But then again, the resulting model is effectively a
                | graph database and far, far away from resembling
                | something like SQL or Relational Calculus.
               | 
               | (Which might not be a bad thing though.
               | 
               | "You either die NOT NULL or live long enough to see an
               | instance where you really wish that column was
               | nullable.")
        
               | Joker_vD wrote:
                | It's the implicit semantics of the tables. We start with
                | modelling the world itself: "user with id ID is AGE years
                | old and lives at ADDRESS", and end up with modelling our
                | particular instance of knowledge of the world: "user with
                | id ID exists", "user with id ID is known to be AGE years
                | old", "user with id ID is known to live at ADDRESS"; and
                | with the "closed world principle" you can explore more of
                | the limits of that knowledge: "which users do we know to
                | exist and know their addresses, but don't know their
                | ages?". But the query "what is the id of the oldest
                | user?" is simply unanswerable unless you know all of
                | their ages; the best you can generally do is answer "what
                | is the id of the oldest of the users whose ages we
                | know?".
                | 
                | It's when we start confusing the map and the territory
                | that a lot of miscalculations start to happen.
        
               | j-pb wrote:
                | I think you meant to write "open world principle", as
                | every non-stated fact is implicitly false under the
                | closed world assumption.
                | 
                | But yeah, I agree that databases should always be thought
                | of as a model of the world (crisp), and not as the world
                | itself (fuzzy).
                | 
                | But it's also noteworthy that this is a leaky
                | abstraction, and that any kind of database that has to
                | operate in the real world (in contrast to, say, an entity
                | component game engine) will face this leakiness.
                | 
                | The only way I see to resolve this is to turn the
                | database into something that remains crisp in the face of
                | fuzziness, e.g. by storing only observations or
                | interpretations of real-world processes. Since an
                | observation is a "subjective" statement made from the
                | database's perspective, it holds true regardless of the
                | actual "objective" condition.
                | 
                | It's just not easy to program and model in such a system.
        
               | blacktriangle wrote:
               | Sounds like a restatement of the open world / closed
               | world debate.
        
           | bob1029 wrote:
           | > splitting out every nullable attribute into its own table
           | 
           | One property of 6th normal form is that null _ceases to
           | exist_ by way of optional joins. This is a good thing.
           | 
            | If you ignore the physical data model for a few seconds, you
            | might see how this can become a deterministic path to
            | modeling success in _any_ problem domain of arbitrary
            | complexity. You aren't able to make the kinds of assumptions
            | that would typically screw you over in the future if you
            | follow 6NF properly. De-normalization for performance reasons
            | should be undertaken only after you have perfectly modeled
            | the domain in logical terms.
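            | 
            | As a hedged illustration (hypothetical tables), a 6NF-style
            | decomposition might look like this; an unknown age is simply
            | the absence of a row rather than a NULL:
            | 
            |     -- key-only relation: the users that exist
            |     CREATE TABLE users (id bigint PRIMARY KEY);
            | 
            |     -- one relation per optional attribute, keyed on users
            |     CREATE TABLE user_age (
            |         id  bigint PRIMARY KEY REFERENCES users(id),
            |         age integer NOT NULL
            |     );
            |     CREATE TABLE user_address (
            |         id      bigint PRIMARY KEY REFERENCES users(id),
            |         address text NOT NULL
            |     );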
        
         | garethrowlands wrote:
         | Can you give examples of things this misses?
        
       | vlada_dvia wrote:
       | Interesting read!
        
       | jrochkind1 wrote:
       | > What makes this book different? We feel there are two key
       | features that make it stand out: its strong focus on the
       | relational model, and its thorough treatment of type theory (and
       | together, of course, the relational model and type theory are
       | precisely the foundations referred to in the previous paragraph).
       | To amplify:
       | 
       | > * The relational model: We must stress that what we are not
       | doing is proposing some kind of "new" or "extended" relational
       | model. Rather, we are concerned with what might be called the
       | "classical" version of that model; we have tried to provide as
       | careful and accurate a description of that classical model as we
       | possibly can. It is true that we have taken the opportunity to
       | dot a few i's and cross a few t's (i.e., to perform a few minor
       | tidying activities here and there); however, the model as we
       | describe it departs in no essential respects from Codd's original
       | vision as documented in references [20-22]. (Indeed, we even
       | toyed at one point with the idea of calling the book A Relational
       | Model of Data for Large Shared Data Banks.)
       | 
       | > * Type theory: Relations in the relational model have
       | attributes, and those attributes in turn have types (also called
       | domains). Thus, the relational model certainly assumes that types
       | exist. However, it nowhere says just what those types must be,
       | nor does it have much to say about what properties those types
       | must have; in other words, the relational model and type theory
       | are almost completely independent of each other (they are
       | orthogonal, to use the jargon). In this book, we present a
       | detailed theory of types that complements the relational model,
       | and we explore some of its implications. In particular, we show
       | that our type theory serves as a basis for a comprehensive model
       | of type inheritance.
        
         | throwaway894345 wrote:
         | On the subject of types, I've often wondered why relational
         | databases don't have sum types. Lots of real-world data models
         | have these kinds of relationships, and certainly "modeling
         | data" is well-within the purview of a database. I know that
         | ORMs exist, but they tend to be very bad and there's no
         | compelling reason to have to break off part of the database
         | responsibility, implement it terribly (it's divorced from the
         | query planner, so even the best ORMs are subpar) for each
         | application language, and put it on the client side.
         | 
         | Yes, I understand that sum types affect the way the data is
         | arranged, but the same approaches that are used by programming
         | languages to model sum types in memory can be used by databases
         | to model sum types on disk and in memory (it's all just byte
         | vectors at the end of the day).
        
           | Joker_vD wrote:
            | The relational model is a _logical_ model; whether it's
            | backed by a B-tree or something else should have no
            | consequence except for query/update performance.
            | 
            | And you kinda can simulate sum types, you just can't express
            | them in a first-order way: instead of
            | 
            |     CREATE TYPE Color (Red, Green, Blue)
            |     CREATE TABLE MyTable(x Color, ...)
            | 
            | you do
            | 
            |     CREATE TABLE MyTableRed(...)
            |     CREATE TABLE MyTableGreen(...)
            |     CREATE TABLE MyTableBlue(...)
           | 
           | There is just no syntax to group these three tables together
           | in all relevant queries/updates, and such skolemization is
           | really quite painful to maintain manually.
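            | 
            | For instance (assuming each table carries an id column),
            | even the trivial "all rows with their color" query has to be
            | spelled out by hand and kept in sync with the set of tables:
            | 
            |     SELECT id, 'Red'   AS color FROM MyTableRed
            |     UNION ALL
            |     SELECT id, 'Green' AS color FROM MyTableGreen
            |     UNION ALL
            |     SELECT id, 'Blue'  AS color FROM MyTableBlue;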
        
             | throwaway894345 wrote:
             | > The relational model is a logical model; that it's backed
             | up by a B-tree or something else should bear no consequence
             | except in query/update performance.
             | 
              | Agreed, but some people have incorrectly argued in the past
              | that introducing sum types would harm performance due to
              | the underlying data structures, so I was trying to head off
              | that line of argumentation.
             | 
             | > And you kinda can simulate the sum types,
             | 
             | I'm not sure I follow your example. Your `Color` type is a
             | product type, not a sum type. I've explored every
             | workaround I could think of for sum types (and asked many
             | colleagues, etc), and they all failed in one important
             | respect or another (typically because the type system
             | doesn't understand your "sum type").
        
               | Joker_vD wrote:
                | No, it's a sum type. You have to add the constraints
                | 
                |     CONSTRAINT MyTableRed INTERSECT MyTableGreen IS EMPTY
                |     CONSTRAINT MyTableGreen INTERSECT MyTableBlue IS EMPTY
                |     CONSTRAINT MyTableBlue INTERSECT MyTableRed IS EMPTY
                | 
                | but after that, I believe it works. It essentially
                | replaces the original "... and the color is 'x'"
                | predicate with three other predicates: "... and the
                | color is red", "... and the color is green", "... and the
                | color is blue", with the constraint that at most one of
                | those can be true for any particular value of "...". As
                | you can see, you have to do that for every would-be COLOR
                | property, and adding more variants blows up the number of
                | tables and constraints even more drastically.
               | 
               | It kinda reminds me of 3SAT and other logical exercises
               | in reducing everything down to the first-order logic:
               | it's doable, but the terms grow exponentially large.
        
               | throwaway894345 wrote:
               | The problem IIRC is actually consuming that sum type in
               | SQL. Unfortunately it's been several years since I played
               | with that approach, and I've forgotten the specific
               | problems I encountered.
        
               | alexvoda wrote:
               | The issue I see is that now you have mixed data and
               | schema. And unfortunately altering schema is much more
               | difficult in current systems. There is no homoiconic SQL
               | DBMS.
        
             | alexvoda wrote:
              | In what situation do you believe your solution would be
              | better than creating a table Colors, inserting the desired
              | list of colors, and having a FK from MyTable to Colors?
              | 
              | The way I see it, a single-attribute relation is a sum
              | type, implemented in current imperfect systems as a single-
              | column table (maybe a two-column table if you insist on
              | using surrogate keys for various reasons).
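              | 
              | A rough sketch of that approach, with illustrative names:
              | 
              |     CREATE TABLE Colors (name text PRIMARY KEY);
              |     INSERT INTO Colors VALUES ('Red'), ('Green'), ('Blue');
              | 
              |     CREATE TABLE MyTable (
              |         id    bigint PRIMARY KEY,
              |         color text NOT NULL REFERENCES Colors(name)
              |     );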
        
           | lolski wrote:
           | You might want to check TypeDB, which is a database with a
           | strongly-typed query language:
           | https://github.com/vaticle/typedb.
           | 
            | It doesn't do union types yet, but it's got type inheritance,
            | type inference, and type-safe joins (with its "relation"
            | feature). It's quite cool what you can do with it when you
            | can express a complex domain with the language.
           | 
           | Disclaimer: I work there.
        
           | zaphar wrote:
            | I suspect part of the problem is that true sum types are
            | poorly represented in many of the languages that use these
            | databases. In a sense, because of the layer they sit at in
            | the stack, databases have to appeal to a lowest common
            | denominator. If they supported tables whose rows were true
            | sum types, and therefore polymorphic, many languages would
            | find it quite difficult to interact with that database.
        
             | throwaway894345 wrote:
             | I don't understand why that would be a problem--programming
             | languages interact with databases at runtime via SQL. I
             | regularly use Go which lacks sum types, but there's nothing
             | technically difficult about generating SQL that leverages
             | sum types--except that SQL doesn't support sum types :).
        
               | zaphar wrote:
                | In Go, returning a list of items that are each one of a
                | set of different types would be handled poorly. It's
                | possible but very awkward. You would probably drop down
                | to a slice of interface{} and then have to use reflection
                | to figure out what each item was.
               | 
               | Dynamic languages like Python or Ruby would probably be
               | fine but there are more languages than those out there to
               | worry about.
        
               | throwaway894345 wrote:
               | There are ways of modeling sum types in Go (a struct with
               | a few fields and a tag, an interface that's only
               | implemented by the relevant types, etc), but that's
               | unrelated to the database/client interface.
        
           | blacktriangle wrote:
           | Not a part of the SQL standard, but would PostgreSQL
           | enumerated types be what you're thinking of?
           | 
           | https://www.postgresql.org/docs/current/datatype-enum.html
           | 
           | Although I guess the question is are you thinking sum types
           | at the attribute level or sum types at the relation level? At
           | the relation level I'm not familiar with an answer, but I can
           | see where that would be a nice database-level solution for
           | polymorphic associations.
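            | 
            | For the attribute-level case, the syntax from the linked docs
            | looks like this:
            | 
            |     CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
            | 
            |     CREATE TABLE person (
            |         name         text,
            |         current_mood mood
            |     );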
        
             | throwaway894345 wrote:
              | Postgres enum types are like C enums--a bounded set of
              | identifiers. Sum types require a bounded set of
              | arbitrarily complex types (not just identifiers). For
              | example:
              | 
              |     type Event enum {
              |         KeyPress(rune)
              |         MouseClick{ x int64, y int64 }
              |     }
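              | 
              | The closest plain-SQL workaround I know of is a tag column
              | plus CHECK constraints (a sketch, names are mine) -- which
              | of course reintroduces exactly the nullable columns this
              | thread is arguing about, and the type system still doesn't
              | know it's a sum type:
              | 
              |     CREATE TABLE event (
              |         id      bigint PRIMARY KEY,
              |         kind    text NOT NULL,
              |         key     text,    -- only for key_press
              |         click_x bigint,  -- only for mouse_click
              |         click_y bigint,
              |         CHECK (kind IN ('key_press', 'mouse_click')),
              |         CHECK (kind <> 'key_press' OR
              |                (key IS NOT NULL AND click_x IS NULL
              |                 AND click_y IS NULL)),
              |         CHECK (kind <> 'mouse_click' OR
              |                (key IS NULL AND click_x IS NOT NULL
              |                 AND click_y IS NOT NULL))
              |     );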
        
           | mamcx wrote:
           | I'm making a relational lang (https://tablam.org) and this
           | was one of the question marks for me.
           | 
            | The #1 reason sum types are not present is that "nobody"
            | thought (seriously) about them for a long time. See how long
            | "null" has been with us, and you still find defenders of it!
            | 
            | The second and broader problem is that the relational model
            | was NOT implemented in full by the RDBMS makers. To make sum
            | types work, you NEED to support embedding relations inside
            | relations:
            | 
            |     rel Invoice
            |     rel DetailInvoice
            | 
            |     Invoice[invoice_id:1 customer:123
            |       lines: DetailInvoice[
            |         [invoice_id:1 line_id:1 price:10 qty: 1],
            |         [invoice_id:1 line_id:2 price:10 qty: 2]
            |       ]
            |     ]
            | 
            | Then with this, sum types come for free:
            | 
            |     type Option
            |       Some ?
            |       None
            | 
            |     [customer_id:1  phone: [tag:Some 123-23456]]
            |     [customer_id:2  phone: [tag:None]]
           | 
            | This would make things like storing logs, metrics, and event
            | sourcing so much nicer!
            | 
            | How to store that is another matter, but I think it's not
            | that complicated; none of the major RDBMSs have put effort
            | into this.
            | 
            | This is one of my dreams: making an RDBMS with this and other
            | features that push DBs toward the modern way of making
            | apps...
        
             | alexvoda wrote:
             | I am interested in opportunities to cooperate on this.
        
             | goto11 wrote:
             | > To make sum types work, you NEED to support embed
             | relations inside relations
             | 
             | If that is the case, sum types would be in violation of
             | first normal form.
        
             | cmrdporcupine wrote:
             | Frankly, understanding of what the relational model
             | actually _is_ has been missing from the industry even among
             | many people who have made data management their primary
             | focus. IMHO a lot of the penetration of the NoSQL stuff in
             | the last decade was based on misunderstandings of the
             | capabilities and potentials of the relational model.
             | 
             | I think the early history of SQL was too influenced by
             | ISAM-type stores and COBOL flatfiles and the like. Hence
             | the vocabulary around "tables" and "columns" instead of the
             | relations, relvars, tuples, etc. described by the actual
              | relational algebraic model. To many of those with an OO
              | fetish, "relational model" became synonymous with "tabular
              | model", and the industry's early-2000s obsession with
              | object polymorphism, OO ORM middleware, etc. continued the
              | stall in relational database innovation.
             | 
             | I feel like there is now renewed interest in this stuff,
             | though. Your project looks really neat, for example.
        
             | woggy wrote:
             | Weird that it pops up today as I was trying to find
             | something like this yesterday. Good luck with it!
        
           | goto11 wrote:
           | Probably because the same can already be expressed using
           | relations and foreign keys.
        
             | throwaway894345 wrote:
             | I don't think it can. I don't believe SQL allows you to
             | create a fkey column that points into many different
             | tables. You can create your own pseudo fkey column that the
             | database doesn't know is a pointer into other tables ("type
             | erasure") but then you have to recreate referential
             | integrity and so on yourself, and you still run into issues
             | querying against the table that has that fkey column in a
             | useful way.
        
               | goto11 wrote:
               | You don't need a FK to point to multiple tables. You
               | would have the reference in the other direction, from
               | multiple different tables to one table.
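                | 
                | Roughly, and with made-up names: instead of event rows
                | pointing at "one of several payload tables", each variant
                | table points back at the common one:
                | 
                |     CREATE TABLE event (id bigint PRIMARY KEY);
                | 
                |     CREATE TABLE key_press (
                |         event_id bigint PRIMARY KEY REFERENCES event(id),
                |         key      text NOT NULL
                |     );
                |     CREATE TABLE mouse_click (
                |         event_id bigint PRIMARY KEY REFERENCES event(id),
                |         x        bigint NOT NULL,
                |         y        bigint NOT NULL
                |     );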
        
               | ithkuil wrote:
               | How do you ensure that only one row of only one table
               | references the row of that one table?
        
               | alexvoda wrote:
                | Theoretically, this could also be achieved by allowing
                | FKs to Views. But I know of no system that allows this.
        
         | hikerclimber1 wrote:
         | Everything is subjective. Especially laws.
        
         | [deleted]
        
       | binkHN wrote:
        | Might be worthwhile to note in the title that this is copyright
        | 2014.
        
       | rwhaling wrote:
       | Always an interesting read. A ton of insights, but I always find
       | it a bit hard to make sense of, and it feels so disconnected from
       | a modern distributed systems context.
       | 
       | In retrospect, I think a lot of the Spark SQL Dataframe workflow
       | comes pretty close to what D/Tutorial D aspire to - static
       | typing, functions on relations, imperative style but
       | fundamentally unordered, etc.; however, it's only a processing
       | system, not a storage system.
       | 
       | I have kept my distance from the "data lake" buzzword circles,
       | but maybe a transactional, Spark-based data lake does approximate
       | what Darwen/Date are going for? The only thing really missing
       | might be nested relations?
        
         | overkalix wrote:
         | Again, what's the difference between a data warehouse and a
         | data lake? Honest question.
        
           | snidane wrote:
           | Data lake: you own your files. Data warehouse: database owns
           | your files.
        
           | lrobinovitch wrote:
           | As I understand it, a data lake is a storage space for
           | unstructured data. A data warehouse is a storage + compute
           | layer, usually with data sourced from a data lake, that is
           | ready for querying. This understanding comes from the
           | description in this paper[1]
           | 
           | > To solve these problems, the second generation data
           | analytics platforms started offloading all the raw data into
           | data lakes: low-cost storage systems with a file API that
           | hold data in generic and usually open file formats, such as
           | Apache Parquet and ORC [8, 9]. This approach started with the
           | Apache Hadoop movement [5], using the Hadoop File System
           | (HDFS) for cheap storage. The data lake was a schema-on-read
           | architecture that enabled the agility of storing any data at
           | low cost, but on the other hand, punted the problem of data
           | quality and governance downstream. In this architecture, a
           | small subset of data in the lake would later be ETLed to a
           | downstream data warehouse (such as Teradata) for the most
           | important decision support and BI applications.
           | 
           | [1] http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf
        
             | overkalix wrote:
             | Reasonable enough I guess. Thanks!
        
           | cdcarter wrote:
           | In my experience, a data warehouse usually has an ETL process
           | at the beginning. Data comes in from disparate sources and on
           | a regular basis, it is ETLd into a shape that is ready to use
           | by the business. On the other hand, a data lake slurps in all
           | the data as soon as it is available, in whatever form it is
           | in. You have to process it into the business-consumable form
           | when you query/egress it, but you don't have to know your
           | dream schema up front.
        
       | titovodka wrote:
        | I'm 2 years into my career. Is there a benefit in reading a
        | database book cover to cover? The book in question is Database
        | Systems by Garcia-Molina.
       | 
       | I've found the material so far interesting. But I'm not sure how
       | much value and how much information there is for me to possibly
       | retain.
        
         | SNosTrAnDbLe wrote:
          | I would also recommend "Database Internals: A Deep Dive into
          | How Distributed Data Systems Work", which gives a good overview
          | of the implementation of a database. I felt it was closer to an
          | engineer's perspective.
        
         | tcard wrote:
         | While that would probably be very valuable, it will probably
         | come with lots of implementation details that won't really
         | matter to you unless you plan to write a database yourself.
         | 
          | I'd suggest this one: https://dataintensive.net/ It's intended
          | for _users_ of databases, that is, developers incorporating
          | databases (and other kinds of data systems) into their
          | applications. It gives just enough explanation of how those
          | systems work and what they can (and can't) provide, and also
          | gives a more general overview of what's available beyond
          | relational SQL databases.
        
         | mariocesar wrote:
          | You don't need to master it, but it serves as a map in your
          | head. At worst, it will give you an intuition for how things
          | work. The "retain" part isn't essential if you have at least
          | the hints. Even if they are few, they are far more helpful
          | than not having a clue at all.
         | 
         | I always recall two fundamental programming books and one DB
         | design book I read years ago. And still, for today, my mind
         | comes back to them, and it makes me faster when looking for
         | documentation or just googling things more properly :)
         | 
         | Knowing a complex subject, even not deeply, is always a
         | benefit.
        
           | zanethomas wrote:
           | you and i see i to i on that <g>
        
         | zanethomas wrote:
         | Yes.
         | 
         | I started programming long before the internet and read cover
         | to cover every computer book I could get my hands on.
         | Subsequent years had me prepared with a huge background of
         | ideas I could apply to problems I encountered.
         | 
         | However the depth of understanding I sought differed depending
         | upon the book.
         | 
         | Some books I dug into intensely. When the SICP came out I wrote
         | my own Scheme interpreter (using the information in SICP) just
         | so I could play with the examples.
         | 
         | Other books I just skimmed and indexed (in my memory) for
         | possible future reference. The 80s 12 volume Systems
         | Programming series by IBM is an extreme example.
         | 
         | Before the web I had accumulated 50 bankers boxes of computer
         | books that I had "read" cover to cover. When they were on my
         | shelves it often happened that when I had a problem to solve I
         | could remember which book(s) had relevant information. I was my
         | own search engine!
         | 
         | So, yes. If you are interested in databases start by skimming
         | the paper. You might decide to read for details or you might
         | not. Regardless, the more you know the more you can synthesize
         | solutions when needed.
        
         | setr wrote:
         | The first thing I try to do upon encountering any
         | technology/stack people recommend everywhere is try to figure
         | out what's wrong with it.
         | 
         | I then skim through anything I can find that's relevant, and
         | dig into the parts I didn't fully understand (or skimming
         | didn't fully answer), until I can provide a comprehensive
         | answer as to where it shouldn't be used.
         | 
          | In general, implementation details from this skimming are only
          | relevant as far as the black-box abstraction goes -- I only
          | care about which workloads and inputs the stack falls over on.
          | The implementation itself I only keep around as:
         | 
         | 1. I trust the looks of this implementation, so if I need it I
         | know where to start
         | 
         | 2. I can give a very superficial explanation of the
         | implementation, just enough to defend my answer as to why it
         | falls over in the workloads it falls over on.
         | 
         | So for databases, as an example, I care about the fact that
         | 
         | 1. it stores by row (because this is the key differentiator
         | from column-stores, and is the key as to why OLAP and OLTP have
         | different preferred workloads -- and why they can make
         | different optimizations)
         | 
         | 2. the indexes and roughly why they work/excel/fail (again,
         | their existence changes preferred workloads)
         | 
          | 3. That the SQL compiler produces a "plan", and this plan is
          | based on heuristics and estimates -- so it shouldn't be blindly
          | trusted. You can view the plan and review it (sketch below).
         | 
          | 4. ACID, its guarantees, and its implementation only as far as
          | why MongoDB and friends can't have it too (and why you might
          | want to drop it -- where it only adds overhead)
         | 
         | 5. Probably other stuff not on the top of my head
         | 
         | Everything is a trade-off. The goal is to know what trade-offs
         | you've actually made.
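          | 
          | For point 3, viewing the plan is a one-liner in most databases
          | (standard PostgreSQL syntax here; the table is made up):
          | 
          |     EXPLAIN ANALYZE
          |     SELECT * FROM orders WHERE customer_id = 42;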
        
         | ryanchants wrote:
         | I would suggest something more directly applicable. Textbooks
         | are useful, but if you're looking for something to get the most
         | out of today:
         | 
         | SQL Performance Explained
         | 
         | Art of SQL
         | 
         | Database Design for Mere Mortals
        
           | RheingoldRiver wrote:
           | Huge +1 to Database Design for Mere Mortals. This was the
           | second database book I read (the first was SQL Antipatterns,
           | not a bad book but also not a good first book) and far and
           | away the most helpful. It took me 3 days to read, I had a
           | miserable flu after flying home from a vacation (pre-COVID),
           | and I was still able to follow the author without any issue.
           | I highly suggest a physical copy & writing in the margins,
           | underlining etc, though that's something I do in general so
           | YMMV if you don't get anything out of that.
           | 
           | As far as "technical" books go, it's very nontechnical, and
           | if you're already very experienced & good at normalizing it
           | maybe won't help you THAT much, but for someone who's at all
           | junior, I can't recommend it enough. The biggest takeaway
           | (beyond, "what's a normalized database look like") is to keep
           | questioning the real-life system you're working in & to ask
           | questions of stakeholders who understand it better than you,
           | but me saying that isn't at all doing it enough justice
           | compared to how well the author elaborates on the point.
           | (btw, Data And Reality is also good for this, though a lot
           | more theoretical in approach)
           | 
            | If you're _really_ new to databases you'll want to
           | supplement this with something that does normalization more
           | formally, I don't have a great single suggestion for that. I
           | read two of C.J. Date's books, honestly I found them mind-
           | numbingly tedious to get through but I can't say I didn't
           | learn from them. But...I also can't really recommend them
           | because I was so bored by them that it made me stop reading
           | altogether for like 4 months because I was just putting off
           | finishing these books (LPT: always be in the middle of 2
           | books at the same time, unfortunately for me, both of my
           | books were these lol). I did also read Art of SQL, it was
           | both enjoyable & interesting, but maybe not what you're
           | looking for if you want something super structured that will
           | take you from point A to point B.
        
             | ryanchants wrote:
             | If you weren't aware, Database Design for Mere Mortals 25th
             | Anniversary Edition was released last December!
        
               | RheingoldRiver wrote:
               | I had seen that! Do you think it's worth buying & reading
               | the updated version? As much as I loved the original, I'm
               | not sure how much I'd get out of reading a book that's
               | 90% the same content at this point. Though tbf I guess I
               | wouldn't mind just supporting the author so maybe I'll
               | pick up a copy and skim it at least.
        
         | dgb23 wrote:
          | My advice: yes. Get a good book and an interactive SQL editor.
         | Learn the concepts and try them out. Play, ask questions and
         | try to answer them. Skim over stuff that doesn't interest you
         | at the moment. Write stuff down.
         | 
         | Then after a while you'll encounter new concepts (a new book,
         | an online comment, an interesting tech talk etc.), have new
         | ideas or forget/get rusty on old ones, then you repeat the
         | above.
         | 
         | Continuous learning is part of the job IMO. I did the above
         | several times and will do them several times again in some
         | fashion or another.
         | 
         | For me it has had real value - repeatedly - that I didn't even
         | _know_ existed before I accidentally came across a situation
         | where I could apply some concept I learned this way.
         | 
          | Sure, the first time you apply something "in anger" you will
          | get into the nitty-gritty details, which is _work_, but you
          | cannot even recognize an opportunity if you haven't prepared
          | your mind for it and played around with the ideas and concepts
          | beforehand -- unless you have a significant time budget for
          | this kind of thing on the project, or a colleague who can
          | introduce it, but my recommendation is not to rely on that.
        
         | lioeters wrote:
         | I often see people on HN recommend the book, Designing Data-
         | Intensive Applications (2017). I've personally been chewing on
         | the material for a while now, gaining new insights.
         | 
         | Here's the table of contents:
         | https://www.oreilly.com/library/view/designing-data-intensiv...
         | 
         | It seems to cover roughly the same areas and range as the book
         | you mentioned, Database Systems: The Complete Book (2008).
         | http://infolab.stanford.edu/~ullman/dscb.html
        
           | mattdeboard wrote:
           | Yes, yes, yes, yes, yes! DDIA is amazing. It is _the_ premier
           | reference manual, as far as my brain and I are concerned.
           | It's the only "computer book" I keep next to my computer.
           | (Not that I reference it frequently enough to justify its
           | proximity; it's mostly so I can tell people "DDIA is the only
           | 'computer book' I keep next to my computer." But still...
           | it's great.)
        
         | olavgg wrote:
         | Not really, I've learned a lot from the PostgreSQL
         | documentation and various Stack Overflow answers.
         | 
         | Like with everything, getting experience and your hands dirty
         | will make you far more knowledgeable. There is no silver
         | bullet.
        
       | andrewingram wrote:
       | Ah yes, the database textbook from my old uni. I think I managed
       | to get a good grade on databases without ever buying it. But I
       | remember the lecture content based on it being quite interesting.
        
         | mpalfrey wrote:
         | Another ex Warwick person here!
        
           | benjiweber wrote:
            | I'm sure this article will attract the attention of those of
            | us who were taught by Hugh Darwen.
        
             | struct wrote:
              | I have a Darwen-signed copy - one of my proudest
              | possessions!
        
       ___________________________________________________________________
       (page generated 2021-10-04 23:01 UTC)