https://reprog.wordpress.com/2010/09/03/bibliographic-data-part-2-dublin-cores-dirty-little-secret/

The Reinvigorated Programmer
Everything except sauropod vertebrae
- Bibliographic data, part 1: MARC and its vile progeny
Bibliographic data, part 3: Has anyone, anywhere, ever read the whole
of the RDA specification? -

Bibliographic data, part 2: Dublin Core's dirty little secret

Posted on September 3, 2010 | 22 Comments

[This is part two in a series -- you should read part 1 first for
context and then you might go on to part 3.]

The Dublin Core -- metadata made dumb

Just when librarians were in despair of ever getting their data out
to the world in a form it could understand, along came the Dublin
Core (DC for short) -- a simple set of fifteen metadata elements
(contributor, coverage, creator, date, description, format,
identifier, language, publisher, relation, rights, source, subject,
title, and type) that could be used to describe "document-like
objects" such as books, journal articles and web pages.

Everyone in the library world got really excited about the Dublin
Core for about three weeks in 1999, before realising that you can't
actually do anything with those elements beyond expressing author
(called "creator"), title and date. Everything else was too vague to
be of any use -- coverage, anyone? Relation? Format?

If you don't believe me, try translating a reference to a journal
article into DC -- for example, this one that we used in the previous
article:

Taylor, Michael P. and Darren Naish. 2007. An unusual new neosauropod
dinosaur from the Lower Cretaceous Hastings Beds Group of East
Sussex, England. Palaeontology 50 (6): 1547-1564. doi:10.1111/
j.1475-4983.2007.00728.x

I can easily see how to map the author, date and articleTitle, but
not the journalTitle, volume, issue, startPage, endPage or DOI.  So
one third of the elements are representable.
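
To make the arithmetic concrete, here is a minimal Python sketch of that mapping exercise. The fifteen element names come from the list above; the citation field names and the SIMPLE_DC_HOME table are assumptions made up for illustration, and the table records only the three mappings described here as easy.

```python
# The fifteen core element names, as listed in the article.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# The sample reference, split into the nine fields the article names.
citation = {
    "author": ["Taylor, Michael P.", "Naish, Darren"],
    "date": "2007",
    "articleTitle": "An unusual new neosauropod dinosaur from the Lower "
                    "Cretaceous Hastings Beds Group of East Sussex, England",
    "journalTitle": "Palaeontology",
    "volume": "50",
    "issue": "6",
    "startPage": "1547",
    "endPage": "1564",
    "DOI": "10.1111/j.1475-4983.2007.00728.x",
}

# Hypothetical mapping table: only the fields with an obvious home.
SIMPLE_DC_HOME = {"author": "creator", "date": "date", "articleTitle": "title"}


def split_by_representability(record):
    """Return (mapped, unmapped): DC element -> value, and leftover field names."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        element = SIMPLE_DC_HOME.get(field)
        if element in DC_ELEMENTS:
            mapped[element] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


mapped, unmapped = split_by_representability(citation)
print(f"{len(mapped)} of {len(citation)} fields have a home")
print("left over:", ", ".join(unmapped))
```

Three fields out of nine find an element; the other six are left over.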

The Dublin Core people quickly realised that while the fifteen core
elements are OK for describing web-pages (which is frankly what they
were designed for, despite all the "cross-domain" rhetoric), they
were not much use for describing, well, anything else. Not beyond the
absolute basics, anyway.

[Oh, and by the way: there was, and is, no standard XML format for
Dublin Core, merely guidelines for how to roll your own. Just in case
you were wondering. There are standard element names to use (e.g. 
<dc:title>) but no standard wrapper element to represent the record
as a whole.]

Qualified Dublin Core -- metadata made slightly less dumb

The solution to the paucity of Dublin Core elements was this thing
called "qualified Dublin Core" (although that term doesn't seem to be
used much any more), in which the fifteen core elements are qualified
to make them more specific -- for example, dateAccepted, dateSubmitted
and dateCopyrighted are refinements of the core element date.
According to the Dublin Core's own dumb down principle, "a client
should be able to ignore any qualifier and use the value as if it
were unqualified [...] Qualification is therefore supposed only to
refine, not extend the semantic scope of an Element." Sounds good,
right?

Except:

  * There is still no canonical XML representation for Dublin Core
    records, only canonical XML element names for Dublin Core
    elements.
  * The XML representation of dateAccepted is not, as you might
    expect, <dc:date type="accepted"> but <dcterms:dateAccepted>,
    which means you can't implement the dumb down principle just by
    discarding qualifiers; you need to encode specific knowledge of
    how to map "qualified" to core DC elements in your application.
    In other words, "qualified Dublin Core" is not qualified at all.
  * The dcterms namespace has its own instances of the fifteen core
    elements, so when you want to add a contributor, you have to
    choose (how?) between <dc:contributor> and <dcterms:contributor>.

All of this, inexplicable though it may appear, would perhaps be
tolerable. Were it not for the core incompetence of the Dublin Core
model. And here at last we come to the promised Dirty Little Secret ...

Even qualified Dublin Core can't describe a journal article

When I first heard this, I flatly refused to believe it. It seemed
impossible that anyone could design a metadata element set for
describing documents and have it not able to describe a journal
article. But it is, amazingly, quite true. When I made my best effort
to render the reference above into Qualified Dublin Core, I found
that I was able to represent only one additional field (the DOI, and
that not very well) beyond the three basic elements (authors, date, 
title) that basic Dublin Core allowed me.

Critics, with the exception of Oscar Wilde, seem mostly to agree that
the death of Little Nell (in Dickens's Old Curiosity Shop) is one of
the saddest passages in literature. Personally, I lean more towards
the separation of Rose and Doctor at the end of Doomsday (you know,
before the They Can Never See Each Other Again Because The Path
Between Universes Has Closed Forever thing got downgraded to She
Can't Appear Again Until The Fourth Season Due To Other Work
Commitments). Others may cite the ending of Old Yeller or the
departure of the ring-bearers to the Undying Lands after the scouring
of the Shire. But for me, the most tragic document ever written is
Guidelines for Encoding Bibliographic Citation Information in Dublin
Core Metadata: four and a half thousand words of desperate flailing
that could have been summarised as "Don't even bother trying, it just
doesn't work".

Turns out that the Qualified Dublin Core solution to the problem of
citing journal articles was to add -- get ready for this -- a 
bibliographicCitation element. Oh, joy! And so the introduction of
the Guidelines document concludes with the observation:

Before the introduction of the Dublin Core term
'bibliographicCitation' it was not obvious how to describe fully a
journal article using Dublin Core metadata. There was no suitable
Dublin Core property to capture the journal title, as distinct from
the article title, or the volume, issue and page details, other than
as part of a general description.

Thank heavens that's changed! Now, instead of shoving the journal
title, volume, issue and page details into an undifferentiated lump
of text in the description field, we can shove the journal title,
volume, issue and page details into an undifferentiated lump of text
in the bibliographicCitation field!

This, let me remind you, in a specification that includes SEVENTY
data elements -- the original fifteen core elements, plus 55 added in
Qualified DC, of which 15 are duplicates of the originals. And in
those 70 elements they couldn't make room for journal title?
Seriously?

The official, sanctioned, allegedly interoperable encoding of my
perfectly simple article citation into Dublin Core

Here it is, folks, based on the Guide. Read it and weep:

<mikesMadeUpNamespace:article
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:mikesMadeUpNamespace="whatever">
  <dc:creator>Michael P. Taylor</dc:creator>
  <dc:creator>Darren Naish</dc:creator>
  <dcterms:issued>2007</dcterms:issued>
  <dc:title>An unusual new neosauropod dinosaur
  from the Lower Cretaceous Hastings Beds Group
  of East Sussex, England.</dc:title>
  <dcterms:isPartOf>urn:ISSN:0031-0239</dcterms:isPartOf>
  <dc:publisher>Blackwell</dc:publisher>
  <dc:type xxx="http://purl.org/dc/terms/DCMIType">Text</dc:type>
  <dcterms:bibliographicCitation>
  Palaeontology 50(6), 1547-1564. (2007)
  </dcterms:bibliographicCitation>
  <dc:identifier>info:doi:10.1111/j.1475-4983.2007.00728.x</dc:identifier>
</mikesMadeUpNamespace:article>

It makes me want to cry.

Note that:

  * There is still no standard XML format for Dublin Core records, so
    I had to make up my own wrapper element (which of course can't be
    in either of the two DC namespaces).
  * For the actual elements, I am supposed to use a mixture of
    elements from dc and dcterms namespaces.
  * The element containing the publication date is not called 
    publicationDate or datePublished, nor even issuedDate or 
    dateIssued, but just issued -- unlike, for example, dateSubmitted
    or dateAccepted.
  * The best I can do by way of trying to express the journal title
    is to use the dcterms:isPartOf element and give it the ISSN of
    the journal (wrapped up as a URI), in the hope that whoever uses
    this record will go and look that ISSN up to find out what
    journal it pertains to.
  * The publisher is considered an important part of the citation
    (unlike, say, the journal title, volume, issue or page-range)
    despite the fact that journal-article citations never include the
    publisher.
  * It's considered important to state that the type of the
    referenced item is Text.
  * The type "Text" is drawn from a vocabulary whose URI is known (I
    got it from the Guide) but I couldn't figure out what XML
    attribute I am supposed to use to point to that URI.
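
For a sense of what a consumer of such a record gets back, here is a rough sketch using Python's standard xml.etree.ElementTree. The record is an abridged copy of the one above (the wrapper prefix is shortened to m, and the isPartOf, publisher and type elements are left out), and the extract helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
PREFIXES = {DC: "dc", DCTERMS: "dcterms"}

# Abridged version of the record above; the wrapper is still made up.
RECORD = """\
<m:article xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:m="whatever">
  <dc:creator>Michael P. Taylor</dc:creator>
  <dc:creator>Darren Naish</dc:creator>
  <dcterms:issued>2007</dcterms:issued>
  <dc:title>An unusual new neosauropod dinosaur from the Lower Cretaceous Hastings Beds Group of East Sussex, England.</dc:title>
  <dcterms:bibliographicCitation>Palaeontology 50(6), 1547-1564. (2007)</dcterms:bibliographicCitation>
  <dc:identifier>info:doi:10.1111/j.1475-4983.2007.00728.x</dc:identifier>
</m:article>
"""


def extract(record_xml):
    """Collect the children of the wrapper as {prefixed name: [values]}."""
    root = ET.fromstring(record_xml)
    fields = {}
    for child in root:
        # ElementTree spells qualified names as "{namespace-uri}local".
        namespace, _, local = child.tag[1:].partition("}")
        prefix = PREFIXES.get(namespace, "?")
        value = (child.text or "").strip()
        fields.setdefault(f"{prefix}:{local}", []).append(value)
    return fields


fields = extract(RECORD)
for name, values in fields.items():
    print(name, values)
```

Authors, year, title and DOI come out as separate values. The journal title, volume, issue and pages come out as a single string.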

And of course all of this is on top of the utterly baffling
brain-damage that is the bibliographicCitation element. And by the
way, if the sample bibliographicCitation above doesn't seem too
dreadful to you, then consider this sample Big Undifferentiated Blob
Of Text, straight from the Guide:

    Proceedings of the International Conference on Dublin Core and
    metadata for e-communities, 2002; DC-2002: Metadata for
    e-Communities: Supporting Diversity and Convergence, Florence,
    Italy, 13-17 October 2002, pp 71-80

bibliographicCitation format: the pain, the glory, the other pain

But at least the client software can reliably parse the journal
title, volume, issue, start-page and end-page out of the 
bibliographicCitation, right?  I mean, it must be in a standard
format, right?

Right?

Viewers of a sensitive disposition might wish to look away now.

Here's what section 2.2 of the Guide says:

    Plain text citations may be according to a recognised citation
    style. Several styles were reviewed by the DCMI Citation Working
    Group, and are listed on a Citation Styles page, but there is no
    particular recommendation for choice of style.

And indeed the two sample bibliographicCitation examples above are in
noticeably different formats, even allowing that one is for a journal
article and the other for a paper in a proceedings volume -- for
example, the date is parenthesised in the former but not in the
latter.

Oh, and from section 2.1:

    Other details of the resource, such as its title and creators,
    will be described using the usual Dublin Core properties.
    Optionally, but redundantly, these details may be included in the
    citation as well.

In other words, any old crap can be shoved in the 
bibliographicCitation field.

So let's review: the official way to represent journal title, volume
number, issue number, start page and end page in the 70-element
Qualified Dublin Core set is: jam them, and quite possibly some other
data you happen to have lying around, together into a text blob in
any format you happen to feel like.

Of course, for a computer reading the XML to make any use of this
information, it will need to parse the bibliographicCitation to
figure out what the journal title, etc., are. But since that field
can contain any combination of elements in any format, any parser
will need to try all sorts of heuristics to match the format and
figure out which bits represent what data. Which of course is exactly
what you'd have to do if all you had to work with was the plain-text
citation that we started this article with, long, long ago.
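
Here is roughly what that heuristic parsing looks like in practice: a small Python sketch with one regular expression per citation style. Both patterns are assumptions, hand-fitted to the two sample citations quoted above; they are not taken from the Guide, which recommends no format at all.

```python
import re

# One hand-written pattern per style we happen to have seen (assumptions).
PATTERNS = [
    ("journal", re.compile(
        r"(?P<journal>.+?)\s+(?P<volume>\d+)\s*\((?P<issue>\d+)\),\s*"
        r"(?P<spage>\d+)-(?P<epage>\d+)\.\s*\((?P<year>\d{4})\)")),
    ("proceedings", re.compile(
        r"(?P<title>.+),\s*pp\s+(?P<spage>\d+)-(?P<epage>\d+)")),
]


def parse_citation(text):
    """Try each known style in turn; return a dict of fields, or None."""
    text = " ".join(text.split())  # collapse line breaks and extra spaces
    for style, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return {"style": style, **match.groupdict()}
    return None


journal = "Palaeontology 50(6), 1547-1564. (2007)"
proceedings = (
    "Proceedings of the International Conference on Dublin Core and "
    "metadata for e-communities, 2002; DC-2002: Metadata for "
    "e-Communities: Supporting Diversity and Convergence, Florence, "
    "Italy, 13-17 October 2002, pp 71-80"
)
# The same article, punctuated as in the plain-text reference earlier on.
variant = "Palaeontology 50 (6): 1547-1564"

for sample in (journal, proceedings, variant):
    print(parse_citation(sample))
```

The two samples parse only because the patterns were written to fit them. The third string describes the same article with a colon in place of the comma and falls straight through, and every further style a record creator chooses needs another pattern.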

To summarise: Qualified Dublin Core, with its 70 fields, is no more
useful for expressing journal-article citations than plain text.

Oh, am I shouting? Sorry.

Appendix. Don't even get me started on the use of the OpenURL 1.0
(ANSI/NISO Z39.88) ContextObject KEV format as an alternative for the
content of the <bibliographicCitation> field

Having written that heading, I feel no need to expand further on it.

OK, I'm out of here. I need to take a shower.

Tune in next time for yet more pain.

This entry was posted in Culture, Formats, Frustration, Not my
favourite.

22 responses to "Bibliographic data, part 2: Dublin Core's dirty
little secret"

 1. Pingback: Bibliographic data, part 1: MARC and its vile progeny |
    The Reinvigorated Programmer

 2. Michel S. | September 5, 2010 at 10:52 am |

    One would think they could learn a thing or two from BibTeX ...

 3. Pingback: Bibliographic data, part 3: Has anyone, anywhere, ever
    read the whole of the RDA specification? | The Reinvigorated
    Programmer

 4. Chris Purcell | September 6, 2010 at 7:05 am |

    You're supposed to use RDF (1.0) to encode DC. That way, OWL
    gives you the subtype and equality information

 5. Khairul | September 6, 2010 at 8:43 am |

    Could this be because your standard brick-and-mortar library
    catalogs physical items? I mean, it makes sense that a library
    can lend out its print copy of a journal issue, but how does a
    library lend out a journal article?

 6. Mike Taylor | September 6, 2010 at 9:23 am |

    Thanks, all, for some interesting comments.

    Chris, I don't think bringing on RDF and OWL helps very much
    here, because the actual fieldnames just don't exist. By saying
    that a journal article isPartOf a journal issue, you can then
    use the dc:title element to specify the journal name, but (A)
    that doesn't help with the volume number, issue number, and
    page-range; and (B) it's not really right anyway, since that
    specific issue doesn't have a title, it's an instance of a
    journal that has a title. If you have to say that the article
    isPartOf an issue and that the issue isInstanceOf a journal just
    so you can give the journal title, then things have gone very,
    very wrong -- it's information architecture astronautics gone
    wild.

    Khairul, you are almost certainly right that the poor support for
    journal articles in library standards goes back to
    brick-and-mortar days. But (A) that excuse would have been less
    unacceptable in 2000 than it is now, and (B) that doesn't explain
    the lack of volume and issue fields, which you'd surely need for
    cataloguing hardcopy issues.

 7. Khairul | September 6, 2010 at 10:00 am |

    Unless there exist journals that can have multiple issues in a
    single day, a date and title uniquely identify a journal issue.
    For retrieval purposes, I suppose that's all that is needed.

 8. Mike Taylor | September 6, 2010 at 10:15 am |

    The date that is included in citations is almost always a year
    alone -- not 12 March 1968, but just 1968. And the great majority
    of journals publish more than one issue a year, so "date" as
    given is certainly not an adequate substitute for volume and
    issue.

 9. Khairul | September 6, 2010 at 1:41 pm |

    To go from citation to journal issue, one could pull up all the
    issues for the year and count, but that does sound rather
    inefficient.

10. Chris Purcell | September 7, 2010 at 12:33 am |

    Mike: Sure, I appreciate RDF/OWL don't solve your main problem,
    just wanted to address some of your sub-problems!

11. Pingback: Datos bibliograficos, parte 1: MARC y su vil progenie |
    BaDoc

12. Douglas Campbell | November 14, 2010 at 10:55 pm |

    Looking at the background discussions, the DC Citation Guidelines
    were developed in response to the question: How do I quickly add
    the journal details into an article's DC record? It seems no one
    has ever actually said to DC that they need to encode a full
    journal article citation - maybe you should?

    Since those guidelines came out, a couple of schemas have come
    out that play nicely with DC - Bibliographic Ontology http://
    bibliontology.com/ and PRISM http://www.prismstandard.org/ - they
    might do the trick?

13. Pingback: PSNC Digital Libraries Team >> Dublin Core's dirty
    little secret

14. Andy Powell | December 15, 2010 at 6:23 pm |

    Sorry... I'm rather late to this.

    It's very funny... but I think you are confused. (Being confused
    about DC is fine by the way... most people are (including me)).

    DC has evolved to be used as an RDF vocabulary.

    It didn't start out that way of course... because the original 15
    'elements' pre-dated RDF. It started out as a set of 'attributes'
    to be used in the HTML meta tag, flirted briefly with XML (not
    least in the form of the OAI-PMH) and finally emerged into the
    butterfly of an RDF vocabulary.

    I use the word 'butterfly' loosely.

    To complain that DC doesn't work as an XML language is like
    complaining that concrete makes bad cakes. You're right... but so
    what!?

    Part of the reason you are confused is because DCMI itself is
    confused. The long history of DC has left a lot of people with
    differing views about where DC sits in that HTML/XML/RDF
    spectrum. Indeed, many people 'inside' the DC camp consider that
    DC should function across the whole piste. The trouble is, in
    trying to do so DC becomes jack of all trades, master of none.

    My view (a view that is shared by a few others but that is also
    violently disagreed with by many others) is that DC now has to be
    viewed pretty much solely as an RDF vocabulary. The properties
    are now declared using RDFS for example. If viewed in that way...
    many of your complaints above disappear. Sure, DC doesn't have
    all the properties necessary to capture a full citation. So what?
    It was never intended to. The whole point of RDF is that others
    can come up with such a set of properties and use them
    inter-mixed with DC (or on their own) as necessary.

    Small pieces loosely joined and all that.

    All IMHO of course.

    (BTW, I hate bibliographicCitation as well).

15. Mike Taylor | December 16, 2010 at 1:00 pm |

    I don't know, Andy. This "it's just for RDF" thing sounds like
    post-hoc pleading to me. There is literally nothing about RDF on
    the Dublin Core home page http://dublincore.org/ -- if that really
    was its whole raison d'etre, wouldn't you expect to see it at
    least mentioned?

    And let's not forget that even if you leap into the RDF swamp and
    model stuff like journal title as the title of a separate "The
    Journal" object that is linked to the article using isPartOf,
    that still doesn't get you anywhere near everything you need.
    Even a fully RDF-bedrangled bibliographic reference would be
    missing such core information as the volume, issue and
    page-range.

    All of this comes from the horrible tendency of library
    scientists such as your good self and, well, me, to want to model
    everything. This always -- always -- results in confusion, yet we
    never seem to learn: OpenURL 1.0, RDA, FRBR, Dublin Core/RDF
    scholarly references ... All that infrastructure, all that learning
    curve, and still we can't do the trivial thing that the RIS
    format has been happily doing since the dawn of time:

    TY  - JOUR
    AU  - Taylor, Michael P.
    AU  - Naish, Darren
    PY  - 2007
    TI  - An unusual new neosauropod dinosaur from the Lower Cretaceous Hastings Beds Group of East Sussex, England
    JO  - Palaeontology
    VL  - 50
    IS  - 6
    SP  - 1547
    EP  - 1564
    ID  - doi:10.1111/j.1475-4983.2007.00728.x
    ER  -

    There -- was that really so hard? I'm not saying this is a good
    format, but it does at least allow you to Say What You Mean,
    Simply And Directly.
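
    As a rough illustration of how little machinery a tagged format
    like the RIS record above needs, here is a minimal Python sketch
    of a reader for it. It assumes the line layout seen in that sample
    (two-character tag, two spaces, hyphen, space, value) and treats
    only AU as repeatable; neither assumption is checked against the
    RIS specification.

```python
import re

# Tag, two spaces, hyphen, then an optional space and value.
RIS_LINE = re.compile(r"([A-Z][A-Z0-9])  -(?: (.*))?")

REPEATABLE = {"AU"}  # assumption: only authors repeat in this sketch

SAMPLE = """\
TY  - JOUR
AU  - Taylor, Michael P.
AU  - Naish, Darren
PY  - 2007
TI  - An unusual new neosauropod dinosaur from the Lower Cretaceous Hastings Beds Group of East Sussex, England
JO  - Palaeontology
VL  - 50
IS  - 6
SP  - 1547
EP  - 1564
ID  - doi:10.1111/j.1475-4983.2007.00728.x
ER  -
"""


def parse_ris(text):
    """Return a list of records, each a dict of tag -> value (or list)."""
    records, current = [], {}
    for raw in text.splitlines():
        match = RIS_LINE.fullmatch(raw.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        tag, value = match.group(1), (match.group(2) or "").strip()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        elif tag in REPEATABLE:
            current.setdefault(tag, []).append(value)
        else:
            current[tag] = value
    return records


records = parse_ris(SAMPLE)
article = records[0]
print(article["JO"], article["VL"], article["IS"], article["SP"], article["EP"])
```

    Every field a citation needs comes back under its own key, with
    no guessing about punctuation.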

16. Pingback: Semantic mapping is hard | The Reinvigorated Programmer

17. Pingback: links for 2010-10-29 << sonofbluerobot

18. Dan Brickley (@danbri) | May 7, 2013 at 1:16 pm |

    How about this? https://gist.github.com/danbri/5532479

19. Mike Taylor | May 7, 2013 at 11:44 pm |

    Har!

20. Pingback: spitting my tea and thinking of Grant Campbell. |
    librarian @large

21. Mike Curtis | November 22, 2015 at 5:36 pm |

    Best written, most depressing article I've read about libraries
    and metadata, sigh!

22. Mike Taylor | November 22, 2015 at 5:38 pm |

    Thanks for those, I guess, kind words :-)
