https://en.wikipedia.org/wiki/Overlapping_markup

Overlapping markup

From Wikipedia, the free encyclopedia

In markup languages and the digital humanities, overlap occurs when a
document has two or more structures that interact in a non-
hierarchical manner. A document with overlapping markup cannot be
represented as a tree. This is also known as concurrent markup.
Overlap happens, for instance, in poetry, where there may be a
metrical structure of feet and lines; a linguistic structure of
sentences and quotations; and a physical structure of volumes and
pages and editorial annotations.^[1]^[2]

Contents

  * 1 History
  * 2 Properties and types
  * 3 Approaches and implementations
      + 3.1 Within hierarchical languages
          o 3.1.1 Multiple documents
          o 3.1.2 Milestones
          o 3.1.3 Joins
          o 3.1.4 Stand-off markup
          o 3.1.5 Challenges
      + 3.2 Special-purpose languages
          o 3.2.1 Historical formalisms
          o 3.2.2 Actively maintained standoff XML languages
      + 3.3 Graph-based formalisms
  * 4 Notes
  * 5 References

History

[Figure: The structural differences between multiple editions of
Frankenstein have been analysed with overlapping techniques.^[3]]

The problem of non-hierarchical structures in documents has been
recognised since 1988; resolving it against the dominant paradigm of
text as a single hierarchy (an ordered hierarchy of content objects
or OHCO) was initially thought to be merely a technical issue, but
has, in fact, proven much more difficult.^[4] In 2008, Jeni Tennison
identified markup overlap as "the main remaining problem area for
markup technologists".^[5] As of 2019, markup overlap remained a
primary issue in the digital study of theological texts, and a major
reason for the field retaining specialised markup formats (the Open
Scripture Information Standard and the Theological Markup Language)
rather than the inter-operable Text Encoding Initiative-based formats
common to the rest of the digital humanities.^[6]

Properties and types

A distinction exists between schemes that allow non-contiguous
overlap, and those that allow only contiguous overlap. Often, 'markup
overlap' strictly means the latter. Contiguous overlap can always be
represented as a linear document with milestones (typically
co-indexed start- and end-markers), without the need for fragmenting
a (logical) component into multiple physical ones. Non-contiguous
overlap may require document fragmentation. Another distinction in
overlapping markup schemes is whether elements can overlap with other
elements of the same kind (self-overlap).^[2]
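
The difference between nesting and overlap can be made concrete by
treating each marked-up component as a half-open character range. The
following is a minimal sketch only; the function name `relation` and
the sample offsets are illustrative assumptions, not taken from the
cited sources.

```python
def relation(a, b):
    """Classify two half-open (start, end) character ranges."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"    # representable as parent and child in a tree
    return "overlap"       # the ranges cross, so no single tree holds both


# Hypothetical offsets, chosen only to exercise the three cases.
line = (0, 44)
sentence = (0, 87)
next_line = (44, 87)
crossing = (30, 60)
```

Here `relation(line, sentence)` reports nesting, whereas
`relation(line, crossing)` reports the kind of crossing that a
single hierarchy cannot express.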

A scheme may have a privileged hierarchy. Some XML-based schemes, for
example, represent one hierarchy directly in the XML document tree,
and represent other, overlapping, structures by another means; these
are said to be non-privileged.

Schmidt (2012) identifies a tripartite classification of instances of
overlap: 1. "Variation of content and structure", 2. "Overlay of
multiple perspectives or markup sets", and 3. "Overlap of individual
start and end tags within a single markup perspective". Additionally,
some apparent instances of overlap are in fact schema definition
problems, which can be resolved hierarchically. He contends that type
1 is best resolved by a system of multiple documents external to the
markup, whereas types 2 and 3 must be dealt with inside the markup.

Approaches and implementations

DeRose (2004, Evaluation criteria) identifies several criteria for
judging solutions to the overlap problem:

  * readability and maintainability,
  * tool support and compatibility with XML,
  * possible validation schemes, and
  * ease of processing.

Tag soup is, strictly speaking, not overlapping markup: it is
malformed HTML, which is a non-overlapping language, and its meaning
may be ill-defined. Some web browsers attempted to represent
overlapping start and end tags with non-hierarchical Document Object
Models (DOM), but this was not standardised across all browsers and
was incompatible with the innately hierarchical nature of the
DOM.^[7]^[8] HTML5 defines how processors should deal with such
mis-nested markup in the HTML syntax and turn it into a single
hierarchy.^[9]
With XHTML and SGML-based HTML, however, mis-nested markup is a
strict error and makes processing by standards-compliant systems
impossible.^[10] The HTML standard defines a paragraph concept which
can cause overlap with other elements and can be non-contiguous.^[11]

SGML, which early versions of HTML were based on, has a feature
called CONCUR that allows multiple independent hierarchies to
co-exist without privileging any. DTD validation is only defined for
each individual hierarchy with CONCUR. Validation across hierarchies
is not defined by the standard. CONCUR cannot support self-overlap,
and it interacts poorly with some of SGML's abbreviatory features.
This feature has been poorly supported by tools and has seen very
little actual use; using CONCUR to represent document overlap was not
a recommended use case, according to a commentary by the standard's
editor.^[12]^[13]

Within hierarchical languages

There are several approaches to representing overlap in a
non-overlapping language.^[14] The Text Encoding Initiative, as an
XML-based markup scheme, cannot directly represent overlapping
markup; its guidelines suggest all four of the approaches
below.^[15] The Open Scripture Information Standard is another
XML-based scheme, designed to mark up the Bible. It uses empty
milestone elements to encode non-privileged components.^[16]

To illustrate these approaches, the running example marks up the
sentences and lines of a fragment of Richard III by William
Shakespeare. Where there is a privileged hierarchy, it is that of the
lines.

Multiple documents

Multiple documents can each provide different internally consistent
hierarchies. The advantage of this approach is that each document is
simple and can be processed with existing tools; the drawbacks are
that redundant content must be maintained and that it can be
difficult to cross-reference between different views.^[17] With multiple
documents, the overlap can be analysed with data comparison and delta
encoding techniques, and, in an XML context, specific XML tree
differencing algorithms are available.^[18]^[19]

Schmidt (2012, 3.5 Variation) recommends this approach for encoding
multiple variants of a single text, accepting the duplication of the
parts which do not vary rather than attempting to create a structure
that represents all of the variation present; further, he suggests
that the variants be aligned automatically, and that misalignment is
rare in practice.^[20]

Example, with lines marked up:

  <line>I, by attorney, bless thee from thy mother,</line>
  <line>Who prays continually for Richmond's good.</line>
  <line>So much for that.--The silent hours steal on,</line>
  <line>And flaky darkness breaks within the east.</line>

With sentences marked up:

  <sentence>I, by attorney, bless thee from thy mother,
  Who prays continually for Richmond's good.</sentence>
  <sentence>So much for that.</sentence><sentence>--The silent hours steal on,
  And flaky darkness breaks within the east.</sentence>
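
Because the two views duplicate the text, they can drift apart. One
simple safeguard is to check that both documents carry the same
character content once the markup is stripped. The sketch below
assumes each view is wrapped in a hypothetical `<text>` root element
(needed for well-formed XML) and that differences in whitespace are
not significant; `plain_text` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# The <text> wrapper is an assumption added to make each view well-formed.
LINES = """<text>
<line>I, by attorney, bless thee from thy mother,</line>
<line>Who prays continually for Richmond's good.</line>
<line>So much for that.--The silent hours steal on,</line>
<line>And flaky darkness breaks within the east.</line>
</text>"""

SENTENCES = """<text>
<sentence>I, by attorney, bless thee from thy mother,
Who prays continually for Richmond's good.</sentence>
<sentence>So much for that.</sentence><sentence>--The silent hours steal on,
And flaky darkness breaks within the east.</sentence>
</text>"""


def plain_text(xml_source):
    """Return the text content with every whitespace run collapsed."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


# True when both hierarchies annotate exactly the same words.
same_text = plain_text(LINES) == plain_text(SENTENCES)
```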

Milestones

Milestones are empty elements that mark the beginning and end of a
component, typically using the XML ID mechanism to indicate which
"begin" element goes with which "end" element. Milestones can be used
to embed a non-privileged structure within a hierarchical language.
In their basic form they can only represent contiguous overlap.
Generic XML parsers can of course parse the milestone elements, but
do not understand their special meaning and so cannot easily process
or validate the non-privileged structure.^[21]^[22]

Milestones have the advantage that the markup for overlapping
elements is located right at the relevant boundaries, like other
markup. This benefits maintainability and readability.^[23] CLIX
(DeRose 2004) is an example of such an approach.

Example:

  <line><sentence-start />I, by attorney, bless thee from thy mother,</line>
  <line>Who prays continually for Richmond's good.<sentence-end /></line>
  <line><sentence-start />So much for that.<sentence-end /><sentence-start />--The silent hours steal on,</line>
  <line>And flaky darkness breaks within the east.<sentence-end /></line>
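
Since a generic XML parser sees only empty elements, the
non-privileged sentences have to be rebuilt by application code. The
sketch below walks the parsed tree in document order and collects the
text between each start and end milestone. It assumes a hypothetical
`<text>` wrapper, un-indexed milestones that are properly paired as
in the example, and insignificant whitespace; the helper names are
illustrative.

```python
import xml.etree.ElementTree as ET

# The <text> wrapper is an assumption added for well-formedness.
MILESTONES = """<text>
<line><sentence-start />I, by attorney, bless thee from thy mother,</line>
<line>Who prays continually for Richmond's good.<sentence-end /></line>
<line><sentence-start />So much for that.<sentence-end /><sentence-start />--The silent hours steal on,</line>
<line>And flaky darkness breaks within the east.<sentence-end /></line>
</text>"""


def events(element):
    """Yield ('tag', name) and ('text', string) events in document order."""
    yield ("tag", element.tag)
    if element.text:
        yield ("text", element.text)
    for child in element:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def sentences_from_milestones(xml_source):
    """Rebuild the sentences delimited by start/end milestones."""
    sentences, current = [], None
    for kind, value in events(ET.fromstring(xml_source)):
        if kind == "tag" and value == "sentence-start":
            current = []                      # open a new sentence buffer
        elif kind == "tag" and value == "sentence-end" and current is not None:
            sentences.append(" ".join("".join(current).split()))
            current = None                    # close the buffer
        elif kind == "text" and current is not None:
            current.append(value)             # text inside an open sentence
    return sentences


rebuilt = sentences_from_milestones(MILESTONES)
```

The privileged `<line>` elements remain directly available from the
tree, while the sentences exist only after this extra pass.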

Punctuation and spaces have been identified as a type of
milestone-style 'crypto-overlap' or 'pseudo-markup', as the
boundaries of words, clauses, sentences and the like do not
necessarily align with the formal markup boundaries
hierarchically.^[24]^[25]

It is also possible to use more complex milestones to represent
non-contiguous structures. TAGML's "suspend" and "resume"
semantics,^[26] for example, can be expressed using milestones by
adding an attribute to indicate whether each milestone represents a
start, suspend, resume, or end point. Re-ordering and even
self-overlap can be achieved similarly, by annotating each milestone
with a "next chunk" reference.

Joins

Joins are pointers within a privileged hierarchy to other components
of the privileged hierarchy, which may be used to reconstruct a
non-privileged component akin to following a linked list. A single
non-privileged element is segmented into several partial elements
within the privileged hierarchy; the partial elements themselves do
not represent a single unit in the non-privileged hierarchy, which
can be misleading and make processing difficult.^[27]^[28] While this
approach can support some discontiguous structures, it is not able to
re-order elements.^[29] A slightly different approach can, however,
express re-ordering by expressing the join away from the content, at
the cost of directness and maintainability.^[30]

Join-based representations can introduce the possibility of cycles
between elements; detecting and rejecting these adds complexity to
implementations.^[31]

Example:

  <line><sentence id="a">I, by attorney, bless thee from thy mother,</sentence></line>
  <line><sentence continues="a">Who prays continually for Richmond's good.</sentence></line>
  <line><sentence id="b">So much for that.</sentence><sentence id="c">--The silent hours steal on,</sentence></line>
  <line><sentence continues="c">And flaky darkness breaks within the east.</sentence></line>
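
Reconstructing the non-privileged sentences from such fragments
amounts to following the pointers. The sketch below assumes the
attribute convention of the example (a head fragment carries `id`,
later fragments carry `continues`) and a hypothetical `<text>`
wrapper. It also requires every `continues` value to name a head that
has already appeared, which is one simple way to rule out cycles; it
is not the only possible policy.

```python
import xml.etree.ElementTree as ET

# The <text> wrapper is an assumption added for well-formedness.
JOINS = """<text>
<line><sentence id="a">I, by attorney, bless thee from thy mother,</sentence></line>
<line><sentence continues="a">Who prays continually for Richmond's good.</sentence></line>
<line><sentence id="b">So much for that.</sentence><sentence id="c">--The silent hours steal on,</sentence></line>
<line><sentence continues="c">And flaky darkness breaks within the east.</sentence></line>
</text>"""


def sentences_from_joins(xml_source):
    """Reassemble sentences by attaching each fragment to its head."""
    heads = {}  # id -> text fragments; dicts keep insertion order
    for fragment in ET.fromstring(xml_source).iter("sentence"):
        text = fragment.text or ""
        if "id" in fragment.attrib:
            heads[fragment.get("id")] = [text]
        else:
            target = fragment.get("continues")
            if target not in heads:
                # Unknown or forward reference: reject rather than guess.
                raise ValueError(f"cannot resolve continues={target!r}")
            heads[target].append(text)
    return [" ".join(parts) for parts in heads.values()]


joined = sentences_from_joins(JOINS)
```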

Stand-off markup

Stand-off markup is similar to using joins, except that there may be
no privileged hierarchy: each part of the document is given a label
(or might be referred to by an offset), and the document structure is
expressed by pointing to the content from markup that 'stands off'
from the content (possibly in an entirely different file), and might
contain no content itself. The TEI guidelines identify the unity of
the elements as a primary advantage of stand-off markup over joins,
in addition to the ability to produce and distribute annotations
separately from the text, possibly even by different authors applying
markup to a read-only document,^[32] allowing collaborative
approaches to markup by a divide and conquer strategy.^[33]

Example:

  <span id="a">I, by attorney, bless thee from thy mother,</span>
  <span id="b">Who prays continually for Richmond's good.</span>
  <span id="c">So much for that.</span><span id="d">--The silent hours steal on,</span>
  <span id="e">And flaky darkness breaks within the east.</span>
  ...
  <line contents="a" />
  <line contents="b" />
  <line contents="c d" />
  <line contents="e" />
  <sentence contents="a b" />
  <sentence contents="c" />
  <sentence contents="d e" />
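
Processing this representation means dereferencing the labels. The
sketch below assumes the spans and the stand-off elements live in one
file under a hypothetical `<doc>` root, and that `contents` holds
whitespace-separated span ids as in the example; `resolve` is an
illustrative helper name.

```python
import xml.etree.ElementTree as ET

# The <doc> wrapper is an assumption; the markup could equally live
# in a separate file from the labelled content.
STANDOFF = """<doc>
<span id="a">I, by attorney, bless thee from thy mother,</span>
<span id="b">Who prays continually for Richmond's good.</span>
<span id="c">So much for that.</span><span id="d">--The silent hours steal on,</span>
<span id="e">And flaky darkness breaks within the east.</span>
<line contents="a" />
<line contents="b" />
<line contents="c d" />
<line contents="e" />
<sentence contents="a b" />
<sentence contents="c" />
<sentence contents="d e" />
</doc>"""


def resolve(xml_source, tag):
    """Return, for each <tag> element, the list of span texts it points to."""
    root = ET.fromstring(xml_source)
    spans = {span.get("id"): span.text or "" for span in root.iter("span")}
    return [
        [spans[ref] for ref in element.get("contents").split()]
        for element in root.iter(tag)
    ]


lines = resolve(STANDOFF, "line")
sentences = resolve(STANDOFF, "sentence")
```

Neither hierarchy is privileged here: lines and sentences are
resolved by exactly the same lookup.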

It has been claimed that separating markup and text can result in
overall simplification and increased maintainability.^[34] By 2017,
"[t]he current state of the art to [represent] (...) linguistically
annotated data is to use a graph-based representation serialized as
standoff XML as a pivot format",^[35] i.e., standoff was the most
widely accepted approach to address the overlapping markup challenge.

Standoff formalisms have been the basis for an ISO standard for
linguistic annotation,^[36] have been successfully applied in
developing corpus management systems,^[37] and (as of April 2020)
are actively being developed in the TEI.^[38]

Challenges

Representing overlapping markup within hierarchical languages is
challenging, for reasons of redundancy and/or complexity. In the
2000s and 2010s, standoff formalisms were generally accepted as the
most promising approach,^[35] but a disadvantage of standoff is that
validation is very challenging.^[39] Standoff formalisms are not
natively supported by database management systems, so that by 2017
it was suggested to "use ... standoff XML as a pivot format (...)
and relational data bases for querying."^[35] In practical
applications, this requires complicated architectures and/or
labor-intensive transformation between the pivot format and the
internal representation. As a result, maintenance is
problematic.^[40] This has been a motivation to develop corpus
management systems on the basis of graph databases and to use
established graph-based formalisms as pivot formats.

Special-purpose languages

For implementing the above-mentioned strategies, either existing
markup languages (such as the TEI) can be extended or special-purpose
languages can be designed. Designing an entirely new markup language
means forgoing the tool support available for existing languages in
exchange for a less complicated semantic model and a more convenient
syntax.

Historical formalisms

  * LMNL is a non-hierarchical markup language first described in
    2002 by Jeni Tennison and Wendell Piez, annotating ranges of a
    document with properties and allowing self-overlap. CLIX, which
    originally stood for 'Canonical LMNL In XML', provides a method
    for representing any LMNL document in a milestone-style XML
    document.^[41] It also has another XML serialisation, xLMNL.^[42]
  * MECS was developed by the University of Bergen's Wittgenstein
    Archive. However, it had several problems: it allowed some
    non-sensical documents of overlapping elements, it could not
    support self-overlap, and it did not have the capacity to define
    a DTD-like grammar.^[43] The theory of General Ordered-Descendant
    Directed Acyclic Graphs (GODDAGs), while not strictly a markup
    language itself, is a general data model for non-hierarchical
    markup. Restricted GODDAGs were designed specifically to match
    the semantics of MECS; general GODDAGs may be non-contiguous and
    need a more powerful language.^[44] TexMECS is a successor to
    MECS, which has a formal grammar and is designed to represent
    every GODDAG and nothing that is not a GODDAG.^[45]
  * XCONCUR (previously MuLaX) is a melding-together of XML and
    SGML's CONCUR, and also contains a validation language,
    XCONCUR-CL, and a SAX-like API.^[46]^[47]^[48]
  * Marinelli, Vitali and Zacchiroli provide algorithms to convert
    between restricted GODDAGs, ECLIX, LMNL, parallel documents in
    XML, contiguous stand-off markup and TexMECS.^[49]

None of these formalisms appears to be maintained any more. The
community consensus seems to be to employ standoff XML or graph-based
formalisms.

Actively maintained standoff XML languages

  * GrAF-XML,^[50] standoff-XML serialization of the Linguistic
    Annotation Framework (LAF),^[36] used, e.g., for the American
    National Corpus^[51]
  * PAULA-XML,^[52] standoff-XML serialization of the data model
    underlying the corpus management system ANNIS and the converter
    suite SALT^[53]
  * NAF (NLP Annotation Format / Newsreader Annotation Format),^[54]
    standoff XML format originally developed in the NewsReader
    project (FP7, 2013-2015^[55]), currently used by NLP tools such
    as FreeLing^[56] (with support for English, Spanish, Portuguese,
    Italian, French, German, Russian, Catalan, Galician, Croatian,
    Slovene, etc.), and EusTagger^[57] (with support for Basque,
    English, Spanish).
  * The Charles Harpur Critical Archive is encoded using
    'multi-version documents' (MVD) to represent the variant versions
    of documents and as a means of indicating additions, deletions
    and revisions using a tactical combination of multiple documents
    and stand-off ranges within an underlying graph-based model. MVD
    is presented as an application file format, requiring specialised
    tools to view or edit.^[58]

Standoff approaches have two parts, commonly called the "content" and
the "annotations." These can be expressed in unrelated
representations. Simple standoff annotations per se involve no more
than a list of (location, type) pairs. Thus, in a few applications,
standoff annotations are expressed in CSV, JSON(-LD) (e.g., Web
Annotation^[59]), or other representations, or in graph formalisms
grounded in string URIs (see below). However, representing and
validating content in such representations is much more difficult and
much less common.
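
As a rough illustration of the (location, type) idea, the sketch
below stores the Richard III fragment as plain text and records lines
and sentences as character-offset ranges serialised to JSON. The
field names `start`, `end` and `type` and the helper `annotate` are
assumptions made for this example, not part of any of the formats
listed above.

```python
import json

CONTENT = (
    "I, by attorney, bless thee from thy mother,\n"
    "Who prays continually for Richmond's good.\n"
    "So much for that.--The silent hours steal on,\n"
    "And flaky darkness breaks within the east.\n"
)


def annotate(content, pieces, kind):
    """Locate each piece in order and return offset-based annotations."""
    annotations, cursor = [], 0
    for piece in pieces:
        start = content.index(piece, cursor)   # search from the last match
        cursor = start + len(piece)
        annotations.append({"start": start, "end": cursor, "type": kind})
    return annotations


line_annotations = annotate(CONTENT, CONTENT.splitlines(), "line")
sentence_annotations = annotate(
    CONTENT,
    [
        "I, by attorney, bless thee from thy mother,\n"
        "Who prays continually for Richmond's good.",
        "So much for that.",
        "--The silent hours steal on,\n"
        "And flaky darkness breaks within the east.",
    ],
    "sentence",
)

# The annotations never touch CONTENT, so they can live in a separate file.
serialised = json.dumps(line_annotations + sentence_annotations, indent=2)
```

The third line and the third sentence overlap in this list without
any special handling, because each annotation is independent of the
others.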

Graph-based formalisms

Standoff markup employs a data model based on directed graphs,^[60]
thus complicating its representation when grounding markup
information in a tree. Representing overlapping hierarchies in a
graph eliminates this challenge. Standoff annotations can thus be
more adequately represented as generalised directed multigraphs and
use formalisms and technologies developed for this purpose, most
notably those based on the Resource Description Framework
(RDF).^[61]^[62] EARMARK is an early RDF/OWL representation that
encompasses General Ordered-Descendant Directed Acyclic Graphs
(GODDAGs).^[14]
The theory of GODDAGs, while not strictly a markup language itself,
is a general data model for non-hierarchical markup.

RDF is a semantic data model that is linearisation-independent, and
it provides different linearisations, including an XML format
(RDF/XML) that can be modelled to mirror standoff XML, a linearisation
that lets RDF be expressed in XML attributes (RDFa), a JSON format
(JSON-LD), and binary formats designed to facilitate querying or
processing (RDF-HDT,^[63] RDF-Thrift^[64]). RDF is semantically
equivalent to the graph-based data models underlying standoff markup,
and it does not require special-purpose technology for storing,
parsing and querying. Multiple interlinked RDF files representing a
document or a corpus constitute an example of Linguistic Linked Open
Data.

An established technique to link arbitrary graphs with an annotated
document is to use URI fragment identifiers to refer to parts of a
text and/or document (see the overview under Web annotation). The Web
Annotation standard provides format-specific "selectors" as an
additional means, e.g., offset-, string-match- or XPath-based
selectors.^[65]
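
An offset-based selector can be sketched as a small JSON-LD
structure. The example below uses terms from the W3C Web Annotation
Data Model (`Annotation`, `TextualBody`, `TextPositionSelector`); the
source URL, the offsets and the helper name `web_annotation` are
hypothetical and chosen only for illustration.

```python
import json


def web_annotation(source, start, end, body):
    """Build a Web Annotation targeting a character range of `source`."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": body},
        "target": {
            "source": source,
            "selector": {
                "type": "TextPositionSelector",
                "start": start,
                "end": end,
            },
        },
    }


# Hypothetical document URL and offsets.
annotation = web_annotation("http://example.org/richard3.txt", 87, 104, "sentence")
as_json_ld = json.dumps(annotation, indent=2)
```

Because the annotation only points at the source, any number of such
records can describe overlapping ranges of the same text.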

Native RDF vocabularies capable of representing linguistic
annotations include:^[66]

  * Web Annotation^[67]
  * NLP Interchange Format (NIF)^[68]
  * LAPPS Interchange Format (LIF)^[69]

Related vocabularies include

  * POWLA, an OWL2/DL serialization of PAULA-XML^[70]
  * RDF-NAF, an RDF serialization of the NLP Annotation Format^[71]

In early 2020, the W3C Community Group LD4LT launched an initiative
to harmonize these vocabularies and to develop a consolidated RDF
vocabulary for linguistic annotations on the web.^[72]

Notes

 1. ^ Text Encoding Initiative.
 2. ^ ^a ^b DeRose 2004, The problem types.
 3. ^ Piez 2014.
 4. ^ Renear, Mylonas & Durand 1993.
 5. ^ Tennison 2008.
 6. ^ MoChridhe 2019.
 7. ^ Hickson 2002.
 8. ^ Sivonen 2003.
 9. ^ HTML, SS 8.2.8 An introduction to error handling and strange
    cases in the parser.
10. ^ Sperberg-McQueen & Huitfeldt 2000, 2.1. Non-SGML Notations.
11. ^ HTML, SS 3.2.5.4 Paragraphs.
12. ^ Sperberg-McQueen & Huitfeldt 2000, 2.2. CONCUR.
13. ^ DeRose 2004, SGML CONCUR.
14. ^ ^a ^b Di Iorio, Peroni & Vitali 2009.
15. ^ Text Encoding Initiative, SS 20 Non-hierarchical Structures.
16. ^ Durusau 2006.
17. ^ Text Encoding Initiative, SS 20.1 Multiple Encodings of the Same
    Information.
18. ^ Schmidt 2009.
19. ^ La Fontaine 2016.
20. ^ Schmidt 2012, 4.1 Automating Variation.
21. ^ Text Encoding Initiative, SS 20.2 Boundary Marking with Empty
    Elements.
22. ^ Sperberg-McQueen & Huitfeldt 2000, 2.4. Milestones.
23. ^ DeRose 2004, TEI-style milestones.
24. ^ Birnbaum & Thorsen 2015.
25. ^ Haentjens Dekker & Birnbaum 2017.
26. ^ Dekker 2018.
27. ^ Text Encoding Initiative, SS 20.3 Fragmentation and
    Reconstitution of Virtual Elements.
28. ^ DeRose 2004, Segmentation.
29. ^ Sperberg-McQueen & Huitfeldt 2000, 2.5. Fragmentation.
30. ^ DeRose 2004, Joins.
31. ^ Schmidt 2012, 3.4 Interlinking.
32. ^ Text Encoding Initiative, SS 20.4 Stand-off Markup.
33. ^ Schmidt 2012, 4.2 Markup Outside the Text.
34. ^ Eggert & Schmidt 2019, Conclusion.
35. ^ ^a ^b ^c Ide et al. 2017, p.99.
36. ^ ^a ^b "Iso 24612:2012".
37. ^ Chiarcos et al. 2008.
38. ^ "Standoff: Annotation microstructure * Issue #1745 * TEIC/TEI".
    GitHub.
39. ^ Sperberg-McQueen & Huitfeldt 2000, 2.6. Standoff Markup.
40. ^ DeRose 2004, Standoff markup.
41. ^ DeRose 2004, CLIX and LMNL.
42. ^ Piez 2012.
43. ^ Sperberg-McQueen & Huitfeldt 2000, 2.7. MECS.
44. ^ Sperberg-McQueen & Huitfeldt 2000.
45. ^ Huitfeldt & Sperberg-McQueen 2003.
46. ^ Hilbert, Schonefeld & Witt 2005.
47. ^ Witt et al. 2007.
48. ^ Schonefeld 2008.
49. ^ Marinelli, Vitali & Zacchiroli 2008.
50. ^ "ISO GrAF".
51. ^ "Home". anc.org.
52. ^ https://www.sfb632.uni-potsdam.de/en/paula.html
53. ^ Zipser, Florian (2016-11-18). "Salt". corpus-tools.org.
    doi:10.5281/zenodo.17557. Retrieved 2022-09-11.
54. ^ "NAF". GitHub. 30 June 2021.
55. ^ "Building structured event indexes of large volumes of
    financial and economic data for decision making". Community
    Research and Development Information Service (CORDIS).
56. ^ "Home - FreeLing Home Page". Archived from the original on
    2012-04-29. Retrieved 2020-04-06.
57. ^ "Text Analysis | HiTZ Zentroa".
58. ^ Eggert & Schmidt 2019.
59. ^ "Web Annotation Data Model".
60. ^ Ide & Suderman 2007.
61. ^ Cassidy 2010, cassidy.
62. ^ Chiarcos 2012, POWLA.
63. ^ "Home". rdfhdt.org.
64. ^ "RDF Binary using Apache Thrift".
65. ^ "Selectors and States".
66. ^ Cimiano, Philipp; Chiarcos, Christian; McCrae, John P.; Gracia,
    Jorge (2020). Linguistic Linked Data. Representation, Generation
    and Applications. Cham: Springer.
67. ^ Verspoor, Karin; Livingston, Kevin (2012). "Towards Adaptation
    of Linguistic Annotations to Scholarly Annotation Formalisms on
    the Semantic Web". Proceedings of the Sixth Linguistic Annotation
    Workshop, Jeju, Republic of Korea: 75-84. Retrieved 6 April 2020.
68. ^ "NLP Interchange Format (NIF) 2.0 - Overview and
    Documentation".
69. ^ "LIF Overview".
70. ^ "POWLA". January 2022.
71. ^ "NLP Annotation Format | Background information on NAF".
72. ^ "Towards a consolidated LOD vocabulary for linguistic
    annotations". GitHub. 7 September 2021.

References

  * Birnbaum, David J; Thorsen, Elise (2015). "Markup and meter:
    Using XML tools to teach a computer to think about versification".
    Proceedings of Balisage: The Markup Conference 2015. Balisage:
    The Markup Conference 2015. Vol. 15. Montreal.
    doi:10.4242/BalisageVol15.Birnbaum01. ISBN 978-1-935958-11-6.
  * Cassidy, Steve (2010). An RDF realisation of LAF in the DADA
    annotation server (PDF). Proceedings of ISA-5. Hong Kong.
    CiteSeerX 10.1.1.454.9146.
  * Chiarcos, Christian (2012). "POWLA: Modeling linguistic corpora
    in OWL/DL" (PDF). The Semantic Web: Research and Applications.
    Proceedings of the 9th Extended Semantic Web Conference (ESWC
    2012, Heraklion, Crete; LNCS 7295). Lecture Notes in Computer
    Science. Vol. 7295. pp. 225-239.
    doi:10.1007/978-3-642-30284-8_22. ISBN 978-3-642-30283-1.
    Retrieved 2016-05-24.
  * Chiarcos, Christian; Dipper, Stefanie; Gotze, Michael; Leser,
    Ulf; Ludeling, Anke; Ritz, Julia; Stede, Manfred (2008). "A
    flexible framework for integrating annotations from different
    tools and tagsets". Traitement Automatique des Langues. 49 (2):
    271-293.
  * Dekker, Ronald Haentjens; Bleeker, Elli; Buitendijk, Bram;
    Kulsdom, Astrid; Birnbaum, David J (2018). "TAGML: A markup
    language of many dimensions". Proceedings of Balisage: The Markup
    Conference 2018. Balisage: The Markup Conference 2018. Vol. 21.
    Rockville, MD. doi:10.4242/BalisageVol21.HaentjensDekker01.
    ISBN 978-1-935958-18-5.
  * DeRose, Steven (2004). Markup Overlap: A Review and a Horse.
    Extreme Markup Languages 2004. Montreal.
    CiteSeerX 10.1.1.108.9959. Retrieved 2014-10-14.
  * Di Iorio, Angelo; Peroni, Silvio; Vitali, Fabio (August 2009).
    "Towards markup support for full GODDAGs and beyond: the EARMARK
    approach". Proceedings of Balisage: The Markup Conference 2009.
    Balisage: The Markup Conference 2009. Vol. 3. Montreal.
    doi:10.4242/BalisageVol3.Peroni01. ISBN 978-0-9824344-2-0.
  * Eggert, Paul; Schmidt, Desmond A (2019). "The Charles Harpur
    Critical Archive: A History and Technical Report". International
    Journal of Digital Humanities. 1 (1). Retrieved 2019-03-25.
  * Haentjens Dekker, Ronald; Birnbaum, David J (2017). "It's more
    than just overlap: Text As Graph". Proceedings of Balisage: The
    Markup Conference 2017. Balisage: The Markup Conference 2017.
    Vol. 19. Montreal. doi:10.4242/BalisageVol19.Dekker01.
    ISBN 978-1-935958-15-4.
  * Durusau, Patrick (2006). OSIS Users Manual (OSIS Schema 2.1.1)
    (PDF). Archived from the original (PDF) on 2014-10-23. Retrieved
    2014-10-14.
  * Hickson, Ian (2002-11-21). "Tag Soup: How UAs handle <x> <y> </x>
    </y>". Retrieved 2017-11-05.
  * Hilbert, Mirco; Schonefeld, Oliver; Witt, Andreas (2005). Making
    CONCUR work. Extreme Markup Languages 2005. Montreal.
    CiteSeerX 10.1.1.104.634. Retrieved 2014-10-14.
  * Huitfeldt, Claus; Sperberg-McQueen, C M (2003). "TexMECS: An
    experimental markup meta-language for complex documents".
    Archived from the original on 2017-02-27. Retrieved 2014-10-14.
  * Ide, Nancy; Chiarcos, Christian; Stede, Manfred; Cassidy, Steve
    (2017). "Designing Annotation Schemes: From Model to
    Representation". In Ide, Nancy; Pustejovsky, James (eds.).
    Handbook of Linguistic Annotation. Dordrecht: Springer. p. 99.
    doi:10.1007/978-94-024-0881-2_3. ISBN 978-94-024-0879-9.
  * La Fontaine, Robin (2016). "Representing Overlapping Hierarchy as
    Change in XML". Proceedings of Balisage: The Markup Conference
    2016. Balisage: The Markup Conference 2016. Vol. 17. Montreal.
    doi:10.4242/BalisageVol17.LaFontaine01. ISBN 978-1-935958-13-0.
  * Marinelli, Paolo; Vitali, Fabio; Zacchiroli, Stefano (January
    2008). "Towards the unification of formats for overlapping
    markup" (PDF). New Review of Hypermedia and Multimedia. 14 (1):
    57-94. CiteSeerX 10.1.1.383.1636. doi:10.1080/13614560802316145.
    ISSN 1361-4568. S2CID 16909224. Retrieved 2014-10-14.
  * MoChridhe, Race J (2019-04-24). "Twenty Years of Theological
    Markup Languages: A Retro- and Prospective". Theological
    Librarianship. 12 (1). doi:10.31046/tl.v12i1.523. ISSN 1937-8904.
    S2CID 171582852. Retrieved 2019-07-15.
  * Piez, Wendell (August 2012). "Luminescent: parsing LMNL by XSLT
    upconversion". Proceedings of Balisage: The Markup Conference
    2012. Balisage: The Markup Conference 2012. Vol. 8. Montreal.
    doi:10.4242/BalisageVol8.Piez01. ISBN 978-1-935958-04-8.
    Retrieved 2014-10-14.
  * Piez, Wendell (2014). Hierarchies within range space: From LMNL
    to OHCO. Balisage: The Markup Conference 2014. Montreal.
    doi:10.4242/BalisageVol13.Piez01.
  * Renear, Allen; Mylonas, Elli; Durand, David (1993-01-06).
    "Refining our Notion of What Text Really Is: The Problem of
    Overlapping Hierarchies". CiteSeerX 10.1.1.172.9017.
    hdl:2142/9407. Retrieved 2016-10-02.
  * Schonefeld, Oliver (August 2008). A Simple API for XCONCUR:
    Processing concurrent markup using an event-centric API.
    Balisage: The Markup Conference 2008. Montreal.
    doi:10.4242/BalisageVol1.Schonefeld01. Retrieved 2014-10-14.
  * Sperberg-McQueen, C M; Huitfeldt, Claus (2000). "GODDAG: A Data
    Structure for Overlapping Hierarchies". Lecture Notes in Computer
    Science. 2023 (2023): 139-160. doi:10.1007/978-3-540-39916-2_12.
    ISBN 978-3-540-21070-2. Retrieved 2014-10-14.
  * Schmidt, Desmond (2009). "Merging Multi-Version Texts: A Generic
    Solution to the Overlap Problem". Merging Multi-Version Texts: a
    General Solution to the Overlap Problem. Balisage: The Markup
    Conference 2009. Proceedings of Balisage: The Markup Conference
    2009. Vol. 3. Montreal. doi:10.4242/BalisageVol3.Schmidt01.
    ISBN 978-0-9824344-2-0.
  * Schmidt, Desmond (2012). "The role of markup in the digital
    humanities". Historical Social Research. 37 (3): 125-146.
    doi:10.12759/hsr.37.2012.3.125-146.
  * Sivonen, Henri (2003-08-16). "Tag Soup: How Mac IE 5 and Safari
    handle <x> <y> </x> </y>". Retrieved 2017-11-05.
  * Ide, Nancy; Suderman, Keith (2007). GrAF: A graph-based format
    for linguistic annotations (PDF). Proceedings of the First
    Linguistic Annotation Workshop (LAW-2007, Prague, Czech
    Republic). pp. 1-8. CiteSeerX 10.1.1.146.4543.
  * Tennison, Jeni (2008-12-06). "Overlap, Containment and
    Dominance". Retrieved 2016-10-02.
  * Witt, Andreas; Schonefeld, Oliver; Rehm, Georg; Khoo, Jonathan;
    Evang, Kilian (2007). On the Lossless Transformation of
    Single-File, Multi-Layer Annotations into Multi-Rooted Trees.
    Extreme Markup Languages 2007. Montreal. Retrieved 2014-10-14.
  * Text Encoding Initiative Consortium (16 September 2014).
    "Guidelines for Electronic Text Encoding and Interchange"
    (5 ed.). Retrieved 2014-10-14.
  * WHATWG. "HTML Living Standard". Retrieved 2019-03-25.
