https://github.com/latex3/tagging-project/discussions/72


WTPDF / PDF/UA-2 Examples by the LaTeX Project #72

davidcarlisle started this conversation in General
@davidcarlisle davidcarlisle
Mar 25, 2024 * 4 comments * 13 replies
davidcarlisle
Mar 25, 2024
Maintainer


The following files demonstrate various aspects of Well Tagged PDF
documents conforming to PDF/UA-2.

They were all generated with LuaLaTeX (lualatex-dev in TeX Live 2024).

The files are a mixture of small examples demonstrating specific
features, older out-of-copyright documents that have been re-typeset
as tagged PDF, and contemporary documents including recently
published arXiv papers, course notes, and conference papers.

---------------------------------------------------------------------

Access to the Files

The full collection of PDF files is available on Google Drive. There
you may select one or more individual files to download, or use the
Download all link at the top of the page, which generates a zip file
containing the full collection.

Google Drive directory of all example PDF files

At the present time we are not distributing the modified TeX sources
that generate these tagged examples although, where appropriate, we
do link to the original files used as source material.

---------------------------------------------------------------------

Verification of PDF/UA-2 compliance

There are not yet many validators that correctly handle UA-2 (not
surprising, given that the standard was released in March 2024). One
online validator you can try on the smaller examples is

VeraPDF -- PDF/A and PDF/UA Validation

Please note that some PDF viewers modify the PDF when opening it (to
allow for annotations, for example). In some cases this is known to
break PDF/UA-2 conformance. If that happens, re-download the file and
use a different viewer.
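
One way to notice that a viewer has rewritten a downloaded file is to record a checksum straight after download and compare it before validating. The following is only a minimal sketch of that idea; the function names are hypothetical and not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_since_download(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded after download."""
    return sha256_of(path) == recorded_digest
```

If the comparison fails, the file on disk is no longer the one that was downloaded, and a validator report for it says nothing about the original.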

---------------------------------------------------------------------

The Samples

Simple Examples with MathML Associated files

All three conform to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility
WTPDF/Reuse Arlington

Three small examples demonstrating the use of Associated Files to tag
mathematics. Each formula has two associated files: a LaTeX fragment
representing the original source, and a MathML document.

mathml-AF-ex1

mathml-AF-ex2

Sample-AF-Math-LaTeX
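
To make the "two associated files per formula" idea concrete, here is a simplified model in Python. It is a sketch only: the class names, file names, media types and AFRelationship values ("Source", "Supplement") are assumptions chosen for illustration, not taken from the example files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileSpec:
    """Simplified stand-in for a PDF file specification dictionary."""
    name: str
    media_type: str
    relationship: str  # AFRelationship value (assumed, see lead-in)


@dataclass
class Formula:
    """Simplified stand-in for a Formula structure element with an AF array."""
    associated_files: List[FileSpec] = field(default_factory=list)


def make_formula(stem: str) -> Formula:
    """Attach a LaTeX source fragment and a MathML document to one formula."""
    return Formula([
        FileSpec(f"{stem}.tex", "application/x-tex", "Source"),
        FileSpec(f"{stem}.xml", "application/mathml+xml", "Supplement"),
    ])


def has_source_and_mathml(formula: Formula) -> bool:
    """Check that both representations are present on the formula."""
    types = {spec.media_type for spec in formula.associated_files}
    return {"application/x-tex", "application/mathml+xml"} <= types
```

A consumer can then pick whichever representation it understands, for example MathML for a screen reader or the LaTeX fragment for copy and paste.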

amsmath LaTeX package documentation

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

The amsmath package defines the main markup structures for
mathematics in LaTeX.

This manual has examples of many kinds of aligned equations and
similar structures. This version has been enhanced to produce Well
Tagged PDF.

amsldoc-tagged

tagpdf LaTeX package documentation

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

The tagpdf LaTeX package is a core part of the LaTeX support for
tagged PDF.

Its documentation already conforms to WTPDF and PDF/UA-2 and a
snapshot is included here.

tagpdf

ArXiv publications

Tagged using MathML extracted from the arXiv-supplied html versions
of the documents.

They were each submitted to arXiv under a CC Licence permitting
re-use such as this experiment. The tagged documents are available
under the same licence.

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

2401.09965v1-tagged -- Original Source

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

2401.09436v1-tagged -- Original Source

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

2401.05361v1-tagged -- Original Source

Niels Bohr: The Theory of Spectra and Atomic Constitution; Three
Essays

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

These essays by Niels Bohr are available as LaTeX source from
Project Gutenberg.

Additional TeX markup has been added to produce Tagged PDF. In
addition, all math expressions were converted to MathML using LaTeXML.

47464-t-tagged -- Original Source

William Shakespeare: MACBETH

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

macbeth-tagged -- Original Source

This document uses a provided LaTeX source of the play text. The
LaTeX markup has been enhanced to produce Well Tagged PDF.

American Standard Version of the Bible (1901 text)

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

The source is the plain text of the ASV Bible (1901) as provided by
Wikisource. This has been marked up as LaTeX to generate Well Tagged
PDF. This example demonstrates a custom role map with structured
tagging corresponding to the Testament/Book/Chapter/Verse structure
shown in this work.

ASV Bible -- Original Source
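
As an illustration of what a custom role map does, the sketch below resolves custom structure types to standard ones. The specific mappings (Testament to Part, Book and Chapter to Sect, Verse to P) are assumptions for the example and are not read from the actual file; namespaces, which PDF 2.0 role mapping also involves, are left out.

```python
# A small subset of standard structure types, enough for this example.
STANDARD_TYPES = {"Document", "Part", "Sect", "Div", "P"}

# Hypothetical role map: custom type -> type it is mapped to.
ROLE_MAP = {
    "Testament": "Part",
    "Book": "Sect",
    "Chapter": "Sect",
    "Verse": "P",
}


def resolve_role(tag: str, role_map=ROLE_MAP, standard=STANDARD_TYPES) -> str:
    """Follow role-map entries until a standard structure type is reached."""
    seen = set()
    while tag not in standard:
        if tag in seen or tag not in role_map:
            raise ValueError(f"cannot resolve {tag!r} to a standard type")
        seen.add(tag)
        tag = role_map[tag]
    return tag
```

A consumer that does not know "Verse" can still treat it as a paragraph by resolving it this way, while a consumer that does know the custom types keeps the richer structure.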

DEIMS 2024 Conference paper

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

The paper Enhancing LaTeX to Automatically Produce Tagged and
Accessible PDF submitted to DEIMS 2024, Tokyo.

As well as describing the approach to PDF tagging used for these
examples, the paper is itself an example of tagging a contemporary
conference paper. This is the version as prepared for the TeX Users
Group publication, TUGboat.

tb139mitt-deims24

The presentation at the DEIMS conference including a demonstration is
available as a video.

PDF Association sample poster

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

An article describing the PDF Association work on accessibility
produced for the PDF Association launch of Well Tagged PDF.

pdfa-art

Sample Chemistry/Math notes

This is a small contemporary document used as notes on mathematical
aspects of Chemistry.

In this example, the math is associated with just LaTeX source
Associated files, not MathML.

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

525Da-23-group-theory

A small template exam paper.

Conforms to: PDF/UA-2 PDF/A-4F WTPDF/Accessibility WTPDF/Reuse
Arlington

PHY-exam

Wilhelm Busch: Max and Moritz

Conforms to: PDF/UA-2 PDF/A-4 WTPDF/Accessibility WTPDF/Reuse
Arlington

A LaTeX document that does not have math and whose main language is
not English. It shows tagging of images, verse structures and the use
of more than one (marked up) language in a document.

pg17161-tagged -- Original Source

petervwyatt
Mar 29, 2024


Awesome work! But I did note 2 files with errors and a few other
issues:

  * for 2401.09965v1-tagged.pdf

      + There are 2 x Ref objects (objects 2682 and 2705) in the
        structure tree which are not structure element dictionaries
        but file specification dictionaries (for associated files).
        This is incorrect. Table 355 requires Ref to be structure
        elements. This is an error.
  * for PHY-exam.pdf

      + Page 3, Widget annot for the button is missing AP (appearance
        stream info), as required in PDF 2.0. See Table 166 in ISO
        32000-2:2020.
  * for 2401.09436v1-tagged.pdf

      + has deprecated ProcSets a few times (not technically an issue,
        as a future dated revision of PDF/A-4 will permit deprecated
        features, but removing them could save some space).
      + has Type 1 FontDescriptor/CharSet a few times, which is also
        deprecated in PDF 2.0 (again not technically an issue, for
        the same reason, but removing it could save some space).
      + in Adobe Acrobat, all the embedded MathML XML files show as
        "Size = 0.00 bytes". If you set the Size entry in the F/UF
        Params dictionary then I think the correct size will display
        in Acrobat's file list nav pane. You might want to consider
        adding it if that works...
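
The Ref problem reported for 2401.09965v1-tagged.pdf can be pictured with a small check over dictionaries. This is a sketch using plain Python dicts as stand-ins for PDF objects; the helper names are hypothetical, and the test for "is a structure element" is a heuristic based on the S key, which every structure element dictionary carries.

```python
def is_structure_element(obj: dict) -> bool:
    """Heuristic: has a structure type (S) and, if Type is given, it is StructElem."""
    return "S" in obj and obj.get("Type", "StructElem") == "StructElem"


def bad_ref_targets(struct_elem: dict) -> list:
    """Return the entries of the Ref array that are not structure elements."""
    return [t for t in struct_elem.get("Ref", []) if not is_structure_element(t)]


# Example: a Ref array that wrongly points at a file specification dictionary.
note = {"Type": "StructElem", "S": "Note"}
filespec = {"Type": "Filespec", "F": "formula.xml"}
formula = {"Type": "StructElem", "S": "Formula", "Ref": [note, filespec]}
offenders = bad_ref_targets(formula)
```

Here `offenders` contains only the file specification dictionary, which is the kind of entry the report flags as an error.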

Just for discussion: several files have private PTEX entries for
XObjects, etc., such as PTEX.FileName and PTEX.InfoDict, which can
include author, filename, etc., as per the pdfTeX documentation
(https://texdoc.org/serve/pdftex-a.pdf/0). Since PDF/A files are
intended for long-term preservation, this has the potential to cause
issues for FOIA and similar requests, since the presence of private
data might slip past various redaction workflows. A modern equivalent
is to use an XMP Metadata stream instead of second-class custom PDF
keys, which makes this more discoverable.
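
A rough way to see whether a file carries such keys is to scan its raw bytes for names beginning with /PTEX. The sketch below does only that. It is an assumption-laden shortcut: it will miss keys inside compressed object streams or encrypted files, and a real audit would need a proper PDF parser.

```python
import re

# Matches name objects such as /PTEX.FileName or /PTEX.InfoDict.
PTEX_KEY = re.compile(rb"/PTEX\.[A-Za-z]+")


def find_ptex_keys(pdf_bytes: bytes) -> dict:
    """Count occurrences of each /PTEX.* key found in the raw bytes."""
    counts = {}
    for match in PTEX_KEY.finditer(pdf_bytes):
        key = match.group().decode("ascii")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For a file on disk, `find_ptex_keys(open(path, "rb").read())` gives a quick first indication before any redaction workflow is run.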


u-fischer Mar 29, 2024
Maintainer


@petervwyatt Thanks for the report.

I will add an appearance to PHY-exam.pdf.

But while testing that I got two curious complaints from Arlington (I
used the latest veraPDF version) for the attached PDF:

  * It complained that a DA key is missing. Why? There is no variable
    text; the text is inside the appearance.
  * It complained that the Ff bitmask is wrong for a radio field. Why
    is that? This is a pushbutton, so naturally it does not have the
    Radio bit set.

test-utf8.pdf
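
For readers unfamiliar with the Ff entry, the sketch below decodes the button-related bits. It assumes the bit positions given in the ISO 32000 field flags table for button fields as I understand them (NoToggleToOff at position 15, Radio at 16, Pushbutton at 17, counted from 1 at the low-order bit); the function name is hypothetical.

```python
# Bit positions in the PDF specification are 1-based,
# so position n corresponds to the mask 1 << (n - 1).
NO_TOGGLE_TO_OFF = 1 << 14  # position 15
RADIO = 1 << 15             # position 16
PUSHBUTTON = 1 << 16        # position 17


def button_kind(ff: int) -> str:
    """Classify a button field from its Ff value."""
    if ff & PUSHBUTTON:
        return "pushbutton"  # the Radio bit is not meaningful here
    if ff & RADIO:
        return "radio"
    return "checkbox"
```

Under these assumptions a pushbutton has Ff = 65536, with the Radio bit clear, so a validator that insists on the Radio bit for such a field is checking it against the wrong field type.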


FrankMittelbach Mar 29, 2024
Maintainer


@davidcarlisle one question: does LaTeX not warn for
2401.09965v1-tagged.pdf that it needs one more run or was this simply
overlooked?


davidcarlisle Mar 29, 2024
Maintainer Author


@FrankMittelbach yes, it did warn, but the warning got lost in all the
tagging debug logging, so user error, me :( If you look at the build
script in the sources, it now greps the log and re-runs as needed
(after an update this morning), so that shouldn't happen in future.
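
The "grep the log and re-run" idea might look roughly like the following. This is not the actual build script, which is not shown in this thread; it is a minimal sketch in which the search pattern and the `run_once` callback (assumed to compile the document once and return the log text) are assumptions.

```python
import re
from typing import Callable

# Standard LaTeX rerun warnings contain the word "Rerun", for example
# "Label(s) may have changed. Rerun to get cross-references right."
RERUN_PATTERN = re.compile(r"\brerun\b", re.IGNORECASE)


def needs_rerun(log_text: str) -> bool:
    """True if the log contains a message asking for another run."""
    return RERUN_PATTERN.search(log_text) is not None


def build(run_once: Callable[[], str], max_runs: int = 5) -> int:
    """Compile repeatedly until the log stops asking for a rerun.

    Returns the number of runs performed.
    """
    for run in range(1, max_runs + 1):
        if not needs_rerun(run_once()):
            return run
    raise RuntimeError(f"log still requests a rerun after {max_runs} runs")
```

In practice `run_once` would invoke the TeX engine and read the resulting .log file; the cap on the number of runs guards against a document that never settles.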


petervwyatt Mar 30, 2024


@u-fischer I can't speak to veraPDF's Arlington implementation (I'm
not 100% sure which version they used and whether they augmented the
rules, or how they determine what each object is) but DA messages are
most likely related to PDF Errata #323 as this is not well specified.
When I ran against the latest Arlington PDF Model (in GitHub, using
my PoC C++ test harness) I only got the messages I listed above (+
other messages that I checked and confirmed as noise/limitations of
my implementation) - I did not get any DA or Ff messages..


davidcarlisle Mar 30, 2024
Maintainer Author


PHY-exam.pdf has been updated

bdoubrov
Apr 10, 2024


    @u-fischer I can't speak to veraPDF's Arlington implementation
    (I'm not 100% sure which version they used and whether they
    augmented the rules, or how they determine what each object is)
    but DA messages are most likely related to PDF Errata #323 as
    this is not well specified. When I ran against the latest
    Arlington PDF Model (in GitHub, using my PoC C++ test harness) I
    only got the messages I listed above (+ other messages that I
    checked and confirmed as noise/limitations of my implementation)
    - I did not get any DA or Ff messages..

@u-fischer @petervwyatt

  * The Ff bitmask error is indeed a bug in the veraPDF
    implementation. It is fixed in the latest dev build, 1.25.278.
  * DA is still required for push buttons according to the Arlington
    model: FieldBitPush.tsv. This might indeed change after PDF
    Errata #323 is resolved. But for the moment, as far as I
    understand, the Arlington model follows Table 228 of ISO 32000,
    where DA is specified as required. Though I'm not sure if push
    buttons fit into the category of fields that contain variable
    text.

ErroneousBosch
May 13, 2024


Speaking as an accessibility professional (I am no expert in LaTeX),
the dependence on VeraPDF is not wise. While it seems to be able to
verify that tags exist in a nominal structure, the quality and
usefulness of the actual tags and structure being generated are
sub-standard. The lack of ActualText in math equations means this
fails to meet the PDF/UA-2 or WTPDF standards. Tables are very
baseline and primitive, without any header cells or scoping. Image
captions are not contained correctly, and alt text seems to have some
issue where it is not being picked up by screen readers.

It is premature to claim any level of real compliance. All of these
issues are ones that there is no automatic checker for and can only
be picked up through human testing.


ErroneousBosch May 13, 2024


I'm in the process of gathering the information, and need time to
ingest ISO 32000-2:2020. I work at an academic institution, and we
are approaching this from a policy/legal compliance standpoint, in
our case WCAG 2.1 AA and Section 508. More importantly, we have to
test for demonstrable accessibility, which both the example files
above and files we generated with TeX Live 2024 do not meet.

Screen reader performance was especially poor, checking with Apple
VO, NVDA, and Adobe's own reader. That is honestly where the rubber
meets the road. Compliance to a standard that isn't implemented
anywhere isn't a useful compliance, especially if it means not
meeting real-world accessibility needs.

Like I said, I am gathering more useful details to submit in one or
more issues.


josephwright May 13, 2024
Maintainer


    Compliance to a standard that isn't implemented anywhere isn't a
    useful compliance, especially if it means not meeting real-world
    accessibility needs.

True to some extent, but one issue in this area is that without good
examples (complex inputs meeting the agreed standards in terms of
structure), viewers and other tools that can read them will not be
developed. So simply saying 'target only what is readable now'
doesn't really work: current reader implementations have significant
gaps in coverage.


davidcarlisle May 13, 2024
Maintainer Author


Note that this collection is specifically a collection of examples
for PDF/UA-2, that is, the new PDF 2.0 based standard. It is
explicitly here to give implementors of PDF consuming software a
collection of documents to test against. So while you do need to
target PDF 1.x and PDF/UA-1 to produce an accessible document for end
users today, this collection is to allow us to test PDF/UA-2
generation and to allow consuming applications to have a set of
PDF/UA-2 documents to test.


u-fischer May 14, 2024
Maintainer


"real-world accessibility" for eg math is currently not given anyway:
in PDF 1.7 because the standard has no support for it, and in PDF 2.0
because the implementations do not yet support it. Our files are
meant to push development forward. I was just at the PDF Week, where
we used the files to demonstrate and discuss the implementation
problems.


car222222 May 18, 2024
Maintainer


@ErroneousBosch wrote:

    we are approaching this from a policy/legal compliance standpoint,
    in our case WCAG 2.1 AA . . .

This is relevant, since ensuring (as far as possible) that the PDF
output is compliant with the latest WCAG level AA provisions is
definitely important. But (a very big BUT :-) there are some severe
limitations on what can be done to achieve such compliance within the
LaTeX processing, since:

 1. PDF is not a "web-technology", so that much of WCAG needs
    reinterpretation. Also, it may not be possible to achieve much
    "within PDF itself", since PDF is "only a format", and in many
    areas the PDF specification mandates very little concerning how
    processors and associated AT behave. Thus, very much unlike WCAG,
    the PDF standard does not prescribe how, and even whether, any
    processor (or associated AT) concretely implements anything: for
    example, most of its very detailed provisions concerning how to
    produce visual output are described only in terms of various
    models, and not in terms of actual physical actions, or specific
    code to be interpreted outside these abstract models.

 2. It is not feasible to apply many WCAG provisions to "the PDF
    format and PDF producers alone", since much of it largely
    concerns the capabilities and behaviour of consumer applications
    (i.e., how they interpret the format in order to present a PDF
    document to users); this includes, of course, their use of AT.

Therefore, for much that is important to compliance with WCAG
provisions, it is impossible for PDF production software alone to
ensure, or even provide support for, these requirements.

We probably need to look at this, to see if it helps to improve our
WCAG compliance:

    and Section 508.

And it would be unwise for us to go anywhere near supporting such
purely local "legal requirements"!

FrankMittelbach
May 14, 2024
Maintainer


On 14.05.24 at 01:54, ErroneousBosch wrote:

    More importantly, we have to test for demonstrable accessibility
    which both the example files above and files we generated with
    TeXLive 2024 do not meet.

The big question here is why are they not meeting it? Because there
are errors in them with respect to implementing UA-2, or because
consuming software up to now is not capable of properly handling PDF
2 structures yet? So far industry hasn't bothered much with the
improved structures provided by PDF 2 (and necessary for higher
quality accessibility) because there were (nearly) no documents that
used them, so why bother if there is no use case?

    Screen reader performance was especially poor, checking with Apple
    VO, NVDA, and Adobe's own reader. That is honestly where the rubber
    meets the road. Compliance to a standard that isn't implemented
    anywhere isn't a useful compliance, especially if it means not
    meeting real-world accessibility needs.

All true, but if your road is currently a gravel surface with huge
holes in it, the question is: do you want to continue running over it
only with noisy tanks, because everything else that would be a
comfortable car will break down, or do you strive to improve the
road? Right now accessibility of PDFs is so poor because consuming
software is based on 1.7 and UA-1 plus a lot of heuristics (which
differ from implementation to implementation and therefore also do
not give a good user experience overall).

Now, by producing UA-2 docs you do not get magically better
accessibility; in fact you are likely to get even worse, because
consumer software handles the improved structures badly or not at all
and their heuristics fail with such documents. But as was pointed
out, the goal of producing documents that comply with the new (and
better) standards was to showcase where the current consumer software
fails with PDF/UA-2, and in this way drive good implementations of
the new standard in consumer software. With the ability to provide a
corpus of complex documents that meet PDF/UA-2, we are fairly
confident that this could happen, and in fact we already see
movements in this respect.

    Like I said, I am gathering more useful details to submit in one or
    more issues.

Please do, but also please keep in mind the purpose of the generated
documents:

  * things in which we go wrong should be improved on our end to make
    the documents better
  * but things that go wrong in consumer apps because they do not
    understand the standard should really (with some pressure) be
    communicated by the community to the vendors.
