https://github.com/latex3/tagging-project/discussions/72
latex3 / tagging-project

WTPDF / PDF/UA-2 Examples by the LaTeX Project #72

davidcarlisle started this conversation in General (4 comments, 13 replies)

davidcarlisle (Maintainer), Mar 25, 2024

WTPDF / PDF/UA-2 Examples by the LaTeX Project

The following files demonstrate various aspects of Well Tagged PDF documents conforming to PDF/UA-2. They were all generated with LuaLaTeX (lualatex-dev in TeX Live 2024).

The files are a mixture of small examples demonstrating specific features, older out-of-copyright documents that have been re-typeset as tagged PDF, and contemporary documents including recently published arXiv papers, course notes, and conference papers.

---------------------------------------------------------------------
Access to the Files

The full collection of PDF files is available on Google Drive, where you may select one or more individual files to download, or use the Download all link at the top of the page, which generates a zip file of the full collection.

Google drive directory of all example PDF files

At the present time we are not distributing the modified TeX sources that generate these tagged examples, although, where appropriate, we do link to the original files used as source material.

---------------------------------------------------------------------
Verification of PDF/UA-2 compliance

There are not yet many validators that handle UA-2 correctly (not surprising, given that the standard was released in March 2024). One online validator you can try on the smaller examples is:

VeraPDF -- PDF/A and PDF/UA Validation

Please note that some PDF viewers modify the PDF when opening it (to allow for annotations, for example). In some cases this is known to break PDF/UA-2 conformance. If that happens, re-download the file and use a different viewer.

---------------------------------------------------------------------
The Samples

Simple Examples with MathML Associated files

All three conform to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington

Three small examples demonstrating the use of Associated Files to tag mathematics. Each formula has two associated files: a LaTeX fragment representing the original source, and a MathML document.

* mathml-AF-ex1
* mathml-AF-ex2
* Sample-AF-Math-LaTeX

amsmath LaTeX package documentation

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington

The amsmath package defines the main markup structures for mathematics in LaTeX. This manual has examples of many kinds of aligned equations and similar structures. This version has been enhanced to produce Well Tagged PDF.

* amsldoc-tagged

tagpdf LaTeX package documentation

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

The tagpdf LaTeX package is a core part of the LaTeX support for tagged PDF. Its documentation already conforms to WTPDF and PDF/UA-2, and a snapshot is included here.

* tagpdf

ArXiv publications

Tagged using MathML extracted from the arXiv-supplied HTML versions of the documents. They were each submitted to arXiv under a CC licence permitting re-use such as this experiment. The tagged documents are available under the same licence.

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington
* 2401.09965v1-tagged -- Original Source

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington
* 2401.09436v1-tagged -- Original Source

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington
* 2401.05361v1-tagged -- Original Source

Niels Bohr: The Theory of Spectra and Atomic Constitution; Three Essays

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington

These essays by Niels Bohr are available as LaTeX source from Project Gutenberg. Additional TeX markup has been added to produce Tagged PDF. All math expressions were also converted to MathML using LaTeXML.

* 47464-t-tagged -- Original Source

William Shakespeare: MACBETH

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

* macbeth-tagged -- Original Source

This document uses a provided LaTeX source of the play text. The LaTeX markup has been enhanced to produce Well Tagged PDF.

American Standard Version of the Bible (1901 text)

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

The plain text source of the ASV Bible, 1901, as provided by Wikisource. This has been marked up as LaTeX to generate Well Tagged PDF. This example demonstrates a custom role map with structured tagging corresponding to the Testament/Book/Chapter/Verse structure shown in this work.

* ASV Bible -- Original Source

DEIMS 2024 Conference paper

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

The paper "Enhancing LaTeX to Automatically Produce Tagged and Accessible PDF", submitted to DEIMS 2024, Tokyo. As well as describing the approach to PDF tagging used for these examples, the paper itself forms an example of tagging a contemporary conference paper. This is the version as prepared for the TeX Users Group publication, TUGboat.

* tb139mitt-deims24

The presentation at the DEIMS conference, including a demonstration, is available as a video.

PDF Association sample poster

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

An article describing the PDF Association work on accessibility, produced for the PDF Association launch of Well Tagged PDF.

* pdfa-art

Sample Chemistry/Math notes

This is a small contemporary document used as notes on mathematical aspects of Chemistry. In this example, the math is associated with LaTeX source Associated Files only, not MathML.

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington

* 525Da-23-group-theory

A small template exam paper.

Conforms to: PDF/UA-2, PDF/A-4F, WTPDF/Accessibility, WTPDF/Reuse, Arlington

* PHY-exam

Wilhelm Busch: Max and Moritz

Conforms to: PDF/UA-2, PDF/A-4, WTPDF/Accessibility, WTPDF/Reuse, Arlington

A LaTeX document that has no math and whose main language is not English. It shows tagging of images, verse structures, and the use of more than one (marked up) language in a document.

* pg17161-tagged -- Original Source

---------------------------------------------------------------------
Replies: 4 comments, 13 replies

petervwyatt, Mar 29, 2024

Awesome work! But I did note 2 files with errors and a few other issues:

* for 2401.09965v1-tagged.pdf
  + There are 2 x Ref objects (objects 2682 and 2705) in the structure tree which are not structure element dictionaries but file specification dictionaries (for associated files). Table 355 requires Ref entries to be structure elements, so this is an error.
* for PHY-exam.pdf
  + Page 3: the Widget annotation for the button is missing AP (appearance stream info), as required in PDF 2.0. See Table 166 in ISO 32000-2:2020.
* for 2401.09436v1-tagged.pdf
  + Has deprecated ProcSets a few times (not technically an issue, as a future PDF/A-4 dated revision will permit deprecated features, but removing them could save some space).
  + Has Type 1 FontDescriptor/CharSet a few times, which is also deprecated in PDF 2.0 (not technically an issue, as a future PDF/A-4 dated revision will permit deprecated features, but removing it could save some space).
  + In Adobe Acrobat, all the embedded MathML XML files show as "Size = 0.00 bytes". If you set the Size entry in the F/UF Params dictionary, then I think the correct size will display in Acrobat's file list nav pane. You might want to consider adding it if that works...

Just for discussion: several files have private PTEX entries for XObjects, etc., such as PTEX.FileName and PTEX.InfoDict, which can include author, filename, etc., as per the pdfTeX documentation (https://texdoc.org/serve/pdftex-a.pdf/0). Since PDF/A files are intended for long-term preservation, this has the potential to cause issues for FOIA and similar requests, since the presence of private data might slip past various redaction workflows. A modern equivalent is to use an XMP Metadata stream instead of second-class custom PDF keys, which makes this data more discoverable.

(7 replies; 2 earlier replies not shown)

u-fischer (Maintainer), Mar 29, 2024

@petervwyatt Thanks for the report. I will add an appearance to PHY-exam.pdf. But while testing that I got two curious complaints from the Arlington check (I used the latest veraPDF version) for the attached PDF:

* It complained that a DA key is missing. Why? There is no variable text; the text is inside the appearance.
* It complained that the Ff bitmask is wrong for a radio field. Why? This is a pushbutton, so naturally it does not have the Radio bit set.

Attachment: test-utf8.pdf

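As background for the Ff question above, here is a minimal Python sketch of how the button-related bits of a field's Ff value decode. The bit positions are the ones I recall from the field-flags table for button fields in ISO 32000-2 and should be checked against the standard; the function names are hypothetical and not part of veraPDF or the Arlington model.

```python
# Button field flag bit positions (1-based), as recalled from ISO 32000-2.
# Treat these numbers as an assumption and verify them against the standard.
BUTTON_FLAGS = {
    "NoToggleToOff": 15,
    "Radio": 16,
    "Pushbutton": 17,
    "RadiosInUnison": 26,
}


def decode_button_ff(ff: int) -> list[str]:
    """Return the names of the button flags that are set in an Ff value."""
    return [name for name, bit in BUTTON_FLAGS.items() if ff & (1 << (bit - 1))]


def button_kind(ff: int) -> str:
    """Classify a /Btn field from its Ff value (hypothetical helper)."""
    flags = decode_button_ff(ff)
    if "Pushbutton" in flags:
        return "pushbutton"
    if "Radio" in flags:
        return "radio"
    return "checkbox"


if __name__ == "__main__":
    # A pushbutton sets bit 17 (value 65536) and leaves the Radio bit clear.
    print(button_kind(65536))
```

Under these assumptions, a checker that reports a radio-field problem for an Ff value of 65536 is misclassifying the field, which matches the situation u-fischer describes.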
FrankMittelbach (Maintainer), Mar 29, 2024

@davidcarlisle One question: did LaTeX not warn for 2401.09965v1-tagged.pdf that it needs one more run, or was the warning simply overlooked?

davidcarlisle (Maintainer, Author), Mar 29, 2024

@FrankMittelbach Yes, it did warn, but the warning got lost in all the tagging debug logging, so user error (me) :( After an update this morning, the build script in the sources now greps the log and re-runs as needed, so that shouldn't happen in future.

petervwyatt, Mar 30, 2024

@u-fischer I can't speak to veraPDF's Arlington implementation (I'm not 100% sure which version they used, whether they augmented the rules, or how they determine what each object is), but the DA messages are most likely related to PDF Errata #323, as this is not well specified. When I ran against the latest Arlington PDF Model (in GitHub, using my PoC C++ test harness) I only got the messages I listed above (plus other messages that I checked and confirmed as noise or limitations of my implementation). I did not get any DA or Ff messages.

davidcarlisle (Maintainer, Author), Mar 30, 2024

PHY-exam.pdf has been updated.

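The build script itself is not shown in this thread, so the following is only a minimal Python sketch of the "grep the log and re-run as needed" idea davidcarlisle mentions. The message patterns, the helper names, and the run limit are all assumptions; real logs wrap long lines and packages may word their rerun requests differently.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable

# Assumed patterns for typical rerun requests; not taken from the real script.
RERUN_PATTERN = re.compile(r"Rerun to get|Rerun LaTeX|Label\(s\) may have changed")


def needs_rerun(log_text: str) -> bool:
    """Return True if the log text contains a known rerun request."""
    return RERUN_PATTERN.search(log_text) is not None


def run_lualatex(tex_file: str) -> str:
    """Run the engine once and return the log text (hypothetical helper).

    Assumes the log is written to the current directory as <jobname>.log.
    """
    subprocess.run(
        ["lualatex-dev", "-interaction=nonstopmode", tex_file],
        check=False,
    )
    return Path(Path(tex_file).stem + ".log").read_text(errors="replace")


def build(run_once: Callable[[], str], max_runs: int = 5) -> int:
    """Call run_once until the log stops asking for a rerun; return the run count."""
    for run in range(1, max_runs + 1):
        if not needs_rerun(run_once()):
            return run
    raise RuntimeError(f"log still requests a rerun after {max_runs} runs")
```

A call such as `build(lambda: run_lualatex("paper.tex"))` (with a hypothetical file name) would repeat the run until the log is quiet. Passing the runner in as a callable keeps the loop testable without a TeX installation.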
---------------------------------------------------------------------
bdoubrov, Apr 10, 2024

> @u-fischer I can't speak to veraPDF's Arlington implementation (I'm not 100% sure which version they used and whether they augmented the rules, or how they determine what each object is) but DA messages are most likely related to PDF Errata #323 as this is not well specified. When I ran against the latest Arlington PDF Model (in GitHub, using my PoC C++ test harness) I only got the messages I listed above (+ other messages that I checked and confirmed as noise/limitations of my implementation) - I did not get any DA or Ff messages.

@u-fischer @petervwyatt

* The Ff bitmask error is indeed a bug in the veraPDF implementation. It is fixed in the latest dev build, 1.25.278.
* DA is still required for push buttons according to the Arlington model: FieldBitPush.tsv. This might indeed change after PDF Errata #323 is resolved. For the moment, as far as I understand, the Arlington model follows Table 228 of ISO 32000, where DA is specified as required, though I'm not sure whether push buttons fit into the category of fields that contain variable text.

---------------------------------------------------------------------
ErroneousBosch, May 13, 2024

Speaking as an accessibility professional (I am no expert in LaTeX), the dependence on VeraPDF is not wise. While it seems to be able to verify that tags exist in a nominal structure, the quality and usefulness of the actual tags and structure being generated is sub-standard. The lack of ActualText in math equations means this fails to meet the PDF/UA-2 or WTPDF standards. Tables are very baseline and primitive, without any header cells or scoping. Image captions are not contained correctly, and alt text seems to have some issue where it is not being picked up by screen readers.
It is premature to claim any level of real compliance. No automatic checker exists for any of these issues; they can only be picked up through human testing.

(6 replies; 1 earlier reply not shown)

ErroneousBosch, May 13, 2024

I'm in the process of gathering the information, and need time to ingest ISO 32000-2:2020. I work at an academic institution, and we are approaching this from a policy/legal compliance standpoint, in our case WCAG 2.1 AA and Section 508.

More importantly, we have to test for demonstrable accessibility, which both the example files above and files we generated with TeX Live 2024 do not meet.

Screen reader performance was especially poor, checking with Apple VO, NVDA, and Adobe's own reader. That is honestly where the rubber meets the road. Compliance to a standard that isn't implemented anywhere isn't a useful compliance, especially if it means not meeting real-world accessibility needs.

Like I said, I am gathering more useful details to submit in one or more issues.

josephwright (Maintainer), May 13, 2024

> Compliance to a standard that isn't implemented anywhere isn't a useful compliance, especially if it means not meeting real-world accessibility needs.

True to some extent, but one issue in this area is that without good examples (complex inputs meeting the agreed standards in terms of structure), viewers, etc., that can read them will not be developed.
So simply saying 'target only what is readable now' doesn't really work: current reader implementations have significant gaps in coverage.

davidcarlisle (Maintainer, Author), May 13, 2024

Note that this collection is specifically a collection of examples for PDF/UA-2, that is, the new PDF 2.0 based standard. It is explicitly here to allow implementors of PDF consuming software to have a collection of documents to test against. So while, to produce an accessible document for end users today, you do need to target PDF 1.x and PDF/UA-1, this collection is to allow us to test PDF/UA-2 generation and allow consuming applications to have a set of PDF/UA-2 documents to test.

u-fischer (Maintainer), May 14, 2024

"Real-world accessibility" for, e.g., math is currently not available anyway: not in PDF 1.7, because the standard has no support for it, and not in PDF 2.0, because the implementations do not support it. Our files are meant to push development forward. I was just at the PDF week, where we used the files to demonstrate and discuss the implementation problems.

car222222 (Maintainer), May 18, 2024

@ErroneousBosch wrote:

> we are approaching this from a policy/legal compliance standpoint, in our case WCAG 2.1 AA . . .

This is relevant, since ensuring (as far as possible) that the PDF output is compliant with the latest WCAG level AA provisions is definitely important.

But (a very big BUT :-) there are some severe limitations on what can be done to achieve such compliance within the LaTeX processing, since:

1. PDF is not a "web-technology", so that much of WCAG needs reinterpretation.
   Also, it may not be possible to achieve much "within PDF itself", since PDF is "only a format", and in many areas the PDF specification mandates very little concerning how processors and associated AT behave. Thus, very much unlike WCAG, the PDF standard does not prescribe how, or even whether, any processor (or associated AT) concretely implements anything: for example, most of its very detailed provisions concerning how to produce visual output are described only in terms of various models, and not in terms of actual physical actions or specific code to be interpreted outside these abstract models.

2. It is not feasible to apply many WCAG provisions to "the PDF format and PDF producers alone", since much of WCAG concerns the capabilities and behaviour of consumer applications (i.e., how they interpret the format in order to present a PDF document to users); this includes, of course, their use of AT. Therefore, for much that is important to compliance with WCAG provisions, it is impossible for PDF production software alone to ensure, or even provide support for, these requirements.

We probably need to look at this, to see if it helps to improve our WCAG compliance: [link]

> and Section 508.

And it would be unwise for us to go anywhere near supporting such purely local "legal requirements"!

---------------------------------------------------------------------
FrankMittelbach (Maintainer), May 14, 2024

On 14.05.24 at 01:54, ErroneousBosch wrote:

> More importantly, we have to test for demonstrable accessibility, which both the example files above and files we generated with TeX Live 2024 do not meet.

The big question here is why they are not meeting it. Is it because there are errors in them with respect to implementing UA-2, or because consuming software is not yet capable of properly handling PDF 2 structures?
So far industry hasn't bothered much with the improved structures provided by PDF 2 (and necessary for higher quality accessibility) because there were (nearly) no documents that used them, so why bother if there is no use case?

> Screen reader performance was especially poor, checking with Apple VO, NVDA, and Adobe's own reader. That is honestly where the rubber meets the road. Compliance to a standard that isn't implemented anywhere isn't a useful compliance, especially if it means not meeting real-world accessibility needs.

All true, but if your road is currently a gravel surface with huge holes in it, the question is: do you want to continue running over it only with noisy tanks, because anything that would be a comfortable car will break down, or do you strive to improve the road?

Right now accessibility of PDFs is so poor because consuming software is based on 1.7 and UA-1 plus a lot of heuristics (which differ from implementation to implementation and therefore also do not give a good user experience overall).

Now, by producing UA-2 documents you do not magically get better accessibility. In fact you are likely to get even worse accessibility, because consumer software handles the improved structures badly or not at all, and its heuristics fail with such documents.

But as was pointed out, the goal of producing documents that comply with the new (and better) standards was to showcase where current consumer software fails with PDF/UA-2, and in this way to drive good implementations of the new standard in consumer software. With the ability to provide a corpus of complex documents that meet PDF/UA-2, we are fairly confident that this could happen, and in fact we already see movement in this respect.

> Like I said, I am gathering more useful details to submit in one or more issues.
Please do, but also please keep in mind the purpose of the generated documents, e.g.,

- things in which we go wrong should be improved on our end to make the documents better
- but things that go wrong in consumer apps because they do not understand the standard should really (with some pressure) be communicated by the community to the vendors.

---------------------------------------------------------------------
8 participants: @davidcarlisle @josephwright @FrankMittelbach @u-fischer @bdoubrov @ErroneousBosch @car222222 @petervwyatt