A11y for FPDF – Do It Yourself #31
jeanjann
started this conversation in
Show and tell
You could do it as a trait.
Thank you for sharing this guidance; I've published our Classic ASP port of FPDF with accessibility tags as a result of this information. I have a few questions about the
A11y for FPDF – Do It Yourself
A simple guide on how to use FPDF to create accessible PDFs with tables, lists, and images.
Currently, FPDF lacks the functions needed to meet PDF/UA requirements, such as tags in text elements and a hierarchical structure tree. However, these can be added quite easily.
We’ll use two terms in this guide:
makePDF – the script containing the commands to create the PDF
FPDF – the library
This is a very pragmatic approach: we use only a few native FPDF functions and do not write new FPDF-style methods. Instead, we add one procedural 'spaghetti' function and patch FPDF’s output to create the structure tree.
We work with an ASCII PDF – it’s easier to read and edit ;)
It helps to be familiar with the general structure of a PDF and to review the example PDF provided (which is validated as PDF/UA in PAC). At the end of the text, you’ll find some resource links.
Adjustments in makePDF
We extend the Cell() / MultiCell() calls with an additional tag parameter.
Standard tags are P, H1–H6, L, Figure, Artifact (depending on layout, also Table, TR, TD, TH, LI, Form, Annot, Sect, Quote, Reference, Link, Caption, …). It can be useful to create custom tags, for example to mark P elements in a table header or a P inside a heading.
Additionally, in the Row() function create an array that stores the structure of tables:
table index, number of TR, number of TH/TD, number of text elements (multicell).
Also set metadata: Title, Author, Creator, Subject ($pdf->SetTitle …).
To allow editing the PDF in ASCII, turn off compression with $pdf->SetCompression(false) and initially do not embed fonts as binary streams.
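The table bookkeeping described for Row() could look roughly like the sketch below. TableLog, startTable(), addRow() and summary() are hypothetical names chosen for illustration; they are not part of FPDF or of the original script. The sketch only records the counts (table index, number of TR, number of TH/TD, number of text elements) that a11y() would need later.

```php
<?php
// Hypothetical helper, not part of FPDF: records table structure while
// makePDF draws rows, so a11y() can later rebuild Table/TR/TH/TD elements.
final class TableLog
{
    private array $tables = [];
    private int $current = -1;

    // Call once before the first row of each table; returns the table index.
    public function startTable(): int
    {
        $this->tables[++$this->current] = ['rows' => []];
        return $this->current;
    }

    // $textCounts holds, per cell, how many text elements (MultiCell calls) it produced.
    public function addRow(array $textCounts, bool $isHeader = false): void
    {
        if ($this->current < 0) {
            throw new LogicException('startTable() must be called first');
        }
        $this->tables[$this->current]['rows'][] = [
            'header' => $isHeader,
            'cells'  => array_values($textCounts),
        ];
    }

    // One record per table: index, number of TR, number of TH/TD, number of text elements.
    public function summary(): array
    {
        $out = [];
        foreach ($this->tables as $index => $table) {
            $cells = 0;
            $texts = 0;
            foreach ($table['rows'] as $row) {
                $cells += count($row['cells']);
                $texts += array_sum($row['cells']);
            }
            $out[] = ['table' => $index, 'tr' => count($table['rows']), 'cells' => $cells, 'texts' => $texts];
        }
        return $out;
    }
}

$log = new TableLog();
$log->startTable();
$log->addRow([1, 1, 1], true);   // header row: three TH, one text element each
$log->addRow([1, 2, 1]);         // data row: the middle TD holds two text elements
print_r($log->summary());
```

Keeping the raw per-row data (rather than only the totals) makes it possible to tell TH rows from TD rows when the structure tree is written.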
Adjustments in FPDF
FPDF only writes the simple sequence of physical text elements (with no structural information).
In an accessible PDF, a logical structure is added, with tags in text elements and a referencing hierarchical structure tree.
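The output of each text element must be wrapped in a marked-content sequence: /Tag << /MCID n >> BDC before the element and EMC after it.
Example of a tagged text element with the tag P (Paragraph), nesting may vary: /P << /MCID 11 >> BDC … EMC
Where: /P is the tag, /MCID is the marked-content ID (it restarts at 0 on each page), BDC opens the sequence, and EMC closes it.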
So extend the following native FPDF functions:
AddPage() – extend to reset the MCID counter to 0 for each page
Line(), Rect() – extend with /Artifact
Cell(), MultiCell() – add a Tag parameter, insert MCID sequences
For borders and other non-content elements, add /Artifact
Special text elements can be marked with:
Image()
Use an array to store page number, MCID, and additional information (alt text, artifact) for each element
Output() – extend to call a new function a11y()
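The extensions listed above boil down to three things: a per-page MCID counter, a wrapper for real content, and a wrapper for decoration. Here is a minimal sketch of that logic, kept outside FPDF so it can be read on its own. MarkedContent and its methods are hypothetical names; in practice the same lines would live inside the extended AddPage(), Cell(), MultiCell(), Line() and Rect(). The BDC, BMC and EMC operators are standard PDF marked-content operators.

```php
<?php
// Hypothetical helper, not part of FPDF: it produces the strings that the
// extended Cell()/MultiCell()/Line()/Rect() methods would write to the page.
final class MarkedContent
{
    private int $page = 0;
    private int $mcid = 0;
    // One record per tagged element: page number, MCID, tag, optional alt text.
    public array $elements = [];

    // Call from the extended AddPage(): MCIDs restart at 0 on every page.
    public function newPage(): void
    {
        $this->page++;
        $this->mcid = 0;
    }

    // Wrap real content in /Tag << /MCID n >> BDC ... EMC and remember it.
    public function tag(string $tag, string $content, ?string $alt = null): string
    {
        $id = $this->mcid++;
        $this->elements[] = ['page' => $this->page, 'mcid' => $id, 'tag' => $tag, 'alt' => $alt];
        return sprintf("/%s << /MCID %d >> BDC\n%s\nEMC", $tag, $id, $content);
    }

    // Wrap decoration (borders, rules) so it is not treated as content.
    public function artifact(string $content): string
    {
        return "/Artifact BMC\n" . $content . "\nEMC";
    }
}

$mc = new MarkedContent();
$mc->newPage();
$h1   = $mc->tag('H1', 'BT /F1 16 Tf 50 780 Td (Title) Tj ET');
$rule = $mc->artifact('50 770 m 300 770 l S');
$mc->newPage();
$p    = $mc->tag('P', 'BT /F1 12 Tf 50 780 Td (Next page) Tj ET');
echo $h1, "\n", $rule, "\n", $p, "\n";
```

The $elements array is the per-element record (page number, MCID, extra information) that a11y() loops over later.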
The new function a11y()
Contains all commands to add the new structure tree and other changes.
Typically, we don’t use native FPDF methods here – instead, we directly edit/extend the already generated PDF ($this->buffer).
Steps:
Create new objects for the logical structure tree, with sequential numbers (++$this->n)
Reserve fixed numbers in advance for: Outlines, Title, Metadata, StructTreeRoot, ParentTree (/Nums), Document.
(# is a number, #Ref a number referencing to an object, #Next points to the next unused ParentTree key, depends on your structure)
Add a language, a title reference, and references to all root children (H1, H2, Table, but not nested TR, TD …; L, but not nested list items):
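As a rough sketch of the root objects involved: the helper names below are hypothetical, and the object numbers are placeholders for the reserved numbers mentioned above. The dictionary keys themselves (/StructTreeRoot, /ParentTree, /ParentTreeNextKey, /MarkInfo, /Lang, /ViewerPreferences with /DisplayDocTitle) are standard PDF 1.7 entries, but which of them the example PDF uses, and in which order, is an assumption here.

```php
<?php
// Sketch only: helper names are hypothetical, object numbers are passed in.
function pdfObject(int $num, string $dict): string
{
    return "{$num} 0 obj\n<< {$dict} >>\nendobj\n";
}

// [12, 13] -> "12 0 R 13 0 R"
function refList(array $nums): string
{
    return implode(' ', array_map(fn(int $n): string => "{$n} 0 R", $nums));
}

// Root of the logical structure; $nextKey is the next unused ParentTree key.
function structTreeRoot(int $num, int $documentNum, int $parentTreeNum, int $nextKey): string
{
    return pdfObject($num, "/Type /StructTreeRoot /K {$documentNum} 0 R "
        . "/ParentTree {$parentTreeNum} 0 R /ParentTreeNextKey {$nextKey}");
}

// /Document element: /K lists only the top-level children (H1, P, Table, L, Figure).
function documentElem(int $num, int $rootNum, array $childNums): string
{
    return pdfObject($num, "/Type /StructElem /S /Document /P {$rootNum} 0 R /K [" . refList($childNums) . "]");
}

// Entries to splice into the existing /Catalog dictionary.
function catalogAdditions(int $rootNum, string $lang): string
{
    return "/Lang ({$lang}) /MarkInfo << /Marked true >> "
        . "/ViewerPreferences << /DisplayDocTitle true >> /StructTreeRoot {$rootNum} 0 R";
}

echo structTreeRoot(20, 21, 22, 1);
echo documentElem(21, 20, [23, 24]);
echo catalogAdditions(20, 'en-US'), "\n";
```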
Now loop through your arrays for each MCID and add objects.
If it's
a plain P or H1–H6 Tag:
add a matching object
a Table start (PT tag if you use it as a table start marker):
add referencing hierarchical objects for
StructElem /Table ->
..StructElem /TR ->
....StructElem /TH /TD ->
......StructElem /P -> Content-Stream /P << /MCID 11 >> BDC … EMC
For better a11y in a complex table layout, add Scope and IDs to the TH and connect the TD to these headers (see the discussion below):
When using IDs, you must add an additional IDTree object (referenced from the StructTreeRoot) that maps the IDs to these objects.
... and a table summary.
a List start (PL tag):
add referencing hierarchical objects for
StructElem /L ->
..StructElem /LI ->
....StructElem /Lbl -> Content-Stream /Lbl << /MCID 11 >> BDC … EMC
....StructElem /LBody ->
......StructElem /P -> Content-Stream /P << /MCID 12 >> BDC … EMC
a figure:
Add alt text and bounding box of the image [xMin, yMin, xMax, yMax]:
After looping through all structure elements, write the /K array in the /Document object (all top-level un-nested elements: H1, H2, P, Table, L, Figure).
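The loop above needs three kinds of objects: leaf elements that point at an MCID, grouping elements that point at child objects, and figures with alt text and a bounding box. Below is a minimal sketch. All function names and object numbers are hypothetical, and pdfString() only handles plain ASCII text (non-ASCII alt text would need a different string encoding).

```php
<?php
// Sketch only; function names and object numbers are hypothetical.
function pdfString(string $s): string
{
    // Escape the characters that are special inside a PDF literal string.
    return '(' . strtr($s, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']) . ')';
}

// Leaf element (P, H1-H6, Lbl, ...): points at one MCID on one page.
function leafElem(int $num, string $type, int $parent, int $pageObj, int $mcid, string $extra = ''): string
{
    return "{$num} 0 obj\n<< /Type /StructElem /S /{$type} /P {$parent} 0 R "
        . "/Pg {$pageObj} 0 R /K {$mcid}{$extra} >>\nendobj\n";
}

// Grouping element (Table, TR, TH, TD, L, LI, LBody): /K lists child objects.
function groupElem(int $num, string $type, int $parent, array $kids): string
{
    $refs = implode(' ', array_map(fn(int $k): string => "{$k} 0 R", $kids));
    return "{$num} 0 obj\n<< /Type /StructElem /S /{$type} /P {$parent} 0 R /K [{$refs}] >>\nendobj\n";
}

// Figure: a leaf with alt text and a layout bounding box [xMin yMin xMax yMax].
function figureElem(int $num, int $parent, int $pageObj, int $mcid, string $alt, array $bbox): string
{
    $extra = ' /Alt ' . pdfString($alt)
        . ' /A << /O /Layout /BBox [' . implode(' ', $bbox) . '] >>';
    return leafElem($num, 'Figure', $parent, $pageObj, $mcid, $extra);
}

// Table -> TR -> TD -> P for a one-cell table, plus one figure.
// 21 stands for the /Document element, 3 for the page object.
$objects = groupElem(30, 'Table', 21, [31])
    . groupElem(31, 'TR', 30, [32])
    . groupElem(32, 'TD', 31, [33])
    . leafElem(33, 'P', 32, 3, 11)
    . figureElem(34, 21, 3, 12, 'Logo (draft)', [50, 600, 150, 700]);
echo $objects;
```

In this example only objects 30 and 34 would go into the /K array of the /Document object, since 31 to 33 are nested inside the table.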
Add special objects and extensions:
per-page indexed list of all semantic objects referencing a physical MCID:
(Metadata structure is flexible; in this example, it's based on an OpenOffice-generated PDF/UA)
a simple bookmark pointing to H1:
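For the per-page index mentioned in this list (the ParentTree with its /Nums array), a sketch could look like this. The record layout ('structParents', 'mcid', 'elemObj') and the function name are assumptions for illustration. The sketch also assumes that each page object carries a /StructParents key whose value matches the key used in /Nums, and that the MCIDs on a page run from 0 without gaps.

```php
<?php
// Sketch only: $elements maps each tagged content item to the object number
// of the StructElem that owns it (hypothetical record layout).
function parentTree(int $num, array $elements): string
{
    $perPage = [];
    foreach ($elements as $e) {
        // Outer key = the page's /StructParents value, inner key = MCID.
        $perPage[$e['structParents']][$e['mcid']] = $e['elemObj'];
    }
    ksort($perPage);

    $nums = [];
    foreach ($perPage as $key => $byMcid) {
        ksort($byMcid); // the array position must equal the MCID
        $refs = implode(' ', array_map(fn(int $n): string => "{$n} 0 R", $byMcid));
        $nums[] = "{$key} [{$refs}]";
    }
    return "{$num} 0 obj\n<< /Nums [" . implode(' ', $nums) . "] >>\nendobj\n";
}

$tree = parentTree(22, [
    ['structParents' => 1, 'mcid' => 0, 'elemObj' => 42],
    ['structParents' => 0, 'mcid' => 1, 'elemObj' => 41],
    ['structParents' => 0, 'mcid' => 0, 'elemObj' => 40],
]);
echo $tree;
```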
Insert all generated objects into the buffer before the xref keyword.
Update lengths: if an object with a /Length entry (usually stream / endstream) was changed, update its length.
Recalculate xref: the xref table contains the count and byte offsets of all objects, so it must be rebuilt because we've added new ones (e.g. use preg_match_all with PREG_OFFSET_CAPTURE to find all object numbers).
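A minimal sketch of the xref rebuild with preg_match_all and PREG_OFFSET_CAPTURE follows. rebuildXref() is a hypothetical name, and the code makes several simplifying assumptions: the PDF is uncompressed, has a single xref section, all objects have generation 0 and are numbered 1..N without gaps, and no stream content contains a line that starts with "N 0 obj".

```php
<?php
// Sketch only: assumes an uncompressed PDF with one xref section, generation 0
// everywhere, and objects numbered 1..N without gaps.
function rebuildXref(string $pdf): string
{
    // Find the old cross-reference table and capture the trailer dictionary.
    $xrefPos = strrpos($pdf, "\nxref\n");
    if ($xrefPos === false
        || !preg_match('/trailer\s*(<<.*>>)\s*startxref/s', $pdf, $t, 0, $xrefPos)) {
        throw new RuntimeException('xref or trailer not found');
    }
    $body = substr($pdf, 0, $xrefPos + 1); // everything up to the xref keyword

    // Byte offset of every "N 0 obj" that starts a line.
    preg_match_all('/^(\d+) 0 obj\b/m', $body, $m, PREG_OFFSET_CAPTURE);
    $offsets = [];
    foreach ($m[1] as [$num, $offset]) {
        $offsets[(int) $num] = $offset;
    }
    if ($offsets === [] || count($offsets) !== max(array_keys($offsets))) {
        throw new RuntimeException('objects must be numbered 1..N without gaps');
    }
    ksort($offsets);
    $size = count($offsets) + 1; // +1 for the free entry 0

    // Each xref entry is exactly 20 bytes long.
    $xref = "xref\n0 {$size}\n0000000000 65535 f \n";
    foreach ($offsets as $offset) {
        $xref .= sprintf("%010d 00000 n \n", $offset);
    }
    $trailer = preg_replace('/\/Size\s+\d+/', "/Size {$size}", $t[1]);

    return $body . $xref . "trailer\n{$trailer}\nstartxref\n" . strlen($body) . "\n%%EOF\n";
}

// Tiny stand-in buffer with a stale xref table (wrong count, wrong offsets).
$pdf = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n<< /Type /Pages >>\nendobj\n"
     . "xref\n0 2\n0000000000 65535 f \n0000000000 00000 n \n"
     . "trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n0\n%%EOF\n";
$fixed = rebuildXref($pdf);
echo $fixed;
```

The sketch also rewrites /Size in the trailer and the startxref offset, since both become stale as soon as objects are added.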
In the trailer, update /Size to match the new number of objects.
Done ;)
Use the included simple, fairly accessible ASCII PDF as a reference for the structure.
Use the specified tools to find any errors in your code / PDF. PAC is notorious for issuing poor error messages and crashes quickly when encountering faulty PDFs. Therefore, it makes sense to write your own scripts for basic syntax / references checking.
a11y-demo2.pdf
This updated demo addresses some issues with /Headers; kudos to clee-sudbury for the comments.
Further Information
Validation
PAC – PDF Accessibility Checker:
https://pac.pdf-accessibility.org/de
https://check.axes4.com/de
AxesPDF – PDF editor variant of PAC (demo)
https://www.axes4.com/de/software-services/axespdf
VeraPDF – CLI/GUI validator, choose your standard:
https://verapdf.org/software
pdfcpu – CLI PDF processor (Go):
https://pdfcpu.io
PDF-XChange Editor – free viewer/editor with a11y checks
https://www.pdf-xchange.de/pdf-xchange-viewer
AVE-Pdf, limited validation
https://avepdf.com/de/pdfa-validation
Tools - stream unpacking etc
mutool - CLI tool (part of MuPDF):
https://mupdf.readthedocs.io
QPDF – CLI editing tool:
https://qpdf.readthedocs.io
Ghostscript – CLI editing tool:
https://ghostscript.com
Tutorials
PDF Reference Version 1.7 - Adobe Open Source
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf
Matterhorn Protocol – PDF/UA standard
https://pdfa.org/resource
What is PDF? Basics to Tagged PDF
https://news.speedata.de/2024/03/19/insidepdf-01
A Minimum Complete Tutorial of Portable Document Format (PDF) - Short overview
https://metebalci.com/blog/a-minimum-complete-tutorial-of-pdf-with-pdfsh
Introduction to tagged PDF files
https://de.overleaf.com/learn/latex/An_introduction_to_tagged_PDF_files%3A_internals_and_the_challenges_of_accessibility
Tagged PDF Best Practice Guide
https://pdfa.org/resource/tagged-pdf-best-practice-guide-syntax/
PDF Knowledge Base: PDF/UA
https://www.pdflib.com/pdf-knowledge-base/pdfua/