A11y for FPDF – Do It Yourself #31

jeanjann · 2025-08-18T15:52:17Z

A simple guide on how to use FPDF to create accessible PDFs with tables, lists, and images.

Currently, FPDF lacks the functions needed to meet PDF/UA requirements, such as:

Tagged elements
Structure tree
Structural elements
Sufficient metadata

However, these can be added quite easily.

We’ll use two terms in this guide:
makePDF – the script containing the commands to create the PDF
FPDF – the library

This is a very pragmatic approach: we use only a few native FPDF functions and do not write new FPDF-style methods. Instead, we add one procedural 'spaghetti' function and patch FPDF’s output to create the structure tree.

We work with an ASCII PDF – it’s easier to read and edit ;)

It helps to be familiar with the general structure of a PDF and to review the example PDF provided (which is validated as PDF/UA in PAC). At the end of the text, you’ll find some resource links.

Adjustments in makePDF

We extend the Cell() / MultiCell() calls with:

a parameter for the tag used (for all elements)
a parameter to mark elements as artifacts
for Image(), a parameter for alt text (Figure)

Standard tags are P, H1–H6, L, Figure, Artifact (depending on layout, also Table, TR, TD, TH, LI, Form, Annotation, Sect, Quote, Reference, Link, Caption, …). It can be useful to create custom tags, for example to mark P elements in a table header or a P inside a heading.

Additionally, in the Row() function, create an array that stores the structure of each table:
table index, number of TR, number of TH/TD, number of text elements (MultiCell).

Also set metadata: Title, Author, Creator, Subject ($pdf->SetTitle …).

To allow editing the PDF in ASCII, turn compression off with $pdf->SetCompression(false) and initially do not embed fonts as binary streams.

Adjustments in FPDF

FPDF only writes the simple sequence of physical text elements (with no structural information).
In an accessible PDF, a logical structure is added, with tags in text elements and a referencing hierarchical structure tree.

The output of each text element must be extended with:

Marked Content Identifier (MCID), starting at 0 on each page
Marked Content Operators – BDC / EMC

Example of a tagged text element with the tag P (Paragraph), nesting may vary:

/P <</MCID 0>> BDC
BT
/F1 12 Tf
100 700 Td
(Hello World) Tj
ET
EMC

Where:

/P <</MCID 0>> – Tag with MCID
BDC / EMC – start/end of marked content sequence
BT / ET – start/end of text block
/F1 12 Tf – font reference/size
100 700 Td – text position
(Hello World) Tj – text
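The wrapping described above can be sketched as a small helper. This is only an illustration, not FPDF code: the function name taggedContent() and its parameters are assumptions, and inside FPDF the result would be written through the library's internal output routine rather than returned.

```php
<?php
/**
 * Wrap raw content-stream operators in a marked-content sequence.
 * Hypothetical helper; name and signature are not part of FPDF.
 */
function taggedContent(string $tag, int $mcid, string $operators): string
{
    // "/P <</MCID 0>> BDC" opens the sequence, "EMC" closes it.
    return sprintf("/%s <</MCID %d>> BDC\n%s\nEMC", $tag, $mcid, trim($operators));
}

// The text element from the example above, tagged as a paragraph.
$text = "BT\n/F1 12 Tf\n100 700 Td\n(Hello World) Tj\nET";
echo taggedContent('P', 0, $text), "\n";
```

Keeping the wrapper separate from the drawing code makes it easy to reuse for Cell(), MultiCell() and Image().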

So extend the following native FPDF functions:

AddPage()
– extend to reset MCID counter to 0 for each page

Line(), Rect()
– extend with /Artifact

  /Artifact <<>> BDC %.2F %.2F m %.2F %.2F l S EMC

Cell(), MultiCell()
– add Tag parameter, insert MCID sequences
For borders and other non-content elements, add /Artifact

Special text elements can be marked with:

  /Artifact << /Type /Pagination /Subtype /Header >>
  /Artifact << /Type /Pagination >>
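A minimal sketch of the /Artifact wrapping, assuming a helper named artifact() (hypothetical, not part of FPDF) that receives the operators FPDF would otherwise write directly:

```php
<?php
/**
 * Wrap non-content operators (lines, borders, page numbers) as an artifact.
 * Hypothetical helper; $type is optional, e.g. 'Pagination'.
 */
function artifact(string $operators, ?string $type = null): string
{
    $props = $type === null ? '<<>>' : sprintf('<< /Type /%s >>', $type);
    return sprintf('/Artifact %s BDC %s EMC', $props, trim($operators));
}

// A line from (10,20) to (200,20), formatted like the Line() pattern above.
echo artifact(sprintf('%.2F %.2F m %.2F %.2F l S', 10, 20, 200, 20)), "\n";
// A page number marked as pagination artifact.
echo artifact('BT /F1 9 Tf 280 20 Td (Page 1) Tj ET', 'Pagination'), "\n";
```

Artifacts carry no MCID, because they are not referenced from the structure tree.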

Image()
– add a Tag parameter (/Figure), an Alternate Text parameter, and an /Artifact parameter
– add sequential MCID / BDC

Use an array to store page number, MCID, and additional information (alt text, artifact) for each element
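One possible shape for this bookkeeping is sketched below. The class McidRegistry and its method names are assumptions made for illustration; in practice the same data could live in plain properties of the extended FPDF class.

```php
<?php
/**
 * Minimal per-page MCID bookkeeping (hypothetical class, not part of FPDF).
 */
class McidRegistry
{
    private int $page = 0;
    private int $next = 0;
    /** @var array<int, array{page:int, mcid:int, tag:string, alt:?string}> */
    private array $items = [];

    // Call from the AddPage() override: MCIDs restart at 0 on every page.
    public function newPage(): void
    {
        $this->page++;
        $this->next = 0;
    }

    // Call from Cell() / MultiCell() / Image(); returns the MCID to write.
    public function add(string $tag, ?string $alt = null): int
    {
        $mcid = $this->next++;
        $this->items[] = ['page' => $this->page, 'mcid' => $mcid, 'tag' => $tag, 'alt' => $alt];
        return $mcid;
    }

    // All recorded elements, in output order.
    public function all(): array
    {
        return $this->items;
    }
}

$registry = new McidRegistry();
$registry->newPage();
$registry->add('H1');
$registry->add('Figure', 'Company logo');
print_r($registry->all());
```

Artifacts are deliberately not recorded here, since they receive no MCID.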

Output()
– extend to call a new function a11y()

The new function a11y()

Contains all commands to add the new structure tree and other changes.
Typically, we don’t use native FPDF methods here – instead, we directly edit/extend the already generated PDF ($this->buffer).

Steps:

Collect all structure information

Use regex to parse the buffer for table and list structures (recognized by tags – it helps to use custom tags for start/end) and store the information in an array: table structure, MCIDs, object numbers, parents, content indexes.
Use regex to loop through all pages and kids, storing object number, parent, content, and a structparent index in an array.
Use regex to loop through all MCIDs, storing object number, tag, MCID, and parent in an array.
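The MCID pass could look roughly like this. It is a sketch under assumptions: the buffer is uncompressed, all generation numbers are 0, and the function name collectMcids() and the returned array shape are made up for illustration.

```php
<?php
/**
 * Collect tag and MCID of every marked-content sequence, together with the
 * number of the object (content stream) that contains it.
 *
 * @return array<int, array{obj:int, tag:string, mcid:int}>
 */
function collectMcids(string $buffer): array
{
    $found = [];
    // Split the buffer into "N 0 obj ... endobj" chunks.
    preg_match_all('~(\d+) 0 obj\b(.*?)\bendobj~s', $buffer, $objects, PREG_SET_ORDER);
    foreach ($objects as $object) {
        // Inside each chunk, look for "/Tag <</MCID n>> BDC".
        preg_match_all('~/(\w+)\s*<<\s*/MCID\s+(\d+)\s*>>\s*BDC~', $object[2], $hits, PREG_SET_ORDER);
        foreach ($hits as $hit) {
            $found[] = ['obj' => (int) $object[1], 'tag' => $hit[1], 'mcid' => (int) $hit[2]];
        }
    }
    return $found;
}

$buffer = "4 0 obj\n<< /Length 60 >>\nstream\n"
    . "/H1 <</MCID 0>> BDC\nBT (Title) Tj ET\nEMC\n"
    . "/P <</MCID 1>> BDC\nBT (Text) Tj ET\nEMC\n"
    . "endstream\nendobj\n";
print_r(collectMcids($buffer));
```

The page and parent information from the other passes can then be joined on the object number.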

Create structure elements

Create new objects for the logical structure tree, with sequential numbers (++$this->n)
Reserve fixed numbers in advance for: Outlines, Title, Metadata, StructTreeRoot, ParentTree (/Nums), Document.

Create a StructTreeRoot object (with a RoleMap that maps custom tags to standard tags):
(# is an object number, #Ref is the number of a referenced object, and #Next is an integer greater than any key in the ParentTree; its value depends on your structure)

# 0 obj
<<
 /Type /StructTreeRoot
 /K #Ref_Document 0 R
 /ParentTree #Ref_ParentTree 0 R
 /ParentTreeNextKey #Next
 /IDTree #Ref_IDTree 0 R   % only if IDs are used
 /RoleMap <<
   /PH /P
   /PT /P
   /HP /P
   /PL /P
   /PLE /P
 >>
>>
endobj

Create a Document object
Add a language, a title, and references to all root children (H1, H2, Table, but not the nested TR, TD …; L, but not the nested list items):

# 0 obj
<<
 /Type /StructElem
 /S /Document
 /Lang (en)
 /T (yourTitle)
 /Pg #Ref_Page 0 R
 /P #Ref_Parent 0 R
 /K [# 0 R ... list of root kids]
>>
endobj
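All structure elements below the Document share the same basic shape, so a single serializer can cover them. The following is a sketch, assuming a hypothetical helper structElem() whose kids are either MCIDs (for leaf elements) or reference strings to child elements:

```php
<?php
/**
 * Serialize one structure element (hypothetical helper, not part of FPDF).
 * $kids holds either MCIDs (ints) or strings such as "42 0 R" that
 * reference child structure elements.
 */
function structElem(int $n, string $tag, int $parent, int $page, array $kids): string
{
    $k = implode(' ', array_map(
        static fn($kid) => is_int($kid) ? (string) $kid : $kid,
        $kids
    ));
    return sprintf(
        "%d 0 obj\n<< /Type /StructElem /S /%s /P %d 0 R /Pg %d 0 R /K [%s] >>\nendobj\n",
        $n, $tag, $parent, $page, $k
    );
}

// A paragraph (object 30) on page object 3, child of Document (object 20),
// whose content is the marked-content sequence with MCID 0.
echo structElem(30, 'P', 20, 3, [0]);
// A table row (object 41) that references two cell elements.
echo structElem(41, 'TR', 40, 3, ['42 0 R', '43 0 R']);
```

Attributes such as /Alt, /ID or /A would be added as further optional parameters.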

Now loop through your arrays for each MCID and add objects.

If it's

a plain P or H1–H6 Tag:
add a matching object
a Table start (PT tag, if you use it as a table start marker):
add referencing hierarchical objects for
StructElem /Table ->
..StructElem /TR ->
....StructElem /TH or /TD ->
......StructElem /P -> Content-Stream /P << /MCID 11 >> BDC … EMC

For better a11y in a complex table layout, add Scope and IDs to the TH and connect the TD to these headers (see the discussion below):

- /A [ << /O /Table /Scope /Column >> ] to /TH
- /ID (some_ID) to /TH
- /A << /O /Table /Headers [ (#ref_TH_ID) ] >> to /TD

When using IDs, you must add an IDTree object (referenced from the StructTreeRoot) that maps each ID to its structure element:

# 0 obj
<<
  /Names [ 
  (some_ID) #Ref_TH 0 R 
  (some_ID2) #Ref_TH2 0 R 
  (some_ID3) #Ref_TH3 0 R 
 ]
>>
endobj

... and a table summary.
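The IDTree and the matching /Headers attribute can be generated from one ID map. This is a sketch with hypothetical helper names (idTree(), headersAttribute()); it assumes the IDs contain no characters that need escaping in a PDF string and emits the keys in sorted order, as a name tree expects.

```php
<?php
/**
 * Build the IDTree object for header cells (hypothetical helper).
 * $ids maps element IDs to the object numbers of their TH elements.
 */
function idTree(int $n, array $ids): string
{
    ksort($ids, SORT_STRING); // name-tree keys are written in sorted order
    $names = '';
    foreach ($ids as $id => $objNum) {
        $names .= sprintf("  (%s) %d 0 R\n", $id, $objNum);
    }
    return sprintf("%d 0 obj\n<<\n /Names [\n%s ]\n>>\nendobj\n", $n, $names);
}

/** Attribute entry for a TD that points to its headers by ID string. */
function headersAttribute(array $headerIds): string
{
    $list = implode(' ', array_map(static fn($id) => "($id)", $headerIds));
    return "/A << /O /Table /Headers [ $list ] >>";
}

echo idTree(50, ['th_price' => 46, 'th_name' => 45]);
echo headersAttribute(['th_name', 'th_price']), "\n";
```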

a List start (PL tag):
add referencing hierarchical objects for
StructElem /L ->
..StructElem /LI ->
....StructElem /Lbl -> Content-Stream /Lbl << /MCID 11 >> BDC … EMC
....StructElem /LBody ->
......StructElem /P -> Content-Stream /P << /MCID 12 >> BDC … EMC
a figure:
Add alt text and bounding box of the image [xMin, yMin, xMax, yMax]:

  /S /Figure
  /Alt (alt text)
  /A <<
    /BBox $bbox
    /O /Layout
    /Placement /Block
  >>
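FPDF positions images from the top-left corner in user units, while /BBox is given in points from the bottom-left corner, so the rectangle has to be converted. A sketch, assuming a hypothetical helper figureBBox() that receives the page height and the scale factor (points per user unit) as arguments:

```php
<?php
/**
 * Convert an image rectangle (top-left origin, user units) into a
 * PDF /BBox string (bottom-left origin, points).
 * Hypothetical helper; $k is the scale factor (points per user unit).
 */
function figureBBox(float $x, float $y, float $w, float $h, float $pageHeight, float $k): string
{
    $xMin = $x * $k;
    $yMin = ($pageHeight - ($y + $h)) * $k;
    $xMax = ($x + $w) * $k;
    $yMax = ($pageHeight - $y) * $k;
    return sprintf('[%.2F %.2F %.2F %.2F]', $xMin, $yMin, $xMax, $yMax);
}

// A 50 x 30 mm image placed at (10, 20) on an A4 page (297 mm high).
$k = 72 / 25.4;
echo figureBBox(10, 20, 50, 30, 297, $k), "\n";
```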

After looping through all structure elements, write the /K array in the /Document object (all top-level un-nested elements: H1, H2, P, Table, L, Figure).

Add Special objects and extensions:

a ParentTree object
per-page indexed list of all semantic objects referencing a physical MCID:

 # 0 obj
 << /Nums [0 [28 0 R 29 0 R ... ]] >>
 endobj
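A sketch of how the ParentTree could be generated with one key per page, which also yields the matching /ParentTreeNextKey. The helper name parentTree() and its input shape are assumptions: entry i of $pages lists, in MCID order, the object numbers of the structure elements that own the marked content on the page with /StructParents i.

```php
<?php
/**
 * Build the ParentTree object from per-page lists of structure elements.
 * Hypothetical helper, not part of FPDF.
 *
 * @return array{object:string, nextKey:int}
 */
function parentTree(int $n, array $pages): array
{
    $nums = '';
    foreach (array_values($pages) as $key => $elems) {
        // Position in the inner array corresponds to the MCID on that page.
        $refs = implode(' ', array_map(static fn(int $o) => "$o 0 R", $elems));
        $nums .= sprintf("  %d [%s]\n", $key, $refs);
    }
    return [
        'object'  => sprintf("%d 0 obj\n<< /Nums [\n%s] >>\nendobj\n", $n, $nums),
        // Must be greater than any key used above.
        'nextKey' => count($pages),
    ];
}

$tree = parentTree(27, [[28, 29], [30]]);
echo $tree['object'];
echo '/ParentTreeNextKey ', $tree['nextKey'], "\n";
```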

a Metadata object – must include:
- PDF/UA entry (https://taggedpdf.com/508-pdf-help-center/pdfua-identifier-missing/)
- Title entry (matching the native FPDF title)
- Producer entry (matching native FPDF producer)
- Language entry

(Metadata structure is flexible; in this example, it's based on an OpenOffice-generated PDF/UA)
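A minimal XMP packet covering the four entries could be built as follows. This is a sketch: the function names xmpPacket() and metadataObject() are hypothetical, and the selection of properties is an assumption that may need extending for a particular validator.

```php
<?php
/**
 * Build a minimal XMP packet with PDF/UA identifier, title, language
 * and producer. Hypothetical helper, not part of FPDF.
 */
function xmpPacket(string $title, string $producer, string $lang): string
{
    $esc = static fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $t = $esc($title);
    $p = $esc($producer);
    $l = $esc($lang);
    return <<<XML
<?xpacket begin="\u{FEFF}" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
    xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
   <pdfuaid:part>1</pdfuaid:part>
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">$t</rdf:li></rdf:Alt></dc:title>
   <dc:language><rdf:Bag><rdf:li>$l</rdf:li></rdf:Bag></dc:language>
   <pdf:Producer>$p</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
XML;
}

/** Wrap the packet in a metadata stream object; /Length is the byte count. */
function metadataObject(int $n, string $xmp): string
{
    return sprintf(
        "%d 0 obj\n<< /Type /Metadata /Subtype /XML /Length %d >>\nstream\n%s\nendstream\nendobj\n",
        $n, strlen($xmp), $xmp
    );
}

echo metadataObject(60, xmpPacket('yourTitle', 'FPDF', 'en'));
```

The title and producer passed in should be the same strings given to SetTitle() and written by FPDF as /Producer, so that the Info dictionary and the XMP agree.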

an Outline object / target
a simple bookmark pointing to H1:

 # 0 obj
 << /Type /Outlines /First #_OutlineTarget 0 R /Last #_OutlineTarget 0 R /Count 1 >>
 endobj

 # 0 obj
 << /Title (h1) /Parent #_Ref_parent 0 R /Dest [ #_Ref_page 0 R /Fit] >>
 endobj

Modifying FPDF output:

Replace the PDF version marker (the exact version number is not too important):

%PDF-1.3
->
%PDF-1.7
%âãÏÓ
(a comment line that marks binary content: 4 bytes with values of 128 or above)

Extend /Catalog object with:

  /Outlines #_Ref_outline 0 R
  /Lang (de-DE)
  /StructTreeRoot #_Ref_StructTreeRoot 0 R
  /MarkInfo << /Marked true >>
  /Metadata #_Ref_Metadata 0 R
  /ViewerPreferences << /DisplayDocTitle true >>

Extend each /Page object with a sequential /StructParents index that points to the corresponding ParentTree key:

/StructParents 0
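Adding the index to every page can be done with one regex pass over the buffer. A sketch, assuming page dictionaries contain the literal text "/Type /Page" and do not yet have a /StructParents entry; the helper name addStructParents() is hypothetical.

```php
<?php
/**
 * Add a sequential /StructParents entry to every page object.
 * Hypothetical helper; the \b keeps "/Type /Pages" untouched.
 */
function addStructParents(string $buffer): string
{
    $index = 0;
    return preg_replace_callback(
        '~/Type /Page\b~',
        static function (array $m) use (&$index): string {
            return $m[0] . "\n/StructParents " . $index++;
        },
        $buffer
    );
}

$buffer = "3 0 obj\n<</Type /Page\n/Parent 1 0 R>>\nendobj\n"
    . "5 0 obj\n<</Type /Page\n/Parent 1 0 R>>\nendobj\n"
    . "1 0 obj\n<</Type /Pages\n/Kids [3 0 R 5 0 R]>>\nendobj\n";
echo addStructParents($buffer);
```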

Merging output

Insert all generated objects before the xref keyword in the buffer.
If an object with a /Length entry (usually a stream / endstream object) was changed, update its length.
Recalculate xref: the xref table contains the count and byte offset of all objects, so it must be rebuilt because we’ve added new ones (use e.g. preg_match_all with PREG_OFFSET_CAPTURE to find all object numbers).
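The recalculation can be sketched like this. Assumptions: a single classic xref section, generation 0 everywhere, contiguous object numbers, object headers ("N 0 obj") at the start of a line, and a body that ends with a newline. The helper name rebuildXref() is hypothetical; it also covers the /Size and startxref updates in the trailer.

```php
<?php
/**
 * Rebuild xref table, trailer /Size and startxref for an edited body
 * (everything before the old "xref" keyword). Hypothetical helper.
 */
function rebuildXref(string $body, string $trailerDict): string
{
    // Byte offset of every "N 0 obj" header.
    preg_match_all('/^(\d+) 0 obj\b/m', $body, $m, PREG_OFFSET_CAPTURE);
    $offsets = [];
    foreach ($m[1] as [$num, $offset]) {
        $offsets[(int) $num] = $offset;
    }
    if ($offsets === []) {
        throw new RuntimeException('No objects found');
    }
    $size = max(array_keys($offsets)) + 1;

    // Entry 0 is the head of the free list; every entry is 20 bytes long.
    $xref = "xref\n0 $size\n0000000000 65535 f \n";
    for ($i = 1; $i < $size; $i++) {
        if (!isset($offsets[$i])) {
            throw new RuntimeException("Object $i is missing");
        }
        $xref .= sprintf("%010d 00000 n \n", $offsets[$i]);
    }
    $trailer = preg_replace('~/Size \d+~', "/Size $size", $trailerDict);
    return $body . $xref . "trailer\n" . $trailer
        . "\nstartxref\n" . strlen($body) . "\n%%EOF\n";
}

$body = "%PDF-1.7\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n";
echo rebuildXref($body, '<< /Size 1 /Root 1 0 R >>');
```

Because the offsets are taken from the final body, this step has to run last, after all other replacements in the buffer.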

In trailer

Adjust /Size
Add /ID entry (value arbitrary):

/ID [ <00000000000000000000000000000000> <00000000000000000000000000000000> ]

Recalculate startxref

Done ;)

Use the included simple, fairly accessible ASCII PDF as a reference for the structure.
Use the tools listed below to find errors in your code / PDF. PAC is notorious for issuing poor error messages and crashes quickly when encountering faulty PDFs. Therefore, it makes sense to write your own scripts for basic syntax and reference checks.
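A basic reference check of this kind might look like the following sketch. It assumes an uncompressed buffer with generation 0 only and may report false positives if text strings happen to contain something like "7 0 R"; the function name danglingReferences() is hypothetical.

```php
<?php
/**
 * Report indirect references ("N 0 R") that have no matching "N 0 obj".
 *
 * @return int[] sorted object numbers that are referenced but not defined
 */
function danglingReferences(string $buffer): array
{
    preg_match_all('/\b(\d+) 0 obj\b/', $buffer, $defs);
    preg_match_all('/\b(\d+) 0 R\b/', $buffer, $refs);
    $missing = array_map('intval', array_diff(array_unique($refs[1]), $defs[1]));
    sort($missing);
    return $missing;
}

$buffer = "1 0 obj\n<< /K 2 0 R /X 7 0 R >>\nendobj\n2 0 obj\n<<>>\nendobj\n";
print_r(danglingReferences($buffer));
```

Running such a check before opening the file in PAC catches the most common cause of crashes, a structure element pointing to an object that was never written.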

a11y-demo2.pdf
This updated demo addresses some issues with /Headers, thanks to clee-sudbury's comments.

Further Information

Validation

PAC – PDF Accessibility Checker:
https://pac.pdf-accessibility.org/de
https://check.axes4.com/de

AxesPDF – PDF editor variant of PAC (demo)
https://www.axes4.com/de/software-services/axespdf

VeraPDF – CLI/GUI validator, choose your standard:
https://verapdf.org/software

pdfcpu – CLI PDF processor (Go):
https://pdfcpu.io

PDF-XChange Editor – free viewer/editor with a11y checks
https://www.pdf-xchange.de/pdf-xchange-viewer

AVE-Pdf, limited validation
https://avepdf.com/de/pdfa-validation

Tools - stream unpacking etc

mutool - CLI tool (part of MuPDF):
https://mupdf.readthedocs.io

QPDF – CLI editing tool:
https://qpdf.readthedocs.io

Ghostscript – CLI editing tool:
https://ghostscript.com

Tutorials

PDF Reference Version 1.7 - Adobe Open Source
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf

Matterhorn Protocol – PDF/UA standard
https://pdfa.org/resource

What is PDF? Basics to Tagged PDF
https://news.speedata.de/2024/03/19/insidepdf-01

A Minimum Complete Tutorial of Portable Document Format (PDF) - Short overview
https://metebalci.com/blog/a-minimum-complete-tutorial-of-pdf-with-pdfsh

Introduction to tagged PDF files
https://de.overleaf.com/learn/latex/An_introduction_to_tagged_PDF_files%3A_internals_and_the_challenges_of_accessibility

Tagged PDF Best Practice Guide
https://pdfa.org/resource/tagged-pdf-best-practice-guide-syntax/

PDF Knowledge Base – PDF/UA
https://www.pdflib.com/pdf-knowledge-base/pdfua/

angeljqv · 2025-12-10T16:06:30Z

You could do it as a trait

https://github.com/fawno/FPDF/tree/master/scripts

1 reply

jeanjann Dec 30, 2025
Author

Hi angeljqv,

thanks for your suggestion.

This is a messy mix of extended/patched native functions and new functions - I haven't really delved into traits, but I'm not sure if it's suitable in this case.

clee-sudbury · 2025-12-24T18:22:06Z

Thank you for sharing this guidance; I've published our Classic ASP port of FPDF with accessibility tags as a result of this information.

I have a few questions about the a11y-demo.pdf example:

The /ParentTreeNextKey is defined in the PDF 1.7 specification as "An integer greater than any key in the parent tree, to be used as a key for the next entry added to the tree.", but the /ParentTree in the example has 0 and 1 keys while /ParentTreeNextKey is set to 1. Should this be increased to 2?
The /Headers entry in the example doesn't seem to be defined in the PDF 1.7 specification for structure element dictionaries, but it is defined in attribute objects for tables as "An array of byte strings, where each string is the element identifier (see the ID entry in Table 10.10) for a TH structure element that is a header associated with this cell." while the example file instead provides an array of object references to the structure element of the header cell. What does this entry do?

3 replies

jeanjann Dec 30, 2025
Author

Hi Christopher,

thanks for your comments!

The ParentTree is a number tree in the StructTreeRoot that manages the mapping between content elements on the pages (marked content) and the structure elements in the logical structure tree. ParentTreeNextKey is an integer value that specifies which key should be used next for a new entry in the ParentTree.

In this case, all references to the StructElem objects are arranged as an array within a single object:
<< /Nums [ 0 [34 0 R 44 0 R 45 0 R ...] ] >>

Alternatively, one can use a separate key for each StructElem:
<< /Nums [
0 34 0 R
1 44 0 R
2 45 0 R ] >>

I am not entirely sure whether the array-based approach is really the best or most correct one, but it is valid.

ParentTreeNextKey indicates which key should be used next for a new entry in the ParentTree. In the first case this would be
/ParentTreeNextKey 1

and in the second case it would be
/ParentTreeNextKey 3.

With an implementation that uses a single key 0, /ParentTreeNextKey 1 is therefore correct, as it is the next unused index (although, looking at the code, I am currently not sure whether the structure in the demo, with /Page/StructParents 0 and /Page/StructParents 1 but only << /Nums [ 0 ... ] >>, is correct, even though it does appear to validate).

Of course the syntax should be correct - however: ParentTreeNextKey is only relevant when the PDF is modified or updated, because it points to the next free entry.

With regard to /Headers: you are right - the attribute should actually be located on the TD, not on the StructElem that represents the content of the TD. I read this somewhere in this form, but I cannot say where; perhaps I misinterpreted the syntax. Every implementation I know of uses an object reference as ID.

Thanks for pointing out these issues, I'll fix it!

clee-sudbury Jan 2, 2026

For ParentTree, it depends on the object --- pages use an array of indirect references, and content objects use an indirect reference alone --- per the PDF 1.7 specification:

Each integer key in the number tree corresponds to a single page of the document or to an individual object (such as an annotation or an XObject) that is a content item in its own right. The integer key is given as the value of the StructParent or StructParents entry in that object (see “Finding Structure Elements from Content Items” on page 868). The form of the associated value depends on the nature of the object:

For an object that is a content item in its own right, the value is an indirect reference to the object’s parent element (the structure element that contains it as a content item).

For a page object or content stream containing marked-content sequences that are content items, the value is an array of references to the parent elements of those marked-content sequences.

Do you have an example of an implementation that uses object references for Headers? My usual learning strategy is to read the specification first, so I'm not informed of any de-facto implementation deviations like this. The way that PDF implements header cell references seems just like how it's done in HTML with a list of textual element IDs in a TH or TD element's headers attribute.

jeanjann Jan 9, 2026
Author

Hi Christopher,

thanks for pointing that out.

Yes, you're right (once again). The specification requires a reference to the ID entry ("each string shall be the element identifier, see the ID entry in ..."), as is the case with HTML. I've corrected this (but it will need an IDTree too in this case ...).

In my notes, I have the syntax with object references (n 0 R), but I can't remember where I got this information from, probably from a tutorial that wasn't entirely correct.
