Accessibility is no longer an optional feature in modern software development. Under regulations such as the ADA (Americans with Disabilities Act) and the EU Web Accessibility Directive, software that presents documents is increasingly expected to work with assistive technologies. For a PDF viewer, this means robust Text-to-Speech (TTS) and screen reader integration.
In this article, we’ll look at how to extract semantic text streams from a PDF using PDFium in Delphi, and then feed that text into the Windows Speech API (SAPI) to build a fully accessible PDF viewer.
The Challenge of PDF Text Extraction
A PDF is fundamentally a canvas of painting instructions. It does not inherently know what a "paragraph" or a "column" is; it merely places glyphs at specific X/Y coordinates. To read a document aloud in a logical order, your parser must reconstruct the reading order.
PDFium handles this via the FPDFText_* API family, which analyzes the spatial relationships of glyphs to output coherent text streams.
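To make the problem concrete, here is a deliberately naive sketch of reading-order reconstruction. It is not how PDFium works internally; the TGlyph record and both functions are hypothetical names used for illustration, and the sketch assumes a single column in which all glyphs on a line share exactly the same baseline Y.

```delphi
uses
  System.SysUtils, System.Math,
  System.Generics.Collections, System.Generics.Defaults;

type
  // Hypothetical record for illustration; not a PDFium type
  TGlyph = record
    Ch: Char;
    X, Y: Double; // PDF user space: Y grows upward from the bottom of the page
  end;

function MakeGlyph(Ch: Char; X, Y: Double): TGlyph;
begin
  Result.Ch := Ch;
  Result.X := X;
  Result.Y := Y;
end;

// Orders glyphs top to bottom, then left to right.
// Assumption: glyphs on the same line have identical Y values.
function NaiveReadingOrder(const Glyphs: TArray<TGlyph>): string;
var
  Sorted: TArray<TGlyph>;
  G: TGlyph;
begin
  Sorted := Copy(Glyphs);
  TArray.Sort<TGlyph>(Sorted, TComparer<TGlyph>.Construct(
    function(const A, B: TGlyph): Integer
    begin
      Result := CompareValue(B.Y, A.Y);   // larger Y (higher on the page) first
      if Result = 0 then
        Result := CompareValue(A.X, B.X); // then left to right
    end));
  Result := '';
  for G in Sorted do
    Result := Result + G.Ch;
end;
```

Real documents break both assumptions (multiple columns, superscripts, slightly uneven baselines), which is why a dedicated text analysis layer is worth having.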
Step 1: Extracting Text with PDFium
Before we can speak the text, we must extract it. The following Delphi code demonstrates how to initialize a text page and extract its contents into a standard string.
uses
  System.SysUtils, pdfium_lib;

function ExtractPageText(Doc: FPDF_DOCUMENT; PageIndex: Integer): string;
var
  Page: FPDF_PAGE;
  TextPage: FPDF_TEXTPAGE;
  CharCount: Integer;
  Buffer: array of WideChar;
begin
  Result := '';
  Page := FPDF_LoadPage(Doc, PageIndex);
  if Page = nil then Exit;
  try
    // Initialize the text extraction engine for this page
    TextPage := FPDFText_LoadPage(Page);
    if TextPage <> nil then
    begin
      try
        CharCount := FPDFText_CountChars(TextPage);
        if CharCount > 0 then
        begin
          // One extra element for the null terminator that PDFium appends
          SetLength(Buffer, CharCount + 1);
          // Extract the text as UTF-16 into the buffer
          FPDFText_GetText(TextPage, 0, CharCount, @Buffer[0]);
          Result := WideCharToString(@Buffer[0]);
        end;
      finally
        FPDFText_ClosePage(TextPage);
      end;
    end;
  finally
    FPDF_ClosePage(Page);
  end;
end;
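As a usage sketch, the function above can be called once per page to collect the text of a whole document. ExtractDocumentText is a hypothetical helper name, and the sketch assumes the pdfium_lib binding exposes FPDF_GetPageCount with its usual PDFium signature.

```delphi
uses
  System.SysUtils, System.Math, pdfium_lib;

// Hypothetical helper: returns one string per page, indexed by page number.
function ExtractDocumentText(Doc: FPDF_DOCUMENT): TArray<string>;
var
  i: Integer;
begin
  // Guard against a non-positive count so SetLength never sees a negative value
  SetLength(Result, Max(0, FPDF_GetPageCount(Doc)));
  for i := 0 to High(Result) do
    Result[i] := ExtractPageText(Doc, i);
end;
```

Keeping the text per page, rather than concatenating it, means a character offset into a page's string still refers to that page's own text, which matters later when spoken words are mapped back to positions on the page.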
Step 2: Integrating the Windows Speech API (SAPI)
Once we have the extracted text, we can pass it to the Windows Speech API. SAPI exposes the SpVoice automation object, which supports asynchronous speech synthesis, voice selection, and speech rate control.
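uses
  System.Variants, System.Win.ComObj;

const
  SVSFlagsAsync = 1;

var
  // Kept alive between calls: releasing the voice object stops any
  // asynchronous speech that is still in progress
  SpVoice: OleVariant;

procedure SpeakText(const TextToSpeak: string);
begin
  // COM must already be initialized on the calling thread. A VCL application
  // that uses System.Win.ComObj normally does this for the main thread at
  // startup; call CoInitialize/CoUninitialize yourself only on worker threads.
  if VarIsEmpty(SpVoice) then
    SpVoice := CreateOleObject('SAPI.SpVoice');
  // Speak asynchronously so the UI does not freeze
  SpVoice.Speak(TextToSpeak, SVSFlagsAsync);
end;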
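The rate control mentioned above can be sketched with late binding as well. ConfigureVoice and StopSpeaking are hypothetical helper names; Rate, Volume, and the SVSFPurgeBeforeSpeak flag are part of the SAPI automation interface, and the voice object is passed in as a parameter so the helpers make no assumption about where you store it.

```delphi
uses
  System.Math, System.Win.ComObj;

const
  SVSFlagsAsync = 1;
  SVSFPurgeBeforeSpeak = 2;

// Hypothetical helper: applies speed and loudness to an existing SpVoice object.
procedure ConfigureVoice(Voice: OleVariant; Rate, Volume: Integer);
begin
  Voice.Rate := EnsureRange(Rate, -10, 10);     // -10 is slowest, 10 is fastest
  Voice.Volume := EnsureRange(Volume, 0, 100);  // percentage of full volume
end;

// Hypothetical helper: cancels whatever the voice is currently saying.
procedure StopSpeaking(Voice: OleVariant);
begin
  // Speaking an empty string with the purge flag discards queued and
  // in-progress speech without blocking the caller
  Voice.Speak('', SVSFlagsAsync or SVSFPurgeBeforeSpeak);
end;
```

A stop command is worth wiring to a keyboard shortcut early on, since users who rely on speech output need a quick way to interrupt a long page.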
Step 3: Synchronizing Speech with Highlighting
A truly accessible viewer doesn't just read the text blindly; it highlights the words on the screen as they are spoken. SAPI provides events (via connection points) that fire when a word boundary is reached.
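Wiring up the Word event requires an imported SAPI type library, and the generated wrapper names vary between setups. As a simpler alternative (a swapped-in technique, not the event mechanism itself), the sketch below polls the voice's Status object, which reports the same word position and length. TryGetSpokenWord is a hypothetical helper name.

```delphi
uses
  System.Win.ComObj;

const
  SRSEIsSpeaking = 2; // SpeechRunState value reported while the voice is speaking

// Hypothetical helper: reads the word currently being spoken from the
// voice's Status object instead of handling the Word event.
function TryGetSpokenWord(Voice: OleVariant;
  out CharIndex, CharCount: Integer): Boolean;
var
  Status: OleVariant;
begin
  Status := Voice.Status;
  CharIndex := Status.InputWordPosition; // offset into the string given to Speak
  CharCount := Status.InputWordLength;
  Result := (Status.RunningState = SRSEIsSpeaking) and (CharCount > 0);
end;
```

Called from a timer on the UI thread, for example every 50 ms, this yields a position and length that can be passed to the HighlightWord routine in this step. Polling can miss very short words, so the event-based approach remains the more precise option.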
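The word boundary event reports a character position and length relative to the string that was passed to Speak. If that string is the page text extracted from index 0, the position corresponds in the common case to a character index in the PDFium text page. Calling FPDFText_GetCharBox() for each character of the word then gives you the bounding rectangles needed to draw a highlight overlay on your viewer canvas.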
procedure HighlightWord(TextPage: FPDF_TEXTPAGE; CharIndex: Integer; CharCount: Integer);
var
  i: Integer;
  L, T, R, B: Double;
begin
  // Iterate through the characters of the spoken word
  for i := CharIndex to CharIndex + CharCount - 1 do
  begin
    // Get the bounding box in PDF page coordinates; PDFium's parameter
    // order is left, right, bottom, top
    FPDFText_GetCharBox(TextPage, i, @L, @R, @B, @T);
    // Transform PDF coordinates to screen coordinates and draw highlighting rect...
  end;
end;
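The coordinate transform left open in the comment above can be sketched as plain arithmetic. PdfBoxToScreen is a hypothetical helper, and it assumes an unrotated page whose origin is at the bottom-left, drawn at the top-left of the canvas with a uniform scale (for example, screen DPI divided by 72).

```delphi
uses
  System.Types, System.Math;

// Hypothetical helper: converts a box in PDF page coordinates (Y grows upward)
// to a device rectangle (Y grows downward).
// Assumptions: no page rotation, page drawn at canvas position (0, 0).
function PdfBoxToScreen(Left, Bottom, Right, Top,
  PageHeight, Scale: Double): TRect;
begin
  Result.Left := Floor(Left * Scale);
  Result.Right := Ceil(Right * Scale);
  // Flip the vertical axis: the PDF top edge becomes the smaller screen Y
  Result.Top := Floor((PageHeight - Top) * Scale);
  Result.Bottom := Ceil((PageHeight - Bottom) * Scale);
end;
```

Rounding outward with Floor and Ceil keeps the highlight from clipping glyph edges. For rotated pages or scrolled views, PDFium's own FPDF_PageToDevice function handles the full transform and is the safer choice.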
Building an Inclusive Application
By combining PDFium's spatial text extraction with SAPI's speech engine, Delphi developers can create powerful tools for visually impaired users or those who prefer auditory learning. Implementing these features well helps your application meet strict enterprise and government accessibility standards.
Note: Integrated text extraction and coordinate mapping are fully supported by the PDFium Component.