Technical Article

Building Accessible PDF Viewers with Text-to-Speech in Delphi

Accessibility is no longer an optional feature in modern software development. Regulations such as the ADA (Americans with Disabilities Act) and the EU Web Accessibility Directive mean that software handling documents is increasingly expected to work for people who rely on assistive technologies. For PDF viewers, two practical building blocks are Text-to-Speech (TTS) and screen reader integration.

In this article, we’ll look at how to extract text in reading order from a PDF using PDFium in Delphi, and then feed that text into the Windows Speech API (SAPI) so that the viewer can read documents aloud.

The Challenge of PDF Text Extraction

A PDF is fundamentally a canvas of painting instructions. It does not inherently know what a "paragraph" or a "column" is; it merely places glyphs at specific X/Y coordinates. To read a document aloud in a logical order, your parser must reconstruct the reading order.

PDFium handles this via the FPDFText_* API family, which analyses the spatial relationships of glyphs to output coherent text streams.
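
To make the reading-order problem concrete, here is a minimal sketch of the simplest possible heuristic: bucket glyphs into lines by their Y coordinate, then sort each line left to right. The TGlyph record and the NaiveReadingOrder function are hypothetical names invented for this illustration, and this is not the algorithm PDFium uses. It breaks on multi-column layouts and on lines whose glyphs straddle a bucket boundary, which is exactly why a dedicated text analysis layer is valuable.

uses
  System.Generics.Collections, System.Generics.Defaults;

type
  // Hypothetical record for illustration: one glyph and its page position
  TGlyph = record
    Ch: Char;
    X, Y: Double; // PDF user space: origin at bottom-left, Y grows upwards
  end;

// Orders glyphs top to bottom, then left to right.
// LineHeight must be greater than zero; it sets the size of each line bucket.
function NaiveReadingOrder(const Glyphs: TArray<TGlyph>; LineHeight: Double): string;
var
  Sorted: TArray<TGlyph>;
  G: TGlyph;
begin
  Sorted := Copy(Glyphs);
  TArray.Sort<TGlyph>(Sorted, TComparer<TGlyph>.Construct(
    function(const A, B: TGlyph): Integer
    var
      LineA, LineB: Int64;
    begin
      // Quantise Y so that glyphs on the same line share a bucket number
      LineA := Round(A.Y / LineHeight);
      LineB := Round(B.Y / LineHeight);
      if LineA > LineB then Result := -1      // higher on the page comes first
      else if LineA < LineB then Result := 1
      else if A.X < B.X then Result := -1     // same line: left to right
      else if A.X > B.X then Result := 1
      else Result := 0;
    end));
  Result := '';
  for G in Sorted do
    Result := Result + G.Ch;
end;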

Step 1: Extracting Text with PDFium

Before we can speak the text, we must extract it. The following Delphi code demonstrates how to initialise a text page and extract its contents into a standard string.

uses
  System.SysUtils, pdfium_lib;

function ExtractPageText(Doc: FPDF_DOCUMENT; PageIndex: Integer): string;
var
  Page: FPDF_PAGE;
  TextPage: FPDF_TEXTPAGE;
  CharCount: Integer;
  Buffer: array of WideChar;
begin
  Result := '';
  Page := FPDF_LoadPage(Doc, PageIndex);
  if Page = nil then Exit;
  
  try
    // Initialize the text extraction engine for this page
    TextPage := FPDFText_LoadPage(Page);
    if TextPage <> nil then
    begin
      try
        CharCount := FPDFText_CountChars(TextPage);
        if CharCount > 0 then
        begin
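          // One extra element for the terminating null that PDFium appends
          SetLength(Buffer, CharCount + 1);
          // FPDFText_GetText fills the buffer with UTF-16LE text
          FPDFText_GetText(TextPage, 0, CharCount, @Buffer[0]);
          // Copy the null-terminated buffer into a Delphi string
          Result := WideCharToString(@Buffer[0]);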
        end;
      finally
        FPDFText_ClosePage(TextPage);
      end;
    end;
  finally
    FPDF_ClosePage(Page);
  end;
end;
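
As a usage example, the sketch below calls ExtractPageText for every page of an already opened document and keeps one string per page. Keeping pages separate means that character positions reported later by the speech engine stay relative to a single page. ExtractAllPages is a hypothetical helper name, and the sketch assumes that the pdfium_lib binding exposes FPDF_GetPageCount under the same name as the PDFium C API.

uses
  System.SysUtils, pdfium_lib;

// Returns the extracted text of each page, indexed by page number.
// Assumes Doc was opened by the caller and that PDFium is already initialised.
function ExtractAllPages(Doc: FPDF_DOCUMENT): TArray<string>;
var
  PageIndex, PageCount: Integer;
begin
  PageCount := FPDF_GetPageCount(Doc);
  if PageCount < 0 then
    PageCount := 0; // defensive: treat an unexpected value as an empty document
  SetLength(Result, PageCount);
  for PageIndex := 0 to PageCount - 1 do
    Result[PageIndex] := ExtractPageText(Doc, PageIndex);
end;

A viewer could then speak Result[CurrentPage] and move to the next element when the page has been read.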

Step 2: Integrating the Windows Speech API (SAPI)

Once we have the extracted text, we can pass it to the Windows Speech API. SAPI exposes the SpVoice automation object, which supports asynchronous speech synthesis, voice selection, and speech rate control.

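uses
  System.Variants, System.Win.ComObj;

const
  SVSFlagsAsync = 1; // SpeechVoiceSpeakFlags value: return immediately

var
  // The voice must outlive the call. If it were a local variable, it would be
  // released when the procedure returns, cutting off asynchronous speech.
  SpVoice: OleVariant;

procedure SpeakText(const TextToSpeak: string);
begin
  // COM must already be initialised on the calling thread. In a VCL
  // application, using System.Win.ComObj normally arranges this at startup
  // for the main thread; other threads need their own CoInitialize call.
  if VarIsEmpty(SpVoice) then
    SpVoice := CreateOleObject('SAPI.SpVoice');
  // Speak asynchronously so the UI does not freeze
  SpVoice.Speak(TextToSpeak, SVSFlagsAsync);
end;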
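
The rate control mentioned above, along with pausing and stopping, can be sketched with a few late-bound calls. The procedure names below are hypothetical; Rate, Volume, Pause, Resume and the SVSFPurgeBeforeSpeak flag are part of the SAPI automation interface. Each procedure expects an already created 'SAPI.SpVoice' object that the caller keeps alive for as long as speech should continue.

uses
  System.Win.ComObj;

const
  SVSFlagsAsync = 1;        // return immediately and speak in the background
  SVSFPurgeBeforeSpeak = 2; // discard anything queued or currently being spoken

procedure ConfigureVoice(Voice: OleVariant; Rate, Volume: Integer);
begin
  Voice.Rate := Rate;     // -10 (slowest) to 10 (fastest), 0 is the default
  Voice.Volume := Volume; // 0 to 100
end;

procedure PauseSpeaking(Voice: OleVariant);
begin
  Voice.Pause;
end;

procedure ResumeSpeaking(Voice: OleVariant);
begin
  Voice.Resume;
end;

procedure StopSpeaking(Voice: OleVariant);
begin
  // Speaking an empty string with the purge flag clears the speech queue
  Voice.Speak('', SVSFlagsAsync or SVSFPurgeBeforeSpeak);
end;

Exposing these as keyboard shortcuts matters in an accessible viewer, since users who rely on speech output often cannot use mouse-only controls.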

Step 3: Synchronizing Speech with Highlighting

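A well-designed accessible viewer doesn't just read the text aloud; it also highlights each word on screen as it is spoken. SAPI provides events (via connection points) for this: the SpVoice object raises a Word event at each word boundary, reporting the word's character position and length within the spoken string.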

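The SAPI word boundary event reports a character position and length within the spoken string. Convert that position to a PDFium character index, then call FPDFText_GetCharBox() for each character of the word to retrieve its bounding rectangle and draw a highlight overlay on your viewer canvas. The two indices are not guaranteed to coincide, because the string returned by FPDFText_GetText() can omit characters that have no Unicode mapping; PDFium provides FPDFText_GetCharIndexFromTextIndex() for the conversion.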
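
The index conversion can be sketched as follows. MapSpokenWord is a hypothetical helper, and the sketch assumes that the pdfium_lib binding exposes FPDFText_GetCharIndexFromTextIndex with the same shape as the PDFium C API (text page and text index in, character index out, -1 on error). It is only valid when the spoken string is exactly the page text returned by FPDFText_GetText starting at index 0.

uses
  pdfium_lib;

// CharacterPosition and WordLength are the values reported by the SAPI word
// boundary event; both refer to the string that was passed to Speak.
// On success, FirstChar and CharCount describe the word in PDFium indices.
function MapSpokenWord(TextPage: FPDF_TEXTPAGE;
  CharacterPosition, WordLength: Integer;
  out FirstChar, CharCount: Integer): Boolean;
var
  LastChar: Integer;
begin
  Result := False;
  FirstChar := -1;
  CharCount := 0;
  if (CharacterPosition < 0) or (WordLength <= 0) then
    Exit;
  // Assumed binding of the PDFium C function of the same name
  FirstChar := FPDFText_GetCharIndexFromTextIndex(TextPage, CharacterPosition);
  LastChar := FPDFText_GetCharIndexFromTextIndex(TextPage,
    CharacterPosition + WordLength - 1);
  if (FirstChar < 0) or (LastChar < FirstChar) then
    Exit;
  CharCount := LastChar - FirstChar + 1;
  Result := True;
end;

A word event handler could call MapSpokenWord and, when it returns True, pass FirstChar and CharCount on to the highlighting routine.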

procedure HighlightWord(TextPage: FPDF_TEXTPAGE; CharIndex: Integer; CharCount: Integer);
var
  i: Integer;
  L, T, R, B: Double;
begin
  // Iterate through the characters of the spoken word
  for i := CharIndex to CharIndex + CharCount - 1 do
  begin
    // Get the glyph's bounding box in PDF page coordinates.
    // Note the argument order: left, right, bottom, top.
    FPDFText_GetCharBox(TextPage, i, @L, @R, @B, @T);
    // Transform PDF coordinates to screen coordinates and draw the highlight rectangle...
  end;
end;
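
The coordinate transformation left open in the comment above can be sketched as a small pure function. PdfBoxToScreen is a hypothetical helper, and it assumes an unrotated page whose origin is at the bottom-left corner, rendered at Scale pixels per PDF point with its top-left corner at (OffsetX, OffsetY) on the canvas. For rotated pages, the PDFium C API offers FPDF_PageToDevice, which may be the better choice if your binding exposes it.

uses
  System.Types, System.Math;

// Converts a character box from PDF page space to a screen rectangle.
// L, T, R, B are the left, top, right and bottom values from FPDFText_GetCharBox.
function PdfBoxToScreen(L, T, R, B, PageHeight, Scale: Double;
  OffsetX, OffsetY: Integer): TRect;
begin
  // Round outwards so the highlight fully covers the glyph
  Result.Left := OffsetX + Floor(L * Scale);
  Result.Right := OffsetX + Ceil(R * Scale);
  // PDF Y grows upwards and screen Y grows downwards, so flip around the page height
  Result.Top := OffsetY + Floor((PageHeight - T) * Scale);
  Result.Bottom := OffsetY + Ceil((PageHeight - B) * Scale);
end;

To highlight a whole word rather than individual glyphs, take the smallest Left and Top and the largest Right and Bottom across the rectangles of its characters.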

Building an Inclusive Application

By combining PDFium's spatial text extraction with SAPI's speech engine, Delphi developers can create powerful tools for visually impaired users or those who prefer auditory learning. Implementing these features well helps your application meet enterprise and government accessibility requirements, although speech output is only one part of full compliance.

Note: Integrated text extraction and coordinate mapping are fully supported by the PDFium Component.