Teknisk artikel

PDFium: word tracking and speech cursoring in Delphi

Integrera PDFium VCL Component-flöden i Delphi- och C++Builder-applikationer, eller PDFium LCL Component-flöden i Lazarus/FPC, med källkodskomponenter för visning, rendering, formulär, utskrift, preflight-rapporter och standardinriktad validering.

Den här artikeln är skriven för developers building read-aloud, study, review, or assisted-reading features in Delphi PDF viewers. Den behandlar word tracking and speech cursoring som produktionsnära dokumentteknik, inte som ett isolerat komponentanrop.

Den praktiska risken är att speech highlighting becomes distracting or wrong when word boxes, ligatures, rotation, hidden text, and timing do not match the spoken stream. Därför behöver flödet ett skrivet kontrakt, observerbar diagnostik och realistiska regressionsfiler.

Arkitekturbeslut

Synchronize speech with extracted text, not pixels alone. text extraction source, word segmentation policy, and language assumptions / highlight style, scroll behavior, pause and resume rules, and user controls

  • text extraction source, word segmentation policy, and language assumptions
  • highlight style, scroll behavior, pause and resume rules, and user controls
  • fallback behavior for image-only pages, hidden text, ligatures, and rotated content
  • whether speech timing is driven by the engine, the viewer, or a shared coordinator

Implementeringsflöde

Track words through a stable text map. Ordningen nedan gör arbetsflödet granskbart för Delphi- och C++Builder-team.

  1. extract page text and word bounds before speech playback begins
  2. build a word map that links text offsets, page numbers, and viewer coordinates
  3. call the tracking layer as the speech engine advances through words or offsets
  4. scroll the viewport only when the current word leaves the comfortable reading zone
  5. record skipped words, missing boxes, and timing drift for diagnostics

Valideringsbevis

Synchronization evidence for assisted reading. Behåll dessa fält tillsammans med utdata eller supportunderlaget.

  • page number, word text, text offset, bounding box, and speech timestamp
  • tracking method used, including TrackReadingWordAt when the viewer supports it
  • fallback reason for words without reliable geometry
  • latency between speech event and visible cursor update

Word boxes are viewer data

Word-synchronized reading needs a mapping between extracted text, page coordinates, and speech timing. A robust implementation knows when text is unavailable, when a word box is ambiguous, and how to keep the viewport aligned with the current spoken word.

Review questions before release

Before this reaches production, the team should be able to answer these questions without reading source code.

  • Who owns text extraction source, word segmentation policy, and language assumptions?
  • What evidence proves page number, word text, text offset, bounding box, and speech timestamp?
  • What happens when ligatures can represent multiple characters inside one visual glyph?
  • Which regression file covers record skipped words, missing boxes, and timing drift for diagnostics?

Tekniska granskningsnoteringar för word tracking and speech cursoring

Använd dessa granskningsnoteringar för att säkerställa att funktionen har passerat demo-nivån och kan försvaras under leverans, support och kundeskalering.

  • Beslut: text extraction source, word segmentation policy, and language assumptions. Implementeringspresspunkt: build a word map that links text offsets, page numbers, and viewer coordinates. Acceptansbevis: fallback reason for words without reliable geometry. Regressionsutlösare: rapid speech rates can require coalescing highlight updates to avoid flicker
  • Beslut: highlight style, scroll behavior, pause and resume rules, and user controls. Implementeringspresspunkt: call the tracking layer as the speech engine advances through words or offsets. Acceptansbevis: latency between speech event and visible cursor update. Regressionsutlösare: ligatures can represent multiple characters inside one visual glyph
  • Beslut: fallback behavior for image-only pages, hidden text, ligatures, and rotated content. Implementeringspresspunkt: scroll the viewport only when the current word leaves the comfortable reading zone. Acceptansbevis: page number, word text, text offset, bounding box, and speech timestamp. Regressionsutlösare: rotated or vertical text needs coordinate handling separate from normal pages

Gränsfall

  • ligatures can represent multiple characters inside one visual glyph
  • rotated or vertical text needs coordinate handling separate from normal pages
  • OCR text layers may not align perfectly with scanned images
  • rapid speech rates can require coalescing highlight updates to avoid flicker

Delphi / C++Builder notes

PDFium Component should sit behind a small service boundary that receives files, streams, profiles, and credentials, then returns output paths, warnings, metrics, and validation status. Important terms include TrackReadingWordAt, speech cursor, word bounds, text extraction, highlight, reading assistance.

Delphi-kodexempel

Följande Delphi-skiss visar en praktisk servicegräns för detta ämne. Håll policykontroller, loggning och validering utanför det smala produktanropet så att arbetsflödet går att testa.

procedure TSpeechForm.SpeakWordAtCursor(PageNo, CharIndex: Integer);
var
  UnitInfo: TReadingUnit;
begin
  UnitInfo := LocateWordUnit(PdfView, PageNo, CharIndex);
  HighlightBounds(UnitInfo.PageNo, UnitInfo.Bounds);
  SpeechQueue.Speak(UnitInfo.Text);
  StoreCursorPosition(UnitInfo.PageNo, UnitInfo.EndChar);
end;

Produktionschecklista

  • Kör arbetsflödet på en tom fil, en normal kundfil och en värstafallfil
  • Öppna den genererade PDF-filen med rätt visare, validator, skrivare eller nedströmsapplikation
  • Logga produktversion, profilversion, inmatningshash, utdatasökväg, förfluten tid och antal varningar
  • Håll lösenord, certifikat, tillfälliga filer och kunddata under tydliga lagringsregler
  • Lägg till regressionsdokument när en kundfil avslöjar ett nytt gränsfall

Produktdokumentation

PDFium Component

Fler kodexempel

procedure TReaderForm.OnSpeechWordBoundary(StreamPos: Integer);
var
  WordIdx: Integer;
begin
  // Maps the offset to a word box and moves the highlight in one call
  WordIdx := PdfView.TrackReadingWordAt(FPageNo, StreamPos);
  if WordIdx < 0 then
    Exit;                     // boundary fell outside any word: keep last highlight
end;
procedure TReaderForm.StopReading;
begin
  FVoice.Stop;                // halt SAPI playback first
  PdfView.ClearReadingWord;   // then remove the highlight; a stale cursor reads as a bug
end;