Integrate PDFium VCL Component workflows into Delphi and C++Builder applications, or PDFium LCL Component workflows into Lazarus/FPC, with source-code components for viewing, rendering, forms, printing, preflight reports, and standards-oriented validation.
This article is for developers building read-aloud, study, review, or assisted-reading features in Delphi PDF viewers. It treats word tracking and speech cursoring as production document engineering, not as an isolated component call.
The practical risk is that speech highlighting becomes distracting or wrong when word boxes, ligatures, rotation, hidden text, and timing do not match the spoken stream. The workflow therefore needs a written contract, observable diagnostics, and representative regression files.
Architecture decisions
Synchronize speech with extracted text, not pixels alone. Settle the following decisions before implementation starts:
- text extraction source, word segmentation policy, and language assumptions
- highlight style, scroll behavior, pause and resume rules, and user controls
- fallback behavior for image-only pages, hidden text, ligatures, and rotated content
- whether speech timing is driven by the engine, the viewer, or a shared coordinator
Implementation flow
Track words through a stable text map. The order below keeps the workflow reviewable for Delphi and C++Builder teams.
- extract page text and word bounds before speech playback begins
- build a word map that links text offsets, page numbers, and viewer coordinates
- call the tracking layer as the speech engine advances through words or offsets
- scroll the viewport only when the current word leaves the comfortable reading zone
- record skipped words, missing boxes, and timing drift for diagnostics
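The word map in the steps above can be sketched as a sorted list of entries searched by text offset. This is a minimal illustration in standard C++17, not the component's API: WordBox, WordEntry, and WordMap are hypothetical names, and the sketch assumes extraction delivers words in increasing offset order. A Delphi version would follow the same shape with records and a binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not part of any PDFium component API.
struct WordBox {
    double left, top, right, bottom;  // viewer coordinates
};

struct WordEntry {
    int page;                    // page number the word belongs to
    std::size_t begin;           // offset of the first character in the spoken text
    std::size_t end;             // one past the last character
    std::string text;
    std::optional<WordBox> box;  // empty when no reliable geometry exists
};

class WordMap {
public:
    // Assumption: entries are added in increasing offset order.
    void add(WordEntry entry) { entries_.push_back(std::move(entry)); }

    // Returns the word containing the offset, or nullptr when the offset
    // falls between words or outside the mapped text.
    const WordEntry* find(std::size_t offset) const {
        auto it = std::upper_bound(
            entries_.begin(), entries_.end(), offset,
            [](std::size_t value, const WordEntry& e) { return value < e.begin; });
        if (it == entries_.begin()) return nullptr;
        --it;  // last entry whose begin is <= offset
        return offset < it->end ? &*it : nullptr;
    }

private:
    std::vector<WordEntry> entries_;
};
```

A null result or an entry without a box is the signal to record a skipped word or a missing box in the diagnostics rather than to guess a highlight position.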
Validation evidence
Synchronization evidence for assisted reading. Keep these fields with the output or support record.
- page number, word text, text offset, bounding box, and speech timestamp
- tracking method used, including TrackReadingWordAt when the viewer supports it
- fallback reason for words without reliable geometry
- latency between speech event and visible cursor update
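One way to keep the fields above together is a single evidence record per spoken word, with the latency derived from two timestamps. The sketch below is in standard C++17 and every name in it (TrackingEvidence, latency, to_log_line, the log layout) is an assumption made for illustration. The method field is a free-text label, so it can carry whatever tracking method the viewer actually used.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical evidence record; field names are illustrative only.
struct TrackingEvidence {
    int page = 0;
    std::string word;
    std::size_t offset = 0;
    bool has_box = false;
    double left = 0, top = 0, right = 0, bottom = 0;  // valid only when has_box
    std::string method;           // label for the tracking method used
    std::string fallback_reason;  // filled when has_box is false
    std::chrono::milliseconds speech_at{0};  // when the speech event fired
    std::chrono::milliseconds cursor_at{0};  // when the cursor became visible
};

// Latency between the speech event and the visible cursor update.
inline std::chrono::milliseconds latency(const TrackingEvidence& e) {
    return e.cursor_at - e.speech_at;
}

// Formats one key=value line suitable for a support log.
inline std::string to_log_line(const TrackingEvidence& e) {
    std::ostringstream out;
    out << "page=" << e.page << " offset=" << e.offset
        << " word=\"" << e.word << "\" method=" << e.method;
    if (e.has_box) {
        out << " box=" << e.left << ',' << e.top << ',' << e.right << ',' << e.bottom;
    } else {
        out << " fallback=\"" << e.fallback_reason << '"';
    }
    out << " latency_ms=" << latency(e).count();
    return out.str();
}
```

Writing the fallback reason in place of the box keeps every line self-explanatory when a support engineer reads it without the source document at hand.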
Word boxes are viewer data
Word-synchronized reading needs a mapping between extracted text, page coordinates, and speech timing. A robust implementation knows when text is unavailable, when a word box is ambiguous, and how to keep the viewport aligned with the current spoken word.
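Keeping the viewport aligned does not have to mean scrolling on every word. The following C++17 sketch shows one possible rule: leave the viewport alone while the current word sits inside a central reading zone, and scroll only when it leaves that zone. Viewport, scroll_target, and the 25 percent margin are assumptions for illustration, and the coordinates are assumed to be document units with y growing downward.

```cpp
#include <algorithm>

// Hypothetical viewport description in document coordinates (y grows downward).
struct Viewport {
    double top;     // document y of the first visible row
    double height;  // visible height
};

// Returns the viewport top to use for the current word. The value is unchanged
// while the word lies inside the comfortable reading zone.
inline double scroll_target(const Viewport& view, double word_top,
                            double word_bottom, double margin_fraction = 0.25) {
    const double margin = view.height * margin_fraction;
    const double zone_top = view.top + margin;
    const double zone_bottom = view.top + view.height - margin;
    if (word_top >= zone_top && word_bottom <= zone_bottom) {
        return view.top;  // word is comfortably visible, do not scroll
    }
    // Put the word at the upper edge of the zone, never above the document start.
    return std::max(0.0, word_top - margin);
}
```

For a viewport starting at 100 with height 500, a word spanning 300 to 320 leaves the position untouched, while a word at 550 to 570 moves the viewport top to 425.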
Review questions before release
Before this reaches production, the team should be able to answer these questions without reading source code.
- Who owns the text extraction source, word segmentation policy, and language assumptions?
- What evidence records the page number, word text, text offset, bounding box, and speech timestamp for each spoken word?
- What happens when a ligature represents multiple characters inside one visual glyph?
- Which regression file covers skipped words, missing boxes, and timing drift in the diagnostics?
Engineering review notes for word tracking and speech cursoring
Use these review notes to make sure the feature has moved beyond a demo and can be defended during release, support, and customer escalation.
- Decision: text extraction source, word segmentation policy, and language assumptions. Implementation pressure point: build a word map that links text offsets, page numbers, and viewer coordinates. Acceptance evidence: fallback reason for words without reliable geometry. Regression trigger: rapid speech rates can require coalescing highlight updates to avoid flicker
- Decision: highlight style, scroll behavior, pause and resume rules, and user controls. Implementation pressure point: call the tracking layer as the speech engine advances through words or offsets. Acceptance evidence: latency between speech event and visible cursor update. Regression trigger: ligatures can represent multiple characters inside one visual glyph
- Decision: fallback behavior for image-only pages, hidden text, ligatures, and rotated content. Implementation pressure point: scroll the viewport only when the current word leaves the comfortable reading zone. Acceptance evidence: page number, word text, text offset, bounding box, and speech timestamp. Regression trigger: rotated or vertical text needs coordinate handling separate from normal pages
Edge cases
- ligatures can represent multiple characters inside one visual glyph
- rotated or vertical text needs coordinate handling separate from normal pages
- OCR text layers may not align perfectly with scanned images
- rapid speech rates can require coalescing highlight updates to avoid flicker
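For the rapid-speech case, coalescing can be as simple as enforcing a minimum interval between repaints and remembering only the newest held-back word. The C++17 sketch below is one possible approach, not a prescribed design: HighlightCoalescer and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be tested without a real speech engine or timer.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>

// Hypothetical helper that limits how often the highlight is repainted.
class HighlightCoalescer {
public:
    explicit HighlightCoalescer(std::chrono::milliseconds min_interval)
        : min_interval_(min_interval) {}

    // Call for every speech word event. Returns the word index to paint now,
    // or nullopt when the update is held back to avoid flicker.
    std::optional<std::size_t> on_word(std::size_t word_index,
                                       std::chrono::milliseconds now) {
        if (!last_paint_ || now - *last_paint_ >= min_interval_) {
            last_paint_ = now;
            pending_.reset();
            return word_index;
        }
        pending_ = word_index;  // newer events replace older unpainted ones
        return std::nullopt;
    }

    // Call from a timer or when speech pauses. Paints the newest held word
    // once the minimum interval has elapsed.
    std::optional<std::size_t> flush(std::chrono::milliseconds now) {
        if (!pending_) return std::nullopt;
        if (last_paint_ && now - *last_paint_ < min_interval_) return std::nullopt;
        last_paint_ = now;
        std::optional<std::size_t> index = pending_;
        pending_.reset();
        return index;
    }

private:
    std::chrono::milliseconds min_interval_;
    std::optional<std::chrono::milliseconds> last_paint_;
    std::optional<std::size_t> pending_;
};
```

Dropped intermediate words are a deliberate trade-off here, so they are worth counting in the diagnostics alongside skipped words and timing drift.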
Delphi / C++Builder notes
PDFium Component should sit behind a small service boundary that receives files, streams, profiles, and credentials, then returns output paths, warnings, metrics, and validation status. Important terms include TrackReadingWordAt, speech cursor, word bounds, text extraction, highlight, reading assistance.
Delphi code example
The Delphi sketch below shows a practical service boundary for this topic. Keep policy checks, logging, and validation outside the narrow section that calls the product so the workflow stays testable.
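procedure TSpeechForm.SpeakWordAtCursor(PageNo, CharIndex: Integer);
var
  UnitInfo: TReadingUnit;
begin
  // Resolve the word that contains CharIndex on the given page.
  UnitInfo := LocateWordUnit(PdfView, PageNo, CharIndex);
  // Highlight the word in the viewer, then queue its text for speech.
  HighlightBounds(UnitInfo.PageNo, UnitInfo.Bounds);
  SpeechQueue.Speak(UnitInfo.Text);
  // Store the end of this word as the current cursor position.
  StoreCursorPosition(UnitInfo.PageNo, UnitInfo.EndChar);
end;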
Production checklist
- Run the workflow on an empty file, a normal customer file, and a worst-case file
- Open the generated PDF with the target viewer, validator, printer, or downstream application
- Log product version, profile version, input hash, output path, elapsed time, and warning count
- Keep passwords, certificates, temporary files, and customer data under explicit retention rules
- Add regression documents when a customer file exposes a new edge case