Technical article

HotPDF: Unicode text shaping for complex scripts in Delphi

HotPDF is a native VCL PDF library for Delphi and C++Builder applications that need direct PDF creation and editing, forms, annotations, encryption, digital signatures, Unicode fonts, standards-aware output, and preflight reports without an external PDF runtime.

This article is written for developers producing multilingual invoices, certificates, labels, or reports from Delphi. It treats Unicode text shaping for complex scripts as production document engineering, not as an isolated component call.

The practical risk is that text can look plausible in a sample PDF while ligatures, bidirectional order, fallback fonts, or copy-and-search behavior fail for real customer names. The workflow therefore needs a written contract, observable diagnostics, and realistic regression files.

Architecture decisions

Make the text pipeline locale-aware, and record these decisions explicitly:

  • font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text
  • normalization rules for copied text, database values, and template placeholders
  • right-to-left paragraph handling and mixed-direction number policy
  • whether text must remain searchable, selectable, and accessible after output
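
A fallback-order decision can be written down as data rather than left to whatever fonts a machine has installed. The sketch below is illustrative only: the bucket ranges are a coarse simplification of the Unicode Scripts property, the Noto family names are assumptions to replace with the fonts your licence and templates actually ship, and ClassifyCodepoint and FallbackChain are hypothetical helpers, not HotPDF API.

```pascal
program FallbackOrder;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Coarse script buckets used only to choose a fallback chain.
  TScriptBucket = (sbLatin, sbHebrew, sbArabic, sbDevanagari, sbCJK, sbOther);

// Classify a code point by a few Unicode block ranges (hypothetical helper).
// Real script detection should use the Unicode Scripts property instead.
function ClassifyCodepoint(CP: Integer): TScriptBucket;
begin
  case CP of
    $0000..$024F: Result := sbLatin;      // Basic Latin to Latin Extended-B
    $0590..$05FF: Result := sbHebrew;     // Hebrew block
    $0600..$06FF: Result := sbArabic;     // Arabic block
    $0900..$097F: Result := sbDevanagari; // Devanagari block (one Indic script)
    $4E00..$9FFF: Result := sbCJK;        // CJK Unified Ideographs
  else
    Result := sbOther;
  end;
end;

// Ordered fallback chain per bucket; font names are assumptions for illustration.
function FallbackChain(Bucket: TScriptBucket): string;
begin
  case Bucket of
    sbHebrew:     Result := 'Noto Sans Hebrew;Noto Sans';
    sbArabic:     Result := 'Noto Naskh Arabic;Noto Sans Arabic;Noto Sans';
    sbDevanagari: Result := 'Noto Sans Devanagari;Noto Sans';
    sbCJK:        Result := 'Noto Sans CJK SC;Noto Sans';
  else
    Result := 'Noto Sans';
  end;
end;

begin
  WriteLn(FallbackChain(ClassifyCodepoint($0628))); // Arabic letter beh
  WriteLn(FallbackChain(ClassifyCodepoint($05D0))); // Hebrew letter alef
  WriteLn(FallbackChain(ClassifyCodepoint($0041))); // Latin capital A
end.
```

Keeping the chain in one function makes the fallback order reviewable, and the log can record which entry was used whenever a fallback happened.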

Implementation flow

Resolve fonts and shaping before pagination. The order below makes the workflow reviewable for Delphi and C++Builder teams.

  1. normalize source text and record the locale used for formatting
  2. select fonts that contain the required glyphs before measuring layout
  3. shape and position text before page breaks are finalized
  4. embed or subset fonts according to licensing and PDF standard requirements
  5. verify visual output and extracted text with multilingual regression samples
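
Step 2 can be sketched as a coverage check that runs before any measurement. The coverage table here is hypothetical: a real implementation would read it from each font's cmap table or from whatever coverage query the PDF library exposes. MakeRange, NextCodepoint, FirstUncovered, and SelectFont are illustrative names, not HotPDF API.

```pascal
program CoverageFirst;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TCodeRange = record
    First, Last: Integer;
  end;

  // Hypothetical coverage table; real data would come from the font's cmap.
  TFontCoverage = record
    Name: string;
    Ranges: array of TCodeRange;
  end;

function MakeRange(First, Last: Integer): TCodeRange;
begin
  Result.First := First;
  Result.Last := Last;
end;

// Decode one code point from UTF-16 at index I and advance I past it.
function NextCodepoint(const S: UnicodeString; var I: Integer): Integer;
var
  Lo: Integer;
begin
  Result := Ord(S[I]);
  Inc(I);
  if (Result >= $D800) and (Result <= $DBFF) and (I <= Length(S)) then
  begin
    Lo := Ord(S[I]);
    if (Lo >= $DC00) and (Lo <= $DFFF) then
    begin
      Result := ((Result - $D800) shl 10) + (Lo - $DC00) + $10000;
      Inc(I);
    end;
  end;
end;

function Covers(const Font: TFontCoverage; CP: Integer): Boolean;
var
  R: TCodeRange;
begin
  for R in Font.Ranges do
    if (CP >= R.First) and (CP <= R.Last) then
      Exit(True);
  Result := False;
end;

// First code point in Text that Font lacks, or -1 when Font covers everything.
function FirstUncovered(const Font: TFontCoverage; const Text: UnicodeString): Integer;
var
  I, CP: Integer;
begin
  I := 1;
  while I <= Length(Text) do
  begin
    CP := NextCodepoint(Text, I);
    if not Covers(Font, CP) then
      Exit(CP);
  end;
  Result := -1;
end;

// Pick the first font in fallback order that covers the whole run, before layout.
// An empty result means the run must be split or rejected, not measured as-is.
function SelectFont(const Chain: array of TFontCoverage; const Text: UnicodeString): string;
var
  F: Integer;
begin
  for F := Low(Chain) to High(Chain) do
    if FirstUncovered(Chain[F], Text) = -1 then
      Exit(Chain[F].Name);
  Result := '';
end;

var
  Latin, Arabic: TFontCoverage;
  Chain: array[0..1] of TFontCoverage;

begin
  Latin.Name := 'LatinOnly';
  SetLength(Latin.Ranges, 1);
  Latin.Ranges[0] := MakeRange($0020, $007E);
  Arabic.Name := 'ArabicWithAscii';
  SetLength(Arabic.Ranges, 2);
  Arabic.Ranges[0] := MakeRange($0020, $007E);
  Arabic.Ranges[1] := MakeRange($0600, $06FF);
  Chain[0] := Latin;
  Chain[1] := Arabic;
  WriteLn(SelectFont(Chain, 'Invoice 42'));               // LatinOnly
  WriteLn(SelectFont(Chain, 'No ' + #$0628#$064A#$062A)); // ArabicWithAscii
  WriteLn(FirstUncovered(Latin, #$0628));                 // 1576 (U+0628)
end.
```

The empty result is the important case: it forces a decision (split the value into script runs, or reject it) before pagination rather than after a page has been laid out with the wrong font.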

Validation evidence

The evidence must prove that text is both readable and extractable. Keep these fields with the output or in the support record.

  • font selected for every script range and fallback reason when it changed
  • glyph coverage warnings, embedding mode, and subset identifier
  • extracted Unicode text compared with the original application value
  • viewer screenshots for representative right-to-left and combining-mark cases
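
A minimal sketch of the extracted-text comparison, assuming the extracted string has already been obtained from the PDF with whatever extraction tool the team uses. CompareExtracted is a hypothetical helper. The example mismatch (U+FE91, an Arabic presentation form, where the application stored U+0628) is one plausible symptom of incomplete ToUnicode data, not a measured result.

```pascal
program ExtractionCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Compare text extracted from the PDF with the value the application rendered.
// Returns '' on an exact match, otherwise a one-line diagnostic for the support record.
// Comparison is per UTF-16 code unit, so normalization differences also show up.
function CompareExtracted(const Original, Extracted: UnicodeString): string;
var
  I: Integer;
begin
  if Original = Extracted then
    Exit('');
  I := 1;
  while (I <= Length(Original)) and (I <= Length(Extracted)) and
        (Original[I] = Extracted[I]) do
    Inc(I);
  if I > Length(Original) then
    Result := Format('extracted text has %d extra code units',
      [Length(Extracted) - Length(Original)])
  else if I > Length(Extracted) then
    Result := Format('extracted text is missing %d code units',
      [Length(Original) - Length(Extracted)])
  else
    Result := Format('first difference at index %d: expected %s, got %s',
      [I, IntToHex(Ord(Original[I]), 4), IntToHex(Ord(Extracted[I]), 4)]);
end;

var
  Original, Extracted: UnicodeString;

begin
  Original := #$0628#$064A#$062A;  // base Arabic letters as stored by the application
  Extracted := #$FE91#$064A#$062A; // presentation form leaked into the text layer
  WriteLn(CompareExtracted(Original, Extracted));
  // first difference at index 1: expected 0628, got FE91
end.
```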

Visual output is not enough

Complex-script support involves character normalization, shaping, glyph positioning, embedding, ToUnicode maps, and reading order. A PDF that only looks right in one viewer can still fail search, selection, accessibility, or downstream extraction.
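
To make the ToUnicode point concrete: a ToUnicode CMap maps the glyph codes used in the content stream back to Unicode text, for example through entries in bfchar blocks. The sketch below only formats such entries to show what the mapping has to express (a ligature glyph maps to two characters, a supplementary character to a surrogate pair). It is not HotPDF code, and the glyph codes are hypothetical.

```pascal
program ToUnicodeEntries;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// One bfchar entry of a ToUnicode CMap: <glyph code> <UTF-16BE text>.
// UnicodeString already holds UTF-16, so each code unit becomes four hex digits;
// surrogate pairs and multi-character ligature targets need no special case.
function BfCharLine(GlyphCode: Integer; const Text: UnicodeString): string;
var
  I: Integer;
begin
  Result := '<' + IntToHex(GlyphCode, 4) + '> <';
  for I := 1 to Length(Text) do
    Result := Result + IntToHex(Ord(Text[I]), 4);
  Result := Result + '>';
end;

begin
  // Glyph codes are hypothetical; a real file uses the codes in its content streams.
  WriteLn('3 beginbfchar');
  WriteLn(BfCharLine($0003, #$0628));       // <0003> <0628>      Arabic beh
  WriteLn(BfCharLine($0104, 'fi'));         // <0104> <00660069>  fi ligature
  WriteLn(BfCharLine($0200, #$D83D#$DE00)); // <0200> <D83DDE00>  U+1F600 as a surrogate pair
  WriteLn('endbfchar');
end.
```

When such an entry is missing, the page still renders from the glyphs, but search and copy have nothing to map back to.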

Regression files worth keeping

Keep more than successful samples. A useful regression set for complex-script shaping contains normal files, boundary files, and intentional-failure files, so behavior stays stable across releases. The known pitfalls below are good sources for boundary and failure files:

  • database collation can alter composed characters before the PDF layer sees them
  • font substitution on a developer machine can hide missing embedded fonts
  • line breaks in bidirectional text can reorder punctuation and numbers
  • search may fail when ToUnicode data is missing even if the page renders correctly
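
A regression manifest can enforce that all three kinds of file exist. This is a sketch under assumptions: the sample texts, the TSampleKind categories, and MissingKinds are hypothetical, and real regression files would be stored PDFs or source records rather than inline strings. The two e-acute samples look identical but differ in code points, which is the situation database collation or copy-and-paste can create.

```pascal
program RegressionManifest;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TSampleKind = (skNormal, skBoundary, skExpectedFailure);

  TRegressionSample = record
    Kind: TSampleKind;
    Name: string;
    Text: UnicodeString;
  end;

const
  KindNames: array[TSampleKind] of string = ('normal', 'boundary', 'expected-failure');

function MakeSample(Kind: TSampleKind; const Name: string;
  const Text: UnicodeString): TRegressionSample;
begin
  Result.Kind := Kind;
  Result.Name := Name;
  Result.Text := Text;
end;

// Release gate: list every kind with no sample, so the set cannot silently
// shrink to happy-path files only. Returns '' when all kinds are present.
function MissingKinds(const Samples: array of TRegressionSample): string;
var
  K: TSampleKind;
  I: Integer;
  Found: Boolean;
begin
  Result := '';
  for K := Low(TSampleKind) to High(TSampleKind) do
  begin
    Found := False;
    for I := Low(Samples) to High(Samples) do
      if Samples[I].Kind = K then
        Found := True;
    if not Found then
      Result := Result + KindNames[K] + ' ';
  end;
end;

var
  Samples: array[0..4] of TRegressionSample;

begin
  Samples[0] := MakeSample(skNormal, 'latin customer name', 'Anna Berg');
  // Same visible text, different code points: precomposed vs e + combining acute.
  Samples[1] := MakeSample(skBoundary, 'precomposed e-acute', 'Caf' + WideChar($00E9));
  Samples[2] := MakeSample(skBoundary, 'decomposed e-acute', 'Cafe' + #$0301);
  Samples[3] := MakeSample(skBoundary, 'arabic with digits', #$0628#$064A#$062A + ' 2024');
  // Expected to raise a coverage warning if the chosen fonts lack emoji.
  Samples[4] := MakeSample(skExpectedFailure, 'emoji outside coverage', 'OK ' + #$D83D#$DE00);
  WriteLn('missing kinds: "', MissingKinds(Samples), '"');
  // Different strings: nothing normalizes them automatically.
  WriteLn(Samples[1].Text = Samples[2].Text);
end.
```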

Technical review notes: Unicode text shaping for complex scripts

Use these review notes to confirm that the feature has moved beyond demo quality and can be defended during delivery, support, and customer escalations.

  • Decision: font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text. Implementation pressure point: select fonts that contain the required glyphs before measuring layout. Acceptance evidence: font selected for every script range and fallback reason when it changed. Regression trigger: font substitution on a developer machine can hide missing embedded fonts
  • Decision: normalization rules for copied text, database values, and template placeholders. Implementation pressure point: normalize source text and record the locale used for formatting. Acceptance evidence: extracted Unicode text compared with the original application value. Regression trigger: database collation can alter composed characters before the PDF layer sees them
  • Decision: right-to-left paragraph handling and mixed-direction number policy. Implementation pressure point: shape and position text before page breaks are finalized. Acceptance evidence: viewer screenshots for representative right-to-left and combining-mark cases. Regression trigger: line breaks in bidirectional text can reorder punctuation and numbers
  • Decision: whether text must remain searchable, selectable, and accessible after output. Implementation pressure point: embed or subset fonts according to licensing and PDF standard requirements, then verify visual output and extracted text with multilingual regression samples. Acceptance evidence: glyph coverage warnings, embedding mode, and subset identifier. Regression trigger: search may fail when ToUnicode data is missing even if the page renders correctly

Delphi / C++Builder notes

Place the HotPDF Component behind a small service boundary that receives files, streams, profiles, and credentials and returns output paths, warnings, metrics, and validation status. Key terms for this topic: Unicode, text shaping, font embedding, ToUnicode, bidirectional text, and fallback font.
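
The boundary's output side can be expressed as a plain result record. TRenderResult, AddWarning, and AddFailure below are hypothetical names for an application-side wrapper, not HotPDF types; the point of the sketch is that warnings and validation status become data a test can assert on, rather than log text someone has to parse.

```pascal
program RenderBoundary;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TValidationStatus = (vsPassed, vsPassedWithWarnings, vsFailed);

  // Hypothetical result record returned by an application service wrapping HotPDF.
  TRenderResult = record
    OutputPath: string;
    Warnings: array of string;
    ElapsedMs: Int64;
    Status: TValidationStatus;
  end;

const
  StatusNames: array[TValidationStatus] of string =
    ('passed', 'passed-with-warnings', 'failed');

// Record a warning; a clean result is downgraded, a failed one stays failed.
procedure AddWarning(var R: TRenderResult; const Msg: string);
begin
  SetLength(R.Warnings, Length(R.Warnings) + 1);
  R.Warnings[High(R.Warnings)] := Msg;
  if R.Status = vsPassed then
    R.Status := vsPassedWithWarnings;
end;

// Record a blocking problem, e.g. text that must stay searchable but lost its mapping.
procedure AddFailure(var R: TRenderResult; const Msg: string);
begin
  AddWarning(R, Msg);
  R.Status := vsFailed;
end;

var
  R: TRenderResult;

begin
  R.OutputPath := 'out/invoice-1001.pdf'; // illustrative path
  R.ElapsedMs := 0;
  R.Status := vsPassed;
  AddWarning(R, 'fallback font used for U+05D0');
  WriteLn(StatusNames[R.Status], ' warnings=', Length(R.Warnings)); // passed-with-warnings warnings=1
  AddFailure(R, 'no ToUnicode entry for glyph 0x0104');
  WriteLn(StatusNames[R.Status], ' warnings=', Length(R.Warnings)); // failed warnings=2
end.
```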

Delphi code example

The following Delphi sketch shows the narrow drawing call that sits inside such a service boundary. Keep policy checks, logging, and validation outside that call so the workflow stays testable.

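// Sketch: TScriptProfile, ShapeUnicodeRun and RecordGlyphCoverage are assumed
// application-side helpers; THotPDF and CurrentPage come from HotPDF.
procedure DrawShapedRun(Pdf: THotPDF; const Text: UnicodeString; const Script: TScriptProfile);
begin
  // Log coverage before drawing so a missing glyph is recorded even if output fails.
  RecordGlyphCoverage(Script.FontName, Text);
  // Select the font resolved for this script range before placing any text.
  Pdf.CurrentPage.SetFont(Script.FontName, [], Script.Size, 0, Script.Vertical);
  if Script.RequiresReorder then
    // Right-to-left or complex runs are shaped and reordered by the application first.
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, ShapeUnicodeRun(Text, Script))
  else
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, Text);
end;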
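
DrawShapedRun branches on Script.RequiresReorder but leaves open how that flag and the paragraph direction are decided. The following is a simplified sketch of UAX #9 rules P2-P3 (the first strong character decides) over a deliberately small letter set, without isolate handling. StrongDirection, ParagraphDirection, and ContainsRightToLeft are hypothetical helpers; a production pipeline should rely on full Bidi_Class data or a complete bidi implementation.

```pascal
program ParagraphDirectionSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TBaseDirection = (bdLeftToRight, bdRightToLeft, bdNeutral);

const
  DirectionNames: array[TBaseDirection] of string = ('ltr', 'rtl', 'neutral');

// Strong-direction test for a deliberately small set of letters. UAX #9 uses the
// Bidi_Class property; this sketch treats everything else (digits, spaces,
// punctuation, other scripts) as neutral.
function StrongDirection(CP: Integer): TBaseDirection;
begin
  if ((CP >= Ord('A')) and (CP <= Ord('Z'))) or ((CP >= Ord('a')) and (CP <= Ord('z'))) then
    Result := bdLeftToRight
  else if (CP >= $05D0) and (CP <= $05EA) then // Hebrew letters alef..tav
    Result := bdRightToLeft
  else if (CP >= $0620) and (CP <= $064A) then // core Arabic letters
    Result := bdRightToLeft
  else
    Result := bdNeutral;
end;

// Simplified P2-P3: the first strong character decides the paragraph direction;
// bdNeutral leaves the choice to the template's default.
function ParagraphDirection(const Text: UnicodeString): TBaseDirection;
var
  I: Integer;
begin
  for I := 1 to Length(Text) do
  begin
    Result := StrongDirection(Ord(Text[I]));
    if Result <> bdNeutral then
      Exit;
  end;
  Result := bdNeutral;
end;

// A run needs reordering if it contains any right-to-left letter, even when the
// paragraph itself starts with Latin text.
function ContainsRightToLeft(const Text: UnicodeString): Boolean;
var
  I: Integer;
begin
  for I := 1 to Length(Text) do
    if StrongDirection(Ord(Text[I])) = bdRightToLeft then
      Exit(True);
  Result := False;
end;

begin
  WriteLn(DirectionNames[ParagraphDirection('123 ' + #$0628#$064A#$062A + ' Ltd')]); // rtl
  WriteLn(DirectionNames[ParagraphDirection('Invoice ' + #$05D0)]);                  // ltr
  WriteLn(ContainsRightToLeft('Invoice ' + #$05D0));
end.
```

Keeping the two questions separate matters: paragraph direction drives alignment and number placement, while any right-to-left content in a run is what would set a flag like RequiresReorder.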

Production checklist

  • Run the workflow on an empty file, a normal customer file, and a worst-case file
  • Open the generated PDF with the appropriate viewer, validator, printer, or downstream application
  • Log the product version, profile version, input hash, output path, elapsed time, and warning count
  • Keep passwords, certificates, temporary files, and customer data under clear retention rules
  • Add a regression document whenever a customer file reveals a new edge case
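
The logging item can be sketched as one key=value line per render. FNV-1a is used here only because it is short and self-contained; it is not a cryptographic hash, and an input hash meant for audits should use something like SHA-256. Fnv1a32, RenderLogLine, and the example values are hypothetical.

```pascal
program RenderAudit;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
{$Q-}{$R-} // FNV-1a relies on wrapping 32-bit multiplication

uses
  SysUtils;

// FNV-1a (32-bit): a stand-in fingerprint for this sketch only. Production logs
// should use a cryptographic hash such as SHA-256 over the input bytes.
function Fnv1a32(const Data: UTF8String): Cardinal;
var
  I: Integer;
begin
  Result := 2166136261;            // FNV offset basis
  for I := 1 to Length(Data) do
  begin
    Result := Result xor Ord(Data[I]);
    Result := Result * 16777619;   // FNV prime, wraps modulo 2^32
  end;
end;

// One key=value line per render so support can tie an output file to its inputs.
function RenderLogLine(const ProductVersion, ProfileVersion, OutputPath: string;
  InputHash: Cardinal; ElapsedMs: Int64; WarningCount: Integer): string;
begin
  Result := Format('product=%s profile=%s input=%s output=%s elapsed_ms=%d warnings=%d',
    [ProductVersion, ProfileVersion, IntToHex(Int64(InputHash), 8), OutputPath,
     ElapsedMs, WarningCount]);
end;

begin
  // All values below are illustrative placeholders.
  WriteLn(RenderLogLine('example-1.0', 'invoice-ar-v3', 'out/invoice-1001.pdf',
    Fnv1a32('a'), 120, 2));
  // product=example-1.0 profile=invoice-ar-v3 input=E40C292C output=out/invoice-1001.pdf elapsed_ms=120 warnings=2
end.
```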

Product documentation

HotPDF Component

More code examples

// Ship a known font instead of relying on installed system fonts
Pdf.RegisterUnicodeTTF('C:\Fonts\NotoSansArabic.ttf');
Pdf.CurrentPage.SetFont('NotoSansArabic', [], 12);

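// Audit coverage for the codepoints your data actually uses
GID := Pdf.GetUnicodeGlyphForCodepoint($0628);  // U+0628 ARABIC LETTER BEH
LogGlyphAudit($0628, GID);  // LogGlyphAudit: assumed application-side logging helper

// Declare right-to-left reading order at the document level
Pdf.Direction := RightToLeft;  // adds vpDirection to ViewerPreferences
// The PDF /Direction viewer preference only hints at page arrangement (for example
// side-by-side display); it does not reorder glyphs, so bidi ordering still has to
// happen before TextOut.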
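
The audit above checks a single code point. A minimal sketch, assuming the values to be printed are available as strings, of collecting every distinct code point from real data so each one can be passed to the glyph lookup in the snippet above. NextCodepoint and DistinctCodepoints are hypothetical helpers, and the lookup call itself is left out here.

```pascal
program CodepointInventory;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TCodepoints = array of Integer;

// Decode one code point from UTF-16 at index I and advance I past it.
function NextCodepoint(const S: UnicodeString; var I: Integer): Integer;
var
  Lo: Integer;
begin
  Result := Ord(S[I]);
  Inc(I);
  if (Result >= $D800) and (Result <= $DBFF) and (I <= Length(S)) then
  begin
    Lo := Ord(S[I]);
    if (Lo >= $DC00) and (Lo <= $DFFF) then
    begin
      Result := ((Result - $D800) shl 10) + (Lo - $DC00) + $10000;
      Inc(I);
    end;
  end;
end;

// Distinct code points used by the data, ascending, so the glyph audit runs
// against real customer values instead of a hand-picked sample.
function DistinctCodepoints(const Values: array of UnicodeString): TCodepoints;
var
  Seen: array of Boolean;
  V, I, CP, Count: Integer;
begin
  SetLength(Seen, $110000);            // one flag per Unicode code point
  FillChar(Seen[0], Length(Seen), 0);
  for V := Low(Values) to High(Values) do
  begin
    I := 1;
    while I <= Length(Values[V]) do
      Seen[NextCodepoint(Values[V], I)] := True;
  end;
  Count := 0;
  for CP := 0 to High(Seen) do
    if Seen[CP] then
      Inc(Count);
  SetLength(Result, Count);
  Count := 0;
  for CP := 0 to High(Seen) do
    if Seen[CP] then
    begin
      Result[Count] := CP;
      Inc(Count);
    end;
end;

var
  Used: TCodepoints;
  CP: Integer;

begin
  Used := DistinctCodepoints(['Anna', #$0628#$064A#$062A, 'Ann' + WideChar($00E9)]);
  for CP in Used do
    WriteLn('U+', IntToHex(CP, 4)); // each line is a candidate for the glyph audit
end.
```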