HotPDF: Unicode text shaping for complex scripts

HotPDF is a native VCL PDF library for Delphi and C++Builder applications that need direct PDF creation and editing, forms, annotations, encryption, digital signatures, Unicode fonts, standards-compliant output, and preflight reports without installing an external PDF runtime.

This article is for developers who produce multilingual invoices, certificates, labels, or reports from Delphi. It treats Unicode text shaping for complex scripts as production document engineering, not as a single component call.

The practical risk is that text can look plausible in a sample PDF while ligatures, bidirectional order, fallback fonts, or copy-and-search behavior fail for real customer names. The workflow therefore needs a written contract, observable diagnostics, and realistic regression files.

Architecture decisions

Make the text pipeline locale-aware. Settle the following decisions before any drawing code is written:

font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text
normalization rules for copied text, database values, and template placeholders
right-to-left paragraph handling and mixed-direction number policy
whether text must remain searchable, selectable, and accessible after output
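
The fallback-order decision can be written down as data rather than scattered through drawing code. The following is a minimal sketch, assuming a fixed table of Unicode block ranges; the font names, the `TScriptRange` record, and `PickFont` are hypothetical and are not HotPDF defaults or HotPDF API.

```pascal
program FontFallbackSketch;
{$APPTYPE CONSOLE}

type
  // One row of the fallback table (hypothetical type for this sketch).
  TScriptRange = record
    RangeStart, RangeEnd: Cardinal; // inclusive code point range
    FontName: string;               // illustrative font family name
  end;

const
  // Ordered table: the first matching range wins.
  // Ranges are the main Unicode blocks only; extend them for your data.
  FallbackTable: array[0..4] of TScriptRange = (
    (RangeStart: $0590; RangeEnd: $05FF; FontName: 'NotoSansHebrew'),
    (RangeStart: $0600; RangeEnd: $06FF; FontName: 'NotoSansArabic'),
    (RangeStart: $0900; RangeEnd: $097F; FontName: 'NotoSansDevanagari'),
    (RangeStart: $0E00; RangeEnd: $0E7F; FontName: 'NotoSansThai'),
    (RangeStart: $4E00; RangeEnd: $9FFF; FontName: 'NotoSansCJK')
  );
  DefaultFont = 'NotoSans'; // used for Latin and anything not listed

// Returns the font assigned to a code point, or the default font.
function PickFont(CodePoint: Cardinal): string;
var
  I: Integer;
begin
  for I := Low(FallbackTable) to High(FallbackTable) do
    if (CodePoint >= FallbackTable[I].RangeStart) and
       (CodePoint <= FallbackTable[I].RangeEnd) then
      Exit(FallbackTable[I].FontName);
  Result := DefaultFont;
end;

begin
  Writeln(PickFont($0628)); // U+0628 ARABIC LETTER BEH
  Writeln(PickFont($0041)); // U+0041 LATIN CAPITAL LETTER A
end.
```

Keeping the table in one place makes the fallback order reviewable and lets a log record state which row matched when a font changed.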

Implementation flow

Resolve fonts and shaping before pagination. The order below keeps the workflow readable for Delphi and C++Builder teams:

normalize source text and record the locale used for formatting
select fonts that contain the required glyphs before measuring layout
shape and position text before page breaks are finalized
embed or subset fonts according to licensing and PDF standard requirements
verify visual output and extracted text with multilingual regression samples
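
The first step above, normalizing the source text and recording the locale, can be sketched as follows. This is a Windows-only illustration that imports `NormalizeString` from `Normaliz.dll` directly; the `TPreparedText` record and the `ToNFC` and `PrepareText` helpers are hypothetical names for this sketch, and the choice of NFC as the target form is an assumption that your own contract should confirm.

```pascal
program NormalizeSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  NormalizationC = 1; // NORM_FORM value for NFC in the Windows API

type
  // Hypothetical record: normalized text travels with its formatting locale.
  TPreparedText = record
    Text: UnicodeString;
    LocaleName: string;
  end;

// Windows API import (Vista and later).
function NormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';

// Returns S in Unicode Normalization Form C.
function ToNFC(const S: UnicodeString): UnicodeString;
var
  Estimated, Written: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // A nil destination asks the API for an estimated buffer length.
  Estimated := NormalizeString(NormalizationC, PWideChar(S), Length(S), nil, 0);
  if Estimated <= 0 then
    raise Exception.Create('NormalizeString could not size the output');
  SetLength(Result, Estimated);
  Written := NormalizeString(NormalizationC, PWideChar(S), Length(S),
    PWideChar(Result), Estimated);
  if Written <= 0 then
    raise Exception.Create('NormalizeString rejected the input');
  SetLength(Result, Written); // trim to the length actually written
end;

function PrepareText(const Raw: UnicodeString;
  const LocaleName: string): TPreparedText;
begin
  Result.Text := ToNFC(Raw);
  Result.LocaleName := LocaleName;
end;

var
  Prepared: TPreparedText;
begin
  // 'e' followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
  Prepared := PrepareText('Jose'#$0301, 'es-ES');
  Writeln(Length(Prepared.Text), ' code units, locale ', Prepared.LocaleName);
end.
```

Normalizing once at the boundary means font selection, measurement, and later text comparison all see the same code point sequence.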

Validation evidence

Collect proof that text is readable and extractable. Keep these fields with the output or the support record:

font selected for every script range and fallback reason when it changed
glyph coverage warnings, embedding mode, and subset identifier
extracted Unicode text compared with the original application value
viewer screenshots for representative right-to-left and combining-mark cases
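
Comparing extracted text with the original value is easy to automate once the extracted string is available. The sketch below assumes the text has already been pulled out of the PDF by whatever extraction tool you use; `FirstMismatch` is a hypothetical helper that reports where the two strings diverge, which is more useful in a support record than a plain pass or fail.

```pascal
program ExtractCompareSketch;
{$APPTYPE CONSOLE}

// Returns 0 when both strings are identical, otherwise the 1-based index
// of the first UTF-16 code unit that differs (or the first missing one).
function FirstMismatch(const Expected, Actual: UnicodeString): Integer;
var
  I, Shared: Integer;
begin
  Shared := Length(Expected);
  if Length(Actual) < Shared then
    Shared := Length(Actual);
  for I := 1 to Shared do
    if Expected[I] <> Actual[I] then
      Exit(I);
  if Length(Expected) <> Length(Actual) then
    Exit(Shared + 1); // one string is a prefix of the other
  Result := 0;
end;

var
  Logical, Extracted: UnicodeString;
begin
  // Hebrew letters in logical order, as the application stores them.
  Logical := #$05E9#$05DC#$05D5#$05DD;
  // The same letters in visual order, as a faulty pipeline might emit them.
  Extracted := #$05DD#$05D5#$05DC#$05E9;
  Writeln('First mismatch at index: ', FirstMismatch(Logical, Extracted));
end.
```

Normalize both strings to the same form before comparing, otherwise a composed and a decomposed spelling of the same name will be reported as a mismatch.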

Visual output is not enough

Complex-script support involves character normalization, shaping, glyph positioning, embedding, ToUnicode maps, and reading order. A PDF that only looks right in one viewer can still fail search, selection, accessibility, or downstream extraction

Regression files worth keeping

Keep more than successful samples. A useful regression set for complex-script shaping contains normal files, boundary files, and intentional failure files, so that behavior stays stable across releases. Each of the following risks deserves at least one file:

database collation can alter composed characters before the PDF layer sees them
font substitution on a developer machine can hide missing embedded fonts
line breaks in bidirectional text can reorder punctuation and numbers
search may fail when ToUnicode data is missing even if the page renders correctly

Engineering review notes for Unicode text shaping for complex scripts

Use these review notes to confirm that the feature has moved beyond a demo and can be defended during release, support, and customer escalation.

Decision: font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text. Implementation focus: select fonts that contain the required glyphs before measuring layout. Acceptance evidence: extracted Unicode text compared with the original application value. Regression trigger: search may fail when ToUnicode data is missing even if the page renders correctly.
Decision: normalization rules for copied text, database values, and template placeholders. Implementation focus: shape and position text before page breaks are finalized. Acceptance evidence: viewer screenshots for representative right-to-left and combining-mark cases. Regression trigger: database collation can alter composed characters before the PDF layer sees them.
Decision: right-to-left paragraph handling and mixed-direction number policy. Implementation focus: embed or subset fonts according to licensing and PDF standard requirements. Acceptance evidence: font selected for every script range and fallback reason when it changed. Regression trigger: font substitution on a developer machine can hide missing embedded fonts.
Decision: whether text must remain searchable, selectable, and accessible after output. Implementation focus: verify visual output and extracted text with multilingual regression samples. Acceptance evidence: glyph coverage warnings, embedding mode, and subset identifier. Regression trigger: line breaks in bidirectional text can reorder punctuation and numbers.

Edge cases

database collation can alter composed characters before the PDF layer sees them
font substitution on a developer machine can hide missing embedded fonts
line breaks in bidirectional text can reorder punctuation and numbers
search may fail when ToUnicode data is missing even if the page renders correctly

Delphi / C++Builder notes

The HotPDF Component should sit behind a small service boundary that receives files, streams, profiles, and credentials, then returns output paths, warnings, metrics, and validation status. Important terms for this topic are Unicode, text shaping, font embedding, ToUnicode, bidirectional text, and fallback font.

Delphi code example

The Delphi sketch below shows a practical service boundary for this topic. Keep policy checks, logging, and validation outside the narrow block of product calls so that the workflow stays testable.

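// TScriptProfile, ShapeUnicodeRun, and RecordGlyphCoverage are assumed to be
// application-side helpers supplied by your own code.
procedure DrawShapedRun(Pdf: THotPDF; const Text: UnicodeString; const Script: TScriptProfile);
begin
  // Select the font resolved for this script before any text is drawn.
  Pdf.CurrentPage.SetFont(Script.FontName, [], Script.Size, 0, Script.Vertical);
  if Script.RequiresReorder then
    // Runs that need reordering are drawn from the pre-shaped string.
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, ShapeUnicodeRun(Text, Script))
  else
    // All other runs go to TextOut unchanged.
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, Text);
  // Record coverage against the logical (unshaped) text for the validation log.
  RecordGlyphCoverage(Script.FontName, Text);
end;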
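
One way to fill a flag such as `RequiresReorder` is a coarse scan for right-to-left code points. The sketch below is only a heuristic under that assumption: it checks the main Hebrew and Arabic blocks in the Basic Multilingual Plane and is not an implementation of the Unicode Bidirectional Algorithm. `IsRtlCodeUnit` and `RunRequiresReorder` are hypothetical helper names.

```pascal
program RtlDetectSketch;
{$APPTYPE CONSOLE}

// True when the UTF-16 code unit falls in a Hebrew or Arabic block.
function IsRtlCodeUnit(C: WideChar): Boolean;
var
  W: Word;
begin
  W := Ord(C);
  Result :=
    ((W >= $0590) and (W <= $05FF)) or  // Hebrew
    ((W >= $0600) and (W <= $06FF)) or  // Arabic
    ((W >= $0750) and (W <= $077F)) or  // Arabic Supplement
    ((W >= $08A0) and (W <= $08FF)) or  // Arabic Extended-A
    ((W >= $FB1D) and (W <= $FDFF)) or  // Hebrew and Arabic presentation forms
    ((W >= $FE70) and (W <= $FEFC));    // Arabic Presentation Forms-B
end;

// True when any code unit in the run belongs to a right-to-left block.
function RunRequiresReorder(const S: UnicodeString): Boolean;
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    if IsRtlCodeUnit(S[I]) then
      Exit(True);
  Result := False;
end;

begin
  Writeln(RunRequiresReorder('Invoice 42'));
  Writeln(RunRequiresReorder('INV '#$0628#$0629)); // Latin plus Arabic letters
end.
```

A mixed run such as an invoice number followed by an Arabic name is flagged as a whole, which is the conservative choice: the mixed-direction policy then decides how digits and punctuation are placed.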

Production checklist

Run the workflow on an empty file, an ordinary customer file, and a worst-case file
Open the generated PDF in the target viewer, validator, printer, or downstream application
Log the product version, profile version, input hash, output path, execution time, and warning count
Store passwords, certificates, temporary files, and customer data under explicit retention rules
Add a regression document whenever a customer file reveals a new edge case

Product documentation

HotPDF Component

Additional code examples

// Ship a known font instead of relying on installed system fonts
Pdf.RegisterUnicodeTTF('C:\Fonts\NotoSansArabic.ttf');
Pdf.CurrentPage.SetFont('NotoSansArabic', [], 12);

// Audit coverage for the codepoints your data actually uses
GID := Pdf.GetUnicodeGlyphForCodepoint($0628);  // U+0628 ARABIC LETTER BEH
LogGlyphAudit($0628, GID);

// Declare right-to-left reading order at the document level
Pdf.Direction := RightToLeft;  // adds vpDirection to ViewerPreferences

Script coverage

Arabic requires both bidirectional reordering and contextual glyph joining. Hebrew requires reordering, whereas Thai uses the ordinary TextOut path. Indic scripts need full GSUB rules, so correct Arabic rendering is not proof of Devanagari support.
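
The per-script requirements above can be captured as a small capability check, so that a template using an unsupported script fails early instead of rendering wrongly. This is a minimal sketch; the enumerations and the `NeedsFor` and `CanRender` functions are hypothetical names, and the set of supported capabilities is something you would establish by testing your own pipeline.

```pascal
program ScriptNeedsSketch;
{$APPTYPE CONSOLE}

type
  TScriptKind = (skLatin, skHebrew, skArabic, skThai, skDevanagari);
  TShapingNeed = (snBidiReorder, snContextualJoining, snFullGsub);
  TShapingNeeds = set of TShapingNeed;

// Maps each script to the shaping work described in the text above.
function NeedsFor(Script: TScriptKind): TShapingNeeds;
begin
  case Script of
    skHebrew:     Result := [snBidiReorder];
    skArabic:     Result := [snBidiReorder, snContextualJoining];
    skDevanagari: Result := [snFullGsub];
  else
    Result := []; // Latin and Thai take the ordinary TextOut path
  end;
end;

// True when every need of the script is in the supported set.
function CanRender(Script: TScriptKind; Supported: TShapingNeeds): Boolean;
begin
  Result := NeedsFor(Script) <= Supported; // set inclusion test
end;

var
  Supported: TShapingNeeds;
begin
  // Assumed capability set of a pipeline verified only with Arabic samples.
  Supported := [snBidiReorder, snContextualJoining];
  Writeln('Arabic:     ', CanRender(skArabic, Supported));
  Writeln('Devanagari: ', CanRender(skDevanagari, Supported));
end.
```

With the assumed capability set, Arabic passes and Devanagari does not, which mirrors the point that Arabic results say nothing about Indic support.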

Verifying the output

Test logical-order text, mixed directions, digits, currencies, and diacritics. Copy text out of the PDF, use the viewer's search, and check the document on a machine without the developer's fonts installed, because the appearance of the page alone does not reveal every problem.
