Technical article

HotPDF Component: Unicode text shaping for complex scripts in Delphi

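HotPDF is a native VCL PDF library for Delphi and C++Builder applications. Without requiring an external PDF runtime to be deployed, it supports PDF creation, editing, forms, annotations, encryption, digital signatures, Unicode font handling, standards-oriented output, and preflight reporting.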

This article is for developers producing multilingual invoices, certificates, labels, or reports from Delphi. It treats Unicode text shaping for complex scripts as document engineering for a production environment, not as a simple component call.

The real risk is that text can appear plausible in a sample PDF while ligatures, bidirectional order, fallback fonts, or copy-and-search behavior fail for real customer names. You therefore need clear contracts, observable diagnostics, and regression samples that reflect real customer files.

Architecture decisions

Make the text pipeline locale-aware. Decide and document each of the following:

  • font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text
  • normalization rules for copied text, database values, and template placeholders
  • right-to-left paragraph handling and mixed-direction number policy
  • whether text must remain searchable, selectable, and accessible after output
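A fallback order such as the one in the first decision can be written down as data rather than left implicit. The following is a minimal sketch, assuming a coarse classification by Unicode block; the type TScriptClass, the function ClassifyCodePoint, and every font name are hypothetical and are not part of HotPDF. Only a handful of blocks are listed, and a real table would usually need one font per Indic script.

```delphi
program FontFallbackSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Coarse script classes matching the fallback decision list.
  TScriptClass = (scLatin, scHebrew, scArabic, scIndic, scCJK);

const
  // Hypothetical font names: replace with fonts you license and ship.
  // A single Devanagari font stands in for the whole Indic class here.
  PreferredFont: array[TScriptClass] of string = (
    'NotoSans', 'NotoSansHebrew', 'NotoSansArabic',
    'NotoSansDevanagari', 'NotoSansCJKkr');

// Maps a Unicode code point to a coarse script class by block range.
// Anything outside the listed blocks is treated as Latin in this sketch.
function ClassifyCodePoint(CP: Cardinal): TScriptClass;
begin
  case CP of
    $0590..$05FF:
      Result := scHebrew;   // Hebrew
    $0600..$06FF, $0750..$077F:
      Result := scArabic;   // Arabic, Arabic Supplement
    $0900..$0DFF:
      Result := scIndic;    // Devanagari through Sinhala
    $3040..$30FF, $4E00..$9FFF, $AC00..$D7AF:
      Result := scCJK;      // Kana, CJK Unified Ideographs, Hangul Syllables
  else
    Result := scLatin;
  end;
end;

begin
  WriteLn(PreferredFont[ClassifyCodePoint($0628)]);    // ARABIC LETTER BEH
  WriteLn(PreferredFont[ClassifyCodePoint($05D0)]);    // HEBREW LETTER ALEF
  WriteLn(PreferredFont[ClassifyCodePoint(Ord('A'))]); // LATIN CAPITAL LETTER A
end.
```

Keeping the table in one constant makes the fallback order reviewable and gives the diagnostics a single place to report which class a code point fell into.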

Implementation flow

Resolve fonts and shaping before pagination. The order below keeps the workflow reviewable for Delphi and C++Builder teams.

  1. normalize source text and record the locale used for formatting
  2. select fonts that contain the required glyphs before measuring layout
  3. shape and position text before page breaks are finalized
  4. embed or subset fonts according to licensing and PDF standard requirements
  5. verify visual output and extracted text with multilingual regression samples
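Step 2 above depends on knowing which code points a font cannot render before any layout is measured. The sketch below shows one way to do that check in plain Delphi, assuming the coverage ranges have already been obtained from the font; the LatinOnly table, TCodePointRange, and the function names are all hypothetical and no HotPDF call is involved.

```delphi
program GlyphCoverageSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TCodePointRange = record
    First, Last: Cardinal;
  end;

const
  // Hypothetical coverage table for a Latin-only font. In practice this
  // would come from the cmap table of the font you intend to embed.
  LatinOnly: array[0..1] of TCodePointRange = (
    (First: $0020; Last: $007E),
    (First: $00A0; Last: $00FF));

// Decodes a UTF-16 string into code points, joining valid surrogate pairs.
function ToCodePoints(const S: string): TArray<Cardinal>;
var
  I, N: Integer;
  Hi, Lo: Cardinal;
begin
  SetLength(Result, Length(S));
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Hi := Ord(S[I]);
    if (Hi >= $D800) and (Hi <= $DBFF) and (I < Length(S)) then
    begin
      Lo := Ord(S[I + 1]);
      if (Lo >= $DC00) and (Lo <= $DFFF) then
      begin
        Result[N] := $10000 + ((Hi - $D800) shl 10) + (Lo - $DC00);
        Inc(N);
        Inc(I, 2);
        Continue;
      end;
    end;
    Result[N] := Hi;
    Inc(N);
    Inc(I);
  end;
  SetLength(Result, N);
end;

function IsCovered(CP: Cardinal; const Ranges: array of TCodePointRange): Boolean;
var
  I: Integer;
begin
  for I := Low(Ranges) to High(Ranges) do
    if (CP >= Ranges[I].First) and (CP <= Ranges[I].Last) then
      Exit(True);
  Result := False;
end;

// Returns every code point of S that falls outside the given ranges.
function MissingCodePoints(const S: string;
  const Ranges: array of TCodePointRange): TArray<Cardinal>;
var
  CP: Cardinal;
begin
  Result := nil;
  for CP in ToCodePoints(S) do
    if not IsCovered(CP, Ranges) then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[High(Result)] := CP;
    end;
end;

var
  CP: Cardinal;
begin
  // 'Ab', then ARABIC LETTER BEH, then U+1F600 written as a surrogate pair.
  for CP in MissingCodePoints('Ab' + #$0628 + #$D83D#$DE00, LatinOnly) do
    WriteLn('missing U+', IntToHex(CP, 4));
end.
```

Running the check per candidate font, in fallback order, yields both the selected font and the reason a fallback was needed, which are the values the verification evidence asks for.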

Verification evidence

Collect proof that the text is readable and extractable. Keep these fields with the output or the support record.

  • font selected for every script range and fallback reason when it changed
  • glyph coverage warnings, embedding mode, and subset identifier
  • extracted Unicode text compared with the original application value
  • viewer screenshots for representative right-to-left and combining-mark cases
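The fields above are easier to keep consistent when they live in one structure that is written once per script range. A minimal sketch follows, assuming a flat key=value log line is acceptable; TShapingEvidence and all of its field names are illustrative and do not come from HotPDF.

```delphi
program ShapingEvidenceSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // One entry per script range used in a generated document.
  // All names are illustrative, not part of any library.
  TShapingEvidence = record
    ScriptRange: string;       // for example 'Arabic'
    FontName: string;          // font actually selected
    FallbackReason: string;    // empty when the preferred font was used
    CoverageWarnings: Integer; // count of code points without a glyph
    EmbeddingMode: string;     // for example 'subset' or 'full'
    SubsetId: string;          // subset identifier reported by the PDF layer
    ExtractedMatches: Boolean; // extracted text equals the source value
    function ToLogLine: string;
  end;

function TShapingEvidence.ToLogLine: string;
begin
  Result := Format(
    'script=%s font=%s fallback=%s warnings=%d embed=%s subset=%s match=%s',
    [ScriptRange, FontName, FallbackReason, CoverageWarnings,
     EmbeddingMode, SubsetId, BoolToStr(ExtractedMatches, True)]);
end;

var
  E: TShapingEvidence;
begin
  E.ScriptRange := 'Arabic';
  E.FontName := 'NotoSansArabic';      // hypothetical font name
  E.FallbackReason := '';
  E.CoverageWarnings := 0;
  E.EmbeddingMode := 'subset';
  E.SubsetId := 'ABCDEF';              // placeholder value
  E.ExtractedMatches := True;
  WriteLn(E.ToLogLine);
end.
```

Viewer screenshots do not fit in a log line; storing their file paths next to this record in the support ticket is one option.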

Visual output is not enough

Complex-script support involves character normalization, shaping, glyph positioning, embedding, ToUnicode maps, and reading order. A PDF that only looks right in one viewer can still fail search, selection, accessibility, or downstream extraction.

Regression files worth keeping

Keep more than successful samples. A useful regression set for complex-script shaping contains normal files, boundary files, and intentional failure files so that behavior stays stable across releases.


Engineering review notes for Unicode text shaping for complex scripts

Use these review notes to confirm that the feature has moved past the demo stage and can be explained during release, support, and customer escalation.

  • Decision: font fallback order for Arabic, Hebrew, Indic, CJK, and mixed Latin text. Key implementation point: select fonts that contain the required glyphs before measuring layout. Acceptance evidence: extracted Unicode text compared with the original application value. Regression trigger: search may fail when ToUnicode data is missing even if the page renders correctly
  • Decision: normalization rules for copied text, database values, and template placeholders. Key implementation point: shape and position text before page breaks are finalized. Acceptance evidence: viewer screenshots for representative right-to-left and combining-mark cases. Regression trigger: database collation can alter composed characters before the PDF layer sees them
  • Decision: right-to-left paragraph handling and mixed-direction number policy. Key implementation point: embed or subset fonts according to licensing and PDF standard requirements. Acceptance evidence: font selected for every script range and fallback reason when it changed. Regression trigger: font substitution on a developer machine can hide missing embedded fonts
  • Decision: whether text must remain searchable, selectable, and accessible after output. Key implementation point: verify visual output and extracted text with multilingual regression samples. Acceptance evidence: glyph coverage warnings, embedding mode, and subset identifier. Regression trigger: line breaks in bidirectional text can reorder punctuation and numbers

Edge cases

  • database collation can alter composed characters before the PDF layer sees them
  • font substitution on a developer machine can hide missing embedded fonts
  • line breaks in bidirectional text can reorder punctuation and numbers
  • search may fail when ToUnicode data is missing even if the page renders correctly
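Several of the edge cases above are invisible on screen, so a regression check usually has to compare text code unit by code unit. The sketch below assumes the extracted string has already been obtained from some external PDF text extractor, which is not shown; FirstMismatch and DumpCodeUnits are hypothetical helper names. The sample pair reproduces the collation case, where a precomposed character arrives decomposed.

```delphi
program ExtractedTextCheckSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns the 1-based index of the first UTF-16 code unit that differs,
// or 0 when both strings are identical.
function FirstMismatch(const Expected, Actual: string): Integer;
var
  I, Shared: Integer;
begin
  Shared := Length(Expected);
  if Length(Actual) < Shared then
    Shared := Length(Actual);
  for I := 1 to Shared do
    if Expected[I] <> Actual[I] then
      Exit(I);
  if Length(Expected) <> Length(Actual) then
    Exit(Shared + 1);
  Result := 0;
end;

// Renders a string as U+XXXX code units so that differences which look
// identical on screen still show up in logs.
function DumpCodeUnits(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Ord(S[I]), 4);
  end;
end;

const
  Source    = 'Caf' + #$00E9;   // precomposed e with acute
  Extracted = 'Cafe' + #$0301;  // e followed by COMBINING ACUTE ACCENT

begin
  WriteLn('first mismatch at index ', FirstMismatch(Source, Extracted));
  WriteLn('source:    ', DumpCodeUnits(Source));
  WriteLn('extracted: ', DumpCodeUnits(Extracted));
end.
```

Whether such a mismatch is a failure depends on the normalization rule chosen earlier; if the policy allows either form, normalize both sides before comparing.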

Delphi / C++Builder notes

HotPDF Component should sit behind a small service boundary that receives files, streams, profiles, and credentials, then returns output paths, warnings, metrics, and validation status. The key terms are Unicode, text shaping, font embedding, ToUnicode, bidirectional text, and fallback font.
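One possible shape for that boundary is sketched here, under the assumption that requests and results are plain records and that policy checks run before any product call. IShapedPdfService, TRenderRequest, TRenderResult, and TPolicyOnlyService are hypothetical names; the stub performs only the checks and marks where the HotPDF calls would go.

```delphi
program ShapingServiceSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

type
  TRenderRequest = record
    SourceText: string;
    Locale: string;       // locale used for formatting, for example 'ar-SA'
    ProfileName: string;  // name of the script or font profile
    OutputPath: string;
  end;

  TRenderResult = record
    OutputPath: string;
    Warnings: TArray<string>;
    ElapsedMs: Int64;
    Valid: Boolean;
  end;

  // The rest of the application depends on this interface only.
  IShapedPdfService = interface
    function Render(const Request: TRenderRequest): TRenderResult;
  end;

  // Stub that runs policy checks and leaves the product call out.
  TPolicyOnlyService = class(TInterfacedObject, IShapedPdfService)
  private
    procedure AddWarning(var R: TRenderResult; const Msg: string);
  public
    function Render(const Request: TRenderRequest): TRenderResult;
  end;

procedure TPolicyOnlyService.AddWarning(var R: TRenderResult; const Msg: string);
begin
  SetLength(R.Warnings, Length(R.Warnings) + 1);
  R.Warnings[High(R.Warnings)] := Msg;
end;

function TPolicyOnlyService.Render(const Request: TRenderRequest): TRenderResult;
var
  Watch: TStopwatch;
begin
  Watch := TStopwatch.StartNew;
  Result.OutputPath := Request.OutputPath;
  Result.Warnings := nil;
  if Request.SourceText = '' then
    AddWarning(Result, 'source text is empty');
  if Request.Locale = '' then
    AddWarning(Result, 'locale not recorded');
  // The narrow product-call section (font setup, text output, save)
  // would go here, and only when the checks above pass.
  Result.Valid := Length(Result.Warnings) = 0;
  Result.ElapsedMs := Watch.ElapsedMilliseconds;
end;

var
  Service: IShapedPdfService;
  Req: TRenderRequest;
  Res: TRenderResult;
  W: string;
begin
  Service := TPolicyOnlyService.Create;
  Req.SourceText := 'Invoice';
  Req.Locale := '';                 // deliberately missing
  Req.ProfileName := 'default';
  Req.OutputPath := 'invoice.pdf';
  Res := Service.Render(Req);
  for W in Res.Warnings do
    WriteLn('warning: ', W);
  WriteLn('valid: ', BoolToStr(Res.Valid, True));
end.
```

Because callers see only the interface, tests can substitute a stub like this one and exercise policy, logging, and validation without producing a PDF.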

Delphi code example

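The following Delphi sketch shows a practical service boundary for this topic. Keeping policy checks, logging, and validation outside the narrow product-call section makes the workflow easier to test.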

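// TScriptProfile, ShapeUnicodeRun, and RecordGlyphCoverage are assumed to be
// application-defined and are not shown here.
procedure DrawShapedRun(Pdf: THotPDF; const Text: UnicodeString; const Script: TScriptProfile);
begin
  // Select the font for this script before any text is placed.
  Pdf.CurrentPage.SetFont(Script.FontName, [], Script.Size, 0, Script.Vertical);
  // Shape or reorder the run only when the script profile asks for it.
  if Script.RequiresReorder then
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, ShapeUnicodeRun(Text, Script))
  else
    Pdf.CurrentPage.TextOut(Script.X, Script.Y, 0, Text);
  // Record glyph coverage for the verification evidence.
  RecordGlyphCoverage(Script.FontName, Text);
end;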

Operations checklist

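  • Run the workflow against an empty file, a typical customer file, and a worst-case file
  • Open the generated PDF in the target viewer, validator, printer, or downstream application
  • Record the product version, profile version, input hash, output path, elapsed time, and warning count
  • Manage passwords, certificates, temporary files, and customer data under clear retention rules
  • Add a regression document whenever a customer file reveals a new edge case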

Product documentation

HotPDF Component

Additional code examples

// Ship a known font instead of relying on installed system fonts
Pdf.RegisterUnicodeTTF('C:\Fonts\NotoSansArabic.ttf');
Pdf.CurrentPage.SetFont('NotoSansArabic', [], 12);

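// Audit coverage for the codepoints your data actually uses.
// GID is a variable declared by the caller; LogGlyphAudit is assumed to be
// an application-defined logging helper.
GID := Pdf.GetUnicodeGlyphForCodepoint($0628);  // U+0628 ARABIC LETTER BEH
LogGlyphAudit($0628, GID);

// Declare right-to-left reading order at the document level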
Pdf.Direction := RightToLeft;  // adds vpDirection to ViewerPreferences