Technical Article

Arabic and RTL Text Shaping in Delphi PDFs with HotPDF

Take the Arabic phrase يوضح ملف PDF, pass it to TextOut, and open the output: the letters run in the wrong direction and every one of them sits in its isolated form, disconnected from its neighbors. An Arabic reader sees something like English typed backwards with a space after every letter. Nothing failed — no exception, no warning — because two distinct text transformations simply never ran. Understanding what those transformations are, and which API applies them, is the whole game in complex-script PDF output.

This article works through right-to-left and complex-script text with HotPDF, a native VCL PDF component for Delphi and C++Builder — including where its shaping support genuinely ends, which matters just as much when you are deciding whether it covers your locales.

Two transformations stand between a string and a printed line

Unicode stores text in logical order: the order it is typed, stored, and read aloud. A renderer draws in visual order. For right-to-left scripts the two differ, and for mixed content — an Arabic sentence containing the Latin token 'PDF' or a price in digits — the Unicode Bidirectional Algorithm (UAX #9) defines how left-to-right runs embed within a right-to-left line. That is transformation one: reordering.

Transformation two is contextual shaping. An Arabic letter takes a different glyph depending on whether it stands at the start, middle, or end of a word, or alone — the codepoint never changes, only the rendered form. A text pipeline that maps codepoints straight to default glyphs produces the disconnected output described above. Hebrew needs no joining but still needs reordering; Arabic needs both, which explains why it is the canonical test case.
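To make the "same codepoint, different form" point concrete, here is a minimal Delphi console sketch for a single letter, U+0628 ARABIC LETTER BEH. It uses the Unicode Arabic Presentation Forms-B codepoints only as convenient labels for the four shapes. The function name BehForm and its two Boolean parameters are illustrative assumptions, not part of HotPDF, and nothing here describes how HotPDF selects glyphs internally.

```pascal
program BehForms;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Arabic Presentation Forms-B entries for U+0628 ARABIC LETTER BEH
  BEH_ISOLATED = $FE8F;
  BEH_FINAL    = $FE90;
  BEH_INITIAL  = $FE91;
  BEH_MEDIAL   = $FE92;

// Hypothetical helper: pick the form of BEH from its joining context.
// JoinsPrev: the preceding letter connects to this one.
// JoinsNext: this letter connects to the following one.
function BehForm(JoinsPrev, JoinsNext: Boolean): Integer;
begin
  if JoinsPrev and JoinsNext then
    Result := BEH_MEDIAL
  else if JoinsPrev then
    Result := BEH_FINAL
  else if JoinsNext then
    Result := BEH_INITIAL
  else
    Result := BEH_ISOLATED;
end;

begin
  // The stored codepoint is U+0628 in every case; only the form differs.
  Writeln('alone:  U+', IntToHex(BehForm(False, False), 4));
  Writeln('start:  U+', IntToHex(BehForm(False, True), 4));
  Writeln('middle: U+', IntToHex(BehForm(True, True), 4));
  Writeln('end:    U+', IntToHex(BehForm(True, False), 4));
end.
```

A real shaper also has to decide the two Booleans, which depends on the joining behaviour of the neighbouring letters (some Arabic letters never connect to the letter that follows them). That part is deliberately left out of this sketch.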

Desktop development hides this machinery. When a VCL application draws Arabic to the screen, the operating system's text stack reorders and shapes it invisibly, so the same string that renders perfectly in a TEdit comes out wrong in a naive PDF. A PDF content stream stores positioned glyphs, not editable text runs: whoever writes the stream owns the shaping, and that is the gap RtLTextOut exists to close.

RtLTextOut: reorder and join in a single call

HotPDF separates the Latin path from the complex-script path at the API level. TextOut draws what it is given, in the order it is given; RtLTextOut runs the reordering and contextual analysis first. The charset parameter of SetFont tells the engine which script rules apply: 178 selects Arabic processing and 177 selects Hebrew. These are the same values as the Windows ARABIC_CHARSET and HEBREW_CHARSET constants.

// Arabic: pass logical order; RtLTextOut reorders and joins
Pdf.CurrentPage.SetFont('Arial Unicode MS', [], 12, 178);
Pdf.CurrentPage.RtLTextOut(400, 700, 0, 'يوضح ملف PDF');

// Hebrew: reordering only, no contextual joining
Pdf.CurrentPage.SetFont('Arial Unicode MS', [], 12, 177);
Pdf.CurrentPage.RtLTextOut(400, 660, 0, 'קובץ PDF זה');

The trap that costs the most debugging time: RtLTextOut performs the reversal itself. Feeding it pre-reversed text — typically a leftover 'fix' from an earlier attempt with plain TextOut — double-reverses the line. It can even look correct for one pure-Arabic test string, then break on the first line containing Latin letters or digits, because mixed runs no longer follow UAX #9 ordering. Always pass logical order and let the API do the work.
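A short sketch of why a hand-made reversal is not a substitute for bidirectional reordering. NaiveReverse is a hypothetical stand-in for the kind of leftover "fix" described above, and upper-case Latin letters stand in for Arabic letters, a convention borrowed from UAX #9 examples. No HotPDF call is involved.

```pascal
program NaiveReverseDemo;
{$APPTYPE CONSOLE}

// Hypothetical leftover "fix": reverse the string character by character.
function NaiveReverse(const S: string): string;
var
  I: Integer;
begin
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    Result[Length(S) - I + 1] := S[I];
end;

begin
  // Logical order: an RTL word (upper case) followed by the number 42.
  // Prints '24 MALAS': the digits were flipped along with the letters.
  Writeln(NaiveReverse('SALAM 42'));
end.
```

Under UAX #9 the digits keep their left-to-right order inside the right-to-left line, so the visual result should read 42, not 24. A pure-letter test string hides the difference, which is why the problem tends to appear only once real data with numbers or Latin tokens arrives.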

Mixed-direction content is also where manual expectations go wrong in reviews: within a right-to-left line, numbers and embedded Latin words still read left to right. Reviewers unfamiliar with bidirectional layout routinely file that as a bug; it is the spec-correct behaviour, and worth a note in your acceptance documentation before the first native-speaker review.

Glyph coverage decides the outcome before shaping runs

Shaping selects glyphs; the font must actually contain them. The classic deployment failure is a report that renders perfectly on the developer's workstation — where Arial Unicode MS happens to be installed — and produces blank squares on the customer's server, where Windows silently substituted a font with no Arabic coverage. The remedy is to stop depending on installed system fonts and register a font file you ship:

// Ship a known font instead of relying on installed system fonts
Pdf.RegisterUnicodeTTF('C:\Fonts\NotoSansArabic.ttf');
Pdf.CurrentPage.SetFont('NotoSansArabic', [], 12);

// Audit coverage for the codepoints your data actually uses
GID := Pdf.GetUnicodeGlyphForCodepoint($0628);  // U+0628 ARABIC LETTER BEH
LogGlyphAudit($0628, GID);

Two constraints apply. First, fonts registered this way must be embedded, and HotPDF's embedded Unicode font handling requires the document's PDF version to be 1.5 or later, which matters only if some downstream system pins your output to PDF 1.4. Second, the font licence must permit embedding: TrueType files carry embedding-permission bits (the fsType field of the OS/2 table), and a font that renders fine on screen may be legally unsuitable for distribution within customer documents.

GetUnicodeGlyphForCodepoint is the audit hook: walk the codepoint ranges your data uses at service startup and log the resolved glyph IDs, so a coverage gap appears in a log line during deployment instead of as missing characters in a customer invoice.
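The startup audit could be structured along these lines. To keep the sketch runnable without the component, the lookup is injected as a plain function: TGlyphLookup, FakeLookup, and AuditRange are hypothetical names, and in real code the lookup would wrap Pdf.GetUnicodeGlyphForCodepoint. Treating a result of 0 as "no glyph" is an assumption based on the TrueType convention that glyph 0 is .notdef; check what HotPDF actually returns for an uncovered codepoint before relying on it.

```pascal
program GlyphAuditSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical adapter shape; wrap the real HotPDF lookup to match it.
  TGlyphLookup = function(Codepoint: Integer): Integer;

// Stand-in for the sketch: pretends only U+0621..U+063A are covered.
function FakeLookup(Codepoint: Integer): Integer;
begin
  if (Codepoint >= $0621) and (Codepoint <= $063A) then
    Result := Codepoint - $0620  // arbitrary non-zero glyph ID
  else
    Result := 0;                 // assumed "missing" marker (.notdef)
end;

// Walk a codepoint range, log each unresolved codepoint, return the count.
function AuditRange(Lookup: TGlyphLookup; First, Last: Integer): Integer;
var
  CP: Integer;
begin
  Result := 0;
  for CP := First to Last do
    if Lookup(CP) = 0 then
    begin
      Writeln('MISSING GLYPH U+', IntToHex(CP, 4));
      Inc(Result);
    end;
end;

var
  Gaps: Integer;
begin
  // U+0621..U+064A spans the core letters of the Arabic block.
  Gaps := AuditRange(FakeLookup, $0621, $064A);
  Writeln('gaps: ', Gaps);
end.
```

Returning the gap count lets a service decide at startup whether to refuse to run, fall back to another font, or only log a warning.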

For Unicode text that is not right-to-left — CJK strings, Vietnamese diacritics, mixed European scripts — the plain pipeline applies: TextOut accepts a WideString and draws it through the registered font without bidirectional analysis. Keeping the two call paths distinct in report code, one routine for RTL runs and one for everything else, makes the locale behaviour explicit instead of burying it in a flag nobody remembers to set.
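One way to keep the two call paths explicit is a small classifier that report code consults before choosing a drawing routine. The sketch below is an assumption about how such a helper might look: RtlCharsetFor and CHARSET_NONE are hypothetical names, and only the basic Hebrew (U+0590..U+05FF) and Arabic (U+0600..U+06FF) blocks are checked, so supplements and presentation-form ranges would need adding for real data.

```pascal
program RtlRouting;
{$APPTYPE CONSOLE}

const
  CHARSET_NONE   = 0;    // hypothetical marker: no RTL script found
  CHARSET_HEBREW = 177;  // charset values quoted earlier in the article
  CHARSET_ARABIC = 178;

// Return the charset of the first RTL letter found, or CHARSET_NONE.
function RtlCharsetFor(const S: UnicodeString): Integer;
var
  I, C: Integer;
begin
  Result := CHARSET_NONE;
  for I := 1 to Length(S) do
  begin
    C := Ord(S[I]);
    if (C >= $0600) and (C <= $06FF) then
    begin
      Result := CHARSET_ARABIC;
      Exit;
    end;
    if (C >= $0590) and (C <= $05FF) then
    begin
      Result := CHARSET_HEBREW;
      Exit;
    end;
  end;
end;

begin
  // Codepoint escapes avoid any dependence on the source file encoding.
  Writeln(RtlCharsetFor(#$0645#$0644#$0641' PDF'));  // Arabic letters + Latin
  Writeln(RtlCharsetFor(#$05E7#$05D5#$05D1#$05E5));  // Hebrew letters
  Writeln(RtlCharsetFor('Invoice 42'));              // no RTL content
end.
```

A caller would route a non-zero result to the RTL routine (SetFont with that charset, then RtLTextOut) and a zero result to the plain TextOut routine, so the locale decision sits in one visible place.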

Reading order is also a document property

Glyph-level correctness is not the end of the job. ISO 32000-1 §12.2 defines a viewer preference, /Direction, that declares the document's predominant reading order. It changes no glyphs; it tells viewers how to order two-up page spreads, where to anchor progression in facing-page layouts, and which direction the UI should assume — details that matter for booklets and any document a user flips through.

// Declare right-to-left reading order at the document level
Pdf.Direction := RightToLeft;  // adds vpDirection to ViewerPreferences

Assigning Direction is enough on its own — the property setter adds vpDirection to ViewerPreferences automatically, so the preference reaches the file with a single line. The omission to watch for is skipping the declaration entirely, which is easy precisely because nothing visible changes on a single page; it surfaces only when someone prints a duplex booklet and the spreads come out mirrored.

Where HotPDF shaping stops

An honest capability map saves an evaluation week. RtLTextOut handles bidirectional reordering and Arabic contextual joining automatically. Optional typographic ligatures and broader OpenType feature application are not automatic: GetSingleSubstituteGlyph(GID, 'liga') resolves one substitution at a time — glyph ID in, feature tag alongside — which works for a known, finite ligature list you apply yourself, but it is not a general GSUB feature engine. For scripts whose shaping demands go further — Indic scripts with reordering vowel signs are the usual example — run a pilot with genuine customer strings before committing the locale, instead of extrapolating from Arabic results.

Verification has to be end-to-end, because a page can look right and still fail every downstream use. Three checks catch most of it: copy the text back out of Acrobat and compare codepoints with the source string; search within the document for a word that appears on the page; and review the output on a machine that does not have your development fonts installed. A native-reading colleague looking at one real document beats any amount of synthetic test data — schedule that review before the format ships, not after the first complaint.
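The first check, comparing copied-out text with the source string, is easy to automate once the extracted text is available as a string. A minimal sketch, assuming the extraction itself happens elsewhere; FirstMismatch is a hypothetical helper that compares UTF-16 code units and performs no normalization.

```pascal
program RoundTripCheck;
{$APPTYPE CONSOLE}

// Return 0 when both strings match, otherwise the 1-based index of the
// first differing UTF-16 code unit (or the first index past the shorter one).
function FirstMismatch(const Source, Extracted: UnicodeString): Integer;
var
  I, N: Integer;
begin
  N := Length(Source);
  if Length(Extracted) < N then
    N := Length(Extracted);
  for I := 1 to N do
    if Source[I] <> Extracted[I] then
    begin
      Result := I;
      Exit;
    end;
  if Length(Source) <> Length(Extracted) then
    Result := N + 1
  else
    Result := 0;
end;

var
  Logical, Extracted: UnicodeString;
begin
  Logical   := #$0628#$0627;  // BEH, ALEF as stored in the source data
  Extracted := #$FE91#$FE8E;  // same letters as presentation-form codepoints
  // Prints 1: the page may look right while the extracted codepoints differ.
  Writeln(FirstMismatch(Logical, Extracted));
end.
```

A mismatch at index 1 on text that looks correct on the page is the typical sign that copy and search will not behave as users expect, which is exactly the failure the visual review cannot catch.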

Choose test strings deliberately instead of reusing whatever a translator sent last year. A useful minimum set per locale: a pure-script sentence, a sentence with embedded Latin brand names, a line with digits and currency, and names carrying diacritics or combining marks. Real customer names break shaping assumptions that filler text never touches — the regression corpus should grow by one entry every time a support case exposes a new pattern.

Font registration, subsetting, and the general text-drawing API are covered by the article on report output, fonts, and images with HotPDF; if the same documents must also meet accessibility profiles, the language tagging and structure requirements in the PDF/A and PDF/UA validation article stack on top of the shaping work described here.

The right-to-left and Unicode font APIs in this article ship with the HotPDF Component for Delphi and C++Builder; the product page links the full text-output reference.