Technical Article

Full Justification for PDF Text in Delphi with HotPDF

Full justification is the layout that makes a column of text line up on both the left and right edges, the look you expect from a printed book or a formal report. It is easy to describe and surprisingly easy to get wrong, because the answer to the question "where does the extra space go" is not the same for English as it is for Japanese, and because the naive way to measure each line turns a fast page into a slow one. HotPDF gives you script-aware justification through a single box-layout call, and underneath that call sits a textbook performance fix worth understanding on its own.

This article walks through both. First, the typographic rule that decides how slack is distributed for scripts with word gaps versus scripts without them. Second, the measurement change that cut the number of measurement round-trips per page roughly eighty-fold with no visible difference in the output. Both matter if you generate documents at volume and want them to read like real typesetting rather than monospaced output stretched to fit.

What full justification actually requires

A line of text drawn at its natural width almost never reaches the right edge of its column. There is always a remainder, the slack, between where the last glyph ends and where the column boundary sits. Left alignment leaves that slack on the right. Right alignment moves it to the left. Centering splits it evenly between the two sides. Full justification removes it by widening the line itself until both edges meet the box, and the only honest way to do that is to push the glyphs apart from the inside.

The rule that separates good justification from bad is where you put the slack. A script that writes words with spaces between them, such as English and the rest of the Latin family, has natural seams at every inter-word space. Widening those spaces is invisible to the eye because readers already accept that word gaps vary. A script that writes without word gaps, such as Chinese Han characters, Japanese kana, or Korean Hangul, has no such seams. There the slack has to be spread evenly between adjacent glyphs, which is the principle Japanese typesetters call kintou-waritsuke, or even spacing. Applying Latin-style word-gap stretching to a CJK line, or cramming all the slack into the one place where a CJK line happens to contain a space, produces the rivers and gaps that mark amateur output.

How HotPDF decides where the space goes

HotPDF makes that decision per gap, not per line. When it justifies a line it walks every adjacent pair of glyphs and asks whether a stretchable boundary sits between them. A boundary is stretchable when either side is a space or tab (the Latin case), or when both sides are CJK-breakable characters (the even-spacing case). It counts those boundaries, divides the line's slack equally among them, and adds that share to each qualifying gap.

The consequence falls out naturally. An English line has stretchable boundaries only at its word spaces, so all the slack lands there and the words spread apart while letters inside each word keep their natural spacing. A Han or kana line has a stretchable boundary between nearly every pair of glyphs, so the slack distributes evenly across the whole line, exactly the even inter-glyph spacing those scripts call for. A line that is a single long Latin word with no internal space has no stretchable boundary at all, so HotPDF leaves it at its natural width rather than tearing the word apart letter by letter. The same logic handles mixed Latin and CJK runs in one line without special-casing, because the decision is local to each boundary.

One boundary is deliberately excluded everywhere. The position after the final glyph of a line is never treated as a gap, because stretching there would just reintroduce a right-hand remainder, which is the opposite of justification.
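
The per-gap rule above can be sketched in a few lines of plain Object Pascal. This is an illustration of the rule as described, not HotPDF's source: IsCJK, IsStretchable, and DistributeSlack are hypothetical names, the CJK test covers only a few common ranges, and slack is kept in integer units with the remainder handed out one unit at a time so the shares always add up to the slack exactly.

```pascal
program JustifySketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  TIntArray = array of Integer;

// Hypothetical classifier: a few common ranges only, not a full
// line-breaking table and not HotPDF's actual test.
function IsCJK(C: WideChar): Boolean;
var
  Code: Word;
begin
  Code := Ord(C);
  Result := ((Code >= $3040) and (Code <= $30FF)) or   // Hiragana, Katakana
            ((Code >= $4E00) and (Code <= $9FFF)) or   // CJK Unified Ideographs
            ((Code >= $AC00) and (Code <= $D7A3));     // Hangul syllables
end;

function IsGapChar(C: WideChar): Boolean;
begin
  Result := (Ord(C) = 32) or (Ord(C) = 9);             // space or tab
end;

// Stretchable when either side is a space or tab, or both sides are CJK.
function IsStretchable(Left, Right: WideChar): Boolean;
begin
  Result := IsGapChar(Left) or IsGapChar(Right) or
            (IsCJK(Left) and IsCJK(Right));
end;

// Returns the extra width to add after each glyph. The last entry stays 0
// because the position after the final glyph is never treated as a gap.
function DistributeSlack(const Line: WideString; Slack: Integer): TIntArray;
var
  I, Count, Share, Remainder: Integer;
begin
  SetLength(Result, Length(Line));
  for I := 0 to High(Result) do
    Result[I] := 0;
  Count := 0;
  for I := 1 to Length(Line) - 1 do
    if IsStretchable(Line[I], Line[I + 1]) then
      Inc(Count);
  if (Count = 0) or (Slack <= 0) then
    Exit;                                  // e.g. one long Latin word
  Share := Slack div Count;
  Remainder := Slack mod Count;
  for I := 1 to Length(Line) - 1 do
    if IsStretchable(Line[I], Line[I + 1]) then
    begin
      Result[I - 1] := Share;
      if Remainder > 0 then
      begin
        Inc(Result[I - 1]);
        Dec(Remainder);
      end;
    end;
end;

procedure Show(const Extra: TIntArray);
var
  I: Integer;
begin
  for I := 0 to High(Extra) do
    Write(Extra[I], ' ');
  WriteLn;
end;

begin
  Show(DistributeSlack('ab cd', 10));              // 0 5 5 0 0
  Show(DistributeSlack(#$65E5#$672C#$8A9E, 4));    // 2 2 0
  Show(DistributeSlack('abcdef', 10));             // 0 0 0 0 0 0
end.
```

Note that under the stated rule a single word space yields two stretchable boundaries, one on each side of the space glyph, so the word gap still receives all of the slack.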

Why the last line is left alone

The final line of a paragraph is special, and getting it wrong is the most common justification bug. A paragraph's last line is usually short, often only a few words, and stretching it to the full column width drags those words across the page into a sparse, broken row. Correct typography leaves the last line at its natural width, aligned to the left.

HotPDF detects the trailing line by position. As it wraps the text into lines, it knows when the line it just split off reaches the end of the supplied string. That final line is emitted with plain left alignment and keeps its natural width. Every line before it is justified to both edges, with one exception: hard line breaks you write into the text are honored as written, so an intentional short line is never stretched either. The reader sees a clean rectangular block of text whose last line ends naturally, which is what the eye expects.
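
As a minimal sketch of that decision, assuming the wrapper knows the 1-based index where each wrapped line ends, the choice reduces to two checks. TLineAlign and AlignForLine are hypothetical names for illustration, and treating CR or LF as the hard-break marker is an assumption, not a documented HotPDF detail.

```pascal
program LastLineSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  TLineAlign = (laLeft, laJustify);   // hypothetical, not a HotPDF type

// LineEnd is the 1-based index of the last character of the wrapped line.
function AlignForLine(const Source: WideString; LineEnd: Integer): TLineAlign;
begin
  if LineEnd >= Length(Source) then
    Result := laLeft        // trailing line of the supplied string
  else if (Ord(Source[LineEnd]) = 13) or (Ord(Source[LineEnd]) = 10) then
    Result := laLeft        // line ended by a hard break written in the text
  else
    Result := laJustify;    // soft-wrapped line: stretch to both edges
end;

var
  Sample: WideString;
begin
  // 'aaaa ' ends at 5, 'bbbb' plus the hard break ends at 10, 'cccc' at 14.
  Sample := 'aaaa bbbb'#10'cccc';
  WriteLn(AlignForLine(Sample, 5) = laJustify);   // TRUE
  WriteLn(AlignForLine(Sample, 10) = laLeft);     // TRUE
  WriteLn(AlignForLine(Sample, 14) = laLeft);     // TRUE
end.
```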

The measurement cost that made justification slow

To justify a line you must know its exact width, and you must know each glyph's advance so you can place the extra space precisely. The first implementation got those numbers the obvious way. It measured the whole line with a full Unicode width query, then measured prefix after prefix to recover each glyph's advance by differencing. For a line of N glyphs that is N+1 calls into the measurement engine, and each call is a full GDI round-trip, asking the operating system to shape and measure text and hand the answer back.

Per line that sounds cheap. Across a page it is not. Take a dense A4 page of body text, roughly forty-five lines of about eighty characters each. At N+1 round-trips per line that is 81 round-trips for every line and 45 × 81 = 3,645 for the page, almost all of them spent re-measuring text the engine had already looked at moments earlier. On a batch job producing thousands of pages, that overhead dominates the layout time, and every round-trip crosses the boundary between your process and the graphics subsystem.

One call instead of N plus one

The fix is the kind of change that looks small and pays off large. GDI can already report a string's total width and the position of every glyph in a single query. HotPDF exposes that through GetWideCharAdvances, which fills an array with each glyph's natural advance, kerning included, and returns the total width, in one call rather than N+1. The justification routine, named _HPDFEmitJustifiedWideLine internally, asks for all the advances once, computes the slack, distributes it across the stretchable boundaries, and emits the line.

For that same A4 page the per-line measurement drops from 81 round-trips to one, so the page falls from roughly 3,645 round-trips to about 45, close to an eighty-fold reduction. The output is byte-for-byte identical, because nothing about the measurement changed except how many times it is requested. The same GDI engine, the same font metrics, and the same kerning feed the same numbers. Only the round-trip count fell. When a measurement is already correct, the right optimization is to stop asking for it repeatedly, not to approximate it.
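
The round-trip arithmetic is easy to check with a stand-in measurer that only counts how often it is called. Everything in this sketch is hypothetical: MeasurePrefix and MeasureAll are mocks with a fixed 6-unit glyph width, not GDI or HotPDF calls, and the page is the same 45 lines of 80 characters used in the text.

```pascal
program MeasureCountSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  TIntArray = array of Integer;

var
  RoundTrips: Integer = 0;   // calls into the stand-in measurement engine

// Mock width query for the first Count characters of S. A real engine would
// shape the text; only the call count matters here.
function MeasurePrefix(const S: WideString; Count: Integer): Integer;
begin
  Inc(RoundTrips);
  Result := 6 * Count;
end;

// Mock single query: fills every advance and returns the total width.
function MeasureAll(const S: WideString; var Adv: TIntArray): Integer;
var
  I: Integer;
begin
  Inc(RoundTrips);
  SetLength(Adv, Length(S));
  for I := 0 to High(Adv) do
    Adv[I] := 6;
  Result := 6 * Length(S);
end;

// Naive approach: whole line once, then every prefix, differencing
// neighbouring prefix widths to recover each advance. N+1 calls in total.
procedure AdvancesByPrefix(const S: WideString; var Adv: TIntArray);
var
  I, Prev, Cur: Integer;
begin
  MeasurePrefix(S, Length(S));          // total line width: 1 call
  SetLength(Adv, Length(S));
  Prev := 0;
  for I := 1 to Length(S) do            // one call per prefix: N calls
  begin
    Cur := MeasurePrefix(S, I);
    Adv[I - 1] := Cur - Prev;
    Prev := Cur;
  end;
end;

var
  Line: WideString;
  Adv: TIntArray;
  L: Integer;
begin
  Line := StringOfChar('x', 80);
  for L := 1 to 45 do
    AdvancesByPrefix(Line, Adv);
  WriteLn('prefix differencing: ', RoundTrips);   // 3645
  RoundTrips := 0;
  for L := 1 to 45 do
    MeasureAll(Line, Adv);
  WriteLn('single query:        ', RoundTrips);   // 45
end.
```

Both paths fill Adv with the same values, which mirrors the point that only the number of requests changes, not the measurement.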

How the line reaches the page

Once the slack is apportioned, HotPDF emits the line with ExtTextOut and a per-glyph advance array, the Dx array. Each entry is the distance from one glyph's origin to the next, which is that glyph's natural advance plus its share of the slack when a stretchable boundary follows it. This maps directly onto the PDF imaging model. Positioned text is written with the TJ operator, whose operand is an array that interleaves glyph runs with explicit horizontal adjustments, and the slack carried in the Dx values becomes exactly those adjustments. That is why the extra space lands between glyphs at precise sub-point positions rather than being faked with padding characters, and why a justified HotPDF line measures correctly if a downstream tool reads it back.
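
To make the TJ mapping concrete, here is a small sketch that turns per-glyph extra space into a TJ array. It relies on one fact from the PDF specification: numbers inside a TJ array are in thousandths of a text-space unit and are subtracted from the pen position, so extra space of E points at font size S becomes the adjustment -1000 × E / S. BuildTJ is a hypothetical helper, the text is limited to single-byte literal strings, and how HotPDF itself serializes the array is not stated in this article.

```pascal
program TJSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

// ExtraPt[I] is the extra space, in points, to add after glyph I (0-based).
// Horizontal scaling is assumed to be the default 100 percent.
function BuildTJ(const Line: string; const ExtraPt: array of Double;
  FontSize: Double): string;
var
  I: Integer;
  Run: string;
begin
  Result := '[';
  Run := '';
  for I := 1 to Length(Line) do
  begin
    // Escape the characters that are special inside a PDF literal string.
    if (Line[I] = '(') or (Line[I] = ')') or (Line[I] = '\') then
      Run := Run + '\';
    Run := Run + Line[I];
    // Close the run and emit an adjustment wherever extra space follows.
    if (I < Length(Line)) and (I - 1 <= High(ExtraPt)) and
       (ExtraPt[I - 1] <> 0) then
    begin
      Result := Result + '(' + Run + ') ' +
        IntToStr(Round(-ExtraPt[I - 1] * 1000 / FontSize)) + ' ';
      Run := '';
    end;
  end;
  Result := Result + '(' + Run + ')] TJ';
end;

begin
  // 2.5 pt added on each side of the space at 10 pt: -2.5 * 1000 / 10 = -250.
  WriteLn(BuildTJ('ab cd', [0.0, 2.5, 2.5, 0.0, 0.0], 10));
  // [(ab) -250 ( ) -250 (cd)] TJ
end.
```

A negative number moves the next glyph to the right, which is why widening a gap shows up as a negative adjustment.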

You do not call ExtTextOut yourself for justified paragraphs. The entry point is WideTextOutBox, which wraps a Unicode string into a box and applies the alignment you ask for. It splits the text into lines that fit the box width, places each line down the box height, and returns the number of characters it managed to fit before running out of vertical room. The alignment is chosen by the justification enum:

type
  THPDFJustificationType = (jtLeft, jtCenter, jtRight, jtJustify);

The first three are self-explanatory: left, centered, and right alignment. The fourth, jtJustify, is the full both-edge justification described here, and it is the value WideTextOutBox reads to switch on the script-aware spacing.

Justifying a paragraph in practice

A complete example creates a document, sets a font, and pours a paragraph into a box with full justification. The same code justifies Latin and CJK text without a flag change, because the script-awareness lives below the API.

uses
  HPDFDoc;

procedure JustifyParagraph;
var
  Pdf: THotPDF;
  Body: WideString;
begin
  Pdf := THotPDF.Create(nil);
  try
    Pdf.FileName := 'Justified.pdf';
    Pdf.BeginDoc;
    Pdf.CurrentPage.SetFont('Arial', 11);

    Body :=
      'Full justification spreads the slack on each filled line so both ' +
      'edges meet the column, while the last line keeps its natural width. ' +
      'For scripts with word gaps the space lands between words; for ' +
      'scripts without them it spreads evenly between glyphs.';

    // X, Y, LineSpacing, BoxWidth, BoxHeight, Text, Align
    Pdf.CurrentPage.WideTextOutBox(72, 72, 4, 380, 240, Body, jtJustify);

    Pdf.EndDoc;
  finally
    Pdf.Free;
  end;
end;

To draw the same block left-aligned, centered, or right-aligned, change only the final argument to jtLeft, jtCenter, or jtRight. The wrapping, the line placement, and the return value stay the same. The measured width that drives all four paths comes from GetWideTextWidth, the Unicode-aware width query. It measures a WideString correctly where the older byte-wise measurement would mis-size anything past Latin-1, and that is what makes the box wrap CJK and surrogate-pair text at the right place to begin with.
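
Because the box call reports how many characters fit, one plausible use of that return value is to continue the remainder in a following box or page. The sketch below shows only the loop, with the box call replaced by a stand-in so it runs without the library: TFitBox, FitHundred, and PourText are hypothetical names, and the assumption that the remainder starts at the character right after the reported count should be checked against the HotPDF documentation.

```pascal
program OverflowSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  // Stand-in for "draw into one box, return how many characters fit".
  TFitBox = function(const Chunk: WideString): Integer;

// Mock box that always has room for at most 100 characters.
function FitHundred(const Chunk: WideString): Integer;
begin
  if Length(Chunk) < 100 then
    Result := Length(Chunk)
  else
    Result := 100;
end;

// Feeds the text through successive boxes and returns how many were used.
function PourText(const Source: WideString; Fit: TFitBox): Integer;
var
  Rest: WideString;
  Placed: Integer;
begin
  Result := 0;
  Rest := Source;
  while Length(Rest) > 0 do
  begin
    Placed := Fit(Rest);
    if Placed <= 0 then
      Break;                 // box too small for anything: avoid looping forever
    Inc(Result);
    Rest := Copy(Rest, Placed + 1, MaxInt);
  end;
end;

begin
  WriteLn(PourText(StringOfChar('x', 250), FitHundred));   // 3
end.
```

In real code the stand-in would be replaced by a routine that starts a new box or page and calls WideTextOutBox with the remaining text.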

Justification is one layer of a larger text-shaping stack. When a line contains scripts that reorder or join their glyphs, the spacing decisions here sit on top of the work described in our article on complex-script text shaping, and when a font carries typographic variants you want to select, see how to drive OpenType GSUB stylistic alternates. All of it ships in the HotPDF Component for Delphi and C++Builder, alongside the wider text, layout, and document APIs covered across this blog.