Technical Article

PDF Redaction and N-up Stitch in Delphi with HotPDF

A request lands on your desk: take a batch of already-rendered statements, black out the account numbers, and ship two pages per sheet to save paper. Both halves of that task are content-stream surgery on a PDF you did not create, so there is no friendly page canvas to draw on and no font manager to lean on. You are editing the object graph of a loaded document directly, appending raw drawing operators to a page that some other tool laid out. HotPDF exposes two main entry points for this, and the more dangerous of the two is the one that looks harmless.

HotPDF is a native VCL PDF component for Delphi and C++Builder. Its round-nine loaded-document API added the first methods that create brand-new content on a page you opened from disk rather than one you built from scratch. Two of them are the subject here: RedactLoadedRect, which paints an opaque rectangle over a region, and StitchLoadedPage, which scales one page and draws it onto another. Both work by writing ISO 32000-1 §8.5 content-stream operators into the page's /Contents stream. Understanding what those operators do, and just as importantly what they do not, is the difference between a working tool and a data breach.

Appending operators to a loaded page

When you build a page with the normal HotPDF API, the component owns the content stream and serializes your TextOut and vector calls for you. A loaded page is different: its /Contents is an existing stream object, possibly shared, possibly part of a content array, and you have to splice into it without corrupting what is already there. Round nine introduced three small helpers that make that safe. NewIndirectStream allocates a fresh indirect THPDFStreamObject with an empty buffer and a /Length 0 entry; ResolveLoadedStream follows an indirect reference down to the underlying stream; and AppendLoadedStream writes raw bytes at the end of the stream and rewrites /Length so the saved object stays well-formed.

The pattern both public methods follow is the same. Find the page's /Contents, resolve it to a stream, and if there is no usable stream, create one and attach it. Then append the operators. Because the new bytes go at the end of the stream, the painter's model guarantees they render on top of everything the original layout drew. That ordering is the entire mechanism behind the redaction rectangle, and it is also the reason that rectangle is not what most people assume it is.
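The append step can be modelled in a few lines. This is a toy sketch, not HotPDF's implementation: TToyStream and AppendOps are hypothetical names, the stream is assumed to be uncompressed, and the newline separator between old and new operators is an assumption added here so that tokens cannot run together.

```pascal
program AppendSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for a loaded stream object.
  TToyStream = record
    Data: AnsiString;      // raw, uncompressed stream bytes
    LengthEntry: Integer;  // value stored under /Length
  end;

// Append operators at the end of the stream and keep /Length in sync.
procedure AppendOps(var S: TToyStream; const Ops: AnsiString);
begin
  // Assumption: separate old and new operators with a newline.
  if (Length(S.Data) > 0) and (S.Data[Length(S.Data)] <> #10) then
    S.Data := S.Data + #10;
  S.Data := S.Data + Ops;
  S.LengthEntry := Length(S.Data);
end;

var
  S: TToyStream;
begin
  S.Data := 'BT /F1 12 Tf (Hello) Tj ET';
  S.LengthEntry := Length(S.Data);
  AppendOps(S, '0 0 0 rg 10 10 50 20 re f');
  // The new operators come last, so they paint over the earlier text.
  WriteLn(S.Data);
  WriteLn(S.LengthEntry = Length(S.Data)); // TRUE
end.
```

A real loaded stream may be compressed or split across a content array, which is exactly the bookkeeping the helpers described above exist to handle.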

RedactLoadedRect: an opaque cover, not a delete

RedactLoadedRect takes a zero-based page index, four user-space coordinates, and three colour components in the 0–1 range:

var
  Pdf: THotPDF;
begin
  Pdf := THotPDF.Create(nil);
  try
    if Pdf.LoadFromFile('statement.pdf') > 0 then
    begin
      // Cover the account-number band on page 1 with solid black.
      // Coordinates are PDF user space: origin bottom-left, points.
      Pdf.RedactLoadedRect(0, 56, 690, 320, 706, 0, 0, 0);
      Pdf.SaveLoadedDocument('statement-covered.pdf');
    end;
  finally
    Pdf.Free;
  end;
end;

Under the hood the method emits three operators into the content stream: a fill-colour set in DeviceRGB (r g b rg), a rectangle path (x y w h re), and a fill (f). The width and height are derived as X2 - X1 and Y2 - Y1, so you pass two opposite corners and let the method compute the extent. Pass 0, 0, 0 for the colour and you get a black bar; pass 1, 1, 1 for a white one that matches a white page. The coordinates are the loaded page's own user space, which means the origin is the bottom-left corner and the units are points, and it also means you need the page's /MediaBox to place anything precisely; GetLoadedPageBox with pbMediaBox gives you that.
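To make the operator sequence concrete, here is a minimal sketch that builds the same three operators from two opposite corners. BuildCoverOps and Num are hypothetical helpers written for this illustration, not HotPDF functions, and the two-decimal number formatting is an assumption; the component may format its output differently.

```pascal
program CoverOpsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Str() always writes '.' as the decimal separator, whatever the
// system locale, which is what PDF syntax requires.
function Num(const V: Double): string;
begin
  Str(V:0:2, Result);
end;

// Hypothetical helper: fill colour (rg), rectangle path (re), fill (f).
// Width and height are derived from the two corners.
function BuildCoverOps(X1, Y1, X2, Y2, R, G, B: Double): string;
begin
  Result := Num(R) + ' ' + Num(G) + ' ' + Num(B) + ' rg ' +
            Num(X1) + ' ' + Num(Y1) + ' ' +
            Num(X2 - X1) + ' ' + Num(Y2 - Y1) + ' re f';
end;

begin
  // Same numbers as the call above.
  WriteLn(BuildCoverOps(56, 690, 320, 706, 0, 0, 0));
  // prints: 0.00 0.00 0.00 rg 56.00 690.00 264.00 16.00 re f
end.
```

The rectangle in the earlier call is therefore 264 points wide and 16 points tall, anchored at (56, 690).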

Read this twice: a filled rectangle covers content visually, it does not remove it. The text, image, or vector art beneath the rectangle is still present in the PDF, still in the object graph, still extractable by anyone who copies the page, runs a text extractor, or simply deletes your rectangle from the content stream. This is visual masking, not redaction in the legal or security sense. If you are hiding genuinely sensitive data — account numbers, medical records, identities, anything regulated — covering it with a black box and shipping the file is a data leak waiting to be discovered. True redaction requires deleting the underlying content objects, not painting over them.
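A toy demonstration of the point, under simplifying assumptions: the content stream below is a hand-written, uncompressed string rather than a real page, and the account number is invented. Real streams are usually compressed, but any extractor decodes them before reading, so the outcome is the same.

```pascal
program CoverIsNotDelete;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Toy content stream that draws an account number as text.
  Original = 'BT /F1 12 Tf 56 694 Td (ACCT 12345678) Tj ET';
  // Opaque black rectangle over the same band.
  Cover = '0 0 0 rg 56 690 264 16 re f';

var
  Masked: string;
begin
  // Appending the cover changes what is painted, not what is stored.
  Masked := Original + #10 + Cover;
  WriteLn(Pos('12345678', Masked) > 0); // TRUE: the digits are still there
end.
```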

The method name says "Redact", and that is a useful warning about how the result will be misread, not a promise about what it deletes. The implementation is honest about this in its own comment: it calls itself the "visual redaction primitive" and notes that content-removing redaction needs a content-stream interpreter that walks and rewrites the existing operators. HotPDF's loaded-document path does not do that here. So the safe rule is narrow: use RedactLoadedRect for non-sensitive cosmetic masking — hiding a draft watermark, blanking a region before a screenshot, covering an obsolete logo on an internal proof. The moment the thing under the box would matter if it leaked, this method is the wrong tool, and the right answer is to regenerate the document without the data or to use a real content-removal pipeline.

StitchLoadedPage: scale, translate, draw

N-up imposition is the friendlier problem because nothing is hidden, only rearranged. StitchLoadedPage takes a target page index, a source page index, an X/Y offset, and a scale factor, and it draws the source page onto the target at that position and size:

// Overlay page 2 (index 1) onto page 1 (index 0),
// scaled to 70% and nudged up-right.
Pdf.StitchLoadedPage(0, 1, 40, 380, 0.7);

// Convenience 2-up: source page on the right half of the target.
Pdf.StitchLoadedPageSideBySide(0, 1);

The operator string it appends is a standard transform-and-paint sequence: q to save graphics state, a cm matrix carrying the scale on the diagonal and the offset in the translation slots, /StitchSrc Do to invoke an external object, and Q to restore state. The q/Q pair matters: it isolates the transform so the stitched page does not bleed its coordinate system into anything appended afterward. The method also guards the obvious mistakes. A non-positive scale is clamped to 1.0, while an out-of-range index or a target equal to the source makes it exit quietly rather than raise. Check your inputs, because a silent no-op looks identical to success.
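The sequence can be sketched as a string builder. BuildStitchOps and Num are hypothetical helpers for illustration only, and the four-decimal formatting is an assumption; what the sketch takes from the description above is the operator order, the StitchSrc name, and the clamp on a non-positive scale.

```pascal
program StitchOpsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Locale-independent number formatting ('.' decimal separator).
function Num(const V: Double): string;
begin
  Str(V:0:4, Result);
end;

// Hypothetical helper: q, cm, Do, Q. The cm matrix is
// [Scale 0 0 Scale OffsetX OffsetY]: uniform scale plus translation.
function BuildStitchOps(OffsetX, OffsetY, Scale: Double): string;
begin
  if Scale <= 0 then
    Scale := 1.0; // mirrors the clamp described in the text
  Result := 'q ' + Num(Scale) + ' 0 0 ' + Num(Scale) + ' ' +
            Num(OffsetX) + ' ' + Num(OffsetY) +
            ' cm /StitchSrc Do Q';
end;

begin
  WriteLn(BuildStitchOps(40, 380, 0.7));
  // prints: q 0.7000 0 0 0.7000 40.0000 380.0000 cm /StitchSrc Do Q
end.
```

Because the matrix has zeros off the diagonal, the source page is scaled and moved but never rotated or skewed.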

StitchLoadedPageSideBySide is a thin convenience over the general method. It reads the target's media-box width, halves it, and calls StitchLoadedPage with that half-width as the X offset and a fixed scale of 0.5, putting the source on the right half. That hard-coded 0.5 assumes the source and target share a width; if they do not, the source will not fill its half cleanly, and you will want the general StitchLoadedPage with a scale you compute yourself from both media boxes.
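When the pages differ in size, the scale can be computed as the largest uniform factor that fits the source into the right half of the target. The sketch below is one way to do that arithmetic; FitScale is a hypothetical helper, and the page sizes are round numbers chosen for readability rather than values read from a real document.

```pascal
program FitHalfSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: largest uniform scale that fits a source page
// of SrcW x SrcH points inside a slot of SlotW x SlotH points.
function FitScale(SrcW, SrcH, SlotW, SlotH: Double): Double;
var
  SX, SY: Double;
begin
  SX := SlotW / SrcW;
  SY := SlotH / SrcH;
  if SX < SY then
    Result := SX
  else
    Result := SY;
end;

var
  TargetW, TargetH, SrcW, SrcH, HalfW, Scale: Double;
begin
  // Assumed sizes; in practice read both from the pages' media boxes.
  TargetW := 1000; TargetH := 600;
  SrcW := 400;     SrcH := 800;

  HalfW := TargetW / 2;                           // right-half slot width
  Scale := FitScale(SrcW, SrcH, HalfW, TargetH);  // limited by height here

  WriteLn(Scale:0:4);  // 0.7500
  WriteLn(HalfW:0:1);  // 500.0
end.
```

HalfW and Scale would then go to the general method as the X offset and scale factor, in the argument order shown earlier (target index, source index, X, Y, scale). Taking the smaller of the two ratios keeps the aspect ratio intact, at the cost of leaving some of the slot empty.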

The simplified XObject strategy and its ISO trade-off

Here is where the implementation makes a deliberate shortcut you need to know about before you trust the output across viewers. A correct N-up imposition wraps the source page's content in a Form XObject, a self-contained drawable object that ISO 32000-1 §8.10.1 says must carry /Subtype /Form and its own /BBox clipping box (the /Type /XObject entry is optional). HotPDF's round-nine stitch does not build that wrapper. Instead it registers the source page dictionary itself directly under the target's /Resources /XObject with the name StitchSrc, then draws it with Do. A page dict and a Form XObject share enough of their content model, since both reference a content stream and a resource dictionary, that many readers will render the result.

But it is not a conforming Form XObject. It lacks the /Subtype /Form marker and its own /BBox, which means a strict consumer is within its rights to ignore the Do or to clip it differently than you expect. The TechnicalNotes for this round say so plainly: the approach "renders under most readers" but is "not a strictly ISO-compliant Form XObject", and full compliance requires synthesizing a real Form XObject stream as a separate step. So treat the stitch output the way you would treat any non-conforming construct: verify it in the specific viewers your customers run, not just the one on your machine, and if you need archival or strict-validator-clean PDFs, do not rely on this path. The same discipline applies to anything you build on the loaded object graph, which is why a PDF preflight pass in Delphi earns its place in the release pipeline whenever you mutate documents programmatically.

Where these fit, and where they do not

Both methods are content-stream tools, so the mental model is the same one you use for direct drawing. If you have built pages from scratch with the component, the vector and colour operators behind these calls will look familiar from HotPDF canvas drawing in Delphi; the difference is only that here you are appending to a stream someone else authored rather than one you own. Keep three boundaries in mind:

  • Redaction is cosmetic. RedactLoadedRect paints over content and never deletes it. For anything sensitive, regenerate the source or use real content removal — a black box is not security.
  • Stitch is non-conforming by design. The source page is referenced as a pseudo-XObject without the §8.10.1 /Subtype /Form and /BBox, so confirm rendering in your target viewers and avoid it where strict validation is required.
  • Coordinates are page user space. Bottom-left origin, points, driven by the page's own media box. Read the box with GetLoadedPageBox before you place anything, because the page you loaded may not be the size you assumed.

Used within those limits, the pair covers a real workflow: rearrange pages for printing, mask non-confidential regions, and write the result back with SaveLoadedDocument — all without a full re-render. The loaded-document API that includes these stitch and mask primitives ships with the HotPDF Component for Delphi and C++Builder, alongside the form-field, annotation, and FDF methods from the same round.