Technical article

PDF/VT Variable Data Printing in Delphi with PDFium VCL

A transactional print shop sends back your 80,000-page statement run with a one-line rejection: "not PDF/VT, RIP cannot cache." The file opens fine in every viewer on your desk, the colors are right, the data merged correctly. None of that is what the digital press asked for. High-speed variable-data printing lives or dies on the press being able to recognize that the customer-logo block on page 1 is byte-for-byte the same object as the one on page 40,000, render it once, and reuse it. PDF/VT is the standard that makes that promise machine-checkable, and "looks correct" is exactly the trap, because the structure the RIP reads is invisible on screen.

PDFiumPas exposes that structure through a small surface on TPdf: SaveAsPdfVT writes it, ValidatePdfVT checks it. This article is about what those two methods actually put on disk and inspect, where ISO 16612-2 is stricter than it first looks, and which parts are honest structural anchors rather than a full preflight you can bill a client against.

What PDF/VT standardizes, and why PDF/X comes first

PDF/VT (ISO 16612-2:2010) is not a fresh file format. It is a layer of optimization metadata bolted onto a PDF/X file, and that ordering is load-bearing. The standard defines three conformance levels, but only two of them name a PDF file: PDF/VT-1, a single self-contained document, and PDF/VT-2, a file-set model where pages reference shared external resources. The third token you may see, PDF/VT-2s, is not a file-level value at all; it lives in a MIME stream header described in Annex A. If you find code stamping GTS_PDFVTVersion = "PDF/VT-2s" into a document's XMP, that code is wrong.

The non-negotiable rule for a single file is the PDF/X base. ISO 16612-2 §6.2.1 requires every PDF/VT-1 file to also be a valid PDF/X-4 file. PDF/VT-2's file set, by §6.2.2, must instead sit on PDF/X-4p, PDF/X-5g, or PDF/X-5pg. This is why a PDF/VT writer cannot just append a couple of identifier keys: it has to carry the entire PDF/X-4 marker set with it, which means an OutputIntent, an embedded ICC destination profile, the matching XMP and document Info entries, a trailer /ID, and no encryption. Skip any of those and you have a file that claims PDF/VT and fails the moment a conforming consumer checks the base. PDFiumPas treats the PDF/X-4 layer as part of the PDF/VT save, so you do not call a separate SaveAsPdfX first; the injector writes both layers in one pass.

Writing a file with SaveAsPdfVT

The minimal call needs nothing but an active document, because TPdfVTSaveOptions.Default supplies a built-in sRGB ICC profile and conformance pvc1. The save runs three steps internally: it strips any security (injecting plaintext markers into an encrypted object stream would corrupt it), it bridges the document's existing Info dictionary and trailer /ID into the marker set so the XMP and Info values agree, then it appends the PDF/X-4 and PDF/VT objects through an incremental update.

var
  Pdf: TPdf;
begin
  Pdf := TPdf.Create(nil);
  try
    if Pdf.LoadFromFile('statements-merged.pdf') then
    begin
      // Default options: built-in sRGB OutputIntent, PDF/VT-1, synthesized DPart
      if Pdf.SaveAsPdfVT('statements-pdfvt.pdf') then
        Writeln('PDF/VT-1 written')
      else
        Writeln('Save failed (document not active?)');
    end;
  finally
    Pdf.Free;
  end;
end;

For real production output you almost always want to override the OutputIntent with your press's characterization, not the generic sRGB fallback. Supply the ICC bytes and the condition identifiers through TPdfVTSaveOptions:

var
  Pdf: TPdf;
  Opt: TPdfVTSaveOptions;
  Icc: TBytes;
begin
  Pdf := TPdf.Create(nil);
  try
    if not Pdf.LoadFromFile('directmail-merged.pdf') then
      Exit;  // nothing to save; the finally block still frees Pdf
    Icc := LoadIccProfile('GRACoL2013_CRPC6.icc');  // your own loader

    Opt := TPdfVTSaveOptions.Default;
    Opt.Conformance := pvc1;            // pvc2 is normalized to pvc1 on write
    Opt.IccProfileData := Icc;
    Opt.OutputConditionIdentifier := 'CGATS21_CRPC6';
    Opt.OutputCondition := 'Commercial print, coated, CRPC6';
    Opt.RegistryName := 'http://www.color.org';
    Opt.Title := 'Spring 2026 Direct Mail Run';
    Opt.Trapped := ptvFalse;           // PDF/X Info /Trapped state

    if not Pdf.SaveAsPdfVT('directmail-pdfvt.pdf', Opt) then
      Writeln('Save failed');
  finally
    Pdf.Free;
  end;
end;

One detail in that snippet is a deliberate guardrail, not an oversight. Setting Opt.Conformance := pvc2 does not produce a PDF/VT-2 file. The writer normalizes any non-pvc1 request back to pvc1, because PDF/VT-2 is a file-set format and a single-file writer that appends one output document physically cannot assemble the external resource set §6.2.2 demands. The pvc2 value exists for the read path, so ValidatePdfVT can recognize and report an existing file-set document; it is not a write target.

The DPart tree: structure the RIP actually reads

The heart of PDF/VT is the Document Part (DPart) hierarchy. It is what lets a press partition a long run into records, group records into recipients or mail bundles, and attach Document Part Metadata so downstream equipment can route and bill each piece. ISO 16612-2 §6.5 lays out the wiring: the catalog carries a /DPartRoot, the root DPart node carries /DPartRootNode and a /NodeNameList naming each hierarchy level, leaf DParts cover ranges of the page tree, and every page that belongs to a part points back at its leaf through a page-level /DPart entry.

When your source document already contains a usable hierarchy, SaveAsPdfVT preserves it. When it does not, the writer synthesizes a minimal one: a single document-level DPart that spans the current page tree in order, with a /DPart back-reference appended to each live page object and a one-level /NodeNameList [/Document]. Be honest with yourself about what that minimal tree is. It is a structural anchor that satisfies §6.5's shape requirements; it is not business metadata. It cannot invent recipients, mail-piece boundaries, or product batches, because that information was never in the source. If you have per-recipient data, you are expected to build a deeper DPart tree yourself and extend the /NodeNameList to match the levels you create.
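As a rough sketch of the planning step behind a deeper tree, assuming you know how many pages each recipient's record occupies, the program below turns page counts into consecutive leaf ranges. TLeafPlan and PlanLeaves are hypothetical names, not PDFiumPas types, and writing the resulting DPart objects into the file is deliberately left out. NeedsEnd marks only the records that span more than one page, since a one-page leaf carries /Start alone.

```pascal
program PlanDPartLeaves;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical planning record; not a PDFiumPas type.
  TLeafPlan = record
    StartPage: Integer;  // zero-based index of the record's first page
    EndPage: Integer;    // zero-based index of the record's last page
    NeedsEnd: Boolean;   // True only for a genuine multi-page range
  end;
  TLeafPlans = array of TLeafPlan;

// Turns per-recipient page counts into consecutive, non-overlapping ranges.
function PlanLeaves(const PageCounts: array of Integer): TLeafPlans;
var
  I, NextPage: Integer;
begin
  SetLength(Result, Length(PageCounts));
  NextPage := 0;
  for I := 0 to High(PageCounts) do
  begin
    if PageCounts[I] < 1 then
      raise EArgumentException.CreateFmt('Record %d has no pages', [I]);
    Result[I].StartPage := NextPage;
    Result[I].EndPage := NextPage + PageCounts[I] - 1;
    Result[I].NeedsEnd := PageCounts[I] > 1;
    Inc(NextPage, PageCounts[I]);
  end;
end;

var
  Plan: TLeafPlans;
  I: Integer;
begin
  // Three recipients with 2, 1 and 3 pages respectively.
  Plan := PlanLeaves([2, 1, 3]);
  for I := 0 to High(Plan) do
    Writeln(Format('record %d: pages %d..%d, /End needed: %s',
      [I, Plan[I].StartPage, Plan[I].EndPage,
       BoolToStr(Plan[I].NeedsEnd, True)]));
end.
```

For the sample input the ranges come out as pages 0..1, 2..2 and 3..5, and only the first and third records need /End. Each range would become one leaf under a recipient-level node, with the /NodeNameList extended by one name for that level.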

Validation that goes past key-presence

ValidatePdfVT returns a TPdfVTValidationResult record with three things: the detected Conformance, a set of Issues, and an IsCompliant helper that is true only when conformance is a real level and the issue set is empty. The issue enumeration is deliberately specific, so a failed result tells you which clause you missed rather than just "invalid":

var
  Pdf: TPdf;
  Res: TPdfVTValidationResult;
begin
  Pdf := TPdf.Create(nil);
  try
    if not Pdf.LoadFromFile('statements-pdfvt.pdf') then
      Exit;  // nothing to validate; the finally block still frees Pdf
    Res := Pdf.ValidatePdfVT;

    if Res.IsCompliant then
      Writeln('PDF/VT compliant: ', VTLevelName(Res.Conformance))
    else
    begin
      if pvviMissingDPartRoot in Res.Issues then
        Writeln('DPart hierarchy missing or unusable');
      if pvviMissingPdfXIdentifier in Res.Issues then
        Writeln('PDF/X-4 base identifier absent');
      if pvviMissingOutputIntent in Res.Issues then
        Writeln('OutputIntent / ICC profile missing');
      if pvviEncryptionPresent in Res.Issues then
        Writeln('Encrypted - PDF/X forbids this');
    end;
  finally
    Pdf.Free;
  end;
end;

The two checks worth understanding in depth are the conformance pairing and the DPart walk, because both used to be too lenient and were tightened to match the spec. On the pairing side, the validator does exact matching, not "any PDF/X will do": a PDF/VT-1 file is only accepted on a PDF/X-4 base, and a PDF/VT-2 file only on PDF/X-4p, PDF/X-5g, or PDF/X-5pg. A PDF/VT-1 marker sitting on a PDF/X-1a base is reported, not waved through.
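The pairing rule is small enough to state as a lookup. The following is a minimal sketch, not the library's implementation: IsAllowedPairing is a hypothetical helper, and it assumes the two version values have already been read from the file as the literal strings shown.

```pascal
program VtPairingCheck;
{$APPTYPE CONSOLE}

// Hypothetical helper: exact pairing of a PDF/VT level with its PDF/X base.
// Any combination not listed is rejected, including unknown VT levels.
function IsAllowedPairing(const VtVersion, PdfXVersion: string): Boolean;
begin
  if VtVersion = 'PDF/VT-1' then
    Result := PdfXVersion = 'PDF/X-4'
  else if VtVersion = 'PDF/VT-2' then
    Result := (PdfXVersion = 'PDF/X-4p') or
              (PdfXVersion = 'PDF/X-5g') or
              (PdfXVersion = 'PDF/X-5pg')
  else
    Result := False;
end;

begin
  Writeln('VT-1 on X-4:   ', IsAllowedPairing('PDF/VT-1', 'PDF/X-4'));
  Writeln('VT-1 on X-1a:  ', IsAllowedPairing('PDF/VT-1', 'PDF/X-1a:2003'));
  Writeln('VT-2 on X-5pg: ', IsAllowedPairing('PDF/VT-2', 'PDF/X-5pg'));
  Writeln('VT-2 on X-4:   ', IsAllowedPairing('PDF/VT-2', 'PDF/X-4'));
end.
```

The point of the exact comparison is that a merely "PDF/X-ish" base never passes: the second and fourth calls are rejected even though both name a real PDF/X level.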

The DPart walk is where most of the rigor lives. It is not enough for the catalog to have a /DPartRoot key, because a forged empty object or one with no page links still cannot be consumed. HasValidDPartHierarchy and the recursive ValidateDPartNode trace the whole structure: they follow parent links, reject duplicate children and cycles, enforce that /Start and /DParts are mutually exclusive, and require leaf page ranges to cover the page tree in depth-first order with each page's /DPart pointing at the leaf that contains it. All of those internal faults collapse to the single pvviMissingDPartRoot issue bit rather than expanding the public enum, so treat that one flag as "the DPart hierarchy is unusable," not literally "the root key is absent."
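To make the shape of that walk concrete, here is a simplified sketch over an in-memory model. It is not the code behind HasValidDPartHierarchy or ValidateDPartNode: TNode, WalkNode and HierarchyIsUsable are hypothetical, nodes and pages are identified by array index instead of PDF object references, and the nested /DParts arrays are assumed to be already flattened into one child list.

```pascal
program DPartWalkSketch;
{$APPTYPE CONSOLE}

type
  TIntArray = array of Integer;

  // Hypothetical in-memory node; a real validator reads these from PDF objects.
  TNode = record
    Parent: Integer;      // index of the parent node, -1 for the root
    Children: TIntArray;  // flattened /DParts entries; empty for a leaf
    StartPage: Integer;   // /Start as a page index, -1 when absent
    EndPage: Integer;     // /End as a page index, -1 when absent
  end;

var
  Nodes: array of TNode;
  PageLeaf: TIntArray;        // page index -> leaf named by that page's /DPart
  Visited: array of Boolean;
  NextPage: Integer;          // next page the depth-first walk expects

function WalkNode(Index, ExpectedParent: Integer): Boolean;
var
  I, P, LastPage: Integer;
begin
  Result := False;
  if (Index < 0) or (Index > High(Nodes)) then Exit;
  if Visited[Index] then Exit;                         // duplicate child or cycle
  Visited[Index] := True;
  if Nodes[Index].Parent <> ExpectedParent then Exit;  // broken parent link

  if Length(Nodes[Index].Children) > 0 then
  begin
    // Interior node: /Start and /DParts are mutually exclusive.
    if (Nodes[Index].StartPage >= 0) or (Nodes[Index].EndPage >= 0) then Exit;
    for I := 0 to High(Nodes[Index].Children) do
      if not WalkNode(Nodes[Index].Children[I], Index) then Exit;
  end
  else
  begin
    // Leaf: must begin exactly where the previous leaf stopped.
    if Nodes[Index].StartPage <> NextPage then Exit;
    LastPage := Nodes[Index].StartPage;
    if Nodes[Index].EndPage >= 0 then
    begin
      if Nodes[Index].EndPage <= Nodes[Index].StartPage then Exit;  // degenerate /End
      LastPage := Nodes[Index].EndPage;
    end;
    if LastPage > High(PageLeaf) then Exit;
    for P := Nodes[Index].StartPage to LastPage do
      if PageLeaf[P] <> Index then Exit;               // page points at another leaf
    NextPage := LastPage + 1;
  end;
  Result := True;
end;

function HierarchyIsUsable: Boolean;
var
  I: Integer;
begin
  SetLength(Visited, Length(Nodes));
  for I := 0 to High(Visited) do
    Visited[I] := False;
  NextPage := 0;
  Result := WalkNode(0, -1);
  if Result then
    Result := NextPage = Length(PageLeaf);             // every page was covered
end;

begin
  // Root (0) with two leaves: leaf 1 covers pages 0..1, leaf 2 covers page 2.
  SetLength(Nodes, 3);
  Nodes[0].Parent := -1; Nodes[0].StartPage := -1; Nodes[0].EndPage := -1;
  SetLength(Nodes[0].Children, 2);
  Nodes[0].Children[0] := 1; Nodes[0].Children[1] := 2;
  Nodes[1].Parent := 0; Nodes[1].StartPage := 0; Nodes[1].EndPage := 1;
  Nodes[2].Parent := 0; Nodes[2].StartPage := 2; Nodes[2].EndPage := -1;
  SetLength(PageLeaf, 3);
  PageLeaf[0] := 1; PageLeaf[1] := 1; PageLeaf[2] := 2;

  Writeln('intact tree usable: ', HierarchyIsUsable);
  PageLeaf[2] := 1;  // page 2 now points at the wrong leaf
  Writeln('broken back-reference usable: ', HierarchyIsUsable);
end.
```

The sample tree passes, and repointing a single page back-reference is enough to fail it. Every failure path returns the same False, which mirrors how the real validator folds all of these faults into one issue bit.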

Three syntactic traps the validator now catches

Successive passes against §6.5 Table 4 turned up shapes that earlier versions accepted but the standard does not. These are the mistakes a hand-built DPart tree tends to make, so they are worth calling out explicitly:

  • /DParts is an array of arrays, not a flat array. Each element of the outer array must itself be an array of indirect references. A flat /DParts [9 0 R] is rejected; the conforming shape is /DParts [[9 0 R] [10 0 R]]. This stops a non-hierarchical structure from masquerading as a valid level.
  • /End only marks a genuine multi-page range. A leaf DPart may carry /End only when it also has /Start, and /End must fall later than /Start in page-tree order. A degenerate /Start 3 0 R /End 3 0 R now makes the hierarchy unusable instead of reading as a one-page part.
  • /NodeNameList names must survive PDF name unescaping as XML NMTOKENs. A name like /Bad#20Name expands to one containing a space, which is not a valid token. The implementation does a light ASCII check (letters, digits, ., -, _, :, plus non-ASCII bytes) that catches whitespace and delimiter mistakes without rejecting legitimate localized or vendor-specific names.
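The third rule is the easiest to reproduce in your own pre-checks. The sketch below is an approximation of the light ASCII check described above, not the library's code: UnescapePdfName and IsNodeNameToken are hypothetical helpers that take the name without its leading slash, decode #xx escapes, and then accept only the listed characters plus bytes of 128 and above.

```pascal
program NodeNameCheck;
{$APPTYPE CONSOLE}

// Value of one hex digit, or -1 when the character is not a hex digit.
function HexValue(C: AnsiChar): Integer;
begin
  case C of
    '0'..'9': Result := Ord(C) - Ord('0');
    'A'..'F': Result := Ord(C) - Ord('A') + 10;
    'a'..'f': Result := Ord(C) - Ord('a') + 10;
  else
    Result := -1;
  end;
end;

// Decodes #xx escapes in a PDF name; works on bytes so non-ASCII survives.
function UnescapePdfName(const Raw: AnsiString): AnsiString;
var
  I, HiNib, LoNib: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(Raw) do
  begin
    HiNib := -1;
    LoNib := -1;
    if (Raw[I] = '#') and (I + 2 <= Length(Raw)) then
    begin
      HiNib := HexValue(Raw[I + 1]);
      LoNib := HexValue(Raw[I + 2]);
    end;
    if (HiNib >= 0) and (LoNib >= 0) then
    begin
      Result := Result + AnsiChar(HiNib * 16 + LoNib);
      Inc(I, 3);
    end
    else
    begin
      Result := Result + Raw[I];  // ordinary byte, or a malformed escape kept as-is
      Inc(I);
    end;
  end;
end;

// Light token check: letters, digits, . - _ : and any non-ASCII byte.
function IsNodeNameToken(const Raw: AnsiString): Boolean;
var
  Decoded: AnsiString;
  I: Integer;
begin
  Decoded := UnescapePdfName(Raw);
  Result := Length(Decoded) > 0;
  for I := 1 to Length(Decoded) do
    if not ((Decoded[I] in ['A'..'Z', 'a'..'z', '0'..'9', '.', '-', '_', ':'])
            or (Ord(Decoded[I]) >= 128)) then
    begin
      Result := False;
      Exit;
    end;
end;

begin
  Writeln('Document:     ', IsNodeNameToken('Document'));
  Writeln('Mail#2DPiece: ', IsNodeNameToken('Mail#2DPiece'));
  Writeln('Bad#20Name:   ', IsNodeNameToken('Bad#20Name'));
end.
```

Mail#2DPiece decodes to Mail-Piece and passes, while Bad#20Name decodes to a name containing a space and fails. A malformed escape is left in place, so the stray # is rejected by the character check.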

XMP markers: two ways to write the same property

PDF/VT identification lives in XMP under the pdfvtid namespace, specifically GTS_PDFVTVersion and GTS_PDFVTModDate, alongside the standard xmp:CreateDate and xmp:ModifyDate. A subtlety that causes false "missing" reports in naive readers is that any of these can be serialized two ways: as element text (<pdfvtid:GTS_PDFVTVersion>PDF/VT-1</pdfvtid:GTS_PDFVTVersion>) or as an attribute on the rdf:Description element (pdfvtid:GTS_PDFVTVersion="PDF/VT-1"). PDFiumPas reads both forms, so a file that another tool wrote in attribute style is not penalized. It also enforces the §6.3 consistency rule that GTS_PDFVTModDate must equal xmp:ModifyDate; a mismatch raises pvviModDateMismatch.
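A reader that accepts both serializations can be sketched with plain string search. This is an illustration under assumptions, not how PDFiumPas parses XMP: ReadXmpProperty and ModDatesAgree are hypothetical, the packet is assumed to use the conventional pdfvtid and xmp prefixes, element tags are assumed to carry no attributes of their own, and the date comparison is a literal string match.

```pascal
program XmpTwoForms;
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Finds a qualified property written as element text or as an attribute.
function ReadXmpProperty(const Xmp, QName: string; out Value: string): Boolean;
var
  P, Q: Integer;
  OpenTag, CloseTag, Attr: string;
  Quote: Char;
begin
  Value := '';
  Result := False;

  // Element form: <QName>value</QName>
  OpenTag := '<' + QName + '>';
  CloseTag := '</' + QName + '>';
  P := Pos(OpenTag, Xmp);
  if P > 0 then
  begin
    Inc(P, Length(OpenTag));
    Q := PosEx(CloseTag, Xmp, P);
    if Q > 0 then
    begin
      Value := Trim(Copy(Xmp, P, Q - P));
      Result := True;
      Exit;
    end;
  end;

  // Attribute form: QName="value" or QName='value'
  Attr := QName + '=';
  P := Pos(Attr, Xmp);
  if P = 0 then Exit;
  Inc(P, Length(Attr));
  if P > Length(Xmp) then Exit;
  Quote := Xmp[P];
  if (Quote <> '"') and (Quote <> '''') then Exit;
  Q := PosEx(Quote, Xmp, P + 1);
  if Q = 0 then Exit;
  Value := Copy(Xmp, P + 1, Q - P - 1);
  Result := True;
end;

// Literal comparison of the two modification dates.
function ModDatesAgree(const Xmp: string): Boolean;
var
  VtDate, XmpDate: string;
begin
  Result := ReadXmpProperty(Xmp, 'pdfvtid:GTS_PDFVTModDate', VtDate);
  if Result then
    Result := ReadXmpProperty(Xmp, 'xmp:ModifyDate', XmpDate);
  if Result then
    Result := VtDate = XmpDate;
end;

var
  Version: string;
begin
  if ReadXmpProperty('<rdf:Description><pdfvtid:GTS_PDFVTVersion>PDF/VT-1' +
       '</pdfvtid:GTS_PDFVTVersion></rdf:Description>',
       'pdfvtid:GTS_PDFVTVersion', Version) then
    Writeln('element form:   ', Version);
  if ReadXmpProperty('<rdf:Description pdfvtid:GTS_PDFVTVersion="PDF/VT-1"/>',
       'pdfvtid:GTS_PDFVTVersion', Version) then
    Writeln('attribute form: ', Version);
end.
```

Both sample packets yield PDF/VT-1. A production reader would resolve namespace prefixes through a real XML parser, since the prefix is only a convention and the namespace URI is what identifies the property.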

One more rule from the same clause: an unknown GTS_PDFVTVersion value is preserved as pvcUnknown rather than being folded back to pvcNone. That distinction matters operationally. pvcNone means "no PDF/VT marker at all, an ordinary PDF," while pvcUnknown means "something stamped a version this validator does not recognize" (the PDF/VT-2s case among them). Conflating the two would hide a malformed file inside the same bucket as a plain document.
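The distinction is simple to keep intact in your own reporting code. As a sketch only, with TSketchLevel standing in for the library's conformance type and ClassifyVersion being a hypothetical helper, the mapping keeps "no marker" and "unrecognized marker" in separate buckets:

```pascal
program VtLevelClassify;
{$APPTYPE CONSOLE}

type
  // Local stand-in for the levels named in the article; not the library's type.
  TSketchLevel = (slNone, slUnknown, slVT1, slVT2);

const
  LevelNames: array[TSketchLevel] of string =
    ('no PDF/VT marker', 'unrecognized marker', 'PDF/VT-1', 'PDF/VT-2');

// HasMarker tells whether GTS_PDFVTVersion was present at all.
function ClassifyVersion(HasMarker: Boolean; const Value: string): TSketchLevel;
begin
  if not HasMarker then
    Result := slNone
  else if Value = 'PDF/VT-1' then
    Result := slVT1
  else if Value = 'PDF/VT-2' then
    Result := slVT2
  else
    Result := slUnknown;  // present but not a file-level value, e.g. PDF/VT-2s
end;

begin
  Writeln(LevelNames[ClassifyVersion(False, '')]);
  Writeln(LevelNames[ClassifyVersion(True, 'PDF/VT-1')]);
  Writeln(LevelNames[ClassifyVersion(True, 'PDF/VT-2s')]);
end.
```

Passing presence separately from the value is the design choice that matters here: an empty or garbled marker still lands in the unrecognized bucket, not among ordinary PDFs.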

Where the guarantee ends

It is worth being precise about the boundary of what these methods promise, because variable-data print compliance has real money attached to it. The DPart and pairing checks are byte-level structural validation. They confirm that the optimization skeleton, the PDF/X-4 base markers, the OutputIntent, and the XMP are present and internally consistent. They are not a content-level PDF/X-4 preflight: they do not verify that every color is within the declared output condition, that all fonts are embedded, or that no prohibited transparency-blending edge case slipped in. For a job you are putting on a contract press, pair PDFiumPas's structural validation with a dedicated PDF/X preflight engine and a test print, the same way you would sanity-check any other compliance claim. The structural layer catches the failures that silently break RIP caching; it is one half of a complete check, not the whole of it.

If you are building these checks into a wider release gate, the same byte-level scanning approach underpins the library's other standards work, including validating object and cross-reference streams before a file ever reaches preflight, and the shared-object discipline behind reusable page stamps with Form XObjects that makes a document RIP-friendly in the first place. The PDF/VT and PDF/X save and validation APIs described here are part of the PDFium VCL component for Delphi and C++Builder, whose product page carries the full compliance reference.