The monthly statement run looked fine on every developer's screen, so it shipped. The print bureau returned the whole batch two days later: RGB images in a CMYK job and no /Trapped declaration. The real cost was not the reprint; it was two days lost against a regulatory deadline. "Preflight" is the prepress term for catching exactly this class of problem before files leave the building. The interesting engineering question for a Delphi team is where that check belongs when the PDFs are produced by your own code with a library like HotPDF rather than by a designer's desktop tools.
Prevention beats inspection when you own the generator
Classic preflight assumes a foreign file of unknown quality and inspects it after the fact. When your own application generates the document, that architecture is backwards: every property the inspector would check (font embedding, color space usage, output intents, metadata) was decided by your code milliseconds earlier. The cheapest preflight failure is the one made impossible at generation time.
It is worth being precise about what HotPDF does and does not offer here. The component ships a preflight report window as part of its GUI demo application, but there is no programmatic preflight API you can call from a service or a build script. That is less of a gap than it first appears, because for generated documents the robust pattern has two independent halves anyway: constrain the generator so it cannot emit non-compliant structures, then verify the output with a validator you do not maintain yourself. A library validating its own output is grading its own homework; an external tool gives you evidence a customer or auditor will accept.
The generation side: make compliance a configuration, not a review item
HotPDF's standards properties are the prevention layer. When PDFACompliance or PDFXCompliance is set before BeginDoc, the component enforces the corresponding rules during generation: it embeds fonts, tracks DeviceRGB and DeviceCMYK usage against the declared output intent, and rejects features the profile forbids. After saving, those same properties record what was enforced, which is exactly what your pipeline log needs:
// After EndDoc: record the enforced profiles with the run metadata
if Pdf.PDFACompliance <> '' then
  Log('Generated as PDF/A level ' + Pdf.PDFACompliance);
if Pdf.PDFXCompliance <> '' then
  Log('Generated as PDF/X profile ' + Pdf.PDFXCompliance);
Write these flags, the input data hash, and the HotPDF version into the same log line. When a validator later disagrees with the generator, that line tells you which template revision and which library version produced the disputed file. One line of log discipline replaces an afternoon of forensic guessing. The full configuration that backs these flags (output intents, ICC profiles, and tagging) is covered in our guide to PDF/A, PDF/X, and PDF/UA output with HotPDF.
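As a minimal sketch of that single log line: the helper below is hypothetical and touches no HotPDF API. It only formats values the caller already holds, and the field names (pdfa, pdfx, input_sha256, hotpdf) are our own choice, not a HotPDF or veraPDF convention.

```pascal
program RunLogLine;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: composes one searchable log line per generated file.
// PdfA and PdfX are the strings read from the compliance properties after
// saving; InputHash and LibVersion are supplied by the caller.
function BuildRunLogLine(const PdfA, PdfX, InputHash, LibVersion: string): string;

  // An empty property means the profile was not enforced; log that explicitly.
  function OrNone(const S: string): string;
  begin
    if S = '' then
      Result := 'none'
    else
      Result := S;
  end;

begin
  Result := Format('pdfa=%s pdfx=%s input_sha256=%s hotpdf=%s',
    [OrNone(PdfA), OrNone(PdfX), InputHash, LibVersion]);
end;

begin
  // Placeholder values for illustration only.
  Writeln(BuildRunLogLine('1b', '', 'ab12cd34', 'x.y.z'));
end.
```

Key=value pairs on one line keep the record searchable with plain text tools, and writing "none" for an unset profile distinguishes "not enforced" from "not logged".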
Triage incoming files before they reach expensive checks
Many pipelines are not purely generative: customers upload PDFs, scanners deposit them, partners email them. Running a full structural validation on every incoming file wastes queue time on inputs that are not even openable. HotPDF's Direct File API reads file structure without loading the complete object tree, which makes it a cheap first gate:
function TriagePdf(Pdf: THotPDF; const FileName: string): Boolean;
var
  Handle, Pages: Integer;
begin
  Result := False;
  Handle := Pdf.DAOpenFileReadOnly(FileName, '');
  if Handle <= 0 then
    Exit; // structurally unreadable: quarantine, do not validate
  try
    Pages := Pdf.DAGetPageCount(Handle);
    Result := Pages > 0;
  finally
    Pdf.DACloseFile(Handle);
  end;
end;
Two behaviors of this API shape the surrounding logic. First, DAOpenFileReadOnly stays flat-memory only for unencrypted inputs: pass a password and it falls back to a full parse internally. Route known-encrypted files through DecryptFile first to produce a plain working copy. Second, DAGetPageCount is only valid on a handle from a successful open, so keep the handle check strict rather than assuming the open succeeded. More patterns for this API are in the Direct File API article for large PDF workflows.
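An even cheaper gate can sit in front of the library call. The sketch below is our own addition, not part of HotPDF: it looks for the '%PDF-' header marker in the first 1024 bytes of the file, which is enough to reject renamed images or truncated uploads before any PDF parser is involved. The function name LooksLikePdf is hypothetical.

```pascal
program HeaderSniff;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Returns True when the '%PDF-' marker occurs within the first 1024 bytes.
// Raises an exception if the file cannot be opened for reading.
function LooksLikePdf(const FileName: string): Boolean;
const
  Marker: array[0..4] of Byte = ($25, $50, $44, $46, $2D); // '%PDF-'
var
  Stream: TFileStream;
  Buf: array[0..1023] of Byte;
  Count, I, J: Integer;
begin
  Result := False;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buf, SizeOf(Buf));
  finally
    Stream.Free;
  end;
  // Naive scan; the loop does not run when fewer than 5 bytes were read.
  for I := 0 to Count - Length(Marker) do
  begin
    J := 0;
    while (J < Length(Marker)) and (Buf[I + J] = Marker[J]) do
      Inc(J);
    if J = Length(Marker) then
      Exit(True);
  end;
end;

begin
  if ParamCount = 1 then
    Writeln(LooksLikePdf(ParamStr(1)));
end.
```

A positive result says nothing about structural validity; it only decides whether the file deserves the Direct File API open at all.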
The verification side: veraPDF as a build step
For PDF/A and PDF/UA claims, veraPDF is the validator worth wiring into the pipeline: it runs headless, processes batches, emits machine-readable XML or JSON, and reports each failure by ISO clause number, so a finding like a rule failure against ISO 19005-1 clause 6.2.2 maps directly to a known generator setting. Invoking it from Delphi is ordinary process control:
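function RunVeraPdf(const PdfFile, ReportFile: string): Cardinal;
const
  EXIT_TIMED_OUT = Cardinal($FFFFFFFE); // our own code, not one veraPDF returns
var
  Cmd: string;
  SI: TStartupInfo;
  PI: TProcessInformation;
begin
  Cmd := Format('cmd /c verapdf.bat --format xml "%s" > "%s"',
    [PdfFile, ReportFile]);
  UniqueString(Cmd); // CreateProcessW may modify the command-line buffer
  FillChar(SI, SizeOf(SI), 0);
  SI.cb := SizeOf(SI);
  if not CreateProcess(nil, PChar(Cmd), nil, nil, False,
    CREATE_NO_WINDOW, nil, nil, SI, PI) then
    RaiseLastOSError;
  try
    // Bound the wait per file; anything but WAIT_OBJECT_0 counts as a timeout.
    if WaitForSingleObject(PI.hProcess, 120000) <> WAIT_OBJECT_0 then
    begin
      // Stops cmd.exe only; the Java child it launched may need a job object.
      TerminateProcess(PI.hProcess, EXIT_TIMED_OUT);
      Result := EXIT_TIMED_OUT;
    end
    else if not GetExitCodeProcess(PI.hProcess, Result) then
      RaiseLastOSError;
  finally
    CloseHandle(PI.hThread);
    CloseHandle(PI.hProcess);
  end;
end;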
The timeout is not decoration. A malformed file can send any parser into pathological territory, and an unbounded wait inside a queue worker takes the whole queue down with it. Bound the wait, treat a timeout as a failure with its own code, and quarantine the input for human inspection. Parse the XML for the rule identifiers rather than scraping human-readable messages; rule IDs are stable across validator releases, message wording is not, and stable codes are what support staff can search past tickets for.
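Extracting rule identifiers from the report might look like the sketch below. It is written under a loud assumption: that failed rules appear as rule elements carrying specification, clause, testNumber, and status attributes, which is how veraPDF's machine-readable report is commonly laid out. Confirm the names against a report from your own validator version. The sample XML is hand-written for illustration, not real validator output, and the function names are hypothetical.

```pascal
program FailedRules;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Variants, Winapi.ActiveX,
  Xml.XMLIntf, Xml.XMLDoc;

const
  // Hand-written sample in the ASSUMED report shape; not real veraPDF output.
  Sample =
    '<report><jobs><job><validationReport isCompliant="false"><details>' +
    '<rule specification="ISO 19005-1:2005" clause="6.2.2" testNumber="1" status="failed"/>' +
    '<rule specification="ISO 19005-1:2005" clause="6.1.2" testNumber="1" status="passed"/>' +
    '</details></validationReport></job></jobs></report>';

// Walks the element tree and records a key for every failed rule element.
procedure CollectFailedRules(const Node: IXMLNode; Found: TStrings);
var
  I: Integer;
begin
  if Node.NodeType <> ntElement then
    Exit;
  if SameText(Node.NodeName, 'rule') and
     SameText(VarToStr(Node.Attributes['status']), 'failed') then
    Found.Add(Format('%s clause %s test %s',
      [VarToStr(Node.Attributes['specification']),
       VarToStr(Node.Attributes['clause']),
       VarToStr(Node.Attributes['testNumber'])]));
  for I := 0 to Node.ChildNodes.Count - 1 do
    CollectFailedRules(Node.ChildNodes[I], Found);
end;

// Returns one failed-rule key per line, or '' when nothing failed.
function FailedRuleList(const ReportXml: string): string;
var
  Doc: IXMLDocument;
  Found: TStringList;
begin
  Found := TStringList.Create;
  try
    Doc := LoadXMLData(ReportXml);
    CollectFailedRules(Doc.DocumentElement, Found);
    Result := Trim(Found.Text);
  finally
    Found.Free;
  end;
end;

begin
  CoInitialize(nil); // the default MSXML DOM vendor needs COM on Windows
  try
    Writeln(FailedRuleList(Sample));
  finally
    CoUninitialize;
  end;
end.
```

All XML interfaces live inside FailedRuleList, so they are released before CoUninitialize runs. In a pipeline, LoadXMLDocument with the report file name would replace LoadXMLData.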
Batch behavior deserves the same care as single-file correctness. Run the validator as one process per file rather than one process per batch, so a pathological input costs you that file's timeout instead of the batch; cap concurrent validator processes to the core count, since XML report generation is CPU-bound; and impose a file-size ceiling at intake, because a 2 GB scanned monster will dominate the queue however well-behaved the parser is. None of this is preflight logic, but it is the difference between a gate that survives month-end volume and one that gets disabled the first time it blocks the pipeline at 2 a.m.
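The intake ceiling and the worker cap are both a few lines of plain RTL code. The following is a sketch with hypothetical names; the 200 MB ceiling is a placeholder to be replaced by a figure from your own queue measurements.

```pascal
program IntakeCeiling;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math;

const
  // Hypothetical ceiling; pick a value from your own queue measurements.
  MaxIntakeBytes: Int64 = 200 * 1024 * 1024;

// Returns the file size in bytes, or -1 when the file cannot be found.
function FileSizeOrMinusOne(const FileName: string): Int64;
var
  SR: TSearchRec;
begin
  Result := -1;
  if FindFirst(FileName, faAnyFile, SR) = 0 then
  try
    Result := SR.Size;
  finally
    FindClose(SR);
  end;
end;

// True only for files that exist and do not exceed the ceiling.
function WithinIntakeCeiling(const FileName: string; Ceiling: Int64): Boolean;
var
  Size: Int64;
begin
  Size := FileSizeOrMinusOne(FileName);
  Result := (Size >= 0) and (Size <= Ceiling);
end;

// Upper bound for concurrent validator processes: one per logical core.
function ValidatorWorkerCap: Integer;
begin
  Result := Max(1, TThread.ProcessorCount);
end;

begin
  Writeln('Worker cap: ', ValidatorWorkerCap);
  if ParamCount >= 1 then
    if WithinIntakeCeiling(ParamStr(1), MaxIntakeBytes) then
      Writeln('accept: ', ParamStr(1))
    else
      Writeln('reject (missing or over ceiling): ', ParamStr(1));
end.
```

Rejecting at intake keeps oversized files out of the queue entirely, so they never occupy a validator slot or consume a timeout.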
PDF/X is the gap in this story: veraPDF does not cover it, and the practical check remains Acrobat's Preflight with the matching ISO 15930 profile. Acrobat is interactive, so use it for sampling: the first article of a new template plus a small random draw per batch. The automated gate covers what can be automated. A sampled manual check that actually happens beats a theoretical full automation that nobody finished building.
What the report must contain to be worth keeping
A preflight gate produces value twice: once when it blocks a bad file, and again months later when someone asks why a file was accepted. The second use is the one that dictates the report format. Keep, for every checked file: the input hash, the generator's compliance flags and version from the log line above, the validator name and version, the profile checked against, the pass or fail outcome, and the list of failed rule IDs with page numbers where the validator provides them. Store the report next to the artifact it describes, not in a separate system that will be retired before the archive is.
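One way to hold that field list is a record serialized to a JSON sidecar file. The sketch below uses Delphi's System.JSON; the record, the JSON key names, and the '.preflight.json' suffix are all hypothetical choices for illustration, and the values in the demo are placeholders.

```pascal
program PreflightReport;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON, System.IOUtils;

type
  // Hypothetical record mirroring the fields listed above.
  TPreflightReport = record
    InputHash: string;
    GeneratorFlags: string;
    GeneratorVersion: string;
    ValidatorName: string;
    ValidatorVersion: string;
    Profile: string;
    Passed: Boolean;
    FailedRules: TArray<string>; // rule ID, plus page number where known
  end;

function ReportToJson(const R: TPreflightReport): string;
var
  Obj: TJSONObject;
  Rules: TJSONArray;
  Rule, Outcome: string;
begin
  if R.Passed then
    Outcome := 'pass'
  else
    Outcome := 'fail';
  Obj := TJSONObject.Create;
  try
    Obj.AddPair('inputHash', R.InputHash);
    Obj.AddPair('generatorFlags', R.GeneratorFlags);
    Obj.AddPair('generatorVersion', R.GeneratorVersion);
    Obj.AddPair('validatorName', R.ValidatorName);
    Obj.AddPair('validatorVersion', R.ValidatorVersion);
    Obj.AddPair('profile', R.Profile);
    Obj.AddPair('outcome', Outcome);
    Rules := TJSONArray.Create;
    Obj.AddPair('failedRules', Rules); // Obj now owns Rules
    for Rule in R.FailedRules do
      Rules.Add(Rule);
    Result := Obj.ToJSON;
  finally
    Obj.Free;
  end;
end;

// Stores the report next to the artifact it describes.
procedure SaveReportBeside(const PdfFile: string; const R: TPreflightReport);
begin
  TFile.WriteAllText(PdfFile + '.preflight.json', ReportToJson(R), TEncoding.UTF8);
end;

var
  R: TPreflightReport;
begin
  // Placeholder values for illustration only.
  R := Default(TPreflightReport);
  R.InputHash := 'ab12cd34';
  R.ValidatorName := 'veraPDF';
  R.Profile := 'PDF/A-1b';
  R.Passed := False;
  R.FailedRules := TArray<string>.Create('ISO 19005-1:2005 clause 6.2.2 test 1');
  Writeln(ReportToJson(R));
end.
```

A sidecar file travels with the PDF through copies and archive migrations, which a row in a separate database does not.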
Accepted deviations need paperwork too. When a customer insists on shipping a file the gate dislikes, record who approved it, why, and until when, and attach that waiver record to the report rather than weakening the rule globally. A waiver with an owner and an expiry date is a managed exception; a commented-out check is a future incident.
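A waiver with an owner and an expiry date is small enough to model directly. In this sketch the record and function names are hypothetical; the point is that a waiver missing its approver or reason, or past its date, simply stops covering the rule.

```pascal
program WaiverCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.DateUtils;

type
  // Hypothetical waiver record: who approved what, why, and until when.
  TWaiver = record
    RuleId: string;
    ApprovedBy: string;
    Reason: string;
    ExpiresOn: TDateTime;
  end;

function MakeWaiver(const RuleId, ApprovedBy, Reason: string;
  ExpiresOn: TDateTime): TWaiver;
begin
  Result.RuleId := RuleId;
  Result.ApprovedBy := ApprovedBy;
  Result.Reason := Reason;
  Result.ExpiresOn := ExpiresOn;
end;

// A waiver covers a failed rule only if it names that rule, has an owner
// and a reason, and has not expired (the expiry date itself still counts).
function WaiverCovers(const W: TWaiver; const RuleId: string;
  Today: TDateTime): Boolean;
begin
  Result := (W.ApprovedBy <> '') and (W.Reason <> '') and
    SameText(W.RuleId, RuleId) and
    (CompareDate(Today, W.ExpiresOn) <= 0);
end;

var
  W: TWaiver;
begin
  // Placeholder values for illustration only.
  W := MakeWaiver('6.2.2', 'j.doe', 'customer-supplied logo',
    EncodeDate(2030, 1, 31));
  Writeln(WaiverCovers(W, '6.2.2', EncodeDate(2030, 1, 31)));
  Writeln(WaiverCovers(W, '6.2.2', EncodeDate(2030, 2, 1)));
end.
```

Because the check takes the current date as a parameter, expiry behavior can be unit-tested without touching the system clock.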
Failures deserve one more step: copy the failing file into a named regression folder. Every preflight incident we have helped debug eventually came down to a reproducible input, and teams that kept those inputs fixed regressions in hours instead of rediscovering them in production.
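Keeping the failing input can be automated in the same worker that records the failure. The sketch below assumes the input hash is already known and uses it as a file-name prefix so that repeated failures of the same input do not pile up; the function names and the naming scheme are hypothetical.

```pascal
program KeepFailingInput;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Builds the target path: <RegressionDir>\<hash>_<original file name>.
function RegressionTarget(const PdfFile, RegressionDir, InputHash: string): string;
begin
  Result := TPath.Combine(RegressionDir,
    InputHash + '_' + ExtractFileName(PdfFile));
end;

// Copies the failing file into the regression folder and returns the
// target path. An existing copy with the same hash prefix is left untouched.
function ArchiveFailingInput(const PdfFile, RegressionDir, InputHash: string): string;
begin
  if not ForceDirectories(RegressionDir) then
    raise Exception.CreateFmt('Cannot create folder "%s"', [RegressionDir]);
  Result := RegressionTarget(PdfFile, RegressionDir, InputHash);
  if not TFile.Exists(Result) then
    TFile.Copy(PdfFile, Result);
end;

begin
  // Usage: KeepFailingInput <pdf> <regression folder> <input hash>
  if ParamCount = 3 then
    Writeln(ArchiveFailingInput(ParamStr(1), ParamStr(2), ParamStr(3)));
end.
```

Copying rather than moving leaves the quarantine workflow undisturbed, and the hash prefix ties each regression file back to its report.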
FAQ
Can HotPDF validate an arbitrary third-party PDF programmatically?
No. The preflight report in the product is a GUI demo feature, not a callable API. The supported automation pattern is generation-side enforcement via the compliance properties plus an external validator such as veraPDF for the formal verdict.
Is veraPDF enough for print jobs?
It covers PDF/A and PDF/UA. For PDF/X print masters, run Acrobat Preflight with the profile your printer specifies, and confirm the output intent matches the press characterization they expect.
What should fail the build: errors only, or warnings too?
Gate on rule failures for the profile you claim compliance with, and log warnings with trend monitoring. Promoting every warning to a blocker trains people to bypass the gate, which is worse than having none.
Product reference
The compliance properties and Direct File API used in this pipeline belong to the HotPDF Component for Delphi and C++Builder; its documentation describes each call shown here in full.