Object Streams and Incremental Updates in Delphi with HotPDF

Open any PDF that has lived through a few years of production use and you are likely to find more bookkeeping than content: thousands of small dictionary objects, each preceded by its own obj header, plus a cross-reference table that costs 20 bytes of ASCII per object. In one support case we analysed, a 58 MB policy archive used less than half of its bytes on actual page content; the rest was structural overhead and stale revisions. HotPDF, a native VCL PDF library for Delphi and C++Builder, exposes the two file-format mechanisms that address both halves of that problem: object streams for compact storage, and incremental updates for append-only modification that leaves the earlier bytes intact.


How object streams and xref streams reshape the file

Up to and including PDF 1.4, every indirect object is written into the body on its own, with its dictionary uncompressed, and the cross-reference table at the end is a fixed-width text structure: exactly 20 bytes per entry, no compression allowed. A document with 200,000 objects therefore carries roughly 4 MB of xref data alone (200,000 × 20 bytes), before any glyph is drawn. PDF 1.5 introduced two alternatives, defined in ISO 32000-1 §7.5.7 and §7.5.8: object streams (/Type /ObjStm), which gather non-stream objects into a single Flate-compressed container, and cross-reference streams (/Type /XRef), which store the lookup table itself as compressed binary with variable-width fields.
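
The arithmetic is easy to reproduce. The following is a minimal sketch in plain Object Pascal, independent of HotPDF; the function names and the /W [1 4 2] field widths are illustrative assumptions, not values taken from any particular file.

```pascal
program XRefSizeEstimate;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  ClassicEntryBytes = 20; // fixed width of one classic xref table entry

// Bytes taken by the entries of a classic xref table
// (subsection headers and the trailer are not counted).
function ClassicXRefBytes(EntryCount: Int64): Int64;
begin
  Result := EntryCount * ClassicEntryBytes;
end;

// Bytes taken by the same entries in an xref stream BEFORE Flate
// compression, given the three field widths from its /W array.
function XRefStreamRawBytes(EntryCount: Int64; W1, W2, W3: Integer): Int64;
begin
  Result := EntryCount * (W1 + W2 + W3);
end;

begin
  WriteLn(ClassicXRefBytes(200000));            // 4000000
  WriteLn(XRefStreamRawBytes(200000, 1, 4, 2)); // 1400000
end.
```

The second figure is the size before compression; how much Flate removes on top of that depends on the file.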

The savings are greatest where naive generators do the most damage: form-heavy documents with thousands of field dictionaries, deep outline trees, and tagged-PDF structure elements. Those objects are individually tiny and highly repetitive, which makes them ideal Flate input once packed together. Page content streams were already compressible before PDF 1.5, so object streams do not reduce image-dominated files much; they shrink structure-dominated files substantially.

In HotPDF the two features are switched on with a pair of properties, and the ordering dependency matters:

var
  Pdf: THotPDF;
begin
  Pdf := THotPDF.Create(nil);
  try
    Pdf.FileName := 'catalog-2026.pdf';
    Pdf.UseXRefStream := True;      // binary xref, prerequisite for ObjStm
    Pdf.UseObjectStreams := True;   // pack objects into /Type /ObjStm
    Pdf.BeginDoc;
    Pdf.CurrentPage.SetFont('Arial', [], 11);
    Pdf.CurrentPage.TextOut(50, 760, 0, 'Compressed structure demo');
    Pdf.EndDoc;                     // emits XRefStm + ObjStm containers
  finally
    Pdf.Free;
  end;
end;

UseObjectStreams requires UseXRefStream to be True, because a compressed object is addressed by a type-2 xref entry (object-stream number plus index), and type-2 entries simply cannot be expressed in a classic 20-byte text table. Setting UseObjectStreams alone buys you nothing; setting both before BeginDoc is the working configuration.
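
To make the type-2 entry concrete, here is a sketch of how one fixed-width record from a decompressed xref stream could be decoded. It follows the entry layout described in ISO 32000-1 §7.5.8 (big-endian fields whose widths come from /W); the type and function names are our own, and the code does not use or represent HotPDF internals.

```pascal
program XRefEntryDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TXRefEntry = record
    Kind: Integer;  // 0 = free, 1 = uncompressed, 2 = inside an object stream
    Field2: Int64;  // type 1: byte offset;  type 2: object-stream number
    Field3: Int64;  // type 1: generation;   type 2: index within that stream
  end;

// Reads Width bytes starting at Data[Pos] as a big-endian unsigned integer
// and advances Pos past them.
function ReadBE(const Data: array of Byte; var Pos: Integer; Width: Integer): Int64;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Width do
  begin
    Result := (Result shl 8) or Data[Pos];
    Inc(Pos);
  end;
end;

// Decodes one entry whose field widths are W1, W2 and W3 (the /W array).
function DecodeEntry(const Data: array of Byte; W1, W2, W3: Integer): TXRefEntry;
var
  Pos: Integer;
begin
  Pos := 0;
  if W1 = 0 then
    Result.Kind := 1 // a zero-width type field defaults to type 1
  else
    Result.Kind := Integer(ReadBE(Data, Pos, W1));
  Result.Field2 := ReadBE(Data, Pos, W2);
  Result.Field3 := ReadBE(Data, Pos, W3);
end;

var
  E: TXRefEntry;
begin
  // With /W [1 2 1]: type 2, object stream 15 (bytes 00 0F), index 3
  E := DecodeEntry([2, 0, 15, 3], 1, 2, 1);
  WriteLn(E.Kind, ' ', E.Field2, ' ', E.Field3); // 2 15 3
end.
```

A classic table entry has room only for an offset, a generation number and an in-use flag, which is why the pair "object stream 15, index 3" has no text-table equivalent.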

The compatibility boundary at PDF 1.4

Both properties default to False for a reason that bites teams with legacy integrations. A reader limited to PDF 1.4 semantics does not report 'compressed objects unsupported' when it meets an xref stream; it typically reports a damaged cross-reference table or refuses the file outright, because the trailer keyword layout it expects is not there. If your output feeds an old fax gateway, a hardware printer with an embedded interpreter, or a downstream parser written against PDF 1.4, leave both flags off for that channel and accept the larger file. For archival and web delivery channels, where every mainstream viewer has handled PDF 1.5 for two decades, enabling them is close to free compression.

One operational side effect is worth flagging to your support team: once dictionaries are packed into object streams, byte-level diffing of two generated files becomes meaningless, because a one-field change can re-flate an entire container. Support investigations should compare documents logically, by object content, instead of with a binary diff.

Why incremental updates exist: offsets, signatures, audit trails

A digital signature in PDF covers an explicit /ByteRange: two spans of the physical file, measured in absolute byte offsets, over which the CMS digest was computed. Rewrite the file, even with visually identical content, and every offset shifts; the digest no longer matches and the signature reports as broken. This is the practical case for incremental updates, defined in ISO 32000-1 §7.5.6: changed and added objects are appended after the existing %%EOF, followed by a new cross-reference section whose /Prev entry points back at the previous one, and a new %%EOF. The original bytes are never touched, so a previously signed revision stays verifiable, and Acrobat can show each signed revision separately in the signature panel.
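
Because each appended revision ends with its own %%EOF marker, counting the markers gives a rough revision count for a file. A minimal sketch, assuming the file has already been read into a byte array; the helper names are ours, and the count is only a heuristic, since the marker bytes can also occur inside binary stream data and linearized files can carry an extra marker near the start.

```pascal
program CountRevisions;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Demo helper: copies the characters of an ASCII string into a byte array.
function AsciiBytes(const S: AnsiString): TBytes;
begin
  SetLength(Result, Length(S));
  if Length(S) > 0 then
    Move(S[1], Result[0], Length(S));
end;

// Counts occurrences of the five bytes '%%EOF' in Data.
function CountEofMarkers(const Data: TBytes): Integer;
const
  Marker: array[0..4] of Byte = ($25, $25, $45, $4F, $46); // '%%EOF'
var
  I, J: Integer;
  Match: Boolean;
begin
  Result := 0;
  for I := 0 to Length(Data) - Length(Marker) do
  begin
    Match := True;
    for J := 0 to High(Marker) do
      if Data[I + J] <> Marker[J] then
      begin
        Match := False;
        Break;
      end;
    if Match then
      Inc(Result);
  end;
end;

var
  Sample: TBytes;
begin
  // Stand-in for a file with one original revision and one appended update.
  Sample := AsciiBytes('%PDF-1.4'#10'original body'#10'%%EOF'#10 +
                       'appended objects'#10'%%EOF'#10);
  WriteLn(CountEofMarkers(Sample)); // 2
end.
```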

HotPDF wraps this as a dedicated entry point:

Pdf.BeginIncrementalUpdate('contract-signed.pdf');
Pdf.AddPage;
Pdf.CurrentPage.SetFont('Arial', [], 10);
Pdf.CurrentPage.TextOut(50, 760, 0, 'Addendum recorded 2026-06-11');
Pdf.SaveIncrementalUpdate('contract-updated.pdf');  // appends the delta only

Two details are easy to get wrong here. Firstly, BeginIncrementalUpdate must be given the original file name; the appended xref section stores offsets that are only valid relative to those exact original bytes. Secondly, the save is append-only by definition, so the output is always larger than the input. That is not a defect to optimise away; it is the property that keeps earlier signed revisions intact.
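
Both properties can be checked mechanically after a save: the updated file should be longer than the original and should start with the original bytes, unchanged. The sketch below is plain Object Pascal and does not call HotPDF; the function names are hypothetical, and the in-memory samples stand in for the two files, which in practice would be read from disk.

```pascal
program AppendOnlyCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Demo helper: copies the characters of an ASCII string into a byte array.
function AsciiBytes(const S: AnsiString): TBytes;
begin
  SetLength(Result, Length(S));
  if Length(S) > 0 then
    Move(S[1], Result[0], Length(S));
end;

// True when Updated is strictly longer than Original and begins with
// exactly the bytes of Original, i.e. the save only appended data.
function IsAppendOnlyUpdate(const Original, Updated: TBytes): Boolean;
begin
  Result := (Length(Original) > 0) and
            (Length(Updated) > Length(Original)) and
            CompareMem(@Original[0], @Updated[0], Length(Original));
end;

var
  Original, Updated, Rewritten: TBytes;
begin
  Original  := AsciiBytes('%PDF-1.4'#10'signed revision'#10'%%EOF'#10);
  Updated   := AsciiBytes('%PDF-1.4'#10'signed revision'#10'%%EOF'#10 +
                          'appended delta'#10'%%EOF'#10);
  Rewritten := AsciiBytes('%PDF-1.4'#10'rewritten revision plus more'#10'%%EOF'#10);

  WriteLn(IsAppendOnlyUpdate(Original, Updated));   // TRUE
  WriteLn(IsAppendOnlyUpdate(Original, Rewritten)); // FALSE
end.
```

A check like this is cheap enough to run in a build or release pipeline before a signed document leaves the system.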

Editing loaded documents: LoadFromFile, not BeginDoc

A separate trap catches developers who learned HotPDF through the generation API. BeginDoc starts a new document; it is the wrong call when the goal is to modify an existing one. Document surgery goes through the loaded-document path:

PageCount := Pdf.LoadFromFile('base.pdf');
Pdf.InsertPagesFromDocument(OtherDoc, '1-3', 5);  // pages 1-3 after page 5
Pdf.MovePage(2, 5);
Pdf.SaveLoadedDocument('modified.pdf');

The symptom of mixing the two models is a file that contains only your new content and none of the original, because BeginDoc happily created a fresh document next to the one you thought you were editing. When reviewing code, treat LoadFromFile + SaveLoadedDocument as one paired vocabulary and BeginDoc + EndDoc as another; a procedure that uses both for the same document is almost always a bug.

Containing growth: when to compact an appended file

Append-only saving has a long-term cost. A nightly job that stamps a status line onto the same PDF produces 365 revisions a year, and each revision drags a new xref section with it. Once the revision history has served its purpose, and provided no signature must survive, the file can be compacted by re-serialising it through the loaded-document path:

Pdf.LoadFromFile('stamped.pdf');
Pdf.SaveLoadedDocument('compacted.pdf');

The resave is a full rewrite, which means it deliberately discards prior revisions and will invalidate any signature in the file, so gate it behind the same policy check you use before any destructive operation. A reasonable production rule we have seen work: compact when the revision count crosses a threshold or when the appended overhead exceeds a percentage of the base file, and never compact files whose signature panel is non-empty.
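
That rule can be written down as a small, testable function. This is a sketch under stated assumptions: the record, the function and the threshold values are hypothetical, and how the revision count and signature count are obtained is left to the surrounding pipeline.

```pascal
program CompactionPolicyDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical policy settings; the values are chosen by the caller.
  TCompactionPolicy = record
    MaxRevisions: Integer;       // compact once the revision count exceeds this
    MaxOverheadPercent: Integer; // or once appended bytes exceed this % of base
  end;

// Decides whether a file may be compacted by a full rewrite.
function ShouldCompact(const Policy: TCompactionPolicy; RevisionCount: Integer;
  BaseSize, CurrentSize: Int64; SignatureCount: Integer): Boolean;
var
  OverheadPercent: Int64;
begin
  Result := False;
  if SignatureCount > 0 then
    Exit; // a full rewrite would invalidate every signature
  if BaseSize <= 0 then
    Exit; // nothing sensible to compare against
  OverheadPercent := (CurrentSize - BaseSize) * 100 div BaseSize;
  Result := (RevisionCount > Policy.MaxRevisions) or
            (OverheadPercent > Policy.MaxOverheadPercent);
end;

var
  Policy: TCompactionPolicy;
begin
  Policy.MaxRevisions := 50;
  Policy.MaxOverheadPercent := 25;

  WriteLn(ShouldCompact(Policy, 365, 1000000, 1100000, 0)); // TRUE  (too many revisions)
  WriteLn(ShouldCompact(Policy, 3, 1000000, 1400000, 0));   // TRUE  (40% overhead)
  WriteLn(ShouldCompact(Policy, 3, 1000000, 1050000, 0));   // FALSE (within limits)
  WriteLn(ShouldCompact(Policy, 365, 1000000, 1400000, 1)); // FALSE (file is signed)
end.
```

Putting the signature check first means no threshold setting can ever override it.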

Checking the result before it ships

Verification for this feature pair is pleasantly mechanical. Open the output in Adobe Acrobat and check three things: the document properties report PDF 1.5 or later when object streams are enabled; the signature panel still validates every previously signed revision after an incremental update; and page count and bookmarks survived a load-modify-save cycle. For archival channels, run the file through veraPDF as well, since compressed xref structures are exactly the kind of thing a strict parser examines more carefully than a forgiving viewer does. If you also process very large inputs, the inspection techniques in our walkthrough of the Direct File API for large PDF workflows combine well with incremental saving, and the signature mechanics referenced above are covered in depth in the HotPDF digital signatures and PAdES article.
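
The version check can also be scripted as a first pass before anyone opens a viewer. The sketch below parses the %PDF-M.N header from the first bytes of a file; the function names are ours, and it assumes a single-digit major and minor version. It is not a full substitute for the Acrobat check, because a /Version entry in the document catalog can override the header.

```pascal
program HeaderVersionCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Demo helper: copies the characters of an ASCII string into a byte array.
function AsciiBytes(const S: AnsiString): TBytes;
begin
  SetLength(Result, Length(S));
  if Length(S) > 0 then
    Move(S[1], Result[0], Length(S));
end;

function IsDigit(B: Byte): Boolean;
begin
  Result := (B >= Ord('0')) and (B <= Ord('9'));
end;

// Parses a leading '%PDF-M.N' header. Returns False if it is not recognised.
function TryReadHeaderVersion(const Data: TBytes; out Major, Minor: Integer): Boolean;
const
  Prefix: AnsiString = '%PDF-';
var
  I: Integer;
begin
  Result := False;
  Major := 0;
  Minor := 0;
  if Length(Data) < 8 then
    Exit;
  for I := 1 to Length(Prefix) do
    if Data[I - 1] <> Ord(Prefix[I]) then
      Exit;
  if not IsDigit(Data[5]) or (Data[6] <> Ord('.')) or not IsDigit(Data[7]) then
    Exit;
  Major := Data[5] - Ord('0');
  Minor := Data[7] - Ord('0');
  Result := True;
end;

// True when Major.Minor is at least WantMajor.WantMinor.
function VersionAtLeast(Major, Minor, WantMajor, WantMinor: Integer): Boolean;
begin
  Result := (Major > WantMajor) or ((Major = WantMajor) and (Minor >= WantMinor));
end;

var
  Major, Minor: Integer;
begin
  if TryReadHeaderVersion(AsciiBytes('%PDF-1.5'#10'rest of file'), Major, Minor) then
    WriteLn(Major, '.', Minor, ' ', VersionAtLeast(Major, Minor, 1, 5)); // 1.5 TRUE
end.
```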

FAQ

Will enabling object streams break older PDF readers?

Readers that only implement PDF 1.4 cannot parse xref streams and typically report the file as damaged. Keep UseXRefStream and UseObjectStreams at their default False for channels that feed legacy interpreters, and enable them for modern viewer and archive channels.

Does an incremental update keep my digital signature valid?

Yes, that is its purpose: new objects are appended after the signed bytes, so the signed /ByteRange still digests correctly. A full rewrite, including load-and-resave compaction, breaks every existing signature even when the visible content is unchanged.

Why does my file keep growing after repeated saves?

Incremental saving appends a delta on every save and never reclaims space. Compact occasionally with LoadFromFile plus SaveLoadedDocument once revision history and signatures no longer need to be preserved.

Where to go next

Both features are part of the standard feature set of the HotPDF Component, alongside the generation, form, encryption, and signing APIs shown elsewhere on this blog. The product page links to the full API documentation if you want to map the calls above onto your own document pipeline.