
Hardening a PDFium VCL Binding: ABI and Memory Safety

A Pascal binding over a C library reads like ordinary Pascal. You call a method, you get a record back, you free what you allocated. The trouble is that PDFium is a C and C++ library with its own calling convention, its own integer widths, and its own rules about who owns memory and who frees it. None of that crosses the language boundary on its own. Every one of those contracts has to be restated by hand in the Pascal declarations, and a single wrong word turns a clean-looking call into a stack corruption, a truncated offset, or a double free. A v1.61.0 audit of a PDFium VCL binding turned up a defect of each kind, along with related problems in exception handling, compiler-managed records, and library loading. They are worth walking through because they are not specific to this binding. They are the standing hazards of wrapping any C API in Delphi or Lazarus.

cdecl is part of the function type, not a decoration

PDFium is compiled C and C++. Its exported functions are declared with their own convention macro, FPDF_CALLCONV, but the function pointers in its callback structures carry no convention at all, so on Win32 they default to cdecl. Under cdecl the caller cleans up the stack after the call returns. Delphi's native default is register, and the Win32 API uses stdcall for its own callbacks, where the callee cleans up instead. When a structure hands PDFium a function pointer and you forget the cdecl on that pointer's type, the two sides disagree about where the arguments live and who adjusts the stack pointer. Both adjust it, or neither does, and the stack pointer drifts by the size of the arguments on every invocation. On Win64 there is only one calling convention and the compiler ignores the directive, which is why this defect shows up in Win32 builds.

The reason this defect is hard to find is that the damage is non-local. The corrupted call returns and looks fine. The misalignment shows up later, in some unrelated function whose frame now sits on a stack pointer that is a few bytes off, and it manifests as a wild read, a bad return address, or a crash with a backtrace that points nowhere near the callback you actually got wrong. Form-fill is the classic place this bites, because the form-fill interface is a record full of callbacks that PDFium calls back into. One of them, FFI_OpenFile, hands PDFium a function it will call to open an external file, declared as function(pThis: PFPDF_FORMFILLINFO; fileFlag: Integer; wsURL: FPDF_WIDESTRING; mode: PAnsiChar): PFPDF_FILEHANDLER; cdecl. The trailing cdecl is the point worth copying. Drop it and the code still compiles, still links, and still runs right up until PDFium calls the function. The convention belongs to the function type itself. It is not optional sugar, and the compiler will not warn you when it is missing because a plain function type is a perfectly legal Pascal type. The only defence is to treat the calling convention as a mandatory field of every imported signature and every callback you pass outward.
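A minimal sketch of the pattern, written for Free Pascal in objfpc mode. TFakeFormFillInfo, TGetPageCountCb, MyGetPageCount and LibraryCalls are hypothetical stand-ins rather than PDFium declarations, and LibraryCalls only plays the part of the C side. The point is that cdecl sits on the procedural type stored in the record and again on the Pascal function assigned to it.

```pascal
program CdeclCallbackSketch;
{$mode objfpc}{$H+}

type
  PFakeFormFillInfo = ^TFakeFormFillInfo;

  // Hypothetical callback type. The calling convention is part of the
  // type, just as it is on the binding's FFI_OpenFile declaration.
  TGetPageCountCb = function(pThis: PFakeFormFillInfo;
    Delta: Integer): Integer; cdecl;

  // Hypothetical record in the shape of a C callback table:
  // a version field followed by function pointers.
  TFakeFormFillInfo = record
    version: Integer;
    GetPageCount: TGetPageCountCb;
  end;

// The implementation repeats cdecl so caller and callee agree on
// argument passing and stack cleanup on Win32.
function MyGetPageCount(pThis: PFakeFormFillInfo;
  Delta: Integer): Integer; cdecl;
begin
  Result := pThis^.version * 10 + Delta;
end;

// Plays the C library invoking the callback through the record.
function LibraryCalls(Info: PFakeFormFillInfo; Delta: Integer): Integer;
begin
  Result := Info^.GetPageCount(Info, Delta);
end;

var
  Info: TFakeFormFillInfo;
begin
  Info.version := 2;
  Info.GetPageCount := @MyGetPageCount;
  WriteLn(LibraryCalls(@Info, 3)); // prints 23
end.
```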

size_t is pointer-width, and on FPC Win64 that means 64 bits

The second defect is an integer-width mismatch that only appears on one target. C's size_t is defined to be wide enough to hold any object size, which on a 64-bit platform means a 64-bit unsigned integer. PDFium's progressive-loading interfaces speak in size_t byte offsets. The availability provider's FX_FILEAVAIL record carries an IsDataAvail callback that PDFium calls with an offset and a size, and the FX_DOWNLOADHINTS record's AddSegment callback receives the same. Both parameters are size_t.

IsDataAvail = function(
  pThis       : PFX_FILEAVAIL;
  offset, size: size_t): FPDF_BOOL; cdecl;

AddSegment = procedure(
  pThis       : PFX_DOWNLOADHINTS;
  offset, size: size_t); cdecl;

If size_t ends up as a 32-bit type on a 64-bit target, every one of those offsets is truncated. In this binding the risk was confined to one compiler: the callbacks work on Win32 and on Delphi Win64, then silently break on FPC and Lazarus Win64. The cause is subtle. On FPC Win64, NativeUInt is a genuine pointer-width 64-bit type, and size_t is aliased to it. The binding has a comment in the type section warning precisely against shadowing NativeUInt on FPC, because redefining it as a 32-bit alias there would force size_t to 32 bits and corrupt every size_t parameter passed to or written by the library. A 64-bit offset arriving at a 32-bit parameter loses its top half. For a small file every offset fits in 32 bits and nothing is wrong. For a large file, the moment an offset crosses the four-gigabyte line the truncated value points somewhere else entirely, the callback answers for the wrong byte range, and progressive loading stalls or reads garbage. The defect stays invisible until the file is big enough and the target is one where the alias narrowed size_t.
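The arithmetic is easy to check in isolation. This sketch, written for Free Pascal, assumes size_t is aliased to NativeUInt as described above. SizeTMatchesPointer and SeenBy32BitParam are hypothetical helpers; the second models what a 32-bit parameter receives on x64, namely only the low half of the 64-bit value the caller passed.

```pascal
program SizeTWidthSketch;
{$mode objfpc}{$H+}

type
  // Assumption: size_t aliases the compiler's own pointer-width unsigned
  // type. Never re-declare NativeUInt as a 32-bit alias.
  size_t = NativeUInt;

const
  // An offset beyond the 4 GiB line: 5 GiB.
  FiveGiB: UInt64 = UInt64(5) * 1024 * 1024 * 1024;

// Guard: size_t must be exactly as wide as a pointer.
function SizeTMatchesPointer: Boolean;
begin
  Result := SizeOf(size_t) = SizeOf(Pointer);
end;

// Models a callback declared with a 32-bit offset parameter on x64:
// it only ever sees the low 32 bits of the caller's 64-bit value.
function SeenBy32BitParam(Offset: UInt64): LongWord;
begin
  Result := LongWord(Offset);
end;

begin
  WriteLn('size_t is pointer-width: ', SizeTMatchesPointer);
  // 1073741824, i.e. the 5 GiB offset arrives as 1 GiB
  WriteLn('5 GiB offset via a 32-bit parameter: ', SeenBy32BitParam(FiveGiB));
end.
```

Inside this standalone program the width check always passes; in a binding, a check like it run at startup would catch a NativeUInt that some unit had shadowed. The second line shows a 5 GiB offset arriving as 1 GiB, which is exactly the wrong byte range an availability callback would end up answering for.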

A Pascal exception must never unwind through a C frame

The third class is about the exception model, which C does not have. When PDFium calls one of your callbacks, your Pascal code runs inside a stack of C and C++ frames that know nothing about Delphi's exception machinery. If your callback raises and lets the exception propagate, it unwinds through frames that were never built to be unwound. PDFium's own cleanup does not run, its internal invariants are left half-updated, and the process is now in a state the library never anticipated. The contract for these callbacks is a return code, not an exception.

Two callbacks make this concrete. FPDF_FILEWRITE is the sink PDFium writes a saved document into, and FPDF_FILEACCESS is the source it reads an input document from. Both are implemented here over a Delphi TStream, and both can fail the way any stream fails: the disk fills, the stream is closed underneath you, a read runs past the end. The write callback wraps its stream write and turns any failure into PDFium's failure code rather than letting it escape.

function WriteBlock(
  pThis: PFPDF_FILEWRITE;
  pData: Pointer;
  Size : LongWord): Integer; cdecl;
begin
  // PDFium treats a zero return as a write failure and any non-zero value
  // as success. A Pascal exception must not unwind through this cdecl/C++
  // frame, so trap it and report failure instead.
  Result := 0;
  try
    PPdfWrite(pThis).Stream.WriteBuffer(pData^, Size);
    Result := 1;
  except
  end;
end;

The read side does the same: a failed read reports zero to match the FPDF_FILEACCESS contract instead of raising across the boundary. A bare except with no re-raise looks wrong to a Pascal programmer trained never to swallow exceptions, and in ordinary Pascal it is wrong. At an ABI boundary it is the correct shape, because the only safe value to hand back to the C caller is a status code it knows how to interpret. The failure still propagates, just through the return value, and the calling code above the library surfaces it as EPdfError once control is back on the Pascal side of the fence.
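A sketch of that read side, assuming the stream is what the binding stores in FPDF_FILEACCESS.m_Param and assuming a Windows target, where C's unsigned long is 32 bits. The parameter names differ from PDFium's header and the harness in the main block is hypothetical; only the callback shape follows m_GetBlock.

```pascal
program GetBlockSketch;
{$mode objfpc}{$H+}
uses SysUtils, Classes;

// Same shape as FPDF_FILEACCESS.m_GetBlock:
//   int (*m_GetBlock)(void* param, unsigned long position,
//                     unsigned char* pBuf, unsigned long size);
// Assumptions: Windows target (unsigned long = 32-bit LongWord), and
// param carries the TStream the binding put in m_Param.
function GetBlock(param: Pointer; BlockPos: LongWord; pBuf: PByte;
  BlockSize: LongWord): Integer; cdecl;
begin
  Result := 0; // zero tells PDFium the read failed
  try
    TStream(param).Position := BlockPos;
    // ReadBuffer raises EReadError on a short read.
    TStream(param).ReadBuffer(pBuf^, BlockSize);
    Result := 1;
  except
    // Swallowed on purpose: the failure crosses back as the return code.
  end;
end;

var
  Src: TMemoryStream;
  Data: array[0..9] of Byte;
  Buf: array[0..3] of Byte;
  I, Ok: Integer;
begin
  for I := 0 to High(Data) do
    Data[I] := I;
  Src := TMemoryStream.Create;
  try
    Src.WriteBuffer(Data, SizeOf(Data));
    // In range: bytes 2..5 are copied and the callback reports success.
    Ok := GetBlock(Src, 2, @Buf[0], 4);
    WriteLn(Ok, ' ', Buf[0]); // 1 2
    // Past the end: ReadBuffer raises, the callback reports failure.
    Ok := GetBlock(Src, 8, @Buf[0], 4);
    WriteLn(Ok);              // 0
  finally
    Src.Free;
  end;
end.
```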

Double free hides on the error path

The fourth defect is ownership. A PDFium document handle is opened by the library and must be closed exactly once, by FPDF_CloseDocument. The danger is an error path that frees a handle a second cleanup also owns. Picture a routine that creates a wrapper object, assigns a freshly opened document handle to it, and then does more setup that might fail. If the setup throws, an early-return handler that calls FPDF_CloseDocument on the raw handle will close it, and then the wrapper object's own destructor will close it again when the object is freed. The handle is freed twice, which is undefined behaviour and a likely crash.

The audit found this on an imposition-style import path that builds a TPdf around an already-open handle. The fix is to make ownership transfer the single source of truth. Once the handle is assigned to the wrapper's field, the wrapper owns it, and the only cleanup on the error path is to free the wrapper. The wrapper's destructor calls FPDF_CloseDocument for you, so a second explicit close would double-free the same document. The corrected error handler frees the object and re-raises, and there is exactly one path to the close.

Result := TPdf.Create(nil);
try
  Result.FDocument := NewDoc;   // Result now owns the handle
  Result.InitializeFormFill;
  Result.ReloadPage;
except
  // Result.Free closes the handle. A second FPDF_CloseDocument(NewDoc)
  // here would double-free the same PDFium document.
  Result.Free;
  raise;
end;
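The difference between the two error paths can be counted directly. In this sketch FakeCloseDocument is a hypothetical stand-in for FPDF_CloseDocument that only increments a counter, TFakePdf stands in for TPdf, and Pointer(1) plays the document handle; nothing here calls PDFium.

```pascal
program SingleOwnerSketch;
{$mode objfpc}{$H+}
uses SysUtils;

var
  CloseCount: Integer = 0;

// Hypothetical stand-in for FPDF_CloseDocument: it counts closes instead
// of freeing anything, so a double close shows up as CloseCount = 2.
procedure FakeCloseDocument(Doc: Pointer);
begin
  if Doc <> nil then
    Inc(CloseCount);
end;

type
  // Hypothetical wrapper standing in for TPdf.
  TFakePdf = class
  private
    FDocument: Pointer;
  public
    destructor Destroy; override;
    procedure FailingSetup;
  end;

destructor TFakePdf.Destroy;
begin
  FakeCloseDocument(FDocument); // the owner closes the handle, once
  FDocument := nil;
  inherited Destroy;
end;

procedure TFakePdf.FailingSetup;
begin
  raise Exception.Create('setup failed');
end;

// Buggy shape: closes the raw handle, then frees the owner, which closes again.
function WrapBuggy(NewDoc: Pointer): TFakePdf;
begin
  Result := TFakePdf.Create;
  try
    Result.FDocument := NewDoc;
    Result.FailingSetup;
  except
    FakeCloseDocument(NewDoc); // first close
    Result.Free;               // destructor closes a second time
    raise;
  end;
end;

// Fixed shape: ownership has moved to Result, so freeing it is the only cleanup.
function WrapFixed(NewDoc: Pointer): TFakePdf;
begin
  Result := TFakePdf.Create;
  try
    Result.FDocument := NewDoc;
    Result.FailingSetup;
  except
    Result.Free;
    raise;
  end;
end;

// Runs one wrapper against a setup that always fails and reports the closes.
function ClosesFor(Buggy: Boolean): Integer;
begin
  CloseCount := 0;
  try
    if Buggy then
      WrapBuggy(Pointer(1))
    else
      WrapFixed(Pointer(1));
  except
    on Exception do ; // expected: setup always fails in this sketch
  end;
  Result := CloseCount;
end;

begin
  WriteLn('buggy closes: ', ClosesFor(True));  // 2
  WriteLn('fixed closes: ', ClosesFor(False)); // 1
end.
```

The buggy path closes twice and the fixed one once, because after ownership moves the destructor is the only place the handle gets closed.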

Managed records and a library full of exports both need explicit teardown

The last class is about memory the compiler manages on your behalf, which a C habit will quietly corrupt. Many of this binding's helper functions return a record that contains a WideString or a dynamic array. Those are managed fields, and the compiler emits hidden bookkeeping to release them: a reference-count decrement for a dynamic array, and a SysFreeString for a WideString, which on Windows is a COM BSTR rather than a reference-counted string. The instinct carried over from C is to clear a fresh record with FillChar(Result, SizeOf(Result), 0). That stamps zeros over the managed pointer inside the record without releasing it first. The compiler reuses one hidden temporary for a function result across loop iterations, so on the second iteration FillChar overwrites a live string pointer that was never released, and the string it pointed at leaks. Call the function in a loop over a thousand annotations and you leak a string on nearly every iteration.

The fix is to let the language clear the record the way it knows how, with Default(T), which releases any managed field before zeroing it.

// Default() instead of FillChar: the compiler reuses one hidden temp for
// the function result across loop iterations, so FillChar would zero live
// WideString pointers without releasing them.
Result := Default(TPdfAnnotation);
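The release behaviour becomes observable if the managed field is an interface instead of a string: an interface is also a managed, reference-counted field, and its destructor shows exactly when the compiler lets go of it. TAnnotLike, TTracked, MakeAnnot and ReleasedInLoop are hypothetical names, and the sketch targets Free Pascal in objfpc mode.

```pascal
program DefaultResultSketch;
{$mode objfpc}{$H+}

var
  DestroyedCount: Integer = 0;

type
  // An interface field is managed and reference-counted. It stands in for
  // the binding's WideString and dynamic-array fields because its
  // destructor makes the moment of release visible.
  TTracked = class(TInterfacedObject)
  public
    destructor Destroy; override;
  end;

  // Hypothetical record in the shape of a helper result like TPdfAnnotation.
  TAnnotLike = record
    Index: Integer;
    Payload: IInterface;
  end;

destructor TTracked.Destroy;
begin
  Inc(DestroyedCount);
  inherited Destroy;
end;

function MakeAnnot(Index: Integer): TAnnotLike;
begin
  // Result may still hold the previous call's value. Assigning Default()
  // goes through managed assignment, so that Payload is released first.
  Result := Default(TAnnotLike);
  Result.Index := Index;
  Result.Payload := TTracked.Create;
end;

// Builds Count records in a loop and reports how many payloads were
// released while the last one is still alive in R.
function ReleasedInLoop(Count: Integer): Integer;
var
  R: TAnnotLike;
  I: Integer;
begin
  DestroyedCount := 0;
  for I := 1 to Count do
    R := MakeAnnot(I);
  Result := DestroyedCount;
end;

begin
  // Expect 999: every payload except the one R still holds.
  WriteLn('released after 1000 calls: ', ReleasedInLoop(1000));
end.
```

Swapping the Default line for FillChar(Result, SizeOf(Result), 0) would typically leave earlier payloads unreleased. How many leak depends on how the compiler passes the result slot, so only the Default version is checked here.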

A related ownership problem lives at the library-loading boundary. This binding resolves several hundred function pointers out of the PDFium DLL with GetProcAddress after a LoadLibrary. If one required export is missing, the partially bound state is dangerous: dozens of pointers are valid, the rest are nil or stale, and any later call through one of them jumps into a module that may already be unloaded. The binding handles this by unloading the library and running a full ClearAllBindings that resets every imported pointer back to nil whenever a required export fails to resolve. After that, no function pointer dangles into an unloaded module, and a later call fails cleanly with a nil-pointer check instead of branching into freed code.
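The all-or-nothing rule fits in a small table of export names and pointer addresses. In this sketch the resolver is injected so the program runs without a DLL; a real loader would call GetProcAddress on the module handle and FreeLibrary on failure. The three export names are real PDFium exports, but TBinding, the p* variables, BindAll and this ClearAllBindings are hypothetical and not the binding's own code.

```pascal
program BindAllOrNothingSketch;
{$mode objfpc}{$H+}
uses SysUtils;

type
  // Hypothetical resolver; a real loader would wrap GetProcAddress.
  TResolver = function(Name: PChar): Pointer;

  TBinding = record
    Name: PChar;      // export name to look up
    Target: PPointer; // address of the global pointer to fill
  end;

var
  // Hypothetical stand-ins for imported function pointers.
  pInitLibrary, pLoadDocument, pCloseDocument: Pointer;
  Dummy: Byte;

const
  Table: array[0..2] of TBinding = (
    (Name: 'FPDF_InitLibrary';   Target: @pInitLibrary),
    (Name: 'FPDF_LoadDocument';  Target: @pLoadDocument),
    (Name: 'FPDF_CloseDocument'; Target: @pCloseDocument));

// Resets every imported pointer so none can dangle into an unloaded module.
procedure ClearAllBindings;
var
  I: Integer;
begin
  for I := Low(Table) to High(Table) do
    Table[I].Target^ := nil;
end;

// All-or-nothing: either every pointer is bound, or none is.
function BindAll(Resolve: TResolver): Boolean;
var
  I: Integer;
  P: Pointer;
begin
  for I := Low(Table) to High(Table) do
  begin
    P := Resolve(Table[I].Name);
    if P = nil then
    begin
      ClearAllBindings; // a real loader would also FreeLibrary here
      Exit(False);
    end;
    Table[I].Target^ := P;
  end;
  Result := True;
end;

// Fake resolvers for the demo: one finds everything, one misses an export.
function ResolveEverything(Name: PChar): Pointer;
begin
  Result := @Dummy;
end;

function ResolveAllButClose(Name: PChar): Pointer;
begin
  if StrComp(Name, 'FPDF_CloseDocument') = 0 then
    Result := nil
  else
    Result := @Dummy;
end;

var
  Ok: Boolean;
begin
  Ok := BindAll(@ResolveEverything);
  WriteLn('full bind: ', Ok);
  Ok := BindAll(@ResolveAllButClose);
  WriteLn('missing export: ', Ok);
  WriteLn('pInitLibrary cleared: ', pInitLibrary = nil);
end.
```

Because the missing export is the last one looked up, the two earlier pointers are already bound when the failure is detected, which is exactly the half-bound state ClearAllBindings exists to undo.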

The wrapper is where four contracts get restated by hand

None of these five defect classes is exotic. They are the predictable failure modes of a thin Pascal layer over a C API, and they cluster because that layer is exactly where four separate contracts have to be re-declared. The calling convention has to be spelled cdecl on every callback. The integer width has to match size_t on every 64-bit target, including FPC Win64, where a stray alias can quietly narrow it. The exception model has to be converted to return codes at every callback that crosses out of Pascal. The ownership of every handle and every managed field has to be stated once and obeyed on every path, including the error paths nobody exercises until production. Miss any one and you get a defect whose symptom shows up far from its cause, which is what makes this category expensive. The audit's value lay less in any single fix than in treating each contract as its own discipline to check across the whole binding.

If you want to see the binding doing real work rather than guarding its edges, our note on render-cache and zoom performance covers the rendering path, and the cross-compiler walkthrough in building a Lazarus and FPC viewer is where the Win64 size_t behaviour described here actually matters. Both build on the same memory-safety and ABI work that ships in the PDFium Component for Delphi, Lazarus, and C++Builder, alongside the rendering, text-extraction, and form APIs covered elsewhere on this blog.