Understanding PDF Page Ordering – Why Your PDF Pages Aren’t There

The Hidden Complexity Behind PDF Structure

PDF documents are far more sophisticated than they appear to end users. While viewers see pages in a logical, sequential order (1, 2, 3…), the internal architecture of a PDF file tells a dramatically different story. This complexity is one of the most misunderstood aspects of PDF processing, leading to countless bugs, incorrect implementations, and frustrated developers. This comprehensive article explores the intricate world of PDF page organization, explains why developers frequently encounter unexpected page ordering issues, and provides practical solutions for robust PDF manipulation.

The PDF Object Model: A Paradigm Shift from Sequential Documents

To understand PDF page ordering challenges, we must first appreciate how fundamentally different PDF is from simpler document formats. Unlike plain text files, HTML documents, or even older formats like RTF, PDF employs a sophisticated object-based architecture where content organization and physical storage are completely decoupled.

This architectural decision was made for several important reasons:

Flexibility: Objects can be referenced from multiple locations without duplication
Efficiency: Common resources (fonts, images, graphics states) can be shared across pages
Incremental updates: Documents can be modified without rewriting the entire file
Random access: Viewers can jump to any page without parsing the entire document

However, this flexibility comes at the cost of complexity, particularly when it comes to understanding the relationship between object storage order and logical page sequence.

Object References vs. Display Order: A Concrete Example

Consider this typical PDF structure that illustrates the disconnect between storage and display:

% PDF file structure example - storage order vs. display order

%PDF-1.4

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

2 0 obj

<< /Type /Pages /Kids [20 0 R 1 0 R 4 0 R] /Count 3 >>

endobj

% Object 4 appears third in file but represents page 3 in display

4 0 obj

<< /Type /Page

/Contents 5 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

% Object 20 appears last in file but represents page 1 in display

20 0 obj

<< /Type /Page

/Contents 21 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

In this example, the page objects are stored as objects 4 and 20, but the display order is defined by the Kids array: [20, 1, 4]. This creates the following mapping:

Page 1 (display order) = Object 20 (storage order: last)
Page 2 (display order) = Object 1 (storage order: first)
Page 3 (display order) = Object 4 (storage order: third)

This disconnect is not accidental—it’s a fundamental feature of PDF that enables sophisticated document manipulation and optimisation.

Why PDF Generators Create Non-Sequential Object Orders

Understanding why PDF generators create non-sequential object orders helps developers appreciate the complexity they’re dealing with and avoid making incorrect assumptions about document structure.

PDF Creation Workflows

Different PDF creation workflows result in different object ordering patterns:

1. Sequential Document Creation

% Typical output from simple PDF generators

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 3 >> endobj

3 0 obj << /Type /Page /Contents 6 0 R /Parent 2 0 R >> endobj

4 0 obj << /Type /Page /Contents 7 0 R /Parent 2 0 R >> endobj

5 0 obj << /Type /Page /Contents 8 0 R /Parent 2 0 R >> endobj

2. Optimised Resource Sharing

% PDF with shared resources created first

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [10 0 R 11 0 R 12 0 R] /Count 3 >> endobj

3 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj

4 0 obj << /Type /XObject /Subtype /Image /Width 100 /Height 100 >> endobj

% ... more shared resources ...

10 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

11 0 obj << /Type /Page /Resources << /XObject << /Im1 4 0 R >> >> >> endobj

12 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

3. Incremental Document Assembly

% Document created by combining existing PDFs

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [100 0 R 25 0 R 75 0 R] /Count 3 >> endobj

% Objects from first source document

25 0 obj << /Type /Page /Contents 26 0 R /Parent 2 0 R >> endobj

% Objects from second source document

75 0 obj << /Type /Page /Contents 76 0 R /Parent 2 0 R >> endobj

% Objects from third source document

100 0 obj << /Type /Page /Contents 101 0 R /Parent 2 0 R >> endobj

Common Developer Mistakes and Their Consequences

The complexity of PDF structure leads to several common mistakes that can have serious consequences for application reliability and user experience.

Mistake 1: Assuming Object ID Order Equals Display Order

This is perhaps the most common mistake made by developers new to PDF processing:

// WRONG: Processing pages by object ID order

function GetPagesInWrongOrder(Doc: TPDFDocument): TPageList;

var

i: Integer;

Obj: TPDFObject;

begin

Result := TPageList.Create;

// This approach processes pages in storage order, not display order

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and (Obj.GetValue('/Type') = '/Page') then

begin

Result.Add(Obj); // Wrong order!

end;

// Result will be in object ID order: [1, 4, 20]

// But display order should be: [20, 1, 4]

end;

The consequences of this mistake include:

Pages appear in incorrect order in output documents
Page numbering becomes inconsistent
User confusion and support requests
Potential data corruption in document processing pipelines

Mistake 2: Hard-Coded Page Mapping Based on Observations

When developers encounter page ordering issues, they sometimes implement hard-coded fixes based on observed patterns:

// WRONG: Hard-coded page reordering based on heuristics

function ApplyPageReorderingHeuristics(Pages: TPageArray): TPageArray;

var

i: Integer;

begin

SetLength(Result, Length(Pages));

// Dangerous heuristic based on limited observations

if Length(Pages) = 3 then

begin

// "Fix" for specific 3-page documents observed during testing

Result[0] := Pages[1]; // Put second page first

Result[1] := Pages[2]; // Put third page second

Result[2] := Pages[0]; // Put first page last

end

else if Length(Pages) > 3 then

begin

// Generic "fix" that swaps first and last pages

Result[0] := Pages[Length(Pages) - 1];

Result[Length(Pages) - 1] := Pages[0];

// Keep middle pages in original order

for i := 1 to Length(Pages) - 2 do

Result[i] := Pages[i];

end

else

begin

// For other cases, just copy as-is

for i := 0 to High(Pages) do

Result[i] := Pages[i];

end;

This approach is fundamentally flawed because:

It only works for the specific PDFs observed during development
It fails catastrophically with PDFs that have different structures
It creates unpredictable behaviour that users cannot understand
It accumulates technical debt as more special cases are added

Mistake 3: Ignoring Hierarchical Page Trees

Many developers assume that PDF page trees are always flat arrays, but the PDF specification allows for hierarchical structures:

// WRONG: Assuming flat page tree structure

function GetPagesFromFlatTree(PagesObj: TPDFObject): TPageArray;

var

KidsArray: TPDFArray;

i: Integer;

begin

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

SetLength(Result, KidsArray.Count);

for i := 0 to KidsArray.Count - 1 do

begin

// This assumes all Kids entries are Page objects

// But they might be intermediate Pages objects!

Result[i] := KidsArray.GetIndirectObject(i);

end;

The Correct Approach: Following the Pages Tree Structure

The proper way to handle PDF page ordering is to implement a complete Pages tree traversal that follows the PDF specification exactly.

Understanding the Pages Tree Hierarchy

PDF page trees can be hierarchical, with intermediate Pages objects containing their own Kids arrays:

% Hierarchical page tree example

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

% Root Pages object

2 0 obj

<< /Type /Pages

/Kids [3 0 R 8 0 R 15 0 R]

/Count 7 >>

endobj

% First intermediate Pages object (contains 3 pages)

3 0 obj

<< /Type /Pages

/Kids [4 0 R 5 0 R 6 0 R]

/Count 3

/Parent 2 0 R >>

endobj

% Second intermediate Pages object (contains 2 pages)

8 0 obj

<< /Type /Pages

/Kids [9 0 R 10 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Third intermediate Pages object (contains 2 pages)

15 0 obj

<< /Type /Pages

/Kids [16 0 R 17 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Actual page objects

4 0 obj << /Type /Page /Contents 40 0 R /Parent 3 0 R >> endobj

5 0 obj << /Type /Page /Contents 41 0 R /Parent 3 0 R >> endobj

% ... and so on

Implementing Recursive Page Tree Traversal

// CORRECT: Recursive page tree traversal

function GetPagesInCorrectOrder(Doc: TPDFDocument): TPageArray;

var

CatalogObj, RootPagesObj: TPDFObject;

PageList: TList;

begin

PageList := TList.Create;

try

// Step 1: Find the document catalog

CatalogObj := Doc.FindObject('/Type', '/Catalog');

if CatalogObj = nil then

raise Exception.Create('Document catalog not found');

// Step 2: Get the root Pages object

RootPagesObj := CatalogObj.GetIndirectObject('/Pages');

if RootPagesObj = nil then

raise Exception.Create('Root Pages object not found');

// Step 3: Recursively traverse the page tree

TraversePagesTree(RootPagesObj, PageList);

// Step 4: Convert list to array

SetLength(Result, PageList.Count);

for i := 0 to PageList.Count - 1 do

Result[i] := TPDFObject(PageList[i]);

finally

PageList.Free;

end;

procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList);

var

KidsArray: TPDFArray;

i: Integer;

ChildObj: TPDFObject;

ChildType: string;

begin

if PagesObj = nil then Exit;

// Get the Kids array from this Pages object

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

// Process each child in the Kids array

for i := 0 to KidsArray.Count - 1 do

begin

ChildObj := KidsArray.GetIndirectObject(i);

if ChildObj = nil then Continue;

ChildType := ChildObj.GetValue('/Type');

if ChildType = '/Page' then

begin

// This is a leaf page object - add it to our list

PageList.Add(ChildObj);

end

else if ChildType = '/Pages' then

begin

// This is an intermediate Pages object - recurse into it

TraversePagesTree(ChildObj, PageList);

end

else

begin

// Unexpected object type in Kids array

raise Exception.CreateFmt('Unexpected object type in Kids array: %s', [ChildType]);

end;

Handling Real-World PDF Variations and Edge Cases

Real-world PDF files often deviate from the ideal structure described in the specification. A robust PDF processing library must handle these variations gracefully.

Common Structural Anomalies

1. Missing or Corrupted Catalog

% PDF with missing catalog reference

%PDF-1.4

% Object 1 should be catalog but is missing or corrupted

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>

endobj

2. Circular References

% PDF with circular page tree references (corrupted)

2 0 obj

<< /Type /Pages /Kids [3 0 R] /Count 1 /Parent 3 0 R >>

endobj

3 0 obj

<< /Type /Pages /Kids [2 0 R] /Count 1 /Parent 2 0 R >>

endobj

3. Inconsistent Count Values

% PDF with incorrect Count value

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 5 >>

% Count says 5 but Kids array has only 3 elements

endobj

Implementing Robust Error Handling

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

// Robust page tree traversal with comprehensive error handling

function GetPagesWithFallbacks(Doc: TPDFDocument): TPageArray;

var

AttemptCount: Integer;

ErrorMessages: TStringList;

begin

ErrorMessages := TStringList.Create;

try

AttemptCount := 0;

// Attempt 1: Standard PDF specification approach

Inc(AttemptCount);

try

Result := GetPagesViaStandardTraversal(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with standard traversal (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 2: Search for Pages objects and try each one

Inc(AttemptCount);

try

Result := GetPagesViaObjectSearch(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with object search (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 3: Brute force search for Page objects

Inc(AttemptCount);

try

Result := GetPagesViaBruteForce(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with brute force search (attempt %d)', [AttemptCount]));

LogMessage('Warning: Document structure is non-standard');

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// All attempts failed

raise Exception.Create('Failed to extract pages from PDF. Errors: ' +

ErrorMessages.Text);

finally

ErrorMessages.Free;

end;

function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray;

var

i: Integer;

Obj: TPDFObject;

KidsArray: TPDFArray;

PageList: TList;

CandidateObjects: TList;

begin

CandidateObjects := TList.Create;

PageList := TList.Create;

try

// Find all objects that could be Pages objects

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and

(Obj.GetValue('/Type') = '/Pages') and

Obj.HasKey('/Kids') then

begin

CandidateObjects.Add(Obj);

end;

// Try each candidate Pages object

for i := 0 to CandidateObjects.Count - 1 do

begin

Obj := TPDFObject(CandidateObjects[i]);

KidsArray := Obj.GetArray('/Kids');

if (KidsArray <> nil) and (KidsArray.Count > 0) then

begin

// Validate that this Kids array contains actual pages

if ValidateKidsArray(KidsArray) then

begin

PageList.Clear;

TraversePagesTree(Obj, PageList);

if PageList.Count > 0 then

begin

// Found valid pages - convert to array and return

SetLength(Result, PageList.Count);

for j := 0 to PageList.Count - 1 do

Result[j] := TPDFObject(PageList[j]);

Exit;

end;

// No valid Pages object found

SetLength(Result, 0);

finally

CandidateObjects.Free;

PageList.Free;

end;

Performance Optimisation Strategies

When processing large PDF files or handling high-volume document processing, performance becomes a critical consideration.

Lazy Loading and Caching

// Performance-optimised page access with caching

type

TPDFPageCache = class

private

FPages: array of TPDFPage;

FPageObjects: array of TPDFObject;

FCacheHits: Integer;

FCacheMisses: Integer;

FMaxCacheSize: Integer;

public

constructor Create(MaxCacheSize: Integer = 100);

destructor Destroy; override;

function GetPage(Index: Integer): TPDFPage;

procedure ClearCache;

procedure GetCacheStatistics(out Hits, Misses: Integer);

end;

function TPDFPageCache.GetPage(Index: Integer): TPDFPage;

begin

// Check if page is already cached

if (Index >= 0) and (Index < Length(FPages)) and

(FPages[Index] <> nil) then

begin

Inc(FCacheHits);

Result := FPages[Index];

Exit;

end;

Inc(FCacheMisses);

// Load page from object if not cached

if (Index >= 0) and (Index < Length(FPageObjects)) and

(FPageObjects[Index] <> nil) then

begin

Result := TPDFPage.CreateFromObject(FPageObjects[Index]);

// Cache the page if we have room

if Length(FPages) < FMaxCacheSize then begin if Index >= Length(FPages) then

SetLength(FPages, Index + 1);

FPages[Index] := Result;

end;

end

else

begin

Result := nil;

end;

Streaming Processing for Large Documents

// Streaming approach for processing large PDF documents

procedure ProcessLargePDFInChunks(const FileName: string; ChunkSize: Integer = 50);

var

Doc: TPDFDocument;

TotalPages: Integer;

ChunkStart, ChunkEnd: Integer;

i: Integer;

begin

Doc := TPDFDocument.Create;

try

Doc.LoadFromFile(FileName);

TotalPages := Doc.GetPageCount;

LogMessage(Format('Processing %d pages in chunks of %d', [TotalPages, ChunkSize]));

ChunkStart := 0;

while ChunkStart < TotalPages do

begin

ChunkEnd := Min(ChunkStart + ChunkSize - 1, TotalPages - 1);

LogMessage(Format('Processing chunk: pages %d-%d', [ChunkStart + 1, ChunkEnd + 1]));

// Process this chunk of pages

for i := ChunkStart to ChunkEnd do

begin

ProcessSinglePage(Doc, i);

end;

// Optional: Force garbage collection between chunks

if (ChunkStart mod (ChunkSize * 4)) = 0 then

begin

ForceGarbageCollection;

end;

ChunkStart := ChunkEnd + 1;

end;

finally

Doc.Free;

end;

Advanced PDF Structure Analysis

For developers working with complex PDF processing requirements, understanding advanced structural elements is crucial.

Page Inheritance and Resource Management

PDF pages can inherit properties from their parent Pages objects, creating a hierarchical resource management system:

% Example of page inheritance in PDF structure

2 0 obj

<< /Type /Pages

/Kids [3 0 R 4 0 R]

/Count 2

/MediaBox [0 0 612 792]

/Resources <<

/Font << /F1 10 0 R >>

/ProcSet [/PDF /Text]

>> >>

endobj

% Child page inherits MediaBox and Resources from parent

3 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 5 0 R >>

% This page inherits MediaBox [0 0 612 792] and Resources from parent

endobj

% Child page overrides inherited MediaBox

4 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 6 0 R

/MediaBox [0 0 792 612] >>

% This page overrides MediaBox but still inherits Resources

endobj

Handling Page Inheritance in Code

// Proper handling of page inheritance

function GetEffectivePageProperties(PageObj: TPDFObject): TPDFPageProperties;

var

CurrentObj: TPDFObject;

MediaBox: TPDFArray;

Resources: TPDFObject;

begin

// Initialize result

Result := TPDFPageProperties.Create;

// Walk up the parent chain to collect inherited properties

CurrentObj := PageObj;

while CurrentObj <> nil do

begin

// Check for MediaBox at this level

if Result.MediaBox.IsEmpty then

begin

MediaBox := CurrentObj.GetArray('/MediaBox');

if MediaBox <> nil then

Result.MediaBox := MediaBox;

end;

// Check for Resources at this level

if Result.Resources = nil then

begin

Resources := CurrentObj.GetDictionary('/Resources');

if Resources <> nil then

Result.Resources := Resources;

end;

// Check for other inheritable properties

CheckForInheritableProperty(CurrentObj, '/Rotate', Result.Rotate);

CheckForInheritableProperty(CurrentObj, '/CropBox', Result.CropBox);

// Move to parent object

CurrentObj := CurrentObj.GetIndirectObject('/Parent');

// Prevent infinite loops in corrupted PDFs

if CurrentObj = PageObj then

break;

end;

// Validate that we found required properties

if Result.MediaBox.IsEmpty then

raise Exception.Create('No MediaBox found in page inheritance chain');

end;

Testing Strategies for PDF Page Ordering

Comprehensive testing is essential when dealing with PDF page ordering, given the variety of possible document structures.

Creating Comprehensive Test Suites

# Comprehensive PDF test case generation script

# Test Case 1: Sequential pages (baseline)

echo "Creating sequential page test..."

pdftk A=template.pdf cat A A A output test-sequential.pdf

# Test Case 2: Non-sequential object IDs

echo "Creating non-sequential object ID test..."

pdftk A=page3.pdf B=page1.pdf C=page2.pdf cat A B C output test-nonsequential.pdf

# Test Case 3: Hierarchical page tree

echo "Creating hierarchical page tree test..."

# This requires custom PDF generation tool

generate-hierarchical-pdf --depth 3 --pages-per-node 2 output test-hierarchical.pdf

# Test Case 4: Large document with mixed structures

echo "Creating large document test..."

pdftk A=large-doc.pdf cat 1-100 50-149 200-299 output test-large-mixed.pdf

# Test Case 5: Corrupted page tree

echo "Creating corrupted page tree test..."

# This requires custom corruption tool

corrupt-pdf-structure --target pages-tree test-sequential.pdf test-corrupted.pdf

# Test Case 6: Minimal single-page document

echo "Creating minimal single-page test..."

pdftk A=template.pdf cat 1 output test-single-page.pdf

Automated Validation Framework

100

// Comprehensive PDF page ordering validation framework

type

TPDFTestCase = record

FileName: string;

ExpectedPageCount: Integer;

ExpectedPageOrder: array of Integer;

Description: string;

end;

function RunPDFPageOrderingTests: Boolean;

var

TestCases: array of TPDFTestCase;

i: Integer;

PassCount, FailCount: Integer;

begin

// Define test cases

SetLength(TestCases, 6);

TestCases[0].FileName := 'test-sequential.pdf';

TestCases[0].ExpectedPageCount := 3;

TestCases[0].ExpectedPageOrder := [0, 1, 2];

TestCases[0].Description := 'Sequential page ordering';

TestCases[1].FileName := 'test-nonsequential.pdf';

TestCases[1].ExpectedPageCount := 3;

TestCases[1].ExpectedPageOrder := [2, 0, 1]; // Based on how pdftk reorders

TestCases[1].Description := 'Non-sequential object IDs';

// ... define other test cases ...

PassCount := 0;

FailCount := 0;

WriteLn('Running PDF page ordering tests...');

WriteLn('=' * 50);

for i := 0 to High(TestCases) do

begin

Write(Format('Test %d: %s... ', [i + 1, TestCases[i].Description]));

if ValidateTestCase(TestCases[i]) then

begin

WriteLn('PASS');

Inc(PassCount);

end

else

begin

WriteLn('FAIL');

Inc(FailCount);

end;

WriteLn('=' * 50);

WriteLn(Format('Results: %d passed, %d failed', [PassCount, FailCount]));

Result := FailCount = 0;

end;

function ValidateTestCase(const TestCase: TPDFTestCase): Boolean;

var

Doc: TPDFDocument;

ActualPages: TPageArray;

i: Integer;

begin

Result := False;

Doc := TPDFDocument.Create;

try

if not Doc.LoadFromFile(TestCase.FileName) then

begin

WriteLn(Format('Failed to load %s', [TestCase.FileName]));

Exit;

end;

ActualPages := GetPagesInCorrectOrder(Doc);

// Validate page count

if Length(ActualPages) <> TestCase.ExpectedPageCount then

begin

WriteLn(Format('Page count mismatch: expected %d, got %d',

[TestCase.ExpectedPageCount, Length(ActualPages)]));

Exit;

end;

// Validate page order (simplified - in real implementation,

// you'd compare actual page content or identifiers)

for i := 0 to High(ActualPages) do

begin

if not ValidatePageAtPosition(ActualPages[i], TestCase.ExpectedPageOrder[i]) then

begin

WriteLn(Format('Page order mismatch at position %d', [i]));

Exit;

end;

Result := True;

finally

Doc.Free;

end;

Future-Proofing Your PDF Processing Code

As PDF standards evolve and new use cases emerge, it’s important to write code that can adapt to future requirements.

Designing for Extensibility

// Extensible PDF page processing architecture

type

IPDFPageProcessor = interface

['{12345678-1234-1234-1234-123456789012}']

function ProcessPage(Page: TPDFPage; Context: TPDFProcessingContext): Boolean;

function GetProcessorName: string;

function GetSupportedPDFVersions: TStringArray;

end;

TPDFProcessingPipeline = class

private

FProcessors: TList;

FContext: TPDFProcessingContext;

public

constructor Create;

destructor Destroy; override;

procedure RegisterProcessor(Processor: IPDFPageProcessor);

procedure UnregisterProcessor(Processor: IPDFPageProcessor);

function ProcessDocument(Doc: TPDFDocument): Boolean;

end;

function TPDFProcessingPipeline.ProcessDocument(Doc: TPDFDocument): Boolean;

var

Pages: TPageArray;

i, j: Integer;

Page: TPDFPage;

Processor: IPDFPageProcessor;

Success: Boolean;

begin

Result := True;

// Get pages in correct order using our robust method

Pages := GetPagesInCorrectOrder(Doc);

// Process each page through all registered processors

for i := 0 to High(Pages) do

begin

Page := TPDFPage.CreateFromObject(Pages[i]);

try

FContext.CurrentPageIndex := i;

FContext.TotalPages := Length(Pages);

for j := 0 to FProcessors.Count - 1 do

begin

Processor := FProcessors[j];

Success := Processor.ProcessPage(Page, FContext);

if not Success then

begin

LogError(Format('Processor %s failed on page %d',

[Processor.GetProcessorName, i + 1]));

Result := False;

// Continue with other processors/pages or break based on policy

end;

finally

Page.Free;

end;

The investment in proper PDF structure understanding pays dividends in reduced support burden, improved user satisfaction, and easier maintenance over the application’s lifetime. PDF page ordering is not just a technical detail – it’s a fundamental aspect of document integrity that directly impacts user experience. Master this complexity, and you’ll build PDF applications that users can trust with their most important documents.

Next Article