PDF ページ順序を理解する: ページが見つからない理由

PDF構造の背後にある隠された複雑さ

PDF ドキュメントは、エンドユーザーが見ている以上に高度です。閲覧者はページを論理的かつ順序立てて(1、2、3…)表示しますが、PDF ファイルの内部構造は、全く異なる物語を語っています。この複雑さは、PDF処理において最も誤解されやすい側面であり、数えきれないほどのバグ、不正確な実装、そして落胆した開発者を引き起こしています。この記事では、PDF ページの構成に関する複雑な世界を探求し、開発者が頻繁に遭遇する予期しないページ順序の問題の原因を説明し、堅牢なPDF操作のための実用的な解決策を提供します。

PDFオブジェクトモデル:シーケンシャルなドキュメントからのパラダイムシフト

PDFのページ順序に関する課題を理解するためには、まず、PDFが単純なドキュメント形式とどれほど根本的に異なるかを理解する必要があります。プレーンテキストファイル、HTMLドキュメント、またはRTFのような古い形式とは異なり、PDFは、コンテンツの構成と物理的な保存が完全に分離された、高度なオブジェクトベースのアーキテクチャを採用しています。

このアーキテクチャ上の決定は、いくつかの重要な理由からなされました。

柔軟性: オブジェクトは、重複することなく、複数の場所から参照できます。
効率性: 共通のリソース(フォント、画像、グラフィックスの状態)は、ページ間で共有できます。
增量更新: ドキュメントは、ファイル全体を書き換えることなく変更できます。
乱数アクセス: ユーザーは、ドキュメント全体を解析することなく、任意のページにジャンプできます。

しかし、この柔軟性には、特にオブジェクトの格納順序と論理的なページシーケンスの関係を理解する際の複雑さというコストが伴います。

オブジェクト参照と表示順序:具体的な例

以下の典型的なPDF構造は、格納と表示の間の乖離を示しています。

% PDF file structure example - storage order vs. display order

%PDF-1.4

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

2 0 obj

<< /Type /Pages /Kids [20 0 R 1 0 R 4 0 R] /Count 3 >>

endobj

% Object 4 appears third in file but represents page 3 in display

4 0 obj

<< /Type /Page

/Contents 5 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

% Object 20 appears last in file but represents page 1 in display

20 0 obj

<< /Type /Page

/Contents 21 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

この例では、ページオブジェクトはオブジェクト4と20として格納されていますが、表示順序はKids配列によって定義されています:[20, 1, 4]。これにより、以下のマッピングが作成されます。

ページ1 (表示順序)= オブジェクト20 (格納順序:最後)
ページ2 (表示順序)= オブジェクト1 (格納順序:最初)
3ページ(表示順序)= オブジェクト4 (保存順序:3番目)

この不整合は偶然ではありません。これはPDFの基本的な機能であり、高度なドキュメントの操作と最適化を可能にします。

PDFジェネレーターが非連続的なオブジェクト順序を作成する理由

PDFジェネレーターが非連続的なオブジェクト順序を作成する理由を理解することで、開発者は取り組んでいる複雑さを理解し、ドキュメント構造に関する誤った仮定を避けることができます。

PDF作成ワークフロー

さまざまなPDF作成ワークフローにより、異なるオブジェクト順序パターンが生まれます。

1. 順次ドキュメント作成

% Typical output from simple PDF generators

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 3 >> endobj

3 0 obj << /Type /Page /Contents 6 0 R /Parent 2 0 R >> endobj

4 0 obj << /Type /Page /Contents 7 0 R /Parent 2 0 R >> endobj

5 0 obj << /Type /Page /Contents 8 0 R /Parent 2 0 R >> endobj

2. 最適化されたリソース共有

% PDF with shared resources created first

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [10 0 R 11 0 R 12 0 R] /Count 3 >> endobj

3 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj

4 0 obj << /Type /XObject /Subtype /Image /Width 100 /Height 100 >> endobj

% ... more shared resources ...

10 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

11 0 obj << /Type /Page /Resources << /XObject << /Im1 4 0 R >> >> >> endobj

12 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

3. 增量ドキュメント组装

% Document created by combining existing PDFs

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [100 0 R 25 0 R 75 0 R] /Count 3 >> endobj

% Objects from first source document

25 0 obj << /Type /Page /Contents 26 0 R /Parent 2 0 R >> endobj

% Objects from second source document

75 0 obj << /Type /Page /Contents 76 0 R /Parent 2 0 R >> endobj

% Objects from third source document

100 0 obj << /Type /Page /Contents 101 0 R /Parent 2 0 R >> endobj

常见的开发エラー及其后果

PDF 構造的複雑性会引き起こす一些常见的エラー、这些エラー可能对アプリケーション的可靠性とユーザー体验产生严重后果。

エラー 1:假设オブジェクト ID 順序等于表示順序

这可能是 PDF 処理新手開発者最常犯的エラー之一:

// WRONG: Processing pages by object ID order

function GetPagesInWrongOrder(Doc: TPDFDocument): TPageList;

var

i: Integer;

Obj: TPDFObject;

begin

Result := TPageList.Create;

// This approach processes pages in storage order, not display order

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and (Obj.GetValue('/Type') = '/Page') then

begin

Result.Add(Obj); // Wrong order!

end;

// Result will be in object ID order: [1, 4, 20]

// But display order should be: [20, 1, 4]

end;

このようなエラー的后果含む:

出力ドキュメント内のページ順序不正しい
ページ番号不一致
ユーザーの混乱とサポートリクエスト
ドキュメント処理パイプラインにおける潜在的なデータ破損

誤り2:観察に基づいたハードコードされたページマッピング

開発者がページ順序の問題に遭遇した場合、観察されたパターンに基づいてハードコードされた修正を実装することがあります。

// WRONG: Hard-coded page reordering based on heuristics

function ApplyPageReorderingHeuristics(Pages: TPageArray): TPageArray;

var

i: Integer;

begin

SetLength(Result, Length(Pages));

// Dangerous heuristic based on limited observations

if Length(Pages) = 3 then

begin

// "Fix" for specific 3-page documents observed during testing

Result[0] := Pages[1]; // Put second page first

Result[1] := Pages[2]; // Put third page second

Result[2] := Pages[0]; // Put first page last

end

else if Length(Pages) > 3 then

begin

// Generic "fix" that swaps first and last pages

Result[0] := Pages[Length(Pages) - 1];

Result[Length(Pages) - 1] := Pages[0];

// Keep middle pages in original order

for i := 1 to Length(Pages) - 2 do

Result[i] := Pages[i];

end

else

begin

// For other cases, just copy as-is

for i := 0 to High(Pages) do

Result[i] := Pages[i];

end;

このアプローチは根本的に欠陥があります。

これは、開発中に観察された特定のPDF ファイルでのみ機能します。
構造が異なるPDF ファイルでは、完全に機能しません。
ユーザーが理解できない予測不可能な動作を引き起こします。
それは、より多くの特殊なケースが追加されるにつれて、技術的な負債を蓄積します。

間違い3:階層的なページツリーを無視すること。

多くの開発者は、PDFのページツリーは常にフラットな配列であると仮定していますが、PDFの仕様では階層構造が許可されています。

// WRONG: Assuming flat page tree structure

function GetPagesFromFlatTree(PagesObj: TPDFObject): TPageArray;

var

KidsArray: TPDFArray;

i: Integer;

begin

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

SetLength(Result, KidsArray.Count);

for i := 0 to KidsArray.Count - 1 do

begin

// This assumes all Kids entries are Page objects

// But they might be intermediate Pages objects!

Result[i] := KidsArray.GetIndirectObject(i);

end;

正しいアプローチ:ページのツリー構造に従うこと。

PDFのページ順序を適切に処理するには、PDF仕様に完全に準拠した、完全なページのツリーのトラバーサルを実装する必要があります。

ページのツリーの階層構造を理解する。

PDFのページツリーは階層構造を持つことができ、中間ページのオブジェクトには独自のKids配列が含まれている場合があります。

% Hierarchical page tree example

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

% Root Pages object

2 0 obj

<< /Type /Pages

/Kids [3 0 R 8 0 R 15 0 R]

/Count 7 >>

endobj

% First intermediate Pages object (contains 3 pages)

3 0 obj

<< /Type /Pages

/Kids [4 0 R 5 0 R 6 0 R]

/Count 3

/Parent 2 0 R >>

endobj

% Second intermediate Pages object (contains 2 pages)

8 0 obj

<< /Type /Pages

/Kids [9 0 R 10 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Third intermediate Pages object (contains 2 pages)

15 0 obj

<< /Type /Pages

/Kids [16 0 R 17 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Actual page objects

4 0 obj << /Type /Page /Contents 40 0 R /Parent 3 0 R >> endobj

5 0 obj << /Type /Page /Contents 41 0 R /Parent 3 0 R >> endobj

% ... and so on

再帰的なページツリーのトラバーサルを実装する。

// CORRECT: Recursive page tree traversal

function GetPagesInCorrectOrder(Doc: TPDFDocument): TPageArray;

var

CatalogObj, RootPagesObj: TPDFObject;

PageList: TList;

begin

PageList := TList.Create;

try

// Step 1: Find the document catalog

CatalogObj := Doc.FindObject('/Type', '/Catalog');

if CatalogObj = nil then

raise Exception.Create('Document catalog not found');

// Step 2: Get the root Pages object

RootPagesObj := CatalogObj.GetIndirectObject('/Pages');

if RootPagesObj = nil then

raise Exception.Create('Root Pages object not found');

// Step 3: Recursively traverse the page tree

TraversePagesTree(RootPagesObj, PageList);

// Step 4: Convert list to array

SetLength(Result, PageList.Count);

for i := 0 to PageList.Count - 1 do

Result[i] := TPDFObject(PageList[i]);

finally

PageList.Free;

end;

procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList);

var

KidsArray: TPDFArray;

i: Integer;

ChildObj: TPDFObject;

ChildType: string;

begin

if PagesObj = nil then Exit;

// Get the Kids array from this Pages object

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

// Process each child in the Kids array

for i := 0 to KidsArray.Count - 1 do

begin

ChildObj := KidsArray.GetIndirectObject(i);

if ChildObj = nil then Continue;

ChildType := ChildObj.GetValue('/Type');

if ChildType = '/Page' then

begin

// This is a leaf page object - add it to our list

PageList.Add(ChildObj);

end

else if ChildType = '/Pages' then

begin

// This is an intermediate Pages object - recurse into it

TraversePagesTree(ChildObj, PageList);

end

else

begin

// Unexpected object type in Kids array

raise Exception.CreateFmt('Unexpected object type in Kids array: %s', [ChildType]);

end;

実際のPDFのバリエーションと特殊ケースの処理

実際のPDF ファイルは、仕様で記述されている理想的な構造から逸脱することがよくあります。堅牢なPDF処理ライブラリは、これらのバリエーションを適切に処理する必要があります。

構造上の一般的な異常

1. 欠損または破損したカタログ

% PDF with missing catalog reference

%PDF-1.4

% Object 1 should be catalog but is missing or corrupted

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>

endobj

2. 循環参照

% PDF with circular page tree references (corrupted)

2 0 obj

<< /Type /Pages /Kids [3 0 R] /Count 1 /Parent 3 0 R >>

endobj

3 0 obj

<< /Type /Pages /Kids [2 0 R] /Count 1 /Parent 2 0 R >>

endobj

3. 一貫性のないカウント値

% PDF with incorrect Count value

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 5 >>

% Count says 5 but Kids array has only 3 elements

endobj

堅牢なエラー処理の実装

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

// Robust page tree traversal with comprehensive error handling

function GetPagesWithFallbacks(Doc: TPDFDocument): TPageArray;

var

AttemptCount: Integer;

ErrorMessages: TStringList;

begin

ErrorMessages := TStringList.Create;

try

AttemptCount := 0;

// Attempt 1: Standard PDF specification approach

Inc(AttemptCount);

try

Result := GetPagesViaStandardTraversal(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with standard traversal (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 2: Search for Pages objects and try each one

Inc(AttemptCount);

try

Result := GetPagesViaObjectSearch(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with object search (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 3: Brute force search for Page objects

Inc(AttemptCount);

try

Result := GetPagesViaBruteForce(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with brute force search (attempt %d)', [AttemptCount]));

LogMessage('Warning: Document structure is non-standard');

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// All attempts failed

raise Exception.Create('Failed to extract pages from PDF. Errors: ' +

ErrorMessages.Text);

finally

ErrorMessages.Free;

end;

function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray;

var

i: Integer;

Obj: TPDFObject;

KidsArray: TPDFArray;

PageList: TList;

CandidateObjects: TList;

begin

CandidateObjects := TList.Create;

PageList := TList.Create;

try

// Find all objects that could be Pages objects

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and

(Obj.GetValue('/Type') = '/Pages') and

Obj.HasKey('/Kids') then

begin

CandidateObjects.Add(Obj);

end;

// Try each candidate Pages object

for i := 0 to CandidateObjects.Count - 1 do

begin

Obj := TPDFObject(CandidateObjects[i]);

KidsArray := Obj.GetArray('/Kids');

if (KidsArray <> nil) and (KidsArray.Count > 0) then

begin

// Validate that this Kids array contains actual pages

if ValidateKidsArray(KidsArray) then

begin

PageList.Clear;

TraversePagesTree(Obj, PageList);

if PageList.Count > 0 then

begin

// Found valid pages - convert to array and return

SetLength(Result, PageList.Count);

for j := 0 to PageList.Count - 1 do

Result[j] := TPDFObject(PageList[j]);

Exit;

end;

// No valid Pages object found

SetLength(Result, 0);

finally

CandidateObjects.Free;

PageList.Free;

end;

性能最適化策略

大量のPDF ファイルを処理する場合や、大量のドキュメント処理を行う場合、パフォーマンスが重要な考慮事項になります。

遅延読み込みとキャッシュ

// Performance-optimized page access with caching

type

TPDFPageCache = class

private

FPages: array of TPDFPage;

FPageObjects: array of TPDFObject;

FCacheHits: Integer;

FCacheMisses: Integer;

FMaxCacheSize: Integer;

public

constructor Create(MaxCacheSize: Integer = 100);

destructor Destroy; override;

function GetPage(Index: Integer): TPDFPage;

procedure ClearCache;

procedure GetCacheStatistics(out Hits, Misses: Integer);

end;

function TPDFPageCache.GetPage(Index: Integer): TPDFPage;

begin

// Check if page is already cached

if (Index >= 0) and (Index < Length(FPages)) and

(FPages[Index] <> nil) then

begin

Inc(FCacheHits);

Result := FPages[Index];

Exit;

end;

Inc(FCacheMisses);

// Load page from object if not cached

if (Index >= 0) and (Index < Length(FPageObjects)) and

(FPageObjects[Index] <> nil) then

begin

Result := TPDFPage.CreateFromObject(FPageObjects[Index]);

// Cache the page if we have room

if Length(FPages) < FMaxCacheSize then begin if Index >= Length(FPages) then

SetLength(FPages, Index + 1);

FPages[Index] := Result;

end;

end

else

begin

Result := nil;

end;

大規模ドキュメント向けのストリーミング処理

// Streaming approach for processing large PDF documents

procedure ProcessLargePDFInChunks(const FileName: string; ChunkSize: Integer = 50);

var

Doc: TPDFDocument;

TotalPages: Integer;

ChunkStart, ChunkEnd: Integer;

i: Integer;

begin

Doc := TPDFDocument.Create;

try

Doc.LoadFromFile(FileName);

TotalPages := Doc.GetPageCount;

LogMessage(Format('Processing %d pages in chunks of %d', [TotalPages, ChunkSize]));

ChunkStart := 0;

while ChunkStart < TotalPages do

begin

ChunkEnd := Min(ChunkStart + ChunkSize - 1, TotalPages - 1);

LogMessage(Format('Processing chunk: pages %d-%d', [ChunkStart + 1, ChunkEnd + 1]));

// Process this chunk of pages

for i := ChunkStart to ChunkEnd do

begin

ProcessSinglePage(Doc, i);

end;

// Optional: Force garbage collection between chunks

if (ChunkStart mod (ChunkSize * 4)) = 0 then

begin

ForceGarbageCollection;

end;

ChunkStart := ChunkEnd + 1;

end;

finally

Doc.Free;

end;

高度なPDF構造解析

複雑なPDF処理要件に取り組む開発者にとって、高度な構造要素の理解は非常に重要です。

ページ継承とリソース管理

PDFのページは、親のPagesオブジェクトからプロパティを継承し、階層的なリソース管理システムを構築できます。

% Example of page inheritance in PDF structure

2 0 obj

<< /Type /Pages

/Kids [3 0 R 4 0 R]

/Count 2

/MediaBox [0 0 612 792]

/Resources <<

/Font << /F1 10 0 R >>

/ProcSet [/PDF /Text]

>> >>

endobj

% Child page inherits MediaBox and Resources from parent

3 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 5 0 R >>

% This page inherits MediaBox [0 0 612 792] and Resources from parent

endobj

% Child page overrides inherited MediaBox

4 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 6 0 R

/MediaBox [0 0 792 612] >>

% This page overrides MediaBox but still inherits Resources

endobj

コードでのページ継承の処理

// Proper handling of page inheritance

function GetEffectivePageProperties(PageObj: TPDFObject): TPDFPageProperties;

var

CurrentObj: TPDFObject;

MediaBox: TPDFArray;

Resources: TPDFObject;

begin

// Initialize result

Result := TPDFPageProperties.Create;

// Walk up the parent chain to collect inherited properties

CurrentObj := PageObj;

while CurrentObj <> nil do

begin

// Check for MediaBox at this level

if Result.MediaBox.IsEmpty then

begin

MediaBox := CurrentObj.GetArray('/MediaBox');

if MediaBox <> nil then

Result.MediaBox := MediaBox;

end;

// Check for Resources at this level

if Result.Resources = nil then

begin

Resources := CurrentObj.GetDictionary('/Resources');

if Resources <> nil then

Result.Resources := Resources;

end;

// Check for other inheritable properties

CheckForInheritableProperty(CurrentObj, '/Rotate', Result.Rotate);

CheckForInheritableProperty(CurrentObj, '/CropBox', Result.CropBox);

// Move to parent object

CurrentObj := CurrentObj.GetIndirectObject('/Parent');

// Prevent infinite loops in corrupted PDFs

if CurrentObj = PageObj then

break;

end;

// Validate that we found required properties

if Result.MediaBox.IsEmpty then

raise Exception.Create('No MediaBox found in page inheritance chain');

end;

PDF ページ順序のテスト戦略

PDF ページの順序を扱う際には、さまざまなドキュメント構造が存在するため、包括的なテストが不可欠です。

包括的なテストスイートの作成

# Comprehensive PDF test case generation script

# Test Case 1: Sequential pages (baseline)

echo "Creating sequential page test..."

pdftk A=template.pdf cat A A A output test-sequential.pdf

# Test Case 2: Non-sequential object IDs

echo "Creating non-sequential object ID test..."

pdftk A=page3.pdf B=page1.pdf C=page2.pdf cat A B C output test-nonsequential.pdf

# Test Case 3: Hierarchical page tree

echo "Creating hierarchical page tree test..."

# This requires custom PDF generation tool

generate-hierarchical-pdf --depth 3 --pages-per-node 2 output test-hierarchical.pdf

# Test Case 4: Large document with mixed structures

echo "Creating large document test..."

pdftk A=large-doc.pdf cat 1-100 50-149 200-299 output test-large-mixed.pdf

# Test Case 5: Corrupted page tree

echo "Creating corrupted page tree test..."

# This requires custom corruption tool

corrupt-pdf-structure --target pages-tree test-sequential.pdf test-corrupted.pdf

# Test Case 6: Minimal single-page document

echo "Creating minimal single-page test..."

pdftk A=template.pdf cat 1 output test-single-page.pdf

自動検証フレームワーク

100

// Comprehensive PDF page ordering validation framework

type

TPDFTestCase = record

FileName: string;

ExpectedPageCount: Integer;

ExpectedPageOrder: array of Integer;

Description: string;

end;

function RunPDFPageOrderingTests: Boolean;

var

TestCases: array of TPDFTestCase;

i: Integer;

PassCount, FailCount: Integer;

begin

// Define test cases

SetLength(TestCases, 6);

TestCases[0].FileName := 'test-sequential.pdf';

TestCases[0].ExpectedPageCount := 3;

TestCases[0].ExpectedPageOrder := [0, 1, 2];

TestCases[0].Description := 'Sequential page ordering';

TestCases[1].FileName := 'test-nonsequential.pdf';

TestCases[1].ExpectedPageCount := 3;

TestCases[1].ExpectedPageOrder := [2, 0, 1]; // Based on how pdftk reorders

TestCases[1].Description := 'Non-sequential object IDs';

// ... define other test cases ...

PassCount := 0;

FailCount := 0;

WriteLn('Running PDF page ordering tests...');

WriteLn('=' * 50);

for i := 0 to High(TestCases) do

begin

Write(Format('Test %d: %s... ', [i + 1, TestCases[i].Description]));

if ValidateTestCase(TestCases[i]) then

begin

WriteLn('PASS');

Inc(PassCount);

end

else

begin

WriteLn('FAIL');

Inc(FailCount);

end;

WriteLn('=' * 50);

WriteLn(Format('Results: %d passed, %d failed', [PassCount, FailCount]));

Result := FailCount = 0;

end;

function ValidateTestCase(const TestCase: TPDFTestCase): Boolean;

var

Doc: TPDFDocument;

ActualPages: TPageArray;

i: Integer;

begin

Result := False;

Doc := TPDFDocument.Create;

try

if not Doc.LoadFromFile(TestCase.FileName) then

begin

WriteLn(Format('Failed to load %s', [TestCase.FileName]));

Exit;

end;

ActualPages := GetPagesInCorrectOrder(Doc);

// Validate page count

if Length(ActualPages) <> TestCase.ExpectedPageCount then

begin

WriteLn(Format('Page count mismatch: expected %d, got %d',

[TestCase.ExpectedPageCount, Length(ActualPages)]));

Exit;

end;

// Validate page order (simplified - in real implementation,

// you'd compare actual page content or identifiers)

for i := 0 to High(ActualPages) do

begin

if not ValidatePageAtPosition(ActualPages[i], TestCase.ExpectedPageOrder[i]) then

begin

WriteLn(Format('Page order mismatch at position %d', [i]));

Exit;

end;

Result := True;

finally

Doc.Free;

end;

PDF処理コードの来性

PDFの標準が進化し、新しいユースケースが登場するにつれて、来の要件に適応できるコードを作成することが重要です。

拡張性を考慮した設計

// Extensible PDF page processing architecture

type

IPDFPageProcessor = interface

['{12345678-1234-1234-1234-123456789012}']

function ProcessPage(Page: TPDFPage; Context: TPDFProcessingContext): Boolean;

function GetProcessorName: string;

function GetSupportedPDFVersions: TStringArray;

end;

TPDFProcessingPipeline = class

private

FProcessors: TList;

FContext: TPDFProcessingContext;

public

constructor Create;

destructor Destroy; override;

procedure RegisterProcessor(Processor: IPDFPageProcessor);

procedure UnregisterProcessor(Processor: IPDFPageProcessor);

function ProcessDocument(Doc: TPDFDocument): Boolean;

end;

function TPDFProcessingPipeline.ProcessDocument(Doc: TPDFDocument): Boolean;

var

Pages: TPageArray;

i, j: Integer;

Page: TPDFPage;

Processor: IPDFPageProcessor;

Success: Boolean;

begin

Result := True;

// Get pages in correct order using our robust method

Pages := GetPagesInCorrectOrder(Doc);

// Process each page through all registered processors

for i := 0 to High(Pages) do

begin

Page := TPDFPage.CreateFromObject(Pages[i]);

try

FContext.CurrentPageIndex := i;

FContext.TotalPages := Length(Pages);

for j := 0 to FProcessors.Count - 1 do

begin

Processor := FProcessors[j];

Success := Processor.ProcessPage(Page, FContext);

if not Success then

begin

LogError(Format('Processor %s failed on page %d',

[Processor.GetProcessorName, i + 1]));

Result := False;

// Continue with other processors/pages or break based on policy

end;

finally

Page.Free;

end;

正しいPDF構造の理解への投資は、サポートの負担を軽減し、ユーザー満足度を向上させ、アプリケーションの寿命全体にわたってメンテナンスを容易にします。PDF ページの順序は、単なる技術的な詳細ではなく、ドキュメントの完全性の基本的な側面であり、ユーザーエクスペリエンスに直接影響します。この複雑さをマスターすれば、ユーザーが最も重要なドキュメントを安心して利用できるPDFアプリケーションを構築できます。

次の記事