Understanding PDF Page Order: Why Your PDF Pages Are Out of Order

The Hidden Complexity of PDF Structure

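PDF documents are far more complex than they appear to end users. Users see pages in a logical, sequential order (1, 2, 3...), but the internal structure of a PDF file tells a completely different story. This complexity is one of the most misunderstood aspects of PDF processing, and it leads to countless bugs, incorrect implementations, and frustrated developers. This article explains how PDF pages are organized, why developers so often run into unexpected page order problems, and how to build robust PDF manipulation code.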

The PDF Object Model: A Different Paradigm from Sequential Documents

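To understand PDF page order problems, you first need to understand how fundamentally PDF differs from simpler document formats. Unlike plain text files, HTML documents, or older formats such as RTF, PDF uses an object-based architecture in which the logical organization of content is completely separate from its physical storage.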

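This architectural decision was made for several important reasons:

Flexibility: Objects can be referenced from multiple places without duplication.
Efficiency: Common resources (fonts, images, graphics states) can be shared across pages.
Incremental updates: A document can be modified without rewriting the entire file.
Random access: A viewer can jump directly to any page without parsing the entire document.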

This flexibility comes at the cost of complexity, however, particularly when it comes to the relationship between object storage order and logical page order.
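
The sharing described above can be illustrated with a very small model. This is only a sketch: the dict-based object table, the ("ref", n) tuples standing in for "n 0 R" references, and the resolve helper are assumptions made for illustration, and Python is used only so the example runs on its own.

```python
# Hypothetical in-memory model: object number -> dictionary.
# ("ref", n) stands in for the PDF indirect reference "n 0 R".
objects = {
    6: {"Type": "Font", "BaseFont": "Helvetica"},
    4: {"Type": "Page", "Resources": {"Font": {"F1": ("ref", 6)}}},
    20: {"Type": "Page", "Resources": {"Font": {"F1": ("ref", 6)}}},
}


def resolve(value):
    """Follow an indirect reference to the object it names."""
    if isinstance(value, tuple) and value[0] == "ref":
        return objects[value[1]]
    return value


font_a = resolve(objects[4]["Resources"]["Font"]["F1"])
font_b = resolve(objects[20]["Resources"]["Font"]["F1"])

# Both pages point at the same stored font object; nothing is duplicated.
print(font_a is font_b)  # True
```

Because pages only hold references, where the font sits in the file has no bearing on which pages use it. The same is true of the pages themselves.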

Object References vs. Display Order: A Concrete Example

The following typical PDF structure shows the mismatch between storage and display:

%PDF-1.4
% PDF file structure example - storage order vs. display order

% Object 1 appears first in the file but represents page 2 in display
1 0 obj
<< /Type /Page
   /Contents 7 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

2 0 obj
<< /Type /Pages /Kids [20 0 R 1 0 R 4 0 R] /Count 3 >>
endobj

% Object 4 appears third in the file but represents page 3 in display
4 0 obj
<< /Type /Page
   /Contents 5 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

% The catalog can be stored anywhere; the trailer's /Root entry locates it
3 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj

% Object 20 appears last in the file but represents page 1 in display
20 0 obj
<< /Type /Page
   /Contents 21 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

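In this example the page objects are stored as objects 1, 4, and 20, but the display order is defined by the Kids array [20, 1, 4]. This produces the following mapping:

Page 1 (display order) = object 20 (storage order: last)
Page 2 (display order) = object 1 (storage order: first)
Page 3 (display order) = object 4 (storage order: third)

This mismatch is not an accident. It is a fundamental feature of PDF, and it is what makes sophisticated document manipulation and optimization possible.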
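
The mapping above can be reproduced with a few lines of Python. This is a minimal sketch, assuming a hypothetical dict that maps object numbers to dictionaries and plain integers in place of "n 0 R" references. It is not how any particular PDF library represents a document.

```python
# Hypothetical model of the example: object number -> dictionary.
objects = {
    1: {"Type": "Page"},
    2: {"Type": "Pages", "Kids": [20, 1, 4], "Count": 3},
    4: {"Type": "Page"},
    20: {"Type": "Page"},
}

# Storage-style order: every Page object, sorted by object number.
by_object_id = sorted(n for n, obj in objects.items() if obj["Type"] == "Page")

# Display order: whatever the Kids array of the Pages node says.
by_kids = objects[2]["Kids"]

for page_number, obj_id in enumerate(by_kids, start=1):
    print(f"page {page_number} -> object {obj_id}")

print(by_object_id)  # [1, 4, 20]
print(by_kids)       # [20, 1, 4]
```

The two lists contain the same objects in different orders, and only the second one matches what a viewer shows.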

Why PDF Generators Produce Non-Sequential Object Order

Understanding why PDF generators produce non-sequential object order helps developers appreciate the complexity involved and avoid mistakes that stem from incorrect assumptions about document structure.

PDF Generation Workflows

Different PDF generation workflows produce different object ordering patterns.

1. Sequential Document Creation

% Typical output from simple PDF generators
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 3 >> endobj
3 0 obj << /Type /Page /Contents 6 0 R /Parent 2 0 R >> endobj
4 0 obj << /Type /Page /Contents 7 0 R /Parent 2 0 R >> endobj
5 0 obj << /Type /Page /Contents 8 0 R /Parent 2 0 R >> endobj

2. Optimized Resource Sharing

% PDF with shared resources created first
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [10 0 R 11 0 R 12 0 R] /Count 3 >> endobj
3 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj
4 0 obj << /Type /XObject /Subtype /Image /Width 100 /Height 100 >> endobj
% ... more shared resources ...
10 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 3 0 R >> >> >> endobj
11 0 obj << /Type /Page /Parent 2 0 R /Resources << /XObject << /Im1 4 0 R >> >> >> endobj
12 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 3 0 R >> >> >> endobj

3. Incremental Document Assembly

% Document created by combining existing PDFs
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [100 0 R 25 0 R 75 0 R] /Count 3 >> endobj
% Objects from first source document
25 0 obj << /Type /Page /Contents 26 0 R /Parent 2 0 R >> endobj
% Objects from second source document
75 0 obj << /Type /Page /Contents 76 0 R /Parent 2 0 R >> endobj
% Objects from third source document
100 0 obj << /Type /Page /Contents 101 0 R /Parent 2 0 R >> endobj
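
To see how an assembly workflow ends up with a Kids array that does not follow object numbering, consider the following sketch. It assumes a hypothetical assemble function that numbers objects in the order sources are copied in and builds Kids in the requested display order. The "Source" key is bookkeeping for the example only and is not a PDF key.

```python
def assemble(source_page_counts, display_order):
    """Copy pages from several sources into one object table.

    source_page_counts: {source_name: number_of_pages}, copied in dict order.
    display_order: source names in the order their pages should be shown.
    Returns (objects, kids), where kids is the root Kids array.
    """
    objects = {1: {"Type": "Catalog", "Pages": 2}}
    next_id = 3
    ids_by_source = {}
    for name, count in source_page_counts.items():
        ids = list(range(next_id, next_id + count))
        for obj_id in ids:
            objects[obj_id] = {"Type": "Page", "Parent": 2, "Source": name}
        ids_by_source[name] = ids
        next_id += count
    # Kids follows the requested display order, not the copy order.
    kids = [obj_id for name in display_order for obj_id in ids_by_source[name]]
    objects[2] = {"Type": "Pages", "Kids": kids, "Count": len(kids)}
    return objects, kids


objects, kids = assemble(
    {"a.pdf": 2, "b.pdf": 1, "c.pdf": 1},
    ["c.pdf", "a.pdf", "b.pdf"],
)
print(kids)  # [6, 3, 4, 5]
```

The generator did nothing unusual here. It copied sources in one order and displayed them in another, which is exactly what a merge tool is asked to do.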

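Common Developer Mistakes and Their Consequences

The complexity of PDF structure leads to several common mistakes that can seriously affect application reliability and user experience.

Mistake 1: Assuming Object ID Order Equals Display Order

One of the most common mistakes made by developers new to PDF processing looks like this: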

// WRONG: Processing pages by object ID order
function GetPagesInWrongOrder(Doc: TPDFDocument): TPageList;
var
  i: Integer;
  Obj: TPDFObject;
begin
  Result := TPageList.Create;
  // This approach processes pages in storage order, not display order
  for i := 0 to Doc.Objects.Count - 1 do
  begin
    Obj := Doc.Objects[i];
    if (Obj <> nil) and (Obj.GetValue('/Type') = '/Page') then
      Result.Add(Obj); // Wrong order!
  end;
  // Result will be in object ID order: [1, 4, 20]
  // But display order should be: [20, 1, 4]
end;

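This mistake has the following consequences:

Pages appear in the wrong order in output documents
Page numbering becomes inconsistent
User confusion and support requests
Potential data corruption in document processing pipelines

Mistake 2: Hard-Coded Page Mappings Based on Observation

When developers run into page order problems, they sometimes implement hard-coded workarounds based on patterns they have observed: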

// WRONG: Hard-coded page reordering based on heuristics
function ApplyPageReorderingHeuristics(Pages: TPageArray): TPageArray;
var
  i: Integer;
begin
  SetLength(Result, Length(Pages));
  // Dangerous heuristic based on limited observations
  if Length(Pages) = 3 then
  begin
    // "Fix" for specific 3-page documents observed during testing
    Result[0] := Pages[1]; // Put second page first
    Result[1] := Pages[2]; // Put third page second
    Result[2] := Pages[0]; // Put first page last
  end
  else if Length(Pages) > 3 then
  begin
    // Generic "fix" that swaps first and last pages
    Result[0] := Pages[Length(Pages) - 1];
    Result[Length(Pages) - 1] := Pages[0];
    // Keep middle pages in original order
    for i := 1 to Length(Pages) - 2 do
      Result[i] := Pages[i];
  end
  else
  begin
    // For other cases, just copy as-is
    for i := 0 to High(Pages) do
      Result[i] := Pages[i];
  end;
end;

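This approach is fundamentally flawed:

It works only for the specific PDF files observed during development.
It fails badly on PDF files with a different structure.
It produces unpredictable behavior that users cannot make sense of.
It accumulates technical debt as more special cases are added.

Mistake 3: Ignoring the Hierarchical Page Tree

Many developers assume the PDF page tree is always a flat array, but the PDF specification allows a hierarchical structure: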

// WRONG: Assuming flat page tree structure
function GetPagesFromFlatTree(PagesObj: TPDFObject): TPageArray;
var
  KidsArray: TPDFArray;
  i: Integer;
begin
  Result := nil;
  KidsArray := PagesObj.GetArray('/Kids');
  if KidsArray = nil then Exit;
  SetLength(Result, KidsArray.Count);
  for i := 0 to KidsArray.Count - 1 do
  begin
    // This assumes all Kids entries are Page objects
    // But they might be intermediate Pages objects!
    Result[i] := KidsArray.GetIndirectObject(i);
  end;
end;

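The Correct Approach: Following the Page Tree Structure

The correct way to handle PDF page order is to implement a complete page tree traversal that follows the PDF specification exactly.

Understanding the Page Tree Hierarchy

The PDF page tree can be hierarchical: intermediate Pages objects can contain Kids arrays of their own.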

% Hierarchical page tree example
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj

% Root Pages object
2 0 obj
<< /Type /Pages
   /Kids [3 0 R 8 0 R 15 0 R]
   /Count 7 >>
endobj

% First intermediate Pages object (contains 3 pages)
3 0 obj
<< /Type /Pages
   /Kids [4 0 R 5 0 R 6 0 R]
   /Count 3
   /Parent 2 0 R >>
endobj

% Second intermediate Pages object (contains 2 pages)
8 0 obj
<< /Type /Pages
   /Kids [9 0 R 10 0 R]
   /Count 2
   /Parent 2 0 R >>
endobj

% Third intermediate Pages object (contains 2 pages)
15 0 obj
<< /Type /Pages
   /Kids [16 0 R 17 0 R]
   /Count 2
   /Parent 2 0 R >>
endobj

% Actual page objects
4 0 obj << /Type /Page /Contents 40 0 R /Parent 3 0 R >> endobj
5 0 obj << /Type /Page /Contents 41 0 R /Parent 3 0 R >> endobj
% ... and so on

Implementing Recursive Page Tree Traversal

// CORRECT: Recursive page tree traversal
procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList); forward;

function GetPagesInCorrectOrder(Doc: TPDFDocument): TPageArray;
var
  CatalogObj, RootPagesObj: TPDFObject;
  PageList: TList;
  i: Integer;
begin
  PageList := TList.Create;
  try
    // Step 1: Find the document catalog
    CatalogObj := Doc.FindObject('/Type', '/Catalog');
    if CatalogObj = nil then
      raise Exception.Create('Document catalog not found');

    // Step 2: Get the root Pages object
    RootPagesObj := CatalogObj.GetIndirectObject('/Pages');
    if RootPagesObj = nil then
      raise Exception.Create('Root Pages object not found');

    // Step 3: Recursively traverse the page tree
    TraversePagesTree(RootPagesObj, PageList);

    // Step 4: Convert list to array
    SetLength(Result, PageList.Count);
    for i := 0 to PageList.Count - 1 do
      Result[i] := TPDFObject(PageList[i]);
  finally
    PageList.Free;
  end;
end;

procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList);
var
  KidsArray: TPDFArray;
  i: Integer;
  ChildObj: TPDFObject;
  ChildType: string;
begin
  if PagesObj = nil then Exit;

  // Get the Kids array from this Pages object
  KidsArray := PagesObj.GetArray('/Kids');
  if KidsArray = nil then Exit;

  // Process each child in the Kids array, in array order
  for i := 0 to KidsArray.Count - 1 do
  begin
    ChildObj := KidsArray.GetIndirectObject(i);
    if ChildObj = nil then Continue;

    ChildType := ChildObj.GetValue('/Type');
    if ChildType = '/Page' then
    begin
      // This is a leaf page object - add it to our list
      PageList.Add(ChildObj);
    end
    else if ChildType = '/Pages' then
    begin
      // This is an intermediate Pages object - recurse into it
      TraversePagesTree(ChildObj, PageList);
    end
    else
    begin
      // Unexpected object type in Kids array
      raise Exception.CreateFmt('Unexpected object type in Kids array: %s', [ChildType]);
    end;
  end;
end;
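
The same depth-first, left-to-right walk can be tried out on the seven-page hierarchical example without any PDF library. The sketch below assumes a hypothetical dict-based object table with plain integers as references. It has no protection against circular references, so it is only suitable for well-formed trees.

```python
# Hypothetical model of the hierarchical example: object number -> dictionary.
objects = {
    2: {"Type": "Pages", "Kids": [3, 8, 15], "Count": 7},
    3: {"Type": "Pages", "Kids": [4, 5, 6], "Count": 3},
    8: {"Type": "Pages", "Kids": [9, 10], "Count": 2},
    15: {"Type": "Pages", "Kids": [16, 17], "Count": 2},
}
for leaf in (4, 5, 6, 9, 10, 16, 17):
    objects[leaf] = {"Type": "Page"}


def pages_in_display_order(node_id):
    """Depth-first, left-to-right walk of the page tree."""
    node = objects[node_id]
    if node["Type"] == "Page":
        return [node_id]
    if node["Type"] != "Pages":
        raise ValueError(f"unexpected type in Kids: {node['Type']}")
    ordered = []
    for kid in node["Kids"]:
        ordered.extend(pages_in_display_order(kid))
    return ordered


# Reading only the root Kids array yields intermediate nodes, not pages.
print(objects[2]["Kids"])         # [3, 8, 15]
print(pages_in_display_order(2))  # [4, 5, 6, 9, 10, 16, 17]
```

The flat read returns three entries for a seven-page document, which is the failure described under Mistake 3.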

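Real-World PDF Variations and Exception Handling

Real-world PDF files often deviate from the ideal structure described in the specification. A robust PDF processing library must handle these variations gracefully.

Common Structural Anomalies

1. Missing or Corrupted Catalog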

% PDF with missing catalog reference
%PDF-1.4
% Object 1 should be catalog but is missing or corrupted
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>
endobj

2. Circular References

% PDF with circular page tree references (corrupted)
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 /Parent 3 0 R >>
endobj
3 0 obj
<< /Type /Pages /Kids [2 0 R] /Count 1 /Parent 2 0 R >>
endobj

3. Inconsistent Count Values

% PDF with incorrect Count value
% Count says 5 but the Kids array leads to only 3 pages
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 5 >>
endobj
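
Two of these anomalies, circular references and a wrong Count, can be detected during traversal. The following is a sketch under assumptions: the dict-based object table is hypothetical, and the visited-set check also rejects a node that is reachable through two different parents, which is treated here as corruption as well.

```python
def collect_pages(objects, node_id, visited=None):
    """Return leaf page ids under node_id, raising ValueError on a cycle."""
    if visited is None:
        visited = set()
    if node_id in visited:
        raise ValueError(f"circular reference at object {node_id}")
    visited.add(node_id)
    node = objects[node_id]
    if node["Type"] == "Page":
        return [node_id]
    pages = []
    for kid in node.get("Kids", []):
        pages.extend(collect_pages(objects, kid, visited))
    return pages


cyclic = {
    2: {"Type": "Pages", "Kids": [3], "Count": 1},
    3: {"Type": "Pages", "Kids": [2], "Count": 1},
}
bad_count = {
    2: {"Type": "Pages", "Kids": [3, 4, 5], "Count": 5},
    3: {"Type": "Page"},
    4: {"Type": "Page"},
    5: {"Type": "Page"},
}

try:
    collect_pages(cyclic, 2)
except ValueError as exc:
    print(exc)  # circular reference at object 2

pages = collect_pages(bad_count, 2)
declared = bad_count[2]["Count"]
if len(pages) != declared:
    print(f"Count mismatch: declared {declared}, found {len(pages)}")
```

When Count and the traversal disagree, the traversal result is the safer one to trust, since it reflects the pages that can actually be reached.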

Implementing Robust Error Handling

// Robust page tree traversal with comprehensive error handling
// GetPagesViaStandardTraversal, GetPagesViaBruteForce, and LogMessage are
// assumed to be defined elsewhere in the application.
function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray; forward;

function GetPagesWithFallbacks(Doc: TPDFDocument): TPageArray;
var
  AttemptCount: Integer;
  ErrorMessages: TStringList;
begin
  Result := nil;
  ErrorMessages := TStringList.Create;
  try
    AttemptCount := 0;

    // Attempt 1: Standard PDF specification approach
    Inc(AttemptCount);
    try
      Result := GetPagesViaStandardTraversal(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with standard traversal (attempt %d)', [AttemptCount]));
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // Attempt 2: Search for Pages objects and try each one
    Inc(AttemptCount);
    try
      Result := GetPagesViaObjectSearch(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with object search (attempt %d)', [AttemptCount]));
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // Attempt 3: Brute force search for Page objects
    Inc(AttemptCount);
    try
      Result := GetPagesViaBruteForce(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with brute force search (attempt %d)', [AttemptCount]));
        LogMessage('Warning: Document structure is non-standard');
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // All attempts failed
    raise Exception.Create('Failed to extract pages from PDF. Errors: ' +
      ErrorMessages.Text);
  finally
    ErrorMessages.Free;
  end;
end;

function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray;
var
  i, j: Integer;
  Obj: TPDFObject;
  KidsArray: TPDFArray;
  PageList: TList;
  CandidateObjects: TList;
begin
  // Stays empty unless a valid Pages object is found below
  SetLength(Result, 0);
  CandidateObjects := TList.Create;
  PageList := TList.Create;
  try
    // Find all objects that could be Pages objects
    for i := 0 to Doc.Objects.Count - 1 do
    begin
      Obj := Doc.Objects[i];
      if (Obj <> nil) and
         (Obj.GetValue('/Type') = '/Pages') and
         Obj.HasKey('/Kids') then
        CandidateObjects.Add(Obj);
    end;

    // Try each candidate Pages object
    for i := 0 to CandidateObjects.Count - 1 do
    begin
      Obj := TPDFObject(CandidateObjects[i]);
      KidsArray := Obj.GetArray('/Kids');
      // Validate that this Kids array contains actual pages
      // (ValidateKidsArray is assumed to be defined elsewhere)
      if (KidsArray <> nil) and (KidsArray.Count > 0) and
         ValidateKidsArray(KidsArray) then
      begin
        PageList.Clear;
        TraversePagesTree(Obj, PageList);
        if PageList.Count > 0 then
        begin
          // Found valid pages - convert to array and return
          SetLength(Result, PageList.Count);
          for j := 0 to PageList.Count - 1 do
            Result[j] := TPDFObject(PageList[j]);
          Exit;
        end;
      end;
    end;
  finally
    CandidateObjects.Free;
    PageList.Free;
  end;
end;

Performance Optimization Strategies

When processing large PDF files or handling documents in bulk, performance becomes a critical consideration.

Lazy Loading and Caching

// Performance-optimized page access with caching
type
  TPDFPageCache = class
  private
    FPages: array of TPDFPage;
    FPageObjects: array of TPDFObject;
    FCachedCount: Integer; // number of pages currently held in FPages
    FCacheHits: Integer;
    FCacheMisses: Integer;
    FMaxCacheSize: Integer;
  public
    constructor Create(MaxCacheSize: Integer = 100);
    destructor Destroy; override;
    function GetPage(Index: Integer): TPDFPage;
    procedure ClearCache;
    procedure GetCacheStatistics(out Hits, Misses: Integer);
  end;

function TPDFPageCache.GetPage(Index: Integer): TPDFPage;
begin
  // Check if page is already cached
  if (Index >= 0) and (Index < Length(FPages)) and
     (FPages[Index] <> nil) then
  begin
    Inc(FCacheHits);
    Result := FPages[Index];
    Exit;
  end;

  Inc(FCacheMisses);

  // Load page from object if not cached
  if (Index >= 0) and (Index < Length(FPageObjects)) and
     (FPageObjects[Index] <> nil) then
  begin
    Result := TPDFPage.CreateFromObject(FPageObjects[Index]);
    // Cache the page if we have room. FPages is indexed by page number,
    // so the limit is checked against the count of cached pages.
    if FCachedCount < FMaxCacheSize then
    begin
      if Length(FPages) < Length(FPageObjects) then
        SetLength(FPages, Length(FPageObjects));
      FPages[Index] := Result;
      Inc(FCachedCount);
    end;
  end
  else
    Result := nil;
end;
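
The cache above stops caching once it is full. A common alternative is least-recently-used (LRU) eviction, which keeps the pages that were touched most recently. The following Python sketch shows that alternative, not the article's cache. The PageCache class and its loader callback are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Lazy page loader with LRU eviction and hit/miss counters."""

    def __init__(self, loader, max_size=100):
        self._loader = loader        # called as loader(index) on a miss
        self._max_size = max_size
        self._pages = OrderedDict()  # index -> page, oldest first
        self.hits = 0
        self.misses = 0

    def get_page(self, index):
        if index in self._pages:
            self.hits += 1
            self._pages.move_to_end(index)  # mark as most recently used
            return self._pages[index]
        self.misses += 1
        page = self._loader(index)
        self._pages[index] = page
        if len(self._pages) > self._max_size:
            self._pages.popitem(last=False)  # evict least recently used
        return page


cache = PageCache(loader=lambda i: f"page-{i}", max_size=2)
for i in (0, 1, 0, 2, 1):
    cache.get_page(i)
print(cache.hits, cache.misses)  # 1 4
```

With a capacity of two, requesting page 2 evicts page 1, so the final request for page 1 is a miss again. Whether LRU beats "cache until full" depends on the access pattern, so the counters are worth watching on real workloads.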

Streaming Processing for Large Documents

// Streaming approach for processing large PDF documents
// Min comes from the Math unit. ProcessSinglePage and ForceGarbageCollection
// are assumed application-defined helpers; Delphi has no garbage collector,
// so the latter would release cached pages or similar resources.
procedure ProcessLargePDFInChunks(const FileName: string; ChunkSize: Integer = 50);
var
  Doc: TPDFDocument;
  TotalPages: Integer;
  ChunkStart, ChunkEnd: Integer;
  i: Integer;
begin
  Doc := TPDFDocument.Create;
  try
    Doc.LoadFromFile(FileName);
    TotalPages := Doc.GetPageCount;
    LogMessage(Format('Processing %d pages in chunks of %d', [TotalPages, ChunkSize]));

    ChunkStart := 0;
    while ChunkStart < TotalPages do
    begin
      ChunkEnd := Min(ChunkStart + ChunkSize - 1, TotalPages - 1);
      LogMessage(Format('Processing chunk: pages %d-%d', [ChunkStart + 1, ChunkEnd + 1]));

      // Process this chunk of pages
      for i := ChunkStart to ChunkEnd do
        ProcessSinglePage(Doc, i);

      // Optional: release resources every fourth chunk
      if (ChunkStart mod (ChunkSize * 4)) = 0 then
        ForceGarbageCollection;

      ChunkStart := ChunkEnd + 1;
    end;
  finally
    Doc.Free;
  end;
end;
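
The chunk boundary arithmetic is the part most prone to off-by-one errors, so it helps to isolate it. Here is a minimal sketch, assuming a hypothetical chunk_ranges helper that yields zero-based, inclusive index ranges like the ChunkStart and ChunkEnd variables above.

```python
def chunk_ranges(total_pages, chunk_size=50):
    """Yield (start, end) zero-based inclusive page index ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_pages:
        end = min(start + chunk_size - 1, total_pages - 1)
        yield start, end
        start = end + 1


print(list(chunk_ranges(120, 50)))  # [(0, 49), (50, 99), (100, 119)]
```

The last chunk is shorter than the others, and an empty document produces no chunks at all.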

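Advanced PDF Structure Analysis

Developers with complex PDF processing requirements need to understand some of the more advanced structural elements.

Page Inheritance and Resource Management

PDF pages can inherit attributes from their parent Pages objects, which makes a hierarchical resource management scheme possible.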

% Example of page inheritance in PDF structure
2 0 obj
<< /Type /Pages
   /Kids [3 0 R 4 0 R]
   /Count 2
   /MediaBox [0 0 612 792]
   /Resources <<
     /Font << /F1 10 0 R >>
     /ProcSet [/PDF /Text]
   >> >>
endobj

% Child page inherits MediaBox and Resources from parent
3 0 obj
<< /Type /Page
   /Parent 2 0 R
   /Contents 5 0 R >>
% This page inherits MediaBox [0 0 612 792] and Resources from parent
endobj

% Child page overrides inherited MediaBox
4 0 obj
<< /Type /Page
   /Parent 2 0 R
   /Contents 6 0 R
   /MediaBox [0 0 792 612] >>
% This page overrides MediaBox but still inherits Resources
endobj

Handling Page Inheritance in Code

// Proper handling of page inheritance
// CheckForInheritableProperty is assumed to be defined elsewhere.
function GetEffectivePageProperties(PageObj: TPDFObject): TPDFPageProperties;
const
  MaxTreeDepth = 64; // arbitrary safety limit for corrupted parent chains
var
  CurrentObj: TPDFObject;
  MediaBox: TPDFArray;
  Resources: TPDFObject;
  Depth: Integer;
begin
  // Initialize result
  Result := TPDFPageProperties.Create;

  // Walk up the parent chain to collect inherited properties
  Depth := 0;
  CurrentObj := PageObj;
  while (CurrentObj <> nil) and (Depth < MaxTreeDepth) do
  begin
    // Check for MediaBox at this level
    if Result.MediaBox.IsEmpty then
    begin
      MediaBox := CurrentObj.GetArray('/MediaBox');
      if MediaBox <> nil then
        Result.MediaBox := MediaBox;
    end;

    // Check for Resources at this level
    if Result.Resources = nil then
    begin
      Resources := CurrentObj.GetDictionary('/Resources');
      if Resources <> nil then
        Result.Resources := Resources;
    end;

    // Check for other inheritable properties
    CheckForInheritableProperty(CurrentObj, '/Rotate', Result.Rotate);
    CheckForInheritableProperty(CurrentObj, '/CropBox', Result.CropBox);

    // Move to parent object
    CurrentObj := CurrentObj.GetIndirectObject('/Parent');
    Inc(Depth);

    // Prevent infinite loops in corrupted PDFs (the depth limit above
    // catches cycles that do not pass through the starting page)
    if CurrentObj = PageObj then
      Break;
  end;

  // Validate that we found required properties
  if Result.MediaBox.IsEmpty then
  begin
    FreeAndNil(Result); // avoid leaking the result when raising
    raise Exception.Create('No MediaBox found in page inheritance chain');
  end;
end;
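
The lookup rule itself is small: take the nearest value on the way up the Parent chain. The sketch below applies it to the inheritance example, assuming a hypothetical dict-based object table with integer references. The four inheritable keys are the ones the PDF specification lists for page objects.

```python
INHERITABLE = ("MediaBox", "CropBox", "Resources", "Rotate")


def effective_attribute(objects, page_id, key):
    """Walk the Parent chain from page_id; return the nearest value for key."""
    if key not in INHERITABLE:
        raise ValueError(f"{key} is not an inheritable page attribute")
    seen = set()
    current = page_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against circular Parent chains
        node = objects[current]
        if key in node:
            return node[key]
        current = node.get("Parent")
    return None


objects = {
    2: {"Type": "Pages", "Kids": [3, 4], "Count": 2,
        "MediaBox": [0, 0, 612, 792], "Resources": {"Font": {"F1": 10}}},
    3: {"Type": "Page", "Parent": 2},
    4: {"Type": "Page", "Parent": 2, "MediaBox": [0, 0, 792, 612]},
}

print(effective_attribute(objects, 3, "MediaBox"))  # [0, 0, 612, 792]
print(effective_attribute(objects, 4, "MediaBox"))  # [0, 0, 792, 612]
```

Page 4 overrides MediaBox but still resolves Resources from object 2, matching the comments in the PDF listing.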

Testing Strategies for PDF Page Order

Because so many document structures are possible, comprehensive testing is essential when handling PDF page order.

Building a Comprehensive Test Suite

# Comprehensive PDF test case generation script

# Test Case 1: Sequential pages (baseline)
echo "Creating sequential page test..."
pdftk A=template.pdf cat A A A output test-sequential.pdf

# Test Case 2: Non-sequential object IDs
echo "Creating non-sequential object ID test..."
pdftk A=page3.pdf B=page1.pdf C=page2.pdf cat A B C output test-nonsequential.pdf

# Test Case 3: Hierarchical page tree
echo "Creating hierarchical page tree test..."
# This requires a custom PDF generation tool
generate-hierarchical-pdf --depth 3 --pages-per-node 2 output test-hierarchical.pdf

# Test Case 4: Large document with mixed structures
echo "Creating large document test..."
pdftk A=large-doc.pdf cat A1-100 A50-149 A200-299 output test-large-mixed.pdf

# Test Case 5: Corrupted page tree
echo "Creating corrupted page tree test..."
# This requires a custom corruption tool
corrupt-pdf-structure --target pages-tree test-sequential.pdf test-corrupted.pdf

# Test Case 6: Minimal single-page document
echo "Creating minimal single-page test..."
pdftk A=template.pdf cat A1 output test-single-page.pdf

Automated Validation Framework

// Comprehensive PDF page ordering validation framework
type
  TPDFTestCase = record
    FileName: string;
    ExpectedPageCount: Integer;
    ExpectedPageOrder: array of Integer;
    Description: string;
  end;

function ValidateTestCase(const TestCase: TPDFTestCase): Boolean; forward;

function RunPDFPageOrderingTests: Boolean;
var
  TestCases: array of TPDFTestCase;
  i: Integer;
  PassCount, FailCount: Integer;
begin
  // Define test cases
  SetLength(TestCases, 6);

  TestCases[0].FileName := 'test-sequential.pdf';
  TestCases[0].ExpectedPageCount := 3;
  TestCases[0].ExpectedPageOrder := [0, 1, 2];
  TestCases[0].Description := 'Sequential page ordering';

  TestCases[1].FileName := 'test-nonsequential.pdf';
  TestCases[1].ExpectedPageCount := 3;
  TestCases[1].ExpectedPageOrder := [2, 0, 1]; // Based on how pdftk reorders
  TestCases[1].Description := 'Non-sequential object IDs';

  // ... define other test cases ...

  PassCount := 0;
  FailCount := 0;

  WriteLn('Running PDF page ordering tests...');
  WriteLn(StringOfChar('=', 50));

  for i := 0 to High(TestCases) do
  begin
    Write(Format('Test %d: %s... ', [i + 1, TestCases[i].Description]));
    if ValidateTestCase(TestCases[i]) then
    begin
      WriteLn('PASS');
      Inc(PassCount);
    end
    else
    begin
      WriteLn('FAIL');
      Inc(FailCount);
    end;
  end;

  WriteLn(StringOfChar('=', 50));
  WriteLn(Format('Results: %d passed, %d failed', [PassCount, FailCount]));
  Result := FailCount = 0;
end;

function ValidateTestCase(const TestCase: TPDFTestCase): Boolean;
var
  Doc: TPDFDocument;
  ActualPages: TPageArray;
  i: Integer;
begin
  Result := False;
  Doc := TPDFDocument.Create;
  try
    if not Doc.LoadFromFile(TestCase.FileName) then
    begin
      WriteLn(Format('Failed to load %s', [TestCase.FileName]));
      Exit;
    end;

    ActualPages := GetPagesInCorrectOrder(Doc);

    // Validate page count
    if Length(ActualPages) <> TestCase.ExpectedPageCount then
    begin
      WriteLn(Format('Page count mismatch: expected %d, got %d',
        [TestCase.ExpectedPageCount, Length(ActualPages)]));
      Exit;
    end;

    // Validate page order (simplified - in a real implementation,
    // you'd compare actual page content or identifiers)
    for i := 0 to High(ActualPages) do
    begin
      if not ValidatePageAtPosition(ActualPages[i], TestCase.ExpectedPageOrder[i]) then
      begin
        WriteLn(Format('Page order mismatch at position %d', [i]));
        Exit;
      end;
    end;

    Result := True;
  finally
    Doc.Free;
  end;
end;
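
The comparison at the heart of such a framework can be kept independent of any PDF library, which makes it easy to test on its own. The following is only a sketch: first_mismatch is a hypothetical helper, and the page identifiers in the sample cases are made-up object numbers rather than results from real files.

```python
def first_mismatch(expected, actual):
    """Return None if the orders match, else describe the first difference."""
    if len(expected) != len(actual):
        return f"page count mismatch: expected {len(expected)}, got {len(actual)}"
    for position, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return (f"page order mismatch at position {position}: "
                    f"expected {want}, got {got}")
    return None


# (description, expected display order, order produced by the code under test)
cases = [
    ("sequential", [3, 4, 5], [3, 4, 5]),
    ("sorted by object id", [20, 1, 4], [1, 4, 20]),
    ("missing page", [3, 4, 5], [3, 4]),
]

for name, expected, actual in cases:
    problem = first_mismatch(expected, actual)
    print(name, "PASS" if problem is None else f"FAIL ({problem})")
```

Reporting the first differing position, rather than just pass or fail, usually makes it obvious whether the code under test sorted by object ID or stopped at an intermediate node.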

Future-Proofing Your PDF Processing Code

As the PDF standard evolves and new use cases emerge, it is important to write code that can adapt to future requirements.

Designing for Extensibility

// Extensible PDF page processing architecture
type
  IPDFPageProcessor = interface
    ['{12345678-1234-1234-1234-123456789012}'] // placeholder GUID
    function ProcessPage(Page: TPDFPage; Context: TPDFProcessingContext): Boolean;
    function GetProcessorName: string;
    function GetSupportedPDFVersions: TStringArray;
  end;

  TPDFProcessingPipeline = class
  private
    FProcessors: TInterfaceList; // holds interface references safely
    FContext: TPDFProcessingContext;
  public
    constructor Create;
    destructor Destroy; override;
    procedure RegisterProcessor(Processor: IPDFPageProcessor);
    procedure UnregisterProcessor(Processor: IPDFPageProcessor);
    function ProcessDocument(Doc: TPDFDocument): Boolean;
  end;

function TPDFProcessingPipeline.ProcessDocument(Doc: TPDFDocument): Boolean;
var
  Pages: TPageArray;
  i, j: Integer;
  Page: TPDFPage;
  Processor: IPDFPageProcessor;
  Success: Boolean;
begin
  Result := True;

  // Get pages in correct order using our robust method
  Pages := GetPagesInCorrectOrder(Doc);

  // Process each page through all registered processors
  for i := 0 to High(Pages) do
  begin
    Page := TPDFPage.CreateFromObject(Pages[i]);
    try
      FContext.CurrentPageIndex := i;
      FContext.TotalPages := Length(Pages);

      for j := 0 to FProcessors.Count - 1 do
      begin
        Processor := FProcessors[j] as IPDFPageProcessor;
        Success := Processor.ProcessPage(Page, FContext);
        if not Success then
        begin
          LogError(Format('Processor %s failed on page %d',
            [Processor.GetProcessorName, i + 1]));
          Result := False;
          // Continue with other processors/pages or break based on policy
        end;
      end;
    finally
      Page.Free;
    end;
  end;
end;
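
The shape of this pipeline, with pages in display order on the outside and registered processors on the inside, can be sketched compactly. The names below (PageProcessor, Pipeline, RequireMediaBox) are hypothetical and chosen for illustration. Pages are plain dicts rather than real page objects.

```python
from typing import Protocol


class PageProcessor(Protocol):
    """Anything with a name and a process_page method can be registered."""

    name: str

    def process_page(self, page: dict, index: int, total: int) -> bool: ...


class Pipeline:
    def __init__(self):
        self._processors = []

    def register(self, processor: PageProcessor) -> None:
        self._processors.append(processor)

    def process_document(self, pages):
        """Run every processor on every page; return (name, page number) failures."""
        failures = []
        for index, page in enumerate(pages):
            for processor in self._processors:
                if not processor.process_page(page, index, len(pages)):
                    failures.append((processor.name, index + 1))
        return failures


class RequireMediaBox:
    name = "require-mediabox"

    def process_page(self, page, index, total):
        return "MediaBox" in page


pipeline = Pipeline()
pipeline.register(RequireMediaBox())
print(pipeline.process_document([{"MediaBox": [0, 0, 612, 792]}, {}]))
# [('require-mediabox', 2)]
```

Returning the list of failures, instead of a single Boolean, leaves the continue-or-abort policy to the caller.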

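Investing in a proper understanding of PDF structure pays off in a reduced support burden, higher user satisfaction, and easier maintenance over the life of the application. PDF page order is not just a technical detail. It is a fundamental aspect of document integrity, and it directly affects the user experience. Mastering this complexity lets you build PDF applications that users can trust with their most important documents.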
