فهم ترتيب صفحات PDF - لماذا لا توجد صفحات PDF الخاصة بك

التعقيد الخفي وراء هيكل ملفات PDF.

ملفات PDF أكثر تعقيدًا مما يبدو للمستخدمين النهائيين. بينما يرى المستخدمون الصفحات بترتيب منطقي وتسلسلي (1، 2، 3...)، فإن البنية الداخلية لملف PDF تحكي قصة مختلفة تمامًا. هذا التعقيد هو أحد الجوانب الأكثر سوء فهم في معالجة ملفات PDF، مما يؤدي إلى العديد من الأخطاء والتنفيذات غير الصحيحة وإحباط المطورين. تستكشف هذه المقالة الشاملة عالم تنظيم صفحات PDF المعقد، وتشرح سبب مواجهة المطورين بشكل متكرر لمشاكل ترتيب الصفحات غير المتوقعة، وتقدم حلولًا عملية لمعالجة ملفات PDF بشكل قوي.

نموذج كائن PDF: تحول نموذجي من المستندات التسلسلية.

لفهم تحديات ترتيب صفحات PDF، يجب أولاً أن ندرك مدى اختلاف PDF بشكل أساسي عن تنسيقات المستندات الأبسط. على عكس ملفات النصوص العادية أو مستندات HTML أو حتى التنسيقات القديمة مثل RTF، تستخدم PDF بنية كائن متطورة حيث يتم فصل تنظيم المحتوى والتخزين الفعلي تمامًا.

تم اتخاذ هذا القرار المعماري لعدة أسباب مهمة:

المرونة: يمكن الإشارة إلى الكائنات من مواقع متعددة دون تكرار.
الكفاءة: يمكن مشاركة الموارد الشائعة (الخطوط والصور وحالات الرسومات) عبر الصفحات.
التحديثات التدريجية: يمكن تعديل المستندات دون إعادة كتابة الملف بأكمله.
الوصول العشوائي: يمكن للمشاهدين الانتقال إلى أي صفحة دون الحاجة إلى تحليل المستند بأكمله.

ومع ذلك، فإن هذه المرونة تأتي على حساب التعقيد، خاصةً عندما يتعلق الأمر بفهم العلاقة بين ترتيب تخزين الكائنات وتسلسل الصفحات المنطقي.

المراجع إلى الكائنات مقابل ترتيب العرض: مثال ملموس.

ضع في اعتبارك هذا الهيكل النموذجي لملفات PDF الذي يوضح الانفصال بين التخزين والعرض:

% PDF file structure example - storage order vs. display order

%PDF-1.4

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

2 0 obj

<< /Type /Pages /Kids [20 0 R 1 0 R 4 0 R] /Count 3 >>

endobj

% Object 4 appears third in file but represents page 3 in display

4 0 obj

<< /Type /Page

/Contents 5 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

% Object 20 appears last in file but represents page 1 in display

20 0 obj

<< /Type /Page

/Contents 21 0 R

/Parent 2 0 R

/MediaBox [0 0 612 792]

/Resources << /Font << /F1 6 0 R >> >> >>

endobj

في هذا المثال، يتم تخزين كائنات الصفحات ككائنات 4 و 20، ولكن ترتيب العرض يتم تعريفه بواسطة المصفوفة "Kids": [20, 1, 4]. هذا يخلق الت mapping التالي:

الصفحة 1 (ترتيب العرض) = الكائن 20 (ترتيب التخزين: الأخير).
الصفحة 2 (ترتيب العرض) = الكائن 1 (ترتيب التخزين: الأول).
الصفحة 3 (ترتيب العرض) = الكائن 4 (ترتيب التخزين: الثالث).

هذا الانفصال ليس عشوائياً، بل هو ميزة أساسية في PDF تتيح معالجة وتحسين مستندات متقدمة.

لماذا تقوم مولدات PDF بإنشاء ترتيبات كائنات غير متسلسلة.

فهم سبب قيام مولدات PDF بإنشاء ترتيبات كائنات غير متسلسلة يساعد المطورين على تقدير التعقيد الذي يتعاملون معه وتجنب إجراء افتراضات غير صحيحة حول هيكل المستند.

سير عمل إنشاء ملفات PDF.

تؤدي سير العمل المختلفة لإنشاء ملفات PDF إلى أنماط ترتيب كائنات مختلفة:

1. إنشاء مستند متسلسل.

% Typical output from simple PDF generators

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 3 >> endobj

3 0 obj << /Type /Page /Contents 6 0 R /Parent 2 0 R >> endobj

4 0 obj << /Type /Page /Contents 7 0 R /Parent 2 0 R >> endobj

5 0 obj << /Type /Page /Contents 8 0 R /Parent 2 0 R >> endobj

2. مشاركة موارد محسنة.

% PDF with shared resources created first

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [10 0 R 11 0 R 12 0 R] /Count 3 >> endobj

3 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj

4 0 obj << /Type /XObject /Subtype /Image /Width 100 /Height 100 >> endobj

% ... more shared resources ...

10 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

11 0 obj << /Type /Page /Resources << /XObject << /Im1 4 0 R >> >> >> endobj

12 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

3. تجميع المستندات التدريجي.

% Document created by combining existing PDFs

1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj

2 0 obj << /Type /Pages /Kids [100 0 R 25 0 R 75 0 R] /Count 3 >> endobj

% Objects from first source document

25 0 obj << /Type /Page /Contents 26 0 R /Parent 2 0 R >> endobj

% Objects from second source document

75 0 obj << /Type /Page /Contents 76 0 R /Parent 2 0 R >> endobj

% Objects from third source document

100 0 obj << /Type /Page /Contents 101 0 R /Parent 2 0 R >> endobj

الأخطاء الشائعة التي يرتكبها المطورون ونتائجها.

تعقيد هيكل ملفات PDF يؤدي إلى عدة أخطاء شائعة يمكن أن يكون لها عواقب وخيمة على موثوقية التطبيق وتجربة المستخدم.

الخطأ الأول: افتراض أن ترتيب معرفات الكائنات يساوي ترتيب العرض.

ربما هذا هو أكثر الأخطاء شيوعًا التي يرتكبها المطورون الجدد في معالجة ملفات PDF:

// WRONG: Processing pages by object ID order

function GetPagesInWrongOrder(Doc: TPDFDocument): TPageList;

var

i: Integer;

Obj: TPDFObject;

begin

Result := TPageList.Create;

// This approach processes pages in storage order, not display order

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and (Obj.GetValue('/Type') = '/Page') then

begin

Result.Add(Obj); // Wrong order!

end;

// Result will be in object ID order: [1, 4, 20]

// But display order should be: [20, 1, 4]

end;

تتضمن عواقب هذا الخطأ:

تظهر الصفحات بترتيب غير صحيح في المستندات الناتجة.
يصبح ترقيم الصفحات غير متسق.
ارتباك المستخدم وطلبات الدعم.
احتمال حدوث تلف في البيانات في مسارات معالجة المستندات.

الخطأ الثاني: ربط الصفحات الثابت بناءً على الملاحظات.

عندما يواجه المطورون مشاكل في ترتيب الصفحات، يقومون أحيانًا بتطبيق حلول ثابتة بناءً على الأنماط المرصودة:

// WRONG: Hard-coded page reordering based on heuristics

function ApplyPageReorderingHeuristics(Pages: TPageArray): TPageArray;

var

i: Integer;

begin

SetLength(Result, Length(Pages));

// Dangerous heuristic based on limited observations

if Length(Pages) = 3 then

begin

// "Fix" for specific 3-page documents observed during testing

Result[0] := Pages[1]; // Put second page first

Result[1] := Pages[2]; // Put third page second

Result[2] := Pages[0]; // Put first page last

end

else if Length(Pages) > 3 then

begin

// Generic "fix" that swaps first and last pages

Result[0] := Pages[Length(Pages) - 1];

Result[Length(Pages) - 1] := Pages[0];

// Keep middle pages in original order

for i := 1 to Length(Pages) - 2 do

Result[i] := Pages[i];

end

else

begin

// For other cases, just copy as-is

for i := 0 to High(Pages) do

Result[i] := Pages[i];

end;

هذه الطريقة معيبة بشكل أساسي لأن:

إنها تعمل فقط مع ملفات PDF المحددة التي تمت ملاحظتها أثناء التطوير.
تفشل بشكل كارثي مع ملفات PDF ذات هياكل مختلفة.
إنها تخلق سلوكًا غير متوقع لا يستطيع المستخدمون فهمه.
تتراكم الديون التقنية مع إضافة المزيد من الحالات الخاصة.

الخطأ الثالث: تجاهل أشجار الصفحات الهرمية.

يفترض العديد من المطورين أن أشجار صفحات PDF هي دائمًا مصفوفات مسطحة، ولكن مواصفات PDF تسمح بهياكل هرمية:

// WRONG: Assuming flat page tree structure

function GetPagesFromFlatTree(PagesObj: TPDFObject): TPageArray;

var

KidsArray: TPDFArray;

i: Integer;

begin

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

SetLength(Result, KidsArray.Count);

for i := 0 to KidsArray.Count - 1 do

begin

// This assumes all Kids entries are Page objects

// But they might be intermediate Pages objects!

Result[i] := KidsArray.GetIndirectObject(i);

end;

النهج الصحيح: اتباع هيكل شجرة الصفحات.

الطريقة الصحيحة للتعامل مع ترتيب صفحات PDF هي تنفيذ اجتياز كامل لشجرة الصفحات يتبع مواصفات PDF بدقة.

فهم التسلسل الهرمي لشجرة الصفحات.

يمكن أن تكون أشجار صفحات PDF هرمية، حيث تحتوي كائنات الصفحات الوسيطة على مصفوفات "أبناء" خاصة بها.

% Hierarchical page tree example

1 0 obj

<< /Type /Catalog /Pages 2 0 R >>

endobj

% Root Pages object

2 0 obj

<< /Type /Pages

/Kids [3 0 R 8 0 R 15 0 R]

/Count 7 >>

endobj

% First intermediate Pages object (contains 3 pages)

3 0 obj

<< /Type /Pages

/Kids [4 0 R 5 0 R 6 0 R]

/Count 3

/Parent 2 0 R >>

endobj

% Second intermediate Pages object (contains 2 pages)

8 0 obj

<< /Type /Pages

/Kids [9 0 R 10 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Third intermediate Pages object (contains 2 pages)

15 0 obj

<< /Type /Pages

/Kids [16 0 R 17 0 R]

/Count 2

/Parent 2 0 R >>

endobj

% Actual page objects

4 0 obj << /Type /Page /Contents 40 0 R /Parent 3 0 R >> endobj

5 0 obj << /Type /Page /Contents 41 0 R /Parent 3 0 R >> endobj

% ... and so on

تنفيذ اجتياز تكراري لشجرة الصفحات.

// CORRECT: Recursive page tree traversal

function GetPagesInCorrectOrder(Doc: TPDFDocument): TPageArray;

var

CatalogObj, RootPagesObj: TPDFObject;

PageList: TList;

begin

PageList := TList.Create;

try

// Step 1: Find the document catalog

CatalogObj := Doc.FindObject('/Type', '/Catalog');

if CatalogObj = nil then

raise Exception.Create('Document catalog not found');

// Step 2: Get the root Pages object

RootPagesObj := CatalogObj.GetIndirectObject('/Pages');

if RootPagesObj = nil then

raise Exception.Create('Root Pages object not found');

// Step 3: Recursively traverse the page tree

TraversePagesTree(RootPagesObj, PageList);

// Step 4: Convert list to array

SetLength(Result, PageList.Count);

for i := 0 to PageList.Count - 1 do

Result[i] := TPDFObject(PageList[i]);

finally

PageList.Free;

end;

procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList);

var

KidsArray: TPDFArray;

i: Integer;

ChildObj: TPDFObject;

ChildType: string;

begin

if PagesObj = nil then Exit;

// Get the Kids array from this Pages object

KidsArray := PagesObj.GetArray('/Kids');

if KidsArray = nil then Exit;

// Process each child in the Kids array

for i := 0 to KidsArray.Count - 1 do

begin

ChildObj := KidsArray.GetIndirectObject(i);

if ChildObj = nil then Continue;

ChildType := ChildObj.GetValue('/Type');

if ChildType = '/Page' then

begin

// This is a leaf page object - add it to our list

PageList.Add(ChildObj);

end

else if ChildType = '/Pages' then

begin

// This is an intermediate Pages object - recurse into it

TraversePagesTree(ChildObj, PageList);

end

else

begin

// Unexpected object type in Kids array

raise Exception.CreateFmt('Unexpected object type in Kids array: %s', [ChildType]);

end;

التعامل مع الاختلافات والحالات الخاصة في ملفات PDF الواقعية.

غالبًا ما تختلف ملفات PDF الواقعية عن الهيكل المثالي الموصوف في المواصفات. يجب أن تتعامل مكتبة معالجة PDF قوية مع هذه الاختلافات بسلاسة.

التشوهات الهيكلية الشائعة.

1. فهرس مفقود أو تالف.

% PDF with missing catalog reference

%PDF-1.4

% Object 1 should be catalog but is missing or corrupted

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>

endobj

2. مراجع دائرية.

% PDF with circular page tree references (corrupted)

2 0 obj

<< /Type /Pages /Kids [3 0 R] /Count 1 /Parent 3 0 R >>

endobj

3 0 obj

<< /Type /Pages /Kids [2 0 R] /Count 1 /Parent 2 0 R >>

endobj

3. قيم عد غير متسقة.

% PDF with incorrect Count value

2 0 obj

<< /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 5 >>

% Count says 5 but Kids array has only 3 elements

endobj

تطبيق معالجة أخطاء قوية.

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

// Robust page tree traversal with comprehensive error handling

function GetPagesWithFallbacks(Doc: TPDFDocument): TPageArray;

var

AttemptCount: Integer;

ErrorMessages: TStringList;

begin

ErrorMessages := TStringList.Create;

try

AttemptCount := 0;

// Attempt 1: Standard PDF specification approach

Inc(AttemptCount);

try

Result := GetPagesViaStandardTraversal(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with standard traversal (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 2: Search for Pages objects and try each one

Inc(AttemptCount);

try

Result := GetPagesViaObjectSearch(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with object search (attempt %d)', [AttemptCount]));

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// Attempt 3: Brute force search for Page objects

Inc(AttemptCount);

try

Result := GetPagesViaBruteForce(Doc);

if Length(Result) > 0 then

begin

LogMessage(Format('Success with brute force search (attempt %d)', [AttemptCount]));

LogMessage('Warning: Document structure is non-standard');

Exit;

end;

except

on E: Exception do

ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));

end;

// All attempts failed

raise Exception.Create('Failed to extract pages from PDF. Errors: ' +

ErrorMessages.Text);

finally

ErrorMessages.Free;

end;

function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray;

var

i: Integer;

Obj: TPDFObject;

KidsArray: TPDFArray;

PageList: TList;

CandidateObjects: TList;

begin

CandidateObjects := TList.Create;

PageList := TList.Create;

try

// Find all objects that could be Pages objects

for i := 0 to Doc.Objects.Count - 1 do

begin

Obj := Doc.Objects[i];

if (Obj <> nil) and

(Obj.GetValue('/Type') = '/Pages') and

Obj.HasKey('/Kids') then

begin

CandidateObjects.Add(Obj);

end;

// Try each candidate Pages object

for i := 0 to CandidateObjects.Count - 1 do

begin

Obj := TPDFObject(CandidateObjects[i]);

KidsArray := Obj.GetArray('/Kids');

if (KidsArray <> nil) and (KidsArray.Count > 0) then

begin

// Validate that this Kids array contains actual pages

if ValidateKidsArray(KidsArray) then

begin

PageList.Clear;

TraversePagesTree(Obj, PageList);

if PageList.Count > 0 then

begin

// Found valid pages - convert to array and return

SetLength(Result, PageList.Count);

for j := 0 to PageList.Count - 1 do

Result[j] := TPDFObject(PageList[j]);

Exit;

end;

// No valid Pages object found

SetLength(Result, 0);

finally

CandidateObjects.Free;

PageList.Free;

end;

استراتيجيات تحسين الأداء.

عند معالجة ملفات PDF كبيرة أو التعامل مع معالجة كميات كبيرة من المستندات، يصبح الأداء اعتبارًا بالغ الأهمية.

التحميل المؤجل والتخزين المؤقت.

// Performance-optimized page access with caching

type

TPDFPageCache = class

private

FPages: array of TPDFPage;

FPageObjects: array of TPDFObject;

FCacheHits: Integer;

FCacheMisses: Integer;

FMaxCacheSize: Integer;

public

constructor Create(MaxCacheSize: Integer = 100);

destructor Destroy; override;

function GetPage(Index: Integer): TPDFPage;

procedure ClearCache;

procedure GetCacheStatistics(out Hits, Misses: Integer);

end;

function TPDFPageCache.GetPage(Index: Integer): TPDFPage;

begin

// Check if page is already cached

if (Index >= 0) and (Index < Length(FPages)) and

(FPages[Index] <> nil) then

begin

Inc(FCacheHits);

Result := FPages[Index];

Exit;

end;

Inc(FCacheMisses);

// Load page from object if not cached

if (Index >= 0) and (Index < Length(FPageObjects)) and

(FPageObjects[Index] <> nil) then

begin

Result := TPDFPage.CreateFromObject(FPageObjects[Index]);

// Cache the page if we have room

if Length(FPages) < FMaxCacheSize then begin if Index >= Length(FPages) then

SetLength(FPages, Index + 1);

FPages[Index] := Result;

end;

end

else

begin

Result := nil;

end;

معالجة البيانات المتدفقة للمستندات الكبيرة.

// Streaming approach for processing large PDF documents

procedure ProcessLargePDFInChunks(const FileName: string; ChunkSize: Integer = 50);

var

Doc: TPDFDocument;

TotalPages: Integer;

ChunkStart, ChunkEnd: Integer;

i: Integer;

begin

Doc := TPDFDocument.Create;

try

Doc.LoadFromFile(FileName);

TotalPages := Doc.GetPageCount;

LogMessage(Format('Processing %d pages in chunks of %d', [TotalPages, ChunkSize]));

ChunkStart := 0;

while ChunkStart < TotalPages do

begin

ChunkEnd := Min(ChunkStart + ChunkSize - 1, TotalPages - 1);

LogMessage(Format('Processing chunk: pages %d-%d', [ChunkStart + 1, ChunkEnd + 1]));

// Process this chunk of pages

for i := ChunkStart to ChunkEnd do

begin

ProcessSinglePage(Doc, i);

end;

// Optional: Force garbage collection between chunks

if (ChunkStart mod (ChunkSize * 4)) = 0 then

begin

ForceGarbageCollection;

end;

ChunkStart := ChunkEnd + 1;

end;

finally

Doc.Free;

end;

تحليل متقدم لهيكل ملفات PDF.

بالنسبة للمطورين الذين يعملون مع متطلبات معالجة ملفات PDF معقدة، فإن فهم العناصر الهيكلية المتقدمة أمر بالغ الأهمية.

وراثة الصفحات وإدارة الموارد.

يمكن لصفحات PDF أن ترث خصائص من كائنات "الصفحات" الأصلية، مما يخلق نظامًا هرميًا لإدارة الموارد:

% Example of page inheritance in PDF structure

2 0 obj

<< /Type /Pages

/Kids [3 0 R 4 0 R]

/Count 2

/MediaBox [0 0 612 792]

/Resources <<

/Font << /F1 10 0 R >>

/ProcSet [/PDF /Text]

>> >>

endobj

% Child page inherits MediaBox and Resources from parent

3 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 5 0 R >>

% This page inherits MediaBox [0 0 612 792] and Resources from parent

endobj

% Child page overrides inherited MediaBox

4 0 obj

<< /Type /Page

/Parent 2 0 R

/Contents 6 0 R

/MediaBox [0 0 792 612] >>

% This page overrides MediaBox but still inherits Resources

endobj

التعامل مع وراثة الصفحات في الكود.

// Proper handling of page inheritance

function GetEffectivePageProperties(PageObj: TPDFObject): TPDFPageProperties;

var

CurrentObj: TPDFObject;

MediaBox: TPDFArray;

Resources: TPDFObject;

begin

// Initialize result

Result := TPDFPageProperties.Create;

// Walk up the parent chain to collect inherited properties

CurrentObj := PageObj;

while CurrentObj <> nil do

begin

// Check for MediaBox at this level

if Result.MediaBox.IsEmpty then

begin

MediaBox := CurrentObj.GetArray('/MediaBox');

if MediaBox <> nil then

Result.MediaBox := MediaBox;

end;

// Check for Resources at this level

if Result.Resources = nil then

begin

Resources := CurrentObj.GetDictionary('/Resources');

if Resources <> nil then

Result.Resources := Resources;

end;

// Check for other inheritable properties

CheckForInheritableProperty(CurrentObj, '/Rotate', Result.Rotate);

CheckForInheritableProperty(CurrentObj, '/CropBox', Result.CropBox);

// Move to parent object

CurrentObj := CurrentObj.GetIndirectObject('/Parent');

// Prevent infinite loops in corrupted PDFs

if CurrentObj = PageObj then

break;

end;

// Validate that we found required properties

if Result.MediaBox.IsEmpty then

raise Exception.Create('No MediaBox found in page inheritance chain');

end;

استراتيجيات الاختبار لترتيب صفحات PDF.

الاختبار الشامل ضروري عند التعامل مع ترتيب صفحات PDF، نظرًا لتنوع هياكل المستندات المحتملة.

إنشاء مجموعات اختبار شاملة.

# Comprehensive PDF test case generation script

# Test Case 1: Sequential pages (baseline)

echo "Creating sequential page test..."

pdftk A=template.pdf cat A A A output test-sequential.pdf

# Test Case 2: Non-sequential object IDs

echo "Creating non-sequential object ID test..."

pdftk A=page3.pdf B=page1.pdf C=page2.pdf cat A B C output test-nonsequential.pdf

# Test Case 3: Hierarchical page tree

echo "Creating hierarchical page tree test..."

# This requires custom PDF generation tool

generate-hierarchical-pdf --depth 3 --pages-per-node 2 output test-hierarchical.pdf

# Test Case 4: Large document with mixed structures

echo "Creating large document test..."

pdftk A=large-doc.pdf cat 1-100 50-149 200-299 output test-large-mixed.pdf

# Test Case 5: Corrupted page tree

echo "Creating corrupted page tree test..."

# This requires custom corruption tool

corrupt-pdf-structure --target pages-tree test-sequential.pdf test-corrupted.pdf

# Test Case 6: Minimal single-page document

echo "Creating minimal single-page test..."

pdftk A=template.pdf cat 1 output test-single-page.pdf

إطار عمل التحقق الآلي.

100

// Comprehensive PDF page ordering validation framework

type

TPDFTestCase = record

FileName: string;

ExpectedPageCount: Integer;

ExpectedPageOrder: array of Integer;

Description: string;

end;

function RunPDFPageOrderingTests: Boolean;

var

TestCases: array of TPDFTestCase;

i: Integer;

PassCount, FailCount: Integer;

begin

// Define test cases

SetLength(TestCases, 6);

TestCases[0].FileName := 'test-sequential.pdf';

TestCases[0].ExpectedPageCount := 3;

TestCases[0].ExpectedPageOrder := [0, 1, 2];

TestCases[0].Description := 'Sequential page ordering';

TestCases[1].FileName := 'test-nonsequential.pdf';

TestCases[1].ExpectedPageCount := 3;

TestCases[1].ExpectedPageOrder := [2, 0, 1]; // Based on how pdftk reorders

TestCases[1].Description := 'Non-sequential object IDs';

// ... define other test cases ...

PassCount := 0;

FailCount := 0;

WriteLn('Running PDF page ordering tests...');

WriteLn('=' * 50);

for i := 0 to High(TestCases) do

begin

Write(Format('Test %d: %s... ', [i + 1, TestCases[i].Description]));

if ValidateTestCase(TestCases[i]) then

begin

WriteLn('PASS');

Inc(PassCount);

end

else

begin

WriteLn('FAIL');

Inc(FailCount);

end;

WriteLn('=' * 50);

WriteLn(Format('Results: %d passed, %d failed', [PassCount, FailCount]));

Result := FailCount = 0;

end;

function ValidateTestCase(const TestCase: TPDFTestCase): Boolean;

var

Doc: TPDFDocument;

ActualPages: TPageArray;

i: Integer;

begin

Result := False;

Doc := TPDFDocument.Create;

try

if not Doc.LoadFromFile(TestCase.FileName) then

begin

WriteLn(Format('Failed to load %s', [TestCase.FileName]));

Exit;

end;

ActualPages := GetPagesInCorrectOrder(Doc);

// Validate page count

if Length(ActualPages) <> TestCase.ExpectedPageCount then

begin

WriteLn(Format('Page count mismatch: expected %d, got %d',

[TestCase.ExpectedPageCount, Length(ActualPages)]));

Exit;

end;

// Validate page order (simplified - in real implementation,

// you'd compare actual page content or identifiers)

for i := 0 to High(ActualPages) do

begin

if not ValidatePageAtPosition(ActualPages[i], TestCase.ExpectedPageOrder[i]) then

begin

WriteLn(Format('Page order mismatch at position %d', [i]));

Exit;

end;

Result := True;

finally

Doc.Free;

end;

حماية كود معالجة PDF الخاص بك للمستقبل.

مع تطور معايير PDF وظهور حالات استخدام جديدة، من المهم كتابة تعليمات برمجية يمكنها التكيف مع المتطلبات المستقبلية.

التصميم من أجل قابلية التوسع.

// Extensible PDF page processing architecture

type

IPDFPageProcessor = interface

['{12345678-1234-1234-1234-123456789012}']

function ProcessPage(Page: TPDFPage; Context: TPDFProcessingContext): Boolean;

function GetProcessorName: string;

function GetSupportedPDFVersions: TStringArray;

end;

TPDFProcessingPipeline = class

private

FProcessors: TList;

FContext: TPDFProcessingContext;

public

constructor Create;

destructor Destroy; override;

procedure RegisterProcessor(Processor: IPDFPageProcessor);

procedure UnregisterProcessor(Processor: IPDFPageProcessor);

function ProcessDocument(Doc: TPDFDocument): Boolean;

end;

function TPDFProcessingPipeline.ProcessDocument(Doc: TPDFDocument): Boolean;

var

Pages: TPageArray;

i, j: Integer;

Page: TPDFPage;

Processor: IPDFPageProcessor;

Success: Boolean;

begin

Result := True;

// Get pages in correct order using our robust method

Pages := GetPagesInCorrectOrder(Doc);

// Process each page through all registered processors

for i := 0 to High(Pages) do

begin

Page := TPDFPage.CreateFromObject(Pages[i]);

try

FContext.CurrentPageIndex := i;

FContext.TotalPages := Length(Pages);

for j := 0 to FProcessors.Count - 1 do

begin

Processor := FProcessors[j];

Success := Processor.ProcessPage(Page, FContext);

if not Success then

begin

LogError(Format('Processor %s failed on page %d',

[Processor.GetProcessorName, i + 1]));

Result := False;

// Continue with other processors/pages or break based on policy

end;

finally

Page.Free;

end;

الاستثمار في فهم هيكل PDF بشكل صحيح يؤتي ثماره في تقليل عبء الدعم، وتحسين رضا المستخدم، وتسهيل الصيانة طوال عمر التطبيق. ترتيب صفحات PDF ليس مجرد تفصيل فني - بل هو جانب أساسي من جوانب سلامة المستند الذي يؤثر بشكل مباشر على تجربة المستخدم. أتقن هذه التعقيدات، وستبني تطبيقات PDF يمكن للمستخدمين الوثوق بها في التعامل مع أهم مستنداتهم.

المقالة التالية.