Understanding PDF Page Ordering - Why Your PDF Pages Aren't Where You Expect Them

The Hidden Complexity Behind PDF Structure

PDF documents are far more complex than they appear to end users. While viewers display pages in a logical, sequential order (1, 2, 3...), the internal architecture of a PDF file tells a strikingly different story. This complexity is one of the most misunderstood aspects of PDF processing, and it leads to countless bugs, flawed implementations, and frustrated developers. This article explores how PDF pages are organized, explains why developers so often run into unexpected page ordering problems, and offers practical solutions for robust PDF manipulation.

The PDF Object Model: A Paradigm Shift from Sequential Documents

To understand PDF page ordering challenges, we must first appreciate how fundamentally PDF differs from simpler document formats. Unlike plain text files, HTML documents, or even older formats such as RTF, PDF uses a sophisticated object-based architecture in which content organization and physical storage are completely decoupled.

This architectural decision was made for several important reasons:

Flexibility: Objects can be referenced from multiple locations without duplication
Efficiency: Common resources (fonts, images, graphics states) can be shared across pages
Incremental updates: Documents can be modified without rewriting the entire file
Random access: Viewers can jump to any page without parsing the entire document

However, this flexibility comes at the cost of complexity, particularly when it comes to understanding the relationship between object storage order and logical page order.

Object References vs. Display Order: A Concrete Example

Consider this typical PDF structure, which illustrates the disconnect between storage and display:

%PDF-1.4
% PDF file structure example - storage order vs. display order

% Object 1 is stored first in the file but represents page 2 in display
1 0 obj
<< /Type /Page
   /Contents 7 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

2 0 obj
<< /Type /Pages /Kids [20 0 R 1 0 R 4 0 R] /Count 3 >>
endobj

3 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj

% Object 4 is stored after object 1 but represents page 3 in display
4 0 obj
<< /Type /Page
   /Contents 5 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

% Object 20 appears last in file but represents page 1 in display
20 0 obj
<< /Type /Page
   /Contents 21 0 R
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F1 6 0 R >> >> >>
endobj

In this example, the page objects are stored as objects 1, 4, and 20, but the display order is defined by the Kids array: [20, 1, 4]. This produces the following mapping:

Page 1 (display order) = Object 20 (storage order: last)
Page 2 (display order) = Object 1 (storage order: first)
Page 3 (display order) = Object 4 (storage order: after object 1, before object 20)

This disconnect is not accidental; it is a fundamental feature of PDF that enables sophisticated document manipulation and optimization.
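The mapping above can be reproduced in a few lines. The following is a minimal sketch in Python (chosen only so the snippet is self-contained and runnable; it is not the Delphi API used elsewhere in this article). It assumes a hypothetical in-memory model where the object table is a dictionary keyed by object number and indirect references are plain integers.

```python
# Hypothetical model of the example: object number -> dictionary.
# A reference such as "20 0 R" is modeled as the integer 20.
objects = {
    1: {"Type": "Page", "Parent": 2},
    2: {"Type": "Pages", "Kids": [20, 1, 4], "Count": 3},
    4: {"Type": "Page", "Parent": 2},
    20: {"Type": "Page", "Parent": 2},
}


def storage_order(objs):
    """Page object numbers sorted by ID, which is what a naive scan sees."""
    return sorted(n for n, o in objs.items() if o.get("Type") == "Page")


def display_order(objs, pages_root):
    """Page object numbers in the order listed by a flat /Kids array."""
    return list(objs[pages_root]["Kids"])


print(storage_order(objects))     # [1, 4, 20]
print(display_order(objects, 2))  # [20, 1, 4]
```

The two lists contain the same objects in different orders, which is exactly the gap between storage and display that the rest of the article deals with.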

Why PDF Generators Create Out-of-Order Object Sequences

Understanding why PDF generators produce non-sequential object orders helps developers appreciate the complexity they are dealing with and avoid making incorrect assumptions about document structure.

PDF Generation Workflows

Different PDF generation workflows result in different object ordering patterns:

1. Sequential Document Creation

% Typical output from simple PDF generators
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 3 >> endobj
3 0 obj << /Type /Page /Contents 6 0 R /Parent 2 0 R >> endobj
4 0 obj << /Type /Page /Contents 7 0 R /Parent 2 0 R >> endobj
5 0 obj << /Type /Page /Contents 8 0 R /Parent 2 0 R >> endobj

2. Optimized Resource Sharing

% PDF with shared resources created first
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [10 0 R 11 0 R 12 0 R] /Count 3 >> endobj
3 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj
4 0 obj << /Type /XObject /Subtype /Image /Width 100 /Height 100 >> endobj
% ... more shared resources ...
10 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj
11 0 obj << /Type /Page /Resources << /XObject << /Im1 4 0 R >> >> >> endobj
12 0 obj << /Type /Page /Resources << /Font << /F1 3 0 R >> >> >> endobj

3. Incremental Document Assembly

% Document created by combining existing PDFs
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [100 0 R 25 0 R 75 0 R] /Count 3 >> endobj
% Objects from first source document
25 0 obj << /Type /Page /Contents 26 0 R /Parent 2 0 R >> endobj
% Objects from second source document
75 0 obj << /Type /Page /Contents 76 0 R /Parent 2 0 R >> endobj
% Objects from third source document
100 0 obj << /Type /Page /Contents 101 0 R /Parent 2 0 R >> endobj
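A quick way to see why the ID-order assumption is tempting is to check, for each of the three layouts, whether sorting by object number happens to reproduce the Kids order. This is a small illustrative sketch in Python (used so it runs on its own); the dictionary of layouts simply restates the Kids arrays from the three examples.

```python
# Flat /Kids arrays from the three workflows, as object numbers.
layouts = {
    "sequential": [3, 4, 5],
    "shared resources first": [10, 11, 12],
    "merged documents": [100, 25, 75],
}


def id_order_matches_display(kids):
    """True when sorting by object number happens to equal display order."""
    return sorted(kids) == list(kids)


for name, kids in layouts.items():
    print(f"{name}: {id_order_matches_display(kids)}")
# sequential: True
# shared resources first: True
# merged documents: False
```

Two of the three layouts pass by coincidence, so code that relies on object numbers can survive testing and still fail on the first merged document it meets.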

Common Developer Mistakes and Their Consequences

The complexity of PDF structure leads to several common mistakes that can have serious consequences for application reliability and user experience.

Mistake 1: Assuming Object ID Order Equals Display Order

This is perhaps the most common mistake made by developers who are new to PDF processing:

// WRONG: Processing pages by object ID order
function GetPagesInWrongOrder(Doc: TPDFDocument): TPageList;
var
  i: Integer;
  Obj: TPDFObject;
begin
  Result := TPageList.Create;
  // This approach processes pages in storage order, not display order
  for i := 0 to Doc.Objects.Count - 1 do
  begin
    Obj := Doc.Objects[i];
    if (Obj <> nil) and (Obj.GetValue('/Type') = '/Page') then
    begin
      Result.Add(Obj); // Wrong order!
    end;
  end;
  // Result will be in object ID order: [1, 4, 20]
  // But display order should be: [20, 1, 4]
end;

The consequences of this mistake include:

Pages appear in the wrong order in output documents
Page numbering becomes inconsistent
User confusion and support requests
Potential data corruption in document processing pipelines

Mistake 2: Hard-Coded Page Mapping Based on Observations

When developers run into page ordering problems, they sometimes apply hard-coded fixes based on the patterns they happened to observe:

// WRONG: Hard-coded page reordering based on heuristics
function ApplyPageReorderingHeuristics(Pages: TPageArray): TPageArray;
var
  i: Integer;
begin
  SetLength(Result, Length(Pages));
  // Dangerous heuristic based on limited observations
  if Length(Pages) = 3 then
  begin
    // "Fix" for specific 3-page documents observed during testing
    Result[0] := Pages[1]; // Put second page first
    Result[1] := Pages[2]; // Put third page second
    Result[2] := Pages[0]; // Put first page last
  end
  else if Length(Pages) > 3 then
  begin
    // Generic "fix" that swaps first and last pages
    Result[0] := Pages[Length(Pages) - 1];
    Result[Length(Pages) - 1] := Pages[0];
    // Keep middle pages in original order
    for i := 1 to Length(Pages) - 2 do
      Result[i] := Pages[i];
  end
  else
  begin
    // For other cases, just copy as-is
    for i := 0 to High(Pages) do
      Result[i] := Pages[i];
  end;
end;

This approach is fundamentally flawed because:

It only works for the specific PDFs observed during development
It fails catastrophically on PDFs with different structures
It creates unpredictable behavior that users cannot understand
It accumulates technical debt as more special cases are added

Mistake 3: Ignoring Hierarchical Page Trees

Many developers assume that PDF page trees are always flat arrays, but the PDF specification allows hierarchical structures:

// WRONG: Assuming flat page tree structure
function GetPagesFromFlatTree(PagesObj: TPDFObject): TPageArray;
var
  KidsArray: TPDFArray;
  i: Integer;
begin
  Result := nil;
  KidsArray := PagesObj.GetArray('/Kids');
  if KidsArray = nil then Exit;
  SetLength(Result, KidsArray.Count);
  for i := 0 to KidsArray.Count - 1 do
  begin
    // This assumes all Kids entries are Page objects
    // But they might be intermediate Pages objects!
    Result[i] := KidsArray.GetIndirectObject(i);
  end;
end;

The Correct Approach: Following the Page Tree Structure

The correct way to determine PDF page order is to implement a complete page tree traversal that follows the PDF specification exactly.

Understanding the Page Tree Hierarchy

PDF page trees can be hierarchical, with intermediate Pages objects that contain their own Kids arrays:

% Hierarchical page tree example
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj

% Root Pages object
2 0 obj
<< /Type /Pages
   /Kids [3 0 R 8 0 R 15 0 R]
   /Count 7 >>
endobj

% First intermediate Pages object (contains 3 pages)
3 0 obj
<< /Type /Pages
   /Kids [4 0 R 5 0 R 6 0 R]
   /Count 3
   /Parent 2 0 R >>
endobj

% Second intermediate Pages object (contains 2 pages)
8 0 obj
<< /Type /Pages
   /Kids [9 0 R 10 0 R]
   /Count 2
   /Parent 2 0 R >>
endobj

% Third intermediate Pages object (contains 2 pages)
15 0 obj
<< /Type /Pages
   /Kids [16 0 R 17 0 R]
   /Count 2
   /Parent 2 0 R >>
endobj

% Actual page objects
4 0 obj << /Type /Page /Contents 40 0 R /Parent 3 0 R >> endobj
5 0 obj << /Type /Page /Contents 41 0 R /Parent 3 0 R >> endobj
% ... and so on

Implementing Recursive Page Tree Traversal

// CORRECT: Recursive page tree traversal

// Declared first so that GetPagesInCorrectOrder can call it
procedure TraversePagesTree(PagesObj: TPDFObject; PageList: TList);
var
  KidsArray: TPDFArray;
  i: Integer;
  ChildObj: TPDFObject;
  ChildType: string;
begin
  if PagesObj = nil then Exit;
  // Get the Kids array from this Pages object
  KidsArray := PagesObj.GetArray('/Kids');
  if KidsArray = nil then Exit;
  // Process each child in the Kids array, in array order
  for i := 0 to KidsArray.Count - 1 do
  begin
    ChildObj := KidsArray.GetIndirectObject(i);
    if ChildObj = nil then Continue;
    ChildType := ChildObj.GetValue('/Type');
    if ChildType = '/Page' then
    begin
      // This is a leaf page object - add it to our list
      PageList.Add(ChildObj);
    end
    else if ChildType = '/Pages' then
    begin
      // This is an intermediate Pages object - recurse into it
      TraversePagesTree(ChildObj, PageList);
    end
    else
    begin
      // Unexpected object type in Kids array
      raise Exception.CreateFmt('Unexpected object type in Kids array: %s', [ChildType]);
    end;
  end;
end;

function GetPagesInCorrectOrder(Doc: TPDFDocument): TPageArray;
var
  CatalogObj, RootPagesObj: TPDFObject;
  PageList: TList;
  i: Integer;
begin
  PageList := TList.Create;
  try
    // Step 1: Find the document catalog
    // (a conforming reader locates it through the trailer's /Root entry)
    CatalogObj := Doc.FindObject('/Type', '/Catalog');
    if CatalogObj = nil then
      raise Exception.Create('Document catalog not found');
    // Step 2: Get the root Pages object
    RootPagesObj := CatalogObj.GetIndirectObject('/Pages');
    if RootPagesObj = nil then
      raise Exception.Create('Root Pages object not found');
    // Step 3: Recursively traverse the page tree
    TraversePagesTree(RootPagesObj, PageList);
    // Step 4: Convert list to array
    SetLength(Result, PageList.Count);
    for i := 0 to PageList.Count - 1 do
      Result[i] := TPDFObject(PageList[i]);
  finally
    PageList.Free;
  end;
end;
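The same depth-first, left-to-right walk can also be written with an explicit stack, which avoids deep recursion on unusually nested trees. The following is a minimal sketch in Python (chosen so it is self-contained and runnable, not the article's Delphi API). It assumes a hypothetical dictionary model of the hierarchical example, with references as plain integers, and it deliberately has no protection against circular references.

```python
def pages_in_display_order(objects, root):
    """Return leaf page object numbers depth-first, left to right."""
    ordered = []
    stack = [root]
    while stack:
        num = stack.pop()
        node = objects[num]
        if node["Type"] == "Page":
            ordered.append(num)
        elif node["Type"] == "Pages":
            # Push children reversed so the leftmost child is popped first.
            stack.extend(reversed(node["Kids"]))
        else:
            raise ValueError(f"unexpected /Type in page tree: {node['Type']!r}")
    return ordered


# Hypothetical model of the hierarchical page tree example.
tree = {
    2: {"Type": "Pages", "Kids": [3, 8, 15], "Count": 7},
    3: {"Type": "Pages", "Kids": [4, 5, 6], "Count": 3},
    8: {"Type": "Pages", "Kids": [9, 10], "Count": 2},
    15: {"Type": "Pages", "Kids": [16, 17], "Count": 2},
}
for leaf in (4, 5, 6, 9, 10, 16, 17):
    tree[leaf] = {"Type": "Page"}

print(pages_in_display_order(tree, 2))  # [4, 5, 6, 9, 10, 16, 17]
```

Reversing each Kids array before pushing it is what preserves display order; pushing it unreversed would visit the rightmost subtree first.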

Handling Real-World PDF Variations and Edge Cases

Real-world PDF files often deviate from the ideal structure described in the specification. A robust PDF processing library must handle these variations gracefully.

Common Structural Anomalies

1. Missing or Corrupted Catalog

% PDF with missing catalog reference
%PDF-1.4
% Object 1 should be catalog but is missing or corrupted
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>
endobj

2. Circular References

% PDF with circular page tree references (corrupted)
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 /Parent 3 0 R >>
endobj

3 0 obj
<< /Type /Pages /Kids [2 0 R] /Count 1 /Parent 2 0 R >>
endobj

3. Inconsistent Count Values

% PDF with incorrect Count value
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R 5 0 R] /Count 5 >>
% Count says 5 but Kids array has only 3 elements
endobj
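The three anomalies above can each be detected mechanically. Below is a minimal sketch in Python (used so it runs on its own; it is not the article's Delphi API), assuming a hypothetical dictionary model in which references are plain integers. Note that /Count is the number of leaf pages below a node, so in a nested tree it is compared against the leaves found, not against the length of Kids. The root-finding helper is only a heuristic for files whose catalog cannot be read.

```python
def find_root_candidates(objects):
    """/Pages nodes that no other /Pages node lists in its /Kids."""
    pages_nodes = {n for n, o in objects.items() if o.get("Type") == "Pages"}
    referenced = {k for n in pages_nodes for k in objects[n].get("Kids", [])}
    return sorted(pages_nodes - referenced)


def analyze_page_tree(objects, root):
    """Walk the tree from root and return (pages, problems)."""
    pages, problems = [], []

    def walk(num, ancestors):
        if num in ancestors:  # node is its own ancestor: a cycle
            problems.append(f"circular reference at object {num}")
            return 0
        node = objects.get(num)
        if node is None:
            problems.append(f"missing object {num}")
            return 0
        if node.get("Type") == "Page":
            pages.append(num)
            return 1
        # Anything else is treated as an intermediate /Pages node.
        leaves = sum(walk(k, ancestors | {num}) for k in node.get("Kids", []))
        if node.get("Count") != leaves:
            problems.append(
                f"object {num}: /Count is {node.get('Count')}, found {leaves} pages"
            )
        return leaves

    walk(root, frozenset())
    return pages, problems


no_catalog = {
    2: {"Type": "Pages", "Kids": [3, 4], "Count": 2},
    3: {"Type": "Page"},
    4: {"Type": "Page"},
}
bad_count = {
    2: {"Type": "Pages", "Kids": [3, 4, 5], "Count": 5},
    3: {"Type": "Page"},
    4: {"Type": "Page"},
    5: {"Type": "Page"},
}
cyclic = {
    2: {"Type": "Pages", "Kids": [3], "Count": 1},
    3: {"Type": "Pages", "Kids": [2], "Count": 1},
}

print(find_root_candidates(no_catalog))    # [2]
print(analyze_page_tree(bad_count, 2)[1])  # ['object 2: /Count is 5, found 3 pages']
print(analyze_page_tree(cyclic, 2)[1][0])  # circular reference at object 2
```

Tracking only the ancestors on the current path, rather than every visited node, means a cycle is reported without also rejecting a tree that merely revisits a node through a different branch.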

Robust Error Handling Implementation

// Robust page tree traversal with comprehensive error handling

// Fallback used by attempt 2. Declared first so GetPagesWithFallbacks can call it.
function GetPagesViaObjectSearch(Doc: TPDFDocument): TPageArray;
var
  i, j: Integer;
  Obj: TPDFObject;
  KidsArray: TPDFArray;
  PageList: TList;
  CandidateObjects: TList;
begin
  // Empty result unless a valid Pages object is found
  Result := nil;
  CandidateObjects := TList.Create;
  PageList := TList.Create;
  try
    // Find all objects that could be Pages objects
    for i := 0 to Doc.Objects.Count - 1 do
    begin
      Obj := Doc.Objects[i];
      if (Obj <> nil) and
         (Obj.GetValue('/Type') = '/Pages') and
         Obj.HasKey('/Kids') then
      begin
        CandidateObjects.Add(Obj);
      end;
    end;

    // Try each candidate Pages object. The first one that yields pages wins,
    // so an intermediate node stored before the root returns only its subtree.
    for i := 0 to CandidateObjects.Count - 1 do
    begin
      Obj := TPDFObject(CandidateObjects[i]);
      KidsArray := Obj.GetArray('/Kids');
      // Validate that this Kids array contains actual pages
      if (KidsArray <> nil) and (KidsArray.Count > 0) and
         ValidateKidsArray(KidsArray) then
      begin
        PageList.Clear;
        TraversePagesTree(Obj, PageList);
        if PageList.Count > 0 then
        begin
          // Found valid pages - convert to array and return
          SetLength(Result, PageList.Count);
          for j := 0 to PageList.Count - 1 do
            Result[j] := TPDFObject(PageList[j]);
          Exit;
        end;
      end;
    end;
  finally
    CandidateObjects.Free;
    PageList.Free;
  end;
end;

function GetPagesWithFallbacks(Doc: TPDFDocument): TPageArray;
var
  AttemptCount: Integer;
  ErrorMessages: TStringList;
begin
  ErrorMessages := TStringList.Create;
  try
    AttemptCount := 0;

    // Attempt 1: Standard PDF specification approach
    Inc(AttemptCount);
    try
      Result := GetPagesViaStandardTraversal(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with standard traversal (attempt %d)', [AttemptCount]));
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // Attempt 2: Search for Pages objects and try each one
    Inc(AttemptCount);
    try
      Result := GetPagesViaObjectSearch(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with object search (attempt %d)', [AttemptCount]));
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // Attempt 3: Brute force search for Page objects
    Inc(AttemptCount);
    try
      Result := GetPagesViaBruteForce(Doc);
      if Length(Result) > 0 then
      begin
        LogMessage(Format('Success with brute force search (attempt %d)', [AttemptCount]));
        LogMessage('Warning: Document structure is non-standard');
        Exit;
      end;
    except
      on E: Exception do
        ErrorMessages.Add(Format('Attempt %d failed: %s', [AttemptCount, E.Message]));
    end;

    // All attempts failed
    raise Exception.Create('Failed to extract pages from PDF. Errors: ' +
      ErrorMessages.Text);
  finally
    ErrorMessages.Free;
  end;
end;

Performance Optimization Strategies

When processing large PDF files or performing high-volume document processing, performance becomes a critical consideration.

Lazy Loading and Caching

// Performance-optimized page access with caching
type
  TPDFPageCache = class
  private
    FPages: array of TPDFPage;
    FPageObjects: array of TPDFObject;
    FCachedCount: Integer; // number of pages currently held in FPages
    FCacheHits: Integer;
    FCacheMisses: Integer;
    FMaxCacheSize: Integer;
  public
    constructor Create(MaxCacheSize: Integer = 100);
    destructor Destroy; override;
    function GetPage(Index: Integer): TPDFPage;
    procedure ClearCache;
    procedure GetCacheStatistics(out Hits, Misses: Integer);
  end;

function TPDFPageCache.GetPage(Index: Integer): TPDFPage;
begin
  // Check if page is already cached
  if (Index >= 0) and (Index < Length(FPages)) and
     (FPages[Index] <> nil) then
  begin
    Inc(FCacheHits);
    Result := FPages[Index];
    Exit;
  end;

  Inc(FCacheMisses);

  // Load page from object if not cached
  if (Index >= 0) and (Index < Length(FPageObjects)) and
     (FPageObjects[Index] <> nil) then
  begin
    Result := TPDFPage.CreateFromObject(FPageObjects[Index]);
    // Cache the page if we have room. The limit is checked against
    // FCachedCount because FPages is indexed by page number and can be
    // sparse, so Length(FPages) does not say how many pages are cached.
    if FCachedCount < FMaxCacheSize then
    begin
      if Index >= Length(FPages) then
        SetLength(FPages, Index + 1);
      FPages[Index] := Result;
      Inc(FCachedCount);
    end;
    // Note: a page returned without being cached is owned by the caller
  end
  else
  begin
    Result := nil;
  end;
end;
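The cache above stops admitting pages once it is full. A common alternative is to evict the least recently used page instead, so that the cache keeps following the reader through a long document. This is a minimal sketch of that policy in Python (used so it runs on its own). The `loader` callable is a hypothetical stand-in for whatever builds a page object, and the class name is invented for illustration.

```python
from collections import OrderedDict


class PageCache:
    """Least-recently-used page cache with hit/miss counters."""

    def __init__(self, loader, max_size=100):
        self._loader = loader        # hypothetical: index -> page object
        self._max_size = max_size
        self._pages = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get_page(self, index):
        if index in self._pages:
            self.hits += 1
            self._pages.move_to_end(index)  # mark as most recently used
            return self._pages[index]
        self.misses += 1
        page = self._loader(index)
        self._pages[index] = page
        if len(self._pages) > self._max_size:
            self._pages.popitem(last=False)  # evict least recently used
        return page


cache = PageCache(loader=lambda i: f"page-{i}", max_size=2)
for i in (0, 1, 0, 2, 1):
    cache.get_page(i)
print(cache.hits, cache.misses)  # 1 4
```

In the usage example, page 1 is evicted when page 2 arrives because page 0 was touched more recently, which is why the final request for page 1 counts as a miss.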

Streaming Processing for Large Documents

// Chunked approach for processing large PDF documents
// (Min comes from the Math unit)
procedure ProcessLargePDFInChunks(const FileName: string; ChunkSize: Integer = 50);
var
  Doc: TPDFDocument;
  TotalPages: Integer;
  ChunkStart, ChunkEnd: Integer;
  i: Integer;
begin
  Doc := TPDFDocument.Create;
  try
    Doc.LoadFromFile(FileName);
    TotalPages := Doc.GetPageCount;
    LogMessage(Format('Processing %d pages in chunks of %d', [TotalPages, ChunkSize]));

    ChunkStart := 0;
    while ChunkStart < TotalPages do
    begin
      ChunkEnd := Min(ChunkStart + ChunkSize - 1, TotalPages - 1);
      LogMessage(Format('Processing chunk: pages %d-%d', [ChunkStart + 1, ChunkEnd + 1]));

      // Process this chunk of pages
      for i := ChunkStart to ChunkEnd do
      begin
        ProcessSinglePage(Doc, i);
      end;

      // Optional: release per-chunk resources every fourth chunk.
      // ForceGarbageCollection is a placeholder hook, not a built-in routine.
      if (ChunkStart mod (ChunkSize * 4)) = 0 then
      begin
        ForceGarbageCollection;
      end;

      ChunkStart := ChunkEnd + 1;
    end;
  finally
    Doc.Free;
  end;
end;
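The chunk boundary arithmetic is the part of this routine that is easiest to get wrong by one. As a small illustrative sketch in Python (used so it can be run in isolation), the helper below yields the same zero-based, inclusive `(start, end)` pairs that the loop above computes. The function name is invented for this example.

```python
def chunk_ranges(total_pages, chunk_size=50):
    """Yield zero-based inclusive (start, end) page index pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_pages, chunk_size):
        # The last chunk is clipped to the final page index.
        yield start, min(start + chunk_size, total_pages) - 1


print(list(chunk_ranges(120, 50)))  # [(0, 49), (50, 99), (100, 119)]
```

Separating the range computation from the per-page work makes the boundaries easy to test without loading a document.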

Advanced PDF Structure Analysis

For developers working with complex PDF processing requirements, understanding advanced structural elements is essential.

Page Inheritance and Resource Management

PDF pages can inherit properties from their parent Pages objects, creating a hierarchical resource management system:

% Example of page inheritance in PDF structure
2 0 obj
<< /Type /Pages
   /Kids [3 0 R 4 0 R]
   /Count 2
   /MediaBox [0 0 612 792]
   /Resources <<
     /Font << /F1 10 0 R >>
     /ProcSet [/PDF /Text]
   >> >>
endobj

% Child page inherits MediaBox and Resources from parent
3 0 obj
<< /Type /Page
   /Parent 2 0 R
   /Contents 5 0 R >>
% This page inherits MediaBox [0 0 612 792] and Resources from parent
endobj

% Child page overrides inherited MediaBox
4 0 obj
<< /Type /Page
   /Parent 2 0 R
   /Contents 6 0 R
   /MediaBox [0 0 792 612] >>
% This page overrides MediaBox but still inherits Resources
endobj

Handling Page Inheritance in Code

// Proper handling of page inheritance
function GetEffectivePageProperties(PageObj: TPDFObject): TPDFPageProperties;
const
  MaxTreeDepth = 64; // arbitrary safety limit for this example
var
  CurrentObj: TPDFObject;
  MediaBox: TPDFArray;
  Resources: TPDFObject;
  Depth: Integer;
begin
  // Initialize result
  Result := TPDFPageProperties.Create;

  // Walk up the parent chain to collect inherited properties.
  // The nearest definition wins, so a value is only taken if none was found yet.
  CurrentObj := PageObj;
  Depth := 0;
  while CurrentObj <> nil do
  begin
    // Check for MediaBox at this level
    if Result.MediaBox.IsEmpty then
    begin
      MediaBox := CurrentObj.GetArray('/MediaBox');
      if MediaBox <> nil then
        Result.MediaBox := MediaBox;
    end;

    // Check for Resources at this level
    if Result.Resources = nil then
    begin
      Resources := CurrentObj.GetDictionary('/Resources');
      if Resources <> nil then
        Result.Resources := Resources;
    end;

    // Check for other inheritable properties
    CheckForInheritableProperty(CurrentObj, '/Rotate', Result.Rotate);
    CheckForInheritableProperty(CurrentObj, '/CropBox', Result.CropBox);

    // Move to parent object
    CurrentObj := CurrentObj.GetIndirectObject('/Parent');

    // Prevent infinite loops in corrupted PDFs. Comparing against PageObj
    // alone misses cycles among the ancestors, so the walk is also depth-limited.
    Inc(Depth);
    if (CurrentObj = PageObj) or (Depth > MaxTreeDepth) then
      Break;
  end;

  // Validate that we found required properties
  if Result.MediaBox.IsEmpty then
  begin
    Result.Free; // avoid leaking the partially filled result
    raise Exception.Create('No MediaBox found in page inheritance chain');
  end;
end;
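The resolution rule is easier to check against the inheritance example when it can be executed. Here is a minimal sketch in Python (used so it is self-contained, not the article's Delphi API), assuming a hypothetical dictionary model with references as integers. It walks the /Parent chain, lets the nearest definition win, and then applies the specification defaults for the two optional attributes: CropBox falls back to MediaBox and Rotate falls back to 0.

```python
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


def effective_attributes(objects, page_num, max_depth=64):
    """Resolve inheritable page attributes by walking the /Parent chain."""
    found = {}
    seen = set()
    current = page_num
    while current is not None:
        if current in seen or len(seen) >= max_depth:
            raise ValueError("cycle or excessive depth in /Parent chain")
        seen.add(current)
        node = objects[current]
        for key in INHERITABLE:
            if key in node and key not in found:  # nearest definition wins
                found[key] = node[key]
        current = node.get("Parent")
    if "MediaBox" not in found:
        raise ValueError("no MediaBox found in page inheritance chain")
    found.setdefault("CropBox", found["MediaBox"])
    found.setdefault("Rotate", 0)
    return found


# Hypothetical model of the inheritance example.
objects = {
    2: {"Type": "Pages", "Kids": [3, 4], "Count": 2,
        "MediaBox": [0, 0, 612, 792], "Resources": {"Font": {"F1": 10}}},
    3: {"Type": "Page", "Parent": 2},
    4: {"Type": "Page", "Parent": 2, "MediaBox": [0, 0, 792, 612]},
}

print(effective_attributes(objects, 3)["MediaBox"])  # [0, 0, 612, 792]
print(effective_attributes(objects, 4)["MediaBox"])  # [0, 0, 792, 612]
```

The `seen` set catches a cycle anywhere in the chain, not only one that returns to the starting page.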

Testing Strategies for PDF Page Ordering

Given the variety of possible document structures, thorough testing is essential when working with PDF page ordering.

Building Comprehensive Test Suites

# Comprehensive PDF test case generation script

# Test Case 1: Sequential pages (baseline)
echo "Creating sequential page test..."
pdftk A=template.pdf cat A A A output test-sequential.pdf

# Test Case 2: Non-sequential object IDs
echo "Creating non-sequential object ID test..."
pdftk A=page3.pdf B=page1.pdf C=page2.pdf cat A B C output test-nonsequential.pdf

# Test Case 3: Hierarchical page tree
echo "Creating hierarchical page tree test..."
# This requires custom PDF generation tool
generate-hierarchical-pdf --depth 3 --pages-per-node 2 output test-hierarchical.pdf

# Test Case 4: Large document with mixed structures
echo "Creating large document test..."
pdftk A=large-doc.pdf cat 1-100 50-149 200-299 output test-large-mixed.pdf

# Test Case 5: Corrupted page tree
echo "Creating corrupted page tree test..."
# This requires custom corruption tool
corrupt-pdf-structure --target pages-tree test-sequential.pdf test-corrupted.pdf

# Test Case 6: Minimal single-page document
echo "Creating minimal single-page test..."
pdftk A=template.pdf cat 1 output test-single-page.pdf

Automated Validation Framework

// Comprehensive PDF page ordering validation framework
type
  TPDFTestCase = record
    FileName: string;
    ExpectedPageCount: Integer;
    ExpectedPageOrder: array of Integer;
    Description: string;
  end;

// Declared first so that RunPDFPageOrderingTests can call it
function ValidateTestCase(const TestCase: TPDFTestCase): Boolean;
var
  Doc: TPDFDocument;
  ActualPages: TPageArray;
  i: Integer;
begin
  Result := False;
  Doc := TPDFDocument.Create;
  try
    if not Doc.LoadFromFile(TestCase.FileName) then
    begin
      WriteLn(Format('Failed to load %s', [TestCase.FileName]));
      Exit;
    end;

    ActualPages := GetPagesInCorrectOrder(Doc);

    // Validate page count
    if Length(ActualPages) <> TestCase.ExpectedPageCount then
    begin
      WriteLn(Format('Page count mismatch: expected %d, got %d',
        [TestCase.ExpectedPageCount, Length(ActualPages)]));
      Exit;
    end;

    // Validate page order (simplified - in real implementation,
    // you'd compare actual page content or identifiers)
    for i := 0 to High(ActualPages) do
    begin
      if not ValidatePageAtPosition(ActualPages[i], TestCase.ExpectedPageOrder[i]) then
      begin
        WriteLn(Format('Page order mismatch at position %d', [i]));
        Exit;
      end;
    end;

    Result := True;
  finally
    Doc.Free;
  end;
end;

function RunPDFPageOrderingTests: Boolean;
var
  TestCases: array of TPDFTestCase;
  i: Integer;
  PassCount, FailCount: Integer;
begin
  // Define test cases
  SetLength(TestCases, 6);

  TestCases[0].FileName := 'test-sequential.pdf';
  TestCases[0].ExpectedPageCount := 3;
  TestCases[0].ExpectedPageOrder := [0, 1, 2];
  TestCases[0].Description := 'Sequential page ordering';

  TestCases[1].FileName := 'test-nonsequential.pdf';
  TestCases[1].ExpectedPageCount := 3;
  TestCases[1].ExpectedPageOrder := [2, 0, 1]; // Based on how pdftk reorders
  TestCases[1].Description := 'Non-sequential object IDs';

  // ... define other test cases ...

  PassCount := 0;
  FailCount := 0;

  WriteLn('Running PDF page ordering tests...');
  WriteLn(StringOfChar('=', 50));

  for i := 0 to High(TestCases) do
  begin
    Write(Format('Test %d: %s... ', [i + 1, TestCases[i].Description]));
    if ValidateTestCase(TestCases[i]) then
    begin
      WriteLn('PASS');
      Inc(PassCount);
    end
    else
    begin
      WriteLn('FAIL');
      Inc(FailCount);
    end;
  end;

  WriteLn(StringOfChar('=', 50));
  WriteLn(Format('Results: %d passed, %d failed', [PassCount, FailCount]));
  Result := FailCount = 0;
end;
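The comparison at the heart of the framework, count first and then position by position, can be isolated into a small pure function that is easy to unit test. This is an illustrative sketch in Python (used so it runs on its own). The function name and the sample cases are invented for the example, and pages are represented by whatever identifiers the test chooses to compare.

```python
def first_mismatch(expected, actual):
    """Return None when the orders match, else a description of the first difference."""
    if len(expected) != len(actual):
        return f"page count mismatch: expected {len(expected)}, got {len(actual)}"
    for pos, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return f"page order mismatch at position {pos}: expected {want}, got {got}"
    return None


cases = [
    ("sequential", [0, 1, 2], [0, 1, 2]),
    ("non-sequential", [2, 0, 1], [0, 1, 2]),
]
for name, expected, actual in cases:
    problem = first_mismatch(expected, actual)
    print(name, "PASS" if problem is None else f"FAIL ({problem})")
# sequential PASS
# non-sequential FAIL (page order mismatch at position 0: expected 2, got 0)
```

Returning a description instead of a bare Boolean keeps the failure message next to the check that produced it.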

Future-Proofing Your PDF Processing Code

As PDF standards evolve and new use cases emerge, it is important to write code that can adapt to future requirements.

Designing for Extensibility

// Extensible PDF page processing architecture
type
  IPDFPageProcessor = interface
    ['{12345678-1234-1234-1234-123456789012}'] // placeholder GUID; generate a unique one
    function ProcessPage(Page: TPDFPage; Context: TPDFProcessingContext): Boolean;
    function GetProcessorName: string;
    function GetSupportedPDFVersions: TStringArray;
  end;

  TPDFProcessingPipeline = class
  private
    FProcessors: TInterfaceList; // keeps interface references alive correctly
    FContext: TPDFProcessingContext;
  public
    constructor Create;
    destructor Destroy; override;
    procedure RegisterProcessor(Processor: IPDFPageProcessor);
    procedure UnregisterProcessor(Processor: IPDFPageProcessor);
    function ProcessDocument(Doc: TPDFDocument): Boolean;
  end;

function TPDFProcessingPipeline.ProcessDocument(Doc: TPDFDocument): Boolean;
var
  Pages: TPageArray;
  i, j: Integer;
  Page: TPDFPage;
  Processor: IPDFPageProcessor;
  Success: Boolean;
begin
  Result := True;

  // Get pages in correct order using our robust method
  Pages := GetPagesInCorrectOrder(Doc);

  // Process each page through all registered processors
  for i := 0 to High(Pages) do
  begin
    Page := TPDFPage.CreateFromObject(Pages[i]);
    try
      FContext.CurrentPageIndex := i;
      FContext.TotalPages := Length(Pages);

      for j := 0 to FProcessors.Count - 1 do
      begin
        Processor := FProcessors[j] as IPDFPageProcessor;
        Success := Processor.ProcessPage(Page, FContext);
        if not Success then
        begin
          LogError(Format('Processor %s failed on page %d',
            [Processor.GetProcessorName, i + 1]));
          Result := False;
          // Continue with other processors/pages or break based on policy
        end;
      end;
    finally
      Page.Free;
    end;
  end;
end;
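The shape of this design, a list of interchangeable processors applied to pages in display order, carries over to other languages. Below is a minimal sketch in Python (used so it is runnable on its own) with a structural `Protocol` in place of the Delphi interface. All class and method names here are invented for illustration, and the pipeline follows a continue-on-failure policy, collecting failures instead of stopping.

```python
from typing import Protocol


class PageProcessor(Protocol):
    """Anything with a name and a process_page method can be registered."""

    name: str

    def process_page(self, page, index: int, total: int) -> bool: ...


class Pipeline:
    def __init__(self):
        self._processors = []

    def register(self, processor: PageProcessor) -> None:
        self._processors.append(processor)

    def process_document(self, pages):
        """Run every processor on every page; return (name, page number) failures."""
        failures = []
        for index, page in enumerate(pages):
            for processor in self._processors:
                if not processor.process_page(page, index, len(pages)):
                    failures.append((processor.name, index + 1))
        return failures


class NonEmptyCheck:
    """Example processor: fails on pages with no content."""

    name = "non-empty"

    def process_page(self, page, index, total):
        return bool(page)


pipeline = Pipeline()
pipeline.register(NonEmptyCheck())
print(pipeline.process_document(["a", "", "c"]))  # [('non-empty', 2)]
```

The pipeline receives pages that are already in display order, so individual processors never need to know how the page tree was traversed.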

The investment in a proper understanding of PDF structure pays off over the lifetime of an application through a reduced support burden, higher user satisfaction, and easier maintenance. PDF page ordering is not just a technical detail; it is a fundamental aspect of document integrity that directly affects the user experience. By mastering this complexity, you will build PDF applications that users can trust with their most important documents.
