Master PDF Structure: XML Metadata, Bookmarks & Annotations

Understanding PDF XML Metadata and Bookmarks: A Technical Guide

📖 15 min read PDF Development 🔧 Intermediate-Advanced

This comprehensive guide explores four essential topics related not to the visual appearance of a PDF document, but to the ancillary data which may be included for interactive, onscreen use of documents, and the metadata used to carry extra information with a document for use by programs in a PDF workflow.

Key Topics Covered

📍 Destinations

Precise location markers that define specific positions within PDF documents. These enable accurate navigation for bookmarks and hyperlinks, while document outlines provide hierarchical table-of-contents functionality.

📄 XML Metadata

Structured XML streams that provide comprehensive document metadata using standardized XMP formats, extending beyond basic document properties to include rich descriptive information.

📎 File Attachments

Complete file embedding capability that packages external resources directly within PDF documents, creating self-contained document bundles similar to email attachments.

📝 Annotations

Interactive overlay elements that add text, graphics, and clickable functionality to PDF pages without modifying the underlying content. Includes hyperlinks for seamless document navigation and various markup tools for enhanced reader interaction.

Bookmarks and Destinations

Document navigation relies on hierarchical bookmark structures, technically known as the document outline. This tree-organized system presents clickable entries—commonly chapter titles, section headers, and subsection names—that enable readers to jump quickly to specific parts of the document. Each bookmark entry combines display text with destination information that specifies exactly where the link should navigate.

Understanding Destinations

PDF destinations serve as precise location markers within a document, specifying which page to display, where on that page to position the view, and what zoom level to apply. You can create destinations in two ways: define them directly inline (which we’ll use in our examples for clarity) or reference them by name through a document-wide naming system. Most PDF readers present bookmarks in a navigation panel alongside the main document content.

Each destination uses an array structure where the specific elements vary based on the viewing behavior you want to achieve. Here are the main destination patterns available:

Destination Types Table

Note: “page” represents an indirect reference to a page object. By default, these destinations work with the page’s crop box boundaries, falling back to the media box when no crop box is defined.

Array	Description
[page /Fit]	Scales the page to fit completely within the viewer window, adjusting both width and height proportionally.
[page /FitH top]	Positions the specified top coordinate at the window’s top edge while scaling horizontally to fit the full page width.
[page /FitV left]	Aligns the specified left coordinate with the window’s left edge while scaling vertically to fit the full page height.
[page /XYZ left top zoom]	Positions the coordinates (left, top) at the window’s upper-left corner and applies the specified zoom factor. Null values preserve current settings for those parameters.
[page /FitR left bottom right top]	Zooms and positions the view to display the rectangular area defined by the left, bottom, right, and top coordinates.
[page /FitB]	Similar to /Fit, but scales based on the actual content boundaries instead of the defined crop box area.
[page /FitBH top]	Functions like /FitH but uses the content bounding box instead of the crop box for horizontal scaling calculations.
[page /FitBV left]	Operates like /FitV but calculates vertical scaling based on the content bounding box rather than the crop box boundaries.

Document Outline Structure

Document outlines create a hierarchical navigation structure that functions as an interactive table of contents for PDF viewers. This tree-like organization helps users quickly navigate through complex documents by providing a clear structural overview. The system relies on two fundamental object types:

Outline dictionary – The root of the outline hierarchy
Outline item dictionaries – Individual entries in the outline

Outline Dictionary Structure Table

Key	Value Type	Value
/Type	name	If present, must be /Outlines.
/First	indirect reference to dictionary	References the initial top-level outline entry in the document hierarchy. This field is mandatory when outline entries exist.
/Last	indirect reference to dictionary	References the final top-level outline entry in the document hierarchy. This field is mandatory when outline entries exist.
/Count	integer	Specifies how many outline entries are currently expanded across the entire outline tree. Can be omitted when no entries are in an open state.

Outline Item Implementation

Every outline item consists of a dictionary that specifies its display title, target destination, and relationships to other items in the hierarchy.

Let’s examine how a simple document outline is structured in PDF syntax:

8 0 obj
<</Type/Outlines/Count 4/First 9 0 R/Last 9 0 R>>
endobj
9 0 obj
<</Title(Chapter 1: Experiment A)/Count 3/Parent 8 0 R/First 12 0 R/Last 18 0 R>>
endobj
12 0 obj
<</Title(1: Introduction)/Count 0/Parent 9 0 R/Next 15 0 R>>
endobj
15 0 obj
<</Title(2: Methodology)/Count 0/Parent 9 0 R/Prev 12 0 R/Next 18 0 R>>
endobj
18 0 obj
<</Title(3: Result verification)/Count 0/Parent 9 0 R/Prev 15 0 R/>>
endobj

Outline Item Dictionary Structure Table

* denotes a required entry

Key	Value Type	Value
/Title*	text string	Text to be displayed for this entry.
/Parent*	indirect reference to dictionary	References this item’s parent within the outline hierarchy, which can be either another outline item or the root outline dictionary.
/Prev	indirect reference to dictionary	References the preceding sibling item at the same hierarchical level, when applicable.
/Next	indirect reference to dictionary	References the following sibling item at the same hierarchical level, when applicable.
/First	indirect reference to dictionary	References the initial child item under this entry, when child items exist.
/Last	indirect reference to dictionary	References the final child item under this entry, when child items exist.
/Count	integer	When the entry is expanded, indicates the count of visible descendant entries. When collapsed, stores a negative value representing the total number of hidden descendants that would become visible upon expansion.
/Dest	name, string or array	The destination. Arrays are destinations, names are references to entries in the /Dests entry in the document catalog, strings are references to entries in the /Dests entry in the document’s name dictionary.

XML Metadata

Modern PDF documents can incorporate sophisticated XML-based metadata streams that offer far more detailed and structured information than traditional document properties. This advanced metadata system leverages Adobe’s XMP (Extensible Metadata Platform) specification to provide standardized, machine-readable document descriptions that enhance searchability, organization, and automated processing capabilities.

XMP Metadata Structure

XMP metadata is packaged as an XML document that utilizes RDF (Resource Description Framework) syntax to organize and describe document properties in a standardized format. This metadata content is embedded within a dedicated stream object that includes proper type identification for PDF processors:

6 0 obj
<</Length 1235/Type/Metadata/Subtype/XML>>stream
<?xpacket begin="锘xBF" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26"
><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
><rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xmp="http://ns.adobe.com/xap/1.0/"
><pdf:Producer>losLab PDF Library</pdf:Producer>
<dc:creator>losLab.com</dc:creator>
<dc:title>Delphi PDF SDKs</dc:title>
<xmp:CreateDate>2025-06-29T10:46:27+08:00</xmp:CreateDate>
<xmp:ModifyDate>2025-06-29T10:58:57+08:00</xmp:ModifyDate>
<xmp:MetadataDate>2025-06-29T10:46:27+08:00</xmp:MetadataDate>
<dc:description>Delphi Development Library for PDF creation & editing</dc:description>
<xmp:CreatorTool>HotPDF Component</xmp:CreatorTool>
<dc:subject>PDF Developer Library for RAD Studio></dc:subject>
<pdf:Keywords>Delphi, PDF SDK, PDF Component</pdf:Keywords>
</rdf:Description>Robust Delphi PDF development library</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj

Standard Metadata Schemas

The XMP framework organizes metadata through well-established schema namespaces, each serving specific information categories:

📋 Dublin Core (dc:)

Basic bibliographic information

dc:title – Document title
dc:creator – Document author(s)
dc:subject – Document subject/keywords
dc:description – Document description
dc:format – MIME type

🏷️ XMP Basic (xmp:)

Core XMP properties

xmp:CreateDate – Creation date
xmp:ModifyDate – Modification date
xmp:CreatorTool – Creating application
xmp:MetadataDate – Metadata modification date

📄 PDF Schema (pdf:)

PDF-specific properties

pdf:Producer – PDF producer
pdf:Keywords – Document keywords
pdf:PDFVersion – PDF version

Integration with Document Catalog

The XML metadata stream is referenced from the document catalog:

1 0 obj
< < Type Catalog Pages 2 0 R Metadata 10 0 R Outlines 1 0 R>>
endobj

🎯 Best Practices for XML Metadata

Always include both document information dictionary and XMP metadata for maximum compatibility
Ensure metadata values are consistent between both locations
Use proper XML encoding (UTF-8) for international characters
Include creation and modification dates in ISO 8601 format
Validate XML structure to prevent parsing errors

File Attachments

PDF file attachments provide a convenient method for embedding external files directly within a PDF document, creating self-contained packages that include all necessary resources. These attachments can be associated with the entire document or linked to specific pages, depending on your needs. Most modern PDF viewers present these embedded files in a dedicated attachments panel, making it easy for users to access, view, or save the included content. This feature is particularly valuable for creating comprehensive document packages, such as presentations that include supplementary resources or reports with accompanying data files.

Embedded File Structure

At its core, an embedded file consists of a stream object that contains the actual file data, along with a stream dictionary entry specifying /Type /EmbeddedFile. This straightforward approach allows any type of file to be stored within a PDF. Here’s how the basic embedded file structure looks:

8 0 obj
< < Type EmbeddedFile Length 35>>
stream
This is a text file attachment...
endstream
endobj

PDF supports two distinct approaches for referencing embedded files, each serving different use cases: document-level attachments that are globally accessible, and page-level attachments that appear as interactive elements on specific pages.

Document-Level Attachments

For document-wide attachments, you’ll need to add an /EmbeddedFiles entry to the name dictionary, which is accessed through the /Names entry in the document catalog. This approach makes the attachment globally available throughout the entire PDF, regardless of which page the user is currently viewing:

9 0 obj
< < Names << EmbeddedFiles << Names [ (attachment.txt) << EF << F 8 0 R>> /F (attachment.txt) /Type /Filespec >> ] >>
     >>
   /Pages 1 0 R
   /Type /Catalog >>
endobj

Code Structure Explanation

/Names – Contains the name dictionary for the document
/EmbeddedFiles – Specifically handles embedded file names
(attachment.txt) – The filename as it appears to users
/EF – Embedded file dictionary containing the actual file reference
/F 8 0 R – Reference to the embedded file stream object
/Type /Filespec – Identifies this as a file specification dictionary

Page-Level Attachments

Page-specific attachments require a different approach using file attachment annotations. These are added to the /Annots array in the target page’s dictionary, creating a visible attachment icon that users can interact with directly on the page:

9 0 obj
< < Type Page (Other dictionary entries as usual) Annots [ << FS << EF << F 8 0 R>> /F (attachment.txt) /Type /Filespec >>
          /Subtype /FileAttachment
          /Contents (attachment.txt)
          /Rect [ 18 796.88976378 45 823.88976378 ]
       >> ]
>>
endobj

Page Attachment Properties

/FS – File specification dictionary (same as /EF above)
/Subtype /FileAttachment – Identifies this annotation as a file attachment
/Contents – Tooltip text that appears when hovering over the attachment icon
/Rect – Rectangle defining the position and size of the attachment icon on the page

Attachment Use Cases

📊 Data Files

Embed spreadsheets, databases, or raw data files alongside reports and analyses

🎨 Source Files

Include original design files, CAD drawings, or editable templates

📹 Media Resources

Attach video presentations, audio recordings, or interactive content

📋 Supporting Documents

Bundle related PDFs, contracts, or reference materials

Annotations

PDF annotations provide a powerful way to add interactive elements and visual markup to documents without altering the original page content. These overlay elements enhance the reading experience by allowing users to highlight text, add comments, or create clickable links. Among the most useful annotation types are hyperlinks, which enable seamless navigation between different sections of a document or to external resources.

Annotation Structure

While different annotation types serve various purposes, they all follow a consistent foundational structure with type-specific properties added as needed. PDF pages can contain multiple annotations, which are organized in an array referenced by the /Annots entry in each page’s dictionary. Every annotation is implemented as its own dictionary object with specific properties.

Annotation Dictionary Structure Table

* denotes a required entry

Key	Value Type	Value
/Type	name	When specified, this value must be set to /Annot to properly identify the dictionary type.
/Subtype*	name	Specifies the specific annotation category (e.g., Link, Text, Highlight).
/Rect*	rectangle	Defines the annotation’s position and dimensions using standard PDF coordinate units.
/Contents	text string	Contains the annotation’s text content or provides an alternative descriptive label for accessibility purposes.

The fundamental annotation dictionary example:

12 0 obj
< < Type Annot Subtype Link Rect [100 200 300 250] Border [0 0 1] C [0.0 0.0 1.0] Dest [5 0 R XYZ null null null]>>
endobj

Common Annotation Types

🔗 Link Annotations

Create clickable areas that navigate to destinations within the document or external resources.

/Subtype /Link – Identifies as a link annotation
/Dest – Destination array or named destination
/A – Action dictionary for more complex behaviors

📝 Text Annotations

Display pop-up notes and comments that appear when clicked.

/Subtype /Text – Identifies as a text annotation
/Contents – The text content of the annotation
/Open – Whether the annotation is initially open

🖍️ Markup Annotations

Highlight, underline, or strike through text content.

/Subtype /Highlight – Text highlighting
/Subtype /Underline – Text underlining
/Subtype /StrikeOut – Text strikethrough

Advanced Link Actions

Link annotations can perform various actions beyond simple navigation:

13 0 obj
< < Type Annot Subtype Link Rect [50 50 200 100] A << Type Action S URI URI (https: www.example.com)>> >>
endobj

Action Types

/S /GoTo – Navigate to a destination within the document
/S /GoToR – Navigate to a destination in another document
/S /URI – Open a web URL
/S /Launch – Launch an external application
/S /JavaScript – Execute JavaScript code

Annotation Appearance

Custom visual styling for annotations is achieved through appearance streams, enabling precise control over how annotations display to users:

14 0 obj
< < Type Annot Subtype Square Rect [100 100 200 150] C [1.0 0.0 0.0] BS << W 2 S S>>
   /AP < < N 15 0 R>> >>
endobj

15 0 obj
< < Type XObject Subtype Form BBox [0 0 100 50] Length 85>>
stream
q
1.0 0.0 0.0 RG
2 w
10 10 80 30 re
S
Q
endstream
endobj

Practical Implementation Guidelines

Document Structure Integration

Successful implementation requires understanding how these elements work together within the broader PDF document architecture:

1 0 obj
< < Type Catalog Pages 2 0 R Outlines 3 0 R Names << EmbeddedFiles 4 0 R>>
   /Metadata 5 0 R >>
endobj

✅ Implementation Checklist

Document Catalog Setup – Ensure proper references to outlines, names, and metadata
Object Numbering – Maintain consistent object numbering and cross-references
Stream Encoding – Apply appropriate filters and encoding for streams
Validation – Verify PDF structure with validation tools
Compatibility Testing – Test across different PDF viewers and versions

Common Issues and Solutions

❌ Bookmarks Not Displaying

Solution: Verify that the document catalog includes an /Outlines entry and that the outline hierarchy is properly structured with correct parent-child relationships.

❌ Metadata Not Recognized

Solution: Ensure the XML metadata stream is properly formatted, uses correct namespaces, and is referenced in the document catalog with /Type /Metadata and /Subtype /XML.

❌ Attachments Not Accessible

Solution: Check that embedded files are properly referenced in either the document-level names dictionary or page-level annotations dictionary, and that file specification dictionaries are correctly structured.

Conclusion

Mastering PDF metadata and bookmark implementation is crucial for developing professional-grade documents that offer superior user experience and functionality. These powerful features provide:

Enhanced Navigation – Through well-structured bookmarks and destinations
Rich Metadata – Enabling better document management and searchability
File Integration – Bundling related resources within documents
Interactive Elements – Creating engaging user experiences with annotations

By implementing these features correctly, you can create PDF documents that go beyond simple text and graphics to become comprehensive, interactive resources that serve both human readers and automated systems effectively.

🚀 Next Steps

Practice implementing these structures in your PDF creation workflow
Experiment with different annotation types and bookmark hierarchies
Test your implementations across multiple PDF viewers
Explore advanced PDF features building on these foundations

In-depth exploration of PDF document structure

Understanding the Inner Structure of PDF Welcome to the fascinating world of PDF internals! Have you ever wondered what makes a PDF file tick? Beyond the familiar documents we view daily lies a sophisticated architecture that has revolutionized digital document sharing. In this comprehensive exploration, we'll peel back the layers…

June 25, 2025

In "PDF Internals"

losLab PDF Library: A Comprehensive Feature Guide

Unleashing the Power of losLab PDF Library: A Comprehensive Feature Guide losLab PDF Library is a robust PDF Software Development Kit (SDK) that provides an extensive range of functionalities for handling PDF files. This guide will explore the myriad features offered by our PDF Developer Library, designed to meet the…

June 15, 2024

In "PDF Programming"

Understanding PDF: The Universal Document Format

PDF - The Document Format That Changed Everything Every day, millions of people open PDF files without giving it a second thought. But this ubiquitous format revolutionized how we share documents, ensuring that what you see on your screen matches exactly what someone else sees on theirs—whether they're using a…

June 25, 2025

In "PDF Internals"

losLab

Devoted to developing PDF and Spreadsheet developer library, including PDF creation, PDF manipulation, PDF rendering library, and Excel Spreadsheet creation & manipulation library.

Next PDF快速网页查看优化介绍：PDF线性化应用 »

Previous « PDF Text and Font Handling with Code Examples and Best Practices

HotPDF Delphi组件：在PDF文档中创建垂直文本布局

HotPDF Delphi组件：在PDF文档中创建垂直文本布局本综合指南演示了HotPDF组件如何让开发者轻松在PDF文档中生成Unicode垂直文本。理解垂直排版（縦書き/세로쓰기/竖排）垂直排版，也称为垂直书写，中文称为縱書，日文称为tategaki（縦書き），是一种起源于2000多年前古代中国的传统文本布局方法。这种书写系统从上到下、从右到左流动，创造出具有深厚文化意义的独特视觉外观。历史和文化背景垂直书写系统在东亚文学和文献中发挥了重要作用：中国：传统中文文本、古典诗歌和书法主要使用垂直布局。现代简体中文主要使用横向书写，但垂直文本在艺术和仪式场合仍然常见。日本：日语保持垂直（縦書き/tategaki）和水平（横書き/yokogaki）两种书写系统。垂直文本仍广泛用于小说、漫画、报纸和传统文档。韩国：历史上使用垂直书写（세로쓰기），但现代韩语（한글）主要使用水平布局。垂直文本出现在传统场合和艺术应用中。越南：传统越南文本在使用汉字（Chữ Hán）书写时使用垂直布局，但随着拉丁字母的采用，这种做法已基本消失。垂直文本的现代应用尽管全球趋向于水平书写，垂直文本布局在几个方面仍然相关：出版：台湾、日本和香港的传统小说、诗集和文学作品…

2 days ago

PDF 프로그래밍

HotPDF Delphi 컴포넌트: PDF 문서에서 세로쓰기

HotPDF Delphi 컴포넌트: PDF 문서에서 세로쓰기 텍스트 레이아웃 생성 이 포괄적인 가이드는 HotPDF 컴포넌트를 사용하여…

2 days ago

PDFプログラミング

HotPDF Delphiコンポーネント-PDFドキュメントでの縦書き

HotPDF Delphiコンポーネント：PDFドキュメントでの縦書きテキストレイアウトの作成この包括的なガイドでは、HotPDFコンポーネントを使用して、開発者がPDFドキュメントでUnicode縦書きテキストを簡単に生成する方法を実演します。縦書き組版の理解（縦書き/세로쓰기/竖排）縦書き組版は、日本語では縦書きまたはたてがきとも呼ばれ、2000年以上前の古代中国で生まれた伝統的なテキストレイアウト方法です。この書字体系は上から下、右から左に流れ、深い文化的意義を持つ独特の視覚的外観を作り出します。歴史的・文化的背景縦書きシステムは東アジアの文学と文書において重要な役割を果たしてきました：中国：伝統的な中国語テキスト、古典詩、書道では主に縦書きレイアウトが使用されていました。現代の簡体字中国語は主に横書きを使用していますが、縦書きテキストは芸術的・儀式的な文脈で一般的です。日本：日本語は縦書き（縦書き/たてがき）と横書き（横書き/よこがき）の両方の書字体系を維持しています。縦書きテキストは小説、漫画、新聞、伝統的な文書で広く使用されています。韓国：歴史的には縦書き（세로쓰기）を使用していましたが、現代韓国語（한글）は主に横書きレイアウトを使用しています。縦書きテキストは伝統的な文脈や芸術的応用で見られます。ベトナム：伝統的なベトナム語テキストは漢字（Chữ Hán）で書かれた際に縦書きレイアウトを使用していましたが、この慣行はラテン文字の採用とともにほぼ消失しました。縦書きテキストの現代的応用横書きへの世界的な傾向にもかかわらず、縦書きテキストレイアウトはいくつかの文脈で関連性を保っています：出版：台湾、日本、香港の伝統的な小説、詩集、文学作品…

2 days ago

Программирование PDF

Отладка проблем порядка страниц PDF: Реальный кейс-стади

Отладка проблем порядка страниц PDF: Реальный кейс-стади компонента HotPDF Опубликовано losLab | Разработка PDF |…

3 days ago

PDF 프로그래밍

PDF 페이지 순서 문제 디버깅: HotPDF 컴포넌트 실제 사례 연구

PDF 페이지 순서 문제 디버깅: HotPDF 컴포넌트 실제 사례 연구 발행자: losLab | PDF 개발…

4 days ago

PDFプログラミング

PDFページ順序問題のデバッグ：HotPDFコンポーネント実例研究

PDFページ順序問題のデバッグ：HotPDFコンポーネント実例研究発行者：losLab | PDF開発 | Delphi PDFコンポーネント PDF操作は特にページ順序を扱う際に複雑になることがあります。最近、私たちはPDF文書構造とページインデックスに関する重要な洞察を明らかにした魅力的なデバッグセッションに遭遇しました。このケーススタディは、一見単純な「オフバイワン」エラーがPDF仕様の深い調査に発展し、文書構造に関する根本的な誤解を明らかにした過程を示しています。 PDFページ順序の概念 - 物理的オブジェクト順序と論理的ページ順序の関係問題私たちはHotPDF DelphiコンポーネントのCopyPageと呼ばれるPDFページコピーユーティリティに取り組んでいました。このプログラムはデフォルトで最初のページをコピーするはずでしたが、代わりに常に2番目のページをコピーしていました。一見すると、これは単純なインデックスバグのように見えました -…

4 days ago

Master PDF Structure: XML Metadata, Bookmarks & Annotations

Key Topics Covered

📍 Destinations

📄 XML Metadata

📎 File Attachments

📝 Annotations

Bookmarks and Destinations

Understanding Destinations

Destination Types Table

Document Outline Structure

Outline Dictionary Structure Table

Outline Item Implementation

Outline Item Dictionary Structure Table

XML Metadata

XMP Metadata Structure

Standard Metadata Schemas

📋 Dublin Core (dc:)

🏷️ XMP Basic (xmp:)

📄 PDF Schema (pdf:)

Integration with Document Catalog

🎯 Best Practices for XML Metadata

File Attachments

Embedded File Structure

Document-Level Attachments

Code Structure Explanation

Page-Level Attachments

Page Attachment Properties

Attachment Use Cases

📊 Data Files

🎨 Source Files

📹 Media Resources

📋 Supporting Documents

Annotations

Annotation Structure

Annotation Dictionary Structure Table

Common Annotation Types

🔗 Link Annotations

📝 Text Annotations

🖍️ Markup Annotations

Advanced Link Actions

Action Types

Annotation Appearance

Practical Implementation Guidelines

Document Structure Integration

✅ Implementation Checklist

Common Issues and Solutions

❌ Bookmarks Not Displaying

❌ Metadata Not Recognized

❌ Attachments Not Accessible

Conclusion

🚀 Next Steps

Related

Recent Posts

Headline