Low-Level API

These functions allow direct manipulation of the PDF object graph.

Object Access

get_object(doc: Doc, iref)

Resolves an indirect reference to its actual object data.

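  • iref: An indirect reference, represented as a Python complex number. The imaginary part is the object number and the real part is the generation number (e.g., 1j is Object 1, Generation 0; 2 + 3j is Object 3, Generation 2).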
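The complex-number encoding can be unpacked with ordinary Python arithmetic. The following is a minimal sketch that does not use the library; the helper name split_iref is hypothetical and only illustrates the convention described above.

```python
def split_iref(iref: complex) -> tuple[int, int]:
    """Return (object number, generation number) for a complex-encoded reference.

    Hypothetical helper for illustration; not part of the documented API.
    """
    # The imaginary part carries the object number, the real part the generation.
    return int(iref.imag), int(iref.real)


print(split_iref(1j))      # (1, 0)
print(split_iref(2 + 3j))  # (3, 2)
```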

trailer(doc: Doc) -> dict

Returns the Trailer dictionary of the document. Its Root entry references the Root catalog, so the trailer is the entry point for navigating the object graph.

catalog(doc: Doc) -> tuple

Returns a tuple containing the Root Catalog dictionary and its indirect reference.
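The path from the trailer to the catalog can be pictured with a toy object store. This sketch does not call the library: the plain dict layout, the "/Root" style key names, and the function find_catalog are all assumptions made for illustration, and the real Doc type may store objects quite differently.

```python
# Toy object store: indirect references (complex numbers) mapped to objects.
# Assumed layout for illustration only.
toy_objects = {
    1j: {"/Type": "/Catalog", "/Pages": 2j},
    2j: {"/Type": "/Pages", "/Kids": [], "/Count": 0},
}
toy_trailer = {"/Size": 3, "/Root": 1j}


def find_catalog(objects: dict, trailer: dict) -> tuple[dict, complex]:
    """Follow the trailer's Root entry and return (catalog, reference)."""
    root_ref = trailer["/Root"]
    return objects[root_ref], root_ref


cat, cat_ref = find_catalog(toy_objects, toy_trailer)
print(cat["/Type"], cat_ref)
```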

Object Modification

update_object(doc: Doc, obj_num: int, new_obj) -> Doc

Replaces the existing object numbered obj_num with new_obj and returns the resulting document. The change is recorded as a new revision.

add_object(doc: Doc, new_obj) -> tuple[Doc, complex]

Adds a completely new object to the document. Returns the new document and the reference (iref) of the newly created object.
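Both functions return a new document rather than modifying their argument. The sketch below models that style with a toy document, a list of revision dicts keyed by object number. It is not the library's implementation: toy_add_object and toy_update_object are hypothetical names, and recording every edit as its own revision is a simplifying assumption.

```python
def toy_add_object(doc: list, new_obj) -> tuple[list, complex]:
    """Return (new_doc, iref); the input document is left untouched."""
    # Pick the next free object number across all revisions.
    next_num = 1 + max((n for rev in doc for n in rev), default=0)
    return doc + [{next_num: new_obj}], complex(0, next_num)


def toy_update_object(doc: list, obj_num: int, new_obj) -> list:
    """Return a new document whose latest revision replaces obj_num."""
    if not any(obj_num in rev for rev in doc):
        raise KeyError(obj_num)
    return doc + [{obj_num: new_obj}]


doc0 = [{1: {"/Type": "/Catalog"}}]
doc1, ref = toy_add_object(doc0, {"/Title": "Draft"})
doc2 = toy_update_object(doc1, 2, {"/Title": "Final"})

print(ref)                                # 2j
print(len(doc0), len(doc1), len(doc2))    # 1 2 3
```

Because each call returns a fresh value, doc0 and doc1 remain usable after the edits, which is what makes reverting to an earlier state straightforward.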

Page Tree

flat_page_tree(doc: Doc) -> list

Traverses the internal page tree (which can be nested) and returns a flat list of page references and their inherited attributes (like Resources or MediaBox).

pages(doc: Doc) -> list

Returns a list of Page dictionaries in which every inherited attribute has been resolved and merged into the page itself.

Revision Management

rewind(doc: Doc) -> Doc

Reverts the document to the previous revision (undoing the last incremental update).

squash(doc: Doc) -> Doc

Consolidates all incremental updates into a single revision (revision 0). This is useful for cleaning up a file with a long history of edits.
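The effect of the two operations can be illustrated on a toy history, a list of revision dicts ordered oldest first. The functions toy_rewind and toy_squash are hypothetical and only model the behavior described above; in particular, leaving a single-revision history unchanged on rewind is a choice made for this sketch, not documented behavior.

```python
def toy_rewind(revisions: list[dict]) -> list[dict]:
    """Drop the most recent revision, keeping at least the first one."""
    return revisions[:-1] if len(revisions) > 1 else list(revisions)


def toy_squash(revisions: list[dict]) -> list[dict]:
    """Merge every revision into one; later revisions win on conflicts."""
    merged = {}
    for rev in revisions:  # oldest first
        merged.update(rev)
    return [merged]


history = [
    {1: "catalog v1", 2: "page v1"},  # revision 0
    {2: "page v2"},                   # revision 1
    {3: "annotation"},                # revision 2
]

print(toy_rewind(history))  # [{1: 'catalog v1', 2: 'page v1'}, {2: 'page v2'}]
print(toy_squash(history))  # [{1: 'catalog v1', 2: 'page v2', 3: 'annotation'}]
```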