Low-Level API
These functions allow direct manipulation of the PDF object graph.
Object Access
get_object(doc: Doc, iref)
Resolves an indirect reference to its actual object data.
iref: Represented as a complex number in Python, where the imaginary part is the object number and the real part is the generation number (e.g., 1j represents Object 1, and 2 + 3j represents Object 3, Generation 2).
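As a minimal sketch of the encoding described above, the snippet below packs and unpacks references using plain Python complex numbers. The helpers make_iref and split_iref are hypothetical names introduced only for illustration; they are not part of the documented API.

```python
def make_iref(obj_num: int, gen: int = 0) -> complex:
    """Hypothetical helper: generation goes in the real part,
    object number in the imaginary part."""
    return complex(gen, obj_num)


def split_iref(iref: complex) -> tuple:
    """Hypothetical helper: return (object number, generation)."""
    return int(iref.imag), int(iref.real)


ref_a = 1j       # Object 1, generation 0
ref_b = 2 + 3j   # Object 3, generation 2

print(split_iref(ref_a))
print(split_iref(ref_b))
```

Because complex numbers are hashable, such references can also be used directly as dictionary keys.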
trailer(doc: Doc) -> dict
Returns the Trailer dictionary of the document. This is the entry point for finding the Root catalog.
catalog(doc: Doc) -> tuple
Returns a tuple containing the Root Catalog dictionary and its indirect reference.
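The relationship between the trailer and the catalog can be illustrated with a toy model. In a PDF file the trailer's /Root entry is an indirect reference to the catalog, so finding the catalog amounts to one lookup. The object store, toy_trailer, and toy_catalog below are stand-ins invented for this sketch, and the (dictionary, reference) return order is an assumption based on the description above; none of this is the library's actual implementation.

```python
# Toy object store (assumption): keys are irefs, values are plain dicts.
objects = {
    1j: {"/Type": "/Catalog", "/Pages": 2j},
    2j: {"/Type": "/Pages", "/Kids": [], "/Count": 0},
}

# Toy trailer: /Root points at the catalog object.
toy_trailer = {"/Size": 3, "/Root": 1j}


def toy_catalog(trailer: dict, store: dict) -> tuple:
    """Follow /Root from the trailer and return (catalog dict, catalog iref)."""
    root_ref = trailer["/Root"]
    return store[root_ref], root_ref


cat, cat_ref = toy_catalog(toy_trailer, objects)
print(cat_ref, cat["/Type"])
```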
Object Modification
update_object(doc: Doc, obj_num: int, new_obj) -> Doc
Updates an existing object in the document. This creates a new revision where the object at obj_num is replaced by new_obj.
add_object(doc: Doc, new_obj) -> tuple[Doc, complex]
Adds a completely new object to the document. Returns the new document and the reference (iref) of the newly created object.
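Both modification functions return a new Doc rather than changing the one passed in. The sketch below mimics that style with a toy immutable document. ToyDoc, toy_update_object, and toy_add_object are hypothetical names, and the rule for picking the next object number is an assumption, not the library's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDoc:
    """Toy stand-in for Doc: maps object number -> object value."""
    objects: dict


def toy_update_object(doc: ToyDoc, obj_num: int, new_obj) -> ToyDoc:
    """Return a new ToyDoc in which obj_num is replaced by new_obj."""
    if obj_num not in doc.objects:
        raise KeyError(f"object {obj_num} does not exist")
    return ToyDoc({**doc.objects, obj_num: new_obj})


def toy_add_object(doc: ToyDoc, new_obj) -> tuple:
    """Return (new ToyDoc, iref of the added object)."""
    # Assumption: the new object takes the next unused object number.
    obj_num = max(doc.objects, default=0) + 1
    return ToyDoc({**doc.objects, obj_num: new_obj}), complex(0, obj_num)


doc0 = ToyDoc({1: {"/Type": "/Catalog"}})
doc1 = toy_update_object(doc0, 1, {"/Type": "/Catalog", "/Lang": "en"})
doc2, new_ref = toy_add_object(doc1, {"/Type": "/Pages"})
print(new_ref)
```

The original doc0 is left untouched, so earlier states stay available to the caller.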
Page Tree
flat_page_tree(doc: Doc) -> list
Traverses the internal page tree (which can be nested) and returns a flat list of page references and their inherited attributes (like Resources or MediaBox).
pages(doc: Doc) -> list
Returns a list of Page dictionaries with all inherited attributes resolved/merged.
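To make the inheritance behavior concrete, here is a minimal sketch of flattening a nested page tree over a toy object store. The names toy_flatten and toy_pages, the store layout, and the exact set of inheritable keys handled are assumptions for illustration; the library's own traversal may differ.

```python
# Page attributes that the PDF format allows pages to inherit from ancestors.
INHERITABLE = ("/Resources", "/MediaBox", "/CropBox", "/Rotate")


def toy_flatten(store: dict, node_ref: complex, inherited=None) -> list:
    """Return [(page iref, attributes in effect for that page), ...]."""
    inherited = dict(inherited or {})
    node = store[node_ref]
    # Values set on this node override values from its ancestors.
    for key in INHERITABLE:
        if key in node:
            inherited[key] = node[key]
    if node.get("/Type") == "/Page":
        return [(node_ref, inherited)]
    flat = []
    for kid in node.get("/Kids", []):
        flat.extend(toy_flatten(store, kid, inherited))
    return flat


def toy_pages(store: dict, root_ref: complex) -> list:
    """Return page dicts with inherited attributes merged in."""
    return [{**attrs, **store[ref]} for ref, attrs in toy_flatten(store, root_ref)]


store = {
    2j: {"/Type": "/Pages", "/Kids": [3j, 4j], "/MediaBox": [0, 0, 612, 792]},
    3j: {"/Type": "/Page"},
    4j: {"/Type": "/Pages", "/Kids": [5j], "/Rotate": 90},
    5j: {"/Type": "/Page", "/MediaBox": [0, 0, 300, 300]},
}

for ref, attrs in toy_flatten(store, 2j):
    print(ref, attrs)
```

In this toy tree, page 3 picks up the root's MediaBox, while page 5 keeps its own MediaBox and inherits Rotate from its intermediate parent.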
Revision Management
rewind(doc: Doc) -> Doc
Reverts the document to the previous revision (undoing the last incremental update).
squash(doc: Doc) -> Doc
Consolidates all incremental updates into a single revision (revision 0). This is useful for cleaning up a file with a long history of edits.
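The two revision operations can be pictured with a toy history in which each revision records only the objects it changed. The functions toy_rewind and toy_squash and the list-of-dicts representation are hypothetical; in particular, what the real rewind does on a document that has only one revision is not stated here, so the sketch simply keeps revision 0.

```python
def toy_rewind(revisions: list) -> list:
    """Drop the most recent revision (assumption: revision 0 is always kept)."""
    return revisions[:-1] if len(revisions) > 1 else list(revisions)


def toy_squash(revisions: list) -> list:
    """Merge every revision into one; later revisions win for each object."""
    merged = {}
    for rev in revisions:
        merged.update(rev)
    return [merged]


history = [
    {1: "catalog v1", 2: "pages"},  # revision 0
    {1: "catalog v2"},              # revision 1: updates object 1
    {3: "new page"},                # revision 2: adds object 3
]

print(len(toy_rewind(history)))
print(toy_squash(history))
```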