High-Level API

These functions are available directly under the pdfsyntax package.

File I/O

readfile(filename: str) -> Doc

Reads a PDF file from disk and initializes a Doc object.

writefile(doc: Doc, filename: str = None)

Writes the current state of the Doc object to disk. If filename is omitted, writes to sys.stdout.

Document Info

metadata(doc: Doc) -> dict

Returns standard metadata from the PDF Info dictionary (Title, Author, Subject, etc.).

structure(doc: Doc) -> dict

Returns technical details: PDF version, page count, revision count, encryption status, and linearization status.

Transformations

rotate(doc: Doc, degrees: int = 90, pages: list = []) -> Doc

Rotates specified pages by 90, 180, or 270 degrees.

  • degrees: Integer (must be a multiple of 90).
  • pages: A list of page indices (0-based) to rotate. If empty, rotates all pages.
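
The angle arithmetic behind a rotation can be sketched in a few lines. This is only an illustration: it assumes the rotation is applied relative to a page's existing /Rotate value, which the description above does not state, and combined_rotation is a hypothetical helper, not part of pdfsyntax.

```python
def combined_rotation(current: int, degrees: int) -> int:
    """Return the /Rotate value after turning a page by `degrees`.

    Hypothetical helper for illustration; assumes rotation is relative
    to the page's current /Rotate entry.
    """
    # PDF only allows page rotations that are multiples of 90 degrees.
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    # Wrap into the range 0..359 so that 270 + 180 becomes 90.
    return (current + degrees) % 360
```

With this sketch, a page already rotated by 270 degrees and turned a further 180 degrees ends up at 90, and a negative angle such as -90 wraps around to 270.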

remove_pages(doc: Doc, page_indices: set) -> Doc

Removes the pages whose indices appear in page_indices.

keep_pages(doc: Doc, page_indices: set) -> Doc

Keeps only the pages whose indices appear in page_indices and removes all others.
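
keep_pages and remove_pages are complements of each other: keeping a set of pages amounts to removing every other index. The sketch below shows that relationship with plain Python sets. It assumes 0-based indices (as documented for rotate) and a known page count; indices_to_remove is a hypothetical helper, not a pdfsyntax function.

```python
def indices_to_remove(page_count: int, keep: set) -> set:
    """Return the indices that must be removed so only `keep` remains.

    Hypothetical helper; assumes pages are numbered 0 .. page_count - 1.
    """
    all_pages = set(range(page_count))
    # Set difference: everything in the document that is not being kept.
    return all_pages - set(keep)
```

For a five-page document, keeping pages {0, 2} corresponds to removing {1, 3, 4}.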

concat(doc_left: Doc, doc_right: Doc) -> Doc

Appends doc_right to the end of doc_left. The + operator does the same thing: new_doc = doc1 + doc2.
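
Because each transformation returns a Doc, the functions above can be chained. The following sketch uses only the signatures listed in this section; the function name build_combined and the file paths passed to it are hypothetical, indices are assumed to be 0-based for keep_pages as they are for rotate, and the code has not been checked against a particular pdfsyntax release.

```python
def build_combined(cover_path: str, body_path: str, out_path: str) -> None:
    """Take the first page of one PDF, append a second PDF, and save the result."""
    # Imported lazily so this module can be loaded without pdfsyntax installed.
    import pdfsyntax as pdf

    cover = pdf.readfile(cover_path)
    body = pdf.readfile(body_path)

    # Keep only the first page of the cover file (assumes 0-based indices).
    cover = pdf.keep_pages(cover, {0})
    # Rotate only page index 0 of the body by 180 degrees.
    body = pdf.rotate(body, 180, [0])

    # Per the description above, + is equivalent to pdf.concat(cover, body).
    combined = cover + body
    pdf.writefile(combined, out_path)
```

A call such as build_combined("cover.pdf", "body.pdf", "combined.pdf") would then write the merged document to disk.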

Text Extraction

extract_page_text(doc: Doc, page_num: int) -> str

Extracts text from a specific page. It attempts to preserve spatial layout by inserting spaces and newlines to approximate the visual rendering.
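
The idea of approximating layout with spaces and newlines can be illustrated with a toy example. This is not pdfsyntax's actual algorithm, which this section does not describe. It is a minimal sketch that assumes text fragments arrive as (x, y, text) tuples in PDF points, with a fixed character width and line height chosen arbitrarily for the illustration.

```python
def layout_text(fragments, char_width=6.0, line_height=12.0):
    """Arrange (x, y, text) fragments on a character grid.

    Illustrative only: all names and default sizes are assumptions.
    """
    # Group fragments into rows by quantizing their y coordinate.
    rows = {}
    for x, y, text in fragments:
        row_key = round(y / line_height)
        rows.setdefault(row_key, []).append((x, text))

    lines = []
    # PDF y grows upward, so larger row keys are nearer the top of the page.
    for row_key in sorted(rows, reverse=True):
        line = ""
        for x, text in sorted(rows[row_key]):
            column = int(x / char_width)
            if len(line) < column:
                # Pad with spaces up to the fragment's target column.
                line += " " * (column - len(line))
            elif line:
                # Fragment overlaps earlier text: separate with one space.
                line += " "
            line += text
        lines.append(line)
    return "\n".join(lines)
```

Given two rows of a small table, the fragments in each row start at the same character column, so the columns line up in the plain-text output.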

Annotations

add_text_annotation(doc: Doc, page_num: int, text: str, rect: list, opened: bool = False) -> Doc

Adds a sticky note (Text Annotation) to a page.

  • rect: A list of 4 integers [x, y, width, height] defining the annotation area.
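
In the PDF format itself, an annotation's /Rect entry is stored as two opposite corners, [llx, lly, urx, ury], rather than as an origin plus a size. When coordinates come from a tool that reports corners, a small conversion helps produce the [x, y, width, height] form described above. Both helpers below are hypothetical and not part of pdfsyntax; whether add_text_annotation performs such a conversion internally is not stated here.

```python
def size_to_corners(rect):
    """Convert [x, y, width, height] to [llx, lly, urx, ury]."""
    x, y, width, height = rect
    return [x, y, x + width, y + height]


def corners_to_size(corners):
    """Convert [llx, lly, urx, ury] to [x, y, width, height]."""
    llx, lly, urx, ury = corners
    return [llx, lly, urx - llx, ury - lly]
```

For example, a 20 by 20 note anchored at (100, 700) corresponds to the corner form [100, 700, 120, 720], and converting back returns the original list.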