High-Level API
These functions are available directly under the pdfsyntax package.
File I/O
readfile(filename: str) -> Doc
Reads a PDF file from disk and initializes a Doc object.
writefile(doc: Doc, filename: str = None)
Writes the current state of the Doc object to disk. If filename is omitted, writes to sys.stdout.
Document Info
metadata(doc: Doc) -> dict
Returns standard metadata from the PDF Info dictionary (Title, Author, Subject, etc.).
structure(doc: Doc) -> dict
Returns technical details: PDF version, page count, revision count, encryption status, and linearization status.
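A minimal sketch of how the two calls might be combined to print a quick summary of a file. It relies only on the signatures listed above. The helper name summarize and its path argument are hypothetical, and no particular dictionary keys are assumed.

```python
def summarize(path):
    """Return 'section.key: value' lines for a PDF's metadata and structure."""
    # Imported lazily so the sketch can be loaded without pdfsyntax installed.
    import pdfsyntax as pdf

    doc = pdf.readfile(path)
    lines = []
    # Both calls are documented as returning plain dicts, so iterate over
    # whatever keys are present instead of assuming specific ones.
    for section, info in (("metadata", pdf.metadata(doc)),
                          ("structure", pdf.structure(doc))):
        for key, value in info.items():
            lines.append(f"{section}.{key}: {value}")
    return lines
```

Calling print("\n".join(summarize("input.pdf"))) would then list every reported field, one per line.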
Transformations
rotate(doc: Doc, degrees: int = 90, pages: list = []) -> Doc
Rotates specified pages by 90, 180, or 270 degrees.
degrees: Integer (must be a multiple of 90).
pages: A list of page indices (0-based) to rotate. If empty, rotates all pages.
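The two parameter rules can be illustrated in plain Python. This is a sketch for intuition, not the library's implementation: both helper names are hypothetical, and it assumes a rotation is applied on top of whatever rotation a page already has.

```python
def combined_rotation(current, degrees=90):
    """Return the page rotation after turning it by a further `degrees`."""
    if degrees % 90 != 0:
        raise ValueError("degrees must be a multiple of 90")
    # Rotations wrap around: 270 + 180 is the same orientation as 90.
    return (current + degrees) % 360


def pages_to_rotate(page_count, pages=None):
    """Resolve the `pages` argument to concrete 0-based indices."""
    # An empty (or missing) selection stands for every page in the document.
    if not pages:
        return list(range(page_count))
    return list(pages)
```

Under these assumptions, rotating a page that is already at 270 degrees by another 180 leaves it at 90, and an empty pages list on a three-page document resolves to indices 0, 1, and 2.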
remove_pages(doc: Doc, page_indices: set) -> Doc
Removes the pages specified by the set of indices.
keep_pages(doc: Doc, page_indices: set) -> Doc
Keeps only the pages specified, removing all others.
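remove_pages and keep_pages are complementary: keeping a set of pages should give the same result as removing every other page. The plain-Python sketch below computes that complement. The helper name is hypothetical, and it assumes 0-based indices as documented for rotate.

```python
def complement(page_count, page_indices):
    """Return the indices in range(page_count) that are NOT in `page_indices`."""
    return set(range(page_count)) - set(page_indices)
```

For a four-page document, complement(4, {0, 2}) gives {1, 3}, so keep_pages(doc, {0, 2}) and remove_pages(doc, {1, 3}) would be expected to produce the same pages.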
concat(doc_left: Doc, doc_right: Doc) -> Doc
Appends the pages of doc_right to the end of doc_left. Can also be used via the + operator: new_doc = doc1 + doc2.
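Because each transformation takes a Doc and returns a Doc, the calls can be chained. The following sketch reads two files, rotates one, joins them, and saves the result. It uses only the signatures listed above; the function name and the file paths are hypothetical.

```python
def append_rotated(left_path, right_path, out_path):
    """Rotate every page of one file by 180 degrees, append it to another, save."""
    # Imported lazily so the sketch can be loaded without pdfsyntax installed.
    import pdfsyntax as pdf

    left = pdf.readfile(left_path)
    right = pdf.readfile(right_path)
    # Leaving `pages` at its default (an empty list) rotates all pages.
    right = pdf.rotate(right, 180)
    # Documented as equivalent to `left + right`.
    merged = pdf.concat(left, right)
    pdf.writefile(merged, out_path)
    return merged
```

A call such as append_rotated("cover.pdf", "scan.pdf", "combined.pdf") would leave both inputs untouched on disk and write the joined document to the third path.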
Text Extraction
extract_page_text(doc: Doc, page_num: int) -> str
Extracts text from a specific page. It attempts to preserve spatial layout by inserting spaces and newlines to approximate the visual rendering.
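As a rough sketch, text from several pages could be collected into a dictionary like this. The helper name is hypothetical, and page numbers are passed through unchanged because the description above does not say whether numbering starts at 0 or 1.

```python
def extract_pages(path, page_nums):
    """Return {page_num: text} for the requested pages of a PDF file."""
    # Imported lazily so the sketch can be loaded without pdfsyntax installed.
    import pdfsyntax as pdf

    doc = pdf.readfile(path)
    # Each call returns one string in which spaces and newlines
    # approximate the visual layout of the page.
    return {n: pdf.extract_page_text(doc, n) for n in page_nums}
```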
Annotations
add_text_annotation(doc: Doc, page_num: int, text: str, rect: list, opened: bool = False) -> Doc
Adds a sticky note (Text Annotation) to a page.
rect: A list of 4 integers, [x, y, width, height], defining the annotation area.
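In the PDF format itself, an annotation's /Rect entry is stored as two opposite corners rather than as an origin plus a size. The plain-Python sketch below shows how the [x, y, width, height] form relates to the corner form. The helper name is hypothetical, and whether the library performs this conversion internally is an assumption.

```python
def to_corner_rect(rect):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, width, height = rect
    return [x, y, x + width, y + height]
```

For example, a 20 by 20 note placed at (100, 700) corresponds to the corner rectangle [100, 700, 120, 720]. Such a note might be added with add_text_annotation(doc, 0, "Check this figure", [100, 700, 20, 20]), where the page number 0 assumes 0-based numbering.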