Quick Start Guide

This guide shows how to perform basic operations with the PDFSyntax Python API.

1. Opening and Inspecting a PDF

Read a file with readfile(), then inspect it right away with structure() and metadata().

from pdfsyntax import readfile, metadata, structure

# Load the document
doc = readfile("sample.pdf")

# Print structure information (Version, Page count, etc.)
print(structure(doc))
# Example output (values depend on the file): {'Version': '1.4', 'Pages': 5, 'Revisions': 1, ...}

# Print the document metadata
print(metadata(doc))
# Example output: {'Title': 'My Doc', 'Author': 'John Doe', ...}
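structure() reports a 'Version' value. As a rough, library-independent illustration of where such a value can come from, the stdlib-only sketch below reads the version from the `%PDF-x.y` header comment near the start of a file. The helper `header_version` and its `search_window` parameter are hypothetical, not part of PDFSyntax, and this is not a claim about how structure() computes its result; in particular, a /Version entry in the document catalog can declare a later version than the header.

```python
import re

# Matches the header comment a PDF file begins with, e.g. b"%PDF-1.4".
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def header_version(data: bytes, search_window: int = 1024):
    """Return the version string from a PDF header, or None if absent.

    Only the first `search_window` bytes are scanned, because the header
    is expected near the start of the file. A /Version entry in the
    catalog can override it, so treat the result as a hint.
    """
    match = _HEADER_RE.search(data[:search_window])
    return match.group(1).decode("ascii") if match else None


# In-memory bytes; for a real file use: open("sample.pdf", "rb").read(1024)
print(header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"))  # 1.4
```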

2. Modifying a PDF (Rotation)

PDFSyntax treats documents as immutable objects: a transformation never modifies the document you pass in. Instead, it returns a new document object that carries the changes for a new revision.

from pdfsyntax import rotate

# Rotate all pages by 90 degrees clockwise
doc_rotated = rotate(doc, 90)

# The original 'doc' object remains unchanged.
# 'doc_rotated' contains the changes as an incremental update.
print(doc_rotated)
# Output: <PDF Doc in revision 1 with X modified object(s)>
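The immutability described above can be modeled without PDFSyntax. The sketch below uses a hypothetical frozen dataclass, `ToyDoc`, holding one /Rotate value per page, and a hypothetical `toy_rotate` function that returns a new object instead of mutating its input. It relies only on the PDF rule that a page's /Rotate value is a clockwise angle that must be a multiple of 90; it is an illustration of the idea, not PDFSyntax's internal representation.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyDoc:
    """Hypothetical stand-in for a document: one /Rotate value per page."""
    rotations: Tuple[int, ...]


def toy_rotate(doc: ToyDoc, degrees: int) -> ToyDoc:
    """Return a new ToyDoc with every page turned clockwise by `degrees`.

    /Rotate must be a multiple of 90, so values are normalized to 0, 90,
    180 or 270. The input object is left untouched.
    """
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return replace(doc, rotations=tuple((r + degrees) % 360 for r in doc.rotations))


original = ToyDoc(rotations=(0, 0, 270))
rotated = toy_rotate(original, 90)
print(original.rotations)  # (0, 0, 270)
print(rotated.rotations)   # (90, 90, 0)
```

Because ToyDoc is frozen, an assignment such as `original.rotations = (90,)` raises `dataclasses.FrozenInstanceError`, which mirrors the guarantee described above for `doc`.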

3. Saving Changes

Write the modified document to a new file. By default, the changes are appended after the original file content as an incremental update, so the original bytes are kept unchanged at the start of the output.

from pdfsyntax import writefile

writefile(doc_rotated, "output_rotated.pdf")
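An incremental update leaves the original bytes untouched and appends new objects, a new cross-reference section, and a trailer ending in `%%EOF`. The stdlib-only sketch below checks that layout, for example to compare sample.pdf with output_rotated.pdf after running the snippet above. Both helpers, `is_incremental_update` and `count_eof_markers`, are hypothetical and not PDFSyntax functions; the marker count in particular is only a heuristic.

```python
def is_incremental_update(original: bytes, updated: bytes) -> bool:
    """Return True if `updated` keeps `original` as an exact prefix and adds data.

    An incremental update appends new objects, a cross-reference section,
    and a trailer after the old bytes, so the old file must survive intact.
    """
    return len(updated) > len(original) and updated.startswith(original)


def count_eof_markers(data: bytes) -> int:
    """Count b"%%EOF" markers; each revision usually ends with one.

    Heuristic only: linearized files may carry an extra marker without an
    extra revision, and the byte sequence could also occur in stream data.
    """
    return data.count(b"%%EOF")


# In-memory stand-ins for two files; with real files, read both in binary
# mode, e.g. open("sample.pdf", "rb").read() and open("output_rotated.pdf", "rb").read().
original = b"%PDF-1.4\n% original objects, xref and trailer\n%%EOF\n"
updated = original + b"% appended objects, xref and trailer\n%%EOF\n"

print(is_incremental_update(original, updated))  # True
print(count_eof_markers(updated))                # 2
```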

4. Text Extraction

You can extract the text of a page while preserving its spatial layout.

from pdfsyntax import extract_page_text

# Extract text from the first page (index 0)
text = extract_page_text(doc, 0)
print(text)
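This guide does not describe how extract_page_text preserves layout, so the sketch below is only an assumed, simplified illustration of the general idea: text fragments with page coordinates are snapped onto a monospace character grid, with rows emitted from the top of the page down because PDF's y axis points upward. The `Fragment` type, the `render_layout` function, and its `char_width` and `line_height` parameters are hypothetical, and the fragments are toy data rather than output from PDFSyntax.

```python
from typing import Dict, Iterable, List, Tuple

# A text fragment: (x, y, text), with x and y in PDF points.
# PDF user space has its origin at the bottom-left, so y grows upward.
Fragment = Tuple[float, float, str]


def render_layout(fragments: Iterable[Fragment],
                  char_width: float = 6.0,
                  line_height: float = 12.0) -> str:
    """Snap fragments onto a monospace grid to approximate their layout.

    Assumes non-negative x coordinates.
    """
    rows: Dict[int, List[str]] = {}
    for x, y, text in fragments:
        row = round(y / line_height)
        col = round(x / char_width)
        line = rows.setdefault(row, [])
        # Pad the row with spaces so the fragment fits at its column.
        if len(line) < col + len(text):
            line.extend(" " * (col + len(text) - len(line)))
        # Write the fragment, overwriting anything already in those cells.
        line[col:col + len(text)] = text
    if not rows:
        return ""
    # Emit rows from the top of the page down, keeping empty rows for gaps.
    top, bottom = max(rows), min(rows)
    return "\n".join("".join(rows.get(r, [])).rstrip()
                     for r in range(top, bottom - 1, -1))


page = [(72, 720, "Item"), (360, 720, "Price"),
        (72, 696, "Widget"), (360, 696, "9.99")]
print(render_layout(page))
```

A real extractor also has to handle font metrics, rotated text, and overlapping runs, which this grid approach ignores.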