Quick Start Guide
This guide will help you perform basic operations using the PDFSyntax Python API.
1. Opening and Inspecting a PDF
You can read a file and access its metadata and structure immediately.
from pdfsyntax import readfile, metadata, structure
# Load the document
doc = readfile("sample.pdf")
# Print structure information (Version, Page count, etc.)
print(structure(doc))
# Example output (values depend on the file): {'Version': '1.4', 'Pages': 5, 'Revisions': 1, ...}
# Access Metadata
print(metadata(doc))
# Example output (values depend on the file): {'Title': 'My Doc', 'Author': 'John Doe', ...}
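The 'Version' value reported by structure() is the PDF version of the document. If you want a rough cross-check that needs only the standard library, the sketch below reads the version from the file header (for example %PDF-1.4). This is not how PDFSyntax computes the value, and header_version is a hypothetical helper. It also has a known blind spot: since PDF 1.4 the document catalog may carry a /Version entry that declares a later version than the header, and this helper ignores it.

```python
import re
from typing import Optional

# Hypothetical helper: extract the version from a header such as b"%PDF-1.4".
# Only the header is inspected; a /Version entry in the document catalog
# (allowed since PDF 1.4) could declare a later version and is ignored here.
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")

def header_version(data: bytes) -> Optional[str]:
    # Some files carry junk before the header, so search the first
    # 1024 bytes instead of requiring the header at offset 0.
    match = _HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None

print(header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # 1.4
```

To check a file on disk, pass the raw bytes, e.g. header_version(open("sample.pdf", "rb").read()).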
2. Modifying a PDF (Rotation)
PDFSyntax treats documents as immutable objects. Transformations return a new document object representing a new revision.
from pdfsyntax import rotate
# Rotate all pages by 90 degrees clockwise
doc_rotated = rotate(doc, 90)
# The original 'doc' object remains unchanged.
# 'doc_rotated' contains the changes as an incremental update.
print(doc_rotated)
# Output: <PDF Doc in revision 1 with X modified object(s)>
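The immutability described above can be illustrated with a small standard-library model. This is a conceptual sketch, not PDFSyntax's internal representation: Page, Doc and rotate_pages are hypothetical names. Each page keeps a rotation value in the spirit of the PDF /Rotate entry, which the PDF specification defines as a clockwise rotation in multiples of 90 degrees. The transformation builds new frozen objects instead of mutating its input.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Toy model for illustration only; these classes are hypothetical and are
# not PDFSyntax's internal document representation.
@dataclass(frozen=True)
class Page:
    rotate: int = 0  # like the PDF /Rotate entry: degrees clockwise, multiple of 90

@dataclass(frozen=True)
class Doc:
    pages: Tuple[Page, ...]

def rotate_pages(doc: Doc, degrees: int) -> Doc:
    """Return a new Doc with every page rotated; the input Doc is never mutated."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Normalize the accumulated angle into 0, 90, 180 or 270.
    new_pages = tuple(replace(p, rotate=(p.rotate + degrees) % 360) for p in doc.pages)
    return Doc(pages=new_pages)

original = Doc(pages=(Page(), Page(rotate=270)))
rotated = rotate_pages(original, 90)
print([p.rotate for p in original.pages])  # [0, 270]: unchanged
print([p.rotate for p in rotated.pages])   # [90, 0]
```

Because the objects are frozen, any code still holding original keeps seeing the old rotations, much as doc and doc_rotated coexist in the rotation example.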
3. Saving Changes
Write the modified document to a new file. By default, the changes are appended to the original file content as an incremental update, so the original bytes remain intact at the start of the output.
from pdfsyntax import writefile
writefile(doc_rotated, "output_rotated.pdf")
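Because an incremental update only appends data, the original file should survive as a byte-for-byte prefix of the output, and the appended part should end with its own startxref pointer and %%EOF marker (this holds for both classic xref tables and cross-reference streams). The sketch below checks exactly that with the standard library. appended_update is a hypothetical helper, not part of PDFSyntax, and it assumes both files fit in memory.

```python
from typing import Optional

def appended_update(original: bytes, updated: bytes) -> Optional[bytes]:
    """Return the bytes appended by an incremental update, or None if
    `updated` does not look like one (hypothetical helper, stdlib only)."""
    if len(updated) <= len(original) or not updated.startswith(original):
        return None  # original content was altered, or nothing was appended
    tail = updated[len(original):]
    # Each update section ends with its own startxref pointer and %%EOF marker.
    if b"startxref" not in tail or not tail.rstrip().endswith(b"%%EOF"):
        return None
    return tail

# Schematic byte strings (not complete PDFs), only to exercise the check.
before = b"%PDF-1.4\n% ... original objects ...\nstartxref\n0\n%%EOF\n"
after = before + b"% ... changed objects, xref, trailer with /Prev ...\nstartxref\n0\n%%EOF\n"
print(appended_update(before, after) is not None)              # True
print(appended_update(before, b"%PDF-1.4\nrewritten") is None)  # True

# With the files from this guide (assumes both exist on disk):
# with open("sample.pdf", "rb") as a, open("output_rotated.pdf", "rb") as b:
#     print(appended_update(a.read(), b.read()) is not None)
```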
4. Text Extraction
You can extract a page's text while preserving its spatial layout.
from pdfsyntax import extract_page_text
# Extract text from the first page (index 0)
text = extract_page_text(doc, 0)
print(text)
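PDFSyntax's own layout algorithm is not described here. As a rough illustration of what layout preservation generally involves, the sketch below places positioned text fragments on a monospaced grid: fragments with nearby baselines become one line, and each fragment's x coordinate selects its column. The (x, y, text) input format, the 6-point character width, the 2-point line tolerance and the name layout_text are all assumptions for illustration. Coordinates follow the default PDF convention, where y grows upward.

```python
from typing import List, Tuple

Fragment = Tuple[float, float, str]  # hypothetical input: (x, y, text) in PDF points

def layout_text(fragments: List[Fragment], char_width: float = 6.0,
                line_tolerance: float = 2.0) -> str:
    """Render positioned fragments as monospaced lines (illustrative sketch)."""
    if not fragments:
        return ""
    left = min(f[0] for f in fragments)  # treat the leftmost fragment as column 0
    rows: List[List[Fragment]] = []
    # Walk fragments from the top of the page down (PDF y grows upward).
    for frag in sorted(fragments, key=lambda f: -f[1]):
        # Join the current row if this baseline is within line_tolerance
        # of the row's first (highest) baseline; otherwise start a new row.
        if rows and rows[-1][0][1] - frag[1] <= line_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    lines = []
    for row in rows:
        line = ""
        for x, _, text in sorted(row):  # left to right within the row
            col = int((x - left) / char_width)
            if len(line) < col:
                line += " " * (col - len(line))  # pad up to the fragment's column
            elif line:
                line += " "  # keep fragments that touch or overlap apart
            line += text
        lines.append(line)
    return "\n".join(lines)

page = [(72.0, 720.0, "Item"), (144.0, 720.5, "Qty"),
        (72.0, 700.0, "Apple"), (144.0, 700.0, "3")]
print(layout_text(page))
```

Real extractors also have to deal with font metrics, rotated text and multi-column pages, which this sketch deliberately ignores.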