Architecture & Design
PDFSyntax is built on specific architectural choices designed to make PDF manipulation safer and more transparent.
Immutability and Pure Functions
The library adopts a functional style, and the core Doc object is treated as immutable. Functions that produce a changed document (such as rotate or update_object) do not modify the existing object in place. Instead, they return a new Doc object containing the requested changes.
```python
# doc is unchanged; doc_new contains the rotation
doc_new = rotate(doc, 90)
```
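To make the pattern concrete, here is a minimal sketch of an immutable document with two pure functions. It is not PDFSyntax's implementation: SketchDoc, with_object and rotate_pages are hypothetical names, and PDF objects are simplified to read-only mappings keyed by PDF names such as "/Rotate".

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical sketch of the immutable / pure-function pattern;
# not PDFSyntax's actual Doc class or rotate implementation.


@dataclass(frozen=True)
class SketchDoc:
    # Object number -> object, simplified to a read-only mapping of PDF names.
    objects: Mapping[int, Mapping[str, Any]]


def with_object(doc: SketchDoc, num: int, obj: Mapping[str, Any]) -> SketchDoc:
    """Return a new SketchDoc in which object `num` is replaced by `obj`."""
    new_objects = dict(doc.objects)  # shallow copy: `doc` itself is never touched
    new_objects[num] = MappingProxyType(dict(obj))
    return replace(doc, objects=MappingProxyType(new_objects))


def rotate_pages(doc: SketchDoc, degrees: int) -> SketchDoc:
    """Return a new SketchDoc with `degrees` added to every page's /Rotate."""
    result = doc
    for num, obj in doc.objects.items():
        if obj.get("/Type") == "/Page":
            rotated = dict(obj)
            rotated["/Rotate"] = (obj.get("/Rotate", 0) + degrees) % 360
            result = with_object(result, num, rotated)
    return result


doc = SketchDoc(MappingProxyType({
    1: MappingProxyType({"/Type": "/Catalog"}),
    3: MappingProxyType({"/Type": "/Page"}),
}))
doc_new = rotate_pages(doc, 90)
print(doc.objects[3].get("/Rotate"))   # None: the original is unchanged
print(doc_new.objects[3]["/Rotate"])   # 90
```

Because with_object copies only the top-level mapping, objects that were not changed (here the catalog) are shared between doc and doc_new rather than duplicated, which keeps this style inexpensive in memory.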
The Doc Object
The Doc object acts as a container for:
- Index: A mapping of object numbers to their locations (byte offsets) in the file.
- Cache: Memoized parsed objects to avoid re-parsing the file repeatedly.
- Data: Handles to the raw binary file data.
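The three parts can be sketched as cooperating roughly like this. The sketch assumes a drastically simplified file in which each object starts with an "N 0 obj" header and ends with "endobj"; DocContainer, get_object and parse_count are hypothetical names, and object bodies are returned as raw bytes rather than parsed into Python values.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical sketch of Index + Cache + Data; not PDFSyntax's Doc class.
# Assumes each object is laid out as "N 0 obj ... endobj" and returns the
# raw body bytes instead of parsing them into Python values.


@dataclass
class DocContainer:
    data: bytes                # Data: the raw file bytes
    index: Dict[int, int]      # Index: object number -> byte offset of "N 0 obj"
    cache: Dict[int, bytes] = field(default_factory=dict)  # Cache: memoized results
    parse_count: int = 0       # how many times the file was actually scanned

    def get_object(self, num: int) -> bytes:
        """Return the body of object `num`, scanning the file at most once."""
        if num in self.cache:
            return self.cache[num]
        start = self.index[num]  # jump straight to the recorded offset
        body_start = self.data.index(b"obj", start) + len(b"obj")
        body_end = self.data.index(b"endobj", body_start)
        body = self.data[body_start:body_end].strip()
        self.parse_count += 1
        self.cache[num] = body
        return body


raw = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n(Hello)\nendobj\n"
doc = DocContainer(raw, {1: raw.find(b"1 0 obj"), 2: raw.find(b"2 0 obj")})
first = doc.get_object(2)   # scans the bytes at offset index[2]
second = doc.get_object(2)  # served from the cache, no second scan
```

The cache is filled lazily, so only objects that are actually requested are ever scanned. In this sketch the cache is the only mutable part, and filling it never changes what the document contains.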
Handling Revisions
PDFSyntax natively understands incremental updates, the PDF mechanism for saving changes by appending them to the end of an existing file.
- Revisions: When you load a PDF, it may already contain multiple revisions (the original save plus any subsequent edits).
- Append-Only: When you modify the Doc using PDFSyntax, you are creating a new revision in memory. When you call writefile, these changes are appended to the end of the original byte stream rather than overwriting it.
- Rewind: You can programmatically roll back changes using rewind(doc) to access previous states of the document.
- Squash: You can merge all revisions into a single, clean file using squash(doc) if the accumulated history makes the file too large.
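The revision behaviour can be modelled in a few lines. This is a toy model, not PDFSyntax's code: RevDoc, add_revision, drop_last_revision, merge_revisions and append_update are hypothetical names standing in for the roles of a modification, rewind(doc) and squash(doc). append_update writes only object bodies and omits the cross-reference section and trailer (with its /Prev link to the previous cross-reference section) that a real incremental update must also append.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical toy model of incremental updates; not PDFSyntax's code.
Objects = Dict[int, bytes]  # object number -> serialized object body


@dataclass(frozen=True)
class RevDoc:
    revisions: Tuple[Objects, ...]  # oldest first; each holds only changed objects

    def current(self) -> Objects:
        """Resolve every object to its latest version (later revisions win)."""
        merged: Objects = {}
        for rev in self.revisions:
            merged.update(rev)
        return merged


def add_revision(doc: RevDoc, changes: Objects) -> RevDoc:
    """Append-only: record changes as a new revision, keeping older ones intact."""
    return RevDoc(doc.revisions + (dict(changes),))


def drop_last_revision(doc: RevDoc) -> RevDoc:
    """Rewind-style: return the state before the most recent revision."""
    if len(doc.revisions) <= 1:
        raise ValueError("only the original revision remains")
    return RevDoc(doc.revisions[:-1])


def merge_revisions(doc: RevDoc) -> RevDoc:
    """Squash-style: collapse history into one revision, dropping old versions."""
    return RevDoc((doc.current(),))


def append_update(original: bytes, changes: Objects) -> bytes:
    """Write new object versions after the untouched original bytes.

    A real incremental update must also append an xref section and a
    trailer containing /Prev; both are omitted here for brevity.
    """
    tail = b"".join(
        b"%d 0 obj\n%s\nendobj\n" % (num, body) for num, body in sorted(changes.items())
    )
    return original + tail


v1 = RevDoc(({1: b"<< /Type /Catalog >>", 3: b"<< /Type /Page >>"},))
v2 = add_revision(v1, {3: b"<< /Type /Page /Rotate 90 >>"})
flat = merge_revisions(v2)
original = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
updated = append_update(original, v2.revisions[-1])
```

In this model, rolling back is cheap because nothing was ever overwritten, while merging trades the history for a result that keeps only one version of each object.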