Statutes at Large
This task processes the United States Statutes at Large, which are the official compilation of all public and private laws and resolutions enacted by Congress.
The scraper converts metadata from the STATUTE collection on GovInfo.gov into the project's standard bill file format.
How It Works
This is a two-step process:
- Download Statute Data: Use the govinfo task to download the STATUTE collection from GovInfo. It's recommended to extract the mods (for metadata) and pdf (for text content) files.
- Process Statutes: Run the statutes task to parse the mods.xml files and generate a data.json file for each law, representing each law as if it were a bill.
Usage
This process is captured in the scripts/statutes.sh script.
# 1. Download STATUTE collection and extract metadata and PDFs
usc-run govinfo --collections=STATUTE --extract=mods,pdf
# 2. Process statutes from volumes 65-86 to create bill status files
usc-run statutes --volumes=65-86 --govtrack
# 3. Process statutes from volumes 65-106 to create bill text version files and extract text from PDFs
usc-run statutes --volumes=65-106 --textversions --extracttext
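
The three commands above can be chained so that a failed download stops the later processing steps. The following is a minimal sketch under stated assumptions, not the contents of scripts/statutes.sh: the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and by default the sketch only prints each command so it can be reviewed before anything is executed.

```shell
# Sketch only: chain the documented steps and stop at the first failure.
# run() and DRY_RUN are hypothetical helpers, not part of the project.
set -eu

# Default to printing commands; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Download the STATUTE collection and extract metadata and PDFs
run usc-run govinfo --collections=STATUTE --extract=mods,pdf
# 2. Create bill status files for volumes 65-86
run usc-run statutes --volumes=65-86 --govtrack
# 3. Create bill text version files and extract text for volumes 65-106
run usc-run statutes --volumes=65-106 --textversions --extracttext
```

Running it with `DRY_RUN=0` would execute the commands for real; because of `set -e`, a non-zero exit from any step should prevent the following steps from running.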
Options
- --volume=<num> or --volumes=<start-end>: Process only the specified volume or range of volumes.
- --year=<num> or --years=<start-end>: Process only statutes from the specified year or range of years.
- --textversions: Creates only the text-versions/ files for each law, skipping the main data.json file. This is useful for filling in bill text data for historical congresses where metadata already exists.
- --linkpdf: Creates a hard link from the downloaded statute PDF to the corresponding bill text directory (.../text-versions/enr/document.pdf).
- --extracttext: Converts the downloaded PDF for each statute into a plain text file using pdftotext and saves it in the bill text directory (.../text-versions/enr/document.txt). Requires pdftotext to be installed.
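
Since --extracttext depends on an external program, it may help to confirm that pdftotext is on the PATH before starting a long run. The check below is a small sketch; `have_tool` is a hypothetical helper name, and the package hint in the message is an assumption that holds on many Linux distributions, where pdftotext ships as part of Poppler's utilities.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool pdftotext; then
    echo "pdftotext found: --extracttext can be used"
else
    # Assumption: on many systems pdftotext is provided by poppler-utils.
    echo "pdftotext not found: install it before using --extracttext" >&2
fi
```

If the check fails, the other options (for example --textversions with --linkpdf) do not mention pdftotext as a requirement, so they can presumably still be used on their own.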