Statutes at Large

This task processes the United States Statutes at Large, which are the official compilation of all public and private laws and resolutions enacted by Congress.

The scraper converts metadata from the STATUTE collection on GovInfo.gov into the project's standard bill file format.

How It Works

This is a two-step process:

  1. Download Statute Data: Use the govinfo task to download the STATUTE collection from GovInfo. Extracting the mods files (metadata) and pdf files (text content) is recommended.
  2. Process Statutes: Run the statutes task to parse the mods.xml files and generate a data.json file for each law that represents the law as if it were a bill.

Usage

The scripts/statutes.sh script runs the full process:

# 1. Download STATUTE collection and extract metadata and PDFs
usc-run govinfo --collections=STATUTE --extract=mods,pdf

# 2. Process statutes from volumes 65-86 to create bill status files
usc-run statutes --volumes=65-86 --govtrack

# 3. Process statutes from volumes 65-106 to create bill text version files and extract text from PDFs
usc-run statutes --volumes=65-106 --textversions --extracttext
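Steps 2 and 3 select statutes by volume range. As an illustration only, the Bash sketch below expands a --volumes=<start-end> style argument into individual volume numbers. It assumes both ends of the range are inclusive, which is not confirmed here because the task's own parsing is not shown. It then prints the equivalent one-volume-at-a-time --volume=<num> commands as a dry run. The expand_volumes function is hypothetical.

```shell
# Hypothetical helper: expand a --volume/--volumes style argument ("65" or
# "65-86") into one volume number per line.
# ASSUMPTION: both ends of a range are inclusive; this mirrors the documented
# syntax, not the statutes task's actual implementation.
expand_volumes() {
  local spec="$1"
  case "$spec" in
    *-*)
      local start="${spec%-*}"   # text before the dash, e.g. 65
      local end="${spec#*-}"     # text after the dash, e.g. 86
      seq "$start" "$end"        # one number per line, start..end inclusive
      ;;
    *)
      echo "$spec"               # a single volume
      ;;
  esac
}

# Dry run: print (do not execute) per-volume commands for volumes 65-67.
for volume in $(expand_volumes 65-67); do
  echo usc-run statutes --volume="$volume" --govtrack
done
```

Under that inclusive-range assumption, the range 65-106 used in step 3 covers 42 volumes.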

Options

  • --volume=<num> or --volumes=<start-end>: Process only the specified volume or range of volumes.
  • --year=<num> or --years=<start-end>: Process only statutes from the specified year or range of years.
  • --textversions: Creates only the text-versions/ files for each law and skips the main data.json file. This is useful for filling in bill text for historical congresses whose metadata already exists.
  • --linkpdf: Creates a hard link from the downloaded statute PDF to the corresponding bill text directory (.../text-versions/enr/document.pdf).
  • --extracttext: Converts the downloaded PDF for each statute into a plain-text file using pdftotext and saves it in the bill text directory (.../text-versions/enr/document.txt). Requires pdftotext to be installed.
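The --linkpdf and --extracttext options can be approximated by hand for a single statute. The Bash sketch below is hypothetical. link_and_extract and its two arguments (the downloaded statute PDF and the bill's text-versions/enr directory) are placeholders, because the statutes task computes the real paths. The sketch runs pdftotext with its default options, since the flags the task passes to it are not documented here.

```shell
# Hypothetical approximation of --linkpdf and --extracttext for ONE statute.
# $1: path to the downloaded statute PDF (placeholder)
# $2: the bill's .../text-versions/enr directory (placeholder)
link_and_extract() {
  local src_pdf="$1" bill_text_dir="$2"
  mkdir -p "$bill_text_dir"

  # Like --linkpdf: hard link (not copy) the PDF as document.pdf,
  # replacing any existing file of that name.
  ln -f "$src_pdf" "$bill_text_dir/document.pdf"

  # Like --extracttext: write document.txt with pdftotext (default options).
  if command -v pdftotext >/dev/null 2>&1; then
    pdftotext "$src_pdf" "$bill_text_dir/document.txt"
  else
    echo "pdftotext not installed; skipping text extraction" >&2
    return 1
  fi
}
```

Because a hard link is used rather than a copy, the downloaded PDF and the bill text directory must be on the same filesystem. Otherwise ln fails.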