Data & Commands Overview
This project's functionality is organized into tasks, each responsible for collecting a specific type of congressional data. These tasks are executed via the usc-run command-line utility.
Basic Usage
The general syntax for running a task is:
usc-run <data-type> [options]
Where <data-type> is one of the available scrapers. This section provides a detailed guide to each primary data type and its specific options.
Common Options
Several options are common to most or all commands:
--log=<level>: Sets the logging verbosity. Can be debug, info, warn, or error. Default is warn.
--debug: A shortcut for --log=debug.
--force: Forces the re-downloading of all network resources, ignoring anything in the cache directory.
--timestamps: Adds timestamps to all log output.
--govtrack: When outputting XML, this flag ensures the output uses GovTrack legislator IDs for full backward compatibility with legacy data.
--congress=<number>: Restricts the scrape to a specific Congress (e.g., --congress=117). Many commands support a comma-separated list for multiple congresses.
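As an illustration of how these options combine, the following is a minimal Python sketch that assembles a usc-run argument list from the flags described above. The helper name build_usc_run_command is hypothetical and not part of the project, and the data type "bills" is assumed here purely as an example; substitute whichever scraper you actually run.

```python
from typing import Iterable, List, Optional

# Log levels as listed under Common Options.
VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}


def build_usc_run_command(
    data_type: str,
    congresses: Optional[Iterable[int]] = None,
    log: Optional[str] = None,
    force: bool = False,
    timestamps: bool = False,
) -> List[str]:
    """Build an argument list for usc-run (hypothetical helper, not part of the project)."""
    cmd = ["usc-run", data_type]
    if congresses:
        # Many commands accept a comma-separated list of congresses.
        cmd.append("--congress=" + ",".join(str(c) for c in congresses))
    if log is not None:
        if log not in VALID_LOG_LEVELS:
            raise ValueError(f"unknown log level: {log!r}")
        cmd.append(f"--log={log}")
    if force:
        cmd.append("--force")
    if timestamps:
        cmd.append("--timestamps")
    return cmd


if __name__ == "__main__":
    # "bills" is an assumed data type name, used only for illustration.
    print(" ".join(build_usc_run_command("bills", congresses=[117], log="info")))
```

The resulting list can be passed to subprocess.run if you want to drive the scrapers from a script; whether a given data type honors every option is something to confirm against that task's own documentation.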
Output Structure
By default, all tasks write their output to two top-level directories:
cache/: Stores raw downloaded files from the internet. Subsequent runs will use these files instead of re-downloading them unless you use the --force flag.
data/: Stores the final, processed, structured data. The internal structure of this directory depends on the data type being collected.
For most data objects (like a bill or vote), two files are generated:
data.json: A detailed JSON representation of the data.
data.xml: An XML version that maintains backward compatibility with the format historically provided by GovTrack.us.
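Because the layout under data/ varies by data type, one way to consume the output without hard-coding paths is to search for data.json files recursively. The sketch below assumes only what is stated above (a data/ directory containing data.json files); the function name iter_data_json is hypothetical, and no particular JSON fields are assumed.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_data_json(root: str = "data") -> Iterator[Tuple[Path, Any]]:
    """Yield (path, parsed JSON) for every data.json found under root.

    Hypothetical helper: it makes no assumption about the directory
    layout beneath root or about the fields inside each file.
    """
    for path in sorted(Path(root).rglob("data.json")):
        with path.open(encoding="utf-8") as fh:
            yield path, json.load(fh)


if __name__ == "__main__":
    # Count the processed objects currently on disk.
    count = sum(1 for _ in iter_data_json("data"))
    print(f"found {count} data.json file(s)")
```

If the data/ directory does not exist yet, the generator simply yields nothing, so the script can be run before any task has produced output.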