Data & Commands Overview
This project's functionality is organized into tasks, each responsible for collecting a specific type of congressional data. These tasks are executed via the usc-run command-line utility.
Basic Usage
The general syntax for running a task is:
usc-run <data-type> [options]
Where <data-type> is one of the available scrapers. This section provides a detailed guide to each primary data type and its specific options.
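The syntax above can be sketched as a small Python helper that assembles the argument list for usc-run. This is an illustrative sketch only: build_command is a hypothetical helper, not part of the project, and "votes" is an assumed task name used as a placeholder.

```python
import shlex


def build_command(data_type, options=()):
    """Return the argument list for: usc-run <data-type> [options]."""
    if not data_type or data_type.startswith("-"):
        raise ValueError("data_type must be a task name, not an option")
    return ["usc-run", data_type, *options]


# "votes" is an assumed task name, used here only for illustration.
command = build_command("votes", ["--congress=117"])
print(shlex.join(command))  # usc-run votes --congress=117
```

If usc-run is installed and on your PATH, a list built this way could be passed to subprocess.run(command, check=True). Keeping the arguments as a list avoids shell quoting problems.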
Common Options
Several options are common to most or all commands:
--log=<level>: Sets the logging verbosity. Can be debug, info, warn, or error. Default is warn.
--debug: A shortcut for --log=debug.
--force: Forces the re-downloading of all network resources, ignoring anything in the cache directory.
--timestamps: Adds timestamps to all log output.
--govtrack: When outputting XML, this flag ensures the output uses GovTrack legislator IDs for full backward compatibility with legacy data.
--congress=<number>: Restricts the scrape to a specific Congress (e.g., --congress=117). Many commands support a comma-separated list for multiple congresses.
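As a rough illustration of how the common options combine, the following sketch turns keyword arguments into the flags described above. The function common_options and its parameter names are hypothetical, not part of the project. The comma-separated form of --congress is emitted whenever several congresses are given, even though only some commands may accept it.

```python
# Log levels as listed in this section.
LOG_LEVELS = ("debug", "info", "warn", "error")


def common_options(log=None, debug=False, force=False, timestamps=False,
                   govtrack=False, congresses=()):
    """Build a list of common usc-run flags (hypothetical helper)."""
    options = []
    if log is not None:
        if log not in LOG_LEVELS:
            raise ValueError(f"log must be one of {LOG_LEVELS}, got {log!r}")
        options.append(f"--log={log}")
    if debug:
        options.append("--debug")  # shortcut for --log=debug
    if force:
        options.append("--force")  # ignore the cache and re-download
    if timestamps:
        options.append("--timestamps")
    if govtrack:
        options.append("--govtrack")
    if congresses:
        # Assumption: the chosen command accepts a comma-separated list.
        options.append("--congress=" + ",".join(str(c) for c in congresses))
    return options


print(common_options(log="info", force=True, congresses=[116, 117]))
# ['--log=info', '--force', '--congress=116,117']
```

Validating the log level up front reports a typo immediately instead of leaving it to the command-line tool.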
Output Structure
By default, all tasks write their output to two top-level directories:
cache/: Stores raw downloaded files from the internet. Subsequent runs will use these files instead of re-downloading them unless you use the --force flag.
data/: Stores the final, processed, structured data. The internal structure of this directory depends on the data type being collected.
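The relationship between cache/ and --force can be pictured with a minimal cache-or-download sketch. This is not the project's implementation: fetch_cached, its parameters, and the fake_download stand-in are all hypothetical, and the demonstration writes to a temporary directory rather than a real cache/ folder.

```python
import tempfile
from pathlib import Path


def fetch_cached(url, cache_file, download, force=False):
    """Return the cached bytes if present; otherwise download and store them."""
    cache_file = Path(cache_file)
    if cache_file.exists() and not force:
        return cache_file.read_bytes()  # reuse the earlier download
    body = download(url)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(body)
    return body


calls = []


def fake_download(url):
    """Stand-in for a network request; records each call."""
    calls.append(url)
    return b"<html>example</html>"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "example.html"
    first = fetch_cached("https://example.com/page", target, fake_download)
    second = fetch_cached("https://example.com/page", target, fake_download)
    third = fetch_cached("https://example.com/page", target, fake_download, force=True)

print(len(calls))  # 2: the second call was served from the cache
```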
For most data objects (like a bill or vote), two files are generated:
data.json: A detailed JSON representation of the data.
data.xml: An XML version that maintains backward compatibility with the format historically provided by GovTrack.us.
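Because the layout inside data/ varies by data type, one simple way to consume the output is to search recursively for every data.json file. The sketch below assumes only the file name described above. The load_records helper, the nested directory names, and the example_id field are invented for illustration and do not reflect the real schema.

```python
import json
import tempfile
from pathlib import Path


def load_records(data_dir="data"):
    """Yield (path, parsed JSON) for every data.json found under data_dir."""
    for path in sorted(Path(data_dir).rglob("data.json")):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)


# Demonstration against a throwaway directory with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    record_dir = Path(tmp) / "data" / "some-type" / "some-object"
    record_dir.mkdir(parents=True)
    (record_dir / "data.json").write_text(
        json.dumps({"example_id": "x1"}), encoding="utf-8"
    )
    records = [(path.name, record)
               for path, record in load_records(Path(tmp) / "data")]

print(records)  # [('data.json', {'example_id': 'x1'})]
```

Against real output, calling load_records("data") from the project directory would walk whatever structure the tasks produced.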