Extending with Beanstalkd

This project's --patch option lets you modify the scraper's behavior at runtime by applying a "monkey-patch", that is, by replacing some of the scraper's functions with modified versions when it starts. The included contrib.beanstalkd module is one such patch: it integrates the scraper with the Beanstalkd work queue.

Purpose

When enabled, this patch pushes the ID of each bill, amendment, or vote onto a Beanstalkd queue as soon as that item's data file has been written to disk. Other services can watch the queue and process new or updated data as it arrives, which lets you build a near-real-time processing pipeline.
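To illustrate the consumer side of such a pipeline, here is a minimal Python sketch of a worker. It is not part of this project, and it rests on several assumptions: the third-party greenstalk client (2.x) for talking to Beanstalkd, the us_bills tube name from the example config.yml in the Configuration section, and a job body that is the plain ID string. The consume() loop takes the client as a parameter, so any object with reserve(), delete(job) and bury(job) methods can be used.

```python
def consume(client, handle, max_jobs=None):
    """Reserve jobs, pass each job body (assumed to be an ID) to `handle`,
    then delete the job on success or bury it on failure.

    `client` is any object with reserve(), delete(job) and bury(job),
    such as a greenstalk.Client. `max_jobs=None` means run forever.
    Returns the number of jobs handled.
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        job = client.reserve()        # blocks until a job is ready
        try:
            handle(job.body)
        except Exception:
            client.bury(job)          # keep the failed ID for later inspection
        else:
            client.delete(job)        # acknowledge: remove the job from the tube
        done += 1
    return done


def run_bill_worker(host="localhost", port=11300, tube="us_bills"):
    """Watch the bills tube and print each ID (hypothetical helper)."""
    # Assumption: greenstalk 2.x is installed (pip install greenstalk);
    # it is not a dependency of this project.
    import greenstalk

    client = greenstalk.Client((host, port), watch=tube)
    try:
        consume(client, lambda bill_id: print("new or updated bill:", bill_id))
    finally:
        client.close()
```

Calling run_bill_worker() blocks and prints each bill ID as the scraper queues it. Burying a job on failure, rather than deleting it, leaves the ID in Beanstalkd so it can be inspected or kicked back onto the ready queue later.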

Usage

To enable the patch, add the --patch=contrib.beanstalkd flag to your usc-run command.

usc-run bills --patch=contrib.beanstalkd

This command runs the bills scraper as usual, but after each bill's data file is written, it also sends the bill ID to the Beanstalkd tube configured for bills.

Configuration

To use this feature, you must have a config.yml file with a beanstalk section specifying your queue connection details and tube names.

Example config.yml:

beanstalk:
  connection:
    host: 'localhost'
    port: 11300
  tubes:
    bills: 'us_bills'
    amendments: 'us_amendments'
    votes: 'us_votes'

The patch requires a distinct name for each tube, so that a consumer can tell from the tube alone whether an ID refers to a bill, an amendment, or a vote.
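A missing key or a duplicated tube name is easy to overlook, so a check along the following lines can catch configuration mistakes before a long scrape starts. This is a hypothetical helper, not the validation code in contrib.beanstalkd; it takes the dictionary obtained by parsing config.yml (for example with PyYAML's yaml.safe_load, which is an assumption) and enforces the rules described above.

```python
REQUIRED_TUBES = ("bills", "amendments", "votes")


def check_beanstalk_config(config):
    """Return (host, port, tubes) from a parsed config.yml.

    Raises ValueError if the beanstalk section is incomplete or if two
    item types share a tube name. Hypothetical helper for illustration.
    """
    try:
        section = config["beanstalk"]
        host = section["connection"]["host"]
        port = int(section["connection"]["port"])
        tubes = section["tubes"]
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"incomplete beanstalk section in config.yml: {exc!r}") from exc

    missing = [kind for kind in REQUIRED_TUBES if kind not in tubes]
    if missing:
        raise ValueError(f"no tube configured for: {', '.join(missing)}")

    names = [tubes[kind] for kind in REQUIRED_TUBES]
    if len(set(names)) != len(names):
        # Shared tubes would make it impossible to tell bills, amendments and votes apart.
        raise ValueError(f"tube names must be unique, got {names}")

    return host, port, {kind: tubes[kind] for kind in REQUIRED_TUBES}
```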

How It Works

The congress/contrib/beanstalkd.py module defines a patch() function. When usc-run is invoked with --patch=contrib.beanstalkd, it imports this module and calls patch(), which replaces the standard process_bill, process_amendment, and output_vote functions with wrapped versions. Each wrapper runs the original function and then performs the queueing step, so the scrapers themselves contain no Beanstalkd-specific code. The same technique can be used to hook the scrapers into other systems without modifying them.
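The wrapping step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in congress/contrib/beanstalkd.py: the stand-in bills namespace, the process_bill(bill_id, options) signature, and the in-memory list used in place of a Beanstalkd tube are all hypothetical.

```python
import functools
import types


def queue_after(func, put, extract_id):
    """Return a wrapper that runs `func`, then queues the ID that
    `extract_id` derives from the same arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)    # original work, e.g. writing the data file
        put(extract_id(*args, **kwargs))  # only reached if func did not raise
        return result
    return wrapper


# Hypothetical stand-in for a scraper module; the assumed signature
# process_bill(bill_id, options) is for illustration only.
def process_bill(bill_id, options):
    return {"bill_id": bill_id, "written": True}


bills = types.SimpleNamespace(process_bill=process_bill)

# In-memory stand-in for a Beanstalkd tube, named as in the example config.yml.
tubes = {"us_bills": []}


def patch():
    """Swap the module attribute for a wrapped version, as a --patch module would."""
    bills.process_bill = queue_after(
        bills.process_bill,
        put=tubes["us_bills"].append,                 # a real patch would put() to Beanstalkd
        extract_id=lambda bill_id, options: bill_id,
    )
```

Replacing a module attribute only affects code that looks the function up through the module at call time, such as a call written as bills.process_bill(...) or a call from inside that module. A reference captured earlier with from bills import process_bill still points at the original function.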