Quick Start

This guide demonstrates how to train an RL agent to discover alphas using the scripts/rl.py entry point.

Running the RL Experiment

The primary script is scripts/rl.py. It initializes the environment, the alpha pool, and the PPO agent, then trains the agent.

python -m scripts.rl \
    --instruments "csi300" \
    --pool_capacity 10 \
    --steps 100000 \
    --seed 42
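Because results depend on --seed, it is common to repeat the run over several seeds. The sketch below uses hypothetical helpers (build_rl_command and sweep_seeds are not part of AlphaGen) to assemble the same command as above, once per seed. With dry_run=True it only prints the commands. It assumes you launch it from the repository root, which `python -m scripts.rl` requires.

```python
import shlex
import subprocess
import sys


def build_rl_command(instruments: str, pool_capacity: int, steps: int, seed: int) -> list:
    """Return the argv for one `python -m scripts.rl` run using the flags shown above."""
    return [
        sys.executable, "-m", "scripts.rl",
        "--instruments", instruments,
        "--pool_capacity", str(pool_capacity),
        "--steps", str(steps),
        "--seed", str(seed),
    ]


def sweep_seeds(seeds, dry_run: bool = True, **kwargs) -> list:
    """Build one command per seed; print them (dry run) or execute them in sequence."""
    commands = [build_rl_command(seed=s, **kwargs) for s in seeds]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Assumes the current directory is the repository root (needed by -m scripts.rl).
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    sweep_seeds([0, 1, 2], instruments="csi300", pool_capacity=10, steps=100_000)
```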

Command Line Arguments

  • --instruments: The stock universe to use (e.g., csi300, csi500). The name must match an instrument file in your Qlib data folder.
  • --pool_capacity: The maximum number of alphas kept in the alpha pool; the final result is a weighted combination of these alphas. A size of 10-20 is typical.
  • --steps: The total number of interaction steps for the PPO agent. 100k-200k is usually sufficient for convergence on smaller pools.
  • --seed: Random seed for reproducibility.
  • --use_llm: (Optional) If set to True, enables the hybrid AlphaGPT mode (requires OpenAI API key).
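To make --pool_capacity concrete: the pool holds at most that many alphas, each with a weight in a linear combination. The sketch below is a hypothetical illustration, not AlphaGen's pool class. It assumes that when a new alpha pushes the pool over capacity, the alpha with the smallest absolute weight is evicted, and it takes weights as given rather than refitting them after each insertion as a real pool would.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BoundedAlphaPool:
    """Hypothetical capacity-bounded pool of (expression, weight) pairs."""
    capacity: int
    exprs: list[str] = field(default_factory=list)
    weights: list[float] = field(default_factory=list)

    def add(self, expr: str, weight: float) -> None:
        self.exprs.append(expr)
        self.weights.append(weight)
        if len(self.exprs) > self.capacity:
            # Assumed pruning rule: evict the alpha with the smallest |weight|.
            worst = min(range(len(self.weights)), key=lambda i: abs(self.weights[i]))
            del self.exprs[worst]
            del self.weights[worst]


if __name__ == "__main__":
    pool = BoundedAlphaPool(capacity=2)
    pool.add("Div(Close, Open)", 0.5)
    pool.add("Mean(Volume, 10)", -0.2)
    # A third alpha exceeds capacity 2; the -0.2 alpha has the smallest |weight| and is evicted.
    pool.add("Std(Close, 20)", 0.3)  # hypothetical expression
    print(pool.exprs)
```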

Understanding the Output

Results are saved in the out/results/ directory, organized by experiment parameters and timestamp.

Inside an experiment folder (e.g., out/results/csi300_10_42_2023...), you will find:

  1. *_steps_pool.json: This is the most important file. It contains the final alpha pool: the alpha expressions ("exprs") and their combination weights ("weights"), listed in the same order.

    {
        "exprs": [
            "Div(Close, Open)",
            "Mean(Volume, 10)"
        ],
        "weights": [
            0.5,
            -0.2
        ]
    }

  2. model.zip: The saved Stable Baselines3 PPO model checkpoint.
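The pool file can be consumed with plain Python. The sketch below uses hypothetical helpers (load_pool and combine_alphas) that read the "exprs"/"weights" format shown above and form the combined signal as a weighted sum. It assumes you have already computed each expression's values for the same set of stocks (for example with Qlib); it does not parse or evaluate the expressions themselves.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_pool(path: str | Path) -> list[tuple[str, float]]:
    """Read a *_steps_pool.json file and pair each expression with its weight."""
    data = json.loads(Path(path).read_text())
    exprs, weights = data["exprs"], data["weights"]
    if len(exprs) != len(weights):
        raise ValueError("'exprs' and 'weights' must have the same length")
    return list(zip(exprs, weights))


def combine_alphas(pool: list[tuple[str, float]], factor_values: dict[str, list[float]]) -> list[float]:
    """Weighted sum of precomputed factor values (one value per stock, same stock order)."""
    n = len(next(iter(factor_values.values())))
    combined = [0.0] * n
    for expr, weight in pool:
        values = factor_values[expr]  # KeyError if an expression was not precomputed
        if len(values) != n:
            raise ValueError(f"{expr!r} has {len(values)} values, expected {n}")
        for i, v in enumerate(values):
            combined[i] += weight * v
    return combined


if __name__ == "__main__":
    example = {"exprs": ["Div(Close, Open)", "Mean(Volume, 10)"], "weights": [0.5, -0.2]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_steps_pool.json"  # hypothetical file name
        path.write_text(json.dumps(example))
        pool = load_pool(path)
    # Hypothetical factor values for three stocks on a single day.
    values = {"Div(Close, Open)": [1.0, 0.0, -1.0], "Mean(Volume, 10)": [0.5, 1.0, -1.5]}
    print(combine_alphas(pool, values))
```

With the example weights (0.5, -0.2) and per-stock values [1.0, 0.0, -1.0] and [0.5, 1.0, -1.5], the combined signal is approximately [0.4, -0.2, -0.2].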

Running Baselines

To compare RL against Genetic Programming (GP), you can run the provided GP script:

python gp.py

This script uses gplearn to evolve alphas. gp.py is configured to use the same StockData and QLibStockDataCalculator as the RL method, so the two approaches differ only in how alphas are generated, which keeps the comparison fair.

Visualizing Training

AlphaGen logs metrics to TensorBoard. To view training progress (IC improvement, reward curves), run:

tensorboard --logdir ./out/tensorboard

Look for metrics like:

  • pool/best_ic_ret: The Information Coefficient of the current alpha pool on the training set.
  • test/ic_mean: The IC on the hold-out test set.
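The Information Coefficient (IC) is commonly defined as the cross-sectional Pearson correlation between alpha values and subsequent stock returns, computed for each trading day and then averaged over days. The sketch below assumes that definition; the exact return horizon and normalization AlphaGen uses for pool/best_ic_ret and test/ic_mean may differ, and pearson and mean_daily_ic are hypothetical helpers.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def mean_daily_ic(alpha_by_day: list[list[float]], returns_by_day: list[list[float]]) -> float:
    """Average the per-day cross-sectional ICs, skipping days where the IC is undefined."""
    ics = [pearson(a, r) for a, r in zip(alpha_by_day, returns_by_day)]
    ics = [ic for ic in ics if not math.isnan(ic)]
    return sum(ics) / len(ics) if ics else float("nan")


if __name__ == "__main__":
    # Hypothetical data: two days, three stocks per day.
    alphas = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
    returns = [[0.01, 0.02, 0.03], [0.02, 0.01, 0.03]]
    print(mean_daily_ic(alphas, returns))
```

Averaging daily cross-sectional ICs, rather than correlating all stock-days at once, measures how well an alpha ranks stocks against each other on a given day instead of how well it tracks market-wide moves over time.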