Quick Start

This guide demonstrates how to train an RL agent to discover alphas using the scripts/rl.py entry point.

Running the RL Experiment

scripts/rl.py initializes the environment, the alpha pool, and the PPO agent, then starts training. Run it as a module from the repository root so that the scripts package is importable:

python -m scripts.rl \
    --instruments "csi300" \
    --pool_capacity 10 \
    --steps 100000 \
    --seed 42

Command Line Arguments

--instruments: The stock universe to use (e.g., csi300, csi500). This corresponds to the instrument files generated in your Qlib data folder.
--pool_capacity: The maximum number of alphas the pool holds at any one time. A capacity of 10-20 is typical.
--steps: The total number of interaction steps for the PPO agent. 100k-200k is usually sufficient for convergence on smaller pools.
--seed: Random seed for reproducibility.
--use_llm: (Optional) If set to True, enables the hybrid AlphaGPT mode (requires an OpenAI API key).
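
To repeat the experiment over several seeds, the command above can be assembled programmatically. The following is a minimal sketch that assumes only the four flags shown in this guide. build_rl_command is a hypothetical helper, not part of the repository, and the subprocess call is left commented out so the snippet only prints the commands.

```python
import shlex


def build_rl_command(instruments, pool_capacity, steps, seed):
    """Return the argv list for one scripts/rl.py run (hypothetical helper)."""
    return [
        "python", "-m", "scripts.rl",
        "--instruments", instruments,
        "--pool_capacity", str(pool_capacity),
        "--steps", str(steps),
        "--seed", str(seed),
    ]


# One command per seed; all other settings are held fixed.
commands = [build_rl_command("csi300", 10, 100000, seed) for seed in (0, 1, 2)]

for cmd in commands:
    # Print a copy-pasteable shell line for each run.
    print(shlex.join(cmd))
    # To actually launch the run, uncomment the next two lines:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if an argument ever contains spaces.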

Understanding the Output

Results are saved in the out/results/ directory, organized by experiment parameters and timestamp.

Inside an experiment folder (e.g., out/results/csi300_10_42_2023...), you will find:

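*_steps_pool.json: This is the most important file. It contains the final portfolio of alphas: exprs lists the alpha expressions, and weights gives the combination weight of each expression in the same order.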

{
    "exprs": [
        "Div(Close, Open)",
        "Mean(Volume, 10)"
    ],
    "weights": [
        0.5,
        -0.2
    ]
}

model.zip: The saved Stable Baselines3 PPO model checkpoint.
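
A pool file can be inspected with a few lines of Python. The sketch below assumes only the two keys shown in the example above (exprs and weights). parse_pool and load_pool are hypothetical helper names, not functions provided by the repository.

```python
import json


def parse_pool(text):
    """Parse pool JSON text into (expression, weight) pairs.

    Pairs are sorted by absolute weight, largest first.
    """
    pool = json.loads(text)
    exprs, weights = pool["exprs"], pool["weights"]
    if len(exprs) != len(weights):
        raise ValueError("exprs and weights must have the same length")
    return sorted(zip(exprs, weights), key=lambda pair: abs(pair[1]), reverse=True)


def load_pool(path):
    """Read a *_steps_pool.json file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_pool(f.read())


# Demonstration on the sample content shown in this guide.
sample = """
{
    "exprs": ["Div(Close, Open)", "Mean(Volume, 10)"],
    "weights": [0.5, -0.2]
}
"""
ranked = parse_pool(sample)
for expr, weight in ranked:
    print(f"{weight:+.4f}  {expr}")
```

Sorting by absolute weight puts the alphas that contribute most to the combined signal at the top, regardless of sign.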

Running Baselines

To compare RL against Genetic Programming (GP), you can run the provided GP script:

python gp.py

This script uses gplearn to evolve alphas. It is configured in gp.py to use the same StockData and QLibStockDataCalculator as the RL method, ensuring a fair comparison of the generation strategy.

Visualizing Training

AlphaGen logs metrics to TensorBoard. To view training progress (IC improvement, reward curves):

tensorboard --logdir ./out/tensorboard

Look for metrics like:

pool/best_ic_ret: The Information Coefficient of the current alpha pool on the training set.
test/ic_mean: The IC on the held-out test set.
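
For readers unfamiliar with the metric, the Information Coefficient is commonly defined as the cross-sectional Pearson correlation between an alpha's values and the subsequent returns, averaged over days. The sketch below illustrates that textbook definition on toy data. It is not AlphaGen's implementation, which may apply its own normalization and handling of missing values; pearson and mean_ic are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mean_ic(alpha_by_day, returns_by_day):
    """Average the per-day cross-sectional correlation over all days."""
    daily = [pearson(a, r) for a, r in zip(alpha_by_day, returns_by_day)]
    return sum(daily) / len(daily)


# Toy data: two days, three stocks per day.
alpha_by_day = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
returns_by_day = [[0.01, 0.02, 0.03], [0.03, 0.02, 0.01]]

# Day 1 ranks the stocks perfectly, day 2 ranks them in reverse,
# so the two daily correlations cancel out in the mean.
print(mean_ic(alpha_by_day, returns_by_day))
```

A mean IC near zero, as in this toy case, indicates no consistent predictive relationship; values further from zero indicate a stronger one.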