# Quick Start

This guide demonstrates how to train an RL agent to discover alphas using the `scripts/rl.py` entry point.
## Running the RL Experiment

The primary script is `scripts/rl.py`. It initializes the environment, the alpha pool, and the PPO agent.
```bash
python -m scripts.rl \
    --instruments "csi300" \
    --pool_capacity 10 \
    --steps 100000 \
    --seed 42
```
## Command Line Arguments

- `--instruments`: The stock universe to use (e.g., `csi300`, `csi500`). This corresponds to the instrument files generated in your Qlib data folder.
- `--pool_capacity`: The maximum number of alphas the agent maintains in the pool at once. A capacity of 10-20 is typical.
- `--steps`: The total number of interaction steps for the PPO agent. 100k-200k steps are usually sufficient for convergence on smaller pools.
- `--seed`: Random seed for reproducibility.
- `--use_llm`: (Optional) If set to `True`, enables the hybrid AlphaGPT mode (requires an OpenAI API key).
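For reference, the documented flags can be sketched as a minimal `argparse` parser. This is an illustration of the command-line interface only; the actual flag definitions in `scripts/rl.py` may differ (e.g., how `--use_llm` parses booleans):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags.
    parser = argparse.ArgumentParser(description="Train an RL agent to discover alphas")
    parser.add_argument("--instruments", type=str, default="csi300",
                        help="Stock universe, e.g. csi300 or csi500")
    parser.add_argument("--pool_capacity", type=int, default=10,
                        help="Maximum number of alphas kept in the pool")
    parser.add_argument("--steps", type=int, default=100_000,
                        help="Total PPO interaction steps")
    parser.add_argument("--seed", type=int, default=42,
                        help="Random seed for reproducibility")
    parser.add_argument("--use_llm", action="store_true",
                        help="Enable the hybrid AlphaGPT mode (requires an OpenAI API key)")
    return parser

# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["--instruments", "csi300", "--pool_capacity", "10", "--steps", "100000", "--seed", "42"]
)
print(args.instruments, args.pool_capacity, args.steps, args.seed)
```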
## Understanding the Output

Results are saved in the `out/results/` directory, organized by experiment parameters and timestamp.

Inside an experiment folder (e.g., `out/results/csi300_10_42_2023...`), you will find:
- `*_steps_pool.json`: This is the most important file. It contains the final portfolio of alphas.

  ```json
  {
      "exprs": ["Div(Close, Open)", "Mean(Volume, 10)"],
      "weights": [0.5, -0.2]
  }
  ```

- `model.zip`: The saved Stable Baselines3 PPO model checkpoint.
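The pool file is plain JSON, so it is easy to consume programmatically. A minimal sketch that combines per-alpha signals into a single score using the stored weights; the signal values here are made up for illustration, and in practice each expression would be evaluated on market data first:

```python
import json

# A pool with the structure shown above (inlined here instead of
# reading out/results/.../..._steps_pool.json from disk).
pool = json.loads('{"exprs": ["Div(Close, Open)", "Mean(Volume, 10)"],'
                  ' "weights": [0.5, -0.2]}')

# Toy per-alpha signal values for one stock on one day.
signals = {"Div(Close, Open)": 1.02, "Mean(Volume, 10)": 0.3}

# The combined score is the weighted sum of the individual alpha signals.
score = sum(w * signals[expr] for expr, w in zip(pool["exprs"], pool["weights"]))
print(round(score, 3))  # 0.5*1.02 + (-0.2)*0.3 = 0.45
```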
## Running Baselines

To compare RL against Genetic Programming (GP), you can run the provided GP script:

```bash
python gp.py
```
This script uses `gplearn` to evolve alphas. It is configured in `gp.py` to use the same `StockData` and `QLibStockDataCalculator` as the RL method, ensuring a fair comparison of the alpha-generation strategies.
## Visualizing Training

AlphaGen logs metrics to TensorBoard. To view training progress (IC improvement, reward curves):

```bash
tensorboard --logdir ./out/tensorboard
```
Look for metrics like:
- `pool/best_ic_ret`: The Information Coefficient of the current alpha pool on the training set.
- `test/ic_mean`: The IC on the hold-out test set.
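The Information Coefficient here is the rank (Spearman) correlation between an alpha's predicted values and realized returns over a cross-section of stocks. As a self-contained sketch (AlphaGen computes this internally; this is only to make the metric concrete):

```python
def _ranks(xs):
    # Rank values 0..n-1 (ties broken by position, which is enough for a sketch).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return ranks

def information_coefficient(pred, ret):
    # Spearman rank correlation = Pearson correlation of the ranks.
    rp, rr = _ranks(pred), _ranks(ret)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    sd_p = sum((a - mp) ** 2 for a in rp) ** 0.5
    sd_r = sum((b - mr) ** 2 for b in rr) ** 0.5
    return cov / (sd_p * sd_r)

# Predictions that rank stocks in the same order as returns give IC = 1.0.
print(information_coefficient([0.1, 0.4, 0.2, 0.9], [1.0, 3.0, 2.0, 5.0]))  # 1.0
```

An IC near zero means the alpha carries no ranking information; values above roughly 0.05 on the test set are generally considered meaningful for daily equity signals.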