Installation and Data Setup
AlphaGen requires a specific environment setup, particularly for the data layer, which relies on Qlib's binary format for efficiency.
1. Environment Setup
This project requires Python 3.8+ and PyTorch. We recommend creating a dedicated virtual environment.
# Clone the repository
git clone https://github.com/RL-MLDM/alphagen.git
cd alphagen
# Create a virtual environment (optional but recommended)
conda create -n alphagen python=3.8
conda activate alphagen
# Install dependencies
pip install -r requirements.txt
Key Dependencies
- stable_baselines3 & sb3_contrib: Used for the PPO implementation.
- qlib: Used for high-speed data retrieval and backtesting.
- baostock: Used as the raw data source for Chinese A-shares.
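After installing, a quick way to confirm that the key dependencies are visible to the interpreter is to look each one up without importing it. The snippet below is a minimal sketch: the module names are the usual import names for these packages (Qlib is imported as `qlib`), and `missing_modules` is a hypothetical helper, not part of AlphaGen.

```python
from importlib.util import find_spec

# Top-level import names of the key dependencies (assumed, not read from requirements.txt).
REQUIRED_MODULES = ["torch", "stable_baselines3", "sb3_contrib", "qlib", "baostock"]


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All key dependencies were found.")
```

If anything is reported missing, re-run the pip install step inside the activated environment.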
2. Data Pipeline Configuration
AlphaGen does not work out of the box: it ships without data, so you must first download raw stock data and convert it into the Qlib binary format.
Why Baostock and Qlib?
- Baostock: A free, open-source data provider for Chinese stock data. We use it to fetch OHLCV (Open, High, Low, Close, Volume) data.
- Qlib: A quantitative platform by Microsoft. It uses a binary file structure that is significantly faster than CSVs or SQL for the heavy tensor operations performed during alpha mining.
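To give a feel for why a flat binary file is cheap to load, here is a minimal sketch of writing and reading a single feature series. It assumes the layout commonly described for Qlib's feature files: raw little-endian float32 values, with the first value holding the series' start index in the trading calendar. Treat that layout, and both helper names, as assumptions for illustration; this is not Qlib's own reader.

```python
import array
import sys


def write_feature_bin(path, start_index, values):
    """Write [start_index, v0, v1, ...] as raw little-endian float32 values."""
    buf = array.array("f", [float(start_index), *values])
    if sys.byteorder == "big":
        buf.byteswap()  # keep the on-disk byte order little-endian
    with open(path, "wb") as fh:
        buf.tofile(fh)


def read_feature_bin(path):
    """Read the file back in one pass; no text parsing is involved."""
    buf = array.array("f")
    with open(path, "rb") as fh:
        buf.frombytes(fh.read())
    if sys.byteorder == "big":
        buf.byteswap()
    return int(buf[0]), list(buf[1:])
```

Reading is a single bulk copy into a typed array, whereas a CSV has to be split and converted field by field before it can be turned into a tensor.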
Running the Data Script
We provide a script, data_collection/fetch_baostock_data.py, that handles downloading, cleaning, and converting the data.
# Run the data fetcher
python data_collection/fetch_baostock_data.py
What this script does:
- Login: Connects to the Baostock API.
- Fetch List: Gets the list of all A-shares.
- Download: Iterates through every stock to download daily K-line data and adjustment factors.
- Save CSV: Saves intermediate CSVs to ../data/export.
- Dump Binary: Invokes DumpDataAll (from qlib_dump_bin.py) to convert CSVs into Qlib binaries.
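The login, download, and save-CSV steps can be sketched for a single stock as follows. This is an illustration under stated assumptions, not the code in fetch_baostock_data.py: `fetch_daily_kline` and `rows_to_csv` are hypothetical helpers, the field list is an example, and the script itself may request different fields or adjustment settings. The Baostock calls used are `login`, `query_history_k_data_plus`, and `logout`.

```python
import csv
import io


def rows_to_csv(fields, rows):
    """Serialize a header and a list of row lists to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


def fetch_daily_kline(code, start_date, end_date):
    """Download unadjusted daily bars for one code such as "sh.600000"."""
    import baostock as bs  # imported lazily so rows_to_csv works without it

    login = bs.login()
    if login.error_code != "0":
        raise RuntimeError(f"Baostock login failed: {login.error_msg}")
    try:
        rs = bs.query_history_k_data_plus(
            code,
            "date,code,open,high,low,close,volume",
            start_date=start_date,
            end_date=end_date,
            frequency="d",
            adjustflag="3",  # "3" requests unadjusted prices
        )
        if rs.error_code != "0":
            raise RuntimeError(f"Query failed for {code}: {rs.error_msg}")
        rows = []
        while rs.next():  # advance the result cursor one row at a time
            rows.append(rs.get_row_data())
        return rs.fields, rows
    finally:
        bs.logout()
```

A caller would pass the returned fields and rows to `rows_to_csv` and write the text to a per-stock file. Note that Baostock returns every value as a string, so numeric conversion is left to a later step.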
Data Location
By default, the script initializes Qlib with data at:
~/.qlib/qlib_data/cn_data_baostock_fwdadj (or similar, check the script output).
If you want to store data elsewhere, modify the DataManager instantiation in fetch_baostock_data.py:
# In data_collection/fetch_baostock_data.py
dm = DataManager(
    save_path="../data",
    qlib_export_path="~/.qlib/qlib_data/cn_data",  # <--- Destination for Qlib Binaries
    qlib_base_data_path="~/.qlib/qlib_data/cn_data",
    adjust_date="2009-01-01"
)
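Once the script has finished, it can help to confirm that the destination directory looks like a Qlib dataset before pointing AlphaGen at it. The sketch below assumes the usual layout of Qlib daily data (calendars/day.txt, instruments/<market>.txt, and a features/ directory); `missing_qlib_parts` is a hypothetical helper, and the exact files your run produces may differ.

```python
from pathlib import Path


def missing_qlib_parts(root, market="csi300"):
    """Return the names of expected dataset parts that are absent under root."""
    root = Path(root).expanduser()  # resolve a leading "~" to the home directory
    expected = {
        "calendar": root / "calendars" / "day.txt",
        "instruments": root / "instruments" / f"{market}.txt",
        "features": root / "features",
    }
    return [name for name, path in expected.items() if not path.exists()]


if __name__ == "__main__":
    missing = missing_qlib_parts("~/.qlib/qlib_data/cn_data")
    print("Missing:", missing if missing else "nothing")
```

An empty result only shows that the files exist; it does not validate their contents.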
3. Verifying Installation
To confirm that Qlib is reading the data correctly, run a short Python check:
from alphagen_qlib.stock_data import StockData, initialize_qlib
import torch
# Point this to your generated data path
initialize_qlib("~/.qlib/qlib_data/cn_data")
try:
    # Attempt to load CSI300 data for a small range
    data = StockData(
        instrument="csi300",
        start_time="2020-01-01",
        end_time="2020-01-10",
        device=torch.device("cpu")
    )
    print(f"Success! Loaded data shape: {data.data.shape}")
except Exception as e:
    print(f"Data loading failed: {e}")