Data Layer Architecture
AlphaGen is designed for speed. Evaluating thousands of alphas during RL training requires a highly optimized data layer.
StockData Class
Located in alphagen_qlib/stock_data.py, this class is the bridge between Qlib's binary storage and PyTorch tensors.
Initialization
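import torch
from alphagen_qlib.stock_data import StockData

# Load CSI 300 constituents for 2010-2020 onto the first GPU.
data = StockData(
    instrument="csi300",
    start_time="2010-01-01",
    end_time="2020-12-31",
    device=torch.device("cuda:0"),
)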
Internal Structure
Rather than keeping the data in Pandas DataFrames, StockData loads the entire dataset into a single contiguous PyTorch tensor, placed on the GPU if one is available.
- Tensor Shape: (n_days, n_features, n_stocks)
  - n_days: Total trading days in the range.
  - n_features: 6 basic features (Open, Close, High, Low, Volume, VWAP).
  - n_stocks: Number of unique stocks in the instrument set.
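To make the layout concrete, here is a minimal sketch that builds a stand-in tensor with the same (n_days, n_features, n_stocks) shape and slices it. The sizes, the FEATURES list, and the feature order are assumptions for illustration; this does not call StockData itself.

```python
import torch

# Hypothetical sizes, chosen only to keep the example small.
n_days, n_features, n_stocks = 5, 6, 3

# Assumed feature order, following the list above.
FEATURES = ["open", "close", "high", "low", "volume", "vwap"]

# Use the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for the dataset: one contiguous block of memory.
data = torch.arange(
    n_days * n_features * n_stocks, dtype=torch.float32, device=device
).reshape(n_days, n_features, n_stocks)

# One feature for all days and stocks: shape (n_days, n_stocks).
close = data[:, FEATURES.index("close"), :]

# All features for all stocks on the first day: shape (n_features, n_stocks).
day0 = data[0]

# Close-price history of the first stock: shape (n_days,).
stock0_close = close[:, 0]
```

Because each per-feature slice is a view into the same tensor, selecting a feature does not copy data, which is part of what makes repeated alpha evaluation cheap.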
Handling Missing Data
Stock data is jagged: stocks get halted, delisted, or newly listed (IPO) within the date range. Qlib handles the alignment, and StockData maintains an internal mapping of valid stocks per day so that cross-sectional calculations such as Rank or Mean only consider active stocks.
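The effect of restricting a calculation to active stocks can be sketched as follows. This example assumes inactive entries are marked with NaN, which is an assumption made for illustration and not necessarily how StockData stores its mapping; masked_cross_sectional_mean is a hypothetical helper, not part of the library.

```python
import torch


def masked_cross_sectional_mean(values: torch.Tensor) -> torch.Tensor:
    """Per-day mean over stocks, skipping NaN (inactive) entries.

    values: tensor of shape (n_days, n_stocks).
    Returns a tensor of shape (n_days,).
    """
    valid = ~torch.isnan(values)
    # Replace NaN with 0 so it does not poison the sum.
    total = torch.where(valid, values, torch.zeros_like(values)).sum(dim=1)
    # Divide by the number of active stocks, not by n_stocks.
    count = valid.sum(dim=1)
    return total / count


nan = float("nan")
values = torch.tensor([
    [1.0, nan, 3.0],   # second stock inactive on day 0
    [2.0, 4.0, 6.0],   # all stocks active on day 1
])
daily_mean = masked_cross_sectional_mean(values)
```

Here day 0 averages only the two active stocks, giving 2.0 instead of a NaN or a value diluted by the missing entry.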
Calculator
The QLibStockDataCalculator (in alphagen_qlib/calculator.py) performs the actual evaluation.
- Batch Pearson Correlation: It implements a fast, tensor-based Pearson correlation to compute the IC (information coefficient) between an alpha's values and the target.
- Caching: It manages the evaluation context, ensuring the target (future return) is pre-calculated and normalized.
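The two points above can be sketched together: normalize the target once, then compute a per-day Pearson correlation for each alpha in a single batched operation. The names batch_pearson and normalize_by_day are illustrative, not necessarily the library's own functions, and the sketch assumes there are no missing values.

```python
import torch


def normalize_by_day(values: torch.Tensor) -> torch.Tensor:
    """Z-score each day's cross-section. values: (n_days, n_stocks)."""
    mean = values.mean(dim=1, keepdim=True)
    std = values.std(dim=1, keepdim=True)
    return (values - mean) / std


def batch_pearson(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Row-wise Pearson correlation.

    x, y: tensors of shape (n_days, n_stocks).
    Returns one correlation per day, shape (n_days,).
    """
    x_c = x - x.mean(dim=1, keepdim=True)
    y_c = y - y.mean(dim=1, keepdim=True)
    cov = (x_c * y_c).sum(dim=1)
    denom = torch.sqrt((x_c ** 2).sum(dim=1) * (y_c ** 2).sum(dim=1))
    return cov / denom


# Toy target (future return) for 2 days and 4 stocks.
target = torch.tensor([
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
])
# Computed once and reused for every alpha that gets evaluated.
target_norm = normalize_by_day(target)

# Toy alpha values with the same shape as the target.
alpha = torch.tensor([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
])
daily_ic = batch_pearson(alpha, target_norm)  # one IC per day
mean_ic = daily_ic.mean()                     # IC averaged over days
```

In this toy case the alpha is perfectly aligned with the target on the first day and perfectly inverted on the second, so the daily values are close to 1 and -1 and the mean is close to 0. Pearson correlation is unchanged by per-day normalization of the target, so caching the normalized target does not alter the IC.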