The Linear Alpha Pool

A key innovation in AlphaGen is the Linear Alpha Pool, which shifts the goal from "finding the single best alpha" to "finding the best combination of alphas."

The LinearAlphaPool Class

Located in alphagen/models/linear_alpha_pool.py, this class manages the ensemble.

State

  • exprs: A list of Expression objects currently in the pool.
  • weights: A numpy array of weights assigned to each expression.
  • capacity: The maximum number of alphas allowed.
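The state above can be sketched as a small Python class. This is a hypothetical stand-in to illustrate the shape of the data, not the actual `LinearAlphaPool` implementation:

```python
import numpy as np

class LinearAlphaPoolSketch:
    """Illustrative sketch of the pool's state (not the AlphaGen class)."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # maximum number of alphas allowed
        self.exprs: list = []           # Expression objects in the pool
        self.weights = np.zeros(0)      # one linear weight per expression

    @property
    def size(self) -> int:
        return len(self.exprs)

pool = LinearAlphaPoolSketch(capacity=10)
```

The ensemble signal is then simply the weighted sum of each expression's evaluated signal.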

Optimization Logic (MseAlphaPool)

When a new expression is generated by the RL agent or LLM, the pool attempts to incorporate it using the following logic:

  1. Evaluation: Calculate the raw signal of the new alpha on the training data.
  2. IC Calculation: Compute the individual IC (Pearson correlation between the alpha's signal and the target) and the mutual IC (its correlation with the signals of alphas already in the pool, which measures redundancy).
  3. Trial: Temporarily add the alpha to the pool.
  4. Weight Optimization: Solve a linear regression problem to find optimal weights for the ensemble.
    • The objective is typically to minimize Mean Squared Error (MSE) against the target label (future returns).
    • L1 Regularization (Lasso) is often applied to encourage sparsity and select only impactful alphas.
    • See MseAlphaPool.optimize.
  5. Selection:
    • If the new ensemble's performance (IC) improves, the alpha is kept; otherwise the trial addition is rolled back.
    • If the pool size exceeds capacity, the alpha with the lowest absolute weight is pruned.
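Steps 2 and 4 can be illustrated with a minimal numerical sketch. The functions below are assumptions for illustration only, not the library's `MseAlphaPool.optimize`: `pearson_ic` computes the correlation used for both individual and mutual IC, and `optimize_weights` minimizes MSE with an L1 penalty via proximal gradient descent (ISTA), whose soft-thresholding step is what drives small weights toward zero:

```python
import numpy as np

def pearson_ic(signal: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation; serves as individual IC (vs. target)
    or mutual IC (vs. another alpha's signal)."""
    s = signal - signal.mean()
    t = target - target.mean()
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t)))

def optimize_weights(signals, target, l1=1e-3, lr=0.05, steps=2000):
    """Fit ensemble weights minimizing MSE(X @ w, y) + l1 * ||w||_1
    by proximal gradient descent (an illustrative Lasso solver)."""
    X = np.asarray(signals).T          # shape: (samples, n_alphas)
    y = np.asarray(target)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the MSE term
        w = w - lr * grad
        # soft-thresholding: proximal operator of the L1 penalty,
        # which zeroes out weights smaller than the threshold
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w
```

With a target driven by only one of two candidate signals, the fitted weight on the irrelevant signal shrinks toward zero, which is the sparsity effect the L1 term is meant to encourage.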

Updating the Pool

The try_new_expr(expr) method returns the optimization objective value, and this value serves as the reward for the reinforcement learning agent. This closes a feedback loop: the agent learns to generate alphas that are not only strong individually, but also complementary additions to the specific alphas already in the pool.
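The feedback loop can be sketched as follows. Both classes here are toy stand-ins for illustration, assuming only the interface described above (the real agent and pool are far more involved):

```python
import random

class ToyPool:
    """Stand-in pool: try_new_expr returns the ensemble objective."""
    def __init__(self):
        self.best = 0.0
    def try_new_expr(self, expr) -> float:
        # Placeholder for evaluation + weight optimization; here a
        # random score stands in for the optimized ensemble IC.
        score = random.random()
        self.best = max(self.best, score)
        return score

class ToyAgent:
    """Stand-in RL agent that proposes alpha expressions."""
    def sample_expression(self) -> str:
        return "ts_mean(close, 10)"   # placeholder alpha expression
    def update(self, expr, reward) -> None:
        pass  # a policy-gradient update would go here

pool, agent = ToyPool(), ToyAgent()
for _ in range(5):
    expr = agent.sample_expression()
    reward = pool.try_new_expr(expr)  # pool objective used as RL reward
    agent.update(expr, reward)
```

The key design point is that the reward depends on the pool's current contents, so the same expression can earn different rewards at different stages of training.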