The Linear Alpha Pool

A key innovation in AlphaGen is the Linear Alpha Pool, which shifts the goal from "finding the single best alpha" to "finding the best combination of alphas."

The LinearAlphaPool Class

Located in alphagen/models/linear_alpha_pool.py, this class manages the ensemble.

State

  • exprs: A list of Expression objects currently in the pool.
  • weights: A numpy array of weights assigned to each expression.
  • capacity: The maximum number of alphas allowed.
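The state above can be sketched as a small Python class. This is a hypothetical stand-in to illustrate the shape of the data, not the actual `LinearAlphaPool` implementation:

```python
import numpy as np

class LinearAlphaPoolSketch:
    """Illustrative sketch of the pool's state (not the AlphaGen class)."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # maximum number of alphas allowed
        self.exprs: list = []           # Expression objects in the pool
        self.weights = np.zeros(0)      # one linear weight per expression

    @property
    def size(self) -> int:
        return len(self.exprs)

pool = LinearAlphaPoolSketch(capacity=10)
```

The ensemble signal is then simply the weighted sum of each expression's evaluated signal.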

Optimization Logic (MseAlphaPool)

When a new expression is generated by the RL agent or LLM, the pool attempts to incorporate it using the following logic:

  1. Evaluation: Calculate the raw signal of the new alpha on the training data.
  2. IC Calculation: Compute the individual IC (Pearson correlation between the alpha's signal and the target) and the mutual IC (its correlation with the signals of alphas already in the pool, which measures redundancy).
  3. Trial: Temporarily add the alpha to the pool.
  4. Weight Optimization: Solve a linear regression problem to find optimal weights for the ensemble.
    • The objective is typically to minimize Mean Squared Error (MSE) against the target label (future returns).
    • L1 Regularization (Lasso) is often applied to encourage sparsity and select only impactful alphas.
    • See MseAlphaPool.optimize.
  5. Selection:
    • If the new ensemble's performance (IC) improves, the alpha is kept; otherwise the trial addition is rolled back.
    • If the pool size exceeds capacity, the alpha with the lowest absolute weight is pruned.
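Steps 2 and 4 can be illustrated with a minimal numerical sketch. The functions below are assumptions for illustration only, not the library's `MseAlphaPool.optimize`: `pearson_ic` computes the correlation used for both individual and mutual IC, and `optimize_weights` minimizes MSE with an L1 penalty via proximal gradient descent (ISTA), whose soft-thresholding step is what drives small weights toward zero:

```python
import numpy as np

def pearson_ic(signal: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation; serves as individual IC (vs. target)
    or mutual IC (vs. another alpha's signal)."""
    s = signal - signal.mean()
    t = target - target.mean()
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t)))

def optimize_weights(signals, target, l1=1e-3, lr=0.05, steps=2000):
    """Fit ensemble weights minimizing MSE(X @ w, y) + l1 * ||w||_1
    by proximal gradient descent (an illustrative Lasso solver)."""
    X = np.asarray(signals).T          # shape: (samples, n_alphas)
    y = np.asarray(target)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the MSE term
        w = w - lr * grad
        # soft-thresholding: proximal operator of the L1 penalty,
        # which zeroes out weights smaller than the threshold
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w
```

With a target driven by only one of two candidate signals, the fitted weight on the irrelevant signal shrinks toward zero, which is the sparsity effect the L1 term is meant to encourage.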

Updating the Pool

The try_new_expr(expr) method returns the optimization objective value, and this value serves as the reward for the reinforcement learning agent. This closes a feedback loop: the agent learns to generate alphas that are not only strong individually, but also complementary additions to the specific alphas already in the pool.
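The feedback loop can be sketched as follows. Both classes here are toy stand-ins for illustration, assuming only the interface described above (the real agent and pool are far more involved):

```python
import random

class ToyPool:
    """Stand-in pool: try_new_expr returns the ensemble objective."""
    def __init__(self):
        self.best = 0.0
    def try_new_expr(self, expr) -> float:
        # Placeholder for evaluation + weight optimization; here a
        # random score stands in for the optimized ensemble IC.
        score = random.random()
        self.best = max(self.best, score)
        return score

class ToyAgent:
    """Stand-in RL agent that proposes alpha expressions."""
    def sample_expression(self) -> str:
        return "ts_mean(close, 10)"   # placeholder alpha expression
    def update(self, expr, reward) -> None:
        pass  # a policy-gradient update would go here

pool, agent = ToyPool(), ToyAgent()
for _ in range(5):
    expr = agent.sample_expression()
    reward = pool.try_new_expr(expr)  # pool objective used as RL reward
    agent.update(expr, reward)
```

The key design point is that the reward depends on the pool's current contents, so the same expression can earn different rewards at different stages of training.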