RL Agent Architecture

AlphaGen treats alpha generation as a discrete control problem. The agent constructs an expression tree token by token.

The Environment (AlphaEnv)

Located in alphagen/rl/env/core.py, the environment simulates the construction process.

  • State: The current sequence of tokens generated so far.
  • Action Space: A discrete space consisting of all Operators, Features, Constants, and a special SEP (Separator/Stop) token.
  • Dynamics:
    • The agent picks a token.
    • The token is appended to the builder.
    • If the tree is complete (valid) and SEP is chosen, the alpha is evaluated against the LinearAlphaPool.
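The dynamics above can be sketched as a minimal construction loop. This is an illustrative toy, not AlphaGen's actual `AlphaEnv`/`ExpressionBuilder` API: token names and the operand-counting builder are assumptions made for the example.

```python
SEP = "SEP"  # the special Separator/Stop token

class ExprBuilderSketch:
    """Toy builder: tracks how many operands are still needed (prefix order)."""
    def __init__(self):
        self.tokens = []
        self.needed = 1  # one expression slot open at the start

    def add(self, token):
        self.tokens.append(token)
        if token in ("Add", "Sub"):   # binary operators: fill one slot, open two
            self.needed += 1
        else:                         # features/constants are leaves
            self.needed -= 1

    def is_complete(self):
        return self.needed == 0

def run_episode(actions):
    """Append tokens one by one; on SEP over a complete tree, hand off the alpha."""
    builder = ExprBuilderSketch()
    for a in actions:
        if a == SEP:
            if builder.is_complete():
                return builder.tokens  # ready for evaluation against the pool
            raise ValueError("SEP chosen on an incomplete tree")
        builder.add(a)
    return None  # episode truncated before SEP
```

For example, `run_episode(["Add", "close", "open", SEP])` completes the tree for `Add(close, open)` and returns its token list.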

Invalid Action Masking

A critical feature of AlphaGen is its use of Maskable PPO, a PPO variant that masks out syntactically invalid actions before sampling.

Standard RL struggles with syntactic constraints (e.g., Add must be followed by two arguments). AlphaGen calculates a validity mask at every step:

  • If the agent just selected Add, it cannot select SEP immediately.
  • If the agent needs a time window (for Mean), it must select a DeltaTime token.

This logic is handled in AlphaEnv.valid_action_types(), which interacts with the ExpressionBuilder to determine what tokens are syntactically legal at the current tree depth.
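A minimal sketch of how such a mask could be derived from builder state follows. The token vocabulary and the two state flags here are assumptions for illustration; the real `AlphaEnv.valid_action_types()` covers many more token categories.

```python
# Illustrative action vocabulary (not AlphaGen's actual one).
OPERATORS = ["Add", "Mean"]
FEATURES = ["open", "close"]
DELTA_TIMES = ["10d", "20d"]
SEP = "SEP"
ACTIONS = OPERATORS + FEATURES + DELTA_TIMES + [SEP]

def action_mask(needed_operands, awaiting_delta_time):
    """Boolean mask over ACTIONS: True means the token is legal right now."""
    mask = []
    for a in ACTIONS:
        if awaiting_delta_time:
            legal = a in DELTA_TIMES      # e.g. Mean just consumed its series arg
        elif a in DELTA_TIMES:
            legal = False                 # time windows only where one is expected
        elif a == SEP:
            legal = needed_operands == 0  # stop only on a complete tree
        else:
            legal = needed_operands > 0   # any token that extends the tree
        mask.append(legal)
    return mask
```

With a complete tree (`needed_operands == 0`), only SEP is legal; while a window is awaited, only the DeltaTime tokens are. The mask is passed to Maskable PPO so invalid logits are excluded from the softmax.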

Policy Network (LSTMSharedNet)

Located in alphagen/rl/policy.py.

  • Embedding: Each token type is embedded into a dense vector.
  • Positional Encoding: Added to retain sequence order information.
  • LSTM: Processes the sequence of generated tokens to capture the context of the mathematical expression being built.
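The three components above can be sketched as follows, assuming PyTorch. The layer sizes, the sinusoidal positional encoding, and the single logits head are assumptions for the example, not AlphaGen's exact configuration in `LSTMSharedNet`.

```python
import math
import torch
import torch.nn as nn

class LSTMPolicySketch(nn.Module):
    """Sketch: token embedding + positional encoding + LSTM + next-token logits."""
    def __init__(self, n_tokens, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, d_model)   # token type -> dense vector
        self.lstm = nn.LSTM(d_model, d_model, n_layers, batch_first=True)
        self.head = nn.Linear(d_model, n_tokens)       # scores over the next token
        self.d_model = d_model

    def positional_encoding(self, seq_len):
        # Standard sinusoidal encoding, added to retain sequence order.
        pos = torch.arange(seq_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, self.d_model, 2).float()
                        * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(seq_len, self.d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, token_ids):                      # (batch, seq) of int ids
        x = self.embed(token_ids) + self.positional_encoding(token_ids.size(1))
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                   # logits for the next token

net = LSTMPolicySketch(n_tokens=20)
logits = net(torch.randint(0, 20, (4, 7)))  # batch of 4 partial sequences
```

The last hidden state summarizes the partial expression, so the logits condition on everything generated so far.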

Training Process

  1. Proximal Policy Optimization (PPO): Used for stable policy updates.
  2. Entropy Regularization: Encourages the agent to explore diverse formulas.
  3. Reward Shaping: The reward is the increment in the pool's composite IC (Information Coefficient). If a generated alpha does not improve the pool, the reward is 0 (or slightly negative to penalize complexity).
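The reward-shaping rule reduces to a small function. The interface below is illustrative (the real reward is computed through `LinearAlphaPool` after the new alpha is re-weighted into the pool); the complexity penalty parameter is an assumption for the example.

```python
def shaped_reward(pool_ic_before, pool_ic_after, complexity_penalty=0.0):
    """Reward = increment in the pool's composite IC after adding the new alpha.

    A non-improving alpha earns 0, or a small negative value when a
    complexity penalty is configured.
    """
    delta = pool_ic_after - pool_ic_before
    if delta > 0:
        return delta               # the alpha improved the combined signal
    return -complexity_penalty     # discourage useless or overly long formulas
```

Because the reward is the *marginal* IC gain rather than the alpha's standalone IC, the agent is pushed toward formulas that are synergistic with the pool instead of redundant with it.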