RL Agent Architecture

AlphaGen treats alpha generation as a discrete control problem. The agent constructs an expression tree token by token.

The Environment (AlphaEnv)

Located in alphagen/rl/env/core.py, the environment simulates the construction process.

  • State: The current sequence of tokens generated so far.
  • Action Space: A discrete space consisting of all Operators, Features, Constants, and a special SEP (Separator/Stop) token.
  • Dynamics:
    • The agent picks a token.
    • The token is appended to the builder.
    • If the tree is complete (valid) and SEP is chosen, the alpha is evaluated against the LinearAlphaPool.
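The dynamics above can be sketched as a minimal construction loop. This is an illustrative toy, not AlphaGen's actual `AlphaEnv`/`ExpressionBuilder` API: token names and the operand-counting builder are assumptions made for the example.

```python
SEP = "SEP"  # the special Separator/Stop token

class ExprBuilderSketch:
    """Toy builder: tracks how many operands are still needed (prefix order)."""
    def __init__(self):
        self.tokens = []
        self.needed = 1  # one expression slot open at the start

    def add(self, token):
        self.tokens.append(token)
        if token in ("Add", "Sub"):   # binary operators: fill one slot, open two
            self.needed += 1
        else:                         # features/constants are leaves
            self.needed -= 1

    def is_complete(self):
        return self.needed == 0

def run_episode(actions):
    """Append tokens one by one; on SEP over a complete tree, hand off the alpha."""
    builder = ExprBuilderSketch()
    for a in actions:
        if a == SEP:
            if builder.is_complete():
                return builder.tokens  # ready for evaluation against the pool
            raise ValueError("SEP chosen on an incomplete tree")
        builder.add(a)
    return None  # episode truncated before SEP
```

For example, `run_episode(["Add", "close", "open", SEP])` completes the tree for `Add(close, open)` and returns its token list.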

Invalid Action Masking

A critical feature of AlphaGen is its use of Maskable PPO, a PPO variant that masks out syntactically invalid actions before sampling.

Standard RL struggles with syntactic constraints (e.g., Add must be followed by two arguments). AlphaGen calculates a validity mask at every step:

  • If the agent just selected Add, it cannot select SEP immediately.
  • If the agent needs a time window (for Mean), it must select a DeltaTime token.

This logic is handled in AlphaEnv.valid_action_types(), which interacts with the ExpressionBuilder to determine what tokens are syntactically legal at the current tree depth.
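A minimal sketch of how such a mask could be derived from builder state follows. The token vocabulary and the two state flags here are assumptions for illustration; the real `AlphaEnv.valid_action_types()` covers many more token categories.

```python
# Illustrative action vocabulary (not AlphaGen's actual one).
OPERATORS = ["Add", "Mean"]
FEATURES = ["open", "close"]
DELTA_TIMES = ["10d", "20d"]
SEP = "SEP"
ACTIONS = OPERATORS + FEATURES + DELTA_TIMES + [SEP]

def action_mask(needed_operands, awaiting_delta_time):
    """Boolean mask over ACTIONS: True means the token is legal right now."""
    mask = []
    for a in ACTIONS:
        if awaiting_delta_time:
            legal = a in DELTA_TIMES      # e.g. Mean just consumed its series arg
        elif a in DELTA_TIMES:
            legal = False                 # time windows only where one is expected
        elif a == SEP:
            legal = needed_operands == 0  # stop only on a complete tree
        else:
            legal = needed_operands > 0   # any token that extends the tree
        mask.append(legal)
    return mask
```

With a complete tree (`needed_operands == 0`), only SEP is legal; while a window is awaited, only the DeltaTime tokens are. The mask is passed to Maskable PPO so invalid logits are excluded from the softmax.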

Policy Network (LSTMSharedNet)

Located in alphagen/rl/policy.py.

  • Embedding: Each token type is embedded into a dense vector.
  • Positional Encoding: Added to retain sequence order information.
  • LSTM: Processes the sequence of generated tokens to capture the context of the mathematical expression being built.
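The three components above can be sketched as follows, assuming PyTorch. The layer sizes, the sinusoidal positional encoding, and the single logits head are assumptions for the example, not AlphaGen's exact configuration in `LSTMSharedNet`.

```python
import math
import torch
import torch.nn as nn

class LSTMPolicySketch(nn.Module):
    """Sketch: token embedding + positional encoding + LSTM + next-token logits."""
    def __init__(self, n_tokens, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, d_model)   # token type -> dense vector
        self.lstm = nn.LSTM(d_model, d_model, n_layers, batch_first=True)
        self.head = nn.Linear(d_model, n_tokens)       # scores over the next token
        self.d_model = d_model

    def positional_encoding(self, seq_len):
        # Standard sinusoidal encoding, added to retain sequence order.
        pos = torch.arange(seq_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, self.d_model, 2).float()
                        * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(seq_len, self.d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, token_ids):                      # (batch, seq) of int ids
        x = self.embed(token_ids) + self.positional_encoding(token_ids.size(1))
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                   # logits for the next token

net = LSTMPolicySketch(n_tokens=20)
logits = net(torch.randint(0, 20, (4, 7)))  # batch of 4 partial sequences
```

The last hidden state summarizes the partial expression, so the logits condition on everything generated so far.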

Training Process

  1. Proximal Policy Optimization (PPO): Used for stable policy updates.
  2. Entropy Regularization: Encourages the agent to explore diverse formulas.
  3. Reward Shaping: The reward is the increment in the pool's composite IC (Information Coefficient). If a generated alpha does not improve the pool, the reward is 0 (or slightly negative to penalize complexity).
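The reward-shaping rule reduces to a small function. The interface below is illustrative (the real reward is computed through `LinearAlphaPool` after the new alpha is re-weighted into the pool); the complexity penalty parameter is an assumption for the example.

```python
def shaped_reward(pool_ic_before, pool_ic_after, complexity_penalty=0.0):
    """Reward = increment in the pool's composite IC after adding the new alpha.

    A non-improving alpha earns 0, or a small negative value when a
    complexity penalty is configured.
    """
    delta = pool_ic_after - pool_ic_before
    if delta > 0:
        return delta               # the alpha improved the combined signal
    return -complexity_penalty     # discourage useless or overly long formulas
```

Because the reward is the *marginal* IC gain rather than the alpha's standalone IC, the agent is pushed toward formulas that are synergistic with the pool instead of redundant with it.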