RL Agent Architecture
AlphaGen treats alpha generation as a discrete control problem. The agent constructs an expression tree token by token.
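As a concrete illustration, an alpha such as Mean(close, 10d) becomes a token sequence with the operator preceding its arguments (consistent with the "Add must be followed by two arguments" constraint described in the masking section). A minimal sketch, with a hypothetical arity table, of how a builder can tell when the tree is complete:

```python
# Hypothetical token vocabulary; arities are illustrative, not AlphaGen's table.
ARITY = {"Mean": 2, "Add": 2, "Abs": 1}

def is_complete(tokens):
    """True when a prefix-ordered token sequence forms exactly one full tree."""
    need = 1  # unfilled operand slots; the root expression is the first slot
    for tok in tokens:
        if tok == "SEP":
            break
        # An operator fills one slot but opens `arity` new ones;
        # a feature/constant simply fills one slot.
        need += ARITY[tok] - 1 if tok in ARITY else -1
    return need == 0

print(is_complete(["Mean", "close", "10d", "SEP"]))  # → True
print(is_complete(["Add", "close"]))                 # → False
```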
The Environment (AlphaEnv)
Located in alphagen/rl/env/core.py, the environment simulates the construction process.
- State: The current sequence of tokens generated so far.
- Action Space: A discrete space consisting of all Operators, Features, Constants, and a special SEP (Separator/Stop) token.
- Dynamics:
  - The agent picks a token.
  - The token is appended to the builder.
  - If the tree is complete (valid) and SEP is chosen, the alpha is evaluated against the LinearAlphaPool.
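These dynamics can be sketched as a minimal gym-style environment (a hypothetical, pared-down API; the real AlphaEnv delegates validity checks to the ExpressionBuilder and scoring to the LinearAlphaPool):

```python
class ToyAlphaEnv:
    """Pared-down sketch of the token-construction MDP (not AlphaGen's API)."""

    def __init__(self, evaluate, max_len=20):
        self.evaluate = evaluate  # callable: token list -> score from the pool
        self.max_len = max_len
        self.tokens = []

    def reset(self):
        self.tokens = []
        return list(self.tokens)  # state = the token sequence so far

    def step(self, token):
        if token == "SEP":
            # Episode ends; reward comes from evaluating the finished alpha.
            return list(self.tokens), self.evaluate(self.tokens), True
        self.tokens.append(token)
        done = len(self.tokens) >= self.max_len  # hard length cap
        return list(self.tokens), 0.0, done      # intermediate steps: no reward
```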
Invalid Action Masking
A critical feature of AlphaGen is its use of Maskable PPO.
Standard RL struggles with syntactic constraints (e.g., Add must be followed by two arguments), so AlphaGen computes a validity mask at every step:
- If the agent just selected Add, it cannot select SEP immediately.
- If the agent needs a time window (for Mean), it must select a DeltaTime token.
This logic is handled in AlphaEnv.valid_action_types(), which interacts with the ExpressionBuilder to determine what tokens are syntactically legal at the current tree depth.
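The mask computation can be sketched with simple slot bookkeeping over the prefix-ordered sequence (token names and the arity table are hypothetical; the real valid_action_types() also enforces typed slots, e.g. requiring a DeltaTime token where Mean expects a window):

```python
# Illustrative vocabulary, not AlphaGen's actual token set.
OPERATORS = {"Add": 2, "Abs": 1, "Mean": 2}
FEATURES = ["open", "close"]

def open_slots(tokens):
    """Number of operand slots still unfilled in a prefix-ordered sequence."""
    need = 1  # the root expression itself is the first slot
    for tok in tokens:
        need += OPERATORS[tok] - 1 if tok in OPERATORS else -1
    return need

def action_mask(tokens):
    """Map each action to a bool: is it syntactically legal right now?"""
    need = open_slots(tokens)
    mask = {feat: need > 0 for feat in FEATURES}     # operands fill a slot
    mask.update({op: need > 0 for op in OPERATORS})  # operators open more slots
    mask["SEP"] = need == 0                          # stop only on a complete tree
    return mask

print(action_mask(["Add"])["SEP"])  # → False: Add still needs two arguments
```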
Policy Network (LSTMSharedNet)
Located in alphagen/rl/policy.py.
- Embedding: Each token type is embedded into a dense vector.
- Positional Encoding: Added to retain sequence order information.
- LSTM: Processes the sequence of generated tokens to capture the context of the mathematical expression being built.
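A minimal PyTorch sketch of that three-stage pipeline (dimensions, names, and the sinusoidal encoding are assumptions for illustration, not the actual LSTMSharedNet code):

```python
import math
import torch
import torch.nn as nn

class TinyTokenEncoder(nn.Module):
    """Sketch of the embedding -> positional encoding -> LSTM pipeline."""

    def __init__(self, n_tokens, d_model=32, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, d_model)
        # Fixed sinusoidal positional encoding, stored as a non-trainable buffer.
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, token_ids):  # token_ids: (batch, seq_len) int64
        x = self.embed(token_ids) + self.pe[: token_ids.size(1)]
        out, _ = self.lstm(x)
        return out[:, -1]  # last hidden state summarizes the expression prefix
```

The policy and value heads would then map this shared representation to action logits (masked as above) and a state-value estimate.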
Training Process
- Proximal Policy Optimization (PPO): Used for stable policy updates.
- Entropy Regularization: Encourages the agent to explore diverse formulas.
- Reward Shaping: The reward is the increment in the pool's IC. If a generated alpha does not improve the pool, the reward is 0 (or slightly negative to penalize complexity).
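The incremental-reward bookkeeping might look like the following sketch (the acceptance rule and penalty constant are assumptions for illustration):

```python
class IncrementalICReward:
    """Sketch of reward shaping: reward = pool IC after adding the alpha,
    minus pool IC before. Not AlphaGen's actual pool interface."""

    def __init__(self, pool_ic=0.0, penalty=0.001):
        self.pool_ic = pool_ic    # current IC of the alpha pool
        self.penalty = penalty    # small complexity penalty for useless alphas

    def reward(self, new_pool_ic):
        gain = new_pool_ic - self.pool_ic
        if gain > 0:
            self.pool_ic = new_pool_ic  # alpha accepted; the pool improves
            return gain
        return -self.penalty            # no improvement: zero-ish reward
```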