TL;DR
Stop worrying about algorithms; just change the network architecture to SimBa.
Overview. SimBa infuses simplicity bias through architectural changes, without modifying the underlying deep RL algorithm.
(a) SimBa enhances sample efficiency: sample efficiency improves across various RL algorithms, including off-policy model-free (SAC), off-policy model-based (TD-MPC2), on-policy model-free (PPO), and unsupervised (METRA) RL methods. (b) Off-policy RL benchmark: when applied to SAC, SimBa matches or surpasses state-of-the-art off-policy RL methods with minimal computational overhead across 51 continuous control tasks, by only modifying the network architecture and scaling up the number of network parameters.
Abstract
We introduce SimBa, an architecture designed to inject simplicity bias for scaling up the parameters in deep RL. SimBa consists of three components: (i) standardizing input observations with running statistics, (ii) incorporating residual feedforward blocks to provide a linear pathway from the input to the output, and (iii) applying layer normalization to control feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms—including off-policy, on-policy, and unsupervised methods—is consistently improved. Moreover, when SimBa is integrated into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across 51 tasks from DMC, MyoSuite, and HumanoidBench, solely by modifying the network architecture. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.
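As a concrete illustration of component (i), here is a minimal PyTorch sketch of running-statistics observation normalization. The class name, the parallel mean/variance update, and the epsilon value are illustrative assumptions, not the authors' official implementation.

```python
# Minimal sketch of running-statistics observation normalization.
# Names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class RunningObsNorm(nn.Module):
    def __init__(self, obs_dim: int, eps: float = 1e-8):
        super().__init__()
        # Buffers so the statistics follow the module across devices/checkpoints.
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    @torch.no_grad()
    def update(self, obs: torch.Tensor) -> None:
        # obs: (batch, obs_dim); parallel (Chan et al.) mean/variance update.
        batch_mean = obs.mean(dim=0)
        batch_var = obs.var(dim=0, unbiased=False)
        batch_count = obs.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        new_var = (m_a + m_b + delta.pow(2) * self.count * batch_count / total) / total
        self.mean.copy_(new_mean)
        self.var.copy_(new_var)
        self.count.copy_(total)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Standardize observations with the current running statistics.
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)
```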
SimBa Architecture
SimBa comprises three components: Running Statistics Normalization, Residual Feedforward Blocks, and Post-Layer Normalization. These components lower the network's functional complexity, enhancing generalization for highly overparameterized configurations.
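Below is a minimal PyTorch sketch of how these pieces might fit together: a linear embedding, residual feedforward blocks (each applying layer normalization and an inverted-bottleneck MLP on the residual branch, so a linear skip path runs from input to output), and a final post-layer normalization. The hidden width, block count, and 4x expansion are illustrative assumptions, not the official implementation.

```python
# Minimal sketch of a SimBa-style encoder; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualFFBlock(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.ReLU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps an identity (linear) pathway through the block.
        return x + self.mlp(self.norm(x))

class SimBaEncoder(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 512, num_blocks: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.blocks = nn.ModuleList(
            [ResidualFFBlock(hidden_dim) for _ in range(num_blocks)]
        )
        self.out_norm = nn.LayerNorm(hidden_dim)  # post-layer normalization

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs is assumed to be standardized with running statistics beforehand.
        x = self.embed(obs)
        for block in self.blocks:
            x = block(x)
        return self.out_norm(x)
```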
SimBa with Off-Policy RL
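Because SimBa only changes the network, using it with an off-policy learner such as SAC amounts to swapping the actor's (and critics') feature encoder while leaving the algorithm itself untouched. A hypothetical sketch of such an actor, where `encoder` could be a SimBa-style module like the one sketched above (all names are illustrative):

```python
# Hypothetical sketch: a SAC-style tanh-Gaussian actor that accepts any
# feature encoder (e.g. a SimBa-style encoder) as a drop-in module.
import torch
import torch.nn as nn

class TanhGaussianActor(nn.Module):
    def __init__(self, encoder: nn.Module, feature_dim: int, action_dim: int):
        super().__init__()
        self.encoder = encoder  # SimBa-style encoder plugs in here
        self.mu = nn.Linear(feature_dim, action_dim)
        self.log_std = nn.Linear(feature_dim, action_dim)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        mu = self.mu(h)
        log_std = self.log_std(h).clamp(-5.0, 2.0)
        dist = torch.distributions.Normal(mu, log_std.exp())
        pre_tanh = dist.rsample()          # reparameterized sample
        action = torch.tanh(pre_tanh)      # squash to [-1, 1]
        # Log-probability with the tanh change-of-variables correction.
        log_prob = dist.log_prob(pre_tanh).sum(-1)
        log_prob = log_prob - torch.log(1.0 - action.pow(2) + 1e-6).sum(-1)
        return action, log_prob
```

The critics would be built analogously, with the encoder output feeding a Q-value head.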
SimBa with On-Policy RL
(Video comparison: PPO vs. PPO + SimBa)
SimBa with Unsupervised RL
(Video comparison: METRA vs. METRA + SimBa)
Paper
SimBa: Simplicity Bias for Scaling Up Parameters in Deep RL
Hojoon Lee*, Dongyoon Hwang*, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
arXiv preprint
Citation
If you find our work useful, please consider citing the paper as follows: