## Environments
We provide 8 blazingly fast goal-conditioned environments based on MJX and BRAX, jitted end-to-end for quick experimentation with goal-conditioned self-supervised reinforcement learning.
Environment | Env name | Code |
---|---|---|
Reacher | `reacher` | link |
Half Cheetah | `cheetah` | link |
Pusher | `pusher_easy` `pusher_hard` | link |
Ant | `ant` | link |
Ant Maze | `ant_u_maze` `ant_big_maze` `ant_hardest_maze` | link |
Ant Soccer | `ant_ball` | link |
Ant Push | `ant_push` | link |
Humanoid | `humanoid` | link |
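For instance, a batched, jitted rollout might look like the sketch below. It assumes the `create_env` helper from `utils.py` (described in the next section) and uses a zero-action placeholder policy; the exact `create_env` signature may differ.

```python
import jax
import jax.numpy as jnp

from utils import create_env  # helper from this repo; see "Adding new environments"

env = create_env("ant")  # any env name from the table above
num_envs = 4096

# vmap over a batch of environments, then JIT the whole rollout step.
reset_fn = jax.jit(jax.vmap(env.reset))
step_fn = jax.jit(jax.vmap(env.step))

state = reset_fn(jax.random.split(jax.random.PRNGKey(0), num_envs))
for _ in range(100):
    actions = jnp.zeros((num_envs, env.action_size))  # placeholder policy
    state = step_fn(state, actions)
```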
### Adding new environments
Each environment implementation has 2 main parts: an XML file and a Python file.
The XML file describes the geometries, placement, properties, and motion of objects in the environment. Depending on the Brax pipeline used, the XML file may vary slightly, but it should generally follow the MuJoCo XML reference. Since all environments are vectorized and compiled with JAX, the information in the MJX guide should also be taken into account, particularly the feature parity and performance tuning sections.
> **XML files:** In our experience, XML files that work with standard MuJoCo require some tuning for MJX. In particular, the number of solver iterations should be carefully adjusted so that the environment is fast but still stable.
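As a rough illustration, solver settings can be tuned either directly in the XML (the `<option>` element's `iterations` and `ls_iterations` attributes) or programmatically. The sketch below uses the `mujoco`/`mjx` Python API with a placeholder XML path; the specific values are examples to tune per environment, not recommendations.

```python
import mujoco
from mujoco import mjx

mj_model = mujoco.MjModel.from_xml_path("my_env.xml")  # placeholder path

# Fewer solver iterations make MJX much faster, but too few can make the
# simulation unstable -- tune these values per environment.
mj_model.opt.iterations = 4      # Newton solver iterations
mj_model.opt.ls_iterations = 8   # line-search iterations per solver step

mjx_model = mjx.put_model(mj_model)  # device-resident model for MJX
```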
The Python file contains the logic of the environment: how the environment is initialized, reset, and how a single environment step is performed. The class describing the environment should inherit from Brax's `PipelineEnv` class. All environment logic must be JIT-compilable with JAX, which requires some care when using Python constructs like `if` and `for`. The observation returned by the environment's `step` function should be the state of the environment concatenated with the current environment goal. Each environment class should also provide 2 additional properties (see the sketch after this list):
* `self.state_dim` - The size of the state of the environment (that is, the observation without the goal).
* `self.goal_indices` - An array of state indices that make up the goal. For example, in the `Ant` environment the goal is specified as the x and y coordinates of the torso, so we set `self.goal_indices = jnp.array([0, 1])`, since the x and y coordinates of the torso are at positions 0 and 1 in the state of the environment.
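To make this concrete, here is a minimal, hypothetical environment sketch. The class name, XML path, goal sampling, and reward are all invented for illustration; only the `PipelineEnv` inheritance, the `state_dim`/`goal_indices` properties, and the observation layout follow the conventions described above.

```python
import jax
import jax.numpy as jnp
from brax.envs.base import PipelineEnv, State
from brax.io import mjcf


class PointReacher(PipelineEnv):
    """Hypothetical goal-conditioned environment: reach a random 2D position."""

    def __init__(self, xml_path="point_reacher.xml", **kwargs):  # placeholder path
        sys = mjcf.load(xml_path)
        super().__init__(sys=sys, backend="mjx", **kwargs)

    @property
    def state_dim(self) -> int:
        # Size of the observation without the goal part.
        return self.sys.q_size() + self.sys.qd_size()

    @property
    def goal_indices(self) -> jnp.ndarray:
        # The goal is the x and y coordinates, stored at positions 0 and 1.
        return jnp.array([0, 1])

    def reset(self, rng: jax.Array) -> State:
        rng, goal_rng = jax.random.split(rng)
        pipeline_state = self.pipeline_init(
            self.sys.init_q, jnp.zeros(self.sys.qd_size())
        )
        goal = jax.random.uniform(goal_rng, (2,), minval=-1.0, maxval=1.0)
        obs = self._get_obs(pipeline_state, goal)
        return State(
            pipeline_state=pipeline_state,
            obs=obs,
            reward=jnp.zeros(()),
            done=jnp.zeros(()),
            metrics={},
            info={"goal": goal},
        )

    def step(self, state: State, action: jax.Array) -> State:
        pipeline_state = self.pipeline_step(state.pipeline_state, action)
        goal = state.info["goal"]
        obs = self._get_obs(pipeline_state, goal)
        # Sparse goal-conditioned reward: 1 when the goal coordinates are reached.
        dist = jnp.linalg.norm(obs[self.goal_indices] - goal)
        reward = jnp.where(dist < 0.1, 1.0, 0.0)
        return state.replace(pipeline_state=pipeline_state, obs=obs, reward=reward)

    def _get_obs(self, pipeline_state, goal: jnp.ndarray) -> jnp.ndarray:
        # Observation = environment state concatenated with the current goal.
        env_state = jnp.concatenate([pipeline_state.q, pipeline_state.qd])
        return jnp.concatenate([env_state, goal])
```

Note that `reset` stores the sampled goal in `state.info`, so `step` can rebuild the observation without resampling; any equivalent scheme that keeps the goal in the returned observation works.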
To use the new environment, it should be added to the `create_env` function in `utils.py`.
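Registration might look like the following sketch (the actual `create_env` signature and dispatch logic in `utils.py` may differ); afterwards, the environment can be constructed and jitted like any other:

```python
import jax
import jax.numpy as jnp

# In utils.py -- hypothetical sketch of the registration branch:
def create_env(env_name: str, **kwargs):
    if env_name == "point_reacher":  # name invented for the sketch above
        return PointReacher(**kwargs)
    raise ValueError(f"Unknown environment: {env_name}")


# The new environment then works with the standard jitted loop:
env = create_env("point_reacher")
state = jax.jit(env.reset)(jax.random.PRNGKey(0))
state = jax.jit(env.step)(state, jnp.zeros(env.action_size))
```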