environment overview

environment Mar 19, 2026 1 min read

an environment defines how your model interacts with a task during rl training.

you configure three things:

  • tools the model can call
  • rewards used to score behavior
  • dataset preprocessing and splitting

core interface

every environment extends BaseEnv and implements these methods:

from benchmax.envs.base_env import BaseEnv, ToolDefinition, StandardizedExample

class MyEnv(BaseEnv):
    async def list_tools(self) -> list[ToolDefinition]:
        ...

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args):
        ...

    async def compute_reward(self, rollout_id: str, completion: list[dict[str, Any]], ground_truth, **kwargs) -> dict[str, float]:
        ...

    @classmethod
    def dataset_preprocess(cls, example, **kwargs) -> StandardizedExample:
        ...

go deeper