video game                                   | RL
world: the space the player moves through    | environment: the system the model interacts with
player: decides what to do each frame        | agent: the LLM being trained
moves: jump, shoot, move left                | actions: tokens; tool calls like edit_file()
score: points the game assigns               | reward: a number the environment returns
one playthrough: start to game over          | episode / rollout: start to final reward
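
The RL side of this mapping can be sketched as a short loop. This is a toy illustration, not a real training setup: `ToyEnvironment`, `toy_agent`, and `run_episode` are hypothetical names, the agent is a hard-coded function standing in for the LLM, and the reward rule (1.0 if `edit_file()` was ever called, paid only at the end) is an assumption made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical environment: the episode ends after `max_steps` actions."""
    max_steps: int = 3
    steps_taken: int = 0
    history: list = field(default_factory=list)

    def reset(self) -> str:
        # Start of a new episode: clear state and return the first observation.
        self.steps_taken = 0
        self.history = []
        return "start"

    def step(self, action: str):
        # Record the action, advance one step, and check whether the episode is over.
        self.history.append(action)
        self.steps_taken += 1
        done = self.steps_taken >= self.max_steps
        # Assumed reward rule: nothing until the end, then 1.0 if the agent
        # ever called edit_file(), else 0.0.
        reward = 0.0
        if done and "edit_file()" in self.history:
            reward = 1.0
        observation = f"step {self.steps_taken}"
        return observation, reward, done


def toy_agent(observation: str) -> str:
    """Stand-in for the LLM: maps an observation to an action."""
    return "edit_file()" if observation == "start" else "some tokens"


def run_episode(env, agent):
    """One episode / rollout: from reset to the final reward."""
    observation = env.reset()
    done = False
    final_reward = 0.0
    actions = []
    while not done:
        action = agent(observation)                    # agent picks an action
        observation, reward, done = env.step(action)   # environment responds
        actions.append(action)
        final_reward = reward                          # last value is the final reward
    return actions, final_reward


actions, final_reward = run_episode(ToyEnvironment(), toy_agent)
print(actions, final_reward)
```

Each pass through the `while` loop is one "move" in the game analogy, and the value returned when `done` becomes true plays the role of the score at game over.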