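[Figure: agent-environment loop. The agent (an LLM) sends an action to the environment; the environment gives rewards and returns a reward + obs (observation) to the agent.]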
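
The loop these labels describe (an LLM-backed agent sends an action, and the environment answers with a reward and an observation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `GuessEnvironment`, `StubLLMAgent`, `StepResult`, and `run_episode` are hypothetical names that do not come from the source, and the "LLM" is replaced by a simple rule so the example runs without any model.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """What the environment sends back after each action: reward + obs."""
    reward: float
    obs: str
    done: bool


class GuessEnvironment:
    """Hypothetical toy environment: gives a reward when the hidden number is guessed."""

    def __init__(self, target: int = 3, max_steps: int = 10):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> str:
        # Start a new episode and return the first observation.
        self.steps = 0
        return "guess a number"

    def step(self, action: int) -> StepResult:
        # Score the action and return the reward plus the next observation.
        self.steps += 1
        if action == self.target:
            return StepResult(reward=1.0, obs="correct", done=True)
        hint = "higher" if action < self.target else "lower"
        return StepResult(reward=0.0, obs=hint, done=self.steps >= self.max_steps)


class StubLLMAgent:
    """Stand-in for the LLM policy (assumption: a real agent would prompt a model here)."""

    def __init__(self):
        self.guess = 0

    def act(self, obs: str) -> int:
        # Map the latest observation to the next action.
        if obs == "higher":
            self.guess += 1
        elif obs == "lower":
            self.guess -= 1
        return self.guess


def run_episode(agent, env) -> float:
    """Run one agent-environment loop and return the total reward collected."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(obs)      # agent -> environment: action
        result = env.step(action)    # environment -> agent: reward + obs
        total_reward += result.reward
        obs, done = result.obs, result.done
    return total_reward


if __name__ == "__main__":
    print(run_episode(StubLLMAgent(), GuessEnvironment(target=3)))
```

Keeping the agent behind a single `act(obs)` method means the rule-based stub could later be swapped for a real model call without touching the loop itself.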