[Figure: agent-environment loop. The agent (an LLM) sends an action to the environment, and the environment returns a reward and an observation.]
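The loop in the figure can be sketched in a few lines of Python. This is a minimal sketch, not the original authors' setup. `CountingEnv` is a hypothetical number-guessing environment, and `StubLLMAgent` is a deterministic stand-in for an LLM policy that maps the latest observation to a text action. The `reset`/`step` method names only loosely follow the Gym-style convention and are assumptions here. On each turn the agent emits an action, and the environment replies with an observation and a reward. The episode ends when the environment reports `done`.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str  # what the agent sees next
    reward: float     # scalar feedback for the last action
    done: bool        # whether the episode has ended


class CountingEnv:
    """Hypothetical toy environment: the agent must guess a hidden digit 0-9."""

    def __init__(self, target: int, max_steps: int = 5):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "Guess a number between 0 and 9."

    def step(self, action: str) -> StepResult:
        self.steps += 1
        guess = int(action)
        if guess == self.target:
            return StepResult("correct", 1.0, True)
        hint = "higher" if guess < self.target else "lower"
        # Zero reward for a wrong guess; stop once the step budget is used up.
        return StepResult(hint, 0.0, self.steps >= self.max_steps)


class StubLLMAgent:
    """Stand-in for an LLM: turns the latest observation into a text action.

    A real agent would prompt a language model here. This stub does a binary
    search over the hints so the example stays deterministic.
    """

    def __init__(self):
        self.lo, self.hi = 0, 9
        self.last_guess = None

    def act(self, observation: str) -> str:
        if observation == "higher":
            self.lo = self.last_guess + 1
        elif observation == "lower":
            self.hi = self.last_guess - 1
        self.last_guess = (self.lo + self.hi) // 2
        return str(self.last_guess)


def run_episode(env, agent):
    """Run one episode: action -> (reward, observation) until done."""
    obs = env.reset()
    total_reward = 0.0
    trajectory = []  # (observation, action, reward) triples
    done = False
    while not done:
        action = agent.act(obs)
        result = env.step(action)
        trajectory.append((obs, action, result.reward))
        total_reward += result.reward
        obs, done = result.observation, result.done
    return total_reward, trajectory


if __name__ == "__main__":
    total, traj = run_episode(CountingEnv(target=7), StubLLMAgent())
    for step in traj:
        print(step)
    print("total reward:", total)
```

Collecting the `(observation, action, reward)` triples is the part that matters for training. A learner could later use these trajectories to update the policy, for example with a policy-gradient method. That update step is outside the scope of this sketch.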