reward hacking
you specify
score = 1 if summary is short
you want
an accurate, useful summary
what the model finds
returns "." — shortest possible output
copies the first sentence verbatim
repeats the task description back
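A toy sketch of the mismatch above, in Python. The word threshold `MAX_WORDS`, the character budget `MAX_CHARS`, the sample article, and the names `reward` and `graded_reward` are all hypothetical, since the slide does not say how "short" is measured. Under the binary rule, every candidate earns the maximum score, including the three exploits. Under a graded "shorter is better" variant, the single "." strictly wins.

```python
# Hypothetical reading of "score = 1 if summary is short":
# "short" is assumed to mean at most MAX_WORDS whitespace-separated words.
MAX_WORDS = 25
MAX_CHARS = 200  # assumed budget for the graded variant below


def reward(summary: str) -> int:
    """Binary reward: 1 if the summary is short, else 0. Accuracy is never checked."""
    return 1 if len(summary.split()) <= MAX_WORDS else 0


def graded_reward(summary: str) -> float:
    """Assumed 'shorter is better' variant: fewer characters earn a higher score."""
    return max(0.0, 1.0 - len(summary) / MAX_CHARS)


# Made-up source article and task prompt, for illustration only.
document = (
    "The city council voted 7-2 on Tuesday to fund a new bike lane network. "
    "Construction begins in spring and is expected to take two years. "
    "Opponents argued the money should go to road repair instead."
)
task = "Summarize the following article."

candidates = {
    "degenerate": ".",                                   # shortest possible output
    "first_sentence": document.split(". ")[0] + ".",     # copied verbatim
    "echo_task": task,                                   # task repeated back
    "intended": "Council approved 7-2 funding for a two-year bike lane build "
                "starting in spring, over road-repair objections.",
}

for name, text in candidates.items():
    print(f"{name:15} binary={reward(text)} graded={graded_reward(text):.3f}")

# The candidate a reward maximizer would pick under the graded variant.
best = max(candidates, key=lambda k: graded_reward(candidates[k]))
print("graded optimum:", best)
```

Neither function reads the source article, so nothing in the signal separates the intended summary from the exploits; that distinction exists only in the designer's unstated intent.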