open questions

can better in-context reasoning close the gap with weight updates?

the question

training a model to reason more effectively over retrieved information might deliver most of the benefits of weight-level learning — without touching weights at all.

the sticking point

early results are promising. but compositionality — recombining knowledge across domains — may require weight-level representations that context retrieval can't replicate.

active debate

what should a model learn, and what should it resist overwriting?

the question

if a library's API changes, you want the model to update. if someone presents arguments against well-established physics, you don't. the problem is hard even to formalize precisely.

the sticking point

you need a principled way to distinguish "my knowledge is outdated" from "I'm being manipulated into overwriting something correct." no one has solved this.

unsolved

why are weight updates so much less data-efficient than context inserts?

the question

inserting a skill into context is enough for a model to use it. baking that same skill into weights typically requires a bootstrapped synthetic dataset — often large.
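the asymmetry above can be made concrete with a minimal sketch. everything here is illustrative: `model_generate` is a hypothetical stub standing in for a real model call, and the function names are my own. the point is the shape of the two paths, not the content — one prompt insertion at inference time versus many sampled demonstrations to fine-tune on.

```python
# Hedged sketch of the context-vs-weights asymmetry. `model_generate`
# is a placeholder for a real model call (assumption, not a real API).

def model_generate(prompt: str) -> str:
    # stand-in for an actual model; returns a fake demonstration
    return f"<demonstration for: {prompt.splitlines()[-1]}>"

def use_skill_in_context(skill_doc: str, task: str) -> str:
    # in-context path: inserting the skill once is enough at inference time
    return model_generate(f"{skill_doc}\n\nTask: {task}")

def bootstrap_synthetic_dataset(skill_doc: str, seed_tasks: list,
                                n_per_task: int = 32) -> list:
    # weight-level path: sample many demonstrations WITH the skill in
    # context, then fine-tune on them WITHOUT it, hoping the weights
    # absorb the skill. dataset size grows with tasks x samples.
    dataset = []
    for task in seed_tasks:
        for _ in range(n_per_task):
            completion = use_skill_in_context(skill_doc, task)
            dataset.append({"prompt": task, "completion": completion})
    return dataset
```

the questions in the next card — quality, diversity, coverage — all live inside that sampling loop.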

the sticking point

questions of synthetic data quality, diversity, and behavioral coverage all follow from there, and they compound. there's no principled answer yet.

partially understood

how do you generate a learning signal when there's no clean verifier?

the question

reward functions work well for math and code because ground-truth answers exist. most real-world tasks don't have clean verifiers. if the agent must generate its own reward signal from environmental feedback, that's a learning problem in its own right.
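what "clean verifier" means here can be sketched in a few lines. this is a minimal illustration, assuming exact-match grading for math answers and callable unit tests for code; both function names and the test format are my own, not from any particular RL framework.

```python
# Hedged sketch of verifier-style rewards for the two "easy" domains.

def math_reward(model_answer: str, ground_truth: str) -> float:
    # exact-match verifier: a clean, binary signal
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def code_reward(candidate_src: str, tests: list) -> float:
    # run the candidate, then score it by the fraction of unit tests
    # it passes; each test is a callable that raises on failure
    namespace = {}
    try:
        exec(candidate_src, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            test(namespace)
            passed += 1
        except Exception:
            pass
    return passed / len(tests)
```

nothing like this exists for "write a good design doc" or "negotiate a refund" — the verifier itself would have to be learned, which is exactly the nesting the next card describes.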

the sticking point

the reward-learning problem is nested inside the continual learning problem. solving both simultaneously is underexplored — and probably important for any agentic continual learning system.

underexplored