[Figure: task accuracy under standard vs. sparse fine-tuning. Legend: before training on task B.]
Standard fine-tuning: task A (original) 88%, task B (new) 0%; task A: −50pp degradation.
Sparse fine-tuning: task A (original) 88%, task B (new) 0%; task A: −6pp (89% less forgetting).