Multitask Learning
Single-task learning: train one model to do one job. Switching to another task means starting from scratch; partially pre-trained representations (Word2Vec, GloVe, or ELMo) help but are still not ideal. (This has been much less of a concern since the introduction of PLMs.)
Tasks need supervision, but the pre-training objective of general LMs did not match these task objectives (before 2020).
Unified multitask learning: one model to rule them all. Little or no overhead when dealing with new problems, and a potential step toward continual/lifelong learning.
decaNLP: The Natural Language Decathlon
- QA as the unified framework: every task is cast as Question + Context -> Answer (see the sketch after this list)
- Multitask training can hurt performance: catastrophic forgetting
- Multitask training helps zero-shot learning the most
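As an illustration of the unified format, here is a minimal sketch of casting examples from several tasks into (question, context, answer) triples; the `as_qa` helper and the exact question wordings are illustrative assumptions, not the paper's official prompts.

```python
# A minimal sketch: every task becomes a (question, context, answer) triple.
# Question wordings are illustrative, not decaNLP's exact prompts.

def as_qa(task, **fields):
    """Map a task-specific example to a (question, context, answer) triple."""
    if task == "squad":                 # span-extraction QA: already in the format
        return fields["question"], fields["passage"], fields["answer"]
    if task == "translation_en_de":     # machine translation
        return ("What is the translation from English to German?",
                fields["source"], fields["target"])
    if task == "summarization":         # abstractive summarization
        return ("What is the summary?", fields["document"], fields["summary"])
    if task == "sst":                   # sentiment classification
        return ("Is this review positive or negative?",
                fields["review"], fields["label"])
    raise ValueError(f"unknown task: {task}")

print(as_qa("sst", review="A moving and funny film.", label="positive"))
```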
Multitask Training Strategies
Issue for MT: the pointer-generator encourages the model to point back to (copy from) the input, but machine translation outputs are in a different language and rarely copy, so translation largely ignores that part of the model
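To make the issue concrete, here is a generic pointer-generator mixture (not MQAN's exact formulation): a gate p_copy blends a vocabulary softmax with a copy distribution over input positions. For translation, target tokens rarely appear in the source, so the copy branch receives little useful signal. Function and variable names below are assumptions.

```python
import numpy as np

def output_distribution(vocab_logits, copy_attention, input_ids, vocab_size, p_copy):
    """Blend generating and copying:
    p(w) = (1 - p_copy) * p_vocab(w) + p_copy * sum of attention over input positions where w occurs."""
    p_vocab = np.exp(vocab_logits - vocab_logits.max())
    p_vocab /= p_vocab.sum()                      # softmax over the vocabulary
    copy_dist = np.zeros(vocab_size)
    for pos, token_id in enumerate(input_ids):    # scatter attention mass onto input tokens
        copy_dist[token_id] += copy_attention[pos]
    return (1.0 - p_copy) * p_vocab + p_copy * copy_dist

# Toy example: vocabulary of 10 types, input of 4 tokens.
rng = np.random.default_rng(0)
dist = output_distribution(vocab_logits=rng.normal(size=10),
                           copy_attention=np.array([0.4, 0.3, 0.2, 0.1]),
                           input_ids=[2, 5, 5, 7],
                           vocab_size=10,
                           p_copy=0.5)
print(dist.sum())  # ~1.0; for MT the useful mass would come almost entirely from p_vocab
```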
- Anti-Curriculum Training: train difficult tasks first to escape bad local optima; difficulty is measured by how many epochs a single-task model needs to converge (sketched below)
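A minimal sketch of an anti-curriculum schedule under that difficulty measure; the task names, iteration counts, and `anti_curriculum_order` helper are made-up illustrations, not numbers from the paper.

```python
# Assumed per-task difficulty: iterations a single-task model needs to converge.
iterations_to_converge = {
    "squad": 30_000,
    "translation_en_de": 80_000,   # hardest: trained first
    "summarization": 60_000,
    "sst": 5_000,                  # easiest: added last
}

def anti_curriculum_order(difficulty):
    """Order tasks from hardest to easiest (anti-curriculum)."""
    return sorted(difficulty, key=difficulty.get, reverse=True)

schedule = anti_curriculum_order(iterations_to_converge)
print(schedule)
# One possible use of the schedule:
#   for task in schedule: pretrain on that task,
#   then switch to fully joint multitask training over all tasks.
```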
Transferability
- Multitask pre-training transfers better to new tasks than random initialization
- Zero-shot performance on unseen tasks also improves
- Compositional multitasking? e.g., producing a German summary of an English document (summarization + translation); illustrated below
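For a concrete picture, here is a hypothetical compositional request written in the same (question, context, answer) format; the question wording and fields are assumptions for illustration, not an example from the paper.

```python
# A hypothetical compositional example: the model must combine two skills
# (summarization and English-to-German translation) it was never trained on jointly.
example = {
    "question": "What is the summary in German?",
    "context": "The quick brown fox jumps over the lazy dog. ...",  # English document
    "answer": None,  # no supervised target exists for the composed task
}
print(example["question"])
```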