202205231928 Cost Equivalent Curve
202205231931 Transfer Learning Approaches
Directly modeling the commonsense knowledge graph does not help the model; maybe this calls for something like knowledge-augmented language modeling.
Page 1
First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets.
Second, we propose a novel evaluation, the cost equivalent curve, that sheds new insight on how the choice of source datasets, pretrained language models, and transfer learning methods impacts performance and data efficiency.
Page 2
We define cost as the number of training examples in the target dataset.
The construction of cost equivalent curves makes one technical assumption: the relationship between performance and cost is continuous and strictly monotonic (i.e., increasing or decreasing).
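A minimal sketch of how such a curve can be computed under that assumption, using hypothetical accuracy numbers (not values from the paper): invert the baseline performance-vs-cost curve by interpolation, so that for each amount of target data given to the transfer method we get the number of baseline examples needed to reach the same score.

```python
import numpy as np

def cost_equivalent_curve(costs, baseline_perf, transfer_perf):
    """Map each cost (target training-set size) under the transfer method to
    the baseline cost that reaches the same performance.

    Relies on the paper's assumption that performance is continuous and
    strictly monotonic in cost, so the baseline curve can be inverted.
    """
    # np.interp inverts the baseline curve: performance -> equivalent cost.
    # Transfer scores above the baseline's best clamp to the largest cost.
    return np.interp(transfer_perf, baseline_perf, costs)

# Hypothetical accuracies on a target dataset at increasing training sizes.
costs = np.array([100, 1_000, 5_000, 10_000, 20_000])
baseline_perf = np.array([0.55, 0.65, 0.72, 0.76, 0.79])  # single-task baseline
transfer_perf = np.array([0.63, 0.71, 0.76, 0.79, 0.81])  # e.g. sequential training

# Each entry: baseline examples needed to match the transfer run at that cost.
print(cost_equivalent_curve(costs, baseline_perf, transfer_perf))
```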
Page 3
multitask training (Caruana 1995): training on multiple datasets (including the target dataset) all at once,
sequential training (Pratt, Mostow, and Kamm 1991): first training on multiple datasets (excluding the target dataset) through multitask training, and then continuing to train on the target dataset alone,
multitask fine-tuning (Liu et al. 2019a): first training on all datasets (including the target dataset) through multitask training, and then continuing to fine-tune on the target dataset alone. (All three regimes are sketched below.)
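A schematic sketch of the three regimes; `train` is a hypothetical placeholder for a full training run on a mixture of datasets, and CommonsenseQA is just an example target.

```python
RAINBOW = ["aNLI", "CosmosQA", "HellaSWAG", "PIQA", "SocialIQa", "WinoGrande"]

def train(model, datasets):
    """Placeholder: a real run would update the model's weights on this mixture."""
    print("training on:", ", ".join(datasets))
    return model

def multitask(model, sources, target):
    # One mixed run over the source datasets *and* the target.
    return train(model, sources + [target])

def sequential(model, sources, target):
    # Multitask over the sources only, then continue on the target alone.
    model = train(model, sources)
    return train(model, [target])

def multitask_finetune(model, sources, target):
    # Multitask over sources + target, then fine-tune on the target alone.
    model = train(model, sources + [target])
    return train(model, [target])

sequential(model="t5", sources=RAINBOW, target="CommonsenseQA")
```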
Finding 1: Sequential training almost always matches or beats other approaches.
Page 4
Finding 2: Sequential training rarely hurts performance.
Multitask training helps most often in the low-data regime.
Multitask learning tends to help when data is scarce, but may hurt performance if data is plentiful.
Page 5
The off-the-shelf T5’s weights come from multitask pretraining, where many tasks are mixed with a language modeling objective to learn a powerful initialization for the weights. In fact, both GLUE and SUPERGLUE were mixed into the pretraining (Raffel et al. 2019).
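A toy illustration of that mixing idea, assuming simplified task prefixes and uniform sampling rather than T5's exact recipe: every task, including the unsupervised span-corruption objective, is cast to text-to-text, and each batch is drawn from the mixture.

```python
import random

# Tiny hypothetical mixture: one span-corruption example plus one example
# each from a GLUE (CoLA) and a SuperGLUE (BoolQ) task, all as text-to-text.
mixture = {
    "span_corruption": [("Thank you <X> me to your party <Y> week.",
                         "<X> for inviting <Y> last <Z>")],
    "cola": [("cola sentence: The book was by John written.", "unacceptable")],
    "boolq": [("boolq passage: ... question: ...", "False")],
}

def sample_batch(mixture, batch_size=4, seed=0):
    """Draw a mixed batch of (input, target) text pairs across tasks."""
    rng = random.Random(seed)
    tasks = list(mixture)
    return [rng.choice(mixture[rng.choice(tasks)]) for _ in range(batch_size)]

for inp, tgt in sample_batch(mixture):
    print(inp, "->", tgt)
```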
Larger models benefit more from transfer.
Page 6
The serialized language from the knowledge graphs is not in a QA format, and the knowledge graph completion task is generative while all other tasks are discriminative.
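A hypothetical side-by-side of the two example formats to make that mismatch concrete (illustrative templates, not the paper's actual serialization):

```python
# Knowledge graph completion: generative -- the model must produce the tail
# of a (head, relation, tail) triple as free text.
kg_example = {
    "input": "PersonX pays PersonY a compliment <xReact>",
    "target": "happy",  # open-ended generation
}

# Multiple-choice QA (the RAINBOW-style tasks): discriminative -- the model
# only has to pick one of the given candidates.
qa_example = {
    "input": ("question: Why did PersonX pay PersonY a compliment? "
              "choices: 1) to be kind 2) to be cruel"),
    "target": "1",  # selected from a fixed candidate set
}

print(kg_example)
print(qa_example)
```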