Simplifying BERT-based models to increase efficiency, capacity

#processed #article #Personal

Highlights

  • To make BERT-based models more efficient, we progressively eliminate redundant individual-word embeddings in intermediate layers of the network, while trying to minimize the effect on the complete-sentence embeddings. — Updated on 2022-07-02 15:55:45 — Group: #Personal (see the schedule sketch after these highlights)

  • The basic idea is that, in each of the network’s encoders, we preserve the embedding of the CLS token but select a representative subset — a core set — of the other tokens’ embeddings. — Updated on 2022-07-02 16:00:05 — Group: #Personal (see the selection sketch after these highlights)
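
The first highlight says embeddings are eliminated "progressively" across intermediate layers. As a rough illustration only: the sketch below computes how many token embeddings would survive each layer under a fixed per-layer retention factor. The 12-layer count matches BERT-base, but the `retain=0.9` factor and the `tokens_per_layer` helper are illustrative assumptions, not the article's actual schedule.

```python
# A minimal sketch of the "progressive elimination" idea, assuming a
# 12-layer encoder (BERT-base) and a fixed per-layer retention factor.
# retain=0.9 and the name tokens_per_layer are illustrative assumptions.
def tokens_per_layer(seq_len: int, num_layers: int = 12, retain: float = 0.9) -> list[int]:
    counts = []
    n = seq_len
    for _ in range(num_layers):
        # drop a fraction of the embeddings after each encoder,
        # but never fall below 1 (the CLS embedding is always kept)
        n = max(1, int(n * retain))
        counts.append(n)
    return counts

# e.g. a 128-token input shrinks layer by layer instead of carrying
# all 128 embeddings through every encoder:
print(tokens_per_layer(128))  # [115, 103, 92, 82, 73, 65, 58, 52, 46, 41, 36, 32]
```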
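
The second highlight describes the per-encoder mechanism: keep the CLS embedding, then pick a representative core set of the remaining token embeddings. The article does not specify how that subset is chosen, so the sketch below uses greedy farthest-point sampling as one plausible stand-in for a core-set criterion; `select_core_set`, the `keep` parameter, and the CLS-at-index-0 layout are all assumptions for illustration.

```python
# A minimal sketch of CLS-preserving core-set selection for one encoder's
# output, using greedy farthest-point sampling as a stand-in criterion.
import torch

def select_core_set(embeddings: torch.Tensor, keep: int) -> torch.Tensor:
    """Return sorted indices of a core set of token embeddings.

    embeddings: (seq_len, hidden) token embeddings from one encoder layer.
    keep: number of non-CLS embeddings to retain (CLS, assumed at
          index 0, is always preserved).
    """
    keep = min(keep, embeddings.size(0) - 1)
    selected = [0]  # always preserve the CLS embedding
    # distance from every token to its nearest already-selected token
    dists = torch.cdist(embeddings, embeddings[selected]).min(dim=1).values
    for _ in range(keep):
        nxt = int(dists.argmax())  # farthest embedding = least redundant
        selected.append(nxt)
        d_new = torch.cdist(embeddings, embeddings[nxt : nxt + 1]).squeeze(1)
        dists = torch.minimum(dists, d_new)  # fold the new pick into the minimum
    return torch.tensor(sorted(selected))

# Usage: prune one layer's output before feeding the next encoder.
layer_output = torch.randn(128, 768)          # (seq_len, hidden), toy values
idx = select_core_set(layer_output, keep=63)  # CLS + 63 representatives
pruned = layer_output[idx]                    # (64, 768) flows onward
```

Farthest-point sampling is just one way to make "representative" concrete: each pick maximizes distance to everything already kept, so near-duplicate embeddings are the first to be dropped, which matches the highlights' goal of removing redundancy while preserving the sentence-level signal carried by CLS.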