202109261533 Adaptive Computation

#ml #optimization #computation

Adaptive Computation

Adaptive computation is the idea that a model can perform conditional computation, varying the work it does based on the input. It is a form of 202109261532 Machine Learning Optimization. See 1, 2 and 3.

Models spend more computation on difficult problems and less on easy ones, or process different inputs through different computational paths.

However, adaptive computation is known for unstable training and sensitivity to hyper-parameters (e.g. \(\tau\) in ACT), especially those controlling the trade-off between computation and accuracy. 1
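The halting mechanism and the computation/accuracy trade-off can be sketched in a few lines. This is a minimal illustration, not the actual ACT implementation: the per-step halting probabilities are assumed to come from some hypothetical model head, and `act_halting`, `total_loss`, `eps`, and `tau` are illustrative names.

```python
# Sketch of ACT-style halting: keep computing until the cumulative
# halting probability reaches 1 - eps, then charge a "ponder cost"
# (steps taken plus the leftover probability mass) in the loss,
# weighted by tau. All names here are illustrative, not from ACT code.

def act_halting(halt_probs, eps=0.01, max_steps=10):
    """Return (n_steps, remainder, ponder_cost) for one input."""
    cumulative = 0.0
    for n, p in enumerate(halt_probs[:max_steps], start=1):
        if cumulative + p >= 1.0 - eps or n == max_steps:
            remainder = 1.0 - cumulative  # mass assigned to the final step
            return n, remainder, n + remainder
        cumulative += p

def total_loss(task_loss, ponder_cost, tau=0.01):
    # tau controls the computation/accuracy trade-off:
    # larger tau pushes the model to halt earlier.
    return task_loss + tau * ponder_cost

# An "easy" input halts after one step; a "hard" one ponders longer.
easy = act_halting([0.995, 0.005])
hard = act_halting([0.2, 0.2, 0.2, 0.2, 0.3])
```

Because the ponder cost enters the loss with weight \(\tau\), small changes to \(\tau\) shift how many steps the model learns to take, which is one source of the hyper-parameter sensitivity noted above.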

Footnotes
1. [[Paper PonderNet]]
3. [[Paper EarlyBERT]]
Links to this page
  • 202109261818 Mixture of Experts

    Mixture of Experts (MoE) is a special case of 202109261533 Adaptive Computation where multiple experts or learners solve a problem by dividing the problem space into regions. In neural networks, a MoE model can use a gating/routing network to decide which expert/sub-model to use. However, it has limitations such as training instability, complexity, and communication cost.