5. Text Generation

#article #Public

Metadata

Page Notes

Highlights

  • This is a common problem with greedy search algorithms, which can fail to give you the optimal solution; in the context of decoding, they can miss word sequences whose overall probability is higher just because high-probability words happen to be preceded by low-probability ones. — Updated on 2022-07-06 20:55:54 — Group: #Public

  • Although greedy search decoding is rarely used for text generation tasks that require diversity, it can be useful for producing short sequences like arithmetic where a deterministic and factually correct output is preferred. — Updated on 2022-07-06 20:56:34 — Group: #Public

  • By tuning T we can control the shape of the probability distribution.5 When T≪1, the distribution becomes peaked around the origin and the rare tokens are suppressed. On the other hand, when T≫1, the distribution flattens out and each token becomes equally likely. — Updated on 2022-07-06 21:03:14 — Group: #Public

  • The main lesson we can draw from temperature is that it allows us to control the quality of the samples, but there’s always a trade-off between coherence (low temperature) and diversity (high temperature) that one has to tune to the use case at hand. — Updated on 2022-07-06 21:04:11 — Group: #Public

  • The idea behind top-k sampling is to avoid the low-probability choices by only sampling from the k tokens with the highest probability. — Updated on 2022-07-06 21:06:55 — Group: #Public

  • With nucleus or top-p sampling, instead of choosing a fixed cutoff value, we set a condition of when to cut off. This condition is when a certain probability mass in the selection is reached. — Updated on 2022-07-06 21:07:37 — Group: #Public

  • Unfortunately, there is no universally “best” decoding method. Which approach is best will depend on the nature of the task you are generating text for. If you want your model to perform a precise task like arithmetic or providing an answer to a specific question, then you should lower the temperature or use deterministic methods like greedy search in combination with beam search to guarantee getting the most likely answer. If you want the model to generate longer texts and even be a bit creative, then you should switch to sampling methods and increase the temperature or use a mix of top-k and nucleus sampling. — Updated on 2022-07-06 21:12:17 — Group: #Public