What is Natural Language Generation
NLG refers to the setting where a model produces text on the output side.
Language models and conditional language models are the cornerstones of NLG: a language model gives a probability distribution over the next word given the words so far, while a conditional language model conditions not just on the previous words but on other input signals as well (e.g., a source sentence in translation or a document in summarization).
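In symbols, this is the standard autoregressive factorization, with the conditional variant adding an extra input $x$:

$$ P(y) = \prod_{t=1}^{T} P(y_t \mid y_{<t}) \quad \text{vs.} \quad P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, x) $$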
Decoding
Greedy decoding
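Greedy decoding takes the argmax token at every step and feeds it back in. A minimal sketch, where `next_token_logits`, `bos_id`, and `eos_id` are hypothetical stand-ins for a real model's interface:

```python
import numpy as np

def greedy_decode(next_token_logits, bos_id, eos_id, max_len=50):
    """Greedy decoding: at each step keep only the single most probable token.

    `next_token_logits` is a hypothetical model interface that maps the
    current prefix (a list of token ids) to a logit vector over the vocab.
    """
    prefix = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(prefix)
        next_id = int(np.argmax(logits))  # argmax: no randomness, no backtracking
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix
```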
Beam search
- small k: behaves almost like greedy decoding, with little ability to recover from early mistakes, and generally yields poor output
- large k: explores more hypotheses but is expensive to track, and a very large beam can also degrade quality (outputs become generic and less useful); a minimal sketch follows this list
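A hedged beam search sketch over the same hypothetical model interface as in the greedy sketch (here returning log-probabilities); real implementations add length normalization and batching:

```python
import numpy as np

def beam_search(next_token_log_probs, bos_id, eos_id, k=4, max_len=50):
    """Beam search sketch: keep the k highest-scoring partial hypotheses."""
    beams = [([bos_id], 0.0)]  # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            log_probs = next_token_log_probs(prefix)
            # Expand each beam with its k best continuations.
            for token in np.argsort(log_probs)[-k:]:
                candidates.append((prefix + [int(token)], score + log_probs[token]))
        # Prune back down to the k best hypotheses overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:k]:
            if prefix[-1] == eos_id:
                finished.append((prefix, score))  # hypothesis is complete
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```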
Sampling
- Pure/naive sampling: sample directly from the model's distribution instead of taking the argmax as in greedy decoding
- Top-k sampling: restrict sampling to the k most probable tokens, renormalize, and sample from those (see the sketch after this list)
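A minimal top-k sampling sketch in NumPy; note that k = 1 recovers greedy decoding and k = |V| recovers pure sampling:

```python
import numpy as np

def top_k_sample(logits, k=10, rng=None):
    """Top-k sampling: keep only the k largest logits, renormalize, sample."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]                    # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())  # softmax over the top k only
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))
```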
Temperature in Softmax
Temperature changes the decoding probability distribution; it is not a decoding algorithm itself.
$$ P_t(w) = \frac{\exp(s_w)}{\sum_{w'} \exp(s_{w'})} \rightarrow \frac{\exp(s_w/\tau)}{\sum_{w'} \exp(s_{w'}/\tau)} $$
- Higher temperature ($\tau > 1$): the distribution is flattened toward uniform, so token probabilities move closer together and the output is more diverse;
- Lower temperature ($\tau < 1$): the distribution becomes more peaked and the output less diverse (see the demo below);
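A quick NumPy demo of the formula above; the scores are made-up numbers chosen to show the flattening/sharpening effect:

```python
import numpy as np

def softmax_with_temperature(scores, tau):
    """Divide scores by tau before the softmax (see the formula above)."""
    scaled = np.asarray(scores, dtype=float) / tau
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

scores = [3.0, 1.0, 0.2]
print(softmax_with_temperature(scores, 1.0))  # baseline distribution
print(softmax_with_temperature(scores, 5.0))  # high tau: flatter, more diverse samples
print(softmax_with_temperature(scores, 0.5))  # low tau: spikier, close to argmax
```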
Tasks and approaches
Summarization
- single-document summarization
- multi-document summarization
Or
- extractive summarization
- abstractive summarization
Neural summarization
- Pointer-generator / copy mechanism: mix generating from the vocabulary with copying words directly from the source (see the mixture formula below)
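As a reference point, the pointer-generator network of See et al. (2017) combines the two modes with a learned switch $p_{gen}$, where $a_i$ is the attention weight on source position $i$:

$$ P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i : w_i = w} a_i $$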
Bottom-up summarization:
- content selection: tag each source word as include-in-the-summary or not
- generation: generate while only copying from the selected words (see the sketch after this list)
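A rough sketch of how selection can constrain generation, assuming a hypothetical per-token selector output `select_probs` and a copy/attention distribution `copy_attn`; this mirrors the masking idea of bottom-up summarization (Gehrmann et al., 2018) rather than reproducing any specific implementation:

```python
import numpy as np

def mask_copy_distribution(copy_attn, select_probs, threshold=0.5):
    """Content selection as a hard mask: the generator may only copy
    from source tokens the selector tagged as include."""
    mask = (np.asarray(select_probs) >= threshold).astype(float)
    masked = np.asarray(copy_attn) * mask            # zero out unselected tokens
    total = masked.sum()
    return masked / total if total > 0 else copy_attn  # renormalize
```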
Dialogue
- Task-oriented
- Social dialogue
Vanilla RNN seq2seq models do not work well for this task because of:
- genericness: change the sampling strategy or the generation process (e.g., add a retrieval step)
- irrelevant responses: use a mutual-information objective to penalize generic responses (see the objective after this list)
- repetition: block generating the same n-grams; use a coverage mechanism
- lack of context
- lack of consistency, persona
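For the mutual-information fix mentioned above, one concrete form is the MMI objective of Li et al. (2016), which subtracts a language-model penalty so that generic, high-probability-everywhere responses ("I don't know") score lower:

$$ \hat{T} = \arg\max_{T} \; \log P(T \mid S) - \lambda \log P(T) $$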
Storytelling
Generating a story from an image or a writing prompt.
Evaluation
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) focuses more on recall (did the output capture the reference's information?) than on precision, which BLEU emphasizes. A higher ROUGE score does not guarantee a better summary; a toy computation follows.
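As a minimal illustration of the recall orientation, ROUGE-N recall counts how many reference n-grams appear in the system output (a simplified sketch, ignoring stemming and multi-reference handling):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / total n-grams in the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())        # clipped n-gram overlap
    return overlap / max(sum(ref.values()), 1)

print(rouge_n_recall("the cat sat on the mat", "the cat is on the mat"))  # 5/6 ≈ 0.83
```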
Perplexity only tells you how strong your language model is, not how good its generations are.
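For reference, perplexity is the exponentiated average negative log-likelihood on held-out text, so it measures prediction quality rather than generation quality:

$$ PPL = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{<i}) \right) $$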
Aspect-based automatic metrics
- Fluency
- Style
- Diversity
- Relevance
Human evaluations aren't perfect either: they are slow, expensive, and annotators can be inconsistent.
Trends and the future
- incorporating discrete latent variables
- non-autoregressive generation
- better objectives
- use constraints in open-ended generation tasks
- aim for specific targets for both the model and the evaluation
- automatic metrics can help with such targeted evaluation
- reproducibility