Semantic Representation Beyond Word Vectors
We want to represent the meaning of larger phrases, not just single words, i.e. to derive the meaning of a phrase as a semantic composition of its smaller elements (compositionality).
The Recursion from Small Parts to Bigger Parts
Language has a hierarchical structure that can theoretically expand to infinity (e.g. clauses nested within clauses).
The principle of compositionality: the meaning of a sentence is determined by 1. the meanings of its words and 2. the way they are combined.
The difference between recurrence and recursion
Recurrent: models the meaning of the input up to the current time step, moving left to right; Recursive: models each constituent bottom-up, which requires a tree / syntactic structure (see the sketch below).
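A minimal sketch of the distinction, assuming a shared toy `combine` function; the function and variable names are illustrative, not from the notes.

```python
import numpy as np

def combine(a, b, W, bias):
    # Toy composition: merge two child vectors into one parent vector.
    return np.tanh(W @ np.concatenate([a, b]) + bias)

def recurrent_meaning(word_vecs, W, bias):
    # Recurrent: fold over the word sequence left to right.
    state = word_vecs[0]
    for v in word_vecs[1:]:
        state = combine(state, v, W, bias)    # meaning of the input up to this step
    return state

def recursive_meaning(tree, W, bias):
    # Recursive: fold bottom-up over a given binary parse tree, where a tree is
    # either a leaf word vector or a (left_subtree, right_subtree) tuple.
    if isinstance(tree, tuple):
        left = recursive_meaning(tree[0], W, bias)
        right = recursive_meaning(tree[1], W, bias)
        return combine(left, right, W, bias)
    return tree
```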
Recursive Neural Network for Structure Prediction
Plausibility Score     Meaning Vector
        ▲                    ▲
        └──────────┬─────────┘
                   │
            ┌──────┴──────┐
            │     NN      │
            └──────┬──────┘
                   │
             ┌─────┴─────┐
             │           │
            W1           W2
Equation
$$ p = \tanh\left(W \begin{bmatrix} c_1\\ c_2 \end{bmatrix} + b\right) $$
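A minimal NumPy sketch of this composition step. The equation above only defines the meaning vector; the linear scoring vector `u` used here for the plausibility score is an assumption (one common choice), as are the dimensions and variable names.

```python
import numpy as np

d = 4                                         # illustrative embedding size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1     # composition weight matrix
b = np.zeros(d)
u = rng.standard_normal(d) * 0.1              # assumed linear scoring vector

def compose(c1, c2):
    # Meaning vector of the parent node: p = tanh(W [c1; c2] + b)
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)
    # Plausibility score for merging c1 and c2 (assumed linear in p)
    score = u @ p
    return p, score

c1, c2 = rng.standard_normal(d), rng.standard_normal(d)
parent_vec, plausibility = compose(c1, c2)
```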
Syntactically-Untied RNN
Instead of using a single shared weight matrix \(W\) as in the [[TreeRNN]] above, use different weight matrices depending on the syntactic categories of the constituents being combined.
New equation
$$ p = \tanh\left(W \begin{bmatrix} C_2 \cdot c_1\\ C_1 \cdot c_2 \end{bmatrix} + b\right) $$
Each constituent \(i\) not only has a meaning vector \(c_i\) but also an operator matrix \(C_i\), so that “good” in “good person” can modify the meaning of “person”.
$$ P = W_M \begin{bmatrix} C_1\\ C_2 \end{bmatrix} $$
where \(W_M\) is a separate weight matrix: the parent’s operator matrix is a linear function of the two constituents’ operator matrices.
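A NumPy sketch of this vector-plus-matrix composition; the name \(W_M\) and the dimensions are introduced here for illustration, and identity matrices are used only as simple example inputs.

```python
import numpy as np

d = 4                                         # illustrative embedding size
rng = np.random.default_rng(0)
W   = rng.standard_normal((d, 2 * d)) * 0.1   # composes the cross-modified vectors
b   = np.zeros(d)
W_M = rng.standard_normal((d, 2 * d)) * 0.1   # maps stacked child matrices to the parent matrix

def compose_vec_mat(c1, C1, c2, C2):
    # Parent meaning vector: each child's operator matrix modifies the other child's vector.
    p = np.tanh(W @ np.concatenate([C2 @ c1, C1 @ c2]) + b)
    # Parent operator matrix: a linear function of the two child matrices.
    P = W_M @ np.vstack([C1, C2])             # (d, 2d) @ (2d, d) -> (d, d)
    return p, P

c1, c2 = rng.standard_normal(d), rng.standard_normal(d)
C1, C2 = np.eye(d), np.eye(d)                 # identity matrices as simple example inputs
parent_vec, parent_mat = compose_vec_mat(c1, C1, c2, C2)
```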
Recursive Neural Tensor Network
$$ p_i = c_1^\intercal \, W_i \, c_2 + p'_i $$
\(p'_i\) is the \(i\)-th component of the standard composition \(\tanh(W \begin{bmatrix} c_1\\ c_2 \end{bmatrix} + b)\) from the recursive neural network above.
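A sketch following the simplified per-slice equation above; the slices \(W_i\) are collected into a tensor `V` here, and the published RNTN additionally applies the tensor to the concatenated children and wraps the sum in a nonlinearity, which this sketch omits to stay close to the notes.

```python
import numpy as np

d = 4                                         # illustrative embedding size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1     # standard composition matrix (gives p')
b = np.zeros(d)
V = rng.standard_normal((d, d, d)) * 0.1      # one d x d slice W_i per output dimension

def rntn_compose(c1, c2):
    # Standard recursive-NN term: p' = tanh(W [c1; c2] + b)
    p_prime = np.tanh(W @ np.concatenate([c1, c2]) + b)
    # Bilinear tensor term: p_i = c1^T W_i c2, one slice of V per output dimension.
    bilinear = np.array([c1 @ V[i] @ c2 for i in range(d)])
    return bilinear + p_prime

c1, c2 = rng.standard_normal(d), rng.standard_normal(d)
parent_vec = rntn_compose(c1, c2)
```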