Just before I started my PhD at ILLC under the supervision of Jelle, we did a small one-week project to study the properties of sentence embeddings computed by a recursive neural network trained to classify sentences by sentiment.
Word embeddings like Word2Vec and GloVe have been very successful, and their invention was a step forward in solving NLP tasks. Having such a rich and simple representation for words has made researchers keen to find a similar solution for representing sentences. Hence, how to build a sentence representation from the representations of its words is currently an important question. The simplest answer is to average the word vectors, which despite its simplicity achieves high accuracy on several tasks. However, this model does not take the order and role of the words in the sentence into account.
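To make the averaging baseline and its blind spot concrete, here is a minimal sketch. The tiny vocabulary and its vector values are made up for illustration, and `average_embedding` is a hypothetical helper, not code from the project.

```python
import numpy as np

# Toy 3-dimensional word vectors; the values are made up for illustration.
word_vectors = {
    "the": np.array([0.1, 0.0, 0.2]),
    "dog": np.array([0.7, 0.3, 0.1]),
    "bites": np.array([0.2, 0.9, 0.4]),
    "man": np.array([0.6, 0.2, 0.5]),
}


def average_embedding(tokens, vectors):
    """Return the mean of the vectors of the tokens found in `vectors`."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in sentence")
    return np.mean(known, axis=0)


a = average_embedding("the dog bites the man".split(), word_vectors)
b = average_embedding("the man bites the dog".split(), word_vectors)

# The two sentences mean different things but contain the same words,
# so their averaged embeddings coincide.
print(np.allclose(a, b))
```

Any composition function that is invariant to word order would show the same behaviour, which is the motivation for looking at structured models instead.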
Our main aim here was to see whether a recursive LSTM network trained to classify sentences by sentiment is a good composition function for calculating sentence embeddings for purposes other than sentiment analysis. So, using the last hidden layer of a recursive LSTM as the sentence embedding, we wanted to study the properties of this space.
More specifically, we wanted to see whether different relations between sentences are reflected in this space. As an example of such a relation, we considered negation. Hence the question we tried to answer in this part is “Can we find a transformation matrix in this space that maps a given sentence to its negated form?”
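One way such a transformation matrix could be estimated is ordinary least squares over pairs of (sentence, negated sentence) embeddings. The sketch below is an assumption about the setup, not the project's actual code: the embeddings are synthetic, generated from a hidden linear map so the fit can be checked, and `nearest_neighbour_accuracy` is a hypothetical evaluation helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 200, 16

# Stand-ins for real data: row i of X would be a sentence embedding and
# row i of Y the embedding of its negated form. Here Y is generated from
# a hidden linear map so that the fit can be verified.
X = rng.normal(size=(n_pairs, dim))
W_hidden = rng.normal(size=(dim, dim))
Y = X @ W_hidden

# Least-squares solution W of X @ W = Y (approximately, for real data).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)


def nearest_neighbour_accuracy(predicted, targets):
    """Fraction of rows whose closest target (Euclidean) is the matching row."""
    diffs = predicted[:, None, :] - targets[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(np.mean(np.argmin(dists, axis=1) == np.arange(len(targets))))


print(np.allclose(W, W_hidden))
print(nearest_neighbour_accuracy(X @ W, Y))
```

With real embeddings the matrix would be fitted on one set of pairs and evaluated on held-out pairs, since a perfect fit on the training pairs says little about whether negation is a consistent direction in the space.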
Our experiments indicated that we achieve a fair accuracy for sentences that are negated by changing their structure, i.e., by adding “not” or “do not”. These results give us the impression that the sentence embeddings are distributed according to both their structure and their sentiment. Since these embeddings are computed with a recursive LSTM that is built on the parse tree of each sentence, we could expect sentence structure to be reflected in this space. Moreover, since the model is trained to distinguish sentences by sentiment, it is unsurprising that sentences with different sentiments are distinguishable in this space. Still, it is a nice property that a sentence is mapped not only to a sentence with negated structure but also to a semantically related one.
We did further qualitative analysis by clustering the sentence embeddings. Applying the DBSCAN algorithm, we observed that sentences in the same cluster either have a similar structure or are related in meaning and sentiment. There seems to be a hierarchy of clusters with respect to the sentiment and structure of the sentences.
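For readers who want to try this kind of clustering, here is a minimal sketch using scikit-learn's `DBSCAN`. The embeddings are two synthetic blobs standing in for real sentence embeddings, and the `eps` and `min_samples` values are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Stand-in for sentence embeddings: two tight synthetic blobs in 8 dimensions.
blob_a = rng.normal(loc=0.0, scale=0.1, size=(30, 8))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(30, 8))
embeddings = np.vstack([blob_a, blob_b])

# eps and min_samples are illustrative, not the values from the project.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(embeddings)

# Group row indices by cluster label; DBSCAN marks noise points with -1.
for cluster_id in sorted(set(labels)):
    members = np.flatnonzero(labels == cluster_id)
    name = "noise" if cluster_id == -1 else f"cluster {cluster_id}"
    print(name, members[:5])
```

With real embeddings, the row indices in each cluster would be mapped back to the sentences themselves so that the clusters can be inspected by hand.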
We tried to run the same experiments with a recurrent LSTM model. However, in the sentence embeddings computed with a recurrent LSTM, nothing about the sentence structure is preserved.