A single representation per word fails to capture polysemy. We propose an approach that learns multiple representations per word by topic-modeling the context with a Hierarchical Dirichlet Process (HDP).

You can find the published work here: http://aclweb.org/anthology/P17-2070

We hypothesize that the topic distributions of the documents in which a particular word occurs are a good signal for discriminating between the different senses of that word.

As an example, the word bat occurs both in documents about sports, where the dominant topics involve baseball and cricket, and in documents about animals; these very different topic distributions can be used to separate the two senses.

Based on this, we propose three models that use topic distributions when learning embeddings: HTLE, HTLEadd, and STLE (a sketch of the three lookups follows the list below).
  • HTLE: Uses the hard topic labels resulting from HDP sampling to learn representations
  • HTLEadd: Uses the sum of the hard topic-labeled representation and the generic (i.e. unlabeled) representation
  • STLE: Uses the topic distribution to compute a weighted sum over the word-topic representations

The learned embeddings capture polysemy in language:

This table lists the nearest neighbors of the word bat in different representation spaces, using cosine similarity. Pre-trained Skipgram and Pre-trained GloVe are embeddings pre-trained on large external corpora; Skipgram is our baseline and HTLE is our proposed model, both trained on Wikipedia.

We evaluate the embeddings on the lexical substitution task; the full results are reported in the paper.

The code is available here.

To cite this paper:

@InProceedings{fadaee-bisazza-monz:2017:Short1,
  author    = {Fadaee, Marzieh and Bisazza, Arianna and Monz, Christof},
  title     = {Learning Topic-Sensitive Word Representations},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {441--447},
  url       = {http://aclweb.org/anthology/P17-2070}
}