August 2015, Barcelona, ESSLLI
This course reviews the recent tradition of models of vector-based
semantic composition. In these models, lexical semantics is
typically represented with vectors (as in lexical distributional
semantics) and composition is represented using techniques from
vector, matrix or tensor algebra. These models are now seen by many
as filling the gap between (corpus-based and probabilistic)
distributional lexical semantics, and (intuition-driven,
logic-based) formal semantics.
We focus in particular on neural models, where the vectors become
activation vectors, and the composition functions can be learned
from data using standard techniques from neural network modelling.
We discuss recent successes with such models for language
modelling (Mikolov et al. 2012), constituency parsing (Socher et
al. 2013) and dependency parsing (Le et al. 2014). We also relate
these models to foundational neural network models of hierarchical
structure in language (Elman, 1990; Pollack, 1990), and to recent
developments in understanding how the brain processes such structure.
Motivation and description
Until recently, almost all models of compositional semantics were
based on formal logic and the lambda calculus. Within computational
linguistics, semantic composition formed one of the few domains where
corpus-based and probabilistic approaches had little impact, unlike
lexical semantics, where distributional and vector-based models have
been influential for several decades.
Recently, this situation has started to change with the development of
a number of approaches to semantic composition with vectorial
representations.
The simplest approach, with some surprising successes, has been to use
vector addition and multiplication as composition functions
(Mitchell et al., 2008). Much more complex is the approach
of Baroni et al. (2013) and Grefenstette et al. (2013), which
relies on the philosophy of formal semantics that “composition is
largely a matter of function application”. They use tensors
to represent functor words (e.g., verbs and adjectives), linear
maps as composition functions, and contexts of phrases (much as in
distributional lexical semantics) to estimate the tensors’ elements
and the functions’ parameters. Socher and colleagues
propose two neural network frameworks: recursive autoencoders
(2011) for unsupervised learning, and recursive neural networks
(2010) for supervised learning with a task-based training signal
(e.g., for sentiment analysis, the training signal consists of the
sentiment labels given by human raters).
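To make the contrast between these composition functions concrete, the sketch below gives a minimal Python/NumPy illustration. The dimensionality, the random vectors and the example words are hypothetical, and the three functions only show the general form of additive/multiplicative composition (Mitchell et al., 2008), tensor-based function application (Baroni et al., 2013) and a single recursive-neural-network composition step (Socher and colleagues); they are not any particular published implementation.

```python
import numpy as np

d = 4                                  # toy embedding dimensionality
rng = np.random.default_rng(0)

# Hypothetical word vectors (in practice estimated from corpus co-occurrences).
red, car = rng.normal(size=d), rng.normal(size=d)

# 1. Additive / multiplicative composition (Mitchell et al., 2008):
#    the phrase vector is an element-wise combination of the word vectors.
phrase_add  = red + car
phrase_mult = red * car

# 2. Function application (Baroni et al., 2013): an adjective acts as a
#    linear map (here a d x d matrix) applied to the noun vector.
RED = rng.normal(size=(d, d))          # matrix representing "red"
phrase_fun = RED @ car

# 3. One composition step of a recursive neural network (Socher et al.):
#    concatenate the children, apply a learned linear map and a nonlinearity.
W = rng.normal(size=(d, 2 * d))        # learned composition matrix
b = np.zeros(d)                        # learned bias
phrase_rnn = np.tanh(W @ np.concatenate([red, car]) + b)

print(phrase_add, phrase_mult, phrase_fun, phrase_rnn, sep="\n")
```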
Although it is difficult to say which approach will be most
successful in the long term, and we will review all the approaches
mentioned, we will focus in our course on the neural-net-based
approach, e.g., the work of Socher and colleagues (including some of
our own). This focus is likely to be a major difference from the
related course offered at ESSLLI’14 by Baroni & Dinu. Our choice is
based on the belief that learning composition functions should be
task-driven; in other words, we should prefer approaches that perform
well across many tasks. And the neural-net-based approach does: it is
widely used in many state-of-the-art systems, from syntactic parsing
(Socher et al., 2013) to sentiment analysis (Hermann et al., 2013;
Socher et al., 2013) and paraphrase detection (Socher et al., 2011).
The course therefore covers a wide range of neural-net-based
composition architectures (including traditional recursive neural
networks, matrix-vector recursive neural networks, recursive neural
tensor networks, inside-outside recursive neural networks, and
recursive autoencoders) and their applications (such as syntactic
parsing, sentiment analysis, semantic role labelling, paraphrase
detection, and machine translation). The neural interpretation of the
vectorial semantics also provides a natural bridge to the question of
how hierarchical structures might be implemented in the brain
(Borensztajn et al., 2014).
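As a further illustration of what “recursive” means in these architectures, the sketch below (again hypothetical Python/NumPy code, with made-up parameters and a made-up binarised parse) applies one and the same composition function bottom-up over a binary tree, so that every constituent, up to the full sentence, receives a vector.

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

# Hypothetical word vectors and shared (learned) composition parameters.
vocab = {w: rng.normal(size=d) for w in ["the", "movie", "was", "great"]}
W = rng.normal(size=(d, 2 * d))
b = np.zeros(d)

def compose(left, right):
    """One recursive-neural-network step: parent = tanh(W [left; right] + b)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def embed(tree):
    """Assign a vector to every node of a binary tree, bottom-up.
    A tree is either a word (str) or a pair (left_subtree, right_subtree)."""
    if isinstance(tree, str):
        return vocab[tree]
    left, right = tree
    return compose(embed(left), embed(right))

# (( the movie ) ( was great )) -- a made-up binarised parse.
sentence = (("the", "movie"), ("was", "great"))
print(embed(sentence))   # vector for the whole sentence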
We will make use of tutorial papers, which we will make available, and
of original research articles (reviewing the work cited above).
Tentative outline
Day 1: Introduction to compositional distributional semantics.
Further reading:
Douglas S. Blank (1997), Learning to See Analogies: A Connectionist Exploration, PhD thesis, Indiana University, chapter 3: Connectionist Foundations (pdf)
Day 2: Background knowledge: Basic machine learning, foundational neural network models
Further reading:
Ben Krose and Patrick van der Smagt (1996). An introduction to Neural Networks. (pdf)
Day 3: Recursive neural networks, extensions and applications I
Further reading:
Richard Socher’s Deep Learning for NLP tutorial
Day 4: Recursive neural networks, extensions and applications II; The Inside-Outside Semantics framework.
Further reading:
P. Le and W.H. Zuidema (2014). Inside-Outside Semantics: A Framework for Neural Models of Semantic Composition. NIPS 2014 Workshop on Deep Learning and Representation Learning. [pdf]
Day 5: Reflections on the goals of compositional semantics. Relevance for cognitive science and neuroscience