Word & Sentence Representations and Responsible AI

There is currently much discussion (finally!) about the need to study the social consequences of the widespread adoption of natural language processing technology, and to pay much more attention to ethics in our education. One important issue there is the effect of racist, sexist and other biases present in the data that we train our NLP algorithms on. Are the AI systems we build as biased as humans are? Or even worse? Does that lead to real-world consequences?

These are difficult questions, it turns out. But if identifying bias is hard, fixing it is even harder. Simple fixes that try to balance the dataset often don’t work well, because the accuracy of our systems drops when data is removed or artificial data is added.

There is an interesting parallel between the need to both identify and correct bias (w.r.t. ethnicity, gender, sexual orientation, religion, class, education), and the quest to identify and influence how modern deep learning systems represent linguistic category information (number, gender, determinacy, animacy, case, etc.). In this post I would like to explore the extent to which the approach we have been developing for the latter can also be applied to achieve the former.

Diagnostic classification

Let’s start with that prior work. The approach we have developed to answer linguistic questions about deep learning systems is called ‘diagnostic classification’. The idea is that we train a deep learning model — henceforth the ‘target model’ — for some natural language task, for instance predicting the next word in a sentence. With current techniques (e.g., two-layer LSTMs, a clever training regime, and enormous datasets) we have become really good at that task. But what linguistic information is the deep learning model using to make its excellent predictions?

To figure that out we have tried all the visualization and ‘ablation’ tricks (i.e., systematically damaging the trained network and seeing what happens) from the literature, but found that they are only of limited use. LSTMs and other deep learning models are (i) high-dimensional, and (ii) highly nonlinear. This means that visualization is of little use, because the solutions the LSTM finds have information distributed over hundreds or thousands of dimensions, which our eyes cannot track all at once. Moreover, the solutions often involve interactions between parts, such that the function of a part is typically different for each configuration of the other parts; knocking out components one by one is therefore not likely to reveal what is really going on.

The solution we found (inspired by lots of earlier work from other groups, and in parallel to other groups) is to develop a series of meta-models to help figure out the inner workings of the target model. The function of the meta-models is in the first place to diagnose what is going on, and often these models are classifiers (although sometimes they are regressors or models producing complex, structured output); hence, we refer to them as diagnostic classifiers.
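
To make this concrete, here is a minimal sketch of a diagnostic classifier in Python. It is illustrative only: `target_lm` is a hypothetical trained target model that exposes a hidden state for every token (the `.hidden_states()` interface is an assumption, not a real API), and the token-level labels encode whatever linguistic hypothesis we want to test (say, the grammatical number of the current subject). The diagnostic model itself is just a linear classifier trained on the frozen hidden states.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def probe_hidden_states(target_lm, train_data, test_data):
    """Train a diagnostic classifier on the target model's hidden states (sketch).

    target_lm:  any trained model with a .hidden_states(sentence) method
                returning one vector per token (assumed interface).
    train_data, test_data: lists of (sentence, per-token labels) pairs.
    """
    def collect(data):
        X, y = [], []
        for sentence, labels in data:
            for h, label in zip(target_lm.hidden_states(sentence), labels):
                X.append(h)
                y.append(label)
        return np.array(X), np.array(y)

    X_train, y_train = collect(train_data)
    X_test, y_test = collect(test_data)

    # The diagnostic classifier itself: a simple linear probe. If it can read the
    # hypothesised feature off the frozen hidden states, the target model
    # apparently encodes that feature somewhere in those states.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_train, y_train)
    return accuracy_score(y_test, probe.predict(X_test))
```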

We published the first paper on diagnostic classifiers in 2016 (introducing the term), where we focused on networks trained to perform simple arithmetic — with only addition, subtraction and brackets. In 2017 we published a follow-up paper that used diagnostic classifiers on the same task, but went on to use the insights gained to change the training regime. By adding a term to the loss function that measures closeness to the nearest symbolic solution, we managed to ‘guide’ the target network to even better performance (“symbolic guidance”).
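
In code, the core of this ‘symbolic guidance’ idea is nothing more than an extra term in the loss. The sketch below is a simplified illustration, not our exact setup: the variable names, the choice of MSE and the weighting are assumptions, and `symbolic_state` stands for the state that a hand-specified symbolic strategy (for the arithmetic task, e.g. a running cumulative result) would be in, projected into the same space as the hidden state.

```python
import torch.nn.functional as F

def guided_loss(prediction, target, hidden_state, symbolic_state, lam=0.1):
    """Task loss plus a guidance term pulling the hidden state towards
    the state of the nearest symbolic solution (simplified sketch).

    prediction, target: network output and gold output for the task
    hidden_state:       the target model's hidden vector at this step
    symbolic_state:     the symbolic strategy's state, assumed to be
                        available in the same vector space
    lam:                weight of the guidance term (illustrative value)
    """
    task_loss = F.mse_loss(prediction, target)
    guidance_loss = F.mse_loss(hidden_state, symbolic_state)
    return task_loss + lam * guidance_loss
```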

Our latest paper, to be presented at the upcoming BlackboxNLP workshop, applies the whole framework to language modelling. We build on the work of Linzen et al. (2016) and Gulordava et al. (2018), who studied the ability of LSTM-based language models to learn number agreement and other syntactic dependencies between words in a sentence.
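
The core test in that line of work is easy to sketch. In the simplified version below, `lm_logprob` is an assumed helper that returns the language model’s log-probability of a word given a sentence prefix; the model passes an item if it assigns higher probability to the verb form that agrees with the subject than to the form that does not.

```python
def agreement_accuracy(lm_logprob, test_items):
    """Fraction of items where the LM prefers the agreeing verb form (sketch).

    test_items: (prefix, correct_verb, incorrect_verb) triples, e.g.
                ("The keys to the cabinet", "are", "is").
    """
    correct = sum(
        lm_logprob(prefix, good) > lm_logprob(prefix, bad)
        for prefix, good, bad in test_items
    )
    return correct / len(test_items)
```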

 

References

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535.
Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. 2018. Colorless green recurrent networks dream hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205.

Hupkes et al. 2016, 2017, 2018

Other resources

The Mexican restaurant example is mentioned here:
https://blog.conceptnet.io/posts/2017/conceptnet-numberbatch-17-04-better-less-stereotyped-word-vectors/

The Science paper on gender and other biases is discussed and linked to here:
https://joanna-bryson.blogspot.com/2017/04/we-didnt-prove-prejudice-is-true-role.html

And here are some general resources on ethics & AI (in particular week 6: Fairness) https://github.com/sblodgett/ai-ethics

Outnumbered (https://www.amazon.com/Outnumbered-Exploring-Algorithms-Control-Lives/dp/147294741X)

Experiential, Distributional and Dependency-based Word Embeddings have Complementary Roles in Decoding Brain Activity

Samira Abnar, Rasyan Ahmed, Max Mijnheer, Willem Zuidema

CMCL’18

We evaluate 8 different word embedding models on their usefulness for predicting the neural activation patterns associated with concrete nouns. The models we consider include an experiential model, based on crowd-sourced association data, several popular neural and distributional models, and a model that reflects the syntactic context of words (based on dependency parses). Our goal is to assess the cognitive plausibility of these various embedding models, and understand how we can further improve our methods for interpreting brain imaging data.

We show that neural word embedding models exhibit superior performance on the tasks we consider, beating the experiential word representation model. The syntactically informed model gives the overall best performance when predicting brain activation patterns from word embeddings; the GloVe distributional method gives the overall best performance when predicting in the reverse direction (word vectors from brain images). Interestingly, however, the error patterns of these different models are markedly different. This may support the idea that the brain uses different systems for processing different kinds of words. Moreover, we suggest that taking the relative strengths of different embedding models into account will lead to better models of the brain activity associated with words.
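
For readers who want a concrete picture of the method: both directions of the mapping are, in essence, regularised linear regressions between embedding space and voxel space. The sketch below shows the embedding-to-brain direction with hypothetical array names and a plain train/test split; it is a simplified stand-in for the cross-validated evaluation used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def predict_brain_activity(embeddings, activations):
    """Map word embeddings to brain activation patterns with ridge regression.

    embeddings:  array of shape (n_words, embedding_dim)
    activations: array of shape (n_words, n_voxels)
    Returns the mean correlation between predicted and observed activation
    patterns for held-out words (simplified evaluation).
    """
    X_train, X_test, Y_train, Y_test = train_test_split(
        embeddings, activations, test_size=0.2, random_state=0)

    model = Ridge(alpha=1.0)   # one linear map from embedding space to voxel space
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)

    # Score each held-out word by how well the predicted pattern correlates
    # with the observed pattern.
    scores = [np.corrcoef(p, t)[0, 1] for p, t in zip(Y_pred, Y_test)]
    return float(np.mean(scores))
```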

Language, art and music are extremely revealing about workings of the human mind

I was interviewed by Gisela Govaart about my research. The interview is published online here.

***

“Language, art and music are extremely revealing about workings of the human mind” – An interview with Jelle Zuidema
by Gisela Govaart, January 2016

Jelle Zuidema is assistant professor in cognitive science and computational linguistics at the Institute for Logic, Language and Computation. He does research on these topics, coordinates the Cognition, Language and Computation lab and supervises five PhD and several MSc students there. He teaches in the interdisciplinary master’s programs Brain & Cognitive Sciences (MBCS), Artificial Intelligence, and Logic, and coordinates the Cognitive Science track in the MBCS. Jelle was the organizer of the SMART CS events from 2011 until 2015.


“I started my studies with two programs in parallel at the University of Utrecht: Liberal Arts – where I focused on Literature – and Artificial Intelligence. In my final two years I dropped Liberal Arts, because I decided I needed to specialize; I got my degree in AI, with a specialization in Theoretical Biology. My thesis was on Evolution of Language, so it was a rather weird mix. I was first interested in evolution, and then my supervisor suggested: since you have this background in computational linguistics and logic, why don’t you look at the evolution of language? So it was a bit accidental, but immediately things started to fall into place, and I got really excited about the topic, and decided that I wanted to do my PhD on that as well. For my PhD I moved first briefly to Paris, and then I was in Brussels for two years, in the group of Luc Steels. After two years in Brussels I moved to Edinburgh, and I actually got my PhD degree from the University of Edinburgh in the group of Simon Kirby.”

“When I moved to Edinburgh, my supervisor suggested that I should make use of the fact that they have some of the best theoretical linguists in the world. I ended up sitting in on a couple of classes, one of which was ‘Theoretical Linguistics’, taught by Mark Steedman. We had to read the basics of Chomsky, Shannon, Ross, and a bit of Steedman himself – the history of information theory and grammar formalisms. This course was very influential for the questions that I started to ask myself later. Computational linguistics is very much dominated now by the statistical, machine learning-based approach: trying to learn from data. In that field I am in the tiny little corner where people actually take some inspiration from linguistics, about hierarchical structures for example. As a consequence, linguists group us together with the machine learning people, and the machine learning people group us together with the linguists. We are getting fire from both sides.”

“A lot of people think that there is a contradiction between statistical learning and hierarchical structure. That is why I felt that I found my natural home when I moved to Amsterdam to work at the ILLC. The computational linguists at the ILLC have this long tradition of working on data oriented parsing, a model that nicely shows there is a continuum from what people typically think of as statistical learning – which is about computing statistics over neighboring words – to hierarchical structures – where people typically are said to ignore frequency information. However, there is no reason that we should ignore frequency information while making use of hierarchical structures. It turns out that you can define probability distributions over very complex objects, such as parse trees, or logical formulae. I think that this marriage of hierarchical modeling and probabilistic modeling is where the most interesting questions are. In our research, we often simplify on both sides. We do not have as fancy statistics as some of the machine learning people, and we do not have as complex symbolic hierarchical objects as the theoretical linguists. Instead, we try to find models that go in the right direction, by defining probabilities over relatively simple hierarchical trees for sentences. That allows us to study questions about language processing, about how we compute the meaning of sentences, and about language acquisition. For me, this offers a nice way out of the endless debates that we observe in linguistics, between people who are favoring statistics and people who are favoring complex models.”

“When I first started computational modeling, I got really excited about it, because you suddenly realize that a lot of thinking that you have been doing before was very imprecise. The first computational model really is a shocking experience: how much more precision you have to give, how many more detailed questions you suddenly have to answer. I was looking at evolution, and people do a lot of thinking about evolution; they talk for example about mutations and selection. But then you start thinking: what is a mutation exactly? It means that by chance, something changes to something else. Fair enough; but what exactly was changing, and with which probability? Suddenly you have to account for all these details. You have to come up with parameters that quantify exactly the probability that something will change into something else. And it turns out that this really matters for what comes out of the model. When you first experience this world of detail that you were overlooking with verbal theorizing, then you want to spread the word.”

“So, do we simplify a lot? Yes, we do, but so does everyone else. Everyone in science is making ridiculous simplifications. I think one of the great advantages of the approach I am advocating is that the simplifications we make are for a large part very explicit. That means that it is easy for people to criticize them. When I build a parser that processes a sentence and computes its hierarchical structure, and I show this to linguists, they start complaining: ‘Yeah, but this is 1970s Chomskyan theory, we do not believe in noun phrases anymore’. There are a lot of comments on the symbols that we use, and on these very simple tree structures that these parsers work with. But these people do not realize that when they build a fancy modern parse tree, they often do not specify how all of these decisions are made that lead people to actually come up with that parse tree. For me, it is all summarized by this quote from Cavalli-Sforza and Feldman: ‘Verbal theories avoid the charge of oversimplification only at the expense of ambiguity’. So, yes, we simplify. And we are proud of it. And we are proud that everyone can see where we simplify.”

“From the very beginning of my studies I was already somewhat surprised that there was not more interaction between cognitive science and art. I was always ‘primed’ by my cognitive science courses when I was taking literature courses. For me, the most interesting thing about literature really is that when people are reading a novel – which is really just black ink on a white background – people can get so lost in a completely different, counterfactual world. Literature is a technology that is manipulating people into having feelings and having thoughts about worlds that do not exist. I think this is utterly amazing, and I really think we all should want to understand how that could work. Something similar happens with music. The influence that music – which is really just sound waves – can have on the emotional system of people is amazing. It is astonishing how this balance you find in music between simplicity and complexity, between predictability and unpredictability, somehow has found this direct access to the emotional centers in our brains. I think that music and literature, and this is true for the other arts as well, are some of the most fascinating systems from a cognitive science point of view that you can think of. And I do not think that enough people are studying it from that angle. I really believe that there is a very big role to play there for the humanities, in studying the properties of language, art and music, and use them as sort of ‘extreme cases’: as cases that are extremely revealing about workings of the human mind.”

“There is a very practical reason why SMART as an initiative exists, and that is that the government wants universities to specialize, to become more different from each other. The government told the universities that they should decide on research focus areas. Hence, the universities told the faculties that they should decide on research focus areas. The Faculty of Humanities – like many other faculties, actually – thought it was a good idea to choose Brain and Cognitive Science. Henkjan Honing and Kees Hengeveld, who were the ‘trekkers’, as it is called, were looking for someone who knew the research at ACLC and at ILLC, where most of the cognitive science in the humanities faculty is happening, and who could somehow bridge these two research institutes. Then they asked me. My thought was that if you are to do something like this, you need a good brand name. So I came up with the name SMART Cognitive Science, which is an acronym for Speech & language, Music, Art, Reasoning & Thought. But there is also a teaser: it contrasts with the ‘expensive cognitive science’ that is happening a lot in other faculties. This is what the humanities are good at: asking very smart questions about cognition. I think it is important that cheap but good research in cognitive science is also supported, and that not all the money goes to people with expensive fMRI or MEG machines.”

“For now, what I am most interested in is to try to make a bridge between language at the behavioral level, as we study it in linguistics, and at the neural level. I am working on this as part of this big national initiative ‘Language in Interaction’. The idea that I and some other people in the world are pursuing is that the structure of language, the kind of computations that you need to be able to process language, are really revealing about the underlying neural implementation. What we are exploring is whether what we know about how language works puts constraints on what the possible neural implementations are. Neuroscience is a very advanced field at the level of single neurons. Cognitive neuroscience is also a very advanced field at the level of mapping the brain, of determining which structures in the brain are correlated with what kind of behaviors, also when it comes to what we call the language network. But when you ask how networks of neurons really support the computation of what sentences mean, or how sentences are structured, or how words are stored, or how the meanings of words are retrieved, then we really do not have a clue, not even how to start. What I am putting my bets on, which is a high-risk bet, is that the structure of language really reveals something fundamental about how the brain is organized. This is controversial both in linguistics and in neuroscience. So it is a high-risk, high-gain kind of project. There are a few people in the world who share this intuition, but there are also a lot of people who think it is a waste of time. My experience is that these guiding intuitions at the very least help to focus your research, and help you discover interesting things on the way, even if you never reach the endpoint. It is a little bit like ‘you aim for the stars, and you might reach the moon’.”

“Even though much of my research is now focused on how we process language, on how we compute the meaning of sentences, the ultimate question that I try to understand is: what makes human language unique? I am still motivated by these old questions, which I did my Master’s and PhD on, of how the crucial difference between chimpanzees, bonobos and humans emerged in evolution. Richard Lewontin has this nice quote: “On the average, chimpanzees and humans are very similar to each other at the level of genes and proteins, but they differ radically in their ability to write books about each other”. That is what I am trying to understand. I really think that there is a good possibility that the answer is somewhere in the neural code. It is a bit similar to the genetic code. There is this extremely interesting history of the discovery of DNA. One of the things that are so intriguing about this process is that lots of people worldwide had results that almost revealed what was happening. And then, when it was finally discovered, by Watson and Crick, everything fell into place. And all of these extremely puzzling results suddenly started to make sense with the discovery of the double helix, and this universal code for all life on earth. I think there is a possibility – maybe it is a bit wishful thinking – that there is something similar for language, something about how language is encoded in the brain. There are so many questions that are puzzling us now: how can it be that no other species has language? How can it be that no species has just a little bit of language (either you have it or not)? How can language be so different from everything else on the planet, while our neurons and proteins are so similar to those of other species? All these weird paradoxes that we observe might fall into place once we have cracked the code. But maybe people in 100 years will laugh at me, because it will be just as much of an enigma then as it is now. But I am optimistic. And I think I am optimistic both because I believe in it, and because it is a good research strategy to be optimistic.”

Phong Le’s PhD defense

On June 3rd, my PhD student Phong Le successfully defended his PhD thesis, entitled “Learning Vector Representations for Sentences – The Recursive Deep Learning Approach” (committee members: Max Welling, Mirella Lapata, Marco Baroni, Raquel Fernandez, Ivan Titov).

***

Learning Vector Representations for Sentences – The Recursive Deep Learning Approach
Phong Lê

Abstract:

Natural language processing (NLP) systems, until recently, relied heavily on sophisticated representations and carefully designed feature sets. Now with the rise of deep learning, for the first time in the history of NLP, the importance of such manual feature engineering has started to be challenged. Deep learning systems using very few handcrafted features can achieve state-of-the-art (or nearly state-of-the-art) performance on many tasks, such as syntactic parsing, machine translation, sentiment analysis, and language modelling. However, rather than letting deep learning replace linguistically informed approaches, in this dissertation I explore how linguistic knowledge can provide insights for building even better neural network models. I tackle the problem of transforming sentences into vectors by employing a hybrid approach of symbolic NLP and connectionist deep learning based on the principle of compositionality. In this approach, the role of symbolic NLP is to provide syntactic structures whereas composition functions are implemented (and trained) by connectionist deep learning.

All of the models I develop in this dissertation are variants of the Recursive neural network (RNN). The RNN takes a sentence, syntactic tree, and vector representations for the words in the sentence as input, and applies a neural network to recursively compute vector representations for all the phrases in the tree and the complete sentence. The RNN is a popular model because of its elegant definition and promising empirical results. However, it also has some serious limitations: (i) the composition functions it can learn are linguistically impoverished, (ii) it can only be used in a bottom-up fashion, and (iii) it is extremely sensitive to errors in the syntactic trees it is presented with. Starting with the classic RNN, I propose extensions along three different directions that solve each of these problems.
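
For readers unfamiliar with the model, the following is a minimal illustrative sketch of the classic RNN composition step described above (not the thesis code): a single shared feed-forward layer is applied, recursively, to the concatenation of the two child vectors at every node of a binary parse tree.

```python
import torch
import torch.nn as nn

class RecursiveNN(nn.Module):
    """Minimal classic Recursive Neural Network over binary parse trees (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)  # one shared composition function

    def forward(self, tree, word_vectors):
        # A tree node is either a word index (leaf) or a (left, right) pair.
        if isinstance(tree, int):
            return word_vectors[tree]
        left = self.forward(tree[0], word_vectors)
        right = self.forward(tree[1], word_vectors)
        return torch.tanh(self.compose(torch.cat([left, right], dim=-1)))

# Example: the tree (the (cat sleeps)) with word indices 0, 1, 2
# and randomly initialised 50-dimensional word vectors.
model = RecursiveNN(dim=50)
sentence_vector = model((0, (1, 2)), torch.randn(3, 50))
```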

The first direction focuses on strengthening the composition functions. One way to do that is making use of syntactic information and contexts, as in Chapter 3. In that chapter, I propose composition functions, which are also one-layer feed-forward neural networks, taking into account representations of syntactic labels (e.g. N, VP), context words, and head words. Another way is to replace one-layer neural networks by more advanced networks. In Chapter 6, based on empirical results which show that the Long short term memory (LSTM) architecture can capture long range dependencies and deal with the vanishing gradient problem more effectively than Recurrent neural networks, I introduce a novel variant of the LSTM, called Recursive-LSTM, that works on trees. Empirical results on an artificial task and on the Stanford Sentiment Treebank confirm that the proposed Recursive-LSTM model is superior to the classic RNN model in terms of accuracy. Furthermore, in Chapter 7, I demonstrate how a convolutional neural network can be used as a composition function.
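
As an illustration of what replacing the one-layer network by a more advanced one can look like, here is a sketch of a binary tree-structured LSTM composition step in the spirit of the Recursive-LSTM; this is a generic cell of that family, not necessarily the exact formulation from Chapter 6. Each node keeps a hidden state and a memory cell, with separate forget gates for the left and right child.

```python
import torch
import torch.nn as nn

class BinaryTreeLSTMCell(nn.Module):
    """Composition step mapping two children (h_l, c_l), (h_r, c_r) to (h, c)."""
    def __init__(self, dim):
        super().__init__()
        # One linear map producing the input gate, two forget gates, the output
        # gate and the candidate update from the concatenated child hidden states.
        self.gates = nn.Linear(2 * dim, 5 * dim)

    def forward(self, h_l, c_l, h_r, c_r):
        i, f_l, f_r, o, u = self.gates(torch.cat([h_l, h_r], dim=-1)).chunk(5, dim=-1)
        i, f_l, f_r, o = map(torch.sigmoid, (i, f_l, f_r, o))
        u = torch.tanh(u)
        c = i * u + f_l * c_l + f_r * c_r   # memory cell combines both children
        h = o * torch.tanh(c)               # hidden state exposed to the parent
        return h, c
```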

The second direction to extend the classic RNN is to focus on how information flows in a parse tree. In traditional compositional semantics approaches, including the RNN model, information flows in a bottom-up manner, leading to a situation where there is no way for a node to be aware of its surrounding context. As a result, these approaches are not applicable to top-down processes such as several top-down generative parsing models, and to problems requiring contexts such as semantic role labelling. In Chapter 4, I propose a solution to this, namely the Inside-Outside Semantic framework, in which the key idea is to allow information to flow not only bottom-up but also top-down. In this way, we can recursively compute representations for the content and the context of the phrase that a node in a parse tree covers. The Inside-Outside RNN model, a neural-net-based instance of this framework, is shown to work well on several tasks, including unsupervised composition function learning from raw texts, supervised semantic role labelling, and dependency parsing (Chapter 5).
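
The central recursion behind the Inside-Outside idea can be sketched as follows; this is a rough illustration with an assumed parameterisation, not the model from Chapter 4. Every node’s ‘inside’ (content) vector is computed bottom-up from its children, and its ‘outside’ (context) vector is computed top-down from its parent’s outside vector and its sibling’s inside vector.

```python
import torch
import torch.nn as nn

class Node:
    """Binary parse-tree node; leaves carry a word index."""
    def __init__(self, word=None, left=None, right=None):
        self.word, self.left, self.right = word, left, right
        self.inside = None    # content representation (computed bottom-up)
        self.outside = None   # context representation (computed top-down)

dim = 50
f_inside = nn.Linear(2 * dim, dim)    # children's insides -> parent's inside
f_outside = nn.Linear(2 * dim, dim)   # parent's outside + sibling's inside -> child's outside
root_context = nn.Parameter(torch.zeros(dim))   # generic "empty context" for the root

def compute_inside(node, word_vectors):
    if node.word is not None:                              # leaf
        node.inside = word_vectors[node.word]
    else:
        compute_inside(node.left, word_vectors)
        compute_inside(node.right, word_vectors)
        node.inside = torch.tanh(f_inside(
            torch.cat([node.left.inside, node.right.inside], dim=-1)))

def compute_outside(node):
    if node.outside is None:                               # only true at the root
        node.outside = root_context
    if node.word is None:
        node.left.outside = torch.tanh(f_outside(
            torch.cat([node.outside, node.right.inside], dim=-1)))
        node.right.outside = torch.tanh(f_outside(
            torch.cat([node.outside, node.left.inside], dim=-1)))
        compute_outside(node.left)
        compute_outside(node.right)
```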

The third direction is dealing with the uncertainty of the correct parse. As a result of relying on the principle of compositionality, compositional semantics uses syntactic parse trees to guide composition, which in turn makes compositional semantics approaches vulnerable to the errors of automatic parsers. The problems here are that automatic parsers are not flawless, and that they are not aware of domains to which they are applied. To overcome this problem, in Chapter 7, I propose the Forest Convolutional Network model, which takes as input a forest of parse trees rather than a single tree as in traditional approaches. The key idea is that we should give the model several options and let it select (or combine) ones that best fit its need. Empirical results show that the model performs on par with state-of-the-art models on the Stanford Sentiment Treebank and on the TREC question dataset.

The dissertation thus proposes solutions to the main shortcomings of the RNN model. It provides all components for a completely neural implementation of a syntactic-semantic parser: the three ideas above essentially yield a neural inside-outside algorithm. This represents an approach to NLP that combines the best of two worlds: all the flexibility and learning power of deep learning without sacrificing the linguistic adequacy of earlier approaches in computational linguistics.

***

Summary in Dutch (translated):

Learning Vector Representations for Sentences – The Recursive Deep Learning Approach
Phong Lê

Summary:

Until recently, computational language processing systems were largely based on complex, symbolic representations and, insofar as they made use of machine learning, still depended on hand-selected lists of features. With the rise of deep learning it has become possible, for the first time in the history of the field, to automate this feature selection as well. In recent years we have seen successful deep learning systems appear that require little or no manual feature selection and yet are among the best-performing systems on tasks such as parsing, machine translation, sentiment analysis and word prediction.

These successes do not mean, however, that we should now push all linguistically informed approaches aside. In this dissertation I explore how linguistic knowledge can be used to build even better neural network models of language. I take up the challenge of computing vector representations for sentences using a hybrid symbolic-connectionist approach, based on the so-called principle of compositionality. In my approach, the symbolic tradition provides the syntactic structure of sentences, while I use neural networks to learn representations of words, combinations of words and sentences.

All models I develop in this dissertation are variants of the Recursive Neural Network (RNN). An RNN takes a sentence, a syntactic tree and vector representations of the words in that sentence as input. The model then uses a neural network to recursively compute representations for combinations of words, starting with the combinations of words that form a phrase according to the syntactic tree, and ending with a representation for the whole sentence. The RNN is a popular model because of its elegant definition and promising empirical results. However, the model also has very clear limitations: (i) the composition functions it learns are linguistically deficient; (ii) the model can only be applied in a bottom-up direction; (iii) the model is extremely sensitive to errors in the syntactic trees it is presented with. With the standard RNN model as a starting point, I therefore propose extensions in three directions as solutions to each of these three problems.

The first type of extension concerns improving the composition functions. One way to do this is to make use of syntactic and context information, as I do in Chapter 3. The composition functions in that chapter are still so-called one-layer feed-forward networks, but there is a separate network for each combination of syntactic categories and heads. Another way is to replace these simple networks with more complex ones. In Chapter 6 I report results showing that the so-called Long Short-Term Memory network (LSTM) deals more effectively with long-distance dependencies and the vanishing gradient problem than the widely used recurrent networks. In that chapter I work out a new variant of the LSTM, the ‘Recursive LSTM’, which operates on syntactic trees. Empirical results on an artificial task and on the Stanford Sentiment Treebank show that this new model is much more accurate than the standard RNN. Finally, in Chapter 7 I show that so-called convolutional neural networks can also be used successfully to implement the composition function.

The second type of extension concerns the way information flows through a syntactic tree. In classic compositional semantics approaches, including the RNN, this information flow is strictly bottom-up, so that a node in such a tree has no access to information about the context of a sentence. Such approaches are therefore difficult to combine with techniques that work top-down, such as several popular statistical models for parsing, or with techniques that make use of context information, such as popular models for semantic role labelling. In Chapter 4 I propose a solution to these problems, under the name ‘Inside-Outside Semantics framework’, where the central idea is that information must be able to flow both bottom-up and top-down. I propose to compute two representations for every node in a syntactic tree (via recursive definitions): a ‘content representation’ for the corresponding part of the sentence, computed bottom-up, and a ‘context representation’, determined top-down. In Chapter 5 I show that a neural network implementation of this idea works very well on a range of different tasks, including unsupervised composition function learning, semantic role labelling and dependency parsing.

The third type of extension concerns dealing with uncertainty about the correct parse tree. Parse trees are a crucial component of all models in this dissertation, because according to the principle of compositionality the syntactic structure determines which semantic compositions are carried out, and when. That makes the approach sensitive to errors in the parse trees. Such errors are inevitably introduced by automatic parsers, because these programs do not operate flawlessly even within the domain they were developed for, and are moreover often applied outside that domain. To address this problem I propose the ‘Forest Convolutional Network’ in Chapter 7, which takes a large collection of trees, a so-called parse forest, as input instead of a single parse tree. The idea behind this model is thus that, from a range of options, the model chooses (or assembles) the syntactic structure that best fits the task it is being optimised for. The empirical results show that the resulting model is among the best available models on two popular tasks: the Stanford Sentiment Treebank task and the TREC question classification task.

In this dissertation I thus describe concrete solutions to the most important shortcomings of the RNN model. With that, the dissertation contains all the ingredients for a fully neural implementation of a syntactic-semantic parser: the three extensions described above amount to a neural version of the inside-outside algorithm. The approach in this dissertation thereby offers the best of two worlds: the enormous flexibility and learning power of deep learning, without giving up the linguistic principles and expressive power of earlier approaches in computational linguistics.