Our research concentrates on statistical models for structured language processing, with applications to machine translation, paraphrasing, semantic and morpho-syntactic parsing, and statistical learning for NLP.

Machine Translation (MT) is currently the central application in which our work is embedded. We address a range of phenomena in the quest for more adequate and more fluent MT systems learned from bilingual parallel data, including reordering, morphological variation, domain adaptation, evaluation and tuning.

Our general approach is to induce the latent structure that captures salient regularities in natural language data (mono- and multi-lingual corpora) in order to improve language applications. Our current work concentrates on:
  • exploiting regularities in word-aligned parallel data for learning hierarchical reordering models over permutations and word alignments, e.g., learning hierarchical preordering from bare word alignments.
  • inducing models sensitive to domain variation in big parallel data, e.g., data selection, word alignment, model adaptation.
  • devising better MT evaluation metrics, e.g., BEER.
  • inducing novel semantic representations within meaning-preserving language processing models as a surrogate for actual semantic representations that work with a form and its referent.
  • inducing morpho-syntactic generation models for translation into morphologically-rich languages.

Khalil Sima'an (Vici NWO, ILLC, UvA)


Gideon Wenniger (NWO, Open Comp)
Jun 2010 - Jun 2014
PhD student
Alignment and Hierarchical SMT
Sophie Arnoult (NWO, Open Comp)
Aug 2012 - Aug 2017
PhD student
TAG and Hierarchical SMT
Miloš Stanojević (STW DatAptor)
Mar 2013 - Mar 2017
PhD student
MT and Hierarchical Alignments
Hoang Cuong (EXPERT ITN)
Oct 2013 - Oct 2017
PhD student
Hierarchical MT with TMs 
Joachim Daiber (EXPERT ITN)
Oct 2013 - Oct 2017
PhD student
Hierarchical MT with TMs
Philip Schulz (Vici)
Nov 2013 - Nov 2017
PhD student
MT and Meaning Preserving Models
Aaron Li-Feng Han
Sep 2014 - Sep 2017
Research Internship
MT Evaluation
Amir Kamran (STW DatAptor)
Jan 2014 - Jun 2016
MT and Domain Adaptation
Wilker Aziz (Vici)
Jan 2015 - 1 Jan 2018
Postdoc
Bushra Jawaid (STW DatAptor)
Jan 2015 - 1 Jan 2016
Researcher
SMT adaptation
Joost Bastings (Vici)
Jan 2015 - Dec 2018
PhD student
SMT and meaning preservation
Christos Louisos
Nov 2014 - Oct 2015
SMT and adaptation
Stella Frank (EC QT21)
Jan 2015 - Dec 2017
SMT, learning, morpho-syntax and semantics
Vacancy (Vici)
SMT and meaning preservation

Bart Mellebeek (STW DatAptor)
Jan 2013 - Oct 2014
Postdoc
MT and Domain Adaptation

Ongoing Research projects
  • Grant 2014 QT21, H2020 Cracking the language barrier (3 years x 1 FTE researcher; Co-applicant and PI UvA, Coordinator DFKI, Germany).
  • Grant 2013 VICI NWO. Machine Translators: Teaching Computers to Translate Using their own Words (Euro 1.5 million; PI).
  • Grant 2012 Marie Curie ITN project EXPERT (2 PhD positions; Co-applicant and PI UvA, Coordinator University of Wolverhampton/Sheffield).
  • Grant 2012 STW (Technology Foundation) project DatAptor (Euro 750k; PI).
  • Grant 2012 Free Competition of NWO Exact Sciences Board, Statistical Translation of Novel Constructions (Euro 230k; PI).

Concluded projects

Software and Data Packages

Former Ph.D. students
  • Gideon Maillette de Buy Wenniger: graduated 10 June 2016, now postdoc at DCU
  • Markos Mylonakis (UvA, NWO VIDI): graduated 19 January 2012, Xerox Research Centre Europe, now
  • Reut Tsarfaty (UvA, NWO MOSAIEK project): graduated 24 March 2010
  • Hany Hassan: co-supervised together with Andy Way at Dublin City University, Dublin, Ireland; now at Microsoft Research Redmond


Former M.Sc. students

Where and When
After graduation went to...
Jakub Zavrel
U. Utrecht, 1995/6
Vector-Space Models for Parsing
TextKernel (co-founder)
Jorn Veenstra
U. Utrecht, 1995/6 - with Joos Kok
Head Correlation Detection for Syntactic Analysis
Vera Hollink
UvA 2002 - with Henk Zeevat
Anaphora Resolution by Probabilistic Parsers
UvA, Amsterdam (PhD student, graduated)
Luciano Buratto
UvA 2002
DOP Estimation by Backoff Smoothing, MoL-2002-07
U. of Warwick, UK (PhD student, graduated)
Oren Tsur
UvA 2003 - with Maarten de Rijke
QA and Learning Bibliography Classifiers, MoL-2003-06
Hebrew University, Israel (PhD student, graduated)
Andreas Zollmann
UvA 2004 - with Detlef Prescher
A Consistent and Efficient Estimator for DOP, MoL-2004-02
CMU, USA (PhD student, graduated)
Roy Bar-Haim
Technion 2004/5 - with Y. Winter and A. Itai
Probabilistic Methods for Hebrew Morphological Analysis
U. of Bar-Ilan, Israel (PhD student, graduated)
Thuy Linh Nguyen
UvA 2004
Rank-Consistency and DOP Estimation
CMU, USA (PhD student, graduated)
Felix Hageloh
UvA 2006
Simulating Collins'97 model using Treebank Transforms and PCFGs
Software Engineer
Markos Mylonakis
UvA 2006/7
Bi-directional Noisy-Channel Estimators (Bi-EM)
UvA, ILLC (PhD student, graduated)
Barbara Plank
UvA 2007
Parsing with Domain-awareness
Groningen University (Ph.D. Student, graduated)
Saib Mansour
Technion 2008 - with A. Itai and Yoad Winter
Segmentation and POS tagging for Arabic and Hebrew
RWTH Aachen (Ph.D. student, graduated)
Sanne Korzec
UvA 2010
Phrase probability estimation in SMT
Sophie Arnoult
UvA 2011
Adjuncts in Statistical Machine Translation
Univ. of Amsterdam, PhD student
Joost Bastings
UvA 2012
Parsing with graph symbols
SAAB Sweden Consultant
Katya Garmash
UvA 2012
Paraphrasing and SMT (ongoing)
University of Amsterdam, PhD student
Dieuwke Hupkes
UvA 2013
Translation equivalence and Syntactic Structure
University of Amsterdam, Research Assistant