Statistical Language Processing and Learning Lab.

Part of Language and Computation, ILLC, FNWI, UvA
Dr Khalil Sima'an

Our research concentrates on structured language processing, with applications to statistical machine translation and morpho-syntactic parsing.
Within machine translation, we aim at data-driven, linguistically enriched models that allow adequate and fluent MT systems to be learned statistically from bilingual parallel data.

In technical terms, our work aims at inducing and exploiting the latent structure that captures salient regularities in natural language data (mono- and multi-lingual corpora) in order to improve language applications. Sample recent topics (see, e.g., publications or the individual pages of the group members):
  • hierarchical alignment structure and learning synchronous reordering models in machine translation (with Gideon Maillette de Buy Wenniger)
  • hierarchical statistical MT models with an eye to recursion and adjunction (with Sophie Arnoult)
  • effective syntax-based models for machine translation, including syntactic language models, syntactic and latent reordering models, synchronous grammar learning (with H. Hassan (DCU), A. Way (DCU), M. Mylonakis and M. Khalilov)
  • representational and computational issues in parsing morphologically-rich languages (mainly Semitic; with Reut Tsarfaty, Yoad Winter (now U. Utrecht), Roy Bar-Haim (Technion) and Alon Itai (Technion))
  • theoretical and computational aspects: algorithms, complexity and statistical learning for Data-Oriented Parsing (with R. Scha, R. Bod, A. Zollmann, L. Buratto and D. Prescher)
  • inducing rich syntactic lexica from a mix of unannotated and annotated data (with T. Deoskar and M. Mylonakis)

Team Members

Khalil Sima'an (ILLC, UvA)


Gideon Wenniger (NWO, Open Comp)
2010-14
PhD student
Alignment and Hierarchical SMT
Sophie Arnoult (NWO, Open Comp)
PhD student
TAG and Hierarchical SMT
Miloš Stanojević (STW DatAptor)
March 2013-March 2017
PhD student
MT and Hierarchical Alignments
Bart Mellebeek (STW DatAptor)
Jan 2013-Jan 2016
Postdoc
MT and Domain Adaptation
Hoang Cuong (EXPERT ITN)
Oct 2013-Oct 2017
PhD student
Hierarchical MT with TMs
Joachim Daiber (EXPERT ITN)
Oct 2013-Oct 2017
PhD student
Hierarchical MT with TMs
Philip Schulz (Vici)
Nov 2013-Nov 2017
PhD student
MT and Meaning Preserving Models
Jinhua Du (STW DatAptor)
Dec 2013-Dec 2016
MT and Domain Adaptation

Alumni Ph.D. students

19 Jan. 2012
Markos Mylonakis (UvA, NWO VIDI): graduated 19 January 2012
Xerox Research Centre Europe
Mar. 2010
Reut Tsarfaty (UvA, NWO MOSAIEK project): graduated 24 March 2010
Uppsala University
Jan. 2009
Hany Hassan: co-supervised together with Andy Way at Dublin City University, Dublin, Ireland
Microsoft Research


Ongoing projects

List of funded projects

Concluded projects
  • Priors for Estimating Probabilistic Grammars from Incomplete Data (NWO-EW - VIDI) [2007-2011]
    Innovative Research Incentive, Netherlands Organisation for Scientific Research (NWO) - Exact Sciences
    (766 kEuro), PI=Khalil Sima'an
  • Learning Stochastic Tree-Grammars from Treebanks (LeStoGram) NWO-EW Open Competitie [2003-2006], PI=Khalil Sima'an (approx. 200 kEuro)
    Project was concluded in October 2006
  • Modern Hebrew and Arabic Processing [2000-2011] NWO-Mosaiek, Reut Tsarfaty, PI=Khalil Sima'an (approx. 200 kEuro)
    People involved:
    Khalil Sima'an (UvA), Reut Tsarfaty (UvA) and Hebrew Project (Technion).
    Roy Bar-Haim (Technion, Haifa), Saib Mansour (Technion, Haifa)
    Johns Hopkins University (JHU) Summer Workshop 2005 on Parsing Arabic Dialect.
    In March 2010, Tsarfaty defended her Ph.D. thesis.
  • Beyond Treebank Annotations: Ambiguity Resolution by Similarity-Based Performance Models. Personal innovation grant (KNAW Fellowship, Royal Netherlands Academy of Arts and Sciences), awarded in 2002 (approx. 200 kEuro), PI=Khalil Sima'an.
    Project was concluded in 2003