Priors for the estimation of probabilistic grammars from incomplete natural language data.

VIDI Program (2007-2011), financed by NWO Exact Sciences and the Faculty of Exact Sciences at the UvA

  1. Khalil Sima'an (project leader)  
  2. Markos Mylonakis (Ph.D. student)
  3. Tejaswini Deoskar (part-time postdoctoral researcher, two years)
  4. Maxim Khalilov (full-time postdoctoral researcher, two years)

Affiliated researchers: Reut Tsarfaty.

External collaborators: Hany Hassan, Barbara Plank and Andy Way.

The VIDI program started in January 2007 and will continue until the end of December 2011.

Short description:
The VIDI program studies the kinds of prior knowledge about mono- and multilingual
(mainly parallel) natural language data that can be used to automatically induce structure.
Such structure can capture unobserved regularities and thereby improve performance on tasks
such as machine translation and parsing. The prior knowledge may concern general properties
of the data or of the task. We consider both linguistic and non-linguistic kinds of knowledge.

Examples of research questions that fall within this category are: (1) inducing structure for a
generalized and better phrase-based translation model, (2) inducing "meaning-flavoured"
syntactic structure from parallel data, (3) "exporting" syntactic structure from one
language to another using a parallel corpus and knowledge/assumptions about the way translation
preserves meaning, (4) inducing the structure of monolingual data as a distribution over different
subdomains and exploiting that for better adaptation to new domains, (5) exploiting negative
examples to guide the induction process in the preceding problems.

Project publications and other output (2007-present)