Priors for the estimation of probabilistic grammars from
incomplete natural language data.
(VIDI Program (2007-2011), financed by NWO Exact Sciences and the
Faculty of Exact Sciences at the UvA)
- Khalil Sima'an
- Mylonakis (Ph.D. student)
- Tejaswini Deoskar (part-time postdoctoral researcher, two years)
- Maxim Khalilov (full-time postdoctoral researcher, two years)
Affiliated researchers: Reut Tsarfaty.
External collaborators: Hany Hassan, Barbara Plank and Andy Way.
The VIDI program started in January 2007 and will continue until the
end of December 2011.
The VIDI program aims at studying the kinds of prior knowledge that we
can exploit when searching mono- and multi-lingual (mainly parallel)
natural language data for structure that can capture unobserved
regularities and thereby improve performance on tasks
such as machine translation and parsing. The prior knowledge could
account for general properties
of the data or of the task. We consider both linguistic and
non-linguistic kinds of knowledge.
Examples of research questions that fall within this category are: (1)
inducing structure for a
generalized and better phrase-based translation model, (2) inducing
syntactic structure from parallel data, (3) "exporting" syntactic
structure from one
language to another using a parallel corpus and knowledge/assumptions
about the way translation
preserves meaning, (4) inducing the structure of monolingual data as a
distribution over different
subdomains and exploiting that for better adaptation to new domains,
(5) exploiting negative
examples in guiding the induction process in the preceding problems.
Publications and other output (2007-now)