More Structure for Better Statistical Machine Translation?
The workshop took place on 20 January 2012.
We wanted an informal, discussion-driven mini-workshop. The workshop attracted 42 participants, including the speakers and local researchers. Most participants came from neighboring countries (including Germany, Switzerland, the UK, Ireland, Belgium, France and the Czech Republic). An informal program and priority for open discussion were the main characteristics of the workshop.
Workshop Date: 20 January 2012 (9:30-17:00).
Registration: CLOSED. The workshop is open and free, but please register by sending an email to k.simaan at uva.nl.
Program (titles, abstracts t.b.a.)
How do we model the latent translation mapping between sentence pairs in parallel corpora? While earlier approaches aimed at compositional translation with a central role for monolingual syntax (in analogy to syntax-directed semantics), more recent statistical approaches show that the answer to this question is not clear-cut. On the one hand, translation data often seems resistant to compositional translation over monolingual syntactic structure. On the other hand, hierarchical structure and monolingual syntax offer certain advantages, particularly for target language modeling, models of reordering and phrase pair composition.
Current hierarchical approaches employ syntactically enriched hierarchical phrase pairs extracted from word alignments and cast as a synchronous grammar. Compositionality (which was originally meaning-oriented) is reduced to a syntactic tool available in the model but considered latent in the word-aligned training data and encapsulated inside the extracted phrase pairs. A few equations summarize the assumptions made:
- Translation equivalence == Word alignment
- Translation equivalence units == Hierarchical phrase pairs w/o monolingual syntax/semantic roles
- Composing translation units == Reordering phrases w/o monolingual syntax/semantic roles
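To make the first assumption concrete, here is a minimal sketch of how phrase pairs might be read off a word alignment. It is an illustration only, not the method of any particular system: the function name `extract_phrase_pairs`, the `max_len` limit and the toy German-English sentence pair with its alignment are all assumptions made for this example. A phrase pair is kept only if no word inside it is aligned to a word outside it.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment (simplified sketch).

    src, tgt   : lists of tokens
    alignment  : set of (source_index, target_index) links
    max_len    : hypothetical cap on phrase length, chosen for illustration
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any source word in [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no target word in [j1, j2] may be linked
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example (assumed data): the last two words swap order in translation.
src = ["ich", "habe", "ihn", "gesehen"]
tgt = ["I", "have", "seen", "him"]
alignment = {(0, 0), (1, 1), (2, 3), (3, 2)}

pairs = extract_phrase_pairs(src, tgt, alignment)
for s, t in pairs:
    print(f"{s}  |||  {t}")
```

In this toy case the span "ich habe ihn" yields no phrase pair, because its smallest covering target span also contains "seen", which is aligned to "gesehen" outside the source span. Hierarchical phrase pairs would go one step further by replacing an embedded pair such as "ihn gesehen / seen him" with a gap symbol; that step is not shown here.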
In the intimate atmosphere of a small workshop/symposium we bring together different perspectives on SMT to discuss these and related matters such as: How do we learn translation equivalence patterns from monolingual and parallel data, and how are individual patterns composed together effectively? Are there any gains to be had from climbing the translation pyramid towards semantic conceptualization of the translation process?
Khalil Sima'an, ILLC, UvA
Markos Mylonakis, ILLC, UvA
Remko Scha, ILLC, UvA