More Structure for Better Statistical Machine Translation?

The workshop took place on 20 January 2012.
We wanted an informal, discussion-driven mini-workshop. The workshop attracted 42 participants, including the speakers and local researchers. Most participants came from neighboring countries (including Germany, Switzerland, the UK, Ireland, Belgium, France, and the Czech Republic). An informal program and priority for open discussion were the main characteristics of the workshop.

Workshop Date
20 January 2012 (9:30-17:00).


Room P.017
Euclides Building
University of Amsterdam
Plantage Muidergracht 24
1018 TV Amsterdam

Contact Khalil Sima'an by email k.simaan at

Registration CLOSED
The workshop is open and free, but please register by sending an email to k.simaan at

Program (titles, abstracts t.b.a.)
  • 09:30-10:00  Coffee and gathering
  • 10:00-10:15  Opening
  • 10:15-10:45  Kevin Knight, Information Sciences Institute, University of Southern California
  • 10:45-11:15  Bill Byrne, Speech Research Group, University of Cambridge
  • 11:15-11:30  Discussion
  • 11:30-11:45  Coffee break
  • 11:45-12:15  Phil Blunsom, Comp. Ling. Group, Computing Laboratory, University of Oxford
  • 12:15-12:45  Markos Mylonakis, ILLC, Universiteit van Amsterdam
  • 12:45-13:00  Discussion
  • 13:00-14:15  Lunch time
  • 14:15-14:45  Dekai Wu, HLT Center, Hong Kong University of Science & Technology
  • 14:45-15:15  Khalil Sima'an, ILLC, Universiteit van Amsterdam
  • 15:15-15:45  Discussion and closing
How do we model the latent translation mapping between sentence pairs in parallel corpora? While earlier approaches aimed at compositional translation with a central role for monolingual syntax (in analogy to syntax-directed semantics), more recent statistical approaches show that the answer to this question is not clear-cut. On the one hand, translation data often seems resistant to compositional translation over monolingual syntactic structure. On the other hand, hierarchical and monolingual syntax offer certain advantages, particularly for target language modeling, models of reordering, and phrase pair composition.

Current hierarchical approaches employ syntactically enriched hierarchical phrase pairs that are extracted from word alignments and cast as a synchronous grammar. Compositionality (which was originally meaning-oriented) is reduced to a syntactic tool that is available in the model but considered latent in the word-aligned training data and encapsulated inside the extracted phrase pairs. A few equations summarize the assumptions made:

 - Translation equivalence == Word alignment
 - Translation equivalence units == Hierarchical phrase pairs w/o monolingual syntax/semantic roles
 - Composing translation units == Reordering phrases w/o monolingual syntax/semantic roles
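
As an illustration of the first assumption, the following minimal Python sketch extracts the phrase pairs that are consistent with a word alignment. It is a simplified version of the standard consistency check and is not taken from any of the talks. The function name `extract_phrase_pairs`, the `max_len` parameter, and the toy French-English sentence pair are assumptions made for this example, and the extension of phrases over unaligned words is left out.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return the phrase pairs consistent with a word alignment.

    src, tgt   -- lists of source and target words
    alignment  -- set of (source_index, target_index) links
    max_len    -- hypothetical cap on phrase length, in words
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be
            # linked to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]),
                       " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy example (assumed data): noun and adjective swap order.
src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
alignment = {(0, 0), (1, 2), (2, 1)}
pairs = extract_phrase_pairs(src, tgt, alignment)
```

In this toy example the span "la maison" is rejected, because its target span would also cover "blue", which is linked to a source word outside the span. A hierarchical phrase pair would then arise by replacing an embedded pair with a gap, for instance turning "la maison bleue" / "the blue house" into "la X" / "the X".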

In the intimate atmosphere of a small workshop/symposium, we bring together different perspectives on SMT to discuss these and related matters, such as: How do we learn translation equivalence patterns from monolingual and parallel data, and how are individual patterns composed effectively? Are there any gains to be had from climbing the translation pyramid towards semantic conceptualization of the translation process?
Khalil Sima'an, ILLC, UvA
Markos Mylonakis, ILLC, UvA
Remko Scha, ILLC, UvA