Neural machine translation systems have difficulty translating idiomatic expressions. There is limited work on building parallel corpora annotated with idioms, which are necessary to investigate this problem more systematically. We automatically build a new bilingual data set for idiom translation, extracted from an existing general-purpose German↔English parallel corpus.

You can find the published work here.

This example showcases the problem of idiom translation:

The statistics of the data:

The data is available here.

To cite this paper:

   @ARTICLE{2018arXiv180204681F,
author = {Fadaee, Marzieh  and  Bisazza, Arianna  and  Monz, Christof},
title = "{Examining the Tip of the Iceberg: A Data Set for Idiom Translation}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1802.04681},
primaryClass = "cs.CL",
keywords = {Computer Science - Computation and Language},
year = 2018,
month = feb,