
On Using Monolingual Corpora in Neural Machine Translation

lib:f8e24710b2811966 (v1.0.0)

Authors: Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
ArXiv: 1503.03535

Abstract URL: http://arxiv.org/abs/1503.03535v2

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hierarchical baseline, we obtain up to $1.96$ BLEU improvement on the low-resource language pair Turkish-English, and $1.59$ BLEU on the focused domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high resource languages such as Cs-En and De-En where we obtain an improvement of $0.39$ and $0.47$ BLEU scores over the neural machine translation baselines, respectively.

