Authors: Artidoro Pagnoni, Kevin Liu, Shangyan Li
ArXiv: 1812.04405
Abstract URL: http://arxiv.org/abs/1812.04405v1
We explore the performance of latent variable models for conditional text
generation in the context of neural machine translation (NMT). Similar to Zhang
et al., we augment the encoder-decoder NMT paradigm by introducing a continuous
latent variable to model features of the translation process. We extend this
model with a co-attention mechanism in the inference network, motivated by
Parikh et al. Compared to the vision domain, latent variable models for text face
additional challenges due to the discrete nature of language, namely posterior
collapse. We experiment with different approaches to mitigate this issue. We
show that our conditional variational model improves upon both discriminative
attention-based translation and the variational baseline presented in Zhang et
al. Finally, we explore the learned latent space to illustrate what the
latent variable captures. This is the first
reported conditional variational model for text that meaningfully utilizes the
latent variable without weakening the translation model.
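For reference, a minimal sketch of the conditional variational objective such a model typically optimizes, following the standard variational NMT formulation of Zhang et al. (an assumption; the abstract does not state the exact objective or which posterior-collapse mitigations were used): with source sentence x, target sentence y, and continuous latent variable z, the model maximizes the conditional evidence lower bound

\log p_\theta(y \mid x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big),

where q_\phi is the inference network (here augmented with co-attention) and p_\theta(z \mid x) is the conditional prior. Posterior collapse corresponds to the KL term being driven to zero so that the decoder ignores z.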