Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

Authors: Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran
ArXiv: 1905.03864
Abstract URL: https://arxiv.org/abs/1905.03864v1


We present a method for converting voices among a set of speakers. Our method trains multiple autoencoder paths that share a single speaker-independent encoder and use a separate speaker-dependent decoder for each speaker. The autoencoders are trained with an additional adversarial loss, provided by an auxiliary classifier, that guides the encoder output to be speaker independent. Training is unsupervised in the sense that it requires neither collecting the same utterances from the speakers nor time-aligning over phonemes. Because a single encoder is used, our method can generalize to converting the voice of out-of-training speakers to speakers in the training dataset. We present subjective tests corroborating the performance of our method.
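
The structure described in the abstract (one shared encoder, one decoder per speaker, and an auxiliary speaker classifier supplying an adversarial term) can be illustrated with a small sketch. This is not the authors' implementation: the linear layers, the dimensions, the class name VoiceConversionSketch, the weight lam, and the exact form of the combined loss (reconstruction error minus a weighted classifier cross-entropy) are all assumptions chosen to keep the example short, and no training loop is shown.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for a trained network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class VoiceConversionSketch:
    """Hypothetical toy model: shared encoder, per-speaker decoders, classifier."""

    def __init__(self, speakers, feat_dim=4, latent_dim=2, seed=0):
        rng = random.Random(seed)
        self.speakers = list(speakers)
        # One encoder shared by every speaker.
        self.encoder = random_matrix(latent_dim, feat_dim, rng)
        # One decoder per training speaker.
        self.decoders = {
            s: random_matrix(feat_dim, latent_dim, rng) for s in self.speakers
        }
        # Auxiliary classifier that tries to guess the speaker from the code.
        self.classifier = random_matrix(len(self.speakers), latent_dim, rng)

    def encode(self, frame):
        return linear(self.encoder, frame)

    def decode(self, code, speaker):
        return linear(self.decoders[speaker], code)

    def classifier_loss(self, frame, speaker):
        """Cross-entropy of the classifier on the encoded frame."""
        probs = softmax(linear(self.classifier, self.encode(frame)))
        return -math.log(probs[self.speakers.index(speaker)])

    def autoencoder_loss(self, frame, speaker, lam=0.1):
        """Assumed objective: reconstruct well while making the classifier fail."""
        recon = self.decode(self.encode(frame), speaker)
        return mse(recon, frame) - lam * self.classifier_loss(frame, speaker)

    def convert(self, frame, target):
        """Encode a frame from any speaker, decode with the target's decoder."""
        return self.decode(self.encode(frame), target)


model = VoiceConversionSketch(["speaker_a", "speaker_b"])
frame = [0.1, 0.2, 0.3, 0.4]  # made-up acoustic feature frame
converted = model.convert(frame, "speaker_b")
```

In a setup like this, the classifier would be updated to lower classifier_loss while the encoder and decoders are updated to lower autoencoder_loss, so the encoder is pushed toward codes that carry little speaker information. Because convert only needs the shared encoder and a target decoder, the input frame does not have to come from a speaker seen in training, which mirrors the generalization claim in the abstract.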
