
How to Clone Voices with Machine Learning


This tutorial describes how I installed and used Corentin Jemine's Real-Time Voice Cloning GitHub repository to clone voices on Windows 10. You might find an easier or better way to install the repository; I tried multiple options, and this is the one that worked for me.

We start off by downloading the repository (zip file) from https://github.com/CorentinJ/Real-Time-Voice-Cloning and unzipping it. Next, we head over to https://docs.conda.io/en/latest/miniconda.html and download Miniconda2 (the Python 2.7 installer) for whichever OS you are using. If you are using Windows 10, you can just launch the .exe file and install it.

Next, we replace the Python version that ships with Miniconda2 using the command: conda install python=3.6.2

When I tried Python 3.8 or 3.7, I could not get the repository working. With conda version 3 (Miniconda3), I could not get the repository working with Python 3.6.2 either, which is why I turned to conda version 2.
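Since the Python version made such a difference for me, it can help to confirm which interpreter is actually active before installing anything else. The snippet below is only a small sketch of such a check; the function name is_expected_python is my own and not part of the repository.

```python
import sys

# Version series this tutorial settled on (3.6.2 worked for the author).
EXPECTED = (3, 6)


def is_expected_python(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the major.minor version matches the expected one."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_expected_python():
        print("Python " + current + " matches the 3.6 series used in this tutorial.")
    else:
        print("Python " + current + " is active; this tutorial used 3.6.2.")
```

Run it with the same python command you will later use to launch the toolbox, so you are checking the right environment.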

Next, we head over to the unzipped Real-Time Voice Cloning folder. We have to install the following prerequisites:

conda install pytorch

conda install ffmpeg

pip install -r requirements.txt

requirements.txt consists of the following packages: tensorflow==1.15.0; umap-learn; visdom; librosa>=0.5.1; matplotlib>=2.0.2; numpy>=1.14.0; scipy>=1.0.0; tqdm; sounddevice; SoundFile; Unidecode; inflect; PyQt5; multiprocess; numba==0.48
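Most of these packages only set a minimum version or none at all, while a couple are locked to an exact release. As a rough illustration, the sketch below splits the list quoted above and picks out the exact pins. The helper names parse_requirements and pinned are hypothetical, and the regular expression only handles the simple "name", "name==x" and "name>=x" forms that appear in this list.

```python
import re

# The package list exactly as quoted from requirements.txt above.
REQUIREMENTS = (
    "tensorflow==1.15.0; umap-learn; visdom; librosa>=0.5.1; "
    "matplotlib>=2.0.2; numpy>=1.14.0; scipy>=1.0.0; tqdm; sounddevice; "
    "SoundFile; Unidecode; inflect; PyQt5; multiprocess; numba==0.48"
)

# Matches "name", "name==version" or "name>=version" only.
SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|>=)?\s*([0-9.]+)?$")


def parse_requirements(text):
    """Split the ';'-separated list into (name, operator, version) tuples."""
    parsed = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        match = SPEC.match(item)
        if match is None:
            raise ValueError("Unrecognised requirement: " + item)
        parsed.append(match.groups())
    return parsed


def pinned(text):
    """Return only the packages locked to an exact version with '=='."""
    return {
        name: version
        for name, op, version in parse_requirements(text)
        if op == "=="
    }


if __name__ == "__main__":
    print(len(parse_requirements(REQUIREMENTS)), "packages listed")
    print("Exact pins:", pinned(REQUIREMENTS))
```

The exact pins are the entries I would look at first if pip refuses to install something on your Python version.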

There were a few additional files I had to install that were missing on my Windows 10 machine, but this was really easy to sort out with a simple Google search.

Next, we download the pretrained models zip file from https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models and extract its contents into the encoder, vocoder and synthesizer folders respectively.
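It is easy to extract the models into the wrong place, so a quick check before launching anything can save some time. The sketch below assumes that each of the three folders ends up with a saved_models subfolder containing at least one file; that layout is my assumption about the archive, so adjust EXPECTED_DIRS if yours looks different. The function name missing_model_dirs is made up for this example.

```python
from pathlib import Path

# Assumed layout after extraction: one "saved_models" folder inside each
# component folder. Change these names if your archive is organised differently.
EXPECTED_DIRS = (
    "encoder/saved_models",
    "synthesizer/saved_models",
    "vocoder/saved_models",
)


def missing_model_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected model folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for relative in expected:
        folder = root / relative
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    # "." assumes the script is run from inside the unzipped repository folder.
    problems = missing_model_dirs(".")
    if problems:
        print("Missing or empty model folders:", ", ".join(problems))
    else:
        print("All three model folders contain files.")
```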

Lastly, you need the datasets. Corentin has made some datasets available here: https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets. Because of my data cap, I could not download the VoxCeleb datasets and had to resort to the LibriSpeech dataset. After spending hours trying to clone a voice with it, I can tell you that, in my experience, LibriSpeech does not give very good results. I hope to download the VoxCeleb datasets in the near future (I read that they work really well), and at that point I will update this post.

Once your dataset is set up, you can start with the voice cloning. Launch the GUI with the command:

python demo_toolbox.py -d /PATH/TO/REAL-TIME-VOICE-CLONING/FOLDER

This will launch the GUI for the voice cloning. Fair warning: the toolbox is a graphical application, so you need a working display (export your display if you are running it on a remote machine).

Once the GUI has launched, you can either browse for and load a reference audio file or record audio straight into the GUI. Next, select your dataset, synthesizer, vocoder and encoder. Finally, type what you want the cloned voice to say into the box on the right (no more than about 5 seconds of speech) and click on synthesize and vocode. This will start the cloning.
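It is hard to judge how much text fits into 5 seconds, so here is a rough way to estimate it. The sketch assumes a speaking rate of about 150 words per minute, which is my own ballpark figure and not something the toolbox defines; the function names are likewise hypothetical.

```python
WORDS_PER_MINUTE = 150  # assumed speaking rate, not a value from the toolbox
MAX_SECONDS = 5.0       # limit suggested in the text above


def estimated_seconds(text, words_per_minute=WORDS_PER_MINUTE):
    """Estimate how long the text takes to speak, based on its word count."""
    words = len(text.split())
    return words * 60.0 / words_per_minute


def fits_limit(text, max_seconds=MAX_SECONDS):
    """Return True when the estimated duration stays within the limit."""
    return estimated_seconds(text) <= max_seconds


if __name__ == "__main__":
    sentence = "This is a short sentence for the cloned voice to say."
    print(round(estimated_seconds(sentence), 1), "seconds, fits:", fits_limit(sentence))
```

At that assumed rate, 5 seconds works out to roughly 12 words, so one short sentence at a time is a safe starting point.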

And that’s it. If you have any questions related to the installation/usage, let me know in the comment section. I will try my best to answer you.

If you only want to clone YOUR voice, I suggest using the website resemble.ai instead. It is totally free and is pretty easy to use.


