Available pre-trained weights include:

- Wav2Lip + GAN: slightly inferior lip-sync, but better visual quality.
- Visual quality discriminator: weights of the visual disc trained in a GAN setup.

Lip-syncing videos using the pre-trained models (Inference)

`python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source>`

The result is saved (by default) in results/result_voice.mp4. You can specify the output path as an argument, as with several other available options. The audio source can be any file supported by FFMPEG that contains audio data: *.wav, *.mp3, or even a video file, from which the code will automatically extract the audio.

Tips for better results:

- Experiment with the --pads argument to adjust the detected face bounding box. You might need to increase the bottom padding to include the chin region.
- If the mouth position looks dislocated, or you see weird artifacts such as two mouths, the cause may be over-smoothing of the face detections. Use the --nosmooth argument and try again.
- Experiment with the --resize_factor argument to get a lower-resolution video. Why? The models are trained on faces at a lower resolution. You might get better, more visually pleasing results for 720p videos than for 1080p videos (in many cases, the latter works well too).
- The Wav2Lip model without GAN usually needs more experimenting with the above two options to get the most ideal results, and can sometimes give you a better result as well.

Training

`python wav2lip_train.py --data_root lrs2_preprocessed/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint>`

To train with the visual quality discriminator, run hq_wav2lip_train.py instead. The arguments for both files are similar, and in both cases you can resume training. Run `python wav2lip_train.py --help` for more details. You can also set additional, less commonly used hyper-parameters at the bottom of the hparams.py file.

Training on other datasets might require modifications to the code. The project also offers a few suggestions regarding training on other datasets. Please read the following before you raise an issue:

- You might not get good results by training or fine-tuning on a few minutes of a single speaker. This is a separate research problem, to which we do not have a solution yet.
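The inference options described above can be collected into a small helper when you want to try several settings in a row. The following is only a sketch: `build_inference_command` is a hypothetical helper, not part of the project, and all file names are placeholders. It assumes that the pad values are given in the order top, bottom, left, right and that the output path flag is called `--outfile`, as in the upstream inference script; check `python inference.py --help` before relying on either.

```python
import shlex
from typing import List, Optional, Sequence


def build_inference_command(
    checkpoint_path: str,
    face: str,
    audio: str,
    pads: Optional[Sequence[int]] = None,
    nosmooth: bool = False,
    resize_factor: Optional[int] = None,
    outfile: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for inference.py without running it."""
    cmd = [
        "python", "inference.py",
        "--checkpoint_path", checkpoint_path,
        "--face", face,
        "--audio", audio,
    ]
    if pads is not None:
        # Assumption: four integers in the order top, bottom, left, right.
        if len(pads) != 4:
            raise ValueError("pads must hold exactly four integers")
        cmd += ["--pads", *(str(p) for p in pads)]
    if nosmooth:
        # Disables smoothing of the face detections.
        cmd.append("--nosmooth")
    if resize_factor is not None:
        # Values above 1 lower the resolution of the processed video.
        cmd += ["--resize_factor", str(resize_factor)]
    if outfile is not None:
        # Assumption: the flag name is --outfile.
        cmd += ["--outfile", outfile]
    return cmd


# Placeholder file names: extra chin padding, no smoothing, half resolution.
example = build_inference_command(
    "wav2lip_gan.pth", "speaker.mp4", "speech.wav",
    pads=[0, 20, 0, 0], nosmooth=True, resize_factor=2,
)
print(shlex.join(example))
```

To actually run the command, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root. Returning a list rather than a single string avoids shell quoting problems with paths that contain spaces.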