
Multi-GPUS support #152

Open

MlWoo wants to merge 7 commits into Rayhane-mamah:master from MlWoo:master


MlWoo commented Aug 14, 2018

Many friends seem to be very interested in multi-GPU support for training the model, so it may be worth merging this branch into master.

MlWoo (Author) commented Aug 14, 2018

@begeekmyfriend I have not modified the related code for that pattern.

begeekmyfriend (Contributor) commented Aug 14, 2018

@Rayhane-mamah Yes, I agree. In multi-GPU mode we can set r=1 and increase the batch size to get smoother gradients, so please consider it as a separate branch.
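Setting r=1 and growing the batch across GPUs relies on synchronous data parallelism: each GPU (a "tower" in TensorFlow-style code) computes gradients on its own slice of the global batch, and the averaged gradient is applied once per step. A minimal sketch in plain Python, assuming a toy linear model with squared-error loss in place of Tacotron and simulating the devices with a loop; the function names are hypothetical and not taken from this PR:

```python
# Sketch of synchronous data parallelism in plain Python.
# Assumption: a toy model y = w * x with squared-error loss stands in for
# Tacotron, and each "device" (tower) is simulated by one loop iteration.


def shard_batch(batch, num_devices):
    """Split a global batch into equal, contiguous per-device shards."""
    if len(batch) % num_devices != 0:
        raise ValueError("global batch size must be divisible by num_devices")
    size = len(batch) // num_devices
    return [batch[i * size:(i + 1) * size] for i in range(num_devices)]


def tower_gradient(w, shard):
    """Mean gradient of (w * x - y) ** 2 with respect to w over one shard."""
    # d/dw (w * x - y) ** 2 = 2 * x * (w * x - y)
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)


def data_parallel_step(w, batch, num_devices, lr=0.01):
    """One synchronous step: per-tower gradients are averaged, then applied once."""
    shards = shard_batch(batch, num_devices)
    grads = [tower_gradient(w, shard) for shard in shards]  # one per device
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad, avg_grad
```

With equal-sized shards, the mean of the per-tower gradients equals the full-batch gradient, so `data_parallel_step(w, batch, 4)` and `data_parallel_step(w, batch, 1)` produce the same update; the extra GPUs change where the work runs, not the math.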

Rayhane-mamah (Owner) commented

Yes, it seems like people are requesting that :) Well, your multi-GPU attempt @MlWoo is certainly very helpful. Since the model has changed since you made this implementation, I will need to make a few updates here and there, but yeah, I will probably make a new branch for both WaveNet and Tacotron multi-GPU, or add it directly to master as an optional feature. (I don't like 4-space indentation though hahaha..)

In the meantime, I am leaving this PR open here so that people can quickly refer to a good multi-GPU implementation :)

Thanks for all your contributions @MlWoo and @begeekmyfriend ;)

ghost commented Sep 17, 2018

When I try to use this fork as is, I run into the following:

ValueError: Cannot feed value of shape (48, 408, 1025) for Tensor 'datafeeder/linear_targets:0', which has shape '(?, ?, 513)'

What could be the cause of this? I preprocessed LJSpeech with the given hyperparameters btw.

MlWoo (Author) commented Sep 18, 2018

@tomse-h I have not modified the related code for the linear-spectrogram targets. You can complete it the same way it was done for the mel features.
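The shapes in the error above are consistent with an FFT-size mismatch: a one-sided STFT has n_fft / 2 + 1 frequency bins, so 1025 bins corresponds to n_fft = 2048 and 513 bins to n_fft = 1024. One plausible reading (an assumption, not confirmed in this thread) is that the linear spectrograms were preprocessed with a larger FFT than the linear-target placeholder expects. A sketch of a pre-feed sanity check; the helper names are hypothetical, and `num_freq` is assumed to be the hyperparameter that sizes the placeholder:

```python
# Sanity check for linear-spectrogram targets before feeding them to the model.
# Assumption: targets are one-sided STFT magnitudes shaped (batch, frames, bins),
# and `num_freq` is the hyperparameter that sizes the linear_targets placeholder.


def linear_bins(n_fft):
    """Frequency bins in a one-sided STFT: n_fft // 2 + 1."""
    return n_fft // 2 + 1


def implied_n_fft(num_bins):
    """Inverse of linear_bins for an even n_fft."""
    return 2 * (num_bins - 1)


def check_linear_targets(target_shape, num_freq):
    """Raise a descriptive error if the target bin count differs from num_freq."""
    bins = target_shape[-1]
    if bins != num_freq:
        raise ValueError(
            f"linear targets have {bins} bins (n_fft={implied_n_fft(bins)}), "
            f"but the model expects num_freq={num_freq} "
            f"(n_fft={implied_n_fft(num_freq)}); re-run preprocessing or "
            "align the hyperparameters"
        )
```

For the reported case, `check_linear_targets((48, 408, 1025), num_freq=513)` raises and names both FFT sizes, which points at whichever of preprocessing or model hyperparameters needs to change.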

shaktikshri commented Aug 14, 2020

I might be a bit late to this conversation, but did you guys also see a proportional increase in sec/step when using multiple GPUs? Here are my stats on V100 GPUs with outputs_per_step = 16:

#GPUs   Batch size   sec/step
1       32           ~4
2       64           ~10
3       96           ~15
4       128          ~19
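Because the per-GPU batch stays at 32 in these runs, ideal (weak) scaling would keep sec/step near the 1-GPU value. A quick sketch that turns the reported approximate timings into throughput and efficiency; the helper name is hypothetical and the numbers are the ones quoted above:

```python
# Reported measurements (V100, outputs_per_step = 16), approximate values:
# (num_gpus, global_batch, sec_per_step)
measurements = [(1, 32, 4.0), (2, 64, 10.0), (3, 96, 15.0), (4, 128, 19.0)]


def scaling_report(rows):
    """Throughput, weak-scaling efficiency, and speedup relative to the first row."""
    _, base_batch, base_time = rows[0]
    base_throughput = base_batch / base_time
    report = []
    for gpus, batch, sec in rows:
        throughput = batch / sec                # samples per second
        efficiency = base_time / sec            # 1.0 = sec/step unchanged (ideal)
        speedup = throughput / base_throughput  # > 1.0 = faster than 1 GPU
        report.append((gpus, round(throughput, 2), round(efficiency, 2), round(speedup, 2)))
    return report


for row in scaling_report(measurements):
    print("gpus=%d  samples/s=%.2f  efficiency=%.2f  speedup=%.2f" % row)
```

On these figures, every multi-GPU configuration processes fewer samples per second than the single GPU (about 6.4 to 6.7 versus 8), which suggests overhead beyond the per-GPU computation itself.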

MlWoo (Author) commented Aug 17, 2020

@shaktikshri No, it increases, but it does not scale linearly. You should check the data-loading time and the imbalance in sequence lengths across devices.
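To put numbers on the length-imbalance point: in synchronous multi-GPU training every step waits for the slowest device, and a device's decoder work grows with the longest utterance it has to process. A rough diagnostic sketch, assuming each device pads its shard only to that shard's longest utterance and that cost is proportional to that padded length (if the whole batch is padded to one length before splitting, every device runs the full length and the waste shows up as padding instead). The function name and cost model are hypothetical, not part of this PR:

```python
# Rough diagnostic for per-device length imbalance in one global batch.
# Assumptions: contiguous equal shards, each padded to its own max length,
# and per-step cost proportional to that padded length.


def tower_imbalance(lengths, num_devices):
    """Return per-device padded lengths and the mean fraction of a step spent idle."""
    if len(lengths) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    size = len(lengths) // num_devices
    shards = [lengths[i * size:(i + 1) * size] for i in range(num_devices)]
    per_device = [max(shard) for shard in shards]
    slowest = max(per_device)
    # Fraction of the step each device spends waiting for the slowest one.
    idle = [1 - t / slowest for t in per_device]
    return per_device, sum(idle) / num_devices


# Mixed short and long utterances: one device idles for most of the step.
print(tower_imbalance([100, 120, 110, 105, 400, 380, 390, 410], 2))
# Similar lengths in the same batch: almost no idle time.
print(tower_imbalance([400, 380, 390, 410, 395, 405, 385, 400], 2))
```

If the idle fraction is large, grouping utterances of similar length into the same global batch (length bucketing) is one common way to reduce it; data-loading stalls are a separate cause and need to be timed on their own.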

4 participants