WER

Hi, thanks for the great work. 

When I test the pre-trained multi-speaker model on the LRW test set, I get STOI and ESTOI values similar to those quoted in the paper, but the best WER I can achieve is **79.6%**, compared to **34.2%** in the paper.
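For concreteness, here is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words, pooled over the corpus). It is only an illustration, not the paper's evaluation script: the function names `word_errors` and `corpus_wer` are hypothetical, the lowercasing and whitespace tokenisation are assumptions, and the ASR transcripts are taken as given strings.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool errors and reference words over (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    return total_errors / total_words if total_words else 0.0


if __name__ == "__main__":
    # Placeholder pairs for illustration only, not real results.
    example = [("about", "about"), ("after", "often")]
    print(f"WER: {corpus_wer(example):.1%}")
```

With single-word references, as in LRW, any extra word in the transcript counts as an insertion, so a single utterance can contribute more than one error. This is why cropping and the choice of test subset can change the reported WER substantially.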

Could you specify the steps you used to achieve 34.2% WER with Google ASR? Do you crop the synthesised word and use a specific Google ASR model/configuration? Do you use the entire LRW test dataset or just a subset?

It would be great to know this to allow a fair comparison in future research.

Thanks

WER #38
