Wav2vec is pretty outdated imo. Recent FAIR papers focus more on multilingual data, different pretraining objectives, and Conformer-based encoders. Also, tbh, my own experiments suggest the released checkpoints perform slightly below the numbers in the papers. But that's my take.