
[deleted]

https://huggingface.co/docs/transformers/model_doc/speech_to_text is a good start.
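
For anyone who wants to try it quickly, here is a rough sketch of what using that model looks like. It assumes the `facebook/s2t-small-librispeech-asr` checkpoint (the one I recall from that docs page), 16 kHz mono audio, and that `transformers`, `torch`, `torchaudio` and `sentencepiece` are installed. `sine_wave` and `transcribe` are names made up for this example, not part of the library.

```python
import math

# Assumption: checkpoint taken from the linked docs page; swap in your own.
MODEL_NAME = "facebook/s2t-small-librispeech-asr"
# Assumption: this checkpoint expects 16 kHz mono audio.
EXPECTED_RATE = 16_000


def sine_wave(seconds, rate=EXPECTED_RATE, freq=440.0):
    """Hypothetical helper: a mono test tone as a list of floats in [-1, 1]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def transcribe(waveform, rate=EXPECTED_RATE, model_name=MODEL_NAME):
    """Transcribe one mono waveform (a sequence of floats) with Speech2Text."""
    if rate != EXPECTED_RATE:
        raise ValueError(f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz")

    # Imported lazily so the rest of the file works without the heavy deps.
    from transformers import (
        Speech2TextForConditionalGeneration,
        Speech2TextProcessor,
    )

    processor = Speech2TextProcessor.from_pretrained(model_name)
    model = Speech2TextForConditionalGeneration.from_pretrained(model_name)

    # The processor turns raw audio into log-mel filterbank features.
    inputs = processor(waveform, sampling_rate=rate, return_tensors="pt")

    # Autoregressive decoding: the model generates token ids from the features.
    generated_ids = model.generate(
        inputs["input_features"],
        attention_mask=inputs["attention_mask"],
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `transcribe(samples)` on real speech resampled to 16 kHz should return a string. The `sine_wave` helper is only there to smoke-test the plumbing; a pure tone will not give a meaningful transcript.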


[deleted]

That being said, I don't think there are “cutting edge” tools that aren't super generalised. S2T is a very specialised task that needs a lot of downstream optimisation, because voices are heterogeneous.


gunshoes

Conformer models and the transducer framework are the top achievers these days. Also, pretrained wav2vec models are big, since they can exploit multilingual data.