• Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data.
• Robust to accents, background noise, and technical language; it can transcribe speech in multiple languages and translate speech from those languages into English.
• End-to-end approach, implemented as an encoder-decoder Transformer.
• Capable of language identification and of producing phrase-level timestamps.
• Designed to be easy to use and highly accurate, enabling developers to add voice interfaces to more applications.
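
As a rough illustration of the transcription, translation, and phrase-level timestamp features listed above, here is a minimal sketch in Python. It assumes the open-source `openai-whisper` package and `ffmpeg` are installed. The helper names (`format_timestamp`, `format_segments`, `transcribe_file`) and the `"base"` model choice are illustrative assumptions, not part of the summary above.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Turn phrase-level segments into '[start --> end] text' lines.

    Assumes `result` has the shape returned by Whisper's `transcribe`:
    a "segments" list whose items carry "start", "end", and "text".
    """
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


def transcribe_file(path: str, model_name: str = "base",
                    translate: bool = False) -> Dict[str, Any]:
    """Run Whisper on an audio file (hypothetical wrapper).

    With translate=True the speech is translated into English instead of
    being transcribed in the spoken language.
    """
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    return model.transcribe(path, task=task)
```

A call such as `result = transcribe_file("audio.mp3")` (the file name is a placeholder) should return a dictionary whose `"language"` entry holds the detected language code and whose `"text"` entry holds the full transcript; `format_segments(result)` then yields one timestamped line per phrase.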