Hervé Bredin

Chief Science Officer and co-founder at pyannoteAI, known for his work on speaker diarization and as the lead developer behind pyannote.audio; currently on leave from CNRS, where he has been a permanent research scientist focused on audio, speech, and NLP.

Voice AI Benchmarks Understate Errors in Real Multi-Speaker Audio

Hervé Bredin of pyannoteAI argues that voice AI benchmarks often make speech-to-text look more solved than it is, because they evaluate on cleaner audio that behaves more like a single speaker. In his talk, he shows Nvidia Parakeet scoring an 11.4% word error rate on AMI meeting audio in the Open ASR Leaderboard but 26% in pyannoteAI’s run on the same dataset, which used the table microphone rather than headset audio. Bredin’s broader case is that conversational AI needs fine-grained speaker diarization and speaker-attributed transcription, because words alone do not capture who spoke, when speakers overlapped, or how real multi-speaker conversations are structured.
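For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring code used by the Open ASR Leaderboard or by pyannoteAI; the function name and the example sentences are invented for illustration, and real evaluations usually normalize text (casing, punctuation, numbers) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: three reference words are lost from the output,
# as might happen when another speaker talks over them.
reference = "yes i think we should ship it on friday"
hypothesis = "yes i think we ship friday"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

Note that this score only counts word mistakes. It says nothing about who spoke each word or when speakers overlapped, which is the gap that speaker-attributed transcription is meant to address.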

AI Engineer · Jun 5, 2026 · 10 min read