Jul 30, 2025
Attendees
Invited: Pete Bernard, Mallik Moturi, Jonathan Russo, Pong Trairatvorakul, Rosina Haberl, John Marconi, raashid.ansari@silabs.com, Tomer Badug, Elia Shenberger, Eiman Kanjo, Venkat Rayudu, Eric Smiley
Attachments: EAIF Audio WG meeting
Meeting records: Transcript
Summary
Tomer Badug, Pete Bernard, Venkat Rayudu, Mallik Moturi, Jonathan Russo, and John Marconi discussed the “Sonic Scale,” a blind-testing, Elo-ranking system for speech enhancement models, which will be hosted by the Edge AI Foundation to ensure secure access and unbiased evaluation. They clarified that the Sonic Scale is an A/B testing system, unlike the ITU-standard MOS, and that while subjective factors might influence individual rankings, a large number of participants would average out these biases. The main talking points were the introduction of the Sonic Scale, its features and security, model access and user engagement, comparison to MOS and crowdsourcing benefits, implementation and future enhancements, and data management and causal models.
Details
- Meeting Attendance and One-Pager Review Tomer Badug noted that Eric would not be able to join, and that John Marconi had not yet responded to texts (00:00:00). Pete Bernard mentioned discussing leaderboard integration with John and asked Tomer about the prepared one-pager (00:00:59). Tomer offered to share the one-pager and reviewed it with the team, including newly joined Venkat Rayudu and Mallik Moturi (00:01:45).
- Introduction to Sonic Scale Tomer Badug introduced the “Sonic Scale,” a ranking system for speech enhancement models, which was previously discussed in Milan (00:02:41). The system uses an Elo ranking alongside objective metrics to evaluate sound quality and to provide insight into the correlation between those metrics and human preferences (00:04:05).
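For context on the Elo mechanism mentioned above: the notes do not describe the Sonic Scale's actual implementation, but a standard Elo update from a single blind A/B comparison looks roughly like the sketch below. The function names, the 400-point scale, and the K-factor of 32 are conventional illustrative choices, not details from the meeting.

```python
# Minimal sketch of an Elo-style update for pairwise A/B listening tests.
# All names and constants here are illustrative assumptions, not the
# actual Sonic Scale implementation.

def expected_score(r_a: float, r_b: float) -> float:
    """Predicted probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings after one blind A/B comparison."""
    e_a = expected_score(r_a, r_b)          # expected outcome for A
    s_a = 1.0 if a_won else 0.0             # actual outcome for A
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start at 1500; A wins one blind comparison.
ra, rb = elo_update(1500.0, 1500.0, a_won=True)  # ra rises, rb falls
```

Because each comparison moves ratings proportionally to how surprising the outcome was, rankings converge quickly when the sampler prioritizes informative pairings, which is consistent with the "smart sampling" convergence claim in the notes.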
- Sonic Scale Features and Security Tomer Badug explained that the Sonic Scale uses blind testing to prevent bias and converges quickly thanks to a smart sampling system. Models can be added as TFLite files, which protects proprietary models because the full model logic need not be exposed (00:04:05). The system is intended to be hosted by the Edge AI Foundation, a neutral non-profit, ensuring models are not downloadable or previewable and are accessible only via inference (00:05:39). Tomer also mentioned an API option for companies hesitant to upload models, although this could complicate causality assurances (00:07:02).
- Model Access and User Engagement Pete Bernard inquired about how users would access and verify models after seeing their rank. Tomer Badug responded that for proprietary models, users could directly contact the company that integrated the model, while open-source models would have links for direct download (00:08:21). Tomer suggested that anyone could be a user and provide input for ranking, likening it to Amazon reviews or the LLM arena’s crowdsourced ranking system (00:14:32).
- Comparison to MOS and Crowdsourcing Benefits Mallik Moturi asked whether the Sonic Scale was equivalent to MOS scores, and Tomer Badug clarified that it is an A/B testing system, unlike the ITU-standard MOS, which requires rating against specific criteria (00:15:31). Jonathan Russo and Pete Bernard noted that while subjective factors like age or geography might influence individual rankings, a large number of participants would average out these biases, similar to the LLM arena (00:17:43). Pete Bernard highlighted the large community of the Edge AI Foundation that could be leveraged for participation (00:19:03).
- Implementation and Future Enhancements John Marconi expressed enthusiasm for implementing the Sonic Scale on lab sites and requested Tomer Badug to share the one-pager document with their team (00:19:03). Tomer confirmed that the system, which their company has used and refined, would be contributed to the community to provide a better, more efficient evaluation method than traditional MOS scoring (00:19:45). John Marconi also inquired about the inclusion of proprietary and open-source models, and Tomer confirmed that both would be integrated, with proprietary models requiring direct company contact for access and open-source models being freely available (00:21:03).
- Data Management and Causal Models John Marconi raised questions about keeping metrics per sample and the potential for challenges based on noise conditions (00:21:03). Tomer Badug explained that the system currently uses a test set from the DNS challenge, but they are open to expanding it and providing insights into model performance under different noise types (00:22:24). Jonathan Russo brought up concerns about uploading proprietary models, especially open-weight ones, and suggested obfuscation or an API solution to address these concerns (00:28:26). Tomer acknowledged the API as a possibility despite potential complications with causality and varying processing methods, indicating that further technical discussions would be needed (00:29:48).
- Community Engagement and Outreach Pete Bernard concluded the discussion by praising the progress as a good example of a working group driving a topic forward and encouraging continued engagement to attract new participants. Tomer Badug encouraged everyone to reach out to more people they met at the Milan event to make a significant impact within the working group (00:32:01).
Suggested next steps
Jonathan Russo will send links to the conferences.
Tomer Badug will share the document and upload it to the audio working group shared folder.
John Marconi will talk to his team about the document and determine when it could be brought online.