Audio WG meeting 06.18.25

Jun 18, 2025

Attendees

Invited: Pete Bernard, Mallik Moturi, Jonathan Russo, Pong Trairatvorakul, Rosina Haberl, John Marconi, raashid.ansari@silabs.com, Tomer Badug, Elia Shenberger, Venkat Rayudu

Attachments: EAIF Audio WG meeting

Meeting records: Transcript

Summary

Pete Bernard highlighted Gemini’s AI note-taking capabilities, while Tomer Badug outlined the audio working group’s objectives — sharing audio models and datasets on the EDGE AI Foundation platform — and its progress with Adam’s dataset. Tomer Badug also introduced a new initiative for an audio model evaluation platform with benchmarking and a subjective ranking system, which led to discussion with Pete Bernard and Jonathan Russo about existing efforts, evaluation methodologies, and potential collaboration with MLPerf. Raashid Ansari from Silicon Labs joined later to discuss the role of DSP versus ML in audio processing for edge devices.

Details

  • AI Note Taking and Meeting Attendees Pete Bernard noted that Gemini’s AI note-taking is now integrated into Google Meet and considers it a key application of generative AI. Rosina Haberl informed the group that Eric and another colleague would be absent due to an important customer meeting. Tomer Badug planned to keep the meeting brief and mentioned he might need to leave for a shelter. Raashid Ansari joined the meeting later (00:10:57).
  • Academic Co-chair Search Tomer Badug inquired about Iman’s availability to join the meeting and suggested looking for another academic co-chair if Iman was unable to participate. Pete Bernard mentioned Iman was very busy and proposed emailing the AIP (Academic Industry Partnership) alias, which includes several professors like Iman, to find someone interested in a co-chair role. Pete Bernard agreed to take action on this (00:00:56) (00:12:59).
  • Audio Working Group Deliverables Pete Bernard initiated a discussion about the desired outcomes or deliverables for the audio working group, referencing the live streams and white papers produced by the generative AI working group. Tomer Badug stated that the group’s goal is to share audio-related models and datasets on the EDGE AI Foundation platform, with a dataset from Adam nearing legal approval (00:02:02). The group is also planning white papers and pre-processing code for audio processing (00:03:14).
  • Audio Model Evaluation Platform Tomer Badug introduced a new initiative: a hub for evaluating audio models with benchmarks and metrics, similar to the Chatbot Arena for LLMs, and plans to present it in Milan (00:03:14). Pete Bernard inquired about existing benchmarking efforts in the audio space similar to NeuroBench for neuromorphic computing. Tomer Badug indicated there isn’t one currently and hopes the EDGE AI Foundation can host this. Pete Bernard mentioned a discussion with David Kanter from MLPerf about potential coordination (00:04:41).
  • Ranking System for Audio Models Tomer Badug elaborated on the subjective ranking system for LLMs based on the Elo rating system from chess and proposed a similar system for audio, starting with speech enhancement but with potential for expansion (00:05:39). Jonathan Russo inquired whether this would involve audio samples for A/B testing, similar to the Chatbot Arena (00:08:32). Tomer Badug confirmed this is a possibility for discussion within the working group (00:09:57).
  • Leveraging Existing Platforms and Partnerships Pete Bernard suggested starting with the current members and leveraging EDGE AI Labs to host the audio model ranking platform to attract more participants (00:10:57). Pete Bernard also proposed getting promotion from David Kanter and MLPerf, as he is generally supportive of such efforts. Pete Bernard thought a white paper on the direction of audio AI and best practices, with a call to action to engage with the new platform, would be beneficial once it’s running (00:12:07).
  • DSP and Machine Learning in Audio Processing Raashid Ansari, from Silicon Labs, shared their involvement with MLPerf Tiny and its focus on platform performance and energy measurements (00:14:12). Raashid Ansari asked whether this group’s scope includes building the audio models themselves, which Tomer Badug confirmed, emphasizing that it is open to all audio-related aspects (00:15:15). Raashid Ansari raised the debate about using DSP versus direct ML processing for audio, especially in resource-constrained edge AI devices. Tomer Badug, with a DSP background, believes in a mix of DSP and deep learning for edge applications but sees existing DSP solutions as mature (00:16:31). Jonathan Russo suggested using DSP as a reference for comparison with AI models (00:18:02).
  • Benchmarking and Evaluation Details Jonathan Russo and Tomer Badug discussed details for the audio model leaderboard, including the possibility of both random-sample A/B testing and benchmark testing. Jonathan Russo suggested including audio samples for subjective evaluation and proposed various methods for this (00:09:57). Tomer Badug emphasized the goal of creating a fair and unified evaluation process to prevent manipulation (00:27:33). Jonathan Russo suggested reporting multiple metrics and allowing sorting by different criteria on the leaderboard (00:25:24).
  • Focus on Speech Enhancement and Evaluation Datasets Tomer Badug indicated that speech enhancement is the prioritized starting point for the evaluation platform due to existing needs and the availability of a foundational system (00:23:57). Jonathan Russo suggested using a pivot table or sortable table format for presenting evaluation results with multiple metrics (00:25:24). Jonathan Russo offered internal evaluation datasets, including those from vehicle interior recordings and WHAM, recommending the inclusion of diverse speech samples beyond LibriSpeech (00:30:23). They also discussed the number of evaluation samples and SNR ranges (00:31:24).
  • Action Items and Next Steps Tomer Badug planned to prepare and upload the work already done on the evaluation platform to the shared Google Drive and contact the group for feedback. For the next meeting, Tomer Badug requested Jonathan Russo and others to help compile a list of open-source speech enhancement models for initial benchmarking (00:28:33) (00:33:35). Pete Bernard offered to email the AIP alias to seek more academic involvement in audio AI research (00:12:59) (00:34:28). Rosina Haberl provided the link to the audio working group’s Google Drive folder (00:32:35) (00:34:28). The group decided to keep Gemini’s meeting notes in a subfolder on the shared drive (00:33:35) (00:35:09).
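For reference, the Elo-style subjective ranking Tomer described for pairwise A/B audio comparisons can be sketched as follows. This is a minimal illustration only: the K-factor of 32 and the starting rating of 1000 are common chess-derived defaults assumed here, not values agreed in the meeting.

```python
# Sketch of an Elo update for pairwise A/B model comparisons,
# as used by the Chatbot Arena and proposed here for speech enhancement.
# K-factor and starting ratings are illustrative assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one listener A/B vote."""
    ea = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b

# Example: two speech-enhancement models start at 1000; a listener prefers A.
a, b = elo_update(1000.0, 1000.0, a_won=True)  # → (1016.0, 984.0)
```

Accumulating many such votes across random sample pairs yields a subjective leaderboard that can sit alongside objective benchmark metrics.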

Suggested next steps

  • Jonathan Russo will inquire internally about using vehicle interior and WHAM/WHAMR evaluation datasets.
  • Jonathan Russo will reach out to potentially interested academic grad students.
  • Pete Bernard will email the AIP alias, CCing Tomer Badug, to seek more academic involvement (research, PhD students) for audio AI.
  • Jonathan Russo and Tomer Badug will compile a list of open-source speech enhancement models to benchmark.
  • Tomer Badug will upload leaderboard work to Google Drive and email Jonathan Russo for feedback.
