Reviewing the ICASSP 2025 papers, I’m reminded of a core truth: scalable, expert-led annotation isn’t just an enhancement for AI—it’s a necessity. This year’s conference spotlighted the growing demand for high-quality, curated datasets across fields like speech recognition, medical imaging, and dataset distillation.
From my perspective as an AI researcher, the issues raised in this year’s papers are not only about improving models—they are about rethinking how we prepare the data that feeds these systems. Without domain-specific expertise, adaptive tools, and human-in-the-loop governance, even the most advanced AI models can fall short. Below, I’ll dive into some of the most compelling ICASSP papers and discuss how they intersect with the growing need for smarter, expert-driven data infrastructure in AI.
1. Label Noise Isn’t Just a Bug—It’s a Benchmarking Opportunity
Let’s start with one of my favorite provocations from this year: "Can Quality Survive Scale?" by Song et al. This paper doesn’t just diagnose a problem—it proposes a way to systematically model and evaluate label noise through their equal-quality instance-dependent noise (EQ-IDN) framework. Label noise models like this create controlled noisy datasets, helping researchers evaluate how robust algorithms perform under varying levels and types of noise.
For me, this research reinforces a key point: label noise isn’t just something to fix; it’s a crucial factor in benchmarking AI systems. Song et al.’s framework emphasizes that high-quality datasets require intentional design, not just aggregation. It also highlights how “hard” noisy samples impact model generalization more than noise rate alone [Song et al., 2025]. This paper offers valuable insights into managing noise for better AI performance.
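To make the instance-dependent idea concrete, here is a minimal Python sketch of noise injection where a sample's flip probability scales with an assumed per-sample "difficulty" score. This is a generic illustration of instance-dependent noise, not the EQ-IDN procedure itself; the difficulty scores and target noise rate are placeholders.

```python
import numpy as np

def inject_instance_dependent_noise(labels, difficulty, num_classes, noise_rate=0.2, rng=None):
    """Flip each label with a probability that scales with per-sample difficulty.

    Generic illustration of instance-dependent noise, not the EQ-IDN procedure;
    `difficulty` is a hypothetical per-sample score (e.g., loss or margin).
    """
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    difficulty = np.asarray(difficulty, dtype=float)

    # Scale difficulties so the *average* flip probability matches noise_rate.
    flip_prob = np.clip(difficulty * noise_rate / max(difficulty.mean(), 1e-8), 0.0, 1.0)

    noisy = labels.copy()
    for i, (y, p) in enumerate(zip(labels, flip_prob)):
        if rng.random() < p:
            # Replace with a uniformly chosen *different* class.
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
    return noisy
```

Controlled corruption like this lets you benchmark a training pipeline against known noise levels and noise types before trusting it on real, messy labels.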
2. Annotation Innovation Is Finally Getting the Spotlight
LABEL-SAM by Cai et al. proposes a semi-automatic labeling method for aortic dissection in 3D CTA imagery, leveraging user input on boundary slices to infer intermediate segmentations. Their bidirectional weighting strategy and plug-in integration for 3D Slicer reflect the kind of high-efficiency tooling we need for medical and scientific domains. Experts should spend time where their insight matters most [Cai et al., 2025].
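As a rough illustration of the boundary-to-intermediate idea (not LABEL-SAM's actual SAM-prompt pipeline), here is a toy sketch that blends two expert-annotated boundary masks with distance-based weights from both directions; the threshold and weighting scheme are my own assumptions.

```python
import numpy as np

def propagate_between_boundary_slices(mask_top, mask_bottom, num_intermediate, threshold=0.5):
    """Blend two expert-annotated boundary masks into intermediate-slice masks.

    Toy bidirectional, distance-weighted propagation; LABEL-SAM's actual
    strategy (SAM prompts inside a 3D Slicer plug-in) is more involved.
    Inputs are binary 2D masks of the same shape.
    """
    masks = []
    for k in range(1, num_intermediate + 1):
        # Weight each boundary annotation by its proximity to slice k.
        w_top = 1.0 - k / (num_intermediate + 1)
        w_bottom = 1.0 - w_top
        blended = w_top * mask_top.astype(float) + w_bottom * mask_bottom.astype(float)
        masks.append((blended >= threshold).astype(np.uint8))
    return masks  # one binary mask per intermediate slice
```

Even this crude version shows the payoff: experts annotate a handful of slices, and the tooling fills in the rest for review rather than from scratch.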
In parallel, Semi-Automatic Labeling for Action Recognition by Ando et al. proposes a smart, VLM-based pre-filtering mechanism to curate data before annotator review. Their diversity-preserving sampling improves dataset utility without increasing labeling burden—a playbook we know well. It reinforces our belief that curated complexity beats scale-by-default [Ando et al., 2025].
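The diversity-preserving flavor of that pre-filtering can be approximated with something as simple as greedy farthest-point selection over clip embeddings. The sketch below assumes precomputed VLM embeddings and a fixed annotation budget; it is not the authors' exact sampling algorithm.

```python
import numpy as np

def diverse_subset(embeddings, budget, seed_index=0):
    """Greedy farthest-point selection over clip embeddings.

    Stand-in for diversity-preserving sampling: spend a fixed annotation
    budget while keeping the selected clips spread out in an (assumed) VLM
    embedding space. Not the authors' exact algorithm.
    """
    emb = np.asarray(embeddings, dtype=float)
    selected = [seed_index]
    # Distance from every clip to its nearest already-selected clip.
    min_dist = np.linalg.norm(emb - emb[seed_index], axis=1)
    while len(selected) < budget:
        next_idx = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(next_idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(emb - emb[next_idx], axis=1))
    return selected
```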
3. When Labels Reflect Reality, Models Learn Better
Na et al.'s Cohort-Sensitive Labeling explores how annotating ASR training data with cohort-specific tags (such as gender) dramatically improves performance, especially in noisy or diverse environments. Their method reduced word error rates by over 11% on certain test sets and inferred cohort labels with 97.21% accuracy.
We don't just need crowdsourcing; we need to curate domain-aligned annotators who understand context. And we build annotation flows that capture metadata as meaning, whether it's dialect, specialty, or semantic nuance [Na et al., 2025]. I believe this approach also helps reduce the risk of bias in the dataset.
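One common way to expose cohort metadata to an ASR model is to attach a cohort token to each training transcript. The sketch below shows only that generic pattern, with a hypothetical Utterance record and tag set; it is not necessarily the mechanism Na et al. use, and their test-time cohort inference is a separate component.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_path: str
    transcript: str
    cohort: str  # e.g. "female", "male" -- an assumed, illustrative label set

def tag_transcript(utt: Utterance) -> str:
    """Prepend a cohort token so the model can condition on cohort metadata.

    A generic pattern for cohort-aware training data, not the paper's exact
    mechanism.
    """
    return f"<{utt.cohort}> {utt.transcript}"

# Hypothetical usage
utt = Utterance("clips/0001.wav", "turn left at the next junction", "female")
print(tag_transcript(utt))  # "<female> turn left at the next junction"
```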
4. Smart Learning from Noisy Labels Is the Future
SLDR by Dong et al. doesn’t just aim to clean data—it uses structured dual-regularization to extract value from both clean and noisy samples. This strategy combines supervised learning on presumed-clean subsets and unsupervised similarity-driven training on noisy data.
That hybrid thinking embodies a simple philosophy: no label left behind. Whether it's uncertainty calibration or consensus estimation, noisy or contested samples are not treated as liabilities, but as sources of insight [Dong et al., 2025].
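Here is a minimal sketch of that clean/noisy split, assuming a model that returns both logits and an embedding and a noisy batch made of two augmented views per sample. It shows the general dual-loss pattern (supervised term plus a similarity-driven term), not the exact SLDR objective.

```python
import torch
import torch.nn.functional as F

def dual_regularized_loss(model, clean_batch, noisy_batch, lam=0.5):
    """Supervised loss on presumed-clean samples plus an unsupervised
    similarity term on noisy samples.

    A sketch of the general clean/noisy split, not the exact SLDR objective;
    `model` is assumed to return (logits, embedding).
    """
    x_clean, y_clean = clean_batch
    logits_clean, _ = model(x_clean)
    supervised = F.cross_entropy(logits_clean, y_clean)

    # Unsupervised term: embeddings of two views of the same noisy sample
    # should stay close, regardless of the (possibly wrong) label.
    view_a, view_b = noisy_batch
    _, emb_a = model(view_a)
    _, emb_b = model(view_b)
    consistency = F.mse_loss(F.normalize(emb_a, dim=1), F.normalize(emb_b, dim=1))

    return supervised + lam * consistency
```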
5. Data Compression with Integrity: Distillation That Works
Kong et al.'s Low-Rank Space Sampling reframes dataset distillation by identifying shared low-dimensional subspaces rather than selecting independent synthetic samples. The result? Dataset diversity and density are retained with far fewer samples, along with better performance. Their method improves training efficiency and boosts accuracy by nearly 10% on several computer vision benchmarks, including both synthetic and real-world noisy datasets.
It minimizes dataset bloat while maximizing value, and it shows that with the right architecture and human understanding, less really can be more [Kong et al., 2025].
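To illustrate the "shared low-rank space" intuition (this is not Kong et al.'s distillation procedure), here is a simple SVD-based sketch that stores one basis plus a small coefficient vector per sample instead of full images; the rank is an arbitrary assumption.

```python
import numpy as np

def low_rank_basis_and_coeffs(images, rank=32):
    """Represent a dataset in a shared low-dimensional subspace.

    A plain PCA/SVD illustration of the low-rank-space intuition, not the
    paper's method: instead of storing full synthetic images, store one
    shared basis plus a small coefficient vector per sample.
    """
    X = images.reshape(len(images), -1).astype(float)  # (N, D)
    mean = X.mean(axis=0)
    # Thin SVD of the centered data; rows of Vt[:rank] span the subspace.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:rank]                # (rank, D)
    coeffs = (X - mean) @ basis.T    # (N, rank)
    return mean, basis, coeffs

def reconstruct(mean, basis, coeffs):
    """Map coefficients back to (approximate) images."""
    return coeffs @ basis + mean
```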
6. Detecting Adversaries in the Data Supply Chain
The paper by Karaaslanli et al. tackles a challenging issue: adversarial manipulation in crowdsourced labeling where a subset of adversary annotators deliberately provide erroneous responses. By modeling annotator behavior as a bipartite graph and applying dense subgraph detection, they not only spot bad labels—they expose the annotators behind them.
This matters because it confronts the pitfalls of crowd labor head-on: you can't scale trust by itself. The alternative is to compensate experts well and build oversight directly into the tooling, ensuring benchmarks stay secure, transparent, and tamper-resistant [Karaaslanli et al., 2025].
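Here is a deliberately simplified sketch of the underlying idea: score how often pairs of annotators co-dissent from the majority with the same label, then greedily peel the weighted graph to surface a suspiciously cohesive group. The majority-vote baseline and the peeling heuristic are my assumptions, not the bipartite-graph formulation in the paper.

```python
import numpy as np
from collections import Counter

def suspicious_annotator_group(labels, min_size=2):
    """Flag a cohesive group of annotators who agree with each other while
    disagreeing with the majority.

    `labels[i]` maps annotator id -> label for item i. A simplified
    illustration (majority vote + greedy densest-subgraph peeling), not the
    bipartite-graph method of Karaaslanli et al.
    """
    n_annotators = max(a for item in labels for a in item) + 1
    W = np.zeros((n_annotators, n_annotators))

    for item in labels:
        majority, _ = Counter(item.values()).most_common(1)[0]
        dissenters = [a for a, y in item.items() if y != majority]
        # Co-dissent on the same item with the same wrong label is suspicious.
        for i, a in enumerate(dissenters):
            for b in dissenters[i + 1:]:
                if item[a] == item[b]:
                    W[a, b] += 1
                    W[b, a] += 1

    # Greedy peeling: repeatedly drop the annotator with the weakest ties,
    # keeping the subset with the highest average co-dissent weight.
    nodes = list(range(n_annotators))
    best, best_density = list(nodes), W.sum() / max(len(nodes), 1)
    while len(nodes) > min_size:
        degrees = W[np.ix_(nodes, nodes)].sum(axis=1)
        nodes.pop(int(np.argmin(degrees)))
        density = W[np.ix_(nodes, nodes)].sum() / len(nodes)
        if density > best_density:
            best, best_density = list(nodes), density
    return best
```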
7. Benchmarking the Cost of Annotation Itself
Another intriguing contribution is the Interactive Machine Learning Metric (IMLM) from Lindsey et al., which introduces a unified measure that balances error rate with annotation cost. Since labeling training data is typically expensive, especially in human-in-the-loop settings, a metric like this helps teams optimize for both model performance and human effort. The approach is validated on a Spoken Language Verification task.
I'd like to see more frameworks like this adopted across the industry. Cost-efficiency should account for human time, attention, and context-switching costs [Lindsey et al., 2025].
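As a toy stand-in for the idea (the paper's actual IMLM formulation differs), here is a single score that blends error rate with normalized annotation cost; the alpha weight and the time budget are assumptions.

```python
def combined_error_cost(error_rate, annotation_seconds, budget_seconds, alpha=0.5):
    """Blend model error with normalized annotation cost into one score.

    A deliberately simple stand-in for the idea behind IMLM, not its actual
    formulation: lower is better, `alpha` trades off error against human
    effort, and `budget_seconds` normalizes cost to [0, 1].
    """
    cost = min(annotation_seconds / budget_seconds, 1.0)
    return alpha * error_rate + (1.0 - alpha) * cost

# Hypothetical comparison: lower error but much higher labeling cost vs. the reverse
print(combined_error_cost(0.08, annotation_seconds=40_000, budget_seconds=50_000))
print(combined_error_cost(0.12, annotation_seconds=10_000, budget_seconds=50_000))
```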
Final Thought
What stood out to me from ICASSP 2025 isn't just the progress in model architectures; it's the attention being paid to the data pipeline itself. We're finally entering the age of data-aware AI. From cohort-specific labeling to smart dataset distillation, it's clear that the next frontier isn't just smarter models, it's smarter data.
At Perle, we don’t just label data—we architect it. We shape the very substrate that machine learning models rely on, leveraging domain experts, adaptive tooling, and human-in-the-loop governance that scales without compromise.
If you’re building in this space, let’s talk.
References
Song et al., 2025. Can Quality Survive Scale?
Cai et al., 2025. LABEL-SAM
Ando et al., 2025. Semi-Automatic Labeling for Action Recognition
Na et al., 2025. Cohort-Sensitive Labeling
Dong et al., 2025. Sufficient Learning for Label Noise
Kong et al., 2025. Efficient Dataset Distillation through Low-Rank Sampling
Karaaslanli et al., 2025. Identifying Adversarial Attacks in Crowdsourcing
Lindsey et al., 2025. A Unified Metric for Simultaneous Evaluation of Error Rate and Annotation Cost
No matter how specific your needs, or how complex your inputs, we're here to show you how our innovative approach to data labeling, preprocessing, and governance can unlock Perles of wisdom for companies of all shapes and sizes.