
S-Edge

Efficient and interpretable raw audio classification with diagonal state space models at ECMLPKDD’25

Researchers from the Embedded Machine Learning CD-Lab at ICT introduce S-Edge, a lightweight and interpretable State Space Model architecture for raw audio classification.

Link to publication:
https://link.springer.com/article/10.1007/s10994-025-06807-z#rightslink

Link to code:
https://github.com/embedded-machine-learning/S-Edge

Link to conference:
https://ecmlpkdd.org/2025/

At the forthcoming European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’25), researchers from TU Wien’s Christian Doppler Laboratory for Embedded Machine Learning will present S-Edge, a lightweight and interpretable State Space Model architecture.

State Space Models have achieved good performance on long-sequence modeling tasks such as raw audio classification. Because they are defined in continuous time, they can be discretized and operated at different sampling rates. However, this property has not yet been used to reduce the computational demand on a per-layer basis. We propose a family of hardware-friendly S-Edge models with a layer-wise downsampling approach that adjusts the temporal resolution between individual layers. Applying existing methods from linear control theory allows us to analyze the state/memory dynamics and shows how and where to downsample. Evaluated on the Google Speech Commands dataset, our autoregressive/causal S-Edge models range from 8k to 141k parameters at 90–95% test accuracy, compared with a causal S5 model with 208k parameters at 95.8% test accuracy. Using our C++17 header-only implementation on an ARM Cortex-M4F, the largest model requires 103 s of inference time at 95.19% test accuracy, while the smallest model, at 88.01% test accuracy, requires 0.29 s. Our solutions cover a design space that spans 17× in model size, 358× in inference latency, and 7.18 percentage points in accuracy.
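The abstract notes that a continuous-time definition lets the same layer run at different sampling rates, and that state/memory dynamics indicate where downsampling is possible. The sketch below illustrates both ideas for a single diagonal mode. It assumes zero-order-hold (ZOH) discretization as used in S5; the discretization rule used by S-Edge is not stated here. The helpers `discretize_zoh`, `memory_steps` and `resolvable`, the 1% decay tolerance, the Nyquist heuristic, the downsampling factor and the example mode are all hypothetical illustrations, not the authors' analysis.

```python
import cmath
import math


def discretize_zoh(lam: complex, b: complex, delta: float) -> tuple[complex, complex]:
    """Assumed ZOH discretization of one diagonal SSM mode.

    lam:   continuous-time eigenvalue, Re(lam) < 0 for a stable mode
    b:     continuous-time input coefficient of this mode
    delta: step size in seconds, i.e. 1 / sampling rate
    """
    lam_bar = cmath.exp(lam * delta)
    b_bar = (lam_bar - 1) / lam * b
    return lam_bar, b_bar


def memory_steps(lam: complex, delta: float, tol: float = 0.01) -> int:
    """Samples until an impulse stored in this mode decays below `tol` (|lam_bar|**k < tol)."""
    mag = abs(cmath.exp(lam * delta))  # equals exp(Re(lam) * delta)
    return math.ceil(math.log(tol) / math.log(mag))


def resolvable(lam: complex, fs: float) -> bool:
    """Hypothetical heuristic: the mode's oscillation frequency must stay below Nyquist (fs / 2)."""
    return abs(lam.imag) / (2 * math.pi) < fs / 2


# Example mode: decays at 100 1/s and oscillates at 50 Hz.
lam = complex(-100.0, 2 * math.pi * 50)
fs = 16_000.0  # input sampling rate in Hz (Speech Commands audio is sampled at 16 kHz)
r = 4          # hypothetical downsampling factor before the next layer

fine = discretize_zoh(lam, 1.0, 1 / fs)
coarse = discretize_zoh(lam, 1.0, r / fs)  # same continuous parameters, r times larger step

print(memory_steps(lam, 1 / fs), memory_steps(lam, r / fs))  # 737 185
print(resolvable(lam, fs / r))                               # True
```

Because the discrete eigenvalue is exp(λΔ), rediscretizing with step rΔ yields exactly the r-th power of the fine-rate eigenvalue: the coarser layer keeps the same continuous-time dynamics while needing r times fewer recurrent updates per second of audio, as long as the mode is still resolvable at the lower rate.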

The discrete recurrent forward path of our proposed S-Edge compared to the original S5.

© Matthias Bittner, Daniel Schnöll
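The figure contrasts the discrete recurrent forward path of S-Edge with that of the original S5. As a rough illustration of such a path, and not the authors' C++17 implementation, the sketch below runs simplified single-input, single-output diagonal SSM layers one sample at a time (which is what makes them causal) and lowers the rate between layers by plain decimation, keeping every r-th sample. The ZOH rule, the names `discretize_zoh`, `DiagonalSSMLayer` and `run_stack`, the decimation scheme and the example eigenvalues are assumptions; the downsampling operator actually used in S-Edge may differ.

```python
import cmath


def discretize_zoh(lams, bs, delta):
    """Assumed ZOH rule per diagonal mode: lam_bar = exp(lam*delta), b_bar = (lam_bar-1)/lam * b."""
    lam_bars = [cmath.exp(l * delta) for l in lams]
    b_bars = [(lb - 1) / l * b for lb, l, b in zip(lam_bars, lams, bs)]
    return lam_bars, b_bars


class DiagonalSSMLayer:
    """Hypothetical single-input, single-output diagonal SSM layer in recurrent (causal) mode."""

    def __init__(self, lams, bs, cs, d, delta):
        self.lam_bars, self.b_bars = discretize_zoh(lams, bs, delta)
        self.cs = cs                   # output coefficients (C)
        self.d = d                     # direct feed-through (D)
        self.state = [0j] * len(lams)  # one complex state per mode

    def step(self, u: float) -> float:
        # x_k = lam_bar * x_{k-1} + b_bar * u_k, element-wise because A is diagonal
        self.state = [lb * x + bb * u
                      for lb, x, bb in zip(self.lam_bars, self.state, self.b_bars)]
        # y_k = Re(C x_k) + D u_k
        return sum(c * x for c, x in zip(self.cs, self.state)).real + self.d * u


def run_stack(layers, factors, signal):
    """Stream `signal` through `layers`; after layer i, keep only every factors[i]-th sample."""
    counters = [0] * len(layers)
    outputs = []
    for u in signal:
        x = u
        for i, layer in enumerate(layers):
            x = layer.step(x)
            counters[i] += 1
            if i == len(layers) - 1:
                outputs.append(x)
            elif counters[i] % factors[i] != 0:
                break  # decimated: this sample never reaches the next layer
    return outputs


# Two-layer example: layer 1 runs at 16 kHz, layer 2 at 4 kHz, so its step is 4x larger.
fs = 16_000.0
lams = [complex(-100.0, 300.0), complex(-500.0, 0.0)]
layer1 = DiagonalSSMLayer(lams, [1.0, 1.0], [1.0, 0.5], 0.0, 1 / fs)
layer2 = DiagonalSSMLayer(lams, [1.0, 1.0], [1.0, 0.5], 1.0, 4 / fs)
y = run_stack([layer1, layer2], [4], [1.0] + [0.0] * 15)
print(len(y))  # 4
```

Each layer stores only its state vector and updates it with element-wise multiplications, so memory and compute per sample stay constant, and every downsampling step by r cuts the number of updates in all later layers by a factor of r.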
