Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions
Document Type
Conference Proceeding
Publication Title
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Abstract
The paper introduces Diff-Filter, a multichannel speech enhancement approach based on a diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step training procedure that takes advantage of self-supervised learning. In the first stage, Diff-Filter is trained to perform time-domain speech filtering using a score-based diffusion model. In the second stage, Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model within a self-supervised learning framework, using a novel loss based on the equal error rate. This loss enables self-supervised learning on a dataset without speaker labels. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant performance improvements under noisy multichannel conditions.
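The record does not reproduce the authors' EER-based loss. As a rough illustration only, a differentiable surrogate for the equal error rate could look like the PyTorch sketch below; the function name, the smoothing temperature tau, and the threshold grid are all hypothetical choices, not the paper's formulation.

```python
import torch

def soft_eer_loss(pos_scores, neg_scores, tau=0.05, n_thresholds=64):
    """Differentiable proxy for the equal error rate (illustrative sketch).

    pos_scores: similarity scores of same-speaker trial pairs
    neg_scores: similarity scores of different-speaker trial pairs
    Sigmoid-smoothed step functions replace the hard FAR/FRR counts,
    so the loss can back-propagate into an enhancement front-end.
    """
    all_scores = torch.cat([pos_scores, neg_scores])
    lo, hi = all_scores.min().item(), all_scores.max().item()
    thresholds = torch.linspace(lo, hi, n_thresholds, device=pos_scores.device)
    # Soft false-rejection rate: positives falling below each threshold.
    frr = torch.sigmoid((thresholds[:, None] - pos_scores[None, :]) / tau).mean(dim=1)
    # Soft false-acceptance rate: negatives rising above each threshold.
    far = torch.sigmoid((neg_scores[None, :] - thresholds[:, None]) / tau).mean(dim=1)
    # The EER lies where FAR and FRR cross; take the closest sampled threshold.
    idx = torch.argmin((far - frr).abs())
    return 0.5 * (far[idx] + frr[idx])
```

In a self-supervised setting such as the one the abstract describes, the trial pairs feeding this loss would have to come from the unlabeled data itself (e.g., segments of the same recording as positives), which is an assumption here rather than a detail given in the record.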
First Page
3849
Last Page
3853
DOI
10.21437/Interspeech.2023-1890
Publication Date
1-1-2023
Keywords
diffusion probabilistic models, multichannel speech enhancement, self-supervised learning, speaker verification
Recommended Citation
S. Dowerah et al., "Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions," Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2023-August, pp. 3849-3853, Aug 2023.
The definitive version is available at https://doi.org/10.21437/Interspeech.2023-1890