Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Document Type

Conference Proceeding

Publication Title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Abstract

The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step training procedure that leverages self-supervised learning. In the first stage, Diff-Filter is trained to perform time-domain speech filtering using a score-based diffusion model. In the second stage, Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. We present a novel loss based on the equal error rate (EER), which enables self-supervised learning on a dataset without speaker labels. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant performance improvements under noisy multichannel conditions.
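The abstract's EER-based loss builds on the standard equal error rate used to score speaker verification systems: the operating point at which the false-acceptance rate equals the false-rejection rate. The sketch below is not the paper's loss, only a minimal NumPy illustration of how the EER is computed from same-speaker (target) and different-speaker (non-target) trial scores; the function and variable names are ours, not from the paper.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the threshold sweep point where the
    false-acceptance rate (FAR) equals the false-rejection rate (FRR)."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones_like(target_scores),
                             np.zeros_like(nontarget_scores)])
    # Sweep the decision threshold over the sorted scores.
    order = np.argsort(scores)
    sorted_labels = labels[order]
    n_target = target_scores.size
    n_nontarget = nontarget_scores.size
    # FRR: fraction of targets at or below the threshold (rejected).
    frr = np.cumsum(sorted_labels) / n_target
    # FAR: fraction of non-targets above the threshold (accepted).
    far = 1.0 - np.cumsum(1 - sorted_labels) / n_nontarget
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy example (hypothetical data): well-separated score
# distributions should yield a low EER.
rng = np.random.default_rng(0)
tgt = rng.normal(2.0, 1.0, 1000)   # same-speaker trial scores
non = rng.normal(-2.0, 1.0, 1000)  # different-speaker trial scores
eer = compute_eer(tgt, non)
```

Since this counting-based EER is non-differentiable, using it as a training loss, as the paper proposes, presumably requires a smooth surrogate; the details are in the paper itself.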

First Page

3849

Last Page

3853

DOI

10.21437/Interspeech.2023-1890

Publication Date

1-1-2023

Keywords

diffusion probabilistic models, multichannel speech enhancement, self-supervised learning, speaker verification
