M4: Multi-Generator, Multi-Domain, and Multi-Lingual Black-Box Machine-Generated Text Detection
Document Type
Conference Proceeding
Publication Title
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Abstract
Pre-trained models such as BERT have achieved remarkable results on text matching tasks. However, existing models still struggle to capture subtle local differences when modeling complex semantic matching relationships. In this work, we find that integrating local syntax awareness with global semantics is crucial for text matching. Accordingly, we propose the Local and Global Syntax Graph Calibration (LG-SGC) module, which exploits both local syntactic and global semantic information for the matching task. Specifically, we first introduce an auxiliary task inside BERT to capture subtle local grammatical differences. Then, we retain the original attention operation to capture global matching features. Finally, we design an information fusion layer that effectively combines local and global information to deepen the model's understanding of the matching task. We conduct extensive experiments on 10 benchmarks, where LG-SGC significantly outperforms previous models.
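The abstract describes three components: an auxiliary syntax-aware branch inside BERT, the unchanged global attention path, and a fusion layer that combines the two. The following is a minimal PyTorch sketch of one way such a fusion layer could look; the class name, the gating mechanism, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of the local/global fusion idea from the abstract.
# The gated-fusion design below is an assumption for illustration only.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # gate deciding, per dimension, how much local vs. global signal to keep
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, global_feats: torch.Tensor, local_feats: torch.Tensor) -> torch.Tensor:
        # global_feats: [batch, seq_len, hidden] from the unchanged BERT attention path
        # local_feats:  [batch, seq_len, hidden] from the auxiliary syntax-aware branch
        g = torch.sigmoid(self.gate(torch.cat([global_feats, local_feats], dim=-1)))
        fused = g * local_feats + (1.0 - g) * global_feats
        return self.norm(fused)


# usage with toy tensors
fusion = LocalGlobalFusion(hidden_size=768)
out = fusion(torch.randn(2, 16, 768), torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])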
First Page
11571
Last Page
11575
DOI
10.1109/ICASSP48485.2024.10446461
Publication Date
1-1-2024
Keywords
attention calibration, natural language processing, semantic matching, syntax graph
Recommended Citation
L. Li et al., "M4: Multi-Generator, Multi-Domain, and Multi-Lingual Black-Box Machine-Generated Text Detection," ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 11571-11575, Jan. 2024.
The definitive version is available at https://doi.org/10.1109/ICASSP48485.2024.10446461