Local and Global: Text Matching via Syntax Graph Calibration

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Shangsong Liang

Second Advisor

Dr. Abdulmotaleb Elsaddik


"Semantic matching is the process of evaluating how closely different linguistic elements, like words, phrases, or entire texts, are in meaning to one another. It measures if two pieces of text convey comparable meanings, intentions, or share the same information, which is essential for a wide array of uses such as internet search algorithms, content suggestion platforms, and conversational AI programs. The primary challenge in semantic matching is the accurate identification of semantic relations between text fragments, which requires a nuanced understanding of both local syntactic nuances and global semantic contexts. Traditional models, including the widely-used pre-trained models like BERT, often struggle to capture these intricate details, particularly when subtle syntactic differences are crucial for understanding complex semantic relationships. Thismthesis introduces a novel approach, termed Local and Global Syntax Graph Calibration (LG-SGC), aimed at bridging this gap by effectively integrating local syntax awareness with global semantic processing for improved text matching accuracy. The LG-SGC framework posits that the explicit modeling of syntactic structure, when combined with the semantic understanding capabilities of models like BERT, can significantly enhance performance on semantic matching tasks. To test this hypothesis, the research first undertakes a comprehensive analysis of BERT’s layer-wise representations, dentifying a syntax-related knowledge deficit in its lower layers. Addressing this, LG-SGC incorporates a specialized transformer module designed to inject syntax-aware features into the early stages of rocessing. This module employs a Syntax Graph Convolutional Network (S-GCN) that leverages syntax trees for capturing syntactic relations and integrates these with BERT’s semantic processing through a novel information fusion layer. 
Methodologically, the LG-SGC model enriches BERT’s attention mechanism with syntax-aware adjustments, enabling the model to pay closer attention to syntactic details that are critical for semantic distinction. An auxiliary task within BERT is introduced to fine-tune its sensitivity to local syntactic differences, while the model retains its global semantic matching capabilities. The effectiveness of LG-SGC is validated through extensive experiments across ten benchmark datasets, where it consistently outperforms existing models, including vanilla BERT and its advanced counterparts. The contributions of this thesis are threefold. First, it provides empirical evidence that integrating syntax-aware processing enhances semantic matching, addressing a notable gap in existing NLP models. Second, the LG-SGC model represents a significant methodological advancement, demonstrating how local syntax and global semantics can be coherently fused within a pre-trained model framework. Third, the research outcomes suggest a new direction for future NLP model development, emphasizing the importance of syntactic understanding in semantic tasks. In conclusion, this thesis underscores the critical role of syntax in semantic matching tasks and presents LG-SGC as a powerful new model that leverages the best of both syntactic and semantic worlds. The findings and methodology outlined herein not only contribute to advancing the state of the art in semantic matching but also offer a roadmap for future research in the domain, highlighting the potential for syntax-aware models to revolutionize NLP applications."
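
Illustrative note (not part of the thesis): the abstract does not specify the S-GCN's equations, so the following is only a minimal sketch of a generic graph-convolution step over a syntax graph, assuming the standard symmetric normalization with self-loops. The function names, the toy sentence, the edge list, and the feature and weight values are all hypothetical and chosen for readability; the thesis's actual module, fusion layer, and attention adjustments may differ.

```python
import math


def normalized_adjacency(num_tokens, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected syntax graph.

    `edges` is a list of (head_index, dependent_index) pairs, e.g. taken
    from a dependency parse (assumed input format).
    """
    a = [[0.0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        a[i][i] = 1.0  # self-loop so each token keeps its own features
    for head, dep in edges:
        a[head][dep] = 1.0
        a[dep][head] = 1.0  # treat syntactic arcs as undirected
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy sentence "dogs chase cats": the verb (index 1) heads both nouns.
edges = [(1, 0), (1, 2)]
a_hat = normalized_adjacency(3, edges)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d token features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights, for readability
out = gcn_layer(a_hat, h, w)              # each row mixes a token with its syntactic neighbors
```

In this sketch each token's output row is a degree-weighted mix of its own features and those of the tokens it is syntactically linked to, which is one common way syntactic relations can be injected into token representations before they are combined with a pre-trained encoder.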


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc degree in Machine Learning

Advisors: Shangsong Liang, Abdulmotaleb Elsaddik

with 1 year embargo period

This document is currently not available here.