Rethinking Model Re-Basin and Linear Mode Connectivity

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Samuel Horvath

Second Advisor

Dr. Bin Gu

Abstract

"Extensive recent empirical works demonstrate the massive potential of model merging, i.e. merging different models with the same architecture in the parameter space, in improving the in-distribution as well as out-of-distribution performance and achieving multi-task learning and machine unlearning without suffering catastrophic forgetting. This framework is extremely simple and lightweight, maintaining merely a single model for inference. Yet, the performance of model merging is guaranteed only when the models to be merged are within the same basin in the loss landscape, i.e. is conditioned by the existence of linear mode connectivity, restricting general applications. Surprisingly, a series of recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging. However, current re-basin strategies are limited in effectiveness due to a lack of comprehensive understanding of underlying mechanisms. To address this gap, this work provides two different lines of effort. It starts by revisiting the standard practices in model re-basin and uncovers the frequent inadequacies of existing matching algorithms, including the poor performance of weight matching and multiple matching without the re-normalization and a sharp contrast between the same basin and re-basin scenario. These results suggest that contemporary matching algorithms failed to detach the optimal permutation but were rescued by certain post-processing techniques. The conjecture is supported by a novel finding that removing the biases term from the model structure can boost re-basin in a similar way as re-normalization. To the best knowledge, this is the first comprehensive analysis of re-basing independent SGD solutions trained on the same dataset, bringing insights towards a further understanding. With the ultimate objective of improving the practice, a more direct analytical approach is proposed based on the above observations. It exposes a two-stage interaction between matching algorithms and re-normalization processes. In the first stage, the application of matching algorithms alleviates severe mismatches between two end models and maintains necessary knowledge in the interpolated model. After that, the re-normalization amplifies the remained information and eventually recovers the model performance. This perspective not only clarifies and refines previous findings but also facilitates novel insights. For instance, it suggests that the implicit bias of training with a large learning rate and large weight decay is beneficial to model re-basin, which is empirically validated in extensive experiments. More intriguingly, it connects the model pruning to the linear mode connectivity by indicating a linear interpolation formulation of pruning. This analogy eventually motivates a lightweight yet effective post-pruning plug-in that can be directly merged with any existing pruning techniques. It’s expected that the novel observations and analysis provided in this work can bring meaningful insights and benefit the research of model re-basin and merging. Additionally, it’s believed that further exploration of the derived post-pruning technique can have a substantial practical impact. For this purpose, the experimental implementation of this work is publicly accompanied at https://github.com/XingyuQu/rethink-re-basin."

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Samuel Horvath, Bin Gu

Online access available for MBZUAI patrons
