Machine Learning Dissertations and Theses

Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models

Raza Imam, Mohamed bin Zayed University of Artificial IntelligenceFollow

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Karthik Nandakumar

Second Advisor

Dr. Mohammad Yaqub

Abstract

The conventional modus operandi for adapting pre-trained vision-language models (VLMs) during test-time involves tuning learnable prompts, i.e., test-time prompt tuning. This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalization of large-scale VLMs. Taking inspiration from recent advancements in efficiently fine-tuning large language models, TTL offers a test-time parameter-efficient adaptation approach that updates the attention weights of the transformer encoder by maximizing prediction confidence. The self-supervised confidence maximization objective is specified using a weighted entropy loss that enforces consistency among predictions of augmented samples. TTL introduces only a small amount of trainable parameters for low-rank adapters in the model space while keeping the prompts and backbone frozen. Extensive experiments on a variety of natural distribution and cross-domain tasks show that TTL can outperform other techniques for test-time optimization of VLMs in strict zero-shot settings. Specifically, TTL outperforms test-time prompt tuning baselines with a significant improvement on average. Our code and evaluation benchmark will be publicly released.

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc degree in Machine Learning

Advisors: Karthik Nandakumar, Mohammad Yaqub

with 1 year embargo period

Recommended Citation

R. Imam, "Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models,", Apr 2024.

This document is currently not available here.

COinS

Machine Learning Dissertations and Theses

Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models

Date of Award

Document Type

Degree Name

Department

First Advisor

Second Advisor

Abstract

Comments

Recommended Citation

Browse

Contribute

Links

Machine Learning Dissertations and Theses

Test-Time Adaptation for Zero-Shot Generalization of Large Vision Language Models

Author

Date of Award

Document Type

Degree Name

Department

First Advisor

Second Advisor

Abstract

Comments

Recommended Citation

Share

Browse

Contribute

Links