Towards Learning Efficient Multilingual and Multimodal Representation
This thesis focuses on developing efficient representation methods for multilingual and multimodal data in machine learning. The research is divided into three stages, each centered on a specific task. The first stage investigates and improves multilingual representation approaches for question-answering and text-to-speech tasks. The second stage improves the fusion strategies of multimodal representations for hateful meme classification. The third stage unifies the previous two by improving multilingual and multimodal representations for the image retrieval task. The thesis proposes various approaches based on pre-trained models and multimodal fusion techniques to improve both the performance and the cultural relevance of machine learning applications. For example, the proposed Hate-CLIPper architecture achieves state-of-the-art performance on hateful meme detection, while training on a natively multilingual and multimodal Wikipedia Image-Text dataset with English text augmentation enables retrieval of culturally relevant images in ten Indian languages. The research not only contributes to the development of efficient representation methods for multilingual and multimodal data, but also motivates further investigation into pre-trained models and multimodal fusion techniques for machine learning in multilingual and multimodal settings.
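As an illustration of the kind of multimodal fusion the abstract refers to, the sketch below computes a cross-modal feature interaction matrix as the outer product of an image embedding and a text embedding, in the spirit of Hate-CLIPper's CLIP-feature fusion. This is a minimal, hypothetical sketch: the function name, the embedding dimensions, and the use of NumPy are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

def feature_interaction_matrix(img_feat, txt_feat):
    """Fuse two modality embeddings via their outer product.

    Illustrative sketch only; names and dimensions are assumed,
    not taken from the thesis code.
    """
    # L2-normalise each modality's embedding.
    img = img_feat / np.linalg.norm(img_feat)
    txt = txt_feat / np.linalg.norm(txt_feat)
    # The outer product captures pairwise interactions between
    # every image dimension and every text dimension.
    fim = np.outer(img, txt)  # shape: (d_img, d_txt)
    # Flatten so a downstream classifier head can consume it.
    return fim.reshape(-1)

rng = np.random.default_rng(0)
fused = feature_interaction_matrix(rng.standard_normal(8),
                                   rng.standard_normal(8))
print(fused.shape)  # (64,)
```

A classifier (e.g. a small MLP) would then map the flattened interaction matrix to a hateful/not-hateful label; the outer product lets the classifier weight specific image-text dimension pairs rather than a mere concatenation of the two embeddings.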
G.K. Kumar, "Towards Learning Efficient Multilingual and Multimodal Representation", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2023.