Structural Identification of Partially Observed Linear Non-Gaussian Acyclic Model: Generalized Independent Noise Approach

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Machine Learning

Department

Machine Learning

First Advisor

Dr. Kun Zhang

Second Advisor

Dr. Martin Takac

Abstract

Conventional causal discovery approaches, which seek to uncover causal relationships among measured variables, are typically sensitive to the presence of latent variables. While various methods have been developed to address this confounding issue, they often rely on strong assumptions about the underlying causal structure. In this work, we consider a general scenario in which measured and latent variables collectively form a partially observed, causally sufficient linear system and latent variables may appear anywhere in the causal structure. LiNGAM, a model without latent variables, is encompassed as a special case. We show theoretically that, with the aid of high-order statistics, the causal graph is (almost) fully identifiable if, roughly speaking, each latent set has a sufficient number of pure children, which can be either latent or measured. To achieve this, we leverage the Generalized Independent Noise (GIN) condition, which tests statistical independence relations that involve only measured variables. Specifically, we first illustrate the origins of the GIN condition in terms of the data-generating process and establish necessary and sufficient graphical criteria for it in its most general form. Based on these criteria, we establish identification theorems and develop a principled algorithm for identifying the entire causal graph. The algorithm is iterative and phased, and can be flexibly adapted to suit specific identifiability conditions. We then discuss the results obtained when the identifiability conditions are violated. Finally, experimental results show that our method effectively recovers the causal structure, even when latent variables are influenced by measured variables.
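
As a rough illustration of the kind of test the abstract refers to, the following toy sketch checks a GIN-style condition on simulated data. It assumes the usual formulation from the GIN literature: for measured sets Y and Z, find a nonzero vector omega with omega^T Cov(Y, Z) = 0 and ask whether omega^T Y is independent of Z. The function names (`gin_residual`, `dependence_score`, `score`), the coefficients, and the uniform noise are illustrative assumptions, and the squared-value correlation is only a crude stand-in for a proper independence test such as HSIC. This is not the algorithm developed in the thesis.

```python
import numpy as np

def gin_residual(Y, Z):
    """Return (omega^T Y, centered Z), where omega^T Cov(Y, Z) = 0.

    Y is n x p and Z is n x q; this sketch assumes p = q + 1 and a
    full-rank cross-covariance, so the null space is one-dimensional.
    """
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    C = Yc.T @ Zc / len(Yc)            # p x q sample cross-covariance
    _, _, Vt = np.linalg.svd(C.T)      # C.T is q x p, full Vt is p x p
    omega = Vt[-1]                     # last right-singular vector spans the null space
    return Yc @ omega, Zc

def dependence_score(e, z):
    """Crude dependence proxy (an assumption): |corr(e^2, z^2)|."""
    return abs(np.corrcoef(e ** 2, z ** 2)[0, 1])

rng = np.random.default_rng(0)
n = 100_000
# One latent variable L and three independent non-Gaussian noise terms.
L, e1, e2, e3 = rng.uniform(-1, 1, size=(4, n))

X1 = L + e1                 # pure child of L
X3 = L + e3                 # pure child of L
X2_pure = L + e2            # pure child of L
X2_impure = L + X1 + e2     # also directly caused by X1, so not pure

def score(X2):
    """Dependence between omega^T (X2, X3) and X1."""
    Y = np.column_stack([X2, X3])
    Z = X1[:, None]
    e, Zc = gin_residual(Y, Z)
    return dependence_score(e, Zc[:, 0])

score_pure = score(X2_pure)      # GIN should hold: score close to 0
score_impure = score(X2_impure)  # GIN should fail: score clearly larger
print(score_pure, score_impure)
```

In the pure case the latent term cancels in omega^T Y, leaving only noise that X1 does not share. In the impure case the residual still contains the noise of X1, so with non-Gaussian noise the two are dependent even though they are uncorrelated by construction.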

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Kun Zhang, Dr. Martin Takac

Online access available for MBZUAI patrons
