Granger Causality using Neural Networks


The Granger Causality (GC) test is a well-known statistical hypothesis test for investigating whether the past values of one time series help predict the future values of another. In other words, it asks whether one time series is useful in forecasting another (Granger, 1969). Traditional approaches to Granger causality detection commonly assume linear dynamics, but this simplification does not hold in many real-world applications, such as neuroscience or genomics, where the underlying dynamics are inherently non-linear. In such cases, imposing linear models such as Vector Autoregressive (VAR) models can lead to inconsistent estimates of the true Granger causal interactions. Machine Learning (ML) can learn hidden patterns in data, and Deep Learning (DL) in particular has shown tremendous promise in learning the non-linear dynamics of complex systems. Recent work by Tank et al. (2018) proposes to overcome the linear simplification of VAR models by combining neural networks with sparsity-inducing penalties on the learnable weights. The authors propose a class of non-linear methods that apply structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) to predict each time series separately, known as component-wise MLPs and RNNs. Sparsity is achieved through convex group-lasso penalties, which encourage specific sets of weights to be zero (or very small), allowing the Granger causal structure to be extracted. In this work, we build upon the ideas introduced by Tank et al. (2018) and propose several new classes of models that can handle underlying non-linearity. First, we present the Learned Kernel VAR (LeKVAR) model, an extension of VAR models that also learns a kernel parametrized by a neural network. Second, we show that the importance of lags and the importance of individual time series can be decoupled directly via decoupled penalties. This decoupling provides better scaling and allows us to embed lag selection into RNNs.
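The group-lasso mechanism described above can be illustrated with a small NumPy sketch. In a linear VAR, series j does not Granger-cause series i exactly when the (i, j) entries of every lag coefficient matrix are zero. A component-wise MLP generalizes this by grouping all first-layer weights that read from series j, so zeroing that group removes series j from the prediction of series i. The sketch assumes, as an illustration rather than a detail stated here, that the first-layer weights of one component-wise MLP are stored as an array of shape `(hidden, p, K)` for `p` input series and `K` lags. It applies the standard proximal (group soft-thresholding) step for a group-lasso penalty. The helper names `group_lasso_prox` and `granger_row` are hypothetical. This sketch does not implement LeKVAR or the decoupled lag/series penalties proposed in this work.

```python
import numpy as np


def group_lasso_prox(W, lam, lr):
    """Proximal step for the penalty lam * sum_j ||W[:, j, :]||_F.

    Assumed layout: W has shape (hidden, p, K), i.e. the first-layer weights
    of one component-wise MLP, where input series j forms one group.
    Every group is shrunk toward zero and set exactly to zero when its
    Frobenius norm is at most lr * lam.
    """
    # Frobenius norm of each input-series group, kept broadcastable: (1, p, 1).
    norms = np.sqrt(np.sum(W ** 2, axis=(0, 2), keepdims=True))
    # Guard against division by zero for groups that are already zero.
    scale = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    return W * scale


def granger_row(W, tol=0.0):
    """Estimate which input series Granger-cause this MLP's target series.

    Series j is flagged as causal when its weight group is nonzero after
    training (group norm greater than tol).
    """
    norms = np.sqrt(np.sum(W ** 2, axis=(0, 2)))
    return norms > tol


# Toy example: 4 hidden units, 3 input series, 2 lags.
W = np.ones((4, 3, 2))
W[:, 2, :] = 0.01  # series 2 contributes only a tiny group norm
W = group_lasso_prox(W, lam=1.0, lr=0.5)
print(granger_row(W))
```

In a full training loop this proximal step would follow each gradient step on the prediction loss, so that groups belonging to non-causal series are driven exactly to zero instead of merely becoming small. In the toy example, series 2 has a group norm of about 0.028, which is below the threshold `lr * lam = 0.5`, so its group is zeroed and it is reported as non-causal.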
Lastly, we propose a new training algorithm that supports mini-batching and is compatible with commonly used adaptive optimizers such as Adam (Kingma and Ba, 2014). The proposed techniques are evaluated on several simulated datasets inspired by real-world applications. We also apply these methods to electroencephalogram (EEG) data from an epilepsy patient to study the evolution of GC before, during, and after a seizure across the 19 EEG channels. Copyright © 2022, The Authors. All rights reserved.






Keywords: Electroencephalogram, Epilepsy, Granger causality, Interpretability, Neural networks, Structured Sparsity, Time series


IR Deposit conditions: non-described