Can Transformers be Strong Treatment Effect Estimators?
Document Type
Article
Publication Title
arXiv
Abstract
In this paper, we develop a general framework based on the Transformer architecture to address a variety of challenging treatment effect estimation (TEE) problems. Our methods are applicable both when covariates are tabular and when they consist of sequences (e.g., in text), and can handle discrete, continuous, structured, or dosage-associated treatments. While Transformers have already emerged as dominant methods for diverse domains, including natural language and computer vision, our experiments with Transformers as Treatment Effect Estimators (TransTEE) demonstrate that these inductive biases are also effective on the sorts of estimation problems and datasets that arise in research aimed at estimating causal effects. Moreover, we propose a propensity score network that is trained with TransTEE in an adversarial manner to promote independence between covariates and treatments, further mitigating selection bias. Through extensive experiments, we show that TransTEE significantly outperforms competitive baselines with greater parameter efficiency over a wide range of benchmarks and settings.
DOI
10.48550/arXiv.2202.01336
Publication Date
February 2, 2022
Keywords
Covariates, Discrete/continuous, Diverse domains, Estimation problem, Inductive bias, Natural languages, Propensity score, Selection bias, Treatment effects, Machine learning, Machine Learning (cs.LG)
Recommended Citation
Y. F. Zhang, H. Zhang, Z. C. Lipton, L. E. Li, and E. Xing, "Can Transformers be Strong Treatment Effect Estimators?", arXiv preprint arXiv:2202.01336, Feb. 2022, doi: 10.48550/arXiv.2202.01336.
Comments
IR deposit conditions: not described
Preprint: arXiv