Publications
2024
- IJCAI 2024. Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms. Aneesh Komanduri, Yongkai Wu, Feng Chen, and 1 more author.
Learning disentangled causal representations is a challenging problem that has gained significant attention recently due to its implications for extracting meaningful information for downstream tasks. In this work, we define a new notion of causal disentanglement from the perspective of independent causal mechanisms. We propose ICM-VAE, a framework for learning causally disentangled representations supervised by causally related observed labels. We model causal mechanisms using learnable flow-based diffeomorphic functions to map noise variables to latent causal variables. Further, to promote the disentanglement of causal factors, we propose a causal disentanglement prior that utilizes the known causal structure to encourage learning a causally factorized distribution in the latent space. Under relatively mild conditions, we provide theoretical results showing the identifiability of causal factors and mechanisms up to permutation and elementwise reparameterization. We empirically demonstrate that our framework induces highly disentangled causal factors, improves interventional robustness, and is compatible with counterfactual generation.
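The mechanism-based generative process the abstract describes can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's implementation: each mechanism here is a shift-only affine map (a real flow-based mechanism would be a learned diffeomorphism), and the names `affine_mechanism` and `map_noise_to_causal` are invented for the example.

```python
# Hypothetical sketch: map independent noise variables to latent causal
# variables through one mechanism per variable, applied in topological
# order of a known causal DAG. A shift-only affine map is used so the
# mechanism stays trivially invertible in the noise.

def affine_mechanism(noise, parent_values, weights, bias):
    """z_i = u_i + shift(pa(z_i)); invertible given the parent values."""
    shift = sum(w * p for w, p in zip(weights, parent_values)) + bias
    return noise + shift

def map_noise_to_causal(u, dag_parents, params):
    """dag_parents maps each variable to its parent list; integer keys
    are assumed to already be sorted in topological order."""
    z = {}
    for i in sorted(dag_parents):
        parent_values = [z[j] for j in dag_parents[i]]
        weights, bias = params[i]
        z[i] = affine_mechanism(u[i], parent_values, weights, bias)
    return z
```

On a two-node chain 0 → 1, the noise of the root passes through its own shift, and the child's value then depends on the realized parent, which is the causally factorized structure the prior in the paper encourages.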
- GCV@CVPR. Causal Diffusion Autoencoders: Toward Representation-Enabled Counterfactual Generation via Diffusion Probabilistic Models. Aneesh Komanduri, Chen Zhao, Feng Chen, and 1 more author.
Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-enabled controllable generation using diffusion models. Specifically, controllable counterfactual generation using DPMs has been an underexplored area. In this work, we propose CausalDiffAE, a diffusion-based causal representation learning framework to enable counterfactual generation according to a specified causal model. We encode the high-dimensional image into a low-dimensional representation corresponding to causally related semantic factors. We model causal dependencies among latent variables using neural structural causal models and ensure their disentanglement via an alignment prior. Given a pre-trained CausalDiffAE, we propose a DDIM-based counterfactual generation procedure subject to do-interventions. We empirically show that CausalDiffAE learns a disentangled latent space and is capable of generating high-quality counterfactual images.
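Counterfactual generation under do-interventions follows the classic abduction-action-prediction recipe, which the following toy sketch illustrates on a latent SCM with additive mechanisms. It is a hypothetical illustration, not the paper's method (CausalDiffAE uses neural SCMs and DDIM-based decoding); all function names are invented for the example.

```python
# Hypothetical sketch of abduction-action-prediction for do-intervention
# counterfactuals in a latent SCM with additive mechanisms:
#   z_i = u_i + sum_j w_ij * z_j   over the parents of i.

def forward(u, parents, weights, do=None):
    """Run mechanisms in topological order; `do` overrides intervened nodes."""
    do = do or {}
    z = {}
    for i in sorted(parents):
        if i in do:
            z[i] = do[i]  # do-intervention: cut the mechanism, set the value
        else:
            z[i] = u[i] + sum(w * z[j] for w, j in zip(weights[i], parents[i]))
    return z

def abduct(z, parents, weights):
    """Invert each additive mechanism to recover the noise u_i."""
    return {i: z[i] - sum(w * z[j] for w, j in zip(weights[i], parents[i]))
            for i in parents}

def counterfactual(z_obs, parents, weights, do):
    u = abduct(z_obs, parents, weights)         # 1. abduction
    return forward(u, parents, weights, do=do)  # 2.-3. action + prediction
```

In the paper's setting the encoder plays the role of abduction and the decoder regenerates the image from the counterfactual latents; this sketch only shows the latent-space bookkeeping.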
2023
- arXiv Preprint. From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling. Aneesh Komanduri, Xintao Wu, Yongkai Wu, and 1 more author.
Deep generative models have shown tremendous success in data density estimation and data generation from finite samples. While these models have shown impressive performance by learning correlations among features in the data, some fundamental shortcomings are their lack of explainability, the tendency to induce spurious correlations, and poor out-of-distribution extrapolation. In an effort to remedy such challenges, one can incorporate the theory of causality in deep generative modeling. Structural causal models (SCMs) describe data-generating processes and model complex causal relationships and mechanisms among variables in a system. Thus, SCMs can naturally be combined with deep generative models. Causal models offer several beneficial properties to deep generative models, such as distribution shift robustness, fairness, and interpretability. We provide a technical survey on causal generative modeling categorized into causal representation learning and controllable counterfactual generation methods. We focus on fundamental theory, formulations, drawbacks, datasets, metrics, and applications of causal generative models in fairness, privacy, out-of-distribution generalization, and precision medicine. We also discuss open problems and fruitful research directions for future work in the field.
- CRL@NeurIPS. Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms. Aneesh Komanduri, Yongkai Wu, Feng Chen, and 1 more author.
Learning disentangled causal representations is a challenging problem that has gained significant attention recently due to its implications for extracting meaningful information for downstream tasks. In this work, we define a new notion of causal disentanglement from the perspective of independent causal mechanisms. We propose ICM-VAE, a framework for learning causally disentangled representations supervised by causally related observed labels. We model causal mechanisms using learnable flow-based diffeomorphic functions to map noise variables to latent causal variables. Further, to promote the disentanglement of causal factors, we propose a causal disentanglement prior that utilizes the known causal structure to encourage learning a causally factorized distribution in the latent space. Under relatively mild conditions, we provide theoretical results showing the identifiability of causal factors and mechanisms up to permutation and elementwise reparameterization. We empirically demonstrate that our framework induces highly disentangled causal factors, improves interventional robustness, and is compatible with counterfactual generation.
2022
- BigData 2022. SCM-VAE: Learning Identifiable Causal Representations via Structural Knowledge. Aneesh Komanduri, Yongkai Wu, Wen Huang, and 2 more authors.
The goal of causal representation learning is to map low-level observations to high-level causal concepts to learn interpretable and robust representations for various downstream tasks. Latent variable models such as the variational autoencoder (VAE) are frequently leveraged to learn disentangled representations. However, there are often complex non-linear causal relationships underlying the observed data that cannot be captured through disentangled representations or linear dependence assumptions. Further, an independent conditional prior assumption can make learning causal dependencies in the latent space more challenging. We propose a framework, coined SCM-VAE, which uses a priori causal knowledge, a structural causal prior, and a non-linear additive noise structural causal model (SCM) to learn independent causal mechanisms and identifiable causal representations. We conduct theoretical analysis and perform experiments on synthetic and real-world datasets to show the improved quality of learned causal representations and robustness under interventions.
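The non-linear additive noise SCM the abstract refers to has the form z_i = f_i(pa(z_i)) + eps_i, with a non-linear f_i and independent noise. A minimal hypothetical sketch, with tanh standing in for the learned non-linearity and the name `anm_sample` invented for the example:

```python
import math

# Hypothetical toy of a non-linear additive noise SCM:
#   z_i = f_i(pa(z_i)) + eps_i,
# sampled in topological order; f_i is a tanh of a weighted parent sum.

def anm_sample(dag_parents, weights, eps):
    z = {}
    for i in sorted(dag_parents):  # keys assumed topologically ordered
        pa = [z[j] for j in dag_parents[i]]
        f = math.tanh(sum(w * p for w, p in zip(weights[i], pa)))
        z[i] = f + eps[i]
    return z
```

The additive structure is what keeps the noise recoverable (eps_i = z_i - f_i(pa)), which is the property identifiability arguments for such models typically rely on.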
2021
- ICMLA 2021. Neighborhood Random Walk Graph Sampling for Regularized Bayesian Graph Convolutional Neural Networks. Aneesh Komanduri and Justin Zhan.
In the modern age of social media and networks, graph representations of real-world phenomena have become an incredibly useful source of insights. Often, we are interested in understanding how entities in a graph are interconnected. The Graph Neural Network (GNN) has proven to be a very useful tool in a variety of graph learning tasks, including node classification, link prediction, and edge classification. However, in most of these tasks, the graph data we are working with may be noisy and may contain spurious edges; that is, there is considerable uncertainty associated with the underlying graph structure. Recent approaches model this uncertainty in a Bayesian framework, viewing the graph as a random variable with probabilities associated with the model parameters. Introducing the Bayesian paradigm to graph-based models, specifically for semi-supervised node classification, has been shown to yield higher classification accuracies. However, the method of graph inference proposed in recent work does not take the structure of the graph into account. In this paper, we propose a novel algorithm, Bayesian Graph Convolutional Network using Neighborhood Random Walk Sampling (BGCN-NRWS), which uses a Markov Chain Monte Carlo (MCMC)-based graph sampling algorithm that exploits graph structure, reduces overfitting through a variational inference layer, and yields consistently competitive classification results compared to the state of the art in semi-supervised node classification.
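The structure-aware sampling idea can be illustrated with the simplest possible neighborhood random walk. This is a hypothetical sketch, not the BGCN-NRWS algorithm itself (which embeds such walks in an MCMC graph-inference scheme); the name `random_walk_sample` is invented for the example.

```python
import random

# Hypothetical sketch of neighborhood random walk sampling: from a seed
# node, repeatedly step to a uniformly chosen neighbor and record the
# visited nodes. Unlike i.i.d. edge sampling, the sample respects local
# graph structure because consecutive nodes are always adjacent.

def random_walk_sample(adj, seed, walk_len):
    """adj: dict node -> list of neighbors; returns the visited node list."""
    visited = [seed]
    node = seed
    for _ in range(walk_len):
        nbrs = adj.get(node, [])
        if not nbrs:  # dead end: stop the walk
            break
        node = random.choice(nbrs)
        visited.append(node)
    return visited
```

The subgraph induced by the visited nodes can then serve as one sampled graph realization inside a Bayesian training loop.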
- arXiv Preprint. A Comparative Study of Transformer-Based Language Models on Extractive Question Answering. Kate Pearce, Tiffany Zhan, Aneesh Komanduri, and 1 more author.
Question Answering (QA) is a task in natural language processing that has seen considerable growth since the advent of transformers. There has been a surge of QA datasets designed to challenge natural language processing models and to push performance beyond that of humans and existing models. Many pre-trained language models have proven to be incredibly effective at extractive question answering. However, generalizability remains a challenge for the majority of these models; that is, some datasets require more reasoning than others. In this paper, we train various pre-trained language models and fine-tune them on multiple question answering datasets of varying difficulty to determine which models generalize best across datasets. Further, we propose a new architecture, BERT-BiLSTM, and compare it with other language models to determine whether adding more bidirectionality can improve model performance. Using the F1-score as our metric, we find that the RoBERTa and BART pre-trained models perform best across all datasets and that our BERT-BiLSTM model outperforms the baseline BERT model.
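The F1-score used to compare models is the standard token-overlap F1 for extractive QA: the harmonic mean of precision and recall over the tokens shared between the predicted span and the gold answer. A minimal sketch (whitespace tokenization only; real evaluation scripts also strip punctuation and articles):

```python
from collections import Counter

# Token-overlap F1 between a predicted answer span and a gold answer,
# as commonly used to score extractive QA.

def qa_f1(prediction, gold):
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "the cat sat" against gold "the cat" gives precision 2/3 and recall 1, so F1 = 0.8.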