
Stochastic Intervention for Causal Effect Estimation

By Tri Dung Duong, Qian Li, Guandong Xu
arxiv.org
 22 days ago

Causal inference methods are widely applied in decision-making domains such as precision medicine, optimal policy and economics. Central to these applications is estimating the treatment effect of intervention strategies. Current estimation methods are mostly restricted to deterministic treatments and therefore cannot handle stochastic treatment policies. Moreover, previous methods can only make binary yes-or-no decisions based on the treatment effect and cannot provide a fine-grained estimate of the effect to explain the decision-making process. In our study, we therefore advance causal inference research to estimate stochastic intervention effects by devising a new stochastic propensity score and a stochastic intervention effect estimator (SIE). We also design a customized genetic algorithm specific to stochastic intervention effect (Ge-SIO) with the aim of providing causal evidence for decision making. We provide theoretical analysis and conduct an empirical study to show that our proposed measures and algorithms achieve a significant performance lift in comparison with state-of-the-art baselines.
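To make the idea of a stochastic intervention concrete, the sketch below uses one formulation common in the literature, the incremental propensity score shift, in which the odds of treatment are multiplied by a factor delta. It pairs that shift with a plain inverse-probability-weighted estimate of the mean outcome under the shifted policy. This is an illustrative assumption, not necessarily the authors' stochastic propensity score or their SIE estimator; the function names, the toy data generator and the known propensities are all hypothetical.

```python
import math
import random


def shifted_propensity(p: float, delta: float) -> float:
    """Treatment probability after multiplying the odds of treatment by delta."""
    return delta * p / (delta * p + 1.0 - p)


def ipw_stochastic_effect(rows, delta):
    """Inverse-probability-weighted mean outcome under the shifted policy.

    rows: iterable of (treated, outcome, propensity), with treated in {0, 1}
    and propensity strictly between 0 and 1.
    """
    total = 0.0
    n = 0
    for treated, outcome, p in rows:
        q = shifted_propensity(p, delta)
        # Ratio of shifted to observed probability of the arm actually received.
        weight = q / p if treated == 1 else (1.0 - q) / (1.0 - p)
        total += weight * outcome
        n += 1
    return total / n


def simulate(n, seed=0):
    """Hypothetical toy data: one confounder x and a true treatment effect of 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-x))  # true propensity, assumed known here
        t = 1 if rng.random() < p else 0
        y = 2.0 * t + x + rng.gauss(0.0, 1.0)
        rows.append((t, y, p))
    return rows


if __name__ == "__main__":
    data = simulate(5000, seed=1)
    for delta in (0.2, 1.0, 5.0):
        print(delta, ipw_stochastic_effect(data, delta))
```

With delta = 1 the policy is unchanged and the estimate reduces to the sample mean of the outcomes. Sweeping delta gives a graded view of the effect rather than a single treated-versus-untreated contrast.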

#Estimator #IJCNN #LG #Machine Learning
Related
Science | arxiv.org

Inferring Granger Causality from Irregularly Sampled Time Series

Continuous, automated surveillance systems that incorporate machine learning models are becoming increasingly common in healthcare environments. These models can capture temporally dependent changes across multiple patient variables and can enhance a clinician's situational awareness by providing an early warning alarm of an impending adverse event such as sepsis. However, most commonly used methods, e.g., XGBoost, fail to provide an interpretable mechanism for understanding why a model produced a sepsis alarm at a given time. The black-box nature of many models is a severe limitation as it prevents clinicians from independently corroborating those physiologic features that have contributed to the sepsis alarm. To overcome this limitation, we propose a generalized linear model (GLM) approach to fit a Granger causal graph based on the physiology of several major sepsis-associated derangements (SADs). We adopt a recently developed stochastic monotone variational inequality-based estimator coupled with forward feature selection to learn the graph structure from both continuous and discrete-valued as well as regularly and irregularly sampled time series. Most importantly, we develop a non-asymptotic upper bound on the estimation error for any monotone link function in the GLM. We conduct real-data experiments and demonstrate that our proposed method can achieve comparable performance to popular and powerful prediction methods such as XGBoost while simultaneously maintaining a high level of interpretability.
Science | arxiv.org

Graph Infomax Adversarial Learning for Treatment Effect Estimation with Networked Observational Data

Treatment effect estimation from observational data is a critical research topic across many domains. The foremost challenge in treatment effect estimation is how to capture hidden confounders. Recently, the growing availability of networked observational data offers a new opportunity to deal with the issue of hidden confounders. Unlike networked data in traditional graph learning tasks, such as node classification and link detection, the networked data in the causal inference problem has a distinctive property: an imbalanced network structure. In this paper, we propose a Graph Infomax Adversarial Learning (GIAL) model for treatment effect estimation, which makes full use of the network structure to capture more information by recognizing the imbalance in network structure. We evaluate the performance of our GIAL model on two benchmark datasets, and the results demonstrate superiority over the state-of-the-art methods.
Science | physiciansweekly.com

Estimation of pro-renin

The aim is to estimate serum pro-renin, and its clinical significance, as a marker of chronic renal disease in posterior urethral valve (PUV) patients. Forty patients with a PUV who were admitted to the hospital between 2010 and 2012 were reviewed and included in the study. Twenty age-matched patients who were admitted for other non-urological diseases were selected as controls. Clinical parameters, serum creatinine, urea, eGFR (estimated glomerular filtration rate) and serum pro-renin were analysed before and after valve ablation. Three groups were formed according to age: <1 year, 1–3 years, >3 years. Pro-renin was measured using an ELISA (enzyme linked immunosorbent assay) kit and ‘Graph Pad Prism’ Software. The Spearman’s rho test was used for correlation. Serum pro-renin had a negative correlation with the age group (correlation coefficient −0.395, P-value 0.012), eGFR (correlation coefficient −0.850, P-value < 0.001) and follow-up eGFR (correlation coefficient −0.471, P-value 0.002). The pro-renin level correlated positively with serum creatinine at presentation (correlation coefficient 0.671, P-value < 0.001), blood urea at initial presentation (correlation coefficient 0.684, P-value < 0.001), serum creatinine at follow-up (correlation coefficient 0.546, P-value < 0.001) and blood urea at follow-up (correlation coefficient 0.603, P-value < 0.001).
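For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how Spearman's rho can be computed: rank both variables (ties share their average rank) and take the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not taken from the study, which used dedicated statistical software.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed samples."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical values: a marker that falls as a filtration measure rises.
    filtration = [15, 30, 45, 60, 90]
    marker = [10.0, 8.0, 7.0, 3.0, 1.0]
    print(spearman_rho(filtration, marker))
```

A value near −1 indicates a strictly decreasing monotone relationship, and a value near +1 a strictly increasing one; the statistic does not assume linearity.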
Science | arxiv.org

On Inductive Biases for Heterogeneous Treatment Effect Estimation

We investigate how to exploit structural similarities of an individual's potential outcomes (POs) under different treatments to obtain better estimates of conditional average treatment effects in finite samples. Especially when it is unknown whether a treatment has an effect at all, it is natural to hypothesize that the POs are similar - yet, some existing strategies for treatment effect estimation employ regularization schemes that implicitly encourage heterogeneity even when it does not exist and fail to fully make use of shared structure. In this paper, we investigate and compare three end-to-end learning strategies to overcome this problem - based on regularization, reparametrization and a flexible multi-task architecture - each encoding inductive bias favoring shared behavior across POs. To build understanding of their relative strengths, we implement all strategies using neural networks and conduct a wide range of semi-synthetic experiments. We observe that all three approaches can lead to substantial improvements upon numerous baselines and gain insight into performance differences across various experimental settings.
Science | arxiv.org

Bayesian graphical modelling for heterogeneous causal effects

Our motivation stems from current medical research aiming at personalized treatment using a molecular-based approach. The goal is to develop a more precise and targeted decision making process, relative to traditional treatments based primarily on clinical diagnoses. A challenge we address is evaluating treatment effects for individuals affected by Glioblastoma (GBM), a brain cancer where targeted therapy is essential to improve patients' prospects. Specifically, we consider the pathway associated with cytokine TGF-beta, whose abnormal signalling activity has been found to be linked to the progression of GBM and other tumors. We analyze treatment effects within a causal framework represented by a Directed Acyclic Graph (DAG) model, whose vertices are the variables belonging to the TGF-beta pathway. A major obstacle in implementing the above program is individual heterogeneity, implying that patients will respond differently to the same therapy. We address this issue through an infinite mixture of Gaussian DAG-models where both the graphical structure and the allied model parameters are regarded as uncertain. Our procedure determines a clustering structure of the units reflecting the underlying heterogeneity, and produces subject-specific causal effects through Bayesian model averaging across a variety of model features. When applied to the GBM dataset, it reveals that regulation of TGF-beta proteins produces heterogeneous effects, represented by clusters of patients potentially benefiting from selective interventions.
Science | APS physics

Self-driven criticality in a stochastic epidemic model

We present a generic epidemic model with stochastic parameters in which the dynamics self-organize to a critical state with suppressed exponential growth. More precisely, the dynamics evolve into a quasi-steady state, where the effective reproduction rate fluctuates close to the critical value 1 for a long period, as indeed observed for different epidemics. The main assumption underlying the model is that the rate at which each individual becomes infected changes stochastically in time with a heavy-tailed steady state. The critical regime is characterized by an extremely long duration of the epidemic. Its stability is analyzed both numerically and analytically in different models.
Science | arxiv.org

Stochastic EM methods with Variance Reduction for Penalised PET Reconstructions

Expectation-maximization (EM) is a popular and well-established method for image reconstruction in positron emission tomography (PET) but it often suffers from slow convergence. Ordered subset EM (OSEM) is an effective reconstruction algorithm that provides significant acceleration during initial iterations, but it has been observed to enter a limit cycle. In this work, we investigate two classes of algorithms for accelerating OSEM based on variance reduction for penalised PET reconstructions. The first is a stochastic variance reduced EM algorithm, termed SVREM, which extends classical EM to the stochastic context by combining classical OSEM with insights from variance reduction techniques for gradient descent. The second views OSEM as a preconditioned stochastic gradient ascent, and applies variance reduction techniques, i.e., SAGA and SVRG, to estimate the update direction. We present several numerical experiments to illustrate the efficiency and accuracy of the approaches. The numerical results show that these approaches significantly outperform existing OSEM type methods for penalised PET reconstructions, and hold great potential.
Coding & Programming | arxiv.org

Minibatch and Momentum Model-based Methods for Stochastic Non-smooth Non-convex Optimization

Stochastic model-based methods have received increasing attention lately due to their appealing robustness to the stepsize selection and provable efficiency guarantee for non-smooth non-convex optimization. To further improve the performance of stochastic model-based methods, we make two important extensions. First, we propose a new minibatch algorithm which takes a set of samples to approximate the model function in each iteration. For the first time, we show that stochastic algorithms achieve linear speedup over the batch size even for non-smooth and non-convex problems. To this end, we develop a novel sensitivity analysis of the proximal mapping involved in each algorithm iteration. Our analysis may be of independent interest in more general settings. Second, motivated by the success of momentum techniques for convex optimization, we propose a new stochastic extrapolated model-based method to possibly improve the convergence in the non-smooth and non-convex setting. We obtain complexity guarantees for a fairly flexible range of extrapolation terms. In addition, we conduct experiments to show the empirical advantage of our proposed methods.
Mathematics | arxiv.org

Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of loss values produced by these algorithms which is expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity.
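As a toy illustration of the algorithm class studied here (not the paper's random-matrix analysis), the sketch below runs single-sample SGD with heavy-ball momentum on a small noiseless least squares problem. The function names, problem size, step size and momentum value are illustrative assumptions.

```python
import random


def make_problem(n, d, seed=0):
    """Hypothetical random least squares instance with a known solution."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(d)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
    return A, b, x_true


def sgd_momentum_least_squares(A, b, lr=0.005, momentum=0.9, steps=5000, seed=0):
    """Single-sample SGD with heavy-ball momentum on 0.5 * mean((A x - b)^2)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    v = [0.0] * d  # velocity (momentum buffer)
    for _ in range(steps):
        i = rng.randrange(n)  # draw one row uniformly at random
        residual = sum(A[i][j] * x[j] for j in range(d)) - b[i]
        for j in range(d):
            grad_j = residual * A[i][j]  # stochastic gradient coordinate
            v[j] = momentum * v[j] - lr * grad_j
            x[j] += v[j]
    return x


if __name__ == "__main__":
    A, b, x_true = make_problem(n=40, d=3, seed=42)
    x_hat = sgd_momentum_least_squares(A, b)
    print(max(abs(u - w) for u, w in zip(x_hat, x_true)))
```

Because the toy system is consistent (b is generated without noise), the stochastic gradient vanishes at the solution, so the iterates can approach it closely; with noisy targets they would instead settle into a neighborhood whose size depends on the step size and momentum.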
Science | arxiv.org

Causal aggregation: estimation and inference of causal effects by constraint-based data fusion

Randomized experiments are the gold standard for causal inference. In experiments, usually one variable is manipulated and its effect is measured on an outcome. However, practitioners may also be interested in the effect on a fixed target variable of simultaneous interventions on multiple covariates. We propose a novel method for estimating the effect of joint interventions using data from different experiments in which only very few variables are manipulated. If the joint causal effect is linear, the proposed method can be used for estimation and inference of joint causal effects, and we characterize conditions for identifiability. The proposed method makes it possible to combine data sets arising from randomized experiments as well as observational data sets for which IV assumptions or unconfoundedness hold: we indicate how to leverage all the available causal information to efficiently estimate the causal effects in the overidentified setting. If the dimension of the covariate vector is large, we may have data from experiments on every covariate, but only a few samples per randomized covariate. Under a sparsity assumption, we derive an estimator of the causal effects in this high-dimensional scenario. In addition, we show how to deal with the case where a lack of experimental constraints prevents direct estimation of the causal effects. When the joint causal effects are non-linear, we characterize conditions under which identifiability holds, and propose a non-linear causal aggregation methodology for experimental data sets, similar to the gradient boosting algorithm, where in each iteration we combine weak learners trained on different datasets using only unconfounded samples. We demonstrate the effectiveness of the proposed method on simulated and semi-synthetic data.
Science | arxiv.org

Stochastic Models of Neural Plasticity: A Scaling Approach

In neuroscience, synaptic plasticity refers to the set of mechanisms driving the dynamics of neuronal connections, called synapses and represented by a scalar value, the synaptic weight. A Spike-Timing Dependent Plasticity (STDP) rule is a biologically-based model representing the time evolution of the synaptic weight as a functional of the past spiking activity of adjacent neurons. A general mathematical framework has been introduced in [37].
Coding & Programming | arxiv.org

An Online Riemannian PCA for Stochastic Canonical Correlation Analysis

We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing methods for this problem which either require $O(d^3)$ time complexity per iteration with $O(\frac{1}{\sqrt{t}})$ convergence rate (where $d$ is the dimensionality) or only extract the top $1$ component with $O(\frac{1}{t})$ convergence rate. In contrast, our algorithm offers a strict improvement for this classical problem: it achieves $O(d^2k)$ runtime complexity per iteration for extracting the top $k$ canonical components with $O(\frac{1}{t})$ convergence rate. While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising. We also explore a potential application in training fair models where the label of protected attribute is missing or otherwise unavailable.
Computers | arxiv.org

BayesIMP: Uncertainty Quantification for Causal Data Fusion

While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quality and quantity, principled uncertainty quantification becomes essential. To that end, we introduce Bayesian Interventional Mean Processes, a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space, while taking into account the uncertainty within each causal graph. To demonstrate the utility of our uncertainty estimation, we apply our method to the Causal Bayesian Optimisation task and show improvements over state-of-the-art methods.
Politics | arxiv.org

Methodological considerations for estimating policy effects in the context of co-occurring policies

By Beth Ann Griffin, Megan S. Schuler, Joseph Pane, Stephen W. Patrick, Rosanna Smart, Bradley D. Stein, Geoffrey Grimm, Elizabeth A. Stuart

Objective. Understanding how best to estimate state-level policy effects is important, and several unanswered questions remain, particularly about optimal methods for disentangling the effects of concurrently implemented policies. In this paper, we examined the impact of co-occurring policies on the performance of commonly used models in state policy evaluations.
Coding & Programming | arxiv.org

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

By Alexander Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generalization performance. In this study, we approach this problem from a dynamical systems theory perspective and represent stochastic optimization algorithms as random iterated function systems (IFS). Well studied in the dynamical systems literature, under mild assumptions, such IFSs can be shown to be ergodic with an invariant measure that is often supported on sets with a fractal structure. As our main contribution, we prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure. Leveraging results from dynamical systems theory, we show that the generalization error can be explicitly linked to the choice of the algorithm (e.g., stochastic gradient descent, SGD), algorithm hyperparameters (e.g., step-size, batch-size), and the geometry of the problem (e.g., Hessian of the loss). We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden-layered neural networks) and algorithms (e.g., SGD and preconditioned variants), and obtain analytical estimates for our bound. For modern neural networks, we develop an efficient algorithm to compute the developed bound and support our theory with various experiments on neural networks.
Coding & Programming | arxiv.org

Safe Reinforcement Learning Using Advantage-Based Intervention

Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at this https URL.
Mathematics | arxiv.org

Fréchet derivatives of expected functionals of solutions to stochastic differential equations

In the analysis of stochastic dynamical systems described by stochastic differential equations (SDEs), it is often of interest to analyse the sensitivity of the expected value of a functional of the solution of the SDE with respect to perturbations in the SDE parameters. In this paper, we consider path functionals that depend on the solution of the SDE up to a stopping time. We derive formulas for Fréchet derivatives of the expected values of these functionals with respect to bounded perturbations of the drift, using the Cameron-Martin-Girsanov theorem for the change of measure. Using these derivatives, we construct an example to show that the map that sends the change of drift to the corresponding relative entropy is not in general convex. We then analyse the existence and uniqueness of solutions to stochastic optimal control problems defined on possibly random time intervals, as well as gradient-based numerical methods for solving such problems.
Computers | arxiv.org

Counterfactual Explanations as Interventions in Latent Space

Explainable Artificial Intelligence (XAI) is a set of techniques that allows the understanding of both technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial to help satisfy the increasingly important demand for trustworthy Artificial Intelligence, characterized by fundamental characteristics such as respect of human autonomy, prevention of harm, transparency, accountability, etc. Within XAI techniques, counterfactual explanations aim to provide end users with a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations, and in particular they fall short of considering the causal impact of such actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations capturing by design the underlying causal relations from the data, and at the same time to provide feasible recommendations to reach the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactual generator algorithms, thus minimising the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach with a set of different experiments using synthetic and real datasets (including a proprietary dataset of the financial domain).
Coding & Programming | arxiv.org

Augmented Tensor Decomposition with Stochastic Optimization

Tensor decompositions are powerful tools for dimensionality reduction and feature interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., Frobenius norm) are designed for fitting raw data under statistical assumptions, which may not align with downstream classification tasks. Also, real-world tensor data are usually high-order and have large dimensions with millions or billions of entries. Thus, it is expensive to decompose the whole tensor with traditional algorithms. In practice, raw tensor data also contain redundant information, while data augmentation techniques may be used to smooth out noise in samples. This paper addresses the above challenges by proposing augmented tensor decomposition (ATD), which effectively incorporates data augmentations to boost downstream classification. To reduce the memory footprint of the decomposition, we propose a stochastic algorithm that updates the factor matrices in a batch fashion. We evaluate ATD on multiple signal datasets. It shows comparable or better performance (e.g., up to 15% in accuracy) over self-supervised and autoencoder baselines with less than 5% of model parameters, achieves a 0.6% to 1.3% accuracy gain over other tensor-based baselines, and reduces the memory footprint by 9X when compared to standard tensor decomposition algorithms.
Computers | arxiv.org

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

By Aleksandr Beznosikov, Pavel Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U Stich, Alexander Gasnikov

We consider decentralized stochastic variational inequalities where the problem data is distributed across many participating devices (heterogeneous, or non-IID data setting). We propose a novel method, based on stochastic extra-gradient, where participating devices can communicate over arbitrary, possibly time-varying network topologies. This covers both the fully decentralized optimization setting and the centralized topologies commonly used in Federated Learning. Our method further supports multiple local updates on the workers for reducing the communication frequency between workers. We theoretically analyze the proposed scheme in the strongly monotone, monotone and non-monotone settings. As a special case, our method and analysis apply in particular to decentralized stochastic min-max problems, which are being studied with increased interest in Deep Learning. For example, the training objectives of Generative Adversarial Networks (GANs) are typically saddle point problems, and the decentralized training of GANs has been reported to be extremely challenging. While SOTA techniques rely on either repeated gossip rounds or proximal updates, we alleviate both of these requirements. Experimental results for decentralized GAN demonstrate the effectiveness of our proposed algorithm.