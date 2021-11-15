
Scalable Intervention Target Estimation in Linear Models

By Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, Ali Tajer
 5 days ago

This paper considers the problem of estimating the unknown intervention targets in a causal directed acyclic graph from observational and interventional data. The focus is on soft interventions in linear structural equation models (SEMs). Current approaches to causal structure learning either work with...

Enumerating Independent Linear Inferences

A linear inference is a valid inequality of Boolean algebra in which each variable occurs at most once on each side. Equivalently, it is a linear rewrite rule on Boolean terms that constitutes a valid implication. Linear inferences have played a significant role in structural proof theory, in particular in models of substructural logics and in normalisation arguments for deep inference proof systems.
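
As a minimal sketch of the definition above, a candidate linear inference can be checked by brute force over all Boolean assignments. The two examples used here, commonly called switch and medial, are standard ones supplied for illustration and are not taken from the abstract; the helper name `is_valid_inference` is hypothetical.

```python
from itertools import product

def is_valid_inference(lhs, rhs, num_vars):
    """Return True if lhs implies rhs under every Boolean assignment."""
    return all(
        (not lhs(*v)) or rhs(*v)
        for v in product([False, True], repeat=num_vars)
    )

# Switch: x AND (y OR z)  ->  (x AND y) OR z
# Each variable occurs exactly once on each side.
switch_lhs = lambda x, y, z: x and (y or z)
switch_rhs = lambda x, y, z: (x and y) or z

# Medial: (w AND x) OR (y AND z)  ->  (w OR y) AND (x OR z)
medial_lhs = lambda w, x, y, z: (w and x) or (y and z)
medial_rhs = lambda w, x, y, z: (w or y) and (x or z)

switch_ok = is_valid_inference(switch_lhs, switch_rhs, 3)
medial_ok = is_valid_inference(medial_lhs, medial_rhs, 4)

# The converse of switch is linear in shape but not a valid implication.
switch_converse_ok = is_valid_inference(switch_rhs, switch_lhs, 3)
```

The truth-table check is exponential in the number of variables, so it only illustrates the definition; it is not the enumeration method of the paper.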
Exponential Dichotomy for Noninvertible Linear Difference Equations

In this article we study exponential dichotomies for noninvertible linear difference equations in finite dimensions. After giving the definition, we study the extent to which the projection $P(k)$ in a dichotomy is unique. For equations on $\mathbb{Z}$ it is unique but for equations on $\mathbb{Z}_+$ only its range is unique and for $\mathbb{Z}_-$ only its nullspace.
Scalable Qubit Representations of Neutrino Mixing Matrices

Oscillating neutrino beams exhibit quantum coherence over distances of thousands of kilometers. Their unambiguously quantum nature suggests an appealing test system for direct quantum simulation. Such techniques may enable presently analytically intractable calculations involving multi-neutrino entanglements, such as collective neutrino oscillations in supernovae, but only once oscillation phenomenology is properly re-expressed in the language of quantum circuits. Here we resolve outstanding conceptual issues regarding encoding of arbitrarily mixed neutrino flavor states in the Hilbert space of an n-qubit quantum computer. We introduce algorithms to encode mixing and oscillation of any number of flavor-mixed neutrinos, both with and without CP-violation, with an efficient number of prescriptive input parameters in terms of sub-rotations of the PMNS matrix in standard form. Examples encoded for an IBM-Q quantum computer are shown to converge to analytic predictions both with and without CP-violation.
Function-on-function linear quantile regression

In this study, we propose a function-on-function linear quantile regression model that allows for more than one functional predictor to establish a more flexible and robust approach. The proposed model is first transformed into a finite-dimensional space via the functional principal component analysis paradigm in the estimation phase. It is then approximated using the estimated functional principal component functions, and the estimated parameter of the quantile regression model is constructed based on the principal component scores. In addition, we propose a Bayesian information criterion to determine the optimum number of truncation constants used in the functional principal component decomposition. Moreover, a stepwise forward procedure and the Bayesian information criterion are used to determine the significant predictors for inclusion in the model. We employ a nonparametric bootstrap procedure to construct prediction intervals for the response functions. The finite sample performance of the proposed method is evaluated via several Monte Carlo experiments and an empirical data example, and the results produced by the proposed method are compared with those from existing models.
Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis

By Sathyanarayanan N. Aakur, Vineela Indla, Vennela Indla, Sai Narayanan, Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran

Advances in next-generation metagenome sequencing have the potential to revolutionize the point-of-care diagnosis of novel pathogen infections, which could help prevent potential widespread transmission of diseases. Given the high volume of metagenome sequences, there is a need for scalable frameworks to analyze and segment metagenome sequences from clinical samples, which can be highly imbalanced. There is an increased need for learning robust representations from metagenome reads since pathogens within a family can have highly similar genome structures (some more than 90%) and hence enable the segmentation and identification of novel pathogen sequences with limited labeled data. In this work, we propose Metagenome2Vec - a contextualized representation that captures the global structural properties inherent in metagenome data and local contextualized properties through self-supervised representation learning. We show that the learned representations can help detect six (6) related pathogens from clinical samples with less than 100 labeled sequences. Extensive experiments on simulated and clinical metagenome data show that the proposed representation encodes compositional properties that can generalize beyond annotations to segment novel pathogens in an unsupervised setting.
MLHarness: A Scalable Benchmarking System for MLCommons

With society's growing adoption of machine learning (ML) and deep learning (DL) for various intelligent solutions, it becomes increasingly imperative to standardize a common set of measures for ML/DL models with large-scale open datasets under common development practices and resources so that people can benchmark and compare model quality and performance on common ground. MLCommons has emerged recently as a driving force from both industry and academia to orchestrate such an effort. Despite its wide adoption as a set of standardized benchmarks, MLCommons Inference has only included a limited number of ML/DL models (in fact seven models in total). This significantly limits the generality of MLCommons Inference's benchmarking results because there are many more novel ML/DL models from the research community, solving a wide range of problems with different input and output modalities. To address this limitation, we propose MLHarness, a scalable benchmarking harness system for MLCommons Inference with three distinctive features: (1) it codifies the standard benchmark process as defined by MLCommons Inference including the models, datasets, DL frameworks, and software and hardware systems; (2) it provides an easy and declarative approach for model developers to contribute their models and datasets to MLCommons Inference; and (3) it includes support for a wide range of models with varying input/output modalities so that we can scalably benchmark these models across different datasets, frameworks, and hardware systems. This harness system is developed on top of the MLModelScope system, and will be open sourced to the community. Our experimental results demonstrate the superior flexibility and scalability of this harness system for MLCommons Inference benchmarking.
Fast and Scalable Spike and Slab Variable Selection in High-Dimensional Gaussian Processes

Variable selection in Gaussian processes (GPs) is typically undertaken by thresholding the inverse lengthscales of 'automatic relevance determination' kernels, but in high-dimensional datasets this approach can be unreliable. A more probabilistically principled alternative is to use spike and slab priors and infer a posterior probability of variable inclusion. However, existing implementations in GPs are extremely costly to run in both high-dimensional and large-$n$ datasets, or are intractable for most kernels. As such, we develop a fast and scalable variational inference algorithm for the spike and slab GP that is tractable with arbitrary differentiable kernels. We improve our algorithm's ability to adapt to the sparsity of relevant variables by Bayesian model averaging over hyperparameters, and achieve substantial speed ups using zero temperature posterior restrictions, dropout pruning and nearest neighbour minibatching. In experiments our method consistently outperforms vanilla and sparse variational GPs whilst retaining similar runtimes (even when $n=10^6$) and performs competitively with a spike and slab GP using MCMC but runs up to $1000$ times faster.
Neutron star properties with careful parameterization in the (axial)vector meson extended linear sigma model

The existence of quark matter inside the cores of heavy neutron stars is a possibility which can be probed with modern astrophysical observations. We use an (axial)vector meson extended quark-meson model to describe quark matter in the core of neutron stars. We discover that an additional parameter constraint is necessary in the quark model to ensure chiral restoration at high densities. By investigating hybrid star sequences with various parameter sets we show that low sigma meson masses are needed to fulfill the upper radius constraints, and that the maximum mass of stable hybrid stars is only slightly dependent on the parameters of the crossover-type phase transition. Using this observation and results from recent astrophysical measurements a constraint of $2.6 < g_V < 4.3$ is set for the constituent quark - vector meson coupling. The effect of a nonzero bag constant is also investigated and we observe that its effect is small for values adopted in previous works.
Inferential SIR-GN: Scalable Graph Representation Learning

Graph representation learning methods generate numerical vector representations for the nodes in a network, thereby enabling their use in standard machine learning models. These methods aim to preserve relational information, such that nodes that are similar in the graph are found close to one another in the representation space. Similarity can be based largely on one of two notions: connectivity or structural role. In tasks where node structural role is important, connectivity-based methods show poor performance. Recent work has begun to focus on scalability of learning methods to massive graphs of millions to billions of nodes and edges. Many unsupervised node representation learning algorithms are incapable of scaling to large graphs, and are unable to generate node representations for unseen nodes. In this work, we propose Inferential SIR-GN, a model which is pre-trained on random graphs, then computes node representations rapidly, including for very large networks. We demonstrate that the model is able to capture nodes' structural role information, and show excellent performance at node and graph classification tasks on unseen networks. Additionally, we observe that the scalability of Inferential SIR-GN is comparable to the fastest current approaches for massive graphs.
Rethinking Deconvolution for 2D Human Pose Estimation Light yet Accurate Model for Real-time Edge Computing

In this study, we present a pragmatic lightweight pose estimation model. Our model can achieve real-time predictions using low-power embedded devices. The system reaches 94.5% of the accuracy of the SOTA HRNet 256x192 at only 3.8% of its computational cost on the COCO test dataset. Our model adopts an encoder-decoder architecture and is carefully downsized to improve its efficiency. We especially focused on optimizing the deconvolution layers and observed that reducing the channels of the deconvolution layers contributes significantly to reducing computational resource consumption without degrading the accuracy of the system. We also incorporated recent model-agnostic techniques such as DarkPose and distillation training to maximize the efficiency of our model. Furthermore, we applied model quantization to exploit multi/mixed precision features. Our FP16'ed model (COCO AP 70.0) operates at ~60 fps on NVIDIA Jetson AGX Xavier and ~200 fps on NVIDIA Quadro RTX6000.
Comparison of linear Brill and Teukolsky waves

Motivated by studies of critical phenomena in the gravitational collapse of vacuum gravitational waves we compare, at the linear level, two common approaches to constructing gravitational-wave initial data. Specifically, we construct analytical, linear Brill wave initial data and compare these with Teukolsky waves in an attempt to understand the different numerical behavior observed in dynamical (nonlinear) evolutions of these two different sets of data. In general, the Brill waves indeed feature higher multipole moments than the quadrupolar Teukolsky waves, which might have provided an explanation for the differences observed in the dynamical evolution of the two types of waves. However, we also find that, for a common choice of the Brill-wave seed function, all higher-order moments vanish identically, rendering the (linear) Brill initial data surprisingly similar to the Teukolsky data for a similarly common choice of its seed function.
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm

We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector $b\in\mathbb{R}^n$ and output a near minimizer to $\min_{x\in\mathbb{R}^d}\|Ax-b\|$, where $A\in\mathbb{R}^{n \times d}$ is a design matrix and $\|\cdot\|$ is some loss function. For $\ell_p$ norm regression for any...
Papaya: Practical, Private, and Scalable Federated Learning

By Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek

Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning, variability in the...
What is so general about Generalized Linear Model?

Generalized form of regression model for linear and non-linear data. We have used Linear Regression in many use cases to model the linear relationship between a scalar response variable and one or more explanatory variables. Linear regression makes some key assumptions, such as normality and constant variance of the response variable. So, what happens if the response variable does not follow the “usual” assumptions like normality and constant variance? The Generalized Linear Model (GLM) is one of the commonly used approaches for data transformation to tackle that issue. But the problem is that GLM consists of a lot of terms, notations and components, so it can be a little confusing to grasp the idea. But don’t worry, I’m here to help you understand all the concepts clearly.
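
To make the idea concrete, here is a minimal sketch of one GLM, assuming a Poisson response with a log link and a single explanatory variable. That choice, the function name `fit_poisson_glm`, and the noise-free toy data are all assumptions made for illustration; none of them come from the article. The fit uses iteratively reweighted least squares (IRLS) in plain Python.

```python
import math

def fit_poisson_glm(xs, ys, iterations=25):
    """Fit log E[y] = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the weighted normal equations (X^T W X) b = X^T W z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x          # linear predictor
            mu = math.exp(eta)         # inverse of the log link
            w = mu                     # IRLS weight for Poisson with log link
            z = eta + (y - mu) / mu    # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the 2x2 system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1

# Hypothetical data: exact means of a model with b0 = 0.5 and b1 = 0.3.
xs = [i / 10 for i in range(21)]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]
b0, b1 = fit_poisson_glm(xs, ys)
```

Because the toy responses are exact means rather than sampled counts, the fit should recover the generating coefficients; with real count data the estimates would only approximate them.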
Graphical Piecewise-Linear Algebra

Graphical (Linear) Algebra is a family of diagrammatic languages that allow compositional reasoning about different kinds of subsets of vector spaces. It has been used to model various application domains, from signal-flow graphs to Petri nets and electrical circuits. In this paper, we introduce to the family its most expressive member to date: Graphical Piecewise-Linear Algebra, a new language to specify piecewise-linear subsets of vector spaces. Like the previous members of the family, it comes with a complete axiomatisation, which means it can be used to reason about the corresponding semantic domain purely equationally, forgetting the set-theoretic interpretation. We show completeness using a single axiom on top of Graphical Polyhedral Algebra, and show that this extension is the smallest that can capture a variety of relevant constructs. Finally, we showcase its use by modelling the behaviour of stateless electronic circuits of ideal elements, a domain that had remained outside the remit of previous diagrammatic languages.
Universal and data-adaptive algorithms for model selection in linear contextual bandits

Model selection in contextual bandits is an important complementary problem to regret minimization with respect to a fixed model class. We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1- \alpha})$ with no feature diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, recovering two prior results that hold under distinct distributional assumptions, simultaneously. The second removes distributional assumptions altogether, expanding the scope for tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
Can Information Flows Suggest Targets for Interventions in Neural Circuits?

Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable), and separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest targets for interventions, answering the title's question in the affirmative. We also evaluate bias-accuracy tradeoffs for different intervention strategies, to analyze how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.
Double Control Variates for Gradient Estimation in Discrete Latent Variable Models

Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradients. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate, and try to further reduce the variance of the overall estimator. We develop a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions. For training discrete latent variable models, such as variational autoencoders with binary latent variables, our approach adds no extra computational cost compared to standard training with the REINFORCE leave-one-out estimator. We apply our method to challenging high-dimensional toy examples and training variational autoencoders with binary latent variables. We show that our estimator can have lower variance compared to other state-of-the-art estimators.
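
For orientation, the baseline that the paper builds on, the REINFORCE leave-one-out estimator, can be sketched for a single Bernoulli variable. This is only the plain leave-one-out estimator under assumed choices (a logit parameterisation, a toy objective `f`, and hypothetical function names); it does not implement the double control variates proposed in the paper.

```python
import math
from itertools import product

def rloo_gradient(samples, f, theta):
    """REINFORCE leave-one-out estimate of d/dphi E[f(x)] for
    x ~ Bernoulli(theta), where theta = sigmoid(phi)."""
    k = len(samples)
    values = [f(x) for x in samples]
    total = sum(values)
    grad = 0.0
    for x, fx in zip(samples, values):
        baseline = (total - fx) / (k - 1)  # mean of f over the other samples
        score = x - theta                  # d/dphi log p(x) under the logit parameterisation
        grad += (fx - baseline) * score
    return grad / k

def expected_estimate(f, theta, k):
    """Exact expectation of the estimator, enumerating all k-sample draws."""
    expectation = 0.0
    for samples in product([0, 1], repeat=k):
        prob = math.prod(theta if x == 1 else 1.0 - theta for x in samples)
        expectation += prob * rloo_gradient(samples, f, theta)
    return expectation

f = lambda x: (x - 0.45) ** 2   # toy objective, chosen arbitrarily
theta = 0.3
exact = theta * (1 - theta) * (f(1) - f(0))   # analytic gradient w.r.t. phi
estimated_mean = expected_estimate(f, theta, k=3)
```

Enumerating every possible draw checks unbiasedness without sampling noise: the expectation of the estimator matches the analytic gradient, while the variance of individual estimates is what control variates aim to reduce.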
All-optical scalable spatial coherent Ising machine

Networks of optical oscillators simulating coupled Ising spins have been recently proposed as a heuristic platform to solve hard optimization problems. These networks, called coherent Ising machines (CIMs), exploit the fact that the collective nonlinear dynamics of coupled oscillators can drive the system close to the global minimum of the classical Ising Hamiltonian, encoded in the coupling matrix of the network. To date, realizations of large-scale CIMs have been demonstrated using hybrid optical-electronic setups, where optical oscillators simulating different spins are subject to electronic feedback mechanisms emulating their mutual interaction. While the optical evolution ensures an ultrafast computation, the electronic coupling represents a bottleneck that causes the computational time to severely depend on the system size. Here, we propose an all-optical scalable CIM with fully-programmable coupling. Our setup consists of an optical parametric amplifier with a spatial light modulator (SLM) within the parametric cavity. The spin variables are encoded in the binary phases of the optical wavefront of the signal beam at different spatial points, defined by the pixels of the SLM. We first discuss how different coupling topologies can be achieved by different configurations of the SLM, and then benchmark our setup with a numerical simulation that mimics the dynamics of the proposed machine. In our proposal, both the spin dynamics and the coupling are fully performed in parallel, paving the way towards the realization of size-independent ultrafast optical hardware for large-scale computation purposes.
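
The optimization target mentioned above can be illustrated with a small brute-force sketch. It assumes the common convention $H = -\frac{1}{2}\sum_{i \neq j} J_{ij} s_i s_j$ with spins $s_i = \pm 1$; the sign convention, the function names, and the four-spin ferromagnetic chain are assumptions for illustration and are not taken from the paper. A coherent Ising machine searches for such minima heuristically, whereas this code simply enumerates all configurations.

```python
from itertools import product

def ising_energy(spins, coupling):
    """Energy H = -1/2 * sum over i != j of J_ij * s_i * s_j (assumed convention)."""
    n = len(spins)
    return -0.5 * sum(
        coupling[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )

def ground_states(coupling):
    """Enumerate all 2^n spin configurations and keep those of minimal energy."""
    n = len(coupling)
    best_energy = None
    best_states = []
    for spins in product([-1, 1], repeat=n):
        energy = ising_energy(spins, coupling)
        if best_energy is None or energy < best_energy:
            best_energy, best_states = energy, [spins]
        elif energy == best_energy:
            best_states.append(spins)
    return best_energy, best_states

# Hypothetical coupling matrix: ferromagnetic chain of four spins.
J = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
min_energy, states = ground_states(J)
```

Exhaustive search scales as $2^n$, which is why heuristic hardware and algorithms are of interest for large systems.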
Prioritizing Scalability, Reliability and Security in Engineering

As digital products and services are more deeply embedded in critical industries and infrastructure and the implications of problems grow in scope, engineering organizations are renewing their focus on building platforms that are scalable, reliable and secure. More importantly, they are recognizing that this isn’t the responsibility of an SRE or CSO; it’s an ethos that must be carried each day by every engineer on the team. This is an important shift, from teams and cultures that valued speed of delivery above all else to teams and cultures that make high-quality systems the foundation of everything they do.
