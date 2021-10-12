
Implicit Bias of Linear Equivariant Networks

By Hannah Lawrence, Kristian Georgiev, Andrew Dienes, Bobak T. Kiani
Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of scientific and technical applications by explicitly encoding group symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is

Building a Linear Regression by Hand

Let’s use Python to create all of the equations required to estimate our own line and validate our results without relying on libraries to train our model! We employ linear regression to forecast the value of Y based on the value(s) of X. Because training requires known values of Y, it is a supervised learning approach. Linear regression is classified into two types: simple and multiple. Let’s start with the simple one. The notebook with all the code is here. All the equations were made with LaTeX.
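As a rough sketch of the simple (one-variable) case, the closed-form least-squares estimates can be written in plain Python with no libraries. The function names and the toy data below are illustrative assumptions and are not taken from the notebook.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line (assumes ys are not all equal)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data lying exactly on y = 1 + 2x (hypothetical values for illustration).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

The same estimates can be checked against any library fit as a validation step; the multiple-regression case needs matrix operations and is not covered by this sketch.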
Resolution of The Linear-Bounded Automata Question

This work resolves a longstanding open question in automata theory, i.e. the {\it linear-bounded automata question} (in short, the {\it LBA question}), which can also be phrased succinctly in the language of computational complexity theory as $NSPACE[n]\overset{?}{=}DSPACE[n]$. We prove that $NSPACE[n]\neq DSPACE[n]$. Our proof technique is based on diagonalization against all deterministic Turing machines working in $O(n)$ space. Our proof also implies the following consequences:
A conservative implicit-PIC scheme for the hybrid kinetic-ion fluid-electron plasma model on curvilinear meshes

The hybrid kinetic-ion fluid-electron plasma model is widely used to study challenging multi-scale problems in space and laboratory plasma physics. Here, a novel conservative scheme for this model employing implicit particle-in-cell techniques is extended to arbitrary coordinate systems via curvilinear maps from logical to physical space. The scheme features a fully non-linear electromagnetic formulation with a multi-rate time advance, including sub-cycling and orbit-averaging for the kinetic ions. By careful choice of compatible particle-based kinetic-ion and mesh-based fluid-electron discretizations in curvilinear coordinates, as well as particle-mesh interpolations and implicit midpoint time advance, the scheme is proven to conserve total energy for arbitrary curvilinear meshes. In the electrostatic limit, the method is also proven to conserve total momentum for arbitrary curvilinear meshes. Although momentum is not conserved for arbitrary curvilinear meshes in the electromagnetic case, it is for an important subset of Cartesian tensor-packed meshes. The scheme and its novel conservation properties are demonstrated for several challenging numerical problems using different curvilinear meshes, including a merging flux-rope simulation for a space weather application, and a helical $m=1$ mode simulation for a magnetic fusion energy application.
Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints

This article introduces a novel nonparametric methodology for Generalized Linear Models which combines the strengths of the binary regression and latent variable formulations for categorical data, while overcoming their disadvantages. Requiring minimal assumptions, it extends and generalizes recently published parametric versions of the methodology. If the underlying data generating process is asymmetric, it gives uniformly better prediction and inference performance than the parametric formulation. Furthermore, it introduces a new classification statistic, with which I show that the methodology has better overall model fit, inference and classification performance than the parametric version, and that the difference in performance is statistically significant, especially if the data generating process is asymmetric. In addition, the methodology can be used to perform model diagnostics for any model specification. This is a highly useful result, and it extends existing work for categorical model diagnostics broadly across the sciences. The mathematical results also highlight important new findings regarding the interplay of statistical significance and scientific significance. Finally, the methodology is applied to various real-world datasets to show that it may outperform widely used existing models, including Random Forests and Deep Neural Networks, with very few iterations.
Sparsity in Partially Controllable Linear Systems

A fundamental concept in control theory is that of controllability, where any system state can be reached through an appropriate choice of control inputs. Indeed, a large body of classical and modern approaches are designed for controllable linear dynamical systems. However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only \emph{partially controllable}. The focus of this work is on a large class of partially controllable linear dynamical systems, specified by an underlying sparsity pattern. Our main results establish structural conditions and finite-sample guarantees for learning to control such systems. In particular, our structural results characterize those state variables which are irrelevant for optimal control, an analysis which departs from classical control techniques. Our algorithmic results adapt techniques from high-dimensional statistics -- specifically soft-thresholding and semiparametric least-squares -- to exploit the underlying sparsity pattern in order to obtain finite-sample guarantees that significantly improve over those based on certainty-equivalence. We also corroborate these theoretical improvements over certainty-equivalent control through a simulation study.
Can Facebook’s smart glasses be smart about security and privacy?

Facebook’s smart glasses ambitions are in the news again. The company has launched a worldwide project dubbed Ego4D to research new uses for smart glasses. In September, Facebook unveiled its Ray-Ban Stories glasses, which have two cameras and three microphones built in. The glasses capture audio and video so wearers can record their experiences and interactions. The research project aims to add augmented reality features to smart glasses using artificial intelligence technologies that could provide wearers with a wealth of information, including the ability to get answers to questions like “Where did I leave my keys?” Facebook’s vision also includes...
Learning Atomic Multipoles: Prediction of the Electrostatic Potential with Equivariant Graph Neural Networks

The accurate description of electrostatic interactions remains a challenging problem for fitted potential-energy functions. The commonly used fixed partial-charge approximation fails to reproduce the electrostatic potential at short range due to its insensitivity to conformational changes and anisotropic effects. At the same time, possibly more accurate machine-learned (ML) potentials struggle with the long-range behaviour due to their inherent locality ansatz. Employing a multipole expansion offers in principle an exact treatment of the electrostatic potential such that the long-range and short-range electrostatic interaction can be treated simultaneously with high accuracy. However, such an expansion requires the calculation of the electron density using computationally expensive quantum-mechanical (QM) methods. Here, we introduce an equivariant graph neural network (GNN) to address this issue. The proposed model predicts atomic multipoles up to the quadrupole, circumventing the need of expensive QM computations. By using an equivariant architecture, the model enforces the correct symmetry by design without relying on local reference frames. The GNN reproduces the electrostatic potential of various systems with high fidelity. Possible use cases for such an approach include the separate treatment of long-range interactions in ML potentials, the analysis of electrostatic potential surfaces, and the application in polarizable force fields.
HyperCube: Implicit Field Representations of Voxelized 3D Models

Recently introduced implicit field representations offer an effective way of generating 3D object shapes. They leverage an implicit decoder trained to take a 3D point coordinate concatenated with a shape encoding and to output a value which indicates whether the point is outside the shape or not. Although this approach enables efficient rendering of visually plausible objects, it has two significant limitations. First, it is based on a single neural network dedicated to all objects from a training set, which makes both the training procedure and its application in real life cumbersome. More importantly, the implicit decoder takes only points sampled within voxels (and not the entire voxels), which yields problems at the classification boundaries and results in empty spaces within the rendered mesh.
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent works in two phases. In the exploration phase, the agent interacts with the environment and collects samples without the reward. In the planning phase, the agent is given a specific reward function and uses samples collected from the exploration phase to learn a good policy. We propose a new provably efficient algorithm, called UCRL-RFE, under the Linear Mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde O(H^5d^2\epsilon^{-2})$ episodes during the exploration phase. Here, $H$ is the length of the episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde O(H^4d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of Linear Mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in terms of the dependence on $\epsilon$ and the dependence on $d$ if $H \ge d$.
Modelling wave propagation in elastic solids via high-order accurate implicit-mesh discontinuous Galerkin methods

A high-order accurate implicit-mesh discontinuous Galerkin framework for wave propagation in single-phase and bi-phase solids is presented. The framework belongs to the embedded-boundary techniques and its novelty lies in the spatial discretization, which enables boundary and interface conditions to be enforced with high-order accuracy on curved embedded geometries. High-order accuracy is achieved via high-order quadrature rules for implicitly-defined domains and boundaries, whilst a cell-merging strategy addresses the presence of small cut cells. The framework is used to discretize the governing equations of elastodynamics, written using a first-order hyperbolic momentum-strain formulation, and an exact Riemann solver is employed to compute the numerical flux at the interface between dissimilar materials with general anisotropic properties. The space-discretized equations are then advanced in time using explicit high-order Runge-Kutta algorithms. Several two- and three-dimensional numerical tests including dynamic adaptive mesh refinement are presented to demonstrate the high-order accuracy and the capability of the method in the elastodynamic analysis of single- and bi-phase solids containing complex geometries.
Sparse Implicit Processes for Approximate Inference

Implicit Processes (IPs) are flexible priors that can describe models such as Bayesian neural networks, neural samplers and data generators. IPs allow for approximate inference in function-space. This avoids some degenerate problems of parameter-space approximate inference due to the high number of parameters and strong dependencies. For this, an extra IP is often used to approximate the posterior of the prior IP. However, simultaneously adjusting the parameters of the prior IP and the approximate posterior IP is a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot fit the prior IP to the observed data. We propose here a method that can carry out both tasks. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.
Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

Stochastic gradient descent (SGD) has been demonstrated to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decays, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regression problems (Ge et al., 2019), and provably outperforms SGD with polynomially decaying stepsize in terms of the statistical minimax rates. However, a sharp analysis for the last iterate of SGD with decaying stepsize in the overparameterized setting is still open. In this paper, we provide a problem-dependent analysis of the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems. In particular, for SGD with geometrically decaying stepsize (or tail geometrically decaying stepsize), we prove nearly matching upper and lower bounds on the excess risk. Our results demonstrate the generalization ability of SGD for a wide class of overparameterized problems, and can recover the minimax optimal results up to logarithmic factors in the classical regime. Moreover, we provide an excess risk lower bound for SGD with polynomially decaying stepsize and illustrate the advantage of geometrically decaying stepsize in an instance-wise manner, which complements the minimax rate comparison made in previous work.
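The stepsize schedule described above can be sketched in a few lines of plain Python. This is only an illustration of "constant stepsize, then geometric decays, output the last iterate" on a small, well-conditioned toy problem; the equal-length phases, the decay factor of 0.5, the function names and the synthetic data are all assumptions of this sketch and are not taken from the paper, and the toy problem is not overparameterized.

```python
import random


def sgd_last_iterate(samples, dim, initial_stepsize, num_phases, decay=0.5, seed=0):
    """One pass of SGD on the squared loss, returning the last iterate.

    The stepsize is constant within a phase and multiplied by `decay`
    between phases (equal-length phases are an assumption of this sketch).
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    phase_len = max(1, len(samples) // num_phases)
    w = [0.0] * dim
    for t, idx in enumerate(order):
        x, y = samples[idx]
        phase = min(t // phase_len, num_phases - 1)
        stepsize = initial_stepsize * decay ** phase
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of 0.5 * (w.x - y)^2 with respect to w is residual * x.
        w = [wi - stepsize * residual * xi for wi, xi in zip(w, x)]
    return w


def make_linear_data(true_w, n, noise_std, seed=1):
    """Synthetic data y = <true_w, x> + Gaussian noise, with standard normal features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


# Hypothetical toy problem: two features, 2000 samples, mild label noise.
data = make_linear_data([1.0, -2.0], n=2000, noise_std=0.1)
print(sgd_last_iterate(data, dim=2, initial_stepsize=0.1, num_phases=4))
```

Returning the final `w` rather than an average of the iterates is the point of the sketch: the later, smaller stepsizes are what damp the noise in the last iterate.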
De-biased Lasso for Generalized Linear Models with A Diverging Number of Covariates

Modeling and drawing inference on the joint associations between single nucleotide polymorphisms and a disease has sparked interest in genome-wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the de-biased lasso approach (van de Geer et al. 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this "large $n$, diverging $p$" scenario, we propose an alternative de-biased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original de-biased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined de-biased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large-scale hospital-based epidemiology cohort study that investigates the joint effects of genetic variants on lung cancer risks.
AIR-Net: Adaptive and Implicit Regularization Neural Network for Matrix Completion

Conventionally, the matrix completion (MC) model aims to recover a matrix from partially observed elements. Accurate recovery necessarily requires a regularization that properly encodes priors of the unknown matrix/signal. However, encoding the priors accurately for a complex natural signal is difficult, and even then, the model might not generalize well outside the particular matrix type. This work combines adaptive and implicit low-rank regularization that captures the prior dynamically according to the current recovered matrix. Furthermore, we aim to answer the question: how does adaptive regularization affect implicit regularization? We utilize neural networks to represent Adaptive and Implicit Regularization and name the proposed model \textit{AIR-Net}. Theoretical analyses show that the adaptive part of the AIR-Net enhances implicit regularization. In addition, the adaptive regularizer vanishes at the end, and thus can avoid saturation issues. Numerical experiments on various data demonstrate the effectiveness of AIR-Net, especially when the locations of missing elements are not randomly chosen. With complete flexibility to select neural networks for matrix representation, AIR-Net can be extended to solve more general inverse problems.
Toward Annotator Group Bias in Crowdsourcing

Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though there are a few works investigating individual annotator bias, the group effects in annotators are largely overlooked. In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks and thus we conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework GroupAnno to capture annotator group bias with a new extended Expectation Maximization (EM) training algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines.
Sparse Linear Mixed Model Selection via Streamlined Variational Bayes

Linear mixed models are a versatile statistical tool to study data by accounting for fixed effects and random effects from multiple sources of variability. In many situations, a large number of candidate fixed effects is available and it is of interest to select a parsimonious subset of those that are effectively relevant for predicting the response variable. Variational approximations facilitate fast approximate Bayesian inference for the parameters of a variety of statistical models, including linear mixed models. However, for models having a high number of fixed or random effects, simple application of standard variational inference principles does not lead to fast approximate inference algorithms, due to the size of model design matrices and inefficient treatment of sparse matrix problems arising from the required approximating density parameter updates. We illustrate how recently developed streamlined variational inference procedures can be generalized to make fast and accurate inference for the parameters of linear mixed models with nested random effects and global-local priors for Bayesian fixed effects selection. Our variational inference algorithms achieve convergence to the same optima as their standard implementations, although with significantly lower computational effort, memory usage and time, especially for large numbers of random effects. Using simulated and real data examples, we assess the quality of automated procedures for fixed effects selection that are free from hyperparameter tuning and only rely upon variational posterior approximations. Moreover, we show high accuracy of variational approximations against model fitting via Markov Chain Monte Carlo sampling.
Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?

Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group, such as translations and rotations. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We address this gap by providing a generalization of Cover's Function Counting Theorem that quantifies the number of linearly separable and group-invariant binary dichotomies that can be assigned to equivariant representations of objects. We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action. We show how this relation extends to operations such as convolutions, element-wise nonlinearities, and global and local pooling. While other operations do not change the fraction of separable dichotomies, local pooling decreases the fraction, despite being a highly nonlinear operation. Finally, we test our theory on intermediate representations of randomly initialized and fully trained convolutional neural networks and find perfect agreement.
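For reference, the classical theorem being generalized can be evaluated directly. The sketch below implements Cover's original count for $P$ points in general position in $N$ dimensions, $C(P, N) = 2\sum_{k=0}^{N-1}\binom{P-1}{k}$, and the corresponding fraction of all $2^P$ dichotomies. It is not the paper's group-invariant formula, and the function names are illustrative.

```python
from math import comb


def cover_count(num_points, dim):
    """Number of linearly separable dichotomies (by hyperplanes through the origin)
    of `num_points` points in general position in `dim` dimensions."""
    if num_points < 1 or dim < 1:
        raise ValueError("num_points and dim must be positive")
    # math.comb returns 0 when k exceeds num_points - 1, so large dim is handled.
    return 2 * sum(comb(num_points - 1, k) for k in range(dim))


def separable_fraction(num_points, dim):
    """Fraction of all 2**num_points dichotomies that are linearly separable."""
    return cover_count(num_points, dim) / 2 ** num_points


# The fraction is 1 while num_points <= dim and equals 1/2 at num_points = 2 * dim.
for p in (5, 10, 20, 40):
    print(p, separable_fraction(p, dim=10))
```

The sharp drop of the fraction around `num_points = 2 * dim` is the capacity notion that the abstract's dimension-dependent result refers to.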
Understanding Lyapunov Equation through Kronecker Product and Linear Equation

The Lyapunov equation is a certain type of matrix equation, and it is very famous in many branches of control theory, such as stability analysis and optimal control. The terminology of the Lyapunov equation originates from the name of the Russian mathematician Aleksandr Lyapunov. He is known for his development and achievement of the stability theory of a dynamical system, as well as for his many contributions to mathematical physics and probability theory [1].
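As a brief illustration of the idea in the title, the matrix equation can be rewritten as an ordinary linear system. The derivation below assumes the continuous-time form of the Lyapunov equation and the column-stacking operator $\mathrm{vec}(\cdot)$; the symbols $A$, $X$, $Q$ and $n$ are notation chosen here and are not taken from the article.

```latex
\begin{aligned}
A X + X A^{\top} + Q &= 0, \qquad A, X, Q \in \mathbb{R}^{n \times n},\\
\mathrm{vec}(A X B) &= (B^{\top} \otimes A)\,\mathrm{vec}(X),\\
\mathrm{vec}(A X) &= (I_n \otimes A)\,\mathrm{vec}(X), \qquad
\mathrm{vec}(X A^{\top}) = (A \otimes I_n)\,\mathrm{vec}(X),\\
(I_n \otimes A + A \otimes I_n)\,\mathrm{vec}(X) &= -\mathrm{vec}(Q).
\end{aligned}
```

The last line is an $n^2 \times n^2$ linear system in the entries of $X$. Its coefficient matrix has eigenvalues $\lambda_i + \lambda_j$, where $\lambda_i$ are the eigenvalues of $A$, so the solution is unique exactly when no two eigenvalues of $A$ sum to zero, which holds in particular when every eigenvalue of $A$ has negative real part.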
LPRules: Rule Induction in Knowledge Graphs Using Linear Programming

Knowledge graph (KG) completion is a well-studied problem in AI. Rule-based methods and embedding-based methods form two of the solution techniques. Rule-based methods learn first-order logic rules that capture existing facts in an input graph and then use these rules for reasoning about missing facts. A major drawback of such methods is the lack of scalability to large datasets. In this paper, we present a simple linear programming (LP) model to choose rules from a list of candidate rules and assign weights to them. For smaller KGs, we use simple heuristics to create the candidate list. For larger KGs, we start with a small initial candidate list, and then use standard column generation ideas to add more rules in order to improve the LP model objective value. To foster interpretability and generalizability, we limit the complexity of the set of chosen rules via explicit constraints, and tune the complexity hyperparameter for individual datasets. We show that our method can obtain state-of-the-art results for three out of four widely used KG datasets, while taking significantly less computing time than other popular rule learners including some based on neuro-symbolic methods. The improved scalability of our method allows us to tackle large datasets such as YAGO3-10.
A Physicist Quantified The Amount of Information in The Entire Observable Universe

In attempts to understand the very nature of our reality, physicists sure have some mind-bending theories. Like what if information is a tangible and fundamental aspect of physical reality itself – alongside matter and energy? Or, alternatively, what if information is the fifth state of matter? Information is, after all, something all matter and energy measurably possess. The rules that govern their existence, like their mass, speed, or charge, are all bits of information they contain. So to allow experimental probing of such ideas, physicist Melvin Vopson from the University of Portsmouth in the UK estimated how much information a single elementary...
