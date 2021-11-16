
Mathematics

Sparse Regularization with the $\ell_0$ Norm

By Yuesheng Xu
arxiv.org
 8 days ago

We consider a minimization problem whose objective function is the sum of a fidelity term, not necessarily convex, and a regularization term given by a positive regularization parameter $\lambda$ times the $\ell_0$ norm composed with a linear transform. This...
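
As a rough illustration of what an $\ell_0$-regularized objective looks like, the sketch below takes the simplest special case. This is an assumption made here, not the setting of the paper: the fidelity term is $\frac{1}{2}\|x - y\|^2$ and the linear transform is the identity. In that case each coordinate can be minimized separately, and the minimizer is hard thresholding at $\sqrt{2\lambda}$. All function names are hypothetical.

```python
import math


def l0_norm(x):
    """Number of nonzero entries of x (the so-called l0 'norm')."""
    return sum(1 for v in x if v != 0)


def objective(x, y, lam):
    """0.5 * ||x - y||^2 + lam * ||x||_0 (identity transform assumed)."""
    fidelity = 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return fidelity + lam * l0_norm(x)


def hard_threshold(y, lam):
    """Coordinate-wise minimizer of the objective above.

    Keeping y_i costs lam, zeroing it costs 0.5 * y_i**2,
    so y_i is kept only when |y_i| > sqrt(2 * lam).
    """
    t = math.sqrt(2.0 * lam)
    return [yi if abs(yi) > t else 0.0 for yi in y]


y = [3.0, 0.5, -2.0, 0.1]
x_star = hard_threshold(y, lam=0.5)
```

With a non-quadratic fidelity term or a general linear transform, no such closed form is available in general, which is where dedicated analysis of the kind the abstract refers to becomes necessary.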

Related
arxiv.org

Metric dimension on sparse graphs and its applications to zero forcing sets

The metric dimension dim(G) of a graph $G$ is the minimum cardinality of a subset $S$ of vertices of $G$ such that each vertex of $G$ is uniquely determined by its distances to $S$. It is well-known that the metric dimension of a graph can be drastically increased by the modification of a single edge. Our main result consists in proving that the increase in metric dimension caused by an edge addition can be amortized in the sense that if the graph consists of a spanning tree $T$ plus $c$ edges, then the metric dimension of $G$ is at most the metric dimension of $T$ plus $6c$. We then use this result to prove a weakening of a conjecture of Eroh et al. The zero forcing number $Z(G)$ of $G$ is the minimum cardinality of a subset $S$ of black vertices (whereas the other vertices are colored white) of $G$ such that all the vertices will be turned black after finitely many applications of the following rule: a white vertex is turned black if it is the only white neighbor of a black vertex. Eroh et al. conjectured that, for any graph $G$, $dim(G)\leq Z(G) + c(G)$, where $c(G)$ is the number of edges that have to be removed from $G$ to get a forest. They proved the conjecture is true for trees and unicyclic graphs. We prove a weaker version of the conjecture: $dim(G)\leq Z(G)+6c(G)$ holds for any graph. We also prove that the conjecture is true for graphs with edge disjoint cycles, widely generalizing the unicyclic result of Eroh et al.
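
The color-change rule in the definition of $Z(G)$ can be sketched in a few lines. The code below is a brute-force illustration for small graphs only, not the method of the paper. The adjacency-dictionary representation and the function names are assumptions made for this example.

```python
from itertools import combinations


def zero_forcing_closure(adj, black):
    """Apply the rule until nothing changes: a black vertex with exactly
    one white neighbor turns that neighbor black."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for v in list(black):
            white = [u for u in adj[v] if u not in black]
            if len(white) == 1:
                black.add(white[0])
                changed = True
    return black


def zero_forcing_number(adj):
    """Smallest size of an initial black set that forces the whole graph.

    Exhaustive search over subsets, so only usable on tiny graphs.
    """
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for start in combinations(vertices, k):
            if len(zero_forcing_closure(adj, start)) == len(vertices):
                return k
    return 0  # empty graph


# Path 0-1-2-3: one endpoint forces every other vertex in turn.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Star with center 0 and three leaves: two leaves are needed.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the star, starting from a single leaf forces the center, but the center then has two white neighbors and the process stalls, which is why a second leaf is required.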
MATHEMATICS
github.blog

Make your monorepo feel small with Git’s sparse index

One way that Git scales to the largest monorepos is the sparse-checkout feature, which allows you to focus on a subset of the files. This is supposed to make it feel like you are actually in a small repository, even though you are contributing to a large repository. There’s only...
SOFTWARE
arxiv.org

Spectral norm bounds for block Markov chain random matrices

This paper quantifies the asymptotic order of the largest singular value of a centered random matrix built from the path of a Block Markov Chain (BMC). In a BMC there are $n$ labeled states, each state is associated to one of $K$ clusters, and the probability of a jump depends only on the clusters of the origin and destination. Given a path $X_0, X_1, \ldots, X_{T_n}$ started from equilibrium, we construct a random matrix $\hat{N}$ that records the number of transitions between each pair of states. We prove that if $\omega(n) = T_n = o(n^2)$, then $\| \hat{N} - \mathbb{E}[\hat{N}] \| = \Omega_{\mathbb{P}}(\sqrt{T_n/n})$. We also prove that if $T_n = \Omega(n \ln{n})$, then $\| \hat{N} - \mathbb{E}[\hat{N}] \| = O_{\mathbb{P}}(\sqrt{T_n/n})$ as $n \to \infty$; and if $T_n = \omega(n)$, a sparser regime, then $\| \hat{N}_\Gamma - \mathbb{E}[\hat{N}] \| = O_{\mathbb{P}}(\sqrt{T_n/n})$. Here, $\hat{N}_{\Gamma}$ is a regularization that zeroes out entries corresponding to jumps to and from most-often visited states. Together this establishes that the order is $\Theta_{\mathbb{P}}(\sqrt{T_n/n})$ for BMCs.
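
The count matrix $\hat{N}$ described above can be sketched as follows. This is a minimal illustration assuming the states are labeled $0, \ldots, n-1$ and the path is given as a list. The function name is hypothetical, and the centering and regularization steps from the abstract are not shown.

```python
def transition_counts(path, n):
    """Return the n x n matrix whose (x, y) entry counts the jumps x -> y
    observed along the path."""
    counts = [[0] * n for _ in range(n)]
    for x, y in zip(path, path[1:]):
        counts[x][y] += 1
    return counts


# A short hand-written path over three states (not sampled from a BMC).
example_path = [0, 1, 1, 2, 0, 1]
n_hat = transition_counts(example_path, 3)
```

A path of length $T_n + 1$ contributes exactly $T_n$ counts in total, so when $T_n$ is much smaller than $n^2$ most entries of the matrix are zero.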
MATHEMATICS
arxiv.org

Prune Once for All: Sparse Pre-Trained Language Models

Transformer-based language models are applied to a wide range of applications in natural language processing. However, they are inefficient and difficult to deploy. In recent years, many compression algorithms have been proposed to increase the implementation efficiency of large Transformer-based models on target hardware. In this work we present a new method for training sparse pre-trained Transformer language models by integrating weight pruning and model distillation. These sparse pre-trained models can be used to transfer learning for a wide range of tasks while maintaining their sparsity pattern. We demonstrate our method with three known architectures to create sparse pre-trained BERT-Base, BERT-Large and DistilBERT. We show how the compressed sparse pre-trained models we trained transfer their knowledge to five different downstream natural language tasks with minimal accuracy loss. Moreover, we show how to further compress the sparse models' weights to 8bit precision using quantization-aware training. For example, with our sparse pre-trained BERT-Large fine-tuned on SQuADv1.1 and quantized to 8bit we achieve a compression ratio of $40$X for the encoder with less than $1\%$ accuracy loss. To the best of our knowledge, our results show the best compression-to-accuracy ratio for BERT-Base, BERT-Large, and DistilBERT.
COMPUTERS
arxiv.org

On Sparse High-Dimensional Graphical Model Learning For Dependent Time Series

We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional stationary multivariate Gaussian time series. A sparse-group lasso-based frequency-domain formulation of the problem based on a frequency-domain sufficient statistic for the observed time series is presented. We investigate an alternating direction method of multipliers (ADMM) approach for optimization of the sparse-group lasso penalized log-likelihood. We provide sufficient conditions for convergence in the Frobenius norm of the inverse PSD estimators to the true value, jointly across all frequencies, where the number of frequencies is allowed to increase with sample size. This result also yields a rate of convergence. We also empirically investigate selection of the tuning parameters based on Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.
MATHEMATICS
arxiv.org

Distributed Sparse Regression via Penalization

We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves near optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.
SCIENCE
businessnewsdaily.com

Social Recruiting Becomes the Norm

Social recruiting, the process of hiring people through social media sites rather than traditional help wanted listings, has become the norm for most companies, according to a recent survey of 1,000 human resources and recruitment professionals. A survey by Jobvite revealed 92 percent of U.S. companies this year are using...
INTERNET
arxiv.org

Sparse Graph Learning Under Laplacian-Related Constraints

We consider the problem of learning a sparse undirected graph underlying a given set of multivariate data. We focus on graph Laplacian-related constraints on the sparse precision matrix that encodes conditional dependence between the random variables associated with the graph nodes. Under these constraints the off-diagonal elements of the precision matrix are non-positive (total positivity), and the precision matrix may not be full-rank. We investigate modifications to widely used penalized log-likelihood approaches to enforce total positivity but not the Laplacian structure. The graph Laplacian can then be extracted from the off-diagonal precision matrix. An alternating direction method of multipliers (ADMM) algorithm is presented and analyzed for constrained optimization under Laplacian-related constraints and lasso as well as adaptive lasso penalties. Numerical results based on synthetic data show that the proposed constrained adaptive lasso approach significantly outperforms existing Laplacian-based approaches. We also evaluate our approach on real financial data.
COMPUTERS
arxiv.org

L4-Norm Weight Adjustments for Converted Spiking Neural Networks

Spiking Neural Networks (SNNs) are being explored for their potential energy efficiency benefits due to sparse, event-driven computation. Non-spiking artificial neural networks are typically trained with stochastic gradient descent using backpropagation. The calculation of true gradients for backpropagation in spiking neural networks is impeded by the non-differentiable firing events of spiking neurons. On the other hand, using approximate gradients is effective, but computationally expensive over many time steps. One common technique, then, for training a spiking neural network is to train a topologically-equivalent non-spiking network, and then convert it to a spiking network, replacing real-valued inputs with proportionally rate-encoded Poisson spike trains. Converted SNNs function sufficiently well because the mean pre-firing membrane potential of a spiking neuron is proportional to the dot product of the input rate vector and the neuron weight vector, similar to the functionality of a non-spiking network. However, this conversion only considers the mean and not the temporal variance of the membrane potential. As the standard deviation of the pre-firing membrane potential is proportional to the L4-norm of the neuron weight vector, we propose a weight adjustment based on the L4-norm during the conversion process in order to improve classification accuracy of the converted network.
COMPUTERS
arxiv.org

Faster Sparse Minimum Cost Flow by Electrical Flow Localization

We give an $\widetilde{O}(m^{3/2 - 1/762} \log (U+W))$ time algorithm for minimum cost flow with capacities bounded by $U$ and costs bounded by $W$. For sparse graphs with general capacities, this is the first algorithm to improve over the $\widetilde{O}(m^{3/2} \log^{O(1)} (U+W))$ running time obtained by an appropriate instantiation of an interior point method [Daitch-Spielman, 2008].
SCIENCE
arxiv.org

Parallel Algorithms for Masked Sparse Matrix-Matrix Products

Computing the product of two sparse matrices (SpGEMM) is a fundamental operation in various combinatorial and graph algorithms as well as various bioinformatics and data analytics applications for computing inner-product similarities. For an important class of algorithms, only a subset of the output entries are needed, and the resulting operation is known as Masked SpGEMM since a subset of the output entries is considered to be "masked out". Existing algorithms for Masked SpGEMM usually do not treat the mask as part of the multiplication and either first compute a regular SpGEMM followed by masking, or perform a sparse inner product only for output elements that are not masked out. In this work, we investigate various novel algorithms and data structures for this rather challenging and important computation, and provide guidelines on how to design a fast Masked-SpGEMM for shared-memory architectures. Our evaluations show that factors such as matrix and mask density, mask structure and cache behavior play a vital role in attaining high performance for Masked SpGEMM. We evaluate our algorithms on a large number of matrices using several real-world benchmarks and show that our algorithms in most cases significantly outperform the state of the art for Masked SpGEMM implementations.
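
The two existing strategies mentioned in the abstract, multiplying first and masking afterwards versus computing a sparse inner product only for the wanted entries, can be sketched as below. This is a sequential pure-Python illustration, not one of the paper's parallel algorithms. The dict-of-dicts matrix format, the convention that the mask is the set of output positions to keep, and all function names are assumptions made for this example.

```python
def spgemm(a, b):
    """Plain sparse product; matrices are {row: {col: value}} dictionaries."""
    c = {}
    for i, row in a.items():
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                c_row = c.setdefault(i, {})
                c_row[j] = c_row.get(j, 0) + a_ik * b_kj
    return c


def masked_spgemm_post(a, b, mask):
    """Strategy 1: compute the full product, then keep only masked positions."""
    full = spgemm(a, b)
    return {(i, j): v
            for i, row in full.items()
            for j, v in row.items()
            if (i, j) in mask}


def masked_spgemm_inner(a, b, mask):
    """Strategy 2: one sparse inner product per position in the mask."""
    # Column-oriented copy of B so that column j can be looked up directly.
    b_cols = {}
    for k, row in b.items():
        for j, v in row.items():
            b_cols.setdefault(j, {})[k] = v
    out = {}
    for i, j in mask:
        row, col = a.get(i, {}), b_cols.get(j, {})
        shared = row.keys() & col.keys()
        if shared:  # store an entry only if some term contributes
            out[(i, j)] = sum(row[k] * col[k] for k in shared)
    return out


A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {0: 4}, 1: {0: 5, 1: 6}}
keep = {(0, 0), (1, 1), (2, 2)}
```

The first strategy does work proportional to the full product regardless of how small the mask is, while the second does work proportional to the mask size, which is consistent with the abstract's point that mask density matters for performance.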
CODING & PROGRAMMING
arxiv.org

Compactness in normed spaces: a unified approach through semi-norms

In this paper we prove two new abstract compactness criteria in normed spaces. To this end we first introduce the notion of an equinormed set using a suitable family of semi-norms on the given normed space satisfying some natural conditions. Those conditions, roughly speaking, state that the norm can be approximated (on the equinormed sets even uniformly) by the elements of this family. As we are given some freedom of choice of the underlying semi-normed structure that is used to define equinormed sets, our approach opens a new perspective for building compactness criteria in specific normed spaces. As an example we show that natural selections of families of semi-norms in spaces $C(X,\mathbb{R})$ and $l^p$ for $p\in[1,+\infty)$ lead to the well-known compactness criteria (including the Arzelà-Ascoli theorem). In the second part of the paper, applying the abstract theorems, we construct a simple compactness criterion in the space of functions of bounded Schramm variation and provide a full characterization of linear integral operators acting from the space of functions of bounded Jordan variation to the space of functions of bounded Schramm variation in terms of their generating kernels.
MATHEMATICS
arxiv.org

Sparse Tensor-based Multiscale Representation for Point Cloud Geometry Compression

This study develops a unified Point Cloud Geometry (PCG) compression method through Sparse Tensor Processing (STP) based multiscale representation of voxelized PCG, dubbed SparsePCGC. Applying the STP reduces the complexity significantly because it only performs the convolutions centered at Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also lets us compress scale-wise MP-POVs progressively. The overall compression efficiency highly depends on the approximation accuracy of occupancy probability of each MP-POV. Thus, we design the Sparse Convolution based Neural Networks (SparseCNN) consisting of sparse convolutions and voxel re-sampling to extensively exploit priors. We then develop the SparseCNN based Occupancy Probability Approximation (SOPA) model to estimate the occupancy probability in a single-stage manner only using the cross-scale prior or in multi-stage by step-wisely utilizing autoregressive neighbors. Besides, we also suggest the SparseCNN based Local Neighborhood Embedding (SLNE) to characterize the local spatial variations as the feature attribute to improve the SOPA. Our unified approach shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets including the dense PCGs (8iVFB, Owlii) and the sparse LiDAR PCGs (KITTI, Ford) when compared with the MPEG G-PCC and other popular learning-based compression schemes. Furthermore, the proposed method has lightweight complexity due to point-wise computation, and a tiny storage requirement because of model sharing across all scales. We make all materials publicly accessible at this https URL for reproducible research.
COMPUTERS
arxiv.org

Feasibility of sparse large Lotka-Volterra ecosystems

Consider a large ecosystem (foodweb) with n species, where the abundances follow a Lotka-Volterra system of coupled differential equations. We assume that each species interacts with d other species and that their interaction coefficients are independent random variables. This parameter d reflects the connectance of the foodweb and the sparsity of its interactions, especially if d is much smaller than n. We address the question of feasibility of the foodweb, that is the existence of an equilibrium solution of the Lotka-Volterra system with no vanishing species. We establish that for a given range of d with an extra condition on the sparsity structure, there exists an explicit threshold depending on n and d and reflecting the strength of the interactions, which guarantees the existence of a positive equilibrium as the number of species n gets large. From a mathematical point of view, the study of feasibility is equivalent to the existence of a positive solution (component-wise) to the equilibrium linear equation. The analysis of such positive solutions essentially relies on large random matrix theory for sparse matrices and Gaussian concentration of measure. The stability of the equilibrium is established. The results in this article extend to a sparse setting the results obtained by Bizeul and Najim in Proc. AMS 2021.
SCIENCE
arxiv.org

A Geometric Approach to Optimal Control of Hybrid and Impulsive Systems

Hybrid dynamical systems are systems which undergo both continuous and discrete transitions. The Bolza problem from optimal control theory is applied to these systems and a hybrid version of Pontryagin's maximum principle is presented. This hybrid maximum principle is presented to emphasize its geometric nature which makes its study amenable to the tools of geometric mechanics and symplectic geometry. One explicit benefit of this geometric approach is that Zeno behavior can be strongly controlled for "generic" control problems. Moreover, when the underlying control system is a mechanical impact system, additional structure is present which can be exploited and is thus explored. Multiple examples are presented for both mechanical and non-mechanical systems.
MATHEMATICS
arxiv.org

Quantum process tomography of adiabatic and superadiabatic stimulated Raman passage

Quantum control methods for three-level systems have recently become an important direction of research in quantum information science and technology. Here we present numerical simulations using realistic experimental parameters for quantum process tomography in STIRAP (stimulated Raman adiabatic passage) and saSTIRAP (superadiabatic STIRAP). Specifically, we identify a suitable basis in the operator space as the identity operator together with the 8 Gell-Mann operators, and we calculate the corresponding process matrices, which have $9\times 9=81$ elements. We discuss these results for the ideal decoherence-free case, as well as for the experimentally-relevant case with decoherence included.
SCIENCE
arxiv.org

Variational Hamiltonian Ansatz for 1D Hubbard chains in a broad range of parameter values

Hybrid quantum-classical algorithms have been proposed to circumvent noise limitations in quantum computers. Such algorithms delegate only a calculation of the expectation value to the quantum computer. Among them, the Variational Quantum Eigensolver (VQE) has been implemented to study molecules and condensed matter systems on small size quantum computers. Condensed matter systems described by the Hubbard model exhibit a rich phase diagram alongside exotic states of matter. In this manuscript, we try to answer the question: how much of the underlying physics of a 1D Hubbard chain is described by a problem-inspired Variational Hamiltonian Ansatz (VHA) in a broad range of parameter values? We start by probing how much the fidelity of the solution increases with increasing ansatz complexity. Our findings suggest that even low fidelity solutions capture energy and number of doubly occupied sites well, while spin-spin correlations are not well captured even when the solution is of high fidelity. Our powerful simulation platform allows us to incorporate a realistic noise model and show a successful implementation of a noise-mitigation strategy - the Richardson extrapolation.
COMPUTERS
arxiv.org

State Estimation of the Stefan PDE: A Tutorial on Design and Applications to Polar Ice and Batteries

The Stefan PDE system is a representative model for thermal phase change phenomena, such as melting and solidification, arising in numerous science and engineering processes. The mathematical description is given by a Partial Differential Equation (PDE) of the temperature distribution defined on a spatial interval with a moving boundary, where the boundary represents the liquid-solid interface and its dynamics are governed by an Ordinary Differential Equation (ODE). The PDE-ODE coupling at the boundary is nonlinear and creates a significant challenge for state estimation with provable convergence and robustness. This tutorial article presents a state estimation method based on PDE backstepping for the Stefan system, using measurements only at the moving boundary. PDE backstepping observer design generates an observer gain by employing a Volterra transformation of the observer error state into a desirable target system, solving a Goursat-form PDE for the transformation's kernel, and performing a Lyapunov analysis of the target observer error system. The observer is applied to models of problems motivated by climate change and the need for renewable energy storage: a model of polar ice dynamics and a model of charging and discharging in lithium-ion batteries. The numerical results for polar ice demonstrate a robust performance of the designed estimator with respect to the unmodeled salinity effect in sea ice. The results for an electrochemical PDE model of a lithium-ion battery with a phase transition material show the elimination of more than 15 \% error in State-of-Charge estimate within 5 minutes even in the presence of sensor noise.
SCIENCE
