date 2021-11-16
Computers

Robust recovery for stochastic block models

By Jingqiu Ding, Tommaso d'Orsi, Rajai Nasser, David Steurer
arxiv.org
 8 days ago

We develop an efficient algorithm for weak recovery in a robust version of the stochastic block model. The algorithm matches the statistical guarantees of the best known algorithms for the vanilla version of the stochastic block model. In this sense, our results show that there is no price of robustness in...

Related
arxiv.org

Are Transformers More Robust Than CNNs?

The Transformer has emerged as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Surprisingly, however, we find that these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and trained with distinct frameworks. In this paper, we aim to provide the first fair and in-depth comparison between Transformers and CNNs, focusing on robustness evaluations.
VIDEO GAMES
arxiv.org

Deep ReLU neural network approximation of parametric and stochastic elliptic PDEs with lognormal inputs

We investigate non-adaptive methods of deep ReLU neural network approximation of the solution $u$ to parametric and stochastic elliptic PDEs with lognormal inputs on the non-compact set $\mathbb{R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2(\mathbb{R}^\infty, V, \gamma)$, where $\gamma$ is the tensor-product standard Gaussian probability measure on $\mathbb{R}^\infty$ and $V$ is the energy space. The approximation is based on an $m$-term truncation of the Hermite generalized polynomial chaos (gpc) expansion of $u$. Under a certain $\ell_q$-summability assumption on the lognormal inputs ($0 < q < \infty$), we prove that for every integer $n > 1$, one can construct a non-adaptive, compactly supported deep ReLU neural network $\boldsymbol{\phi}_n$ of size at most $n$ on $\mathbb{R}^m$ with $m = \mathcal{O}(n/\log n)$, having $m$ outputs, such that the sum obtained by replacing the polynomials in the $m$-term truncation of the Hermite gpc expansion with these $m$ outputs approximates $u$ with an error bound of $\mathcal{O}\left(\left(n/\log n\right)^{-1/q}\right)$. This is comparable to the error bound of the best approximation of $u$ by $n$-term truncations of the Hermite gpc expansion, which is $\mathcal{O}(n^{-1/q})$. We also obtain results on similar problems for parametric and stochastic elliptic PDEs with affine inputs, based on the Jacobi and Taylor gpc expansions.
CODING & PROGRAMMING
arxiv.org

Robust Integrative Biclustering for Multi-view Data

In many biomedical studies, multiple views of data (e.g., genomics, proteomics) are available, and a particular interest might be the detection of sample subgroups characterized by specific groups of variables. Biclustering methods are well suited to this problem, as they assume that specific groups of variables might be relevant only to specific groups of samples. Many biclustering methods exist for identifying row-column clusters in a single view, but few exist for data from multiple views. The few existing algorithms depend heavily on regularization parameters to obtain row-column clusters, which imposes an unnecessary burden on users and limits their use in practice. We extend an existing biclustering method based on sparse singular value decomposition for single-view data to data from multiple views. Our method, integrative sparse singular value decomposition (iSSVD), incorporates stability selection to control Type I error rates, estimates the probability that samples and variables belong to a bicluster, finds stable biclusters, and results in interpretable row-column associations. Simulations and real data analyses show that iSSVD outperforms several other single- and multi-view biclustering methods and is able to detect meaningful biclusters. iSSVD is a user-friendly, computationally efficient algorithm that will be useful in many disease subtyping applications.
SCIENCE
arxiv.org

AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator

Chuteng Zhou, Fernando Garcia Redondo, Julian Büchel, Irem Boybat, Xavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian, Manuel Le Gallo, Paul N. Whatmough. Always-on TinyML perception tasks in IoT applications require very high energy efficiency. Analog compute-in-memory (CiM) using non-volatile memory (NVM) promises high efficiency and also provides self-contained on-chip model storage. However, analog CiM introduces new practical considerations, including conductance drift, read/write noise, and fixed analog-to-digital converter (ADC) gain. These additional constraints must be addressed to achieve models that can be deployed on analog CiM with acceptable accuracy loss. This work describes AnalogNets: TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW). The model architectures are specifically designed for analog CiM, and we detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities and low-precision data converters at inference time. We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator, with a novel layer-serial approach to remove the cost of complex interconnects associated with a fully pipelined design. We evaluate the AnalogNets on a calibrated simulator, as well as on real hardware, and find that accuracy degradation is limited to 0.8%/1.2% after 24 hours of PCM drift (8-bit) for KWS/VWW. AnalogNets running on the 14nm AON-CiM accelerator demonstrate 8.58/4.37 TOPS/W for KWS/VWW workloads, respectively, using 8-bit activations, increasing to 57.39/25.69 TOPS/W with 4-bit activations.
COMPUTERS
IN THIS ARTICLE
#Robustness#Stochastic Block Model#Soda#Bbp#Pca#Lg#Machine Learning
arxiv.org

Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung. Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation.
SCIENCE
arxiv.org

The Most Likely Transition Path for a Class of Distribution-Dependent Stochastic Systems

Distribution-dependent stochastic dynamical systems arise widely in engineering and science. We consider a class of such systems that model the limiting behavior of interacting particles moving in a vector field with random fluctuations. We aim to examine the most likely transition path between stable equilibrium states of the vector field. In the small-noise regime, we find that the rate function (or action functional) does not involve the solution of the skeleton equation, which describes the unperturbed deterministic flow of the vector field shifted by the interaction at zero distance. As a result, we are led to study the most likely transition path for a stochastic differential equation without distribution dependency. This enables the computation of the most likely transition path for these distribution-dependent stochastic dynamical systems by the adaptive minimum action method, and we illustrate our approach with two examples.
SCIENCE
arxiv.org

An Algebraic and Microlocal Approach to the Stochastic Non-linear Schrodinger Equation

In a recent work [DDRZ20], a novel framework was developed for studying, at a perturbative level, a large class of non-linear, scalar, real, stochastic PDEs, inspired by the algebraic approach to quantum field theory. The main advantage is the possibility of computing the expectation value and the correlation functions of the underlying solutions while accounting for renormalization intrinsically and without resorting to any specific regularization scheme. In this work we prove that it is possible to extend the range of applicability of this framework to cover the stochastic non-linear Schroedinger equation as well, in which randomness is codified by an additive, Gaussian, complex white noise.
MATHEMATICS
arxiv.org

Differential privacy and robust statistics in high dimensions

We introduce a universal framework for characterizing the statistical efficiency of a statistical estimation problem with differential privacy guarantees. Our framework, which we call High-dimensional Propose-Test-Release (HPTR), builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism. Gluing all these together is the concept of resilience, which is central to robust statistical estimation. Resilience guides the design of the algorithm, the sensitivity analysis, and the success probability analysis of the test step in Propose-Test-Release. The key insight is that if we design an exponential mechanism that accesses the data only via one-dimensional robust statistics, then the resulting local sensitivity can be dramatically reduced. Using resilience, we can provide tight local sensitivity bounds. These tight bounds readily translate into near-optimal utility guarantees in several cases. We give a general recipe for applying HPTR to a given instance of a statistical estimation problem and demonstrate it on canonical problems of mean estimation, linear regression, covariance estimation, and principal component analysis. We introduce a general utility analysis technique that proves that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
COMPUTERS
arxiv.org

Robust Eigenvectors of Symmetric Tensors

The tensor power method generalizes the matrix power method to higher-order arrays, or tensors. As in the matrix case, the fixed points of the tensor power method are the eigenvectors of the tensor. While every real symmetric matrix has an eigendecomposition, the vectors generating a symmetric decomposition of a real symmetric tensor are not always eigenvectors of the tensor.
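To make the fixed-point statement concrete, here is a minimal sketch of the standard power iteration for a symmetric order-3 tensor, x <- T(I, x, x) / ||T(I, x, x)||. It is an illustration under stated assumptions, not the paper's code: the function names `tensor_power_iteration` and `odeco_tensor`, the restriction to order 3, and the orthogonally decomposable test tensor are all chosen here for the example.

```python
import numpy as np

def tensor_power_iteration(T, x0, iters=50):
    """Iterate x <- T(I, x, x) / ||T(I, x, x)|| for a symmetric order-3 tensor T."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(iters):
        # Contract T with x along two of its three modes.
        y = np.einsum("ijk,j,k->i", T, x, x)
        norm = np.linalg.norm(y)
        if norm == 0.0:
            break  # x is mapped to zero; no direction to normalize
        x = y / norm
    return x

def odeco_tensor(weights, vectors):
    """Build sum_i w_i * v_i (x) v_i (x) v_i (hypothetical helper for the demo)."""
    return sum(w * np.einsum("i,j,k->ijk", v, v, v)
               for w, v in zip(weights, vectors))

# Assumed toy input: an orthogonally decomposable tensor, for which the
# generating vectors are eigenvectors (the general case need not behave so).
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
T = odeco_tensor([2.0, 1.0], [e1, e2])
x = tensor_power_iteration(T, [0.6, 0.8])
eigenvalue = float(x @ np.einsum("ijk,j,k->i", T, x, x))
```

In this toy case the iteration settles on `e1`, and `T(I, x, x)` is a scalar multiple of `x`, which is the eigenvector condition. For tensors that are not orthogonally decomposable, the decomposition vectors may fail this condition, which is the situation the abstract points to.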
MATHEMATICS
arxiv.org

A Fully Anisotropic Formulation of Stochastic Cell Rescaling

Anisotropic barostats are employed to carry out Molecular Dynamics simulations where the volume is allowed to fluctuate with no constraints on the shape of the simulation cell. Most of these algorithms are based on second-order differential equations and share some common drawbacks: they can lead to slowly damped oscillations in the equilibration phase, and they do not allow efficient control of the volume autocorrelation time. This work develops the anisotropic version of stochastic cell rescaling, a first-order stochastic barostat that overcomes these limits and can also be employed in the production phase, resulting in the correct physical fluctuations of the cell. The algorithm can be easily implemented in existing codes on top of the anisotropic Berendsen barostat. The validation tests, performed on a number of crystal systems, show that the method is robust against wide variations of the input parameter, which allows efficient control of the volume autocorrelation time.
SCIENCE
arxiv.org

Heteroclinic cycling and extinction in May-Leonard models with demographic stochasticity

May and Leonard (SIAM J. Appl. Math 1975) introduced a three-species Lotka-Volterra type population model that exhibits heteroclinic cycling. Rather than producing a periodic limit cycle, the trajectory takes longer and longer to complete each "cycle", passing closer and closer to unstable fixed points in which one population dominates and the others approach zero. Aperiodic heteroclinic dynamics have subsequently been studied in ecological systems (side-blotched lizards; colicinogenic E. coli), in the immune system, in neural information processing models ("winnerless competition"), and in models of neural central pattern generators. Yet as May and Leonard observed "Biologically, the behavior (produced by the model) is nonsense. Once it is conceded that the variables represent animals, and therefore cannot fall below unity, it is clear that the system will, after a few cycles, converge on some single population, extinguishing the other two." Here, we explore different ways of introducing discrete stochastic dynamics based on May and Leonard's ODE model, with application to ecological population dynamics, and to a neuromotor central pattern generator system. We study examples of several quantitatively distinct asymptotic behaviors, including total extinction of all species, extinction to a single species, and persistent cyclic dominance with finite mean cycle length.
SCIENCE
arxiv.org

Robust and Optimal Contention Resolution without Collision Detection

We consider the classical contention resolution problem where nodes arrive over time, each with a message to send. In each synchronous slot, each node can send or remain idle. If exactly one node sends in a slot, it succeeds; if multiple nodes send simultaneously, the messages collide and none succeeds. Nodes can distinguish collision from silence only if collision detection is available. Ideally, a contention resolution algorithm should satisfy three criteria: low time complexity (or high throughput); low energy complexity, meaning each node does not make too many broadcast attempts; and strong robustness, meaning the algorithm can maintain good performance even if slots can be jammed. Previous work has shown that, with collision detection, there are "perfect" contention resolution algorithms satisfying all three criteria. On the other hand, without collision detection, it was not until 2020 that an algorithm was discovered that achieves optimal time complexity and low energy cost, assuming there is no jamming. More recently, the trade-off between throughput and robustness was studied. However, an intriguing and important question remains open: without collision detection, are there robust algorithms achieving both low total time complexity and low per-node energy cost? In this paper, we answer this question affirmatively. Specifically, we develop a new randomized algorithm for robust contention resolution without collision detection. Lower bounds show that it has both optimal time and energy complexity. If all nodes start execution simultaneously, we design another algorithm that is even faster, with energy complexity similar to that of the first algorithm. The separation in time complexity suggests that, for robust contention resolution without collision detection, "batch" instances (nodes start simultaneously) are inherently easier than "scattered" ones (nodes arrive over time).
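The slot model described above can be sketched in a few lines. This is only an illustration of the channel rules, not the paper's algorithm: the names `slot_outcome`, `feedback_without_cd`, and `simulate_batch` are hypothetical, treating a jammed slot as a collision is an assumption made here, and the sending rule (each pending node sends with probability 1/pending) is a simple idealized baseline that assumes nodes know how many are still pending.

```python
import random

def slot_outcome(num_senders, jammed=False):
    """Channel rule: exactly one sender in an unjammed slot succeeds."""
    if jammed or num_senders > 1:
        return "collision"  # assumption: jamming behaves like a collision
    return "success" if num_senders == 1 else "silence"

def feedback_without_cd(outcome):
    """Without collision detection, collision and silence look the same."""
    return "success" if outcome == "success" else "no-success"

def simulate_batch(n, max_slots=10_000, seed=0):
    """Batch instance: n nodes start together; returns the slots used, or None."""
    rng = random.Random(seed)
    pending = n
    for slot in range(1, max_slots + 1):
        # Idealized baseline: each pending node sends with probability 1/pending.
        senders = sum(rng.random() < 1.0 / pending for _ in range(pending))
        if slot_outcome(senders) == "success":
            pending -= 1
            if pending == 0:
                return slot
    return None

slots_used = simulate_batch(5)
```

Since at most one message gets through per slot, `simulate_batch(n)` can never finish in fewer than `n` slots; the interesting question in the abstract is how close a real algorithm can get to that without knowing the number of contenders and while tolerating jammed slots.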
COMPUTERS
arxiv.org

BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML

Along with the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications, such as smartphones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, such as implantable cardioverter defibrillators (ICDs); this setting is known as TinyML. Unlike ML on the edge, TinyML with a limited energy supply places higher demands on low-power execution. Stochastic computing (SC), which uses bitstreams for data representation, is promising for TinyML since it can perform the fundamental ML operations using simple logic gates instead of complicated binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to low data precision and the inaccuracy of its arithmetic units. Increasing the length of the bitstream, as in existing works, can mitigate the precision issue but incurs higher latency. In this work, we propose a novel SC architecture, namely Block-based Stochastic Computing (BSC). BSC divides inputs into blocks, such that latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and an output revision (OUR) scheme are proposed to improve accuracy. On top of this, a global optimization approach is devised to determine the number of blocks, which can achieve a better latency-power trade-off. Experimental results show that BSC can outperform existing designs, achieving over 10% higher accuracy on ML tasks and over 6 times power reduction.
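As background for the "simple logic gates" remark, the following sketch shows the textbook unipolar stochastic-computing multiplier: two values in [0, 1] are encoded as random bitstreams, and a bitwise AND yields a stream whose fraction of ones estimates their product. It is not the BSC architecture from the paper; the function names `to_bitstream` and `sc_multiply`, the independent-stream encoding, and the fixed seed are assumptions made for illustration.

```python
import numpy as np

def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a bitstream where each bit is 1 with probability p."""
    return rng.random(length) < p

def sc_multiply(a, b, length, seed=0):
    """Estimate a * b by AND-ing two independent bitstreams and averaging."""
    rng = np.random.default_rng(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    # A single AND gate per bit pair stands in for a binary multiplier.
    return float(np.mean(stream_a & stream_b))

short_estimate = sc_multiply(0.5, 0.4, 64)       # few bits: coarse estimate
long_estimate = sc_multiply(0.5, 0.4, 100_000)   # many bits: finer, but slower
```

The estimate's standard deviation shrinks roughly as one over the square root of the stream length, which is the precision-versus-latency tension the abstract describes: longer bitstreams are more accurate but take more cycles to process serially.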
SOFTWARE
arxiv.org

Hybrid Acceleration Scheme for Variance Reduced Stochastic Optimization Algorithms

Stochastic variance-reduced optimization methods are known to be globally convergent, but they suffer from slow local convergence, especially when moderate or high accuracy is needed. To alleviate this problem, we propose an optimization algorithm, which we refer to as a hybrid acceleration scheme, for a class of proximal variance-reduced stochastic optimization algorithms. The proposed scheme combines a fast locally convergent algorithm, such as a quasi-Newton method, with a globally convergent variance-reduced stochastic algorithm, for instance SAGA or L-SVRG. Our global convergence result for the hybrid acceleration method is based on specific safeguard conditions that must be satisfied for a step of the fast locally convergent method to be accepted.
CODING & PROGRAMMING
towardsdatascience.com

Training Provably-Robust Neural Networks

Defending against adversarial examples with GloRo Nets. Over the last several years, deep networks have extensively been shown to be vulnerable to attackers that can cause the network to make perplexing mistakes, simply by feeding maliciously-perturbed inputs to the network. Clearly, this raises concrete safety concerns for neural networks deployed in the wild, especially in safety-critical settings, e.g., in autonomous vehicles. In turn, this has motivated a volume of work on practical defenses, ranging from attack detection strategies to modified training routines that aim to produce networks that are difficult — or impossible — to attack. In this article, we’ll take a look at an elegant and effective defense I designed with my colleagues at CMU (appearing in ICML 2021) that modifies the architecture of a neural network to naturally provide provable guarantees of robustness against certain classes of attacks — at no additional cost during test time.
SOFTWARE
arxiv.org

Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

This paper investigates the problem of best arm identification in contaminated stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed-confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that asymptotically achieve the optimal error guarantee on the deviation of the true mean from the estimate. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines.
SCIENCE
arxiv.org

Gaussian Process based Stochastic Model Predictive Control for Cooperative Adaptive Cruise Control

Cooperative driving relies on communication among vehicles to create situational awareness. One application of cooperative driving is Cooperative Adaptive Cruise Control (CACC), which aims at enhancing highway transportation safety and capacity. Model-based communication (MBC) is a new paradigm with a flexible content structure for broadcasting joint vehicle-driver predictive behavioral models. The vehicle's complex dynamics and diverse driving behaviors add complexity to the modeling process. The Gaussian process (GP) is a fully data-driven, non-parametric Bayesian modeling approach that can be used as a modeling component of MBC. Knowledge about the uncertainty is propagated through predictions by generating local GPs for vehicles and broadcasting their hyper-parameters as a model to the neighboring vehicles. In this study, a GP is used to model each vehicle's speed trajectory, which allows vehicles to access the future behavior of their preceding vehicle during communication loss and/or low-rate communication. In addition, to overcome safety issues in a vehicle platoon, two operating modes are considered for each vehicle: free following and emergency braking. This paper presents a discrete hybrid stochastic model predictive control, which incorporates system modes as well as uncertainties captured by GP models. The proposed control design approach finds the optimal vehicle speed trajectory with the goal of achieving a safe and efficient platoon of vehicles with a small inter-vehicle gap while reducing the vehicles' reliance on frequent communication. Simulation studies demonstrate the efficacy of the proposed controller under the aforementioned communication paradigm with low-rate intermittent communication.
CARS
arxiv.org

Stochastic Gradient Line Bayesian Optimization: Reducing Measurement Shots in Optimizing Parameterized Quantum Circuits

Optimization of parameterized quantum circuits is indispensable for applications of near-term quantum devices to computational tasks with variational quantum algorithms (VQAs). However, the existing optimization algorithms for VQAs require an excessive number of quantum-measurement shots for estimating expectation values of observables or iterating updates of circuit parameters, and this cost has been a crucial obstacle to practical use. To address this problem, we develop an efficient framework, stochastic gradient line Bayesian optimization (SGLBO), for circuit optimization with fewer measurement shots. SGLBO reduces the cost of measurement shots by estimating an appropriate direction for updating the parameters based on stochastic gradient descent (SGD) and by further utilizing Bayesian optimization (BO) to estimate the optimal step size in each iteration of the SGD. We formulate an adaptive measurement-shot strategy to achieve the optimization feasibly without relying on precise expectation-value estimation and many iterations; moreover, we show that a technique of suffix averaging can significantly reduce the effect of statistical and hardware noise in the optimization for VQAs. Our numerical simulation demonstrates that SGLBO augmented with these techniques can drastically reduce the required number of measurement shots, improve the accuracy of the optimization, and enhance the robustness against noise compared to other state-of-the-art optimizers in representative tasks for VQAs. These results establish a framework of quantum-circuit optimizers integrating two different optimization approaches, SGD and BO, to reduce the cost of measurement shots significantly.
COMPUTERS
959theriver.com

Apple Computers Holding Their Own In Value Over Time.

So I’ve been a Mac guy for quite a while. A friend who was my go-to tech person was a Mac person, and I sort of evolved from that. Also, my son Jeff worked for Apple for about 12 or 13 years, so there’s a family connection too! Their products are pricey but very sound and well protected against outside invasions. I’ve seen Mac computers advertised for under $300.00 and MacBooks starting at just over $500.00. The pricing certainly does go up from there. So when you think about how much the original Mac computer sold for, they really have held their pricing. The first Apple-1 computers were sold for $666.66 in 1976. Forty-five years later, a still-functioning one has sold for $400,000. It was bought at an auction; I guess it must be the nostalgia that would make someone spend that kind of money for an old, yet working computer. I wonder if they get dial-up with that?
TECHNOLOGY
arxiv.org

Energy Efficient Learning with Low Resolution Stochastic Domain Wall Synapse Based Deep Neural Networks

We demonstrate that extremely low resolution quantized (nominally 5-state) synapses with large stochastic variations in Domain Wall (DW) position can be both energy efficient and achieve reasonably high testing accuracies compared to Deep Neural Networks (DNNs) of similar sizes using floating-point synaptic weights. Specifically, voltage-controlled DW devices demonstrate stochastic behavior, as modeled rigorously with micromagnetic simulations, and can only encode limited states; however, they can be extremely energy efficient during both training and inference. We show that by implementing suitable modifications to the learning algorithms, we can address the stochastic behavior as well as mitigate the effect of their low resolution to achieve high testing accuracies. In this study, we propose both in-situ and ex-situ training algorithms, based on a modification of the algorithm proposed by Hubara et al. [1], which works well with quantization of synaptic weights. We train several 5-layer DNNs on the MNIST dataset using 2-, 3- and 5-state DW devices as synapses. For in-situ training, a separate high-precision memory unit is adopted to preserve and accumulate the weight gradients, which are then quantized to program the low-precision DW devices. Moreover, a sizeable noise tolerance margin is used during training to address the intrinsic programming noise. For ex-situ training, a precursor DNN is first trained based on the characterized DW device model and a noise tolerance margin, similar to the in-situ training. Remarkably, for in-situ inference the energy dissipation to program the devices is only 13 pJ per inference, given that the training is performed over the entire MNIST dataset for 10 epochs.
COMPUTERS
