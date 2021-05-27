

Optimization Induced Equilibrium Networks

By Xingyu Xie, Qiuhao Wang, Zenan Ling, Xia Li, Yisen Wang, Guangcan Liu, Zhouchen Lin
arxiv.org
 22 days ago

Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have recently become increasingly attractive. In this paper, we investigate an emerging question: can a model's equilibrium point be regarded as the solution of an optimization problem? Specifically, we first decompose DNNs into a new class of unit layer that is the differential of an implicit convex function while keeping its output unchanged. Then, we derive the equilibrium model of the unit layer, named Optimization Induced Equilibrium Networks (OptEq), which can be easily extended to deep layers. The equilibrium point of OptEq can be theoretically connected to the solution of its corresponding convex optimization problem with explicit objectives. Based on this, we can flexibly introduce prior properties to the equilibrium points: 1) by modifying the underlying convex problems explicitly so as to change the architectures of OptEq; and 2) by merging the information into the fixed-point iteration, which guarantees that the desired equilibrium is chosen when the fixed-point set is not a singleton. This work establishes an important first step towards the optimization-guided design of deep models.
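To make the central claim concrete (an equilibrium point that coincides with the solution of a convex problem), here is a minimal toy sketch. It is not the OptEq architecture from the paper; it assumes a single layer z = ReLU(Wz + b) with a symmetric weight matrix W of spectral norm below 1, where b stands in for the input injection. Under these assumptions, the KKT conditions of minimizing f(z) = ½ zᵀ(I − W)z − bᵀz over z ≥ 0 reduce exactly to z = ReLU(Wz + b), so the layer's equilibrium and the convex minimizer coincide. The function names are hypothetical.

```python
import numpy as np

def equilibrium_fixed_point(W, b, tol=1e-10, max_iter=10_000):
    """Naive fixed-point iteration z <- relu(W z + b) for the toy equilibrium layer."""
    z = np.zeros_like(b)
    for _ in range(max_iter):
        z_next = np.maximum(W @ z + b, 0.0)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def solve_convex_problem(W, b, iters=20_000):
    """Projected gradient descent on f(z) = 0.5 z^T (I - W) z - b^T z subject to z >= 0."""
    A = np.eye(b.size) - W
    step = 1.0 / np.linalg.norm(A, 2)  # 1 / Lipschitz constant of the gradient A z - b
    z = np.zeros_like(b)
    for _ in range(iters):
        z = np.maximum(z - step * (A @ z - b), 0.0)  # gradient step, then project onto z >= 0
    return z

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
W = (M + M.T) / 2                  # symmetric weights
W *= 0.9 / np.linalg.norm(W, 2)    # spectral norm 0.9 < 1, so I - W is positive definite
b = rng.standard_normal(n)         # stands in for U x + bias

z_eq = equilibrium_fixed_point(W, b)   # equilibrium of the layer
z_opt = solve_convex_problem(W, b)     # minimizer of the convex problem
gap = np.max(np.abs(z_eq - z_opt))     # expected to be at numerical precision
```

With step size 1, one projected-gradient step z ← max(z − ((I − W)z − b), 0) is exactly one application of the layer, which is why both procedures share the same fixed point.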

Related
Coding & Programming | opensource.com

Optimize Java serverless functions in Kubernetes

A faster startup and a smaller memory footprint always matter in Kubernetes because of the expense of running thousands of application pods and the cost savings of doing so with fewer worker nodes and other resources. Memory is more important than throughput for containerized microservices on Kubernetes because it's more expensive...
Physics | arxiv.org

Dephasing-induced growth of discrete crystalline order in spin networks

A quantum phase of matter can be understood from the symmetry of the system's Hamiltonian. The system's symmetry along the time axis has been proposed to reveal a new phase of matter, referred to as discrete-time crystals (DTCs). A DTC is a quantum phase of matter in non-equilibrium systems, and it is also intimately related to the symmetry of the initial state. DTCs that are stable in isolated systems are not necessarily resilient to the influence of an external reservoir. In this paper, we discuss the dynamics of the DTCs under the influence of an environment. Specifically, we consider a non-trivial situation in which the initial state is prepared to partly preserve the symmetry of the Liouvillian. Our analysis shows that the entire system evolves towards a DTC phase and is stabilised by the effect of dephasing. Our results provide a new understanding of quantum phases emerging from the competition between the coherent and incoherent dynamics in dissipative non-equilibrium quantum systems.
Coding & Programming | mathworks.com

Multi-objectives Harmony Search Optimization

Multi-objectives Harmony Search optimization algorithm. This is a function script that solves an optimization problem with multiple objective functions. The Harmony Search algorithm is one of the best optimization algorithms for balancing local and global search. It is simple yet fast and efficient at finding optimal solutions. This script was created...
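The core Harmony Search loop can be sketched in a few lines. This is a minimal single-objective version, not the MathWorks script (whose interface and multi-objective handling are not shown here). The parameter names hms (harmony memory size), hmcr (memory consideration rate), par (pitch adjustment rate), and bw (bandwidth, as a fraction of each variable's range) follow common usage and are assumptions. Memory consideration plus pitch adjustment provide the local search, and random selection provides the global search.

```python
import numpy as np

def harmony_search(f, lower, upper, hms=20, hmcr=0.9, par=0.3, bw=0.05,
                   iters=5000, seed=0):
    """Minimize f over the box [lower, upper] with basic Harmony Search."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Harmony memory: hms random candidate solutions and their costs.
    memory = rng.uniform(lower, upper, size=(hms, dim))
    costs = np.array([f(h) for h in memory])
    for _ in range(iters):
        new = np.empty(dim)
        for j in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse this variable's value from a stored harmony.
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:
                    # Pitch adjustment: small local perturbation of the reused value.
                    new[j] += bw * (upper[j] - lower[j]) * rng.uniform(-1, 1)
            else:
                # Random selection: explore the whole range of this variable.
                new[j] = rng.uniform(lower[j], upper[j])
        new = np.clip(new, lower, upper)
        cost = f(new)
        worst = np.argmax(costs)
        if cost < costs[worst]:
            # Replace the worst stored harmony if the new one is better.
            memory[worst] = new
            costs[worst] = cost
    best = np.argmin(costs)
    return memory[best], costs[best]

def sphere(x):
    """Simple test objective with its minimum 0 at the origin."""
    return float(np.sum(x ** 2))

x_best, f_best = harmony_search(sphere, lower=[-5, -5], upper=[5, 5])
```

On the 2-D sphere function the best harmony should approach the origin. A multi-objective variant would additionally need a rule for comparing harmonies (for example, Pareto dominance), which this sketch does not implement.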
Google | atoallinks.com

An Effective Way to Optimizing Branded Keywords

SEO is more pivotal than ever in today's competitive online business market. However, it is not enough to develop a website, collect a few links, and call it a day. Nor is it enough to focus only on SEO strategies. What you have to do now is tie it all together to create an online brand that is both trusted and authoritative.
Science | arxiv.org

The Local Equilibrium State of a Crystal Surface Jump Process in the Rough Scaling Regime

We investigate the local equilibrium (LE) distribution of a crystal surface jump process as it approaches its hydrodynamic (continuum) limit in a nonstandard, "rough" scaling regime introduced by Marzuola and Weare. The rough scaling leads to a local equilibrium state whose structure is novel, to the best of our knowledge. The distinguishing characteristic of the new LE state is that the ensemble averages of single-lattice-site observables do not vary smoothly across lattice sites. This raises the question: how does the microscopic process converge to its continuum limit in the presence of such discontinuity? We address this question by investigating the new LE state in relation to a key identity which expresses, in mathematical terms, how macroscopic dynamics emerge out of many microscopic interactions. A variant of this identity lies at the heart of rigorous hydrodynamic limit convergence proofs. We conjecture that the crystal surface LE state satisfies several general conditions defining what we call a "rough" local equilibrium. These conditions lend insight into how the key identity can be satisfied absent a smooth LE state. Indeed, we show that the identity still holds under the rough LE conditions, which constitute relaxations of standard "smooth" LE properties. Employing both numerics and analysis, we then verify that for the crystal surface relaxation process, these conjectured rough LE conditions are indeed satisfied. Our explicit analysis of the crystal surface LE state will also make evident why the rough scaling leads to this discontinuity in the first place.
Energy Industry | windtech-international.com

Understanding and Applying Optimal Bolt Tension

Achieving and maintaining the right tension in bolted joints in wind turbines can help prevent system failures and associated repair costs. One loose bolt in a cluster of several hundred holding a structure together, often interdependently, can cause a domino effect that could, at worst, result in failure of the entire unit. As wind turbines continue to increase in size, the structures need to withstand ever higher centrifugal and bending forces, as well as vibrations, all of which can affect the integrity of bolted joints. Correct bolt tensioning is, therefore, critical. However, accurate bolt tension is not only difficult to achieve but can also be difficult to monitor. Danish engineering company R&D has developed an accurate system that uses both mechanical and ultrasonic measurements to determine the desired bolt tension in a way that also saves time. The solution can also digitally track individual bolts throughout their lifetime, ensuring the condition of the bolts is documented.
Science | arxiv.org

Instantaneous equilibrium transport for Brownian systems under time-dependent temperature and potential variations: Reversibility, heat and work relations, and fast isentropic process

The theory of constructing instantaneous equilibrium (ieq) transitions under arbitrary time-dependent temperature and potential variations for a Brownian particle is developed. It is shown that it is essential to consider the underdamped dynamics for temperature-changing transitions. The ieq is maintained by a time-dependent auxiliary position and momentum potential, which can be calculated for given time-dependent transition protocols. Explicit analytic results are derived for the work and heat statistics and for the energy and entropy changes, for harmonic and non-harmonic trapping potentials with arbitrary time-dependent potential parameters and temperature protocols. Numerical solutions of the corresponding Langevin dynamics are computed to confirm the theoretical results. Although the ieq transition of the reverse process is not the time reversal of the ieq transition of the forward process, owing to the odd parity of the controlling parameters, their phase-space distribution functions restore time-reversal symmetry; hence the energy and entropy changes of the reverse ieq process are simply the negatives of those of the forward process. Furthermore, it is shown that it is possible to construct an ieq transition that has zero entropy production at a finite transition rate, i.e., a fast ieq isentropic process, which is further demonstrated by explicit Langevin dynamics simulations. Our theory provides fundamental building blocks for designing controlled microscopic heat engine cycles. Implications for constructing an efficient Brownian heat engine are also discussed.
Mathematics | arxiv.org

Equilibrium Energy and Entropy of Vortex Filaments on a Cubic Lattice: A Localized Transformations Algorithm

In this work we propose a new algorithm for the computation of statistical equilibrium quantities on a cubic lattice when both an energy and a statistical temperature are involved. We demonstrate that the pivot algorithm used in situations such as protein folding works well for a small range of temperatures near the polymeric case, but it fails in other situations. The new algorithm, using localized transformations, seems to perform well for all possible temperature values. Having reliably approximated the values of equilibrium energy, we also propose an efficient way to compute equilibrium entropy for all temperature values. We apply the algorithms in the context of suction or supercritical vortices in a tornadic flow, which are approximated by vortex filaments on a cubic lattice. We confirm that supercritical (smooth, "straight") vortices have the highest energy and correspond to negative temperatures in this model. The lowest-energy configurations are folded up and "balled up" to a great extent. The results support A. Chorin's findings that, in the context of supercritical vortices in a tornadic flow, when such high-energy vortices stretch, they need to fold.
Astronomy | arxiv.org

Optimizing the searches for interstellar heterocycles

Interstellar formation processes are known to be thermodynamically affected. Based on this, seven heterocycles (imidazole, pyridine, pyrimidine, pyrrole, quinoline, isoquinoline, and furan), which have been searched for in different astronomical sources without any successful detection, with only upper limits determined for their column densities, remain the best candidates for astronomical observation among their isomers. These molecules are believed to form on the surfaces of interstellar dust grains and, as such, are susceptible to interstellar hydrogen bonding. In this study, a two-way approach using ab initio quantum chemical simulations is used to optimize the searches for these molecules in the interstellar medium. First, these molecules and their isomers are subjected to the effect of interstellar hydrogen bonding. Second, the deuterated analogues of these heterocycles are examined for their possible detectability. The results show that all the heterocycles except furan are strongly bonded to the surfaces of interstellar dust grains, which reduces their abundances and thus contributes to their unsuccessful detection. Successful detection of furan remains highly feasible. For the D-analogues, the computed Boltzmann factor indicates that they form under dense molecular cloud conditions where major deuterium fractionation dominates, implying a D/H ratio well above the cosmic D/H ratio, which suggests that these deuterated species are detectable.
Coding & Programming | arxiv.org

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network whose links are allowed to change in time. We solve two fundamental problems for this task. First, we establish the first lower bounds on the number of decentralized communication rounds and the number of local computations required to find an $\epsilon$-accurate solution. Second, we design two optimal algorithms that attain these lower bounds: (i) a variant of the recently proposed algorithm ADOM (Kovalev et al., 2021) enhanced via a multi-consensus subroutine, which is optimal in the case when access to the dual gradients is assumed, and (ii) a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
Coding & Programming | arxiv.org

Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning

The multi-layered network exploration (MuLaNE) problem is an important problem abstracted from many applications. In MuLaNE, there are multiple network layers in which each node has an importance weight and each layer is explored by a random walk. The MuLaNE task is to allocate a total random walk budget $B$ across the network layers so that the total weight of the unique nodes visited by the random walks is maximized. We systematically study this problem from offline optimization to online learning. For the offline optimization setting, where the network structure and node weights are known, we provide greedy-based constant-ratio approximation algorithms for overlapping networks, and greedy or dynamic-programming-based optimal solutions for non-overlapping networks. For the online learning setting, neither the network structure nor the node weights are known initially. We adapt the combinatorial multi-armed bandit framework and design algorithms that learn random-walk-related parameters and node weights while optimizing the budget allocation over multiple rounds, and we prove that they achieve logarithmic regret bounds. Finally, we conduct experiments on a real-world social network dataset to validate our theoretical results.
Mathematics | arxiv.org

Linear response in large deviations theory: A method to compute non-equilibrium distributions

We consider thermodynamically consistent autonomous Markov jump processes displaying a macroscopic limit in which the logarithm of the probability distribution is proportional to a scale-independent rate function (i.e., a large deviations principle is satisfied). In order to provide an explicit expression for the probability distribution valid away from equilibrium, we propose a linear response theory performed at the level of the rate function. We show that the first-order non-equilibrium contribution to the steady-state rate function, $g(\bm{x})$, satisfies $\bm{u}(\bm{x})\cdot \nabla g(\bm{x}) = -\beta \dot W(\bm{x})$, where the vector field $\bm{u}(\bm{x})$ defines the macroscopic deterministic dynamics, and the scalar field $\dot W(\bm{x})$ equals the rate at which work is performed on the system in a given state $\bm{x}$. This equation provides a practical way to determine $g(\bm{x})$, significantly outperforms standard linear response theory applied at the level of the probability distribution, and approximates the rate function surprisingly well in some far-from-equilibrium conditions. The method applies to a wealth of physical and chemical systems, which we exemplify with two analytically tractable models -- an electrical circuit and an autocatalytic chemical reaction network -- both undergoing a non-equilibrium transition from a monostable phase to a bistable phase. Our approach can be easily generalized to transient probabilities and non-autonomous dynamics. Moreover, its recursive application generates a virtual flow in the probability space which allows one to determine the steady-state rate function arbitrarily far from equilibrium.
Coding & Programming | arxiv.org

Provably Faster Algorithms for Bilevel Optimization

Bilevel optimization has been widely applied in many important machine learning applications, such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\mathcal{O}(\epsilon^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization: the first adopts momentum-based recursive iterations, and the second adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which improves the order of complexity over all existing algorithms. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications. Our code for MRBO, VRBO, and other benchmarks is available online.
Science | arxiv.org

Complete Realization of Energy Landscape and Non-equilibrium Trapping Dynamics in Spin Glass and Optimization Problem

Energy landscapes are high-dimensional surfaces representing the dependence of a system's energy on its variable configurations; they crucially determine the system's emergent behavior but are difficult to analyze due to their high-dimensional nature. In this article, we introduce an approach to reveal the complete energy landscapes of small spin glasses and Boolean satisfiability problems, which also unravels their non-equilibrium dynamics at an arbitrary temperature for an arbitrarily long time. Contrary to common belief, our results show that it can be less likely to identify the ground states when the temperature decreases, due to trapping in individual local minima, which ceases at different times, leading to multiple abrupt jumps in the ground-state probability over time. Simulations agree well with theoretical predictions of these remarkable phenomena. Finally, for large systems, we introduce a variant approach to partially extract the energy landscapes and observe similar phenomena both analytically and in simulations. This work introduces a new methodology to unravel the non-equilibrium dynamics of glassy systems, and provides a clear, complete, and new physical picture of their long-time behavior, which is inaccessible to modern numerics.
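For very small systems, the static part of a complete energy landscape can be obtained by brute-force enumeration. The sketch below is a generic baseline, not the authors' approach (which also captures the non-equilibrium dynamics). It assumes a Sherrington-Kirkpatrick-style spin glass with Gaussian couplings J_ij and energy E(s) = −Σ_{i<j} J_ij s_i s_j, lists all 2^n energies, and finds the configurations that are local minima under single-spin flips, i.e. the kind of traps the abstract refers to. Function names and the coupling model are illustrative assumptions.

```python
import itertools
import numpy as np

def energy_landscape(J):
    """Energies of all 2^n spin configurations for E(s) = -sum_{i<j} J_ij s_i s_j."""
    n = J.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # s^T J s counts each pair (i, j) twice when J is symmetric with zero diagonal,
    # so -0.5 * s^T J s equals -sum_{i<j} J_ij s_i s_j.
    energies = -0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    return states, energies

def local_minima(states, J):
    """Indices of configurations where no single spin flip lowers the energy."""
    minima = []
    for k, s in enumerate(states):
        # Energy change from flipping spin i is dE_i = 2 * s_i * (J @ s)_i.
        dE = 2 * s * (J @ s)
        if np.all(dE >= 0):
            minima.append(k)
    return minima

rng = np.random.default_rng(1)
n = 10
J = np.triu(rng.standard_normal((n, n)), 1)
J = J + J.T                       # symmetric couplings with zero diagonal
states, energies = energy_landscape(J)
ground = int(np.argmin(energies))  # index of one ground state
minima = local_minima(states, J)   # all single-flip local minima (traps)
```

Because flipping every spin leaves E unchanged, ground states and local minima come in pairs; the number of local minima beyond the ground-state pair indicates how many traps a low-temperature single-spin-flip dynamics could get stuck in.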
Computers | arxiv.org

DORO: Distributional and Outlier Robust Optimization

Many machine learning tasks involve subpopulation shift, where the testing data distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed the use of a variant of empirical risk minimization (ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe that DRO performs relatively poorly and, moreover, has severe instability. We identify one direct cause of this phenomenon: the sensitivity of DRO to outliers in the datasets. To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers. We instantiate DORO for the Cressie-Read family of Rényi divergences, and delve into two specific instances of this family: CVaR and $\chi^2$-DRO. We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets, thereby positively addressing the open question raised by Hashimoto et al., 2018.
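The outlier sensitivity can be seen with a minimal sketch of empirical CVaR (the average of the worst α-fraction of per-example losses) and a DORO-style variant that first discards the largest ε-fraction of losses as suspected outliers and then applies CVaR to the rest. This reflects our reading of the idea rather than the authors' code; the function names and the rounding conventions (ceil for the CVaR count, floor for the discarded count) are assumptions.

```python
import numpy as np

def cvar_risk(losses, alpha):
    """Empirical CVaR: mean of the largest ceil(alpha * n) losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending order
    k = max(1, int(np.ceil(alpha * losses.size)))
    return losses[:k].mean()

def cvar_doro_risk(losses, alpha, eps):
    """DORO-style CVaR: skip the largest floor(eps * n) losses as suspected
    outliers, then average the next ceil(alpha * n) largest losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending order
    n = losses.size
    start = int(np.floor(eps * n))
    k = max(1, int(np.ceil(alpha * n)))
    return losses[start:start + k].mean()

clean = np.linspace(0.0, 1.0, 100)        # well-behaved per-example losses
noisy = np.concatenate([clean, [50.0]])   # the same losses plus one corrupted example

plain = cvar_risk(noisy, alpha=0.1)                 # dominated by the single outlier
robust = cvar_doro_risk(noisy, alpha=0.1, eps=0.01) # outlier dropped before averaging
```

Here plain CVaR is pulled above 5 by a single corrupted loss, whereas the DORO-style risk stays at the mean of the worst clean losses (94/99 ≈ 0.95), which illustrates why minimizing plain CVaR can end up fitting outliers.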
Mathematics | arxiv.org

Optimized Rate-Profiling for PAC Codes

The polarization-adjusted convolutional (PAC) codes concatenate the polar transform and the convolutional transform to improve the decoding performance of finite-length polar codes, where the rate-profile is used to construct the PAC codes by setting the positions of the frozen bits. However, the optimal rate-profile method for PAC codes is still unknown. In this paper, an optimized rate-profile algorithm for PAC codes is proposed. First, we propose the normalized compression factor (NCF) to quantify the transmission efficiency of useful information, showing that the distribution of useful information that needs to be transmitted after the convolutional transform should be adapted to the capacity profile after the finite-length polar transform. This phenomenon indicates that the PAC code improves the transmission efficiency of useful information, which leads to better decoding performance than polar codes of the same length. Then, we propose a novel rate-profile method for PAC codes, in which a quadratic optimization model is established and the Euclidean norm of the NCF spectrum is adopted to construct the objective function. Finally, a heuristic bit-swapping strategy is designed to search for the frozen set with high objective function values, where the search space is limited by considering only the bits whose row indices have medium Hamming weight. Simulation results show that PAC codes with the proposed optimized rate-profile construction have better decoding performance than PAC codes with the originally proposed Reed-Muller design construction.
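For reference, the Reed-Muller rate-profile used as the baseline, and the row-index Hamming weights that the bit-swapping search is restricted by, can be sketched as follows. The sketch assumes the standard Arıkan kernel F = [[1, 0], [1, 1]] without bit reversal, so row i of the N × N polar transform F^{⊗n} has weight 2^wt(i), where wt(i) is the number of ones in the binary form of i. When K is a Reed-Muller code dimension, as in PAC(128, 64), the information set is exactly the indices of weight at least 4; for other K, the tie-breaking rule below is a hypothetical choice, not part of the source.

```python
def row_weights(n):
    """Hamming weight wt(i) of each row index i in 0 .. 2^n - 1.
    Row i of the polar transform F^{(x)n} then has weight 2^wt(i)."""
    return [bin(i).count("1") for i in range(2 ** n)]

def rm_rate_profile(n, K):
    """Reed-Muller-style rate-profile: pick the K row indices of largest weight.
    Ties are broken toward larger indices (an illustrative assumption)."""
    w = row_weights(n)
    order = sorted(range(2 ** n), key=lambda i: (w[i], i), reverse=True)
    return sorted(order[:K])  # information positions; the rest are frozen

info = rm_rate_profile(7, 64)                      # PAC(128, 64)
frozen = sorted(set(range(128)) - set(info))       # frozen-bit positions
```

For N = 128 there are 35 + 21 + 7 + 1 = 64 indices of weight at least 4, so the RM(3, 7) profile fills the information set without any ties; "medium" weights (here 3 and 4) are where the two classes meet, which is why a swapping search can concentrate there.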
Computers | arxiv.org

Efficient input placement for the optimal control of network moments

In this paper, we study the optimal control of the mean and variance of the network state vector. We develop an algorithm to optimize the control input placement subject to constraints on the state, which must be achieved at a given time threshold, seeking an input placement that moves the moment at minimum cost. First, we solve the state-selection problem for a number of variants of the first and second moments, and find solutions related to the eigenvalues of the systems' Gramian matrices. Our algorithm then uses this information to find a locally optimal input placement; it is a generalization of the projected gradient method (GPGM). We solve the problem for some common versions of these moments, including the mean state and versions of the second moment that induce discord, repel from a certain state, or encourage convergence. We then perform simulations and discuss a measure of centrality based on the system flux, a measure that describes which nodes are most important for the optimal control of the average state.
Coding & Programming | arxiv.org

ZoPE: A Fast Optimizer for ReLU Networks with Low-Dimensional Inputs

Deep neural networks often lack the safety and robustness guarantees needed to be deployed in safety-critical systems. Formal verification techniques can be used to prove input-output safety properties of networks, but when properties are difficult to specify, we rely on the solution to various optimization problems. In this work, we present an algorithm called ZoPE that solves optimization problems over the output of feedforward ReLU networks with low-dimensional inputs. The algorithm eagerly splits the input space, bounding the objective using zonotope propagation at each step, and improves computational efficiency compared to existing mixed-integer programming approaches. We demonstrate how to formulate and solve three types of optimization problems: (i) minimization of any convex function over the output space, (ii) minimization of a convex function over the output of two networks in series with an adversarial perturbation in the layer between them, and (iii) maximization of the difference in output between two networks. Using ZoPE, we observe a $25\times$ speedup on property 1 of the ACAS Xu neural network verification benchmark and an $85\times$ speedup on a set of linear optimization problems. We demonstrate the versatility of the optimizer in analyzing networks by projecting onto the range of a generative adversarial network and visualizing the differences between a compressed and an uncompressed network.
Computers | arxiv.org

Bayesian Optimization over Hybrid Spaces

We consider the problem of optimizing hybrid structures (mixtures of discrete and continuous input variables) via expensive black-box function evaluations. This problem arises in many real-world applications. For example, in materials design optimization via lab experiments, discrete and continuous variables correspond to the presence/absence of primitive elements and their relative concentrations, respectively. The key challenge is to accurately model the complex interactions between discrete and continuous variables. In this paper, we propose a novel approach, referred to as Hybrid Bayesian Optimization (HyBO), that utilizes diffusion kernels, which are naturally defined over continuous and discrete variables. We develop a principled approach for constructing diffusion kernels over hybrid spaces by utilizing the additive kernel formulation, which allows additive interactions of all orders in a tractable manner. We theoretically analyze the modeling strength of additive hybrid kernels and prove that they have the universal approximation property. Our experiments on synthetic benchmarks and six diverse real-world benchmarks show that HyBO significantly outperforms state-of-the-art methods.
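The two ingredients named above, diffusion kernels on discrete and continuous variables and an additive kernel over all interaction orders, can be sketched as follows. This is an illustrative sketch under our own assumptions, not the HyBO implementation: discrete variables are binary; the per-variable diffusion kernel on a two-state graph is 1 for equal values and tanh(β) otherwise (up to normalization); the diffusion (heat) kernel on the real line is the Gaussian/RBF kernel; and the per-order weights are hypothetical placeholders for learned hyperparameters. The sum over all orders uses the Newton-Girard recursion for elementary symmetric polynomials, which avoids enumerating all 2^D subsets of variables.

```python
import numpy as np

def discrete_diffusion(x, y, beta=1.0):
    """Per-variable diffusion kernel on binary inputs: 1 if equal, tanh(beta) otherwise."""
    return np.where(x == y, 1.0, np.tanh(beta))

def continuous_diffusion(x, y, lengthscale=1.0):
    """Per-variable heat (diffusion) kernel on the real line, i.e. a Gaussian/RBF kernel."""
    return np.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def additive_kernel(base, order_weights):
    """Weighted sum over interaction orders 1..D of the per-variable kernel values.
    Order k contributes e_k, the k-th elementary symmetric polynomial of `base`,
    computed by Newton-Girard: e_k = (1/k) * sum_{i=1..k} (-1)^(i-1) e_{k-i} p_i."""
    D = base.size
    p = [np.sum(base ** i) for i in range(D + 1)]  # power sums (p[0] unused)
    e = [1.0]
    for k in range(1, D + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * p[i] for i in range(1, k + 1)) / k)
    return sum(w * e[k] for k, w in zip(range(1, D + 1), order_weights))

def hybrid_kernel(xd, yd, xc, yc, order_weights=None):
    """Additive hybrid kernel over discrete (xd, yd) and continuous (xc, yc) parts."""
    base = np.concatenate([discrete_diffusion(xd, yd), continuous_diffusion(xc, yc)])
    if order_weights is None:
        order_weights = np.ones(base.size)  # hypothetical equal weights per order
    return additive_kernel(base, order_weights)

xd, yd = np.array([0, 1]), np.array([0, 0])   # two binary variables
xc, yc = np.array([0.3]), np.array([0.0])     # one continuous variable
k_xy = hybrid_kernel(xd, yd, xc, yc)
k_xx = hybrid_kernel(xd, xd, xc, xc)          # all base values 1, so 3 + 3 + 1 = 7
```

With three variables and unit weights, the result equals the sum of the first-, second-, and third-order products of the per-variable kernel values, which is what a brute-force enumeration over subsets would give.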
Coding & Programming | arxiv.org

Stein Latent Optimization for GANs

Generative adversarial networks (GANs) with clustered latent spaces can perform conditional generation in a completely unsupervised manner. However, the salient attributes of real-world unlabeled data are mostly imbalanced. Existing unsupervised conditional GANs cannot properly cluster the attributes in their latent spaces because they assume uniform distributions of the attributes. To address this problem, we theoretically derive Stein latent optimization, which provides reparameterizable gradient estimates of the latent distribution parameters assuming a Gaussian mixture prior in a continuous latent space. Structurally, we introduce an encoder network and a novel contrastive loss to help generated data from a single mixture component represent a single attribute. We confirm that the proposed method, named Stein Latent Optimization for GANs (SLOGAN), successfully learns balanced or imbalanced attributes and performs unsupervised tasks such as unsupervised conditional generation, unconditional generation, and cluster assignment, even in the absence of information about the attributes (e.g., the imbalance ratio). Moreover, we demonstrate that the attributes to be learned can be manipulated using a small amount of probe data.