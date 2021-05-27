

Attention-oriented Brain Storm Optimization for Multimodal Optimization Problems

By Jian Yang, Yuhui Shi
arxiv.org
 22 days ago

Population-based methods are often used to solve multimodal optimization problems. By incorporating a niching or clustering strategy, state-of-the-art approaches generally divide the population into several subpopulations to find multiple solutions to the problem at hand. However, these methods are guided only by the fitness value during iterations and therefore struggle to determine the number of subpopulations, i.e., the number of niche areas or clusters. To compensate for this drawback, this paper presents an Attention-oriented Brain Storm Optimization (ABSO) method that introduces the attention mechanism into a relatively new swarm intelligence algorithm, Brain Storm Optimization (BSO). By converting the objective space from the fitness space into an "attention" space, the individuals are clustered and updated iteratively according to their salient values. Rather than converging to a single global optimum, the proposed method can guide the search procedure to converge to multiple "salient" solutions. The preliminary results show that the proposed method can locate multiple global and local optimal solutions of several multimodal benchmark functions. The proposed method needs less prior knowledge of the problem and can automatically converge to multiple optima guided by the attention mechanism, which gives it excellent potential for further development.
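
To make "locating multiple optima" concrete, here is a minimal sketch of the problem setting. It is not ABSO or BSO: it swaps in a plain multi-start hill climb with distance-based grouping. The test function `equal_maxima` (sin^6(5πx) on [0, 1], which has five equally high peaks), the helper names, and the `radius` parameter are all assumptions made for illustration and do not come from the paper.

```python
import math


def equal_maxima(x):
    """Illustrative 1-D multimodal function with five equal peaks on [0, 1]."""
    return math.sin(5 * math.pi * x) ** 6


def hill_climb(f, x, lo=0.0, hi=1.0, step=0.01, tol=1e-6):
    """Greedy local search: move to a better neighbour, else halve the step."""
    while step > tol:
        best = x
        for cand in (x - step, x + step):
            if lo <= cand <= hi and f(cand) > f(best):
                best = cand
        if best == x:
            step /= 2  # no improving neighbour at this resolution
        else:
            x = best
    return x


def find_peaks(f, n_starts=50, radius=0.05):
    """Run hill_climb from evenly spaced starts and keep one point per niche.

    `radius` is a hand-picked niche size; needing such a parameter is the
    kind of prior knowledge the abstract says ABSO aims to reduce.
    """
    peaks = []
    for i in range(n_starts):
        x = hill_climb(f, (i + 0.5) / n_starts)
        if all(abs(x - p) > radius for p in peaks):
            peaks.append(x)
    return sorted(peaks)


if __name__ == "__main__":
    for p in find_peaks(equal_maxima):
        print(f"peak near x = {p:.3f}, f(x) = {equal_maxima(p):.6f}")
```

A single-optimum optimizer would return just one of these peaks. A multimodal method is expected to return all of them, ideally without being told how many there are.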

Related
Computers | arxiv.org

Learning Hard Optimization Problems: A Data Generation Perspective

Optimization problems are ubiquitous in our societies and are present in almost every segment of the economy. Most of these optimization problems are NP-hard and computationally demanding, often requiring approximate solutions for large-scale instances. Machine learning frameworks that learn to approximate solutions to such hard optimization problems are a potentially promising avenue to address these difficulties, particularly when many closely related problem instances must be solved repeatedly. Supervised learning frameworks can train a model using the outputs of pre-solved instances. However, when the outputs are themselves approximations, when the optimization problem has symmetric solutions, and/or when the solver uses randomization, solutions to closely related instances may exhibit large differences and the learning task can become inherently more difficult. This paper demonstrates this critical challenge, connects the volatility of the training data to the ability of a model to approximate it, and proposes a method for producing (exact or approximate) solutions to optimization problems that are more amenable to supervised learning tasks. The effectiveness of the method is tested on hard non-linear nonconvex and discrete combinatorial problems.
Coding & Programming | mathworks.com

Multi-objectives Harmony Search Optimization

Multi-objective Harmony Search optimization algorithm. This is a function script for solving an optimization problem with multiple objective functions. The Harmony Search algorithm is one of the best optimization algorithms for balancing local and global search. It is simple yet fast and efficient at finding optimal solutions. This script was created...
Mathematics | arxiv.org

Discrete-to-Continuous Extensions, II: Lovász extension, optimizations and eigenvalue problems

In this paper, we use various versions of the Lovász extension to systematically derive continuous formulations of problems from discrete mathematics. This will take place in the following context: (1) For combinatorial optimization problems, we systematically develop equivalent continuous versions, thereby making tools from convex optimization, fractional programming and more general...
Computers | arxiv.org

Error Mitigation for Deep Quantum Optimization Circuits by Leveraging Problem Symmetries

High error rates and limited fidelity of quantum gates in near-term quantum devices are the central obstacles to successful execution of the Quantum Approximate Optimization Algorithm (QAOA). In this paper we introduce an application-specific approach for mitigating the errors in QAOA evolution by leveraging the symmetries present in the classical objective function to be optimized. Specifically, the QAOA state is projected into the symmetry-restricted subspace, with projection being performed either at the end of the circuit or throughout the evolution. Our approach improves the fidelity of the QAOA state, thereby increasing both the accuracy of the sample estimate of the QAOA objective and the probability of sampling the binary string corresponding to that objective value. We demonstrate the efficacy of the proposed methods on QAOA applied to the MaxCut problem, although our methods are general and apply to any objective function with symmetries, as well as to the generalization of QAOA with alternative mixers. We experimentally verify the proposed methods on an IBM Quantum processor, utilizing up to 5 qubits. When leveraging a global bit-flip symmetry, our approach leads to a 23% average improvement in quantum state fidelity.
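
The global bit-flip symmetry mentioned above can be checked classically on a small MaxCut instance. The sketch below is only an illustration of that symmetry of the objective function, not of the paper's projection scheme or of any quantum circuit. The 4-node graph in `edges` and the helper names are hypothetical.

```python
from itertools import product


def cut_value(bits, edges):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def flip_all(bits):
    """Global bit flip: swap the two sides of the partition."""
    return tuple(1 - b for b in bits)


# Hypothetical 4-node graph: a square 0-1-2-3 plus the diagonal 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# The MaxCut objective only depends on whether endpoints differ,
# so flipping every bit leaves it unchanged for every assignment.
symmetric = all(
    cut_value(b, edges) == cut_value(flip_all(b), edges)
    for b in product((0, 1), repeat=4)
)

if __name__ == "__main__":
    print("objective invariant under global bit flip:", symmetric)
```

Because every bit string and its complement share the same objective value, the ideal state can be confined to a symmetry-restricted subspace, which is what the abstract describes projecting onto.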
Science | arxiv.org

Optimization of optical waveguide antennas for directive emission of light

Optical travelling wave antennas offer unique opportunities to control and selectively guide light into a specific direction, which renders them excellent candidates for optical communication and sensing. These applications require state-of-the-art engineering to reach optimized functionalities such as high directivity and radiation efficiency, low side lobe level, broadband and tunable capabilities, and compact design. In this work we report on the numerical optimization of the directivity of optical travelling wave antennas made from low-loss dielectric materials using full-wave numerical simulations in conjunction with a particle swarm optimization algorithm. The antennas are composed of a reflector and a director deposited on a glass substrate, and an emitter placed in the feed gap between them serves as an internal source of excitation. In particular, we analysed antennas with rectangular- and horn-shaped directors made of either Hafnium dioxide or Silicon. The optimized antennas produce highly directional emission due to the presence of two dominant guided TE modes in the director in addition to leaky modes. These guided modes dominate the far-field emission pattern and govern the direction of the main lobe emission, which predominantly originates from the end facet of the director. Our work also provides a comprehensive analysis of the modes, radiation patterns, parametric influences, and bandwidths of the antennas that highlights their robust nature.
Mathematics | arxiv.org

Optimized Rate-Profiling for PAC Codes

The polarization-adjusted convolutional (PAC) codes concatenate the polar transform and the convolutional transform to improve the decoding performance of the finite-length polar codes, where the rate-profile is used to construct the PAC codes by setting the positions of frozen bits. However, the optimal rate-profile method of PAC codes is still unknown. In this paper, an optimized rate-profile algorithm of PAC codes is proposed. First, we propose the normalized compression factor (NCF) to quantify the transmission efficiency of useful information, showing that the distribution of useful information that needs to be transmitted after the convolutional transform should be adaptive to the capacity profile after finite-length polar transform. This phenomenon indicates that the PAC code improves the transmission efficiency of useful information, which leads to a better decoding performance than the polar codes with the same length. Then, we propose a novel rate-profile method of PAC codes, where a quadratic optimization model is established and the Euclidean norm of the NCF spectrum is adopted to construct the objective function. Finally, a heuristic bit-swapping strategy is designed to search for the frozen set with high objective function values, where the search space is limited by considering only the bits with medium Hamming weight of the row index. Simulation results show that the PAC codes with the proposed optimized rate-profile construction have better decoding performance than the PAC codes with the originally proposed Reed-Muller design construction.
Computers | arxiv.org

Efficient solution method based on inverse dynamics for optimal control problems of rigid body systems

We propose an efficient way of solving optimal control problems for rigid-body systems on the basis of inverse dynamics and the multiple-shooting method. We treat all variables, including the state, acceleration, and control input torques, as optimization variables and treat the inverse dynamics as an equality constraint. We eliminate the update of the control input torques from the linear equation of Newton's method by applying condensing for inverse dynamics. The size of the resultant linear equation is the same as that of the multiple-shooting method based on forward dynamics except for the variables related to the passive joints and contacts. Compared with the conventional methods based on forward dynamics, the proposed method reduces the computational cost of the dynamics and their sensitivities by utilizing the recursive Newton-Euler algorithm (RNEA) and its partial derivatives. In addition, it increases the sparsity of the Hessian of the Karush-Kuhn-Tucker conditions, which reduces the computational cost, e.g., of Riccati recursion. Numerical experiments show that the proposed method outperforms state-of-the-art implementations of differential dynamic programming based on forward dynamics in terms of computational time and numerical robustness.
Computers | arxiv.org

A gradient based resolution strategy for a PDE-constrained optimization approach for 3D-1D coupled problems

Coupled 3D-1D problems arise in many practical applications, in an attempt to reduce the computational burden in simulations where cylindrical inclusions with a small section are embedded in a much larger domain. Nonetheless, the resolution of such problems can be non-trivial, both from a mathematical and a geometrical standpoint. Indeed, 3D-1D coupling requires operating in non-standard function spaces, and simulation geometries can be complex owing to the presence of multiple intersecting domains. Recently, a PDE-constrained optimization based formulation has been proposed for such problems, providing a well-posed mathematical formulation and allowing for the use of non-conforming meshes for the discrete problem. Here an unconstrained optimization formulation of the problem is derived and an efficient gradient-based solver is proposed for such a formulation. Some numerical tests on quite complex configurations are discussed to show the viability of the method.
Coding & Programming | arxiv.org

Provably Faster Algorithms for Bilevel Optimization

Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than $\mathcal{O}(\epsilon^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by the order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications. Our codes for MRBO, VRBO and other benchmarks are available online.
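
As a rough worked illustration of the gap between the two complexity bounds quoted above, the ratio can be evaluated at a sample accuracy. The value $\epsilon = 10^{-4}$ is an arbitrary choice for illustration, and the hidden constants in the $\mathcal{O}(\cdot)$ bounds are ignored.

```latex
\[
\frac{\epsilon^{-2}}{\epsilon^{-1.5}} = \epsilon^{-0.5},
\qquad
\epsilon = 10^{-4}:\quad
\epsilon^{-2} = 10^{8},\quad
\epsilon^{-1.5} = 10^{6},\quad
\epsilon^{-0.5} = 10^{2}.
\]
```

Under these assumptions the improvement factor grows as the target accuracy tightens, since it scales as $\epsilon^{-0.5}$.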
Coding & Programming | towardsdatascience.com

The Hitchhiker’s Guide to Optimization in Machine Learning

A Detailed Guide on Optimization and Stochastic Gradient Descent. The aim of this article is to establish a proper understanding of what exactly “optimizing” a Machine Learning algorithm means. Further, we’ll have a look at the gradient-based class (Gradient Descent, Stochastic Gradient Descent, etc.) of optimization algorithms. NOTE: For the...
Mathematics | arxiv.org

An Optimal Algorithm for Strict Circular Seriation

We study the problem of circular seriation, where we are given a matrix of pairwise dissimilarities between $n$ objects, and the goal is to find a {\em circular order} of the objects in a manner that is consistent with their dissimilarity. This problem is a generalization of the classical {\em linear seriation} problem where the goal is to find a {\em linear order}, and for which optimal ${\cal O}(n^2)$ algorithms are known. Our contributions can be summarized as follows. First, we introduce {\em circular Robinson matrices} as the natural class of dissimilarity matrices for the circular seriation problem. Second, for the case of {\em strict circular Robinson dissimilarity matrices} we provide an optimal ${\cal O}(n^2)$ algorithm for the circular seriation problem. Finally, we propose a statistical model to analyze the well-posedness of the circular seriation problem for large $n$. In particular, we establish ${\cal O}(\log(n)/n)$ rates on the distance between any circular ordering found by solving the circular seriation problem to the underlying order of the model, in the Kendall-tau metric.
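
For readers unfamiliar with seriation, the sketch below illustrates the classical linear case only. A dissimilarity matrix is Robinson when entries never decrease as one moves away from the diagonal, and seriation looks for an ordering of the objects that makes the matrix Robinson. The brute-force search is an assumption made for illustration (it tries all n! orderings) and is not the optimal O(n^2) algorithm the abstract refers to. The circular Robinson condition is not implemented here, and all function names are hypothetical.

```python
from itertools import permutations


def is_robinson(d):
    """Linear Robinson check: d[i][k] >= max(d[i][j], d[j][k]) for i < j < k."""
    n = len(d)
    return all(
        d[i][k] >= max(d[i][j], d[j][k])
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )


def permute(d, order):
    """Reorder rows and columns of d by the same permutation."""
    return [[d[a][b] for b in order] for a in order]


def brute_force_seriation(d):
    """Return the first ordering (lexicographic) that makes d Robinson, or None."""
    for order in permutations(range(len(d))):
        if is_robinson(permute(d, order)):
            return list(order)
    return None


if __name__ == "__main__":
    # Toy data: objects are points on a line, listed in shuffled order.
    positions = [0, 3, 1, 2]
    d = [[abs(p - q) for q in positions] for p in positions]
    order = brute_force_seriation(d)
    print("recovered order:", order)
    print("positions in that order:", [positions[i] for i in order])
```

In the toy example the recovered ordering sorts the points along the line, which is the "consistent with their dissimilarity" requirement in its simplest form.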
Computers | arxiv.org

Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System

Recent work (Takanobu et al., 2020) proposed the system-wise evaluation on dialog systems and found that improvement on individual components (e.g., NLU, policy) in prior work may not necessarily bring benefit to pipeline systems in system-wise evaluation. To improve the system-wise performance, in this paper, we propose new joint system-wise optimization techniques for the pipeline dialog system. First, we propose a new data augmentation approach which automates the labeling process for NLU training. Second, we propose a novel stochastic policy parameterization with Poisson distribution that enables better exploration and offers a principled way to compute policy gradient. Third, we propose a reward bonus to help policy explore successful dialogs. Our approaches outperform the competitive pipeline systems from Takanobu et al. (2020) by big margins of 12% success rate in automatic system-wise evaluation and of 16% success rate in human evaluation on the standard multi-domain benchmark dataset MultiWOZ 2.1, and also outperform the recent state-of-the-art end-to-end trained model from DSTC9.
Technology | arxiv.org

Coordination of operational planning and real-time optimization in microgrids

Hierarchical microgrid control levels range from distributed device-level controllers that run at a high frequency to centralized controllers optimizing market integration that run much less frequently. Centralized controllers are often subdivided into operational planning controllers that optimize decisions over a time horizon of one or several days, and real-time optimization controllers that deal with actions in the current market period. The coordination of these levels is of paramount importance. In this paper, we propose a value function-based approach as a way to propagate information from operational planning to real-time optimization. We apply this method to an environment where operational planning, using day-ahead forecasts, optimizes at a market period resolution the decisions to minimize the total energy cost and revenues, the peak consumption and injection-related costs, and plans for reserve requirements. Real-time optimization, in turn, copes with the forecast errors and yields implementable actions based on real-time measurements. The approach is compared to a rule-based controller on three use cases, and its sensitivity to forecast error is assessed.
Computers | arxiv.org

Optimal Counterfactual Explanations in Tree Ensembles

Counterfactual explanations are usually generated through heuristics that are sensitive to the search's initial conditions. The absence of guarantees of performance and robustness hinders trustworthiness. In this paper, we take a disciplined approach towards counterfactual explanations for tree ensembles. We advocate for a model-based search aiming at "optimal" explanations and propose efficient mixed-integer programming approaches. We show that isolation forests can be modeled within our framework to focus the search on plausible explanations with a low outlier score. We provide comprehensive coverage of additional constraints that model important objectives, heterogeneous data types, structural constraints on the feature space, along with resource and actionability restrictions. Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. It scales up to large data sets and tree ensembles, where it provides, within seconds, systematic explanations grounded on well-defined models solved to optimality.
Coding & Programming | arxiv.org

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or, in the case of multiple machines, by machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient for solving such min-max problems.
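
For reference, the Minty Variational Inequality condition mentioned above is commonly stated as follows. The notation ($f$, $x$, $y$, $z$, $F$, $z^\ast$) is assumed here for illustration and may differ from the paper's own.

```latex
\[
\min_{x}\,\max_{y}\; f(x, y),
\qquad
z = (x, y),
\qquad
F(z) =
\begin{pmatrix}
\nabla_x f(x, y) \\
-\nabla_y f(x, y)
\end{pmatrix},
\]
\[
\text{MVI:}\quad \exists\, z^\ast \ \text{such that}\ \
\langle F(z),\, z - z^\ast \rangle \ \ge\ 0
\quad \text{for all feasible } z .
\]
```

When $f$ is convex in $x$ and concave in $y$, the operator $F$ is monotone and any saddle point $z^\ast$ satisfies this inequality, which is why the condition is described as generalizing the convex-concave case.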
Mathematics | arxiv.org

Learning the optimal regularizer for inverse problems

In this work, we consider the linear inverse problem $y=Ax+\epsilon$, where $A\colon X\to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$, $x$ is a random variable in $X$ and $\epsilon$ is a zero-mean random process in $Y$. This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography. Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data. Our first result is a characterization of the optimal generalized Tikhonov regularizer, with respect to the mean squared error. We find that it is completely independent of the forward operator $A$ and depends only on the mean and covariance of $x$. Then, we consider the problem of learning the regularizer from a finite training set in two different frameworks: one supervised, based on samples of both $x$ and $y$, and one unsupervised, based only on samples of $x$. In both cases, we prove generalization bounds, under some weak assumptions on the distribution of $x$ and $\epsilon$, including the case of sub-Gaussian variables. Our bounds hold in infinite-dimensional spaces, thereby showing that finer and finer discretizations do not make this learning problem harder. The results are validated through numerical simulations.
Coding & Programming | arxiv.org

Distributed Optimization with Global Constraints Using Noisy Measurements

We propose a new distributed optimization algorithm for solving a class of constrained optimization problems in which (a) the objective function is separable (i.e., the sum of local objective functions of agents), (b) the optimization variables of distributed agents, which are subject to nontrivial local constraints, are coupled by global constraints, and (c) only noisy observations are available to estimate (the gradients of) local objective functions. In many practical scenarios, agents may not be willing to share their optimization variables with others. For this reason, we propose a distributed algorithm that does not require the agents to share their optimization variables with each other; instead, each agent maintains a local estimate of the global constraint functions and shares the estimate only with its neighbors. These local estimates of constraint functions are updated using a consensus-type algorithm, while the local optimization variables of each agent are updated using a first-order method based on noisy estimates of the gradient. We prove that, when the agents adopt the proposed algorithm, their optimization variables converge with probability 1 to an optimal point of an approximated problem based on the penalty method.
Traffic | arxiv.org

Optimal transport in multilayer networks

Modeling traffic distribution and extracting optimal flows in multilayer networks is of utmost importance for designing efficient multi-modal network infrastructures. Recent results based on optimal transport theory provide powerful and computationally efficient methods to address this problem, but they are mainly focused on modeling single-layer networks. Here we adapt these results to study how optimal flows distribute on multilayer networks. We propose a model where optimal flows on different layers contribute differently to the total cost to be minimized. This is done by means of a parameter that varies with layers, which allows flexible tuning of the sensitivity to traffic congestion of the various layers. As an application, we consider transportation networks, where each layer is associated with a different transportation system, and show how the traffic distribution varies as we tune this parameter across layers. We show an example of this result on the real 2-layer network of the city of Bordeaux with bus and tram, where we find that in certain regimes the presence of the tram network significantly unburdens the traffic on the road network. Our model paves the way to further analysis of optimal flows and navigability strategies in real multilayer networks.
Mathematics | arxiv.org

The quantum annealing gap and dynamical quantum phase transitions in complex optimization problems

Quenching and annealing are extreme opposites in the time evolution of a quantum system: Annealing explores equilibrium phases of a Hamiltonian with slowly changing parameters and can be exploited as a tool for solving complex optimization problems. In contrast, quenches are sudden changes of the Hamiltonian, producing a non-equilibrium situation in which dynamical phase transitions can occur. Here, we investigate the relation between the two cases. Specifically, we show that the minimum of the annealing gap, which is an important bottleneck of quantum annealing algorithms, can be revealed from the order parameter which describes the dynamical quantum phase transition after the quench. Combined with statistical tools including the training of a neural network, the relation between quench and annealing dynamics can be exploited to reproduce the full functional behavior of the annealing gap from the quench data. We show that the partial or full knowledge about the annealing gap which can be gained in this way can be used to design optimized quantum annealing protocols with a practical time-to-solution benefit. Our results are obtained from simulating random Ising Hamiltonians, representing hard-to-solve instances of the exact cover problem.
Industry | Logistics Management

Network Optimization — Make Your Data Visible and Actionable

Download our free ebook and learn how to unlock the potential in your enterprise. It’s a common scenario for many high-volume shippers: Data is siloed in disparate locations—often embedded across multiple spreadsheets—which puts supply chain decision-makers at a clear disadvantage. It’s especially challenging for large global organizations looking to clearly understand if product mapping is poor, global standardization is lacking or various regions have different priorities and protocols for shipping.