Computers

A Multi-Fidelity Active Learning Method for Global Design Optimization Problems with Noisy Evaluations

By Riccardo Pellegrini, Jeroen Wackers, Riccardo Broglia, Matteo Diez, Andrea Serani, Michel Visonneau
arxiv.org
 2 days ago

A multi-fidelity (MF) active learning method is presented for design optimization problems characterized by noisy evaluations of the performance metrics. Namely, a generalized MF surrogate model is used for design-space exploration, exploiting an arbitrary number of hierarchical fidelity levels, i.e., performance evaluations coming from different models, solvers, or discretizations, characterized by...

arxiv.org

COIL: Constrained Optimization in Learned Latent Space -- Learning Representations for Valid Solutions

Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defined the search space could be altered such that it only permitted valid solutions that satisfied the constraints, the task of finding the optimal would be made more feasible without any need for specialized optimization algorithms. We propose the use of a Variational Autoencoder to learn such representations. We present Constrained Optimization in Latent Space (COIL), which uses a VAE to generate a learned latent representation from a dataset comprising samples from the valid region of the search space according to a constraint, thus enabling the optimizer to find the objective in the new space defined by the learned representation. We investigate the value of this approach on different constraint types and for different numbers of variables. We show that, compared to an identical GA using a standard representation, COIL with its learned latent representation can satisfy constraints and find solutions with distance to objective up to two orders of magnitude closer.
COMPUTERS
itprotoday.com

Technical Debt and Modular Software Architecture

Some technologists argue that it should be possible to accommodate change by “innovation surfing,” by which they mean something like anticipating or co-opting the disruption wrought by innovation.[i] The Achilles heel of this idea is the presumption that an organization’s existing technology investments -- its technical debt --...
SOFTWARE
arxiv.org

Tailoring Gradient Methods for Differentially-Private Distributed Optimization

Decentralized optimization is gaining increased traction due to its widespread applications in large-scale machine learning and multi-agent systems. The same mechanism that enables its success, i.e., information sharing among participating agents, however, also leads to the disclosure of individual agents' private information, which is unacceptable when sensitive data are involved. As differential privacy is becoming a de facto standard for privacy preservation, results integrating differential privacy with distributed optimization have recently emerged. Although such differential-privacy-based approaches for distributed optimization are efficient in both computation and communication, directly incorporating differential privacy design in existing distributed optimization approaches significantly compromises optimization accuracy. In this paper, we propose to redesign and tailor gradient methods for differentially-private distributed optimization, and propose two differential-privacy oriented gradient methods that can ensure both privacy and optimality. We prove that the proposed distributed algorithms can ensure almost sure convergence to an optimal solution under any persistent and variance-bounded differential-privacy noise, which, to the best of our knowledge, has not been reported before. The first algorithm is based on static-consensus based gradient methods and only shares one variable in each iteration. The second algorithm is based on dynamic-consensus (gradient-tracking) based distributed optimization methods and, hence, it is applicable to general directed interaction graph topologies. Numerical comparisons with existing counterparts confirm the effectiveness of the proposed approaches.
CODING & PROGRAMMING
arxiv.org

Posterior temperature optimized Bayesian models for inverse problems in medical imaging

We present Posterior Temperature Optimized Bayesian Inverse Models (POTOBIM), an unsupervised Bayesian approach to inverse problems in medical imaging using mean-field variational inference with a fully tempered posterior. Bayesian methods exhibit useful properties for approaching inverse tasks, such as tomographic reconstruction or image denoising. A suitable prior distribution introduces regularization, which is needed to solve the ill-posed problem and reduces overfitting to the data. In practice, however, this often results in a suboptimal posterior temperature, and the full potential of the Bayesian approach is not being exploited. In POTOBIM, we optimize both the parameters of the prior distribution and the posterior temperature with respect to reconstruction accuracy using Bayesian optimization with Gaussian process regression. Our method is extensively evaluated on four different inverse tasks on a variety of modalities with images from public data sets, and we demonstrate that an optimized posterior temperature outperforms both non-Bayesian and Bayesian approaches without temperature optimization. The use of an optimized prior distribution and posterior temperature leads to improved accuracy and uncertainty estimation, and we show that it is sufficient to find these hyperparameters per task domain. Well-tempered posteriors yield calibrated uncertainty, which increases the reliability of the predictions. Our source code is publicly available.
SCIENCE
arxiv.org

On the Convergence of Gradient Extrapolation Methods for Unbalanced Optimal Transport

We study the Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most $n$ components, where marginal constraints of the standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor $\tau$. We propose a novel algorithm based on Gradient Extrapolation Method (GEM-UOT) to find an $\varepsilon$-approximate solution to the UOT problem in $O\big( \kappa n^2 \log\big(\frac{\tau n}{\varepsilon}\big) \big)$, where $\kappa$ is the condition number depending on only the two input measures. Compared to the only known complexity ${O}\big(\tfrac{\tau n^2 \log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{\varepsilon}\big)\big)$ for solving the UOT problem via the Sinkhorn algorithm, ours is better in $\varepsilon$ and lifts Sinkhorn's linear dependence on $\tau$, which hindered its practicality to approximate the standard OT via UOT. Our proof technique is based on a novel dual formulation of the squared $\ell_2$-norm regularized UOT objective, which is of independent interest and also leads to a new characterization of approximation error between UOT and OT in terms of both the transportation plan and transport distance. To this end, we further present an algorithm, based on GEM-UOT with fine-tuned $\tau$ and a post-process projection step, to find an $\varepsilon$-approximate solution to the standard OT problem in $O\big( \kappa n^2 \log\big(\frac{ n}{\varepsilon}\big) \big)$, which is a new complexity in the literature of OT. Extensive experiments on synthetic and real datasets validate our theories and demonstrate the favorable performance of our methods in practice.
arxiv.org

Designing active colloidal folders

Can active forces be exploited to drive the consistent collapse of an active polymer into a folded structure? In this paper we introduce and perform numerical simulations of a simple model of active colloidal folders, and show that a judicious inclusion of active forces into a stiff colloidal chain can generate designable and reconfigurable two-dimensional folded structures. The key feature is to organize the forces perpendicular to the chain backbone according to specific patterns (sequences). We characterize the physical properties of this model and perform, using a number of numerical techniques, an in-depth statistical analysis of the structure and dynamics of the emerging conformations. We discover a number of interesting features, including a direct correspondence between the sequence of the active forces and the structure of folded conformations, as well as an ensemble of highly mobile compact structures capable of moving from conformation to conformation. Finally, akin to protein design problems, we discuss a method that is capable of designing specific target folds by sampling over sequences of active forces.
MATHEMATICS
arxiv.org

Jamming Resilient Indoor Factory Deployments: Design and Performance Evaluation

In the framework of 5G-and-beyond Industry 4.0, jamming attacks for denial of service are a rising threat which can severely compromise the system performance. Therefore, in this paper we deal with the problem of jamming detection and mitigation in indoor factory deployments. We design two jamming detectors based on pseudo-random blanking of subcarriers with orthogonal frequency division multiplexing and consider jamming mitigation with frequency hopping and random scheduling of the user equipments. We then evaluate the performance of the system in terms of achievable BLER with ultra-reliable low-latency communications traffic and jamming missed detection probability. Simulations are performed considering a 3rd Generation Partnership Project spatial channel model for the factory floor with a jammer stationed outside the plant trying to disrupt the communication inside the factory. Numerical results show that jamming resiliency increases when using a distributed access point deployment and exploiting channel correlation among antennas for jamming detection, while frequency hopping is helpful in jamming mitigation only for strict BLER requirements.
ENGINEERING
arxiv.org

Active Multi-Task Representation Learning

To leverage the power of big data from source tasks and overcome the scarcity of the target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications. However, up until now, choosing which source tasks to include in the multi-task learning has been more art than science. In this paper, we give the first formal study on resource task sampling by leveraging the techniques from active learning. We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance. Theoretically, we show that for the linear representation class, to achieve the same error rate, our algorithm can save up to a \textit{number of source tasks} factor in the source task sample complexity, compared with the naive uniform sampling from all source tasks. We also provide experiments on real-world computer vision datasets to illustrate the effectiveness of our proposed method on both linear and convolutional neural network representation classes. We believe our paper serves as an important initial step to bring techniques from active learning to representation learning.
CODING & PROGRAMMING
arxiv.org

Solving matrix nearness problems via Hamiltonian systems, matrix factorization, and optimization

In these lecture notes, we review our recent works addressing various problems of finding the nearest stable system to an unstable one. After the introduction, we provide some preliminary background, namely, defining Port-Hamiltonian systems and dissipative Hamiltonian systems and their properties, briefly discussing matrix factorizations, and describing the optimization methods that we will use in these notes. In the third chapter, we present our approach to tackle the distance to stability for standard continuous linear time invariant (LTI) systems. The main idea is to rely on the characterization of stable systems as dissipative Hamiltonian systems. We show how this idea can be generalized to compute the nearest $\Omega$-stable matrix, where the eigenvalues of the sought system matrix $A$ are required to belong to a rather general set $\Omega$. We also show how these ideas can be used to compute minimal-norm static feedbacks, that is, stabilize a system by choosing a proper input $u(t)$ that linearly depends on $x(t)$ (static-state feedback), or on $y(t)$ (static-output feedback). In the fourth chapter, we present our approach to tackle the distance to passivity. The main idea is to rely on the characterization of stable systems as port-Hamiltonian systems. We also discuss in more detail the special case of computing the nearest stable matrix pairs. In the last chapter, we focus on discrete-time LTI systems. As in the continuous case, we propose a parametrization that allows one to efficiently compute the nearest stable system (for matrices and matrix pairs), and hence the distance to stability. We show how this idea can be used in data-driven system identification, that is, given a set of input-output pairs, identify the system $A$.
COMPUTERS
arxiv.org

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems

Over recent years, reinforcement learning (RL) has shown impressive performance in finding strategic solutions for game environments, and has recently started to show promising results in solving combinatorial optimization (CO) problems, in particular when coupled with curriculum learning to facilitate training. Despite emerging empirical evidence, theoretical study on why RL helps is still at an early stage. This paper presents the first systematic study on policy optimization methods for solving CO problems. We show that CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG) for solving LMDPs. Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem. For a canonical combinatorial problem, the Secretary Problem, we formally prove that distribution shift is reduced exponentially with curriculum learning. Our theory also shows we can simplify the curriculum learning scheme used in prior work from multi-step to single-step. Lastly, we provide extensive experiments on the Secretary Problem and Online Knapsack to empirically verify our findings.
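For readers unfamiliar with the Secretary Problem mentioned in this abstract, the following is a minimal sketch of the classic threshold rule (observe roughly the first n/e candidates, then hire the first one who beats them all). It is a textbook baseline, not the policy optimization method studied in the paper; the function names and the choice of n = 50 are illustrative assumptions.

```python
import math
import random


def threshold_policy_picks_best(scores, cutoff):
    """Return True if the threshold rule hires the overall best candidate.

    The rule rejects the first `cutoff` candidates, then hires the first
    later candidate whose score beats everyone seen so far.
    """
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > best_seen:
            return score == max(scores)
    # Nobody beat the observation phase, so the best candidate was rejected.
    return False


def success_rate(n, cutoff, trials, seed=0):
    """Estimate how often the rule hires the best of n random candidates."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        wins += threshold_policy_picks_best(scores, cutoff)
    return wins / trials


n = 50                      # illustrative problem size (assumption)
cutoff = round(n / math.e)  # classic choice of observation-phase length
rate = success_rate(n, cutoff, trials=20000)
print(f"cutoff={cutoff}, empirical success rate={rate:.3f}")
```

The empirical rate should land near 1/e, which is the known asymptotic success probability of this rule.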
EDUCATION
arxiv.org

Bubble identification from images with machine learning methods

An automated and reliable processing of bubbly flow images is highly needed to analyse large data sets of comprehensive experimental series. A particular difficulty arises due to overlapping bubble projections in recorded images, which highly complicates the identification of individual bubbles. Recent approaches focus on the use of deep learning algorithms for this task and have already proven the high potential of such techniques. The main difficulties are the capability to handle different image conditions, higher gas volume fractions and a proper reconstruction of the hidden segment of a partly occluded bubble. In the present work, we try to tackle these points by testing three different methods based on Convolutional Neural Networks (CNNs) for the two former and two individual approaches that can be used subsequently to address the latter. To validate our methodology, we created test data sets with synthetic images that further demonstrate the capabilities as well as limitations of our combined approach. The generated data, code and trained models are made accessible to facilitate the use as well as further developments in the research field of bubble recognition in experimental images.
SOFTWARE
arxiv.org

Approximative Algorithms for Multi-Marginal Optimal Transport and Free-Support Wasserstein Barycenters

Computationally solving multi-marginal optimal transport (MOT) with squared Euclidean costs for $N$ discrete probability measures has recently attracted considerable attention, in part because of the correspondence of its solutions with Wasserstein-$2$ barycenters, which have many applications in data science. In general, this problem is NP-hard, calling for practical approximative algorithms. While entropic regularization has been successfully applied to approximate Wasserstein barycenters, this loses the sparsity of the optimal solution, making it difficult to solve the MOT problem directly in practice because of the curse of dimensionality. Thus, for obtaining barycenters, one usually resorts to fixed-support restrictions to a grid, which is, however, prohibitive in higher ambient dimensions $d$. In this paper, after analyzing the relationship between MOT and barycenters, we present two algorithms to approximate the solution of MOT directly, requiring mainly just $N-1$ standard two-marginal OT computations. Thus, they are fast, memory-efficient and easy to implement and can be used with any sparse OT solver as a black box. Moreover, they produce sparse solutions and show promising numerical results. We analyze these algorithms theoretically, proving upper and lower bounds for the relative approximation error.
SCIENCE
arxiv.org

Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits

Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data. However, there are fundamental limits to using existing log data alone, since the counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated. To overcome this limitation, we explore the question of how to design data-gathering policies that most effectively augment an existing dataset of bandit feedback with additional observations for both learning and evaluation. To this effect, this paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem. We explore multiple approaches to computing MVAL policies efficiently, and find that they can be substantially more effective in decreasing the variance of an estimator than naïve approaches.
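To illustrate why counterfactual estimates can have large variance when the logging and target policies differ, here is a minimal sketch of inverse propensity scoring (IPS), one commonly used estimator. It is not the MVAL method from the paper; the log format, the toy data, and the function names are assumptions made for this example.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    Each log entry is (context, action, reward, logging_propensity), where
    logging_propensity is the probability with which the logging policy
    chose that action. target_policy(context, action) returns the
    probability that the target policy would choose the same action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


def always_action_one(context, action):
    """Hypothetical deterministic target policy: always play action 1."""
    return 1.0 if action == 1 else 0.0


# Logging policy close to uniform: importance weights stay small.
uniform_logs = [
    ("u1", 0, 0.0, 0.5),
    ("u1", 1, 1.0, 0.5),
    ("u2", 1, 1.0, 0.5),
    ("u2", 0, 0.0, 0.5),
]

# Logging policy that almost never plays action 1: one log entry gets a
# weight of about 100 and dominates the whole estimate.
skewed_logs = [("u1", 0, 0.0, 0.99)] * 3 + [("u2", 1, 1.0, 0.01)]

print(ips_estimate(uniform_logs, always_action_one))
print(ips_estimate(skewed_logs, always_action_one))
```

In the skewed case the estimate far exceeds the largest possible reward of 1, which is the kind of instability that motivates gathering additional, well-chosen log data.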
COMPUTERS
arxiv.org

Metric Learning-enhanced Optimal Transport for Biochemical Regression Domain Adaptation

Generalizing knowledge beyond source domains is a crucial prerequisite for many biomedical applications such as drug design and molecular property prediction. To meet this challenge, researchers have used optimal transport (OT) to perform representation alignment between the source and target domains. Yet existing OT algorithms are mainly designed for classification tasks. Accordingly, we consider regression tasks in the unsupervised and semi-supervised settings in this paper. To exploit continuous labels, we propose novel metrics to measure domain distances and introduce a posterior variance regularizer on the transport plan. Further, while computationally appealing, OT suffers from ambiguous decision boundaries and biased local data distributions brought by the mini-batch training. To address those issues, we propose to couple OT with metric learning to yield more robust boundaries and reduce bias. Specifically, we present a dynamic hierarchical triplet loss to describe the global data distribution, where the cluster centroids are progressively adjusted among consecutive iterations. We evaluate our method on both unsupervised and semi-supervised learning tasks in biochemistry. Experiments show the proposed method significantly outperforms state-of-the-art baselines across various benchmark datasets of small molecules and material crystals.
CHEMISTRY
arxiv.org

On the relations of stochastic convex optimization problems with empirical risk minimization problems on $p$-norm balls

In this paper, we consider convex stochastic optimization problems arising in machine learning applications (e.g., risk minimization) and in applications of mathematical statistics (e.g., maximum likelihood estimation). There are two main approaches to solving such problems, namely the Stochastic Approximation approach (online approach) and the Sample Average Approximation approach, also known as the Monte Carlo approach (offline approach). One of the advantages of the Monte Carlo approach is the possibility to solve the problem in a distributed decentralized setup, which is especially valuable for large-scale problems. In the online approach, the problem can be solved by stochastic gradient descent-based methods, whereas in the offline approach, the problem is replaced by its empirical counterpart (the empirical risk minimization problem). The natural question is how to choose the sample size, i.e., how many realizations should be sampled so that a sufficiently accurate solution of the empirical problem is a solution of the original problem with the desired precision. This is one of the main issues in modern machine learning and optimization. In the last decade, many significant advances were made in these areas for solving convex stochastic optimization problems on Euclidean balls (or the whole space). In this work, we build on these advances and study the case of arbitrary balls in the $\ell_p$-norms. We also explore the question of how the parameter $p$ affects the estimates of the required number of terms as a function of empirical risk.
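The online and offline approaches contrasted in this abstract can be sketched on a toy problem: minimizing the expected value of 0.5 * (x - xi)^2 over a scalar x. The loss, the 1/k step size, and the function names are assumptions chosen for illustration and do not come from the paper.

```python
import random


def saa_solution(samples):
    """Offline (Sample Average Approximation): minimize the empirical risk.

    For the loss 0.5 * (x - xi)^2 the empirical minimizer is the sample mean.
    """
    return sum(samples) / len(samples)


def sa_solution(samples, x0=0.0):
    """Online (Stochastic Approximation): one SGD pass with step size 1/k."""
    x = x0
    for k, xi in enumerate(samples, start=1):
        grad = x - xi  # gradient of 0.5 * (x - xi)^2 with respect to x
        x -= grad / k
    return x


rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(1000)]
print(saa_solution(samples), sa_solution(samples))
```

For this particular loss and step size the two answers coincide, because the SGD iterate is exactly the running mean. That coincidence is special to the quadratic toy problem; in general the two approaches differ, which is why the required sample size is a question in its own right.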
COMPUTERS
arxiv.org

Conformal prediction for the design problem

In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next. For example, in the protein design problem, we have a regression model that predicts some real-valued property of a protein sequence, which we use to propose new sequences believed to exhibit higher property values than observed in the training data. Since validating designed sequences in the wet lab is typically costly, it is important to know how much we can trust the model's predictions. In such settings, however, there is a distinct type of distribution shift between the training and test data: one where the training and test data are statistically dependent, as the latter is chosen based on the former. Consequently, the model's error on the test data -- that is, the designed sequences -- has some non-trivial relationship with its error on the training data. Herein, we introduce a method to quantify predictive uncertainty in such settings. We do so by constructing confidence sets for predictions that account for the dependence between the training and test data. The confidence sets we construct have finite-sample guarantees that hold for any prediction algorithm, even when a trained model chooses the test-time input distribution. As a motivating use case, we demonstrate how our method quantifies uncertainty for the predicted fitness of designed proteins using several real data sets.
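As background for this abstract, the following is a minimal sketch of standard split conformal prediction for regression. It assumes the calibration and test points are exchangeable, which is exactly the assumption the abstract says fails in the design setting, so it shows the baseline construction rather than the paper's method. The residual values and function name are made up for illustration.

```python
import math


def split_conformal_interval(calibration_residuals, prediction, alpha):
    """Prediction interval with at least 1 - alpha marginal coverage.

    calibration_residuals are absolute errors |y - y_hat| on held-out
    calibration data. The guarantee assumes the calibration and test points
    are exchangeable.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial
        # interval carries the guarantee.
        return (float("-inf"), float("inf"))
    q = sorted(calibration_residuals)[rank - 1]
    return (prediction - q, prediction + q)


# Hypothetical absolute residuals from a held-out calibration set.
residuals = [0.1, 0.4, 0.2, 0.3, 0.5, 0.25, 0.15, 0.35, 0.45]
print(split_conformal_interval(residuals, prediction=2.0, alpha=0.2))
```

When the test input is chosen by the model itself, as in protein design, the exchangeability assumption behind this interval no longer holds, which is the gap the paper addresses.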
SCIENCE
arxiv.org

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

PAC-Bayesian is an analysis framework where the training error can be expressed as the weighted average of the hypotheses in the posterior distribution whilst incorporating the prior knowledge. In addition to being a pure generalization bound analysis tool, a PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, making it a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we refer to training probabilistic neural networks with objectives derived from PAC-Bayesian bounds as {\it PAC-Bayesian learning}. Despite its empirical success, the theoretical analysis of PAC-Bayesian learning for neural networks is rarely explored. This paper proposes a new class of convergence and generalization analysis for PAC-Bayes learning when it is used to train over-parameterized neural networks by the gradient descent method. For a wide probabilistic neural network, we show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as its kernel. Based on this finding, we further characterize the uniform PAC-Bayesian generalization bound, which improves over the Rademacher complexity-based bound for non-probabilistic neural networks. Finally, drawing insight from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is proven to be time-saving.
CODING & PROGRAMMING
arxiv.org

Evaluation Methods and Measures for Causal Learning Algorithms

The convenient access to copious multi-faceted data has encouraged machine learning researchers to reconsider correlation-based learning and embrace the opportunity of causality-based learning, i.e., causal machine learning (causal learning). Recent years have therefore witnessed great effort in developing causal learning algorithms aiming to help AI achieve human-level intelligence. Due to the lack of ground-truth data, one of the biggest challenges in current causal learning research is algorithm evaluation. This largely impedes the cross-pollination of AI and causal inference, and hinders the two fields from benefiting from each other's advances. To bridge from conventional causal inference (i.e., based on statistical methods) to causal learning with big data (i.e., the intersection of causal inference and machine learning), in this survey, we review commonly-used datasets, evaluation methods, and measures for causal learning using an evaluation pipeline similar to conventional machine learning. We focus on the two fundamental causal-inference tasks and causality-aware machine learning tasks. Limitations of current evaluation procedures are also discussed. We then examine popular causal inference tools/packages and conclude with primary challenges and opportunities for benchmarking causal learning algorithms in the era of big data. The survey seeks to bring to the forefront the urgency of developing publicly available benchmarks and consensus-building standards for causal learning evaluation with observational data. In doing so, we hope to broaden the discussions and facilitate collaboration to advance the innovation and application of causal learning.
COMPUTERS
arxiv.org

A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization

We propose a projection-free conditional gradient-type algorithm for smooth stochastic multi-level composition optimization, where the objective function is a nested composition of $T$ functions and the constraint set is a closed convex set. Our algorithm assumes access to noisy evaluations of the functions and their gradients, through a stochastic first-order oracle satisfying certain standard unbiasedness and second moment assumptions. We show that the number of calls to the stochastic first-order oracle and the linear-minimization oracle required by the proposed algorithm, to obtain an $\epsilon$-stationary solution, are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$ respectively, where $\mathcal{O}_T$ hides constants in $T$. Notably, the dependences of these complexity bounds on $\epsilon$ and $T$ are separate in the sense that changing one does not impact the dependence of the bounds on the other. Moreover, our algorithm is parameter-free and does not require any (increasing) order of mini-batches to converge, unlike the common practice in the analysis of stochastic conditional gradient-type algorithms.
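To show what "projection-free" and "linear-minimization oracle" mean here, the following is a minimal sketch of the plain deterministic conditional gradient (Frank-Wolfe) method on the probability simplex. It is single-level and uses exact gradients, so it is not the stochastic multi-level algorithm of the paper; the quadratic objective, the target vector, and the 2/(k+2) step size are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, dim, iterations):
    """Minimize a smooth convex function over the probability simplex.

    Each step calls a linear-minimization oracle instead of a projection.
    On the simplex, the oracle simply returns the vertex whose gradient
    coordinate is smallest.
    """
    x = [1.0 / dim] * dim
    for k in range(iterations):
        g = grad(x)
        i = min(range(dim), key=lambda j: g[j])  # oracle: best vertex e_i
        step = 2.0 / (k + 2)                     # standard open-loop step
        # Convex combination of x and e_i keeps the iterate feasible.
        x = [(1.0 - step) * xj for xj in x]
        x[i] += step
    return x


# Illustrative objective: 0.5 * ||x - target||^2 with target in the simplex.
target = [0.2, 0.5, 0.3]


def quadratic_grad(x):
    return [xj - tj for xj, tj in zip(x, target)]


solution = frank_wolfe_simplex(quadratic_grad, dim=3, iterations=2000)
print([round(v, 3) for v in solution])
```

Because every iterate is a convex combination of simplex vertices, feasibility is maintained without ever computing a projection, which is the property the abstract's algorithm also relies on.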
MATHEMATICS
Design World Network

New asynchronous motors from NORD feature optimized design for improved

NORD’s newly redesigned 100 frame motors are the first step in updating their asynchronous motor portfolio with an optimized electrical, mechanical, and visual design. The new 3 and 4 hp premium efficient motors offer a simplified assembly, including the elimination of a copper rotor, as the motors produce the same energy efficiency without it. This change results in reduced costs and a lower selling price for the consumer. The outer dimensions and the available mounting options (direct mount, NEMA, and IEC) remain unchanged, making the new motors drop-in compatible with existing systems. While the new motors will be the preferred offering beginning Q1 2022, legacy versions are not being immediately discontinued and will still be available for a period.
TECHNOLOGY

