Bayesian Optimization for Cascade-type Multi-stage Processes

By Shunya Kusakawa, Shion Takeno, Yu Inatsu, Kentaro Kutsukake, Shogo Iwazaki, Takashi Nakano, Toru Ujihara, Masayuki Karasuyama, Ichiro Takeuchi
Complex processes in science and engineering are often formulated as multi-stage decision-making problems. In this paper, we consider a type of multi-stage decision-making process called a cascade...

Bayesian Approach to Inverse Problems: an Application to NNPDF Closure Testing

We discuss the Bayesian approach to the solution of inverse problems and apply the formalism to analyse the closure tests performed by the NNPDF collaboration. Starting from a comparison with the approach that is currently used for the determination of parton distributions (PDFs) by the NNPDF collaboration, we discuss some analytical results that can be obtained for linear problems and use these results as guidance for the more complicated non-linear problems. We show that, in the case of Gaussian distributions, the posterior probability density of the parametrized PDFs is fully determined by the results of the NNPDF fitting procedure. In the particular case that we consider, the fitting procedure and the Bayesian analysis yield exactly the same result. Building on the insight that we obtain from the analytical results, we introduce new estimators to assess the statistical faithfulness of the fit results in closure tests. These estimators are defined in data space, and can be studied analytically using the Bayesian formalism in a linear model in order to clarify their meaning. Finally we present numerical results from a number of closure tests performed with current NNPDF methodologies. These further tests allow us to validate the NNPDF4.0 methodology and provide a quantitative comparison of the NNPDF4.0 and NNPDF3.1 methodologies. As PDF determinations move into precision territory, the need for a careful validation of the methodology becomes increasingly important: the error bar has become the focal point of contemporary PDF determinations. In this perspective, theoretical assumptions and other sources of error are best formulated and analysed in the Bayesian framework, which provides an ideal language to address the precision and the accuracy of current fits.
Wasserstein convergence in Bayesian deconvolution models

We study the renowned deconvolution problem of recovering a distribution function from independent replicates (signal) additively contaminated with random errors (noise), whose distribution is known. We investigate whether a Bayesian nonparametric approach for modelling the latent distribution of the signal can yield inferences with asymptotic frequentist validity under the $L^1$-Wasserstein metric. When the error density is ordinary smooth, we develop two inversion inequalities relating either the $L^1$ or the $L^1$-Wasserstein distance between two mixture densities (of the observations) to the $L^1$-Wasserstein distance between the corresponding distributions of the signal. This smoothing inequality improves on those in the literature. We apply this general result to a Bayesian approach based on a Dirichlet process mixture of normal distributions as a prior on the mixing distribution (or distribution of the signal), with a Laplace or Linnik noise. In particular we construct an \textit{adaptive} approximation of the density of the observations by the convolution of a Laplace (or Linnik) with a well-chosen mixture of normal densities and show that the posterior concentrates at the minimax rate up to a logarithmic factor. The same prior law is shown to also adapt to the Sobolev regularity level of the mixing density, thus leading to a new Bayesian estimation method, relative to the Wasserstein distance, for distributions with smooth densities.
Multi-Task Neural Processes

Neural processes have recently emerged as a class of powerful neural latent variable models that combine the strengths of neural networks and stochastic processes. As they can encode contextual data in the network's function space, they offer a new way to model task relatedness in multi-task learning. To study their potential, we develop multi-task neural processes, a new variant of neural processes for multi-task learning. In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task. To do so, we derive the function priors in a hierarchical Bayesian inference framework, which enables each task to incorporate the shared knowledge provided by related tasks into its context of the prediction function. Our multi-task neural processes methodologically expand the scope of vanilla neural processes and provide a new way of exploring task relatedness in function spaces for multi-task learning. The proposed multi-task neural processes are capable of learning multiple tasks with limited labeled data and in the presence of domain shift. We perform extensive experimental evaluations on several benchmarks for the multi-task regression and classification tasks. The results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning and superior performance in multi-task classification and brain image segmentation.
Self-Compression in Bayesian Neural Networks

Machine learning models have achieved human-level performance on various tasks. This success comes at a high cost of computation and storage overhead, which makes machine learning algorithms difficult to deploy on edge devices. Typically, one has to partially sacrifice accuracy in favor of an increased performance quantified in terms of reduced memory usage and energy consumption. Current methods compress the networks by reducing the precision of the parameters or by eliminating redundant ones. In this paper, we propose a new insight into network compression through the Bayesian framework. We show that Bayesian neural networks automatically discover redundancy in model parameters, thus enabling self-compression, which is linked to the propagation of uncertainty through the layers of the network. Our experimental results show that the network architecture can be successfully compressed by deleting parameters identified by the network itself while retaining the same level of accuracy.
Bayesian Knockoff Generators for Robust Inference Under Complex Data Structure

Michael J. Martens (1), Anjishnu Banerjee (1), Xinran Qi (2), Yushu Shi (3) ((1) Medical College of Wisconsin, (2) Stanford University, (3) University of Missouri - Columbia). The recent proliferation of medical data, such as genetics and electronic health records (EHR), offers new opportunities to find novel predictors of health outcomes. Presented with a large set of candidate features, interest often lies in selecting the ones most likely to be predictive of an outcome for further study, such that the goal is to control the false discovery rate (FDR) at a specified level. Knockoff filtering is an innovative strategy for FDR-controlled feature selection. However, existing knockoff methods make strong distributional assumptions that hinder their applicability to real-world data. We propose Bayesian models for generating high-quality knockoff copies that utilize available knowledge about the data structure, thus improving the resolution of prognostic features. Applications to two feature sets are considered: those with categorical and/or continuous variables possibly having a population substructure, such as in EHR; and those with microbiome features having a compositional constraint and phylogenetic relatedness. Through simulations and real data applications, these methods are shown to identify important features with good FDR control and power.
Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm. Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence for correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS'20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$.
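As a rough illustration of the OMWU dynamics named in the abstract above, the following minimal sketch implements the commonly stated update, in which each action's weight is scaled by exp(-eta * (2 * current loss - previous loss)), and runs it in self-play on a small two-player game. The function names, the step size `eta`, and the example loss matrices are assumptions made for illustration; the sketch covers only plain OMWU for external regret, not the internal- or swap-regret construction of the paper.

```python
import math


def omwu_step(strategy, loss, prev_loss, eta):
    """One OMWU update: scale each weight by exp(-eta * (2*loss - prev_loss))."""
    weights = [
        p * math.exp(-eta * (2.0 * cur - prev))
        for p, cur, prev in zip(strategy, loss, prev_loss)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def expected_losses(loss_matrix, opponent):
    """Expected loss of each own action against the opponent's mixed strategy."""
    return [
        sum(row[j] * opponent[j] for j in range(len(opponent)))
        for row in loss_matrix
    ]


def self_play(loss_a, loss_b, rounds=200, eta=0.1):
    """Both players run OMWU. loss_a[i][j] is the row player's loss and
    loss_b[j][i] the column player's loss (hypothetical example layout)."""
    x = [1.0 / len(loss_a)] * len(loss_a)
    y = [1.0 / len(loss_b)] * len(loss_b)
    prev_x = [0.0] * len(x)
    prev_y = [0.0] * len(y)
    for _ in range(rounds):
        loss_x = expected_losses(loss_a, y)
        loss_y = expected_losses(loss_b, x)
        x = omwu_step(x, loss_x, prev_x, eta)
        y = omwu_step(y, loss_y, prev_y, eta)
        prev_x, prev_y = loss_x, loss_y
    return x, y


# Illustrative game in which action 1 strictly dominates action 0 for both players.
dominance_game = [[1.0, 3.0], [0.0, 2.0]]
x_final, y_final = self_play(dominance_game, dominance_game)
```

In this toy game the dominated action loses weight every round, so both strategies concentrate on action 1.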
Multi-Objective Optimization for Value-Sensitive and Sustainable Basket Recommendations

Sustainable consumption aims to minimize the environmental and societal impact of the use of services and products. Over-consumption of services and products leads to potential natural resource exhaustion and societal inequalities, as access to goods and services becomes more challenging. In everyday life, a person can simply achieve more sustainable purchases by drastically changing their lifestyle choices and potentially going against their personal values or wishes. Conversely, achieving sustainable consumption while accounting for personal values is a more complex task, as potential trade-offs arise when trying to satisfy environmental and personal goals. This article focuses on value-sensitive design of recommender systems, which enable consumers to improve the sustainability of their purchases while respecting their personal values. Value-sensitive recommendations for sustainable consumption are formalized as a multi-objective optimization problem, where each objective represents different sustainability goals and personal values. Novel and existing multi-objective algorithms calculate solutions to this problem. The solutions are proposed as personalized sustainable basket recommendations to consumers. These recommendations are evaluated on a synthetic dataset, which comprises three established real-world datasets from relevant scientific and organizational reports. The synthetic dataset contains quantitative data on product prices, nutritional values and environmental impact metrics, such as greenhouse gas emissions and water footprint. The recommended baskets are highly similar to consumer purchased baskets and aligned with both sustainability goals and personal values relevant to health, expenditure and taste. Even when consumers would accept only a fraction of recommendations, a considerable reduction of environmental impact is observed.
An Approach of Bayesian Variable Selection for Ultrahigh Dimensional Multivariate Regression

In many applications, scientists are particularly interested in detecting which of the predictors are truly associated with a multivariate response. It is more accurate to model multiple responses as one vector rather than separating each component one by one. This is particularly true for complex traits having multiple correlated components. A Bayesian multivariate variable selection (BMVS) approach is proposed to select important predictors influencing the multivariate response from a candidate pool with an ultrahigh dimension. By applying the sample-size-dependent spike and slab priors, the BMVS approach satisfies the strong selection consistency property under certain conditions, which represents the advantages of BMVS over other existing Bayesian multivariate regression-based approaches. The proposed approach considers the covariance structure of multiple responses without assuming independence and integrates the estimation of covariance-related parameters together with all regression parameters into one framework through a fast updating MCMC procedure. It is demonstrated through simulations that the BMVS approach outperforms some other relevant frequentist and Bayesian approaches. The proposed BMVS approach is flexible enough for a wide range of applications, including genome-wide association studies with multiple correlated phenotypes and a large number of genetic variants and/or environmental variables, as demonstrated in the real data analyses section. The computer code and test data of the proposed method are available as an R package.
Bayesian, frequentist and fiducial intervals for the difference between two binomial proportions

Estimating the difference between two binomial proportions will be investigated, where Bayesian, frequentist and fiducial (BFF) methods will be considered. Three vague priors will be used, the Jeffreys prior, a divergence prior and the probability matching prior. A probability matching prior is a prior distribution under which the posterior probabilities of certain regions coincide with their coverage probabilities. Fiducial inference can be viewed as a procedure that obtains a measure on a parameter space while assuming less than what Bayesian inference does, i.e. no prior. Fisher introduced the idea of fiducial probability and fiducial inference. In some cases the fiducial distribution is equivalent to the Jeffreys posterior. The performance of the Jeffreys prior, divergence prior and the probability matching prior will be compared to a fiducial method and other classical methods of constructing confidence intervals for the difference between two independent binomial parameters. These intervals will be compared and evaluated by looking at their coverage rates and average interval lengths. The probability matching and divergence priors perform better than the Jeffreys prior.
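To make the Jeffreys-prior interval mentioned above concrete, here is a minimal sketch. It relies on the standard fact that the Jeffreys prior for a binomial proportion is Beta(1/2, 1/2), so the posterior after x successes in n trials is Beta(x + 1/2, n - x + 1/2). The Monte Carlo approach, the function name, and the example counts are assumptions for illustration, not the procedure used in the paper.

```python
import random


def jeffreys_diff_interval(x1, n1, x2, n2, level=0.95, draws=20000, seed=0):
    """Equal-tailed Monte Carlo credible interval for p1 - p2 under
    independent Jeffreys priors on the two binomial proportions."""
    rng = random.Random(seed)
    # Jeffreys prior Beta(1/2, 1/2) gives posterior Beta(x + 1/2, n - x + 1/2).
    diffs = sorted(
        rng.betavariate(x1 + 0.5, n1 - x1 + 0.5)
        - rng.betavariate(x2 + 0.5, n2 - x2 + 0.5)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical data: 30 successes out of 50 versus 18 successes out of 50.
lower, upper = jeffreys_diff_interval(x1=30, n1=50, x2=18, n2=50)
print(f"95% interval for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Coverage rates and average lengths of the kind compared in the abstract could be estimated by repeating this over simulated data sets with known proportions.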
Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from an unknown distribution that reflects task similarities. In this work, we provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit. We analyze a natural hierarchical Thompson sampling algorithm (hierTS) that can be applied to any problem in this class. Our regret bounds hold under many instances of such problems, including when the tasks are solved sequentially or in parallel; and capture the structure of the problems, such that the regret decreases with the width of the task prior. Our proofs rely on novel total variance decompositions, which can be applied to other graphical model structures. Finally, our theory is complemented by experiments, which show that the hierarchical structure helps with knowledge sharing among the tasks. This confirms that hierarchical Bayesian bandits are a universal and statistically-efficient tool for learning to act with similar bandit tasks.
A Chinese Multi-type Complex Questions Answering Dataset over Wikidata

Jianyun Zou, Min Yang, Lichao Zhang, Yechen Xu, Qifan Pan, Fengqing Jiang, Ran Qin, Shushu Wang, Yifan He, Songfang Huang, Zhou Zhao. Complex Knowledge Base Question Answering has been a popular area of research over the past decade. Recent public datasets have led to encouraging results in this field, but are mostly limited to English and only involve a small number of question types and relations, hindering research in more realistic settings and in languages other than English. In addition, few state-of-the-art KBQA models are trained on Wikidata, one of the most popular real-world knowledge bases. We propose CLC-QuAD, the first large-scale complex Chinese semantic parsing dataset over Wikidata to address these challenges. Together with the dataset, we present a text-to-SPARQL baseline model, which can effectively answer multi-type complex questions, such as factual questions, dual intent questions, boolean questions, and counting questions, with Wikidata as the background knowledge. We finally analyze the performance of SOTA KBQA models on this dataset and identify the challenges facing Chinese KBQA.
Non-parametric Bayesian Vector Autoregression using Multi-subject Data

There has been a rich development of vector autoregressive (VAR) models for modeling temporally correlated multivariate outcomes. However, the existing VAR literature has largely focused on single subject parametric analysis, with some recent extensions to multi-subject modeling with known subgroups. Motivated by the need for flexible Bayesian methods that can pool information across heterogeneous samples in an unsupervised manner, we develop a novel class of non-parametric Bayesian VAR models based on heterogeneous multi-subject data. In particular, we propose a product of Dirichlet process mixture priors that enables separate clustering at multiple scales, which results in partially overlapping clusters that provide greater flexibility. We develop several variants of the method to cater to varying levels of heterogeneity. We implement an efficient posterior computation scheme and illustrate posterior consistency properties under reasonable assumptions on the true density. Extensive numerical studies show distinct advantages over competing methods in terms of estimating model parameters and identifying the true clustering and sparsity structures. Our analysis of resting state fMRI data from the Human Connectome Project reveals biologically interpretable differences between distinct fluid intelligence groups, and reproducible parameter estimates. In contrast, single-subject VAR analyses followed by permutation testing result in negligible differences, which is biologically implausible.
Bayesian modelling and computation utilising cycles in multiple network data

Modelling multiple network data is crucial for addressing a wide range of applied research questions. However, there are many challenges, both theoretical and computational, to address. Network cycles are often of particular interest in many applications, such as ecological studies, and an unexplored area has been how to incorporate networks' cycles within the inferential framework in an explicit way. The recently developed Spherical Network Family of models (SNF) offers a flexible formulation for modelling multiple network data that permits any type of metric. This has opened up the possibility to formulate network models that focus on network properties hitherto not possible or practical to consider. In this article we propose a novel network distance metric that measures similarities between networks with respect to their cycles, and incorporate this within the SNF model to allow inferences that explicitly capture information on cycles. These network motifs are of particular interest in ecological studies. We further propose a novel computational framework to allow posterior inferences from the intractable SNF model for moderate sized networks. Lastly, we apply the resulting methodology to a set of ecological network data studying aggressive interactions between species of fish. We show our model is able to make cogent inferences concerning the cycle behaviour amongst the species, and beyond those possible from a model that does not consider this network motif.
Locally Learned Synaptic Dropout for Complete Bayesian Inference

The Bayesian brain hypothesis postulates that the brain accurately operates on statistical distributions according to Bayes' theorem. The random failure of presynaptic vesicles to release neurotransmitters may allow the brain to sample from posterior distributions of network parameters, interpreted as epistemic uncertainty. It has not been shown previously how random failures might allow networks to sample from observed distributions, also known as aleatoric or residual uncertainty. Sampling from both distributions enables probabilistic inference, efficient search, and creative or generative problem solving. We demonstrate that under a population-code based interpretation of neural activity, both types of distribution can be represented and sampled with synaptic failure alone. We first define a biologically constrained neural network and sampling scheme based on synaptic failure and lateral inhibition. Within this framework, we derive drop-out based epistemic uncertainty, then prove an analytic mapping from synaptic efficacy to release probability that allows networks to sample from arbitrary, learned distributions represented by a receiving layer. Second, our result leads to a local learning rule by which synapses adapt their release probabilities. Our result demonstrates complete Bayesian inference, related to the variational learning method of dropout, in a biologically constrained network using only locally-learned synaptic failure rates.
Distributed Optimal Output Consensus of Uncertain Nonlinear Multi-Agent Systems over Unbalanced Directed Networks via Output Feedback

In this note, a novel observer-based output feedback control approach is proposed to address the distributed optimal output consensus problem of uncertain nonlinear multi-agent systems in the normal form over unbalanced directed graphs. The main challenges of the concerned problem lie in unbalanced directed graphs and nonlinearities of multi-agent systems with their agent states not available for feedback control. Based on a two-layer controller structure, a distributed optimal coordinator is first designed to convert the considered problem into a reference-tracking problem. Then a decentralized output feedback controller is developed to stabilize the resulting augmented system. A high-gain observer is exploited in controller design to estimate the agent states in the presence of uncertainties and disturbances so that the proposed controller relies only on agent outputs. The semi-global convergence of the agent outputs toward the optimal solution that minimizes the sum of all local cost functions is proved under standard assumptions. A key feature of the obtained results is that the nonlinear agents under consideration are only required to be locally Lipschitz and possess globally asymptotically stable and locally exponentially stable zero dynamics.
Bayesian Solar Wind Modeling with Pulsar Timing Arrays

Using Bayesian analyses we study the solar electron density with the NANOGrav 11-year pulsar timing array (PTA) dataset. Our model of the solar wind is incorporated into a global fit starting from pulse times-of-arrival. We introduce new tools developed for this global fit, including analytic expressions for solar electron column densities and open source models for the solar wind that port into existing PTA software. We perform an ab initio recovery of various solar wind model parameters. We then demonstrate the richness of information about the solar electron density, $n_E$, that can be gleaned from PTA data, including higher order corrections to the simple $1/r^2$ model associated with a free-streaming wind (which are informative probes of coronal acceleration physics), quarterly binned measurements of $n_E$ and a continuous time-varying model for $n_E$ spanning approximately one solar cycle period. Finally, we discuss the importance of our model for chromatic noise mitigation in gravitational-wave analyses of pulsar timing data and the potential of developing synergies between sophisticated PTA solar electron density models and those developed by the solar physics community.
On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

With the growing amount of data, data processing workloads and the management of their resource usage become increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, performance data to train such methods is often lacking and is costly to collect.
Robustness of Bayesian Neural Networks to White-Box Adversarial Attacks

Bayesian Neural Networks (BNNs), unlike Traditional Neural Networks (TNNs), are robust and adept at handling adversarial attacks by incorporating randomness. This randomness improves the estimation of uncertainty, a feature lacking in TNNs. Thus, we investigate the robustness of BNNs to white-box attacks using multiple Bayesian neural architectures. Furthermore, we create our BNN model, called BNN-DenseNet, by fusing Bayesian inference (i.e., variational Bayes) to the DenseNet architecture, and BDAV, by combining this intervention with adversarial training. Experiments are conducted on the CIFAR-10 and FGVC-Aircraft datasets. We attack our models with strong white-box attacks ($l_\infty$-FGSM, $l_\infty$-PGD, $l_2$-PGD, EOT $l_\infty$-FGSM, and EOT $l_\infty$-PGD). In all experiments, at least one BNN outperforms traditional neural networks during adversarial attack scenarios. An adversarially-trained BNN outperforms its non-Bayesian, adversarially-trained counterpart in most experiments, and often by significant margins. Lastly, we investigate network calibration and find that BNNs do not make overconfident predictions, providing evidence that BNNs are also better at measuring uncertainty.
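For readers unfamiliar with the $l_\infty$-FGSM attack listed above, the sketch below shows its one-step rule: every input coordinate is shifted by epsilon in the direction of the sign of the loss gradient with respect to the input. To stay self-contained it attacks a tiny deterministic logistic model rather than a BNN; the model, weights, and input are hypothetical and are not taken from the paper.

```python
import math


def logistic_loss_and_input_grad(w, b, x, y):
    """Cross-entropy loss of a logistic model and its gradient with respect to x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    # d(loss)/dx_i = (p - y) * w_i for the logistic cross-entropy loss.
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x


def fgsm_linf(x, grad_x, epsilon):
    """One FGSM step: move each coordinate by epsilon along the gradient sign."""
    return [
        xi + epsilon * ((gi > 0) - (gi < 0))
        for xi, gi in zip(x, grad_x)
    ]


# Hypothetical model and input, used only to exercise the attack.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1

clean_loss, grad = logistic_loss_and_input_grad(w, b, x, y)
x_adv = fgsm_linf(x, grad, epsilon=0.1)
adv_loss, _ = logistic_loss_and_input_grad(w, b, x_adv, y)
print(f"clean loss {clean_loss:.3f}, adversarial loss {adv_loss:.3f}")
```

The perturbation stays within an $l_\infty$ ball of radius epsilon around the original input while increasing the loss of this model.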
The "Bayesian" brain, with a bit less Bayes

The idea that the brain is a probabilistic (Bayesian) inference machine, continuously trying to figure out the hidden causes of its inputs, has become very influential in cognitive (neuro)science over recent decades. Here I present a relatively straightforward generalization of this idea: the primary computational task that the brain is faced with is to track the probabilistic structure of observations themselves, without recourse to hidden states. Taking this starting point seriously turns out to have considerable explanatory power, and several key ideas are developed from it: (1) past experience, encoded in prior expectations, has an influence over the future that is analogous to regularization as known from machine learning; (2) action generation (interpreted as constraint satisfaction) is a special case of such regularization; (3) the concept of attractors in dynamical systems provides a useful lens through which prior expectations, regularization, and action induction can be viewed; these thus appear as different perspectives on the same phenomenon; (4) the phylogenetically ancient imperative of acting to ensure and thereby observe conditions beneficial for survival is likely the same as that which underlies perceptual inference. The Bayesian brain hypothesis has been touted as promising to deliver a "unified science of mind and action". In this paper, I sketch an informal step towards fulfilling that promise, while avoiding some pitfalls that other such attempts have fallen prey to.
