Science

Fast characterization of inducible regions of atrial fibrillation models with multi-fidelity Gaussian process classification

By Lia Gander, Simone Pezzuto, Ali Gharaviri, Rolf Krause, Paris Perdikaris, Francisco Sahli Costabal
arxiv.org
 4 days ago

Computational models of atrial fibrillation have successfully been used to predict optimal ablation sites. A critical step to assess the effect of an ablation pattern is to pace the model from different, potentially random, locations to determine whether...

Related
arxiv.org

Extrapolation Frameworks in Cognitive Psychology Suitable for Study of Image Classification Models

We study the functional task of deep learning image classification models and show that image classification requires extrapolation capabilities. This suggests that new theories have to be developed for the understanding of deep learning as the current theory assumes models are solely interpolating, leaving many questions about them unanswered. We investigate the pixel space and also the feature spaces extracted from images by trained models (in their hidden layers, including the 64-dimensional feature space in the last hidden layer of pre-trained residual neural networks), and also the feature space extracted by wavelets/shearlets. In all these domains, testing samples considerably fall outside the convex hull of training sets, and image classification requires extrapolation. In contrast to the deep learning literature, in cognitive science, psychology, and neuroscience, extrapolation and learning are often studied in tandem. Moreover, many aspects of human visual cognition and behavior are reported to involve extrapolation. We propose a novel extrapolation framework for the mathematical study of deep learning models. In our framework, we use the term extrapolation in this specific way of extrapolating outside the convex hull of training set (in the pixel space or feature space) but within the specific scope defined by the training data, the same way extrapolation is defined in many studies in cognitive science. We explain that our extrapolation framework can provide novel answers to open research problems about deep learning including their over-parameterization, their training regime, out-of-distribution detection, etc. We also see that the extent of extrapolation is negligible in learning tasks where deep learning is reported to have no advantage over simple models.
MENTAL HEALTH
arxiv.org

DriPP: Driven Point Processes to Model Stimuli Induced Patterns in M/EEG Signals

The quantitative analysis of non-invasive electrophysiology signals from electroencephalography (EEG) and magnetoencephalography (MEG) boils down to the identification of temporal patterns such as evoked responses and transient bursts of neural oscillations, but also blinks or heartbeats for data cleaning. Several works have shown that these patterns can be extracted efficiently in an unsupervised way, e.g., using Convolutional Dictionary Learning. This leads to an event-based description of the data. Given these events, a natural question is how their occurrences are modulated by certain cognitive tasks and experimental manipulations. To address it, we propose a point process approach. While point processes have been used in neuroscience in the past, in particular for single cell recordings (spike trains), techniques such as Convolutional Dictionary Learning make them amenable to human studies based on EEG/MEG signals. We develop a novel statistical point process model, called driven temporal point processes (DriPP), where the intensity function of the point process model is linked to a set of point processes corresponding to stimulation events. We derive a fast and principled expectation-maximization (EM) algorithm to estimate the parameters of this model. Simulations reveal that model parameters can be identified from long enough signals. Results on standard MEG datasets demonstrate that our methodology reveals event-related neural responses, both evoked and induced, and isolates non-task specific temporal patterns.
SCIENCE
arxiv.org

Joint Posterior Inference for Latent Gaussian Models with R-INLA

Efficient Bayesian inference remains a computational challenge in hierarchical models. Simulation-based approaches such as Markov Chain Monte Carlo methods are still popular but have a large computational cost. When dealing with the large class of Latent Gaussian Models, the INLA methodology embedded in the R-INLA software provides accurate Bayesian inference by computing a deterministic mixture representation to approximate the joint posterior, from which marginals are computed. The INLA approach has from the beginning targeted the approximation of univariate posteriors. In this paper we lay the foundation for tools that also provide joint approximations for subsets of the latent field. These approximations inherit a Gaussian copula structure and additionally provide corrections for skewness. The same idea is also carried forward to sampling from the mixture representation, which we can now adjust for skewness.
CODING & PROGRAMMING
arxiv.org

Conditional Gaussian Nonlinear System: a Fast Preconditioner and a Cheap Surrogate Model For Complex Nonlinear Systems

Developing suitable approximate models for analyzing and simulating complex nonlinear systems is practically important. This paper aims at exploring the skill of a rich class of nonlinear stochastic models, known as the conditional Gaussian nonlinear system (CGNS), as both a cheap surrogate model and a fast preconditioner for facilitating many computationally challenging tasks. The CGNS preserves the underlying physics to a large extent and can reproduce intermittency, extreme events and other non-Gaussian features in many complex systems arising from practical applications. Three interrelated topics are studied. First, the closed analytic formulae of solving the conditional statistics provide an efficient and accurate data assimilation scheme. It is shown that the data assimilation skill of a suitable CGNS approximate forecast model outweighs that by applying an ensemble method even to the perfect model with strong nonlinearity, where the latter suffers from filter divergence. Second, the CGNS allows the development of a fast algorithm for simultaneously estimating the parameters and the unobserved variables with uncertainty quantification in the presence of only partial observations. Utilizing an appropriate CGNS as a preconditioner significantly reduces the computational cost in accurately estimating the parameters in the original complex system. Finally, the CGNS advances rapid and statistically accurate algorithms for computing the probability density function and sampling the trajectories of the unobserved state variables. These fast algorithms facilitate the development of an efficient and accurate data-driven method for predicting the linear response of the original system with respect to parameter perturbations based on a suitable CGNS preconditioner.
COMPUTERS
arxiv.org

Gaussian Process Constraint Learning for Scalable Chance-Constrained Motion Planning from Demonstrations

We propose a method for learning constraints represented as Gaussian processes (GPs) from locally-optimal demonstrations. Our approach uses the Karush-Kuhn-Tucker (KKT) optimality conditions to determine where on the demonstrations the constraint is tight, and a scaling of the constraint gradient at those states. We then train a GP representation of the constraint which is consistent with and which generalizes this information. We further show that the GP uncertainty can be used within a kinodynamic RRT to plan probabilistically-safe trajectories, and that we can exploit the GP structure within the planner to exactly achieve a specified safety probability. We demonstrate our method can learn complex, nonlinear constraints demonstrated on a 5D nonholonomic car, a 12D quadrotor, and a 3-link planar arm, all while requiring minimal prior information on the constraint. Our results suggest the learned GP constraint is accurate, outperforming previous constraint learning methods that require more a priori knowledge.
COMPUTERS
arxiv.org

Numerical methods for Mean field Games based on Gaussian Processes and Fourier Features

In this article, we propose two numerical methods, the Gaussian Process (GP) method and the Fourier Features (FF) algorithm, to solve mean field games (MFGs). The GP algorithm approximates the solution of an MFG with maximum a posteriori probability estimators of GPs conditioned on the partial differential equation (PDE) system of the MFG at a finite number of sample points. The main bottleneck of the GP method is computing the inverse of a square Gram matrix, whose size is proportional to the number of sample points. To improve the performance, we introduce the FF method, whose insight comes from the recent trend of approximating positive definite kernels with random Fourier features. The FF algorithm seeks approximate solutions in the space generated by sampled Fourier features. In the FF method, the size of the matrix to be inverted depends only on the number of Fourier features selected, which is much smaller than the number of sample points. Hence, the FF method reduces the precomputation time, saves memory, and achieves accuracy comparable to the GP method. We give existence and convergence proofs for both algorithms. The convergence argument of the GP method does not depend on the Lasry-Lions monotonicity condition, which suggests potential applications of the GP method to MFGs with non-monotone couplings in future work. We show the efficacy of our algorithms through experiments on a stationary MFG with a non-local coupling and on a time-dependent planning problem. We believe that the FF method can also serve as an alternative algorithm to solve general PDEs.
COMPUTERS
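The mean field games abstract above refers to approximating positive definite kernels with random Fourier features. The sketch below illustrates only that general idea for a Gaussian kernel, using the standard cosine-feature construction; it is not the paper's FF algorithm. The function names, the length scale, and the feature count are illustrative assumptions.

```python
import math
import random


def make_rff(dim, n_features, length_scale=1.0, seed=0):
    """Return a feature map z(x) whose inner products approximate a Gaussian kernel.

    Assumption: frequencies are drawn from N(0, 1/length_scale^2) and phases
    from U[0, 2*pi], the usual construction for the Gaussian kernel.
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]

    return features


def gaussian_kernel(x, y, length_scale=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    features = make_rff(dim=2, n_features=5000)
    x, y = [0.3, -0.2], [0.8, 0.5]
    print("approximate:", dot(features(x), features(y)))
    print("exact:      ", gaussian_kernel(x, y))
```

With such a feature map, a Gram matrix over N sample points can be replaced by an N-by-D feature matrix, so any linear system that has to be solved is D-by-D rather than N-by-N, which matches the size argument the abstract makes.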
arxiv.org

Mesoscale Modelling of the Tolman Length in Multi-component Systems

In this paper we analyze the curvature corrections to the surface tension in the context of the Shan-Chen (SC) multi-component Lattice Boltzmann method (LBM). We demonstrate that the same techniques recently applied in the context of the Shan-Chen multi-phase model can be applied to multi-component mixtures. We implement, as a new application, the calculation of the surface of tension radius $R_s$ through the minimization of the generalized surface tension $\sigma[R]$. In turn we are able to estimate the Tolman length, i.e. the first order coefficient of the curvature expansion of the surface tension $\sigma(R)$, as well as the higher order corrections, i.e. the curvature- and the Gaussian-rigidity coefficients. The SC multi-component model allows modeling both fully symmetric and asymmetric interactions among the components. By performing an extensive set of simulations we present a first example of a tunable Tolman length in the mesoscopic model, being zero for symmetric interactions and different from zero otherwise. This result paves the way for controlling such interface properties, which are paramount in the presence of thermal fluctuations. All reported results can be independently reproduced through the "idea.deploy" framework available at this https URL.
MATHEMATICS
arxiv.org

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

Motivation: A perennial challenge for biomedical researchers and clinical practitioners is to stay abreast of the rapid growth of publications and medical notes. Natural language processing (NLP) has emerged as a promising direction for taming information overload. In particular, large neural language models facilitate transfer learning by pretraining on unlabeled text, as exemplified by the successes of BERT models in various NLP applications. However, fine-tuning such models for an end task remains challenging, especially with small labeled datasets, which are common in biomedical NLP.
SCIENCE
arxiv.org

Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling

Neural autoregressive sequence models smear the probability among many possible sequences including degenerate ones, such as empty or repetitive sequences. In this work, we tackle one specific case where the model assigns a high probability to unreasonably short sequences. We define the oversmoothing rate to quantify this issue. After confirming the high degree of oversmoothing in neural machine translation, we propose to explicitly minimize the oversmoothing rate during training. We conduct a set of experiments to study the effect of the proposed regularization on both model distribution and decoding performance. We use a neural machine translation task as the testbed and consider three different datasets of varying size. Our experiments reveal three major findings. First, we can control the oversmoothing rate of the model by tuning the strength of the regularization. Second, by enhancing the oversmoothing loss contribution, the probability and the rank of the <eos> token decrease heavily at positions where it is not supposed to be. Third, the proposed regularization impacts the outcome of beam search, especially when a large beam is used. The degradation of translation quality (measured in BLEU) with a large beam lessens significantly with a lower oversmoothing rate, but the degradation compared to smaller beam sizes persists. From these observations, we conclude that the high degree of oversmoothing is the main reason behind the degenerate case of overly probable short sequences in a neural autoregressive model.
CODING & PROGRAMMING
arxiv.org

Visual Transformers with Primal Object Queries for Multi-Label Image Classification

Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process the sequential data as a whole; therefore, they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task, introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the regions of interest in an image. However, inputting the same set of object queries to different decoder layers hinders the training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class-wise F1 metric by 2.1% and 1.8%, and speeds up the convergence by 79.0% and 38.6%, on the MS-COCO and NUS-WIDE datasets respectively.
COMPUTERS
arxiv.org

MPLR: a novel model for multi-target learning of logical rules for knowledge graph reasoning

Large-scale knowledge graphs (KGs) provide structured representations of human knowledge. However, as it is impossible to contain all knowledge, KGs are usually incomplete. Reasoning based on existing facts paves a way to discover missing facts. In this paper, we study the problem of learning logic rules for reasoning on knowledge graphs in order to complete missing factual triplets. Learning logic rules equips a model with strong interpretability as well as the ability to generalize to similar tasks. We propose a model called MPLR that improves on existing models by fully using the training data and by considering multi-target scenarios. In addition, considering the deficiency in evaluating the performance of models and the quality of mined rules, we further propose two novel indicators to help with the problem. Experimental results empirically demonstrate that our MPLR model outperforms state-of-the-art methods on five benchmark datasets. The results also support the effectiveness of the indicators.
COMPUTERS
arxiv.org

Cumulants asymptotics for the zeros counting measure of real Gaussian processes

We compute the exact asymptotics for the cumulants of linear statistics associated with the zeros counting measure of a large class of real Gaussian processes. Precisely, we show that if the underlying covariance function is regular and square integrable, the cumulants of order higher than two of these statistics asymptotically vanish. This result implies in particular that the number of zeros of such processes satisfies a central limit theorem. Our method refines the recent approach by T. Letendre and M. Ancona and allows us to prove stronger quantitative asymptotics, under weaker hypotheses on the underlying process. The proof exploits in particular the elegant interplay between the combinatorial structures of cumulants and factorial moments in order to simplify the determination of the asymptotics of nodal observables. The class of processes addressed by our main theorem includes, as motivating examples, random Gaussian trigonometric polynomials, random orthogonal polynomials and the universal Gaussian process with sinc kernel on the real line, for which the asymptotics of higher moments of the number of zeros were so far only conjectured.
MATHEMATICS
arxiv.org

CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain

The field of natural language processing (NLP) has recently seen a large shift towards using pre-trained language models for solving almost any task. Despite showing great improvements on benchmark datasets for various tasks, these models often perform sub-optimally in non-standard domains like the clinical domain, where a large gap between pre-training documents and target documents is observed. In this paper, we aim at closing this gap with domain-specific training of the language model and we investigate its effect on a diverse set of downstream tasks and settings. We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models by a large margin for ten clinical concept extraction tasks from two languages. In addition, we demonstrate how the transformer model can be further improved with our proposed task- and language-agnostic model architecture based on ensembles over random splits and cross-sentence context. Our studies in low-resource and transfer settings reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available. Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains, but also show that our task-agnostic model architecture is robust across the tested tasks and languages so that domain- or task-specific adaptations are not required. The CLIN-X language models and source code for fine-tuning and transferring the model are publicly available at this https URL\_x/ and the huggingface model hub.
SCIENCE
arxiv.org

Thermodynamic and Scaling Limits of the non-Gaussian Membrane Model

We characterize the behavior of a random discrete interface $\phi$ on $[-L,L]^d \cap \mathbb{Z}^d$ with energy $\sum V(\Delta \phi(x))$ as $L \to \infty$, where $\Delta$ is the discrete Laplacian and $V$ is a uniformly convex, symmetric, and smooth potential. The interface $\phi$ is called the non-Gaussian membrane model. By analyzing the Helffer-Sjöstrand representation associated to $\Delta \phi$, we provide a unified approach to continuous scaling limits of the rescaled and interpolated interface in dimensions $d=2,3$, Gaussian approximation in negative regularity spaces for all $d \geq 2$, and the infinite volume limit in $d \geq 5$. Our results generalize some of those of arXiv:1801.05663.
MATHEMATICS
arxiv.org

BayesFlow can reliably detect Model Misspecification and Posterior Errors in Amortized Bayesian Inference

Neural density estimators have proven remarkably powerful in performing efficient simulation-based Bayesian inference in various research domains. In particular, the BayesFlow framework uses a two-step approach to enable amortized parameter estimation in settings where the likelihood function is implicitly defined by a simulation program. But how faithful is such inference when simulations are poor representations of reality? In this paper, we conceptualize the types of model misspecification arising in simulation-based inference and systematically investigate the performance of the BayesFlow framework under these misspecifications. We propose an augmented optimization objective which imposes a probabilistic structure on the latent data space and utilize maximum mean discrepancy (MMD) to detect potentially catastrophic misspecifications during inference undermining the validity of the obtained results. We verify our detection criterion on a number of artificial and realistic misspecifications, ranging from toy conjugate models to complex models of decision making and disease outbreak dynamics applied to real data. Further, we show that posterior inference errors increase as a function of the distance between the true data-generating distribution and the typical set of simulations in the latent summary space. Thus, we demonstrate the dual utility of MMD as a method for detecting model misspecification and as a proxy for verifying the faithfulness of amortized Bayesian inference.
COMPUTERS
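The BayesFlow abstract above relies on maximum mean discrepancy (MMD) as a distance between distributions. The sketch below shows only the generic biased sample estimate of squared MMD with a Gaussian kernel on scalar samples; it is not the BayesFlow implementation, and the function names, the bandwidth, and the toy data are illustrative assumptions.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y), including x == y pairs."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by a sample average.
    """
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


if __name__ == "__main__":
    reference = [i / 10 for i in range(20)]   # hypothetical "simulated" sample
    shifted = [v + 5.0 for v in reference]    # hypothetical "observed" sample
    print("same sample:   ", mmd_squared(reference, reference))
    print("shifted sample:", mmd_squared(reference, shifted))
```

A large value for the shifted sample relative to the matched one is the kind of signal a misspecification check would threshold on; how the threshold is chosen is not specified in the abstract and is left open here.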
arxiv.org

Learning and Analyzing Generation Order for Undirected Sequence Models

Undirected neural sequence models have achieved performance competitive with the state-of-the-art directed sequence models that generate monotonically from left to right in machine translation tasks. In this work, we train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning. We show that the translations decoded by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al. (2019) on the WMT'14 German-English translation task. On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks. We next carefully analyze the learned order patterns via qualitative and quantitative analysis. We show that our policy generally follows an outer-to-inner order, predicting the left-most and right-most positions first, and then moving toward the middle while skipping less important words at the beginning. Furthermore, the policy usually predicts positions for a single syntactic constituent structure in consecutive steps. We believe our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction. Our code is publicly available at this https URL.
CODING & PROGRAMMING
arxiv.org

A molecular generative model with genetic algorithm and tree search for cancer samples

Personalized medicine is expected to maximize the intended drug effects and minimize side effects by treating patients based on their genetic profiles. Thus, it is important to generate drugs based on the genetic profiles of diseases, especially in anticancer drug discovery. However, this is challenging because the vast chemical space and variations in cancer properties make the search for suitable molecules very time-consuming. Therefore, an efficient and fast search method considering genetic profiles is required for de novo molecular design of anticancer drugs. Here, we propose a faster molecular generative model with genetic algorithm and tree search for cancer samples (FasterGTS). FasterGTS is constructed with a genetic algorithm and a Monte Carlo tree search with three deep neural networks: supervised learning, self-trained, and value networks, and it generates anticancer molecules based on the genetic profiles of a cancer sample. When compared to other methods, FasterGTS generated cancer sample-specific molecules with general chemical properties required for cancer drugs within a limited number of samplings. We expect FasterGTS to contribute to anticancer drug generation.
CANCER
arxiv.org

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou. Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. However, ML compilers targeting MCMs need to solve complex optimization problems optimally and efficiently to achieve this high performance. One such problem is the multi-chip partitioning problem where compilers determine the optimal partitioning and placement of operations in tensor computation graphs on chiplets in MCMs. Partitioning ML graphs for MCMs is particularly hard as the search space grows exponentially with the number of chiplets available and the number of nodes in the neural network. Furthermore, the constraints imposed by the underlying hardware produce a search space where valid solutions are extremely sparse. In this paper, we present a strategy using a deep reinforcement learning (RL) framework to emit a possibly invalid candidate partition that is then corrected by a constraint solver. Using the constraint solver ensures that RL encounters valid solutions in the sparse space frequently enough to converge with fewer samples as compared to non-learned strategies. The architectural choices we make for the policy network allow us to generalize across different ML graphs. Our evaluation of a production-scale model, BERT, on real hardware reveals that the partitioning generated using RL policy achieves 6.11% and 5.85% higher throughput than random search and simulated annealing. In addition, fine-tuning the pre-trained RL policy reduces the search time from 3 hours to only 9 minutes, while achieving the same throughput as training RL policy from scratch.
COMPUTERS
arxiv.org

Multi-User Holographic MIMO Surface: Channel Modeling and Spectral Efficiency Analysis

Li Wei, Chongwen Huang, George C. Alexandropoulos, Wei E. I. Sha, Zhaoyang Zhang, Merouane Debbah, Chau Yuen. The multi-user Holographic Multiple-Input and Multiple-Output Surface (MU-HMIMOS) paradigm, which is capable of realizing large continuous apertures with minimal power consumption, has recently been considered as an energy-efficient solution for future wireless networks, offering increased flexibility in impacting electromagnetic wave propagation according to the desired communication, localization, and sensing objectives. The tractable channel modeling of MU-HMIMOS systems is one of the most critical challenges, mainly due to the coupling effect induced by the excessively large number of closely spaced patch antennas. In this paper, we focus on this challenge for downlink multi-user communications and model the electromagnetic channel in the wavenumber domain using the Fourier plane wave representation. Based on the proposed channel model, we devise the maximum-ratio transmission and Zero-Forcing (ZF) precoding schemes capitalizing on the sampled channel variance that depends on the number and spacing of the patch antennas in MU-HMIMOS, and present their analytical spectral efficiency performance. Moreover, we propose a low-complexity ZF precoding scheme leveraging Neumann series expansion to replace the matrix inversion, since it is practically impossible to perform direct matrix inversion when the number of patch antennas is extremely large. Our extensive simulation results showcase the impact of the number of patch antennas and their spacing on the spectral efficiency of the considered systems. It is shown that more patch antennas and larger spacing result in improved performance due to the decreased correlation among the patches.
COMPUTERS
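The MU-HMIMOS abstract above mentions replacing a direct matrix inversion with a Neumann series expansion. The sketch below illustrates the general technique on a small real-valued matrix, using a diagonal splitting A = D(I - E) so that the inverse is approximated by a truncated sum of powers of E times the inverse of D. The splitting, the number of terms, and the function names are assumptions for illustration; the abstract does not state which variant the paper uses, and the series only converges when the spectral radius of E is below one (for example, for strongly diagonally dominant matrices).

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def identity(n):
    """n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms=30):
    """Approximate the inverse of `a` by a truncated Neumann series.

    With D = diag(a) and E = I - D^{-1} a, we have a = D (I - E), hence
    a^{-1} = (sum_{k >= 0} E^k) D^{-1} whenever the series converges.
    """
    n = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    d_inv_a = mat_mul(d_inv, a)
    eye = identity(n)
    e = [[eye[i][j] - d_inv_a[i][j] for j in range(n)] for i in range(n)]

    power = identity(n)   # E^0
    series = identity(n)  # running sum, starts with the k = 0 term
    for _ in range(terms):
        power = mat_mul(power, e)
        series = [[series[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return mat_mul(series, d_inv)


if __name__ == "__main__":
    gram = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical stand-in for a channel Gram matrix
    approx = neumann_inverse(gram)
    print(approx)
    print(mat_mul(gram, approx))  # should be close to the identity
```

Each term costs one matrix product, so truncating after a few terms trades accuracy for a lower cost than a full inversion; the appropriate truncation depth depends on how quickly the powers of E decay.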
arxiv.org

Attention-Based Model and Deep Reinforcement Learning for Distribution of Event Processing Tasks

Event processing is the cornerstone of the dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event processing tasks to be placed at any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resource utilization and guarantee seamless execution. This article investigates the use of deep learning to fairly distribute the tasks. An attention-based neural network model is proposed to generate efficient load balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures, and is trained by an advantage actor-critic reinforcement learning algorithm. The model is designed to scale to the number of event processing tasks and the number of edge devices, with no need for hyperparameter re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics in many key performance indicators. The generic design and the obtained results show that the proposed model can potentially be applied to several other load balancing problem variations, which makes the proposal an attractive option for real-world scenarios due to its scalability and efficiency.
COMPUTERS
