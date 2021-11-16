
Covariate Shift in High-Dimensional Random Feature Regression

By Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington
 8 days ago

A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain...
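The snippet defines covariate shift: the input distribution moves between training and test while the conditional label distribution stays fixed. The sketch below illustrates that definition with importance-weighted least squares, a standard correction that reweights each training point by p_test(x)/p_train(x). It is not the authors' random-feature analysis. The Gaussian input distributions, the target y = x², the straight-line model, and every function name are hypothetical choices for illustration, and the density ratio is assumed to be known, which is rarely true in practice.

```python
import math
import random


def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y ~ a + b * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = cov / var
    return my - b * mx, b


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def true_f(x):
    # Shared conditional E[y | x] (hypothetical). It is quadratic, so a line is misspecified.
    return x * x


def covariate_shift_demo(n=5000, seed=0):
    rng = random.Random(seed)
    x_train = [rng.gauss(0.0, 1.0) for _ in range(n)]  # p_train = N(0, 1)
    x_test = [rng.gauss(1.0, 1.0) for _ in range(n)]   # p_test  = N(1, 1)
    y_train = [true_f(x) for x in x_train]
    # Importance weights p_test(x) / p_train(x), assumed known in this sketch.
    w = [gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0) for x in x_train]

    def test_mse(a, b):
        return sum((true_f(x) - (a + b * x)) ** 2 for x in x_test) / len(x_test)

    unweighted = test_mse(*weighted_linear_fit(x_train, y_train, [1.0] * n))
    weighted = test_mse(*weighted_linear_fit(x_train, y_train, w))
    return unweighted, weighted
```

In the infinite-sample limit, the unweighted fit tends to the line y = 1 (test MSE 7 under N(1, 1)), while the weighted fit tends to y = 2x (test MSE 2). The finite-sample values returned by `covariate_shift_demo()` fluctuate around these numbers.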

scitechdaily.com

Astrophysicists Reveal Largest-Ever Suite of Universe Simulations – How Gravity Shaped the Distribution of Dark Matter

To understand how the universe formed, astronomers have created AbacusSummit, more than 160 simulations of how gravity may have shaped the distribution of dark matter. Collectively clocking in at nearly 60 trillion particles, a newly released set of cosmological simulations is by far the biggest ever produced. The simulation suite,...
ASTRONOMY
arxiv.org

Variable Selection and Missing Data Imputation in Categorical Genomic Data Analysis by Integrated Ridge Regression and Random Forest

Genomic data arising from a genome-wide association study (GWAS) are often not only large-scale but also incomplete. A specific form of this incompleteness is missing values with a non-ignorable missingness mechanism. The intrinsic complications of genomic data present significant challenges in developing an unbiased and informative procedure for phenotype-genotype association analysis by a statistical variable selection approach. In this paper we develop a coherent procedure of categorical phenotype-genotype association analysis, in the presence of missing values with a non-ignorable missingness mechanism in GWAS data, by integrating the state-of-the-art methods of random forest for variable selection, weighted ridge regression with the EM algorithm for missing data imputation, and linear statistical hypothesis testing for determining the missingness mechanism. Two simulated GWAS are used to validate the performance of the proposed procedure. The procedure is then applied to analyze a real data set from a breast cancer GWAS.
SCIENCE
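The abstract names weighted ridge regression as the imputation engine inside an EM loop. The minimal sketch below shows only that step. It computes β = (XᵀWX + λI)⁻¹XᵀWy for per-sample weights `w`, which could be, for example, hypothetical E-step posterior probabilities, and a penalty `lam`. The helper names are hypothetical, and the EM iteration, random forest, and missingness test are not shown. A small Gaussian-elimination solver keeps the example free of external dependencies.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def weighted_ridge(X, y, w, lam):
    """Return beta = (X^T W X + lam * I)^{-1} X^T W y with W = diag(w)."""
    p = len(X[0])
    A = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y)) for j in range(p)]
    return solve(A, rhs)
```

With `X = [[1, 0], [0, 1], [1, 1]]`, `y = [1, 2, 3]`, and unit weights, `lam = 0` recovers the exact least-squares fit `[1, 2]`. Setting `lam = 1` shrinks it to `[0.875, 1.375]`, and a zero weight removes a row's influence entirely.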
arxiv.org

The Covariance of Squeezed Bispectrum Configurations

We measure the halo bispectrum covariance in a large set of N-body simulations and compare it with theoretical expectations. We find a large correlation among (even mildly) squeezed halo bispectrum configurations. A similarly large correlation can be found between squeezed triangles and the long-wavelength halo power spectrum. This shows that the diagonal Gaussian contribution fails to describe, even approximately, the full covariance in these cases. We compare our numerical estimate with a model that includes, in addition to the Gaussian one, only the non-Gaussian terms that are large for squeezed configurations. We find that accounting for these large terms in the modeling greatly improves the agreement of the full covariance with simulations. We apply these results to a simple Fisher matrix forecast, and find that constraints on primordial non-Gaussianity are degraded by a factor of $\sim 2$ when a non-Gaussian covariance is assumed instead of the diagonal, Gaussian approximation.
SCIENCE
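The abstract's last step, a Fisher forecast whose error grows when off-diagonal covariance is included, can be illustrated in miniature. For one parameter with a Gaussian likelihood and parameter-independent covariance C, the Fisher information is F = (∂μ/∂θ)ᵀ C⁻¹ (∂μ/∂θ), and the forecast 1σ error is F^(-1/2). The sketch below assumes a hypothetical equicorrelated covariance (unit variance, correlation `rho` between every pair of bins) and a derivative vector of ones. It is a toy model, not the bispectrum covariance measured in the paper.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fisher_1param(dmu, cov):
    """Fisher information F = dmu^T C^{-1} dmu for a single parameter."""
    z = solve(cov, dmu)
    return sum(d * zi for d, zi in zip(dmu, z))


def equicorrelated(n, sigma2, rho):
    """Hypothetical covariance: sigma2 on the diagonal, rho * sigma2 off it."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(n)] for i in range(n)]


def error_degradation(n=10, rho=0.3):
    dmu = [1.0] * n  # hypothetical: every bin responds equally to the parameter
    f_diag = fisher_1param(dmu, equicorrelated(n, 1.0, 0.0))
    f_full = fisher_1param(dmu, equicorrelated(n, 1.0, rho))
    # The 1-sigma error is F^{-1/2}, so sigma_full / sigma_diag = sqrt(F_diag / F_full).
    return math.sqrt(f_diag / f_full)
```

For this toy setup the ratio equals sqrt(1 + (n - 1)·rho). With n = 10 and rho = 0.3, the error therefore grows by about 1.92 relative to the diagonal approximation.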
toolfarm.com

Audiomodern Random Generators Bundle

Contains Audiomodern's full suite of creative audio & MIDI plugins. This great-value bundle features all of Audiomodern’s award-winning software, combining technology and artistic sensibility to push boundaries and deliver innovative new tools for musicians, composers, and producers. These ‘random generators’ will provide endless inspiration for your next music projects, whatever the genre!
COMPUTERS
realsport101.com

Siege Y6S4: New Elite 2.0 Customization features coming in High Calibre

ELITE 2.0 - A Customization Upgrade. Rainbow Six Siege's Elite 2.0 Customization features have been common knowledge for a while now, but they're finally going to be added to the game. This Elite 2.0 Customization will allow players to pick and choose their Operator's Headgear, Uniform, Operator Portrait, Card Background...
VIDEO GAMES
arxiv.org

Quantum multicritical point in the two- and three-dimensional random transverse-field Ising model

Quantum multicritical points (QMCPs) emerge at the junction of two or more quantum phase transitions due to the interplay of disparate fluctuations, leading to novel universality classes. While quantum critical points have been well characterized, our understanding of QMCPs is much more limited, even though they might be less elusive to study experimentally than quantum critical points. Here, we characterize the QMCP of an interacting heterogeneous quantum system in two and three dimensions, the ferromagnetic random transverse-field Ising model (RTIM). The QMCP of the RTIM emerges due to both geometric and quantum fluctuations, which are studied here numerically by the strong disorder renormalization group method. The QMCP of the RTIM is found to exhibit ultraslow, activated dynamic scaling, governed by an infinite disorder fixed point. This ensures that the obtained multicritical exponents tend to the exact values at large scales while also being universal (i.e., independent of the form of disorder), providing a solid theoretical basis for future experiments.
PHYSICS
arxiv.org

High-Dimensional Functional Mixed-effect Model for Bilevel Repeated Measurements

The bilevel functional data under consideration have two sources of repeated measurements. The first comes from densely and repeatedly measuring a variable from each subject at a series of regular time/spatial points, which is referred to as functional data. The second comes from repeatedly collecting one functional observation at each of multiple visits. Compared to the well-established single-level functional data analysis approaches, approaches for high-dimensional bilevel functional data are limited. In this article, we propose a high-dimensional functional mixed-effect model (HDFMM) to analyze the association between the bilevel functional response and a large number of scalar predictors. We use B-splines to smooth and estimate the infinite-dimensional functional coefficient and a sandwich smoother to estimate the covariance function, and we integrate the estimation of covariance-related parameters together with all regression parameters into one framework through a fast updating MCMC procedure. We demonstrate that the performance of the HDFMM method is promising under various simulation studies and a real data analysis. As an extension of the well-established linear mixed model, the HDFMM extends the response from repeatedly measured scalars to repeatedly measured functional data/curves, while maintaining the ability to account for the relatedness among samples and control for confounding factors.
COMPUTERS
Nature.com

Machine learning of high dimensional data on a noisy quantum processor

Quantum kernel methods show promise for accelerating data analysis by efficiently learning relationships between input data points that have been encoded into an exponentially large Hilbert space. While this technique has been used successfully in small-scale experiments on synthetic datasets, the practical challenges of scaling to large circuits on noisy hardware have not been thoroughly addressed. Here, we present our findings from experimentally implementing a quantum kernel classifier on real high-dimensional data taken from the domain of cosmology using Google's universal quantum processor, Sycamore. We construct a circuit ansatz that preserves kernel magnitudes that typically otherwise vanish due to an exponentially growing Hilbert space, and implement error mitigation specific to the task of computing quantum kernels on near-term hardware. Our experiment uses 17 qubits to classify uncompressed 67-dimensional data, resulting in classification accuracy on a test set that is comparable to noiseless simulation.
SOFTWARE
arxiv.org

Interquantile Shrinkage in Spatial Quantile Autoregressive Regression models

Spatially dependent data frequently occur in many fields such as spatial econometrics and epidemiology. To deal with the dependence among variables and to estimate quantile-specific covariate effects, spatial quantile autoregressive models (SQAR models) have been introduced. Conventional quantile regression focuses only on fitting models and ignores the examination of multiple conditional quantile functions, which provides a comprehensive view of the relationship between the response and covariates. It is therefore necessary to study the different regression slopes at different quantiles, especially when the quantile coefficients share some common features. However, traditional Wald multiple tests not only increase the computational burden but also inflate the false discovery rate (FDR). In this paper, we transform the estimation and examination problem into a penalization problem, which estimates the parameters at different quantiles and identifies the interquantile commonality at the same time. To avoid the endogeneity caused by the spatial lag variables in SQAR models, we also introduce instrumental variables before estimation and propose two-stage estimation methods based on fused adaptive LASSO and fused adaptive sup-norm penalty approaches. The oracle properties of the proposed estimation methods are established. Numerical investigations demonstrate that the proposed methods achieve higher estimation efficiency than traditional quantile regression.
SCIENCE
arxiv.org

The Three Stages of Learning Dynamics in High-Dimensional Kernel Methods

To understand how deep learning works, it is crucial to understand the training dynamics of neural networks. Several interesting hypotheses about these dynamics have been made based on empirically observed phenomena, but there exists a limited theoretical understanding of when and why such phenomena occur. In this paper, we consider...
CODING & PROGRAMMING
towardsdatascience.com

4 Metrics to Evaluate your Regression Models

Regression problems are among the most common problems solved with Data Science and Machine Learning. When you want to predict a target that can take a (theoretically) infinite number of values, you are dealing with a regression problem; some examples are: predicting the income of some person based on...
SCIENCE
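The snippet is cut off before it names its four metrics. As a sketch, and on the assumption that they are among the most common choices (mean absolute error, mean squared error, root mean squared error, and the coefficient of determination R²), here is a dependency-free implementation. The function name is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired lists of targets and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)  # same units as the target, unlike MSE
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; undefined when all targets are equal (ss_tot == 0).
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

With `y_true = [3, -0.5, 2, 7]` and `y_pred = [2.5, 0.0, 2, 8]`, this gives MAE 0.5, MSE 0.375, RMSE about 0.612, and R² about 0.949. MAE and RMSE share the target's units. RMSE penalizes large errors more heavily. R² is unitless and compares the model against always predicting the mean.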
towardsdatascience.com

Power of dimensionality reduction for Data Story-telling

One of the best ways to use the power of dimensionality reduction is to create a data-story. Oh No! One more article on dimensionality reduction! But wait, this is different. There are many articles that explain dimensionality reduction from a technical point of view. However, I will focus on how you can use the power of dimensionality reduction in innovative ways. You will also better appreciate the awesome power of dimensionality reduction as I will focus on its use instead of the technical details.
SCIENCE
arxiv.org

Covariant 3+1 correspondence of the spatially covariant gravity and the degeneracy conditions

A necessary condition for a generally covariant scalar-tensor theory to be ghostfree is that it contains no extra degrees of freedom in the unitary gauge, in which the Lagrangian corresponds to the spatially covariant gravity. Compared with analysing the scalar-tensor theory directly, it is simpler to map the spatially covariant gravity to the generally covariant scalar-tensor theory using the gauge recovering procedures. To ensure that the resulting scalar-tensor theory is absolutely ghostfree, i.e., whether or not the unitary gauge is accessible, a further covariant degeneracy/constraint analysis is required. We develop a method of covariant 3+1 correspondence, which maps the spatially covariant gravity to the scalar-tensor theory in 3+1 decomposed form without fixing any coordinates. The degeneracy conditions that remove the extra degrees of freedom can then be found easily. As an illustration of this approach, we show how the Horndeski theory is recovered from the spatially covariant gravity. This approach can be used to find more general ghostfree scalar-tensor theories.
SCIENCE
arxiv.org

An Approach of Bayesian Variable Selection for Ultrahigh Dimensional Multivariate Regression

In practice, scientists are often particularly interested in detecting which of the predictors are truly associated with a multivariate response. It is more accurate to model multiple responses as one vector than to model each component separately. This is particularly true for complex traits having multiple correlated components. A Bayesian multivariate variable selection (BMVS) approach is proposed to select important predictors influencing the multivariate response from an ultrahigh-dimensional candidate pool. By applying sample-size-dependent spike and slab priors, the BMVS approach satisfies the strong selection consistency property under certain conditions, which is an advantage of BMVS over other existing Bayesian multivariate regression-based approaches. The proposed approach considers the covariance structure of multiple responses without assuming independence and integrates the estimation of covariance-related parameters together with all regression parameters into one framework through a fast updating MCMC procedure. Simulations demonstrate that the BMVS approach outperforms some other relevant frequentist and Bayesian approaches. The proposed BMVS approach is flexible enough for a wide range of applications, including genome-wide association studies with multiple correlated phenotypes and a large number of genetic variants and/or environmental variables, as demonstrated in the real data analyses section. The computer code and test data of the proposed method are available as an R package.
SCIENCE
FanBolt.Com

Random Name Generators

In need of a random name generator? Looking for ideas for a new username? Or maybe for a character you’re creating or a book you’re writing? Perhaps you’re just looking for a new username or a name for your next roleplaying gathering. Whatever the reason may be, we have you...
TECHNOLOGY
arxiv.org

Three dimensional Doppler tomography

Doppler tomography is a method to compute the emissivity distribution within the co-rotating frames of binary stars from observations of their emission line profiles at multiple orbital phases. A key assumption of the method as it is usually applied is that all gas flow is parallel to the orbital plane of the binary. In this paper I examine the possibility of lifting this assumption to allow for motion parallel to the orbital "$z$" axis of the binary as well. I show that the problem is best considered in Fourier space, and that line profiles directly constrain the 3D Fourier transform of the 3D Doppler image in velocity space, but only over the 2D surface of a double-cone centred upon the origin, and aligned with the axis reciprocal to the $v_z$ velocity axis. Hence the full information needed for the recovery of the 3D emissivity distribution is simply not available. Despite this, an inversion method is presented and tested on a number of simulated images. While artefacts resulting from the missing information do appear, the tests suggest that there could be some value in applying 3D Doppler tomography to data from real systems, although considerable care is needed when doing so.
ASTRONOMY
arxiv.org

An Improved Random Shift Algorithm for Spanners and Low Diameter Decompositions

Spanners have been shown to be a powerful tool in graph algorithms. Many spanner constructions use a certain type of clustering at their core, where each cluster has small diameter and there are relatively few spanner edges between clusters. In this paper, we provide a clustering algorithm that, given $k\geq 2$, can be used to compute a spanner of stretch $2k-1$ and expected size $O(n^{1+1/k})$ in $k$ rounds in the CONGEST model. This improves upon the state of the art (by Elkin and Neiman [TALG'19]) by making the bounds on both running time and stretch independent of the random choices of the algorithm, whereas they only hold with high probability in previous results. Spanners are used in certain synchronizers; thus our improvement directly carries over to such synchronizers. Furthermore, for keeping the \emph{total} number of inter-cluster edges small in low diameter decompositions, our clustering algorithm provides the following guarantees. Given $\beta\in (0,1]$, we compute a low diameter decomposition with diameter bound $O\left(\frac{\log n}{\beta}\right)$ such that each edge $e\in E$ is an inter-cluster edge with probability at most $\beta\cdot w(e)$ in $O\left(\frac{\log n}{\beta}\right)$ rounds in the CONGEST model. Again, this improves upon the state of the art (by Miller, Peng, and Xu [SPAA'13]) by making the bounds on both running time and diameter independent of the random choices of the algorithm, whereas they only hold with high probability in previous results.
CODING & PROGRAMMING
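The abstract benchmarks against the random-shift clustering of Miller, Peng, and Xu [SPAA'13]. Below is a minimal sequential sketch of that classic exponential-shift idea. It is not the improved algorithm of this paper and not a distributed CONGEST implementation. Each vertex v draws a shift δ_v ~ Exp(β), and every vertex u joins the center v that minimizes dist(u, v) - δ_v. The adjacency-dict graph encoding and the function names are assumptions. Because it runs an all-pairs BFS, the sketch is only suitable for small unweighted graphs.

```python
import random
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every reachable vertex."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist


def exponential_shift_clustering(adj, beta, seed=0):
    """Map each vertex to its cluster center via argmin_v dist(u, v) - delta_v."""
    rng = random.Random(seed)
    shift = {v: rng.expovariate(beta) for v in adj}  # delta_v ~ Exp(rate=beta), mean 1/beta
    dist = {v: bfs_distances(adj, v) for v in adj}   # all pairs; small graphs only
    center = {}
    for u in adj:
        # Only centers that can reach u compete; ties broken by vertex label.
        center[u] = min((v for v in adj if u in dist[v]),
                        key=lambda v: (dist[v][u] - shift[v], v))
    return center
```

Each nonempty cluster contains its own center, and every cluster is connected, because a vertex on a shortest path from u to u's center selects the same center. Ties, which could break this, occur with probability zero for continuous shifts. Smaller β gives larger shifts and therefore larger clusters, which matches the O(log n / β) diameter bound quoted above.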
towardsdatascience.com

One-dimensional CNN for human behavior classification

A step-by-step tutorial on ways to incorporate CNNs for time-series data. My first-ever publication on Medium was a deep dive into convolutional neural networks (CNNs). In that post, I went through a step-by-step example of how to use this technique for medical image classification. It’s a powerful tool in the arsenal of any dataphile, and it’s important to recognize that CNNs are not limited to computer vision tasks. Today, I will show you how this technique can be adapted for one-dimensional sequential data.
SCIENCE
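The tutorial's own code is not part of this snippet. As a framework-free sketch of what a 1D convolutional layer computes on a time series, the code below slides each filter along the signal ("valid" padding, stride 1), applies ReLU, and global-max-pools each filter's response into one feature. Like PyTorch's `Conv1d` and Keras's `Conv1D`, it computes a cross-correlation (the kernel is not flipped). The single input channel, the filter values, and the function names are illustrative assumptions. Real sensor data, such as a 3-axis accelerometer, would also sum over input channels.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """'Valid' 1D cross-correlation: the operation deep-learning libraries call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    return [max(0.0, x) for x in xs]


def conv_features(signal, kernels, biases):
    """One conv layer (several filters) + ReLU + global max pooling -> one feature per filter."""
    return [max(relu(conv1d_valid(signal, k, b))) for k, b in zip(kernels, biases)]
```

For the ramp `[1, 2, 3, 4, 5]`, the difference filter `[-1, 0, 1]` responds with `[2, 2, 2]` (a constant slope), and `conv_features` yields one number per filter that a dense classifier layer could consume.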
arxiv.org

High-Dimensional Multi-input Quantum Random Access Codes and Mutually Unbiased Bases

Quantum random access codes (QRACs) provide a basic tool for demonstrating the advantages of quantum resources and protocols, which have found a wide range of applications in quantum information processing tasks. However, the investigation and application of high-dimensional multi-input QRACs are still lacking. Here, we focus on $n$-dit string input QRACs with a $d$-dimensional system ($n^{(d)}\rightarrow1$ QRACs) and present a general method to find the maximum success probability of $n^{(d)}\rightarrow1$ QRACs. In particular, we give the analytical solution for the maximum success probability of $3^{(d)}\rightarrow1$ QRACs under the limitation of mutually unbiased bases (MUBs). Based on the analytical solution, we show the relationship between MUBs and $n^{(d)}\rightarrow1$ QRACs. First, we provide a systematic method of searching for the operational inequivalence of MUBs (OI-MUBs) when the dimension $d$ is a prime power, meaning that the choice of the subset of MUBs will affect the final results of quantum information tasks. Second, we theoretically prove that MUBs are not the optimal measurement bases to obtain the maximum success probability of $n^{(d)}\rightarrow1$ QRACs, contrary to the traditional conjecture regarding the optimal measurement bases. Furthermore, based on high-fidelity high-dimensional quantum states of orbital angular momentum, we experimentally achieve 3-input QRACs up to dimension 11. Finally, for the first time, we experimentally confirm the OI-MUBs when $d=5$. Our results open new avenues for investigating the foundational properties of quantum mechanics and quantum network coding.
COMPUTERS
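The smallest member of the $n^{(d)}\rightarrow1$ family is the textbook $2^{(2)}\rightarrow1$ QRAC. In this code two bits are packed into one qubit and either bit can be read out by measuring in one of two mutually unbiased bases (the X and Z eigenbases). The sketch below computes its average success probability with Bloch-vector arithmetic. A projective measurement along unit axis n gives outcome +1 with probability (1 + r·n)/2. The encoding shown is the standard one, and the function name is hypothetical. The sketch does not implement the paper's general method.

```python
import itertools
import math


def qrac_2_to_1_success():
    """Average success probability of the standard 2 -> 1 qubit QRAC.

    Bits (x0, x1) are encoded in the pure state with Bloch vector
    r = ((-1)**x0, 0, (-1)**x1) / sqrt(2). Bit x0 is decoded along the
    X axis and bit x1 along the Z axis; outcome +1 is read as bit 0.
    """
    axes = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # X for bit 0, Z for bit 1
    total, count = 0.0, 0
    for x0, x1 in itertools.product((0, 1), repeat=2):
        r = ((-1) ** x0 / math.sqrt(2), 0.0, (-1) ** x1 / math.sqrt(2))
        for i, bit in enumerate((x0, x1)):
            dot = sum(a * b for a, b in zip(r, axes[i]))
            p_plus = (1 + dot) / 2  # probability of outcome +1 along this axis
            total += p_plus if bit == 0 else 1 - p_plus
            count += 1
    return total / count
```

The result is (1 + 1/√2)/2 = cos²(π/8) ≈ 0.854, which exceeds the best classical 2 → 1 random access code value of 3/4.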

