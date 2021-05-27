

On the Baum--Katz theorem for sequences of pairwise independent random variables with regularly varying normalizing constants

By Lê Vǎn Thành
arxiv.org
 22 days ago

This paper proves the Baum--Katz theorem for sequences of pairwise independent, identically distributed random variables with general norming constants under optimal moment conditions. The proof exploits properties of slowly varying functions and their de Bruijn conjugates, and uses techniques developed by Rio (1995) to avoid maximal-type inequalities.
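
For orientation, the classical statement for independent, identically distributed random variables with norming sequence $n$ can be written as below. This is the textbook special case as it is commonly stated, not the paper's theorem: the paper works with pairwise independence and regularly varying norming constants. The symbols $p$, $\varepsilon$ and $S_n$ are notation assumed here, with $p \geq 1$ and $S_n = X_1 + \dots + X_n$.

```latex
\[
  \mathbb{E}|X|^{p} < \infty \ \text{ and } \ \mathbb{E}X = 0
  \quad\Longleftrightarrow\quad
  \sum_{n=1}^{\infty} n^{p-2}\, \mathbb{P}\bigl(|S_n| > \varepsilon n\bigr) < \infty
  \quad \text{for all } \varepsilon > 0 .
\]
```

The case $p = 2$ is the Hsu--Robbins--Erdős theorem on complete convergence.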

Related
Computers | arxiv.org

A direct product theorem for quantum communication complexity with applications to device-independent QKD

We give a direct product theorem for the entanglement-assisted interactive quantum communication complexity of an $l$-player predicate $\mathsf{V}$. In particular we show that for a distribution $p$ that is product across the input sets of the $l$ players, the success probability of any entanglement-assisted quantum communication protocol for computing $n$ copies of $\mathsf{V}$, whose communication is $o(\log(\mathrm{eff}^*(\mathsf{V},p))\cdot n)$, goes down exponentially in $n$. Here $\mathrm{eff}^*(\mathsf{V}, p)$ is a distributional version of the quantum efficiency or partition bound introduced by Laplante, Lerays and Roland (2014), which is a lower bound on the distributional quantum communication complexity of computing a single copy of $\mathsf{V}$ with respect to $p$.
Computers | arxiv.org

Batch Normalization Orthogonalizes Representations in Deep Random Networks

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representation -- after the linear layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe SGD wastes many iterations to orthogonalize representations before the classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
Mathematics | arxiv.org

Near-squares in binary recurrence sequences

We call an integer a \emph{near-square} if it is a prime times a square. In 1993, Mignotte and Pethő proved that, for all integers $a>3$, there are no elements that are squares, two times squares or three times squares in the sequence defined by $u_{0}=0$, $u_{1}=1$ and $u_{n+2}=au_{n+1}-u_{n}$ for $n \geq 0$, once $n>6$.
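
The quoted Mignotte and Pethő statement can be spot-checked numerically for small parameters. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it only tests the multipliers 1, 2 and 3 mentioned above rather than general near-squares.

```python
from math import isqrt


def recurrence_terms(a, count):
    """Return u_0 .. u_{count-1} for u_0 = 0, u_1 = 1, u_{n+2} = a*u_{n+1} - u_n."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(a * terms[-1] - terms[-2])
    return terms[:count]


def is_multiple_of_square(value, k):
    """True if value == k * m**2 for some integer m >= 1."""
    if value <= 0 or value % k != 0:
        return False
    quotient = value // k
    root = isqrt(quotient)
    return root * root == quotient


def exceptional_indices(a, count):
    """Indices n > 6 (below count) where u_n is 1, 2 or 3 times a square."""
    terms = recurrence_terms(a, count)
    return [
        n
        for n in range(7, count)
        if any(is_multiple_of_square(terms[n], k) for k in (1, 2, 3))
    ]


if __name__ == "__main__":
    print(recurrence_terms(4, 8))
    print(exceptional_indices(4, 30))
```

Python integers have arbitrary precision, so the exponentially growing terms are tested exactly; a finite scan like this is only a sanity check and proves nothing about all $n$.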
Astronomy | arxiv.org

The Penrose Property with a Cosmological Constant

A spacetime possesses the Penrose property if the timelike future of any point on $\mathcal{I}^-$ contains the whole of $\mathcal{I}^+$. This property was first defined by Penrose (along with two other equivalent definitions). In this paper we consider the Penrose property in greater generality. In particular, we discuss spacetimes with non-zero cosmological constant. This requires us to reconsider the three equivalent definitions of the property given by Penrose. We find that two of these generalise to sensible notions of the Penrose property which remain equivalent, while the third (the "finite version") does not. We then move on to consider some further example spacetimes (with zero cosmological constant) which highlight some features of the Penrose property which were not discussed previously. We discuss the Ellis-Bronnikov wormhole (an example of a spacetime with more than one asymptotically flat end), the Hayward metric (an example of a non-singular black hole spacetime) and the black string spacetime (which is topologically $\Bbb{R}^{d+1}\times S^1$ so is not asymptotically flat). The Penrose property in each of these spacetimes is discussed.
Astronomy | arxiv.org

A Radial Velocity Search for Binary RR Lyrae Variables

We report 272 radial velocities for 19 RR Lyrae variables. For most of the stars we have radial velocities for the complete pulsation cycle. These data are used to determine robust center--of--mass radial velocities that have been compared to values from the literature in a search for evidence of binary systems. Center--of--mass velocities were determined for each star using Fourier series and template fits to the radial velocities. Our center--of--mass velocities have uncertainties from $\pm 0.16$ km s$^{-1}$ to $\pm 2.5$ km s$^{-1}$, with a mean uncertainty of $\pm 0.92$ km s$^{-1}$. We combined our center--of--mass velocities with values from the literature to look for deviations from the mean center--of--mass velocity of each star. Fifteen RR Lyrae show no evidence of binary motion (BK And, CI And, Z CVn, DM Cyg, BK Dra, RR Gem, XX Hya, SZ Leo, BX Leo, TT Lyn, CN Lyr, TU Per, U Tri, RV UMa, and AV Vir). In most cases this conclusion is reached due to the sporadic sampling of the center--of--mass velocities over time. Three RR Lyrae show suspicious variation in the center--of--mass velocities that may indicate binary motion but do not prove it (SS Leo, ST Leo, and AO Peg). TU UMa was observed by us near a predicted periastron passage (at 0.14 in orbital phase) but the absence of additional center--of--mass velocities near periastron makes the binary detection, based on radial velocities alone, uncertain. Two stars in our sample show $H\gamma$ emission in phases 0.9--1.0: SS Leo and TU UMa.
Mathematics | arxiv.org

Finding normal binary floating-point factors in constant time

Solving the floating-point equation $x \otimes y = z$, where $x$, $y$ and $z$ belong to floating-point intervals, is a common task in automated reasoning for which no efficient algorithm is known in general. We show that it can be solved by computing a constant number of floating-point factors, and give a constant-time algorithm for computing successive normal floating-point factors of normal floating-point numbers in radix 2. This leads to a constant-time procedure for solving the given equation.
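
To make the equation $x \otimes y = z$ concrete, here is a small sketch of what it means for a double $x$ to have a cofactor $y$ whose rounded product is exactly $z$. It is not the paper's constant-time algorithm: it is a naive heuristic that only tries the rounded quotient $z/x$ and its two neighbours, so a `None` result does not prove that no cofactor exists. The function name `find_cofactor` is hypothetical, and `math.nextafter` is assumed to be available (Python 3.9 or later).

```python
import math


def find_cofactor(x, z):
    """Look for a double y with x * y == z under round-to-nearest multiplication.

    Heuristic only: checks the rounded quotient z / x and its two adjacent doubles.
    Returns the first matching y, or None if none of the three candidates works.
    """
    if x == 0.0 or not (math.isfinite(x) and math.isfinite(z)):
        return None
    guess = z / x
    candidates = [
        guess,
        math.nextafter(guess, math.inf),
        math.nextafter(guess, -math.inf),
    ]
    for y in candidates:
        if math.isfinite(y) and x * y == z:
            return y
    return None


if __name__ == "__main__":
    print(find_cofactor(3.0, 6.0))
    print(find_cofactor(0.1, 0.3))
```

Because the product is rounded, several distinct values of $y$ can satisfy the equation for the same $x$ and $z$, or none at all, which is part of what makes the interval version of the problem hard.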
Software | arxiv.org

Simulating time-varying strong lenses

We present a self-consistent and versatile forward modelling software package that can produce time series and pixel-level simulations of time-varying strongly lensed systems. The time dimension, which needs to take into account different physical mechanisms for variability such as microlensing, has been missing from existing approaches and it is of direct relevance to time delay, and consequently $H_0$, measurements and caustic crossing event predictions. Such experiments are becoming more streamlined, especially with the advent of time domain surveys, and understanding their systematic and statistical uncertainties in a model-aware and physics-driven way can help improve their accuracy and precision. Here we demonstrate the software's capabilities by exploring the effect of measuring time delays from lensed quasars and supernovae in many wavelengths and under different microlensing and intrinsic variability assumptions. In this initial application, we find that the cadence of the observations and combining information from different wavelengths play an important role in the correct recovery of the time delays. The Mock Lenses in Time (MOLET) software package is available at: \url{this https URL}
Mathematics | arxiv.org

On nonlinear Rudin-Carleson type theorem

In this paper we study nonlinear interpolation problems for interpolation and peak-interpolation sets of function algebras. The subject goes back to the classical Rudin-Carleson interpolation theorem. In particular, we prove the following nonlinear version of this theorem: Let $\bar{\mathbb D}\subset \mathbb C$ be the closed unit disk, $\mathbb T\subset\bar{\mathbb D}$...
Mathematics | johndcook.com

Justifying separation of variables

The separation of variables technique for solving partial differential equations looks like a magic trick the first time you see it. The lecturer, or author if you’re more self-taught, makes an audacious assumption, like pulling a rabbit out of a hat, and it works. For example, you might first see...
Mathematics | arxiv.org

A Fubini type theorem for rough integration

We develop the integration theory of two-parameter controlled paths $Y$ allowing us to define integrals of the form \begin{equation} \int Y_{r,r'} \;d(X_{r}, X_{r'}) \end{equation} where $X$ is the geometric $p$-rough path that controls $Y$. This extends to arbitrary regularity the definition presented for $2\leq p<3$ in the recent paper of Hairer and Gerasimovičs where it is used in the proof of a version of Hörmander's theorem for a class of SPDEs. We extend the Fubini type theorem of the same paper by showing that this two-parameter integral coincides with the two iterated one-parameter integrals...
Mathematics | arxiv.org

Matching Patterns with Variables under Hamming Distance

A pattern $\alpha$ is a string of variables and terminal letters. We say that $\alpha$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $\alpha$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between $w$ and any word $u$ obtained by replacing the variables of $\alpha$ by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable.
Engineering | arxiv.org

Staircase Attention for Recurrent Processing of Sequences

Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time) recurrently processing the input by adding another step of processing. A step in the staircase consists of backward tokens (encoding the sequence seen so far) and forward tokens (ingesting a new part of the sequence), or an extreme Ladder version with a forward step of zero that simply repeats the Transformer on each step of the ladder, sharing the weights. We thus describe a family of such models that can trade off performance and compute, by either increasing the amount of recurrence through time, the amount of sequential processing via recurrence in depth, or both. Staircase attention is shown to be able to solve tasks that involve tracking that conventional Transformers cannot, due to this recurrence. Further, it is shown to provide improved modeling power for the same size model (number of parameters) compared to self-attentive Transformers on large language modeling and dialogue tasks, yielding significant perplexity gains.
Coding & Programming | arxiv.org

Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training

Much recent research has been dedicated to improving the efficiency of training and inference for image classification. This effort has commonly focused on explicitly improving theoretical efficiency, often measured as ImageNet validation accuracy per FLOP. These theoretical savings have, however, proven challenging to achieve in practice, particularly on high-performance training accelerators.
Science | arxiv.org

Generation of variable compressibility in turbulence at fixed Reynolds numbers

The Variable Density and Speed of Sound Vessel (VDSSV) produces subsonic turbulent flows that are both compressible and observable at all scales with existing instrumentation including hot wires and particle tracking. We realize this objective by looking at the flow of a heavy gas (sulfur hexafluoride $SF_6$), with a speed of sound almost three times lower than for air. By switching between air and $SF_6$, we isolate the influence of the turbulent Mach number (up to $M_t$ = 0.17) on turbulence statistics from the influences of changes in the Reynolds number (up to $R_{\lambda}$ = 1600), and boundary conditions, which we hold constant. A free shear flow is produced by a ducted fan, and we show that it behaves like a turbulent jet in that the mean velocity profiles approach self-similarity with increasing distance from the orifice (up to $x/D_f$ = 9). The jet responds like a compressible shear layer in that it spreads more slowly at higher Mach numbers (up to $M_j$ = 0.7) than at low Mach numbers. In contrast, the integral length scales and Kolmogorov constant of the turbulence are approximately invariant with respect to changes in either the Reynolds or Mach numbers. We briefly report on instrumentation under development that will extend the accessible Taylor-scale Reynolds and turbulent Mach numbers to 4000 and 0.3, respectively.
Science | APS physics

Quantum Coarse Graining for Extreme Dimension Reduction in Modeling Stochastic Temporal Dynamics

Stochastic modeling of complex systems plays an essential, yet often computationally intensive, role across the quantitative sciences. Recent advances in quantum information processing have elucidated the potential for quantum simulators to exhibit memory advantages for such tasks. Heretofore, the focus has been on lossless memory compression, wherein the advantage is typically in terms of lessening the amount of information tracked by the model, while—arguably more practical—reductions in memory dimension are not always possible. Here, we address the case of lossy compression for quantum stochastic modeling of continuous-time processes, introducing a method for coarse graining in quantum state space that drastically reduces the requisite memory dimension for modeling temporal dynamics while retaining near-exact statistics. In contrast to classical coarse graining, this compression is not based on sacrificing temporal resolution and brings memory-efficient high-fidelity stochastic modeling within reach of present quantum technologies.
Mathematics | arxiv.org

A Liouville comparison principle for solutions of semilinear elliptic second-order partial differential inequalities

We consider semilinear elliptic second-order partial differential inequalities of the form $Lu + |u|^{q-1}u \leq Lv + |v|^{q-1}v$ (*) in the whole space $\mathbb{R}^n$, where $n \geq 2$, $q > 0$ and $L$ is a linear elliptic second-order partial differential operator in divergence form. We assume that the coefficients of the operator $L$ are defined, measurable and locally bounded in $\mathbb{R}^n$, and that the quadratic form associated with the operator $L$ is symmetric and non-negative definite. We obtain a Liouville comparison principle in terms of a capacity associated with the operator $L$ for solutions of (*), which are defined and measurable in $\mathbb{R}^n$ and which belong locally to a Sobolev-type function space also associated with the operator $L$.
Mathematics | upgrad.com

Bayes Theorem Explained With Example – Complete Guide

Bayes’s theorem is used for the calculation of a conditional probability where intuition often fails. Although widely used in probability, the theorem is being applied in the machine learning field too. Its use in machine learning includes the fitting of a model to a training dataset and developing classification models.
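
A short worked example shows why intuition often fails here. The sketch below is not from the linked guide; the function name and the screening-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are illustrative assumptions.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary hypothesis H and a positive observation E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    Returns P(H | E) = P(E | H) P(H) / P(E).
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the observation has zero probability under both hypotheses")
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Hypothetical screening test: 1% prevalence, 99% sensitivity, 5% false positives.
    print(round(posterior(0.01, 0.99, 0.05), 4))
```

With these assumed numbers the probability of the condition given a positive test is only 1/6, far below the 99% sensitivity, because false positives from the large healthy group dominate.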
Computers | sas.com

Simulate correlated variables by using the Iman-Conover transformation

Simulating univariate data is relatively easy. Simulating multivariate data is much harder. The main difficulty is to generate variables that have given univariate distributions but also are correlated with each other according to a specified correlation matrix. However, Iman and Conover (1982, "A distribution-free approach to inducing rank correlation among input variables") proposed a transformation that approximately induces a specified correlation among component variables without changing the univariate distribution of the components. (The univariate distributions are called the marginal distributions.) The Iman-Conover transformation is designed to transform continuous variables.
Mathematics | arxiv.org

Fréchet derivatives of expected functionals of solutions to stochastic differential equations

In the analysis of stochastic dynamical systems described by stochastic differential equations (SDEs), it is often of interest to analyse the sensitivity of the expected value of a functional of the solution of the SDE with respect to perturbations in the SDE parameters. In this paper, we consider path functionals that depend on the solution of the SDE up to a stopping time. We derive formulas for Fréchet derivatives of the expected values of these functionals with respect to bounded perturbations of the drift, using the Cameron-Martin-Girsanov theorem for the change of measure. Using these derivatives, we construct an example to show that the map that sends the change of drift to the corresponding relative entropy is not in general convex. We then analyse the existence and uniqueness of solutions to stochastic optimal control problems defined on possibly random time intervals, as well as gradient-based numerical methods for solving such problems.
Mathematics | arxiv.org

Quantum symmetries of Cayley graphs of abelian groups

We study Cayley graphs of abelian groups from the perspective of quantum symmetries. We develop a general strategy for determining the quantum automorphism groups of such graphs. Applying this procedure, we find the quantum symmetries of the halved cube graph, the folded cube graph and the Hamming graphs.