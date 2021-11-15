
Computers

A Probabilistic Hard Attention Model For Sequentially Observed Scenes

By Samrudhdhi B. Rangrej, James J. Clark
arxiv.org
 5 days ago

A visual hard attention model actively selects and observes a sequence of subregions in an image to make a prediction. The majority of hard attention models determine the attention-worthy regions by first analyzing a complete image. However, it may be the case that the entire image is not available initially but...

Interesting Engineering

NASA Scientists Think 'Singing Trees' Can Bring Us Closer to Another World

The Tree of Life, a project led by a group of NASA scientists that brings art and science together, wants to connect the Earth and space through a song that will last two centuries. This unusual duet will be transmitted through radio waves between a spacecraft in low Earth orbit and a collection of trees that have been rigged to function as a living antenna system.
AEROSPACE & DEFENSE
arxiv.org

Configuration entropy and confinement-deconfinement transition in higher-dimensional hard wall model

We consider a higher-dimensional hard wall model with an infrared (IR) cut-off in asymptotically AdS space and investigate its thermodynamics via the holographic renormalization method. We find a relation between the confinement temperature and the IR cut-off for any dimension. It is also shown that the entropy of $p$-branes with the number of coincident branes (the number of the gauge group) $N$ jumps from leading order $\mathcal{O}(N^0)$ in the confining low-temperature phase to $\mathcal{O}(N^{\frac{p+1}{2}})$ in the deconfining high-temperature phase, as in the $D3$-brane ($p=3$) case. On the other hand, we calculate the configuration entropy (CE) for various values of the inverse temperature at a given IR cut-off scale. It is shown that as the inverse temperature grows, the CE above the critical temperature decreases and the AdS black hole (BH) is stable, while below the critical temperature the CE is constant and thermal AdS (ThAdS) is stable. In particular, we also find that the CE below the critical temperature becomes constant and its magnitude increases as the dimension of AdS space increases.
SCIENCE
arxiv.org

On the Geometric Potential and the Relationship between the Exact Electron Factorization and Density Functional Theory

There are different ways to obtain an exact one-electron theory for a many-electron system, and the exact electron factorization (EEF) is one of them. In the EEF, the Schrödinger equation for one electron in the environment of other electrons is constructed. The environment provides the potentials that appear in this equation: A scalar potential $v^{\rm H}$ representing the energy of the environment and another scalar potential $v^{\rm G}$ as well as a vector potential that have geometric meaning. By replacing the interacting many-electron system with the non-interacting Kohn-Sham (KS) system, we show how the EEF is related to density functional theory (DFT) and we interpret the Hartree-exchange-correlation potential as well as the Pauli potential in terms of the EEF. In particular, we show that from the EEF viewpoint, the Pauli potential does not represent the difference between a fermionic and a bosonic non-interacting system, but that it corresponds to $v^{\rm G}$ and partly to $v^{\rm H}$ for the (fermionic) KS system. We then study the meaning of $v^{\rm G}$ in detail: Its geometric origin as a metric measuring the change of the environment is presented. Additionally, its behavior for a simple model of a homo- and heteronuclear diatomic is investigated and interpreted with the help of a two-state model. In this way, we provide a physical interpretation for the one-electron potentials that appear in the EEF and in DFT.
MATHEMATICS
arxiv.org

3D modelling of survey scene from images enhanced with a multi-exposure fusion

In current practice, scene survey is carried out by workers using total stations. The method has high accuracy, but it incurs high costs if continuous monitoring is needed. Techniques based on photogrammetry, with relatively cheap digital cameras, have gained wide application in many fields. Besides point measurement, photogrammetry can also create a three-dimensional (3D) model of the scene. Accurate 3D model reconstruction depends on high-quality images. Degraded images will result in large errors in the reconstructed 3D model. In this paper, we propose a method that can be used to improve the visibility of the images, and eventually reduce the errors of the 3D scene model. The idea is inspired by image dehazing. Each original image is first transformed into multiple exposure images by means of gamma-correction operations and adaptive histogram equalization. The transformed images are analyzed by the computation of the local binary patterns. The image is then enhanced, with each pixel generated from the set of transformed image pixels weighted by a function of the local pattern feature and image saturation. Performance evaluation has been performed on benchmark image dehazing datasets. Experiments have been carried out on outdoor and indoor surveys. Our analysis finds that the method works on different types of degradation that exist in both outdoor and indoor images. When fed into the photogrammetry software, the enhanced images can reconstruct 3D scene models with sub-millimeter mean errors.
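The gamma-correction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes intensities normalized to [0, 1] and the usual power-law form out = in ** gamma; the function names and the particular gamma values are hypothetical choices for illustration, not taken from the paper, and the adaptive histogram equalization and weighting steps are not shown.

```python
def gamma_correct(image, gamma):
    """Apply the power law out = in ** gamma to intensities in [0, 1]."""
    return [[pixel ** gamma for pixel in row] for row in image]


def exposure_stack(image, gammas=(0.5, 1.0, 2.0)):
    """Build one pseudo-exposure per gamma (gamma values are assumed)."""
    return {g: gamma_correct(image, g) for g in gammas}


# A tiny 2x2 grayscale image with normalized intensities.
image = [[0.04, 0.25], [0.49, 0.81]]
stack = exposure_stack(image)

# gamma < 1 brightens mid-tones, gamma > 1 darkens them.
brighter = stack[0.5]
darker = stack[2.0]
```

With intensities below 1, a gamma under 1 lifts dark regions and a gamma over 1 suppresses them, which is why a small set of gammas can stand in for several exposures of the same scene.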
SOFTWARE
arxiv.org

Cost-effective temperature estimation strategies for thermal states with probabilistic quantum metrology

In probabilistic quantum metrology, one aims at finding weak measurements that concentrate the Fisher Information on the resulting quantum states, post-selected according to the weak outcomes. Though the Quantum Cramér-Rao bound itself cannot be overshot this way, it could be possible to improve the information-cost ratio, or even the total Fisher Information. We propose a post-selection protocol achieving this goal based on single-photon subtraction onto a thermal state of radiation yielding a greater information-cost ratio for the temperature parameter with respect to the standard strategy required to achieve the Quantum Cramér-Rao bound. We address just fully-classical states of radiation: this contrasts with (but does not contradict) a recent result proving that, concerning unitary quantum estimation problems, post-selection strategies can outperform direct measurement protocols only if a particular quasiprobability associated with the family of parameter-dependent quantum states becomes negative, a clear signature of nonclassicality.
SCIENCE
arxiv.org

Analysis cosmological tachyon and fermion model and observation data constraints

In this work we investigate a cosmological model with tachyon and fermion fields with a barotropic equation of state, where the pressure $p$, energy density $\rho$ and barotropic index $\gamma$ are related by $p=(\gamma-1)\rho$. We apply the tachyonization method, which allows us to consider a cosmological model with the fermion and the tachyon fields, driven by a special potential. In this paper, the tachyonization model was defined from the stability analysis and exact solution standard of the tachyon field. Analysis of the solution via statefinder parameters shows that our model has a fiducial point with deceleration parameter $q = 0.5$ and statefinder $r = 1$, which corresponds to the matter-dominated universe (SCDM), but ends its evolution at a point in the future $(q =-1, \ r = 1)$, which corresponds to de Sitter expansion. Comparison of the model parameters with cosmological observation data demonstrates that our proposed cosmological model is stable at barotropic index $\gamma_0=0.00744$.
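The two fiducial points quoted in the abstract can be checked against the barotropic relation with a short sketch. It assumes the textbook simplification of a spatially flat universe filled with a single barotropic fluid, for which the deceleration parameter is q = (1 + 3w)/2 with w = p/rho = gamma - 1. This is not the paper's tachyon-fermion model, and the function names are hypothetical.

```python
def pressure(rho, gamma):
    """Barotropic equation of state p = (gamma - 1) * rho."""
    return (gamma - 1.0) * rho


def deceleration(gamma):
    """q for a flat, single-fluid universe (assumed simplification).

    With w = p / rho = gamma - 1, the Friedmann equations give
    q = (1 + 3 * w) / 2.
    """
    w = gamma - 1.0
    return 0.5 * (1.0 + 3.0 * w)


q_matter = deceleration(1.0)     # pressureless matter: gamma = 1
q_de_sitter = deceleration(0.0)  # vacuum-like fluid: gamma = 0
```

Under this simplification, gamma = 1 gives q = 0.5 (matter domination) and gamma = 0 gives q = -1 (de Sitter expansion), matching the two points named in the abstract.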
PHYSICS
arxiv.org

VLT/MUSE observations of SDSS J1029+2623: towards a high-precision strong lensing model

We present a strong lensing analysis of the galaxy cluster SDSS J1029+2623 at $z=0.588$, one of the few currently known lens clusters with multiple images of a background ($z=2.1992$) quasar with a measured time delay. We use archival Hubble Space Telescope multi-band imaging and new Multi Unit Spectroscopic Explorer follow-up spectroscopy to build an accurate lens mass model, a crucial step towards future cosmological applications. The spectroscopic data enable the secure identification of 57 cluster members and of two nearby perturbers along the line-of-sight. We estimate the inner kinematics of a sub-set of 20 cluster galaxies to calibrate the scaling relations parametrizing the sub-halo mass component. We also reliably determine the redshift of 4 multiply imaged sources, provide a tentative measurement for one system, and report the discovery of a new four-image system. The final catalog comprises 26 multiple images from 7 background sources, spanning a wide redshift range, from 1.02 to 5.06. We present two parametric lens models, with slightly different cluster mass parametrizations. The observed positions of the multiple images are accurately reproduced within approximately $0''.2$, the three image positions of the quasar within only $\sim0''.1$. We estimate a cluster projected total mass of $M(<300~ {\rm kpc}) \sim 2.1 \times 10^{14}~ M_{\odot}$, with a statistical uncertainty of a few percent. Both models, that include a small galaxy close to one of the quasar images, predict magnitude differences and time delays between the quasar images that are consistent with the observations.
ASTRONOMY
arxiv.org

Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs

We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR on the other hand is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme which sequentially re-constructs then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to-date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs.
CODING & PROGRAMMING
VentureBeat

ML observability platform WhyLabs raises $10M to monitor models and data in production

WhyLabs, a startup building what it calls “an interface between humans and AI applications,” last week announced that it raised $10 million in a series A funding round co-led by prolific data scientist Andrew Ng’s fund and Defy Partners, with participation from Madrona Venture Group and Bezos Expeditions. The company says that the capital will be used to further develop its platform as WhyLabs looks to grow both its workforce and customer base.
COMPUTERS
arxiv.org

The global distribution of natural tritium in precipitation simulated with an Atmospheric General Circulation Model and comparison with observations

The description of the hydrological cycle in Atmospheric General Circulation Models (GCMs) can be validated using water isotopes as tracers. Many GCMs now simulate the movement of the stable isotopes of water, but here we present the first GCM simulations modelling the content of natural tritium in water. These simulations were obtained using a version of the LMDZ General Circulation Model enhanced by water isotopes diagnostics, LMDZ-iso. To avoid tritium generated by nuclear bomb testing, the simulations have been evaluated against a compilation of published tritium datasets dating from before 1950, or measured recently. LMDZ-iso correctly captures the observed tritium enrichment in precipitation as oceanic air moves inland (the so-called continental effect) and the observed north-south variations due to the latitudinal dependency of the cosmogenic tritium production rate. The seasonal variability, linked to the stratospheric intrusions of air masses with higher tritium content into the troposphere, is correctly reproduced for Antarctica with a maximum in winter. LMDZ-iso reproduces the spring maximum of tritium over Europe, but underestimates it and produces a peak in winter that is not apparent in the data. This implementation of tritium in a GCM promises to provide a better constraint on: (1) the intrusions and transport of air masses from the stratosphere and (2) the dynamics of the modelled water cycle. The method complements the existing approach of using stable water isotopes.
SCIENCE
arxiv.org

Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning

The idea of using recurrent neural networks for visual attention has gained popularity in the computer vision community. Although the recurrent attention model (RAM) can use glimpses with a larger patch size to increase its scope, this may result in high variance and instability. For example, we need a Gaussian policy with high variance to explore objects of interest in a large image, which may cause randomized search and unstable learning. In this paper, we propose to unify top-down and bottom-up attention for recurrent visual attention. Our model exploits image pyramids and Q-learning to select regions of interest in the top-down attention mechanism, which in turn guides the policy search in the bottom-up approach. In addition, we add two further constraints on the bottom-up recurrent neural networks for better exploration. We train our model in an end-to-end reinforcement learning framework, and evaluate our method on visual classification tasks. The experimental results show that our method outperforms the convolutional neural network (CNN) baseline and the bottom-up recurrent attention models on visual classification tasks.
SOFTWARE
mixonline.com

Soundscape Simulated: New En-Scene Simulation Tool From d&b Marks Radical Development in the Way Spatialized Sound is Modelled

German audio technology and solutions company d&b audiotechnik today announced the latest addition to its powerful software toolkit – Soundscape Simulation, SPL and localization mapping within an object based workflow. As part of d&b’s ArrayCalc simulation software, this new intuitive visualization tool accurately models a Soundscape system’s real and perceived...
SOFTWARE
towardsdatascience.com

Attending to Attention

I have been a Machine Learning Engineer for almost 4 years now. I started with what are now called the “Classical Models” (Logistic, Tree-based, Bayesian, etc.), and since last year I have moved into Neural Networks and Deep Learning. I would say I did pretty well, that is, until my attention lay on “Attention” (pun intended). I tried reading through tutorials, lectures, and guides, but nothing ever fully helped me grasp the core idea.
CODING & PROGRAMMING
arxiv.org

Probabilistic hypergraph containers

Given a $k$-uniform hypergraph $\mathcal{H}$ and sufficiently large $m \gg m_0(\mathcal{H})$, we show that an $m$-element set $I \subseteq V(\mathcal{H})$, chosen uniformly at random, with probability $1 - e^{-\omega(m)}$ is either not independent or belongs to an almost-independent set in $\mathcal{H}$ which, crucially, can be constructed from carefully chosen $o(m)$ vertices of $I$. With very little effort, this implies that if the largest almost-independent set in $\mathcal{H}$ is of size $o(v(\mathcal{H}))$ then $I$ itself is an independent set with probability $e^{-\omega(m)}$. More generally, $I$ is very likely to inherit structural properties of almost-independent sets in $\mathcal{H}$.
MATHEMATICS
arxiv.org

TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations

Sequential recommendation aims to choose the most suitable items for a user at a specific timestamp given historical behaviors. Existing methods usually model the user behavior sequence with transition-based methods such as Markov chains. However, these methods also implicitly assume that the users are independent of each other, without considering the influence between users. In fact, this influence plays an important role in sequential recommendation since the behavior of a user is easily affected by others. Therefore, it is desirable to aggregate both user behaviors and the influence between users, which evolve temporally and are captured in the heterogeneous graph of users and items. In this paper, we incorporate dynamic user-item heterogeneous graphs to propose a novel sequential recommendation framework. As a result, the historical behaviors as well as the influence between users can be taken into consideration. To achieve this, we first formalize sequential recommendation as the problem of estimating a conditional probability given temporal dynamic heterogeneous graphs and user behavior sequences. After that, we exploit the conditional random field to aggregate the heterogeneous graphs and user behaviors for probability estimation, and employ the pseudo-likelihood approach to derive a tractable objective function. Finally, we provide scalable and flexible implementations of the proposed framework. Experimental results on three real-world datasets not only demonstrate the effectiveness of our proposed method but also provide some insightful discoveries on sequential recommendation.
COMPUTERS
arxiv.org

ClipCap: CLIP Prefix for Image Captioning

Image captioning is a fundamental task in vision-language understanding, where the model predicts an informative textual caption for a given input image. In this paper, we present a simple approach to address this task. We use CLIP encoding as a prefix to the caption, by employing a simple mapping network, and then fine-tune a language model to generate the image captions. The recently proposed CLIP model contains rich semantic features which were trained with textual context, making it well suited to vision-language perception. Our key idea is that together with a pre-trained language model (GPT2), we obtain a wide understanding of both visual and textual data. Hence, our approach only requires rather quick training to produce a competent captioning model. Without additional annotations or pre-training, it efficiently generates meaningful captions for large-scale and diverse datasets. Surprisingly, our method works well even when only the mapping network is trained, while both CLIP and the language model remain frozen, allowing a lighter architecture with fewer trainable parameters. Through quantitative evaluation, we demonstrate our model achieves comparable results to state-of-the-art methods on the challenging Conceptual Captions and nocaps datasets, while it is simpler, faster, and lighter. Our code is available in this https URL.
SOFTWARE
arxiv.org

Restormer: Efficient Transformer for High-Resolution Image Restoration

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at this https URL.
ELECTRONICS
arxiv.org

Fusion research using Azure A100 HPC instances

Fusion simulations have in the past required the use of leadership scale HPC resources to produce advances in physics. One such package is CGYRO, a premier multi-scale plasma turbulence simulation code. CGYRO is a typical HPC application that would not fit into a single node, as it requires O(100 GB) of memory and O(100 TFLOPS) worth of compute for relevant simulations. When distributed across multiple nodes, CGYRO requires high-throughput and low-latency networking to effectively use the compute resources. While in the past such compute may have required hundreds, or even thousands of nodes, recent advances in hardware capabilities allow for just a couple of nodes to deliver the necessary compute power. This paper presents our experience running CGYRO on NVIDIA A100 GPUs on InfiniBand-connected HPC resources in the Microsoft Azure Cloud. A comparison to older generation CPU and GPU Azure resources as well as on-prem resources is also provided.
COMPUTERS
arxiv.org

Swin Transformer V2: Scaling Up Capacity and Resolution

By Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536$\times$1,536 resolution. By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification. Our techniques are generally applicable for scaling up vision models, which has not been as widely explored as that of NLP language models, partly due to the following difficulties in training and applications: 1) vision models often face instability issues at scale and 2) many downstream vision tasks require high resolution images or windows and it is not clear how to effectively transfer models pre-trained at low resolutions to higher resolution ones. The GPU memory consumption is also a problem when the image resolution is high. To address these issues, we present several techniques, which are illustrated by using Swin Transformer as a case study: 1) a post normalization technique and a scaled cosine attention approach to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pre-trained at low-resolution images and windows to their higher-resolution counterparts. In addition, we share our crucial implementation details that lead to significant savings of GPU memory consumption and thus make it feasible to train large vision models with regular GPUs. Using these techniques and self-supervised pre-training, we successfully train a strong 3B Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving the state-of-the-art accuracy on a variety of benchmarks.
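The abstract names scaled cosine attention and log-spaced position coordinates without giving formulas, so the following is only a rough sketch of one plausible reading: attention logits computed as cosine similarity divided by a temperature, plus an optional bias, and relative offsets compressed with sign(d) * log(1 + |d|). The fixed temperature `tau`, the function names, and the pure-Python list representation are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scaled_cosine_attention(queries, keys, values, tau=0.1, bias=None):
    """Attention with logits cos(q, k) / tau + bias (tau is assumed fixed)."""
    outputs = []
    for i, q in enumerate(queries):
        logits = [
            cosine(q, k) / tau + (bias[i][j] if bias else 0.0)
            for j, k in enumerate(keys)
        ]
        peak = max(logits)  # subtract the max for a stable softmax
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def log_spaced(delta):
    """Compress a relative offset: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


queries = [[1.0, 0.0], [0.0, 2.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
attended = scaled_cosine_attention(queries, keys, values)
```

Because cosine similarity ignores vector length, rescaling a query leaves the attention weights unchanged, which gives some intuition for why such a formulation might help stability at scale. The log transform grows slowly with the offset, so offsets seen at a larger window size stay closer to the range seen at a smaller one.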
COMPUTERS
arxiv.org

Probabilistic predictions of SIS epidemics on networks based on population-level observations

We predict the future course of ongoing susceptible-infected-susceptible (SIS) epidemics on regular, Erdős-Rényi and Barabási-Albert networks. It is known that the contact network influences the spread of an epidemic within a population. Therefore, observations of an epidemic, in this case at the population-level, contain information about the underlying network. This information, in turn, is useful for predicting the future course of an ongoing epidemic. To exploit this in a prediction framework, the exact high-dimensional stochastic model of an SIS epidemic on a network is approximated by a lower-dimensional surrogate model. The surrogate model is based on a birth-and-death process; the effect of the underlying network is described by a parametric model for the birth rates. We demonstrate empirically that the surrogate model captures the intrinsic stochasticity of the epidemic once it reaches a point from which it will not die out. Bayesian parameter inference allows for uncertainty about the model parameters and the class of the underlying network to be incorporated directly into probabilistic predictions. An evaluation of a number of scenarios shows that in most cases the resulting prediction intervals adequately quantify the prediction uncertainty. As long as the population-level data is available over a long-enough period, even if not sampled frequently, the model leads to excellent predictions where the underlying network is correctly identified and prediction uncertainty mainly reflects the intrinsic stochasticity of the spreading epidemic. For predictions inferred from shorter observational periods, uncertainty about parameters and network class dominates prediction uncertainty. The proposed method relies on minimal data and is numerically efficient, which makes it attractive either as a standalone inference and prediction scheme or in conjunction with other methods.
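To make the birth-and-death surrogate more concrete, here is a minimal Gillespie-style simulation of the number of infected individuals. The abstract says the network enters through a parametric model for the birth rates but does not state it, so this sketch substitutes the well-mixed form beta * I * (N - I) / N as an assumption; the function name and all parameter values are hypothetical.

```python
import random


def simulate_birth_death(n, i0, beta, delta, t_max, seed=0):
    """Simulate the infected count I(t) as a birth-and-death process.

    Birth rate (assumed well-mixed form): beta * I * (n - I) / n
    Death rate (recovery):                delta * I
    Returns a list of (time, infected) pairs.
    """
    rng = random.Random(seed)
    t, infected = 0.0, i0
    path = [(t, infected)]
    while infected > 0:
        birth = beta * infected * (n - infected) / n
        death = delta * infected
        total = birth + death
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        infected += 1 if rng.random() < birth / total else -1
        path.append((t, infected))
    return path


path = simulate_birth_death(n=100, i0=5, beta=2.0, delta=1.0, t_max=10.0, seed=42)
```

Repeating the simulation over many seeds gives an empirical spread of trajectories, which is the kind of intrinsic stochasticity the abstract says the surrogate should capture. A network-aware version would replace the birth-rate expression with a fitted parametric function of the infected count.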
SCIENCE

