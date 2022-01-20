
Domain Generalization via Frequency-based Feature Disentanglement and Interaction

By Jingye Wang, Ruoyi Du, Dongliang Chang, Zhanyu Ma
 4 days ago

Data out-of-distribution is a meta-challenge for all statistical learning algorithms that strongly rely on the i.i.d. assumption. It leads to unavoidable labor costs and confidence crises in realistic applications. For that, domain generalization aims at mining domain-irrelevant...

Related
A Likelihood Ratio based Domain Adaptation Method for E2E Models

End-to-end (E2E) automatic speech recognition models like the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants. While E2E models are very effective at learning representations of their training data, their accuracy on unseen domains remains a challenging problem. Additionally, these models require paired audio and text training data, are computationally expensive, and are difficult to adapt to the fast-evolving nature of conversational speech. In this work, we explore a contextual biasing approach using a likelihood ratio that leverages text data sources to adapt an RNN-T model to new domains and entities. We show that this method is effective in improving rare-word recognition, and results in a relative improvement of 10% in 1-best word error rate (WER) and 10% in n-best Oracle WER (n=8) on multiple out-of-domain datasets without any degradation on a general dataset. We also show that complementing the contextual biasing adaptation with adaptation of a second-pass rescoring model gives additive WER improvements.
Edge-based Tensor prediction via graph neural networks

Message-passing neural networks (MPNN) have shown extremely high efficiency and accuracy in predicting the physical properties of molecules and crystals, and are expected to become the next-generation material simulation tool after density functional theory (DFT). However, there is currently no general MPNN framework for directly predicting the tensor properties of crystals. In this work, a general framework for the prediction of tensor properties is proposed: the tensor property of a crystal can be decomposed into the average of the tensor contributions of all the atoms in the crystal, and the tensor contribution of each atom can be expanded as the sum of the tensor projections in the directions of the edges connecting the atoms. On this basis, edge-based expansions of force vectors, Born effective charges (BECs), dielectric (DL) and piezoelectric (PZ) tensors are proposed. These expansions are rotationally equivariant, while the coefficients in these tensor expansions are rotationally invariant scalars, similar to physical quantities such as formation energy and band gap. The advantage of this tensor prediction framework is that it does not require the network itself to be equivariant. Therefore, in this work, we directly designed the edge-based tensor prediction graph neural network (ETGNN) model on the basis of an invariant graph neural network to predict tensors. The validity and high precision of this tensor prediction framework are shown by tests of ETGNN on extended systems, randomly perturbed structures and JARVIS-DFT datasets. This tensor prediction framework is general for nearly all GNNs and can achieve higher accuracy with more advanced GNNs in the future.
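To make the equivariance claim in the abstract above concrete, here is a minimal sketch, not the ETGNN model itself. It expands a per-atom vector (such as a force) as a sum of scalar coefficients times unit edge vectors, then checks that rotating the structure rotates the result. The function names and the `exp(-r)` coefficient are hypothetical stand-ins for the learned invariant scalars.

```python
import math

def edge_expansion(positions, i, coeff):
    """Vector on atom i as a sum over edges of (invariant scalar) * (unit edge vector)."""
    xi = positions[i]
    out = [0.0, 0.0, 0.0]
    for j, xj in enumerate(positions):
        if j == i:
            continue
        d = [b - a for a, b in zip(xi, xj)]   # edge vector from atom i to atom j
        r = math.sqrt(sum(c * c for c in d))  # edge length, unchanged by rotation
        c_ij = coeff(r)                       # scalar coefficient depends only on r
        for k in range(3):
            out[k] += c_ij * d[k] / r
    return out

def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

# Toy structure and a hypothetical coefficient function (assumption, not from the paper).
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.5]]
coeff = lambda r: math.exp(-r)

theta = 0.7
f = edge_expansion(positions, 0, coeff)
f_rot = edge_expansion([rotate_z(p, theta) for p in positions], 0, coeff)

# Equivariance: expanding the rotated structure equals rotating the expansion.
equivariant = all(abs(a - b) < 1e-9 for a, b in zip(rotate_z(f, theta), f_rot))
```

Because the coefficients depend only on distances, the network producing them can be invariant while the assembled vector still transforms correctly.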
Intra-domain and cross-domain transfer learning for time series data -- How transferable are the features?

In practice, it is very demanding and sometimes impossible to collect datasets of tagged data large enough to successfully train a machine learning model, and one possible solution to this problem is transfer learning. This study aims to assess how transferable the features are between different domains of time series data, and under which conditions. The effects of transfer learning are observed in terms of the predictive performance of the models and their convergence rate during training. In our experiment, we use reduced datasets of 1,500 and 9,000 data instances to mimic real-world conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those that were trained with transfer learning and those that were trained from scratch. Four machine learning models were used for the experiment. Transfer of knowledge was performed within the same domain of application (seismology), as well as between mutually different domains of application (seismology, speech, medicine, finance). To confirm the validity of the obtained results, we repeated the experiments seven times and applied statistical tests to confirm the significance of the results. The general conclusion of our study is that transfer learning is very likely to either increase or not negatively affect the predictive performance of the model or its convergence rate. The collected data is analysed in more detail to determine which source and target domains are compatible for transfer of knowledge. We also analyse the effect of target dataset size and the selection of model and its hyperparameters on the effects of transfer learning.
On generalization bounds for deep networks based on loss surface implicit regularization

The classical statistical learning theory says that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. The implicit regularization induced by stochastic gradient descent (SGD) has been regarded to be important, but its specific principle is still unknown. In this work, we study how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima. Our work attempts to better connect non-convex optimization and generalization analysis with uniform convergence.
Speckle Memory Effect in the Frequency Domain and Stability in Time-Reversal Experiments

When waves propagate through a complex medium like the turbulent atmosphere, the wave field becomes incoherent and the wave intensity forms a complex speckle pattern. In this paper we study a speckle memory effect in the frequency domain and some of its consequences. This effect means that certain properties of the speckle pattern produced by wave transmission through a randomly scattering medium are preserved when shifting the frequency of the illumination. The speckle memory effect is characterized via a detailed novel analysis of the fourth-order moment of the random paraxial Green's function at four different frequencies. We arrive at a precise characterization of the frequency memory effect and what governs the strength of the memory. As an application, we quantify the statistical stability of time-reversal wave refocusing through a randomly scattering medium in the paraxial or beam regime. Time reversal refers to the situation in which a transmitted wave field is recorded on a time-reversal mirror, then time reversed and sent back into the complex medium. The reemitted wave field then refocuses at the original source point. We compute the mean of the refocused wave and identify a novel quantitative description of its variance in terms of the radius of the time-reversal mirror, the size of its elements, the source bandwidth and the statistics of the random medium fluctuations.
Bond order via cavity-mediated interactions

We numerically study the phase diagram of bosons tightly trapped in the lowest band of an optical lattice and dispersively coupled to a single-mode cavity field. The dynamics is encompassed by an extended Bose-Hubbard model. Here, the cavity-mediated interactions are described by a two-body potential term with a global range and by a correlated tunnelling term where the hopping amplitude depends on a global observable. We determine the ground state properties in one dimension by means of the density matrix renormalization group algorithm, focusing in particular on the effects due to the correlated tunnelling. The latter is responsible for the onset of bond order. We discuss the resulting phases for different geometries that correspond to different relative strengths of the correlated tunnelling coefficient. We finally analyze the scaling of entanglement entropy in the gapless bond ordered phases that appear entirely due to global interactions and determine the corresponding central charges.
Key points in the determination of the interfacial Dzyaloshinskii-Moriya interaction from asymmetric bubble domain expansion

A.Magni, G.Carlotti, A.Casiraghi, E.Darwin, G.Durin, L.Herrera Diez, B.J.Hickey, A.Huxtable, C.Y.Hwang, G.Jakob, C.Kim, M.Kläui, J.Langer, C.H.Marrows, H.T.Nembach, D.Ravelosona, G.A.Riley, J.M.Shaw, V.Sokalski, S.Tacchi, M.Kuepferling. Different models have been used to evaluate the interfacial Dzyaloshinskii-Moriya interaction (DMI) from the asymmetric bubble expansion method using magneto-optics. Here we investigate the most promising candidates...
Domain-shift adaptation via linear transformations

A predictor, $f_A : X \to Y$, learned with data from a source domain (A) might not be accurate on a target domain (B) when their distributions are different. Domain adaptation aims to reduce the negative effects of this distribution mismatch. Here, we analyze the case where $P_A(Y\ |\ X) \neq P_B(Y\ |\ X)$, $P_A(X) \neq P_B(X)$ but $P_A(Y) = P_B(Y)$; where there are affine transformations of $X$ that make all distributions equivalent. We propose an approach to project the source and target domains into a lower-dimensional, common space, by (1) projecting the domains onto the eigenvectors of the empirical covariance matrices of each domain, then (2) finding an orthogonal matrix that minimizes the maximum mean discrepancy between the projections of both domains. For arbitrary affine transformations, there is an inherent unidentifiability problem when performing unsupervised domain adaptation that can be alleviated in the semi-supervised case. We show the effectiveness of our approach on simulated data and on binary digit classification tasks, obtaining improvements of up to 48% in accuracy when correcting for the domain shift in the data.
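The quantity minimized in step (2) of the abstract above is the maximum mean discrepancy (MMD). As a rough sketch, the snippet below computes the standard biased estimate of squared MMD between two samples. It illustrates the discrepancy measure only, not the authors' projection or orthogonal-matrix search; the RBF kernel, the `gamma` value, and the toy point sets are assumptions chosen for illustration.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points; gamma is an assumed bandwidth."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) * len(xs))
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) * len(ys))
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

# Toy "source" sample and a copy shifted along the first axis (made-up data).
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[x + 3.0, y] for x, y in source]

same = mmd2(source, source)      # identical samples: discrepancy vanishes
apart = mmd2(source, shifted)    # shifted samples: discrepancy is clearly positive
```

An alignment method of the kind described would search for a transformation of one sample that drives this value toward zero.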
Stock Movement Prediction Based on Bi-typed and Hybrid-relational Market Knowledge Graph via Dual Attention Networks

Yu Zhao, Huaming Du, Ying Liu, Shaopeng Wei, Xingyan Chen, Huali Feng, Qinghong Shuai, Qing Li, Fuzhen Zhuang, Gang Kou. Stock Movement Prediction (SMP) aims at predicting the future price trend of listed companies' stocks, which is a challenging task due to the volatile nature of financial markets. Recent financial studies show that the momentum spillover effect plays a significant role in stock fluctuation. However, previous studies typically learn only the simple connection information among related companies, and so inevitably fail to model the complex relations of listed companies in the real financial market. To address this issue, we first construct a more comprehensive Market Knowledge Graph (MKG) which contains bi-typed entities, including listed companies and their associated executives, and hybrid relations, including explicit and implicit relations. Afterward, we propose DanSmp, a novel Dual Attention Networks model, to learn the momentum spillover signals based upon the constructed MKG for stock prediction. The empirical experiments on our constructed datasets against nine SOTA baselines demonstrate that the proposed DanSmp is capable of improving stock prediction with the constructed MKG.
k-parametric Dynamic Generalized Linear Models: a sequential approach via Information Geometry

Dynamic generalized linear models may be seen simultaneously as an extension of dynamic linear models and of generalized linear models, formally treating the serial auto-correlation inherent to responses observed through time. The present work revisits inference methods for this class, proposing an approach based on information geometry, focusing on the $k$-parametric exponential family. Among others, the proposed method accommodates multinomial responses and can be adapted to accommodate compositional responses on $k=d+1$ categories, while preserving the sequential aspect of the Bayesian inferential procedure, producing real-time inference. The updating scheme benefits from the conjugate structure in the exponential family, assuring computational efficiency. Concepts such as Kullback-Leibler divergence and the projection theorem are used in the development of the method, placing it close to recent approaches to variational inference. Applications to real data are presented, demonstrating the computational efficiency of the method, which compares favorably to alternative approaches, as well as its flexibility to quickly accommodate new information when strategically needed, preserving aspects of monitoring and intervention analysis, as well as discount factors, which are usual in sequential analyses.
FIESTA II. Disentangling stellar and instrumental variability from exoplanetary Doppler shifts in Fourier domain

The radial velocity (RV) detection of exoplanets is complicated by stellar spectroscopic variability that can mimic the presence of planets, as well as by instrumental instability. These distort the spectral line profiles and can be misinterpreted as apparent RV shifts. We present the improved FourIEr phase SpecTrum Analysis (FIESTA a.k.a....
Enhancing Low-Light Images in Real World via Cross-Image Disentanglement

Images captured in low-light conditions suffer from low visibility and various imaging artifacts, e.g., real noise. Existing supervised enlightening algorithms require a large set of pixel-aligned training image pairs, which are hard to prepare in practice. Though weakly-supervised or unsupervised methods can alleviate such challenges without using paired training images, some real-world artifacts inevitably get falsely amplified because of the lack of corresponding supervision. In this paper, instead of using perfectly aligned images for training, we creatively employ misaligned real-world images as the guidance, which are considerably easier to collect. Specifically, we propose a Cross-Image Disentanglement Network (CIDN) to separately extract cross-image brightness and image-specific content features from low/normal-light images. Based on that, CIDN can simultaneously correct the brightness and suppress image artifacts in the feature domain, which largely increases the robustness to pixel shifts. Furthermore, we collect a new low-light image enhancement dataset consisting of misaligned training images with real-world corruptions. Experimental results show that our model achieves state-of-the-art performance on both the newly proposed dataset and other popular low-light datasets.
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting

Building efficient architectures in neural speech processing is paramount to success in keyword spotting deployment. However, it is very challenging for lightweight models to achieve noise robustness with concise neural operations. In a real-world application, the user environment is typically noisy and may also contain reverberations. We propose a novel feature interactive convolutional model with merely 100K parameters to tackle this under the noisy far-field condition. The interactive unit is proposed in place of the attention module and promotes the flow of information with more efficient computations. Moreover, curriculum-based multi-condition training is adopted to attain better noise robustness. Our model achieves 98.2% top-1 accuracy on Google Speech Command V2-12 and is competitive against large transformer models under the designed noise condition.
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering

Deep neural networks (DNNs) have demonstrated superior performance over classical machine learning in supporting many features in safety-critical systems. Although DNNs are now widely used in such systems (e.g., self-driving cars), there is limited progress regarding automated support for functional safety analysis in DNN-based systems. For example, the identification of root causes of errors, to enable both risk analysis and DNN retraining, remains an open problem. In this paper, we propose SAFE, a black-box approach to automatically characterize the root causes of DNN errors. SAFE relies on a transfer learning model pre-trained on ImageNet to extract the features from error-inducing images. It then applies a density-based clustering algorithm to detect arbitrarily shaped clusters of images modeling plausible causes of error. Last, clusters are used to effectively retrain and improve the DNN. The black-box nature of SAFE is motivated by our objective not to require changes or even access to the DNN internals, to facilitate adoption.
S$^2$FPR: Crowd Counting via Self-Supervised Coarse to Fine Feature Pyramid Ranking

Most conventional crowd counting methods utilize a fully-supervised learning framework to learn a mapping between scene images and crowd density maps. Under the circumstances of such fully-supervised training settings, a large quantity of expensive and time-consuming pixel-level annotations are required to generate density maps as the supervision. One way to reduce costly labeling is to exploit self-structural information and inner-relations among unlabeled images. Unlike the previous methods utilizing these relations and structural information from the original image level, we explore such self-relations from the latent feature spaces because it can extract more abundant relations and structural information. Specifically, we propose S$^2$FPR which can extract structural information and learn partial orders of coarse-to-fine pyramid features in the latent space for better crowd counting with massive unlabeled images. In addition, we collect a new unlabeled crowd counting dataset (FUDAN-UCC) with 4,000 images in total for training. One by-product is that our proposed S$^2$FPR method can leverage numerous partial orders in the latent space among unlabeled images to strengthen the model representation capability and reduce the estimation errors for the crowd counting task. Extensive experiments on four benchmark datasets, i.e. the UCF-QNRF, the ShanghaiTech PartA and PartB, and the UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The source code and dataset are available at this https URL.
Time-domain deep learning filtering of structured atmospheric noise for ground-based millimeter astronomy

The complex physics involved in atmospheric turbulence makes it very difficult for ground-based astronomy to build accurate scintillation models and develop efficient methodologies to remove this highly structured noise from valuable astronomical observations. We argue that a Deep Learning approach can bring a significant advance in treating this problem because of deep neural networks' inherent ability to abstract non-linear patterns over a broad scale range. We propose an architecture composed of long short-term memory cells and an incremental training strategy inspired by transfer and curriculum learning. We develop a scintillation model and employ an empirical method to generate a vast catalog of atmospheric noise realizations and train the network with representative data. We face two complexity axes: the signal-to-noise ratio (SNR) and the degree of structure in the noise. Hence, we train our recurrent network to recognize simulated astrophysical point-like sources embedded in three structured noise levels, with a raw-data SNR ranging from 3 to 0.1. We find that a slow and repetitive increase in complexity is crucial during training to obtain a robust and stable learning rate that can transfer information through different data contexts. We probe our recurrent model with synthetic observational data, designing alongside it a calibration methodology for flux measurements. Furthermore, we implement a traditional matched filter (MF) to compare its performance with our neural network, finding that our final trained network can successfully clean structured noise and significantly enhance the SNR compared to raw data, and in a more robust way than traditional MF.
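For readers unfamiliar with the matched-filter baseline mentioned above, a minimal one-dimensional sketch follows. It slides a known template over a time stream and reports the offset with the highest normalized correlation. This is a generic textbook form, not the authors' implementation; the template shape, the fixed "noise" values, and the injection offset are made up for illustration.

```python
def matched_filter(signal, template):
    """Correlate the signal with the template at every valid offset."""
    n = len(template)
    norm = sum(t * t for t in template) ** 0.5  # normalize by template energy
    return [
        sum(signal[i + k] * template[k] for k in range(n)) / norm
        for i in range(len(signal) - n + 1)
    ]

# Hypothetical point-source profile and a short, fixed noise realization.
template = [0.5, 1.0, 0.5]
signal = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1, 0.1, 0.0]

# Inject the source at a known offset so the detection can be checked.
offset = 4
for k, t in enumerate(template):
    signal[offset + k] += t

scores = matched_filter(signal, template)
peak = max(range(len(scores)), key=scores.__getitem__)  # offset of best match
```

A filter of this kind is optimal for white noise, which is one reason structured atmospheric noise is a harder case for it.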
Model-Based Image Signal Processors via Learnable Dictionaries

Digital cameras transform sensor RAW readings into RGB images by means of their Image Signal Processor (ISP). Computational photography tasks such as image denoising and colour constancy are commonly performed in the RAW domain, in part due to the inherent hardware design, but also due to the appealing simplicity of noise statistics that result from the direct sensor readings. Despite this, the availability of RAW images is limited in comparison with the abundance and diversity of available RGB data. Recent approaches have attempted to bridge this gap by estimating the RGB to RAW mapping: handcrafted model-based methods that are interpretable and controllable usually require manual parameter fine-tuning, while end-to-end learnable neural networks require large amounts of training data, at times with complex training procedures, and generally lack interpretability and parametric control. Towards addressing these existing limitations, we present a novel hybrid model-based and data-driven ISP that builds on canonical ISP operations and is both learnable and interpretable. Our proposed invertible model, capable of bidirectional mapping between RAW and RGB domains, employs end-to-end learning of rich parameter representations, i.e. dictionaries, that are free from direct parametric supervision and additionally enable simple and plausible data augmentation. We evidence the value of our data generation process by extensive experiments under both RAW image reconstruction and RAW image denoising tasks, obtaining state-of-the-art performance in both. Additionally, we show that our ISP can learn meaningful mappings from few data samples, and that denoising models trained with our dictionary-based data augmentation are competitive despite having only few or zero ground-truth labels.
Skyrmions-based logic gates in one single nanotrack completely reconstructed via chirality barrier

Logic gates based on magnetic elements are promising candidates for logic-in-memory applications with nonvolatile data retention, near-zero leakage and scalability. In such spin-based logic devices, however, the multi-strip structure and limited set of functions are obstacles to improving integration and reducing energy consumption. Here we propose a skyrmions-based single-nanotrack logic family including AND, OR, NOT, NAND, NOR, XOR, and XNOR, which can be implemented and reconstructed by building and switching a Dzyaloshinskii-Moriya interaction (DMI) chirality barrier on a racetrack memory. Besides the pinning effect of the DMI chirality barrier on skyrmions, the annihilation, fusion and shunting of two skyrmions with opposite chirality are also achieved and demonstrated via local reversal of DMI, which are necessary for the design of engineer programmable logic nanotrack, transistor and complementary racetrack memory.
Kohler-Jobin meets Ehrhard: the sharp lower bound for the Gaussian principal frequency while the Gaussian torsional rigidity is fixed, via rearrangements

In this note, we provide an adaptation of the Kohler-Jobin rearrangement technique to the setting of the Gauss space. As a result, we prove the Gaussian analogue of Kohler-Jobin's resolution of a conjecture of Pólya-Szegö: when the Gaussian torsional rigidity of a (convex) domain is fixed, the Gaussian principal frequency is minimized for the half-space. At the core of this rearrangement technique is the idea of considering a "modified" torsional rigidity, with respect to a given function, and rearranging its layers to half-spaces in a particular way; the Rayleigh quotient decreases with this procedure.
