Long Short-Term Memory Neural Network for Financial Time Series

By Carmina Fjellström
arxiv.org
 4 days ago

Performance forecasting is an age-old problem in economics and finance. Recently, developments in machine learning and neural networks have given rise to non-linear time series models that provide modern and promising alternatives to traditional methods of analysis. In this paper, we present an ensemble of independent and...

Related
arxiv.org

Representing Long-Range Context for Graph Neural Networks with Global Attention

Graph neural networks are powerful architectures for structured datasets. However, current methods struggle to represent long-range dependencies. Scaling the depth or width of GNNs is insufficient to broaden receptive fields as larger GNNs encounter optimization instabilities such as vanishing gradients and representation oversmoothing, while pooling-based approaches have yet to become as universally useful as in computer vision. In this work, we propose the use of Transformer-based self-attention to learn long-range pairwise relationships, with a novel "readout" mechanism to obtain a global graph embedding. Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module. This simple architecture leads to state-of-the-art results on several graph classification tasks, outperforming methods that explicitly encode graph structure. Our results suggest that purely-learning-based approaches without graph structure may be suitable for learning high-level, long-range relationships on graphs. Code for GraphTrans is available at this https URL.
COMPUTERS
arxiv.org

Dynamic Deep Convolutional Candlestick Learner

Candlestick patterns are among the most fundamental and valuable graphical tools in financial trading, helping traders observe current market conditions and make sound decisions. Recognizing these patterns has a long history and has, for the most part, relied on human experts. Recently, efforts have been made to classify these patterns automatically with deep learning models. The GAF-CNN model is well suited to imitating how human traders capture candlestick patterns by integrating spatial features visually. However, given the great potential of the GAF encoding, this classification task can be extended to the more complicated object detection level. This work presents an innovative integration of modern object detection techniques and GAF time-series encoding for candlestick pattern tasks. We make crucial modifications to the representative yet straightforward YOLO version 1 model based on our time-series encoding method and the properties of this data type. Powered by deep neural networks and a unique architectural design, the proposed model performs well in candlestick classification and location recognition. The results show tremendous potential for applying modern object detection techniques to time-series tasks in real time.
MARKETS
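For readers unfamiliar with the GAF encoding named in the abstract above: a Gramian Angular Field turns a one-dimensional series into an image by rescaling the values to [-1, 1], mapping each value to an angle with arccos, and filling a matrix with cos(phi_i + phi_j). The sketch below assumes the summation variant and min-max rescaling; the abstract does not say which variant or preprocessing the authors use, and the function name and sample prices are illustrative only.

```python
import math

def gramian_angular_field(series):
    """Return the summation-type GAF matrix of a numeric series (assumed variant)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1] so that arccos is defined for every value.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp against floating-point drift, then map each value to an angle.
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) encodes the pairwise temporal relation cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical closing prices, not taken from the paper.
closes = [101.0, 102.5, 101.8, 103.2, 104.0]
gaf = gramian_angular_field(closes)
```

The result is a symmetric 5 x 5 matrix that an image model such as a CNN can consume like a single-channel picture.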
arxiv.org

Deconfounding to Explanation Evaluation in Graph Neural Networks

Explainability of graph neural networks (GNNs) aims to answer "Why the GNN made a certain prediction?", which is crucial to interpret the model prediction. The feature attribution framework distributes a GNN's prediction to its input features (e.g., edges), identifying an influential subgraph as the explanation. When evaluating the explanation (i.e., subgraph importance), a standard way is to audit the model prediction based on the subgraph solely. However, we argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem. Furthermore, with an in-depth causal analysis, we find the OOD effect acts as the confounder, which brings spurious associations between the subgraph importance and model prediction, making the evaluation less reliable. In this work, we propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction. While the distribution shift is generally intractable, we employ the front-door adjustment and introduce a surrogate variable of the subgraphs. Specifically, we devise a generative model to generate the plausible surrogates that conform to the data distribution, thus approaching the unbiased estimation of subgraph importance. Empirical results demonstrate the effectiveness of DSE in terms of explanation fidelity.
SCIENCE
arxiv.org

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer. While today's video recognition systems parse snapshots or short clips accurately, they cannot connect the dots and reason across a longer range of time yet. Most existing video architectures can only process <5 seconds of a video without hitting the computation or memory bottlenecks.
COMPUTERS
IN THIS ARTICLE
#Long Short Term Memory#Time Series#Neural Network#Lstm#Machine Learning#Lg
arxiv.org

FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks

Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient.
COMPUTERS
arxiv.org

Decoupling the Depth and Scope of Graph Neural Networks

Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kannan, Viktor Prasanna, Long Jin, Ren Chen. State-of-the-art Graph Neural Networks (GNNs) have limited scalability with respect to the graph and model sizes. On large graphs, increasing the model depth often means exponential expansion of the scope (i.e., receptive field). Beyond just a few layers, two fundamental challenges emerge: 1. degraded expressivity due to oversmoothing, and 2. expensive computation due to neighborhood explosion. We propose a design principle to decouple the depth and scope of GNNs -- to generate representation of a target entity (i.e., a node or an edge), we first extract a localized subgraph as the bounded-size scope, and then apply a GNN of arbitrary depth on top of the subgraph. A properly extracted subgraph consists of a small number of critical neighbors, while excluding irrelevant ones. The GNN, no matter how deep it is, smooths the local neighborhood into informative representation rather than oversmoothing the global graph into "white noise". Theoretically, decoupling improves the GNN expressive power from the perspectives of graph signal processing (GCN), function approximation (GraphSAGE) and topological learning (GIN). Empirically, on seven graphs (with up to 110M nodes) and six backbone GNN architectures, our design achieves significant accuracy improvement with orders of magnitude reduction in computation and hardware cost.
COMPUTERS
arxiv.org

Incompleteness of graph convolutional neural networks for points clouds in three dimensions

Graph convolutional neural networks (GCNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GCNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GCNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for the molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a certain preselected cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph convolution NNs" (dGCNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML. Here we show that even for the restricted case of graphs induced by 3D atom clouds dGCNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GCNN architectures for atomistic machine learning. Models that explicitly use angular information in the description of atomic environments can resolve these degeneracies.
COMPUTERS
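The first-order Weisfeiler-Lehman test mentioned in the abstract above can be sketched in a few lines: every node repeatedly replaces its color with the pair (own color, sorted neighbor colors), and two graphs whose final color histograms differ cannot be isomorphic. Equal histograms are inconclusive, which is the incompleteness the abstract refers to. The example uses plain unweighted graphs (a 6-cycle versus two triangles), not the paper's 3D point-cloud constructions, and the function name is hypothetical.

```python
from collections import Counter

def wl_color_histogram(adj, rounds=3):
    """Run first-order WL color refinement on adj (node -> list of neighbors)."""
    colors = {v: 0 for v in adj}  # every node starts with the same color
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path6 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Both graphs are 2-regular, so the test cannot tell them apart.
indistinguishable = wl_color_histogram(cycle6) == wl_color_histogram(two_triangles)  # True
# The path has degree-1 endpoints, so its histogram differs from the cycle's.
distinguished = wl_color_histogram(cycle6) != wl_color_histogram(path6)  # True
```

Using the nested tuples themselves as colors keeps them comparable across graphs without a shared lookup table, at the cost of colors that grow with each round; that is acceptable for toy graphs like these.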
arxiv.org

TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite the recent developments of deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17%, reducing training times by up to 99% compared to the baselines.
SOFTWARE
arxiv.org

Deep convolutional neural network for shape optimization using level-set approach

This article presents a reduced-order modeling methodology for shape optimization applications via deep convolutional neural networks (CNNs). The CNN provides a nonlinear mapping between the shapes and their associated attributes while conserving the equivariance of these attributes to the shape translations. To implicitly represent complex shapes via a CNN-applicable Cartesian structured grid, a level-set method is employed. The CNN-based reduced-order model (ROM) is constructed in a completely data-driven manner, and suited for non-intrusive applications. We demonstrate our complete ROM-based shape optimization on a gradient-based three-dimensional shape optimization problem to minimize the induced drag of a wing in potential flow. We show a satisfactory comparison between ROM-based optima for the aerodynamic coefficients compared to their counterparts obtained via a potential flow solver. The predicted behavior of our ROM-based global optima closely matches the theoretical predictions. We also present the learning mechanism of the deep CNN model in a physically interpretable manner. The CNN-ROM-based shape optimization algorithm exhibits significant computational efficiency compared to full order model-based online optimization applications. Thus, it promises a tractable solution for shape optimization of complex configuration and physical problems.
CODING & PROGRAMMING
towardsdatascience.com

Artificial Neural Network in Layman’s Terms

Unravelling The Complexity of Neural Network For Beginners. You just moved your eye from left to right; how did you know you had to do it? You didn’t say it out loud to your eye; it’s thanks to the neurons in your brain, which send the information to your eye via electrical impulses and chemical signals. There are billions of such neurons in the body, which collectively act as an information highway throughout the body.
COMPUTERS
arxiv.org

FNETS: Factor-adjusted network estimation and forecasting for high-dimensional time series

We propose fnets, a methodology for network estimation and forecasting of high-dimensional time series exhibiting strong serial- and cross-sectional correlations. We operate under a factor-adjusted vector autoregressive (VAR) model where, after controlling for common factors accounting for pervasive co-movements of the variables, the remaining idiosyncratic dependence between the variables is modelled by a sparse VAR process. Network estimation of fnets consists of three steps: (i) factor-adjustment via dynamic principal component analysis, (ii) estimation of the parameters of the latent VAR process by means of $\ell_1$-regularised Yule-Walker estimators, and (iii) estimation of partial correlation and long-run partial correlation matrices. In doing so, we learn three networks underpinning the latent VAR process, namely a directed network representing the Granger causal linkages between the variables, an undirected one embedding their contemporaneous relationships and finally, an undirected network that summarises both lead-lag and contemporaneous linkages. In addition, fnets provides a suite of methods for separately forecasting the factor-driven and the VAR processes. Under general conditions permitting heavy tails and weak factors, we derive the consistency of fnets in both network estimation and forecasting. Simulation studies and real data applications confirm the good performance of fnets.
COMPUTERS
arxiv.org

Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks

Due to diverse architectures in deep neural networks (DNNs) with severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique, called Neighborhood Region Smoothing (NRS). NRS leverages the finding that models would benefit from converging to flat minima, and tries to regularize the neighborhood region in weight space to yield approximate outputs. Specifically, the gap between the outputs of models in the neighborhood region is gauged by a defined metric based on Kullback-Leibler divergence. This metric provides insights similar to those of the minimum description length principle on interpreting flat minima. By minimizing both this divergence and empirical loss, NRS could explicitly drive the optimizer towards converging to flat minima. We confirm the effectiveness of NRS by performing image classification tasks across a wide range of model architectures on commonly-used datasets such as CIFAR and ImageNet, where generalization ability could be universally improved. Also, we empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method, which is considered evidence of flat minima.
COMPUTERS
arxiv.org

Training Fair Deep Neural Networks by Balancing Influence

Most fair machine learning methods either highly rely on the sensitive information of the training samples or require a large modification on the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set (second stage) where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied on a wide range of models trained by stochastic gradient descent without changing the model, while only requiring group annotations on a small validation set to compute sample weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated by training with the weights. Experiments on synthetic data sets demonstrate that FAIRIF yields models with better fairness-utility trade-offs against various types of bias; and on real-world data sets, we show the effectiveness and scalability of FAIRIF. Moreover, as evidenced by the experiments with pretrained models, FAIRIF is able to alleviate the unfairness issue of pretrained models without hurting their performance.
CODING & PROGRAMMING
arxiv.org

Predicting Research Trends in Artificial Intelligence with Gradient Boosting Decision Trees and Time-aware Graph Neural Networks

The Science4cast 2021 competition focuses on predicting future edges in an evolving semantic network, where each vertex represents an artificial intelligence concept, and an edge between a pair of vertices denotes that the two concepts have been investigated together in a scientific paper. In this paper, we describe our solution to this competition. We present two distinct approaches: a tree-based gradient boosting approach and a deep learning approach, and demonstrate that both approaches achieve competitive performance. Our final solution, which is based on a blend of the two approaches, achieved the 1st place among all the participating teams. The source code for this paper is available at this https URL.
COMPUTERS
arxiv.org

Edge-based Tensor prediction via graph neural networks

Message-passing neural networks (MPNN) have shown extremely high efficiency and accuracy in predicting the physical properties of molecules and crystals, and are expected to become the next-generation material simulation tool after the density functional theory (DFT). However, there is currently a lack of a general MPNN framework for directly predicting the tensor properties of the crystals. In this work, a general framework for the prediction of tensor properties was proposed: the tensor property of a crystal can be decomposed into the average of the tensor contributions of all the atoms in the crystal, and the tensor contribution of each atom can be expanded as the sum of the tensor projections in the directions of the edges connecting the atoms. On this basis, the edge-based expansions of force vectors, Born effective charges (BECs), dielectric (DL) and piezoelectric (PZ) tensors were proposed. These expansions are rotationally equivariant, while the coefficients in these tensor expansions are rotationally invariant scalars which are similar to physical quantities such as formation energy and band gap. The advantage of this tensor prediction framework is that it does not require the network itself to be equivariant. Therefore, in this work, we directly designed the edge-based tensor prediction graph neural network (ETGNN) model on the basis of the invariant graph neural network to predict tensors. The validity and high precision of this tensor prediction framework were shown by the tests of ETGNN on the extended systems, random perturbed structures and JARVIS-DFT datasets. This tensor prediction framework is general for nearly all the GNNs and can achieve higher accuracy with more advanced GNNs in the future.
COMPUTERS
arxiv.org

De Rham compatible Deep Neural Networks

We construct several classes of neural networks with ReLU and BiSU (Binary Step Unit) activations, which exactly emulate the lowest order Finite Element (FE) spaces on regular, simplicial partitions of polygonal and polyhedral domains $\Omega \subset \mathbb{R}^d$, $d=2,3$. For continuous, piecewise linear (CPwL) functions, our constructions generalize previous results in that arbitrary, regular simplicial partitions of $\Omega$ are admitted, also in arbitrary dimension $d\geq 2$.
CODING & PROGRAMMING
arxiv.org

Training Free Graph Neural Networks for Graph Matching

We present TFGM (Training Free Graph Matching), a framework to boost the performance of Graph Neural Networks (GNNs) based graph matching without training. TFGM sidesteps two crucial problems when training GNNs: 1) the limited supervision due to expensive annotation, and 2) training's computational cost. A basic framework, BasicTFGM, is first proposed by adopting the inference stage of graph matching methods. Our analysis shows that the BasicTFGM is a linear relaxation to the quadratic assignment formulation of graph matching. This guarantees the preservation of structure compatibility and an efficient polynomial complexity. Empirically, we further improve the BasicTFGM by handcrafting two types of matching priors into the architecture of GNNs: comparing node neighborhoods of different localities and utilizing annotation data if available. For evaluation, we conduct extensive experiments on a broad set of settings, including supervised keypoint matching between images, semi-supervised entity alignment between knowledge graphs, and unsupervised alignment between protein interaction networks. Applying TFGM on various GNNs shows promising improvements over baselines. Further ablation studies demonstrate the effective and efficient training-free property of TFGM. Our code is available at this https URL.
CODING & PROGRAMMING
arxiv.org

Coherence resonance and stochastic synchronization in a small-world neural network: An interplay in the presence of spike-timing-dependent plasticity

Coherence resonance (CR), stochastic synchronization (SS), and spike-timing-dependent plasticity (STDP) are ubiquitous dynamical processes in biological neural networks. Whether enhancing CR can be associated with improving SS and vice versa is a fundamental question of interest. The effects of STDP and different network connectivity on this enhancement interplay are still elusive. In this paper, we consider a small-world network of excitable Hodgkin-Huxley neurons driven by channel noise and excitatory STDP with a Hebbian time window. Numerical simulations indicate that there exist intervals of parameter values of the network topology and the STDP learning rule in which an enhanced SS (CR) would improve CR (SS). In particular, it is found that at certain intermediate values of the average degree of the network, higher values of the potentiation adjusting rate, and lower values of the depression temporal window, an enhanced SS (CR) would improve CR (SS). Our results could shed some light on the efficient coding mechanisms based on the spatiotemporal coherence of the spiking activity in neural networks.
SCIENCE
arxiv.org

Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural Networks

This paper is concerned with the asymptotic distribution of the largest eigenvalues for some nonlinear random matrix ensemble stemming from the study of neural networks. More precisely we consider $M= \frac{1}{m} YY^\top$ with $Y=f(WX)$ where $W$ and $X$ are random rectangular matrices with i.i.d. centered entries. This models the data covariance matrix or the Conjugate Kernel of a single layered random Feed-Forward Neural Network. The function $f$ is applied entrywise and can be seen as the activation function of the neural network. We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles. In particular, we relate the asymptotic limit of the largest eigenvalue for the nonlinear model to that of an information-plus-noise random matrix, establishing a possible phase transition depending on the function $f$ and the distribution of $W$ and $X$. This may be of interest for applications to machine learning.
COMPUTERS
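The matrix in the abstract above, $M = \frac{1}{m} YY^\top$ with $Y = f(WX)$, is easy to build numerically for small sizes. The sketch below is only an illustration: it assumes Gaussian entries, scales $W$ by $1/\sqrt{n_0}$ so that the pre-activations stay of order one (the abstract does not state a normalisation), and estimates the largest eigenvalue by power iteration. The function names, the sizes, and the choice of tanh as $f$ are all hypothetical.

```python
import math
import random

def conjugate_kernel(n0, n1, m, f, rng):
    """Build M = (1/m) Y Y^T with Y = f(W X), W of size n1 x n0 and X of size n0 x m."""
    scale = 1.0 / math.sqrt(n0)  # assumed normalisation, not given in the abstract
    W = [[rng.gauss(0, 1) * scale for _ in range(n0)] for _ in range(n1)]
    X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n0)]
    # Apply the activation f entrywise to the product W X.
    Y = [[f(sum(W[i][k] * X[k][j] for k in range(n0))) for j in range(m)]
         for i in range(n1)]
    return [[sum(Y[i][j] * Y[l][j] for j in range(m)) / m for l in range(n1)]
            for i in range(n1)]

def largest_eigenvalue(M, iters=200):
    """Estimate the top eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the final unit vector.
    return sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))

M = conjugate_kernel(n0=20, n1=15, m=40, f=math.tanh, rng=random.Random(0))
lam_max = largest_eigenvalue(M)
```

Swapping `math.tanh` for another activation, or changing the entry distribution, is the kind of variation whose effect on the largest eigenvalue the paper studies in the large-dimensional limit; a toy run at this size says nothing about that limit.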
