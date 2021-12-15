
N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores

By Yu Gong, Zhihan Xu, Zhezhi He, Weifeng Zhang, Xiaobing Tu, Xiaoyao Liang, Li Jiang
 4 days ago

Accelerating the neural network inference by FPGA has emerged as a popular option, since the reconfigurability and high performance computing capability of FPGA intrinsically satisfies the computation demand of the fast-evolving neural algorithms. However, the popular neural accelerators on FPGA (e.g., Xilinx DPU) mainly utilize the DSP resources for constructing their...

CCasGNN: Collaborative Cascade Prediction Based on Graph Neural Networks

Cascade prediction aims at modeling information diffusion in the network. Most previous methods concentrate on mining either structural or sequential features from the network and the propagation path. Recent efforts have been devoted to combining network structure and sequence features using graph neural networks and recurrent neural networks. Nevertheless, the limitations of spectral or spatial methods restrict the improvement of prediction performance. Moreover, recurrent neural networks are time-consuming and computationally expensive, which makes prediction inefficient. Here, we propose a novel method, CCasGNN, that considers the individual profile, structural features, and sequence information. The method benefits from a collaborative framework of GAT and GCN and from stacking positional encoding into the layers of graph neural networks, which differs from all existing methods and demonstrates good performance. Experiments conducted on two real-world datasets confirm that our method significantly improves prediction accuracy compared to state-of-the-art approaches. In addition, an ablation study investigates the contribution of each component of our method.
Arm-based industrial computers have real-time and application cores

SolidRun has announced system-on-modules built around Texas Instruments' AM64x processor family with precise real-time processing and application processing for industrial IoT and industrial machinery applications. For application processing and running operating systems there is a dual-core Arm Cortex-A53, then up to four Cortex-R5F cores for real-time computing, servo control...
Simulating the Mott transition on a noisy digital quantum computer via Cartan-based fast-forwarding circuits

Dynamical mean-field theory (DMFT) maps the local Green's function of the Hubbard model to that of the Anderson impurity model and thus gives an approximate solution of the Hubbard model by solving the simpler quantum impurity model. Quantum and hybrid quantum-classical algorithms have been proposed to efficiently solve impurity models by preparing and evolving the ground state under the impurity Hamiltonian on a quantum computer instead of using intractable classical algorithms. We propose a highly optimized fast-forwarding quantum circuit to significantly improve quantum algorithms for the minimal DMFT problem preserving the Mott phase transition. Our Cartan decomposition based algorithm uses a fixed depth quantum circuit to eliminate time-discretization errors and evolve the initial state over arbitrary times. Exploiting the structure of the fast-forwarding circuits, we sufficiently reduce the gate cost to simulate the dynamics of, and extract frequencies from, the Anderson impurity model on noisy quantum hardware and demonstrate the Mott transition by mapping the phase-diagram of the corresponding impurity problem. Especially near the Mott phase transition when the quasiparticle resonance frequency converges to zero and evolving the system over long-time scales is necessary, our method maintains accuracy where Trotter error would otherwise dominate. This work presents the first computation of the Mott phase transition using noisy digital quantum hardware, made viable by a highly optimized computation in terms of gate depth, simulation error, and run-time on quantum hardware. The combination of algebraic circuit decompositions and model specific error mitigation techniques used may have applications extending beyond our use case to solving correlated electronic phenomena on noisy quantum computers.
Adaptive Kernel Graph Neural Network

Graph neural networks (GNNs) have demonstrated great success in representation learning for graph-structured data. The layer-wise graph convolution in GNNs has been shown to be powerful at capturing graph topology. During this process, GNNs are usually guided by pre-defined kernels such as the Laplacian matrix, the adjacency matrix, or their variants. However, adopting pre-defined kernels may limit generality across different graphs: a mismatch between graph and kernel entails sub-optimal performance. For example, GNNs that focus on low-frequency information may not achieve satisfactory performance when high-frequency information is significant for the graphs, and vice versa. To solve this problem, we propose a novel framework, the Adaptive Kernel Graph Neural Network (AKGNN), which learns to adapt to the optimal graph kernel in a unified manner at the first attempt. In the proposed AKGNN, we first design a data-driven graph kernel learning mechanism, which adaptively modulates the balance between all-pass and low-pass filters by modifying the maximal eigenvalue of the graph Laplacian. Through this process, AKGNN learns the optimal threshold between high- and low-frequency signals to relieve the generality problem. We then further reduce the number of parameters with a parameterization trick and enhance the expressive power with a global readout function. Extensive experiments are conducted on acknowledged benchmark datasets, and promising results demonstrate the outstanding performance of our proposed AKGNN in comparison with state-of-the-art GNNs. The source code is publicly available at: this https URL.
Imagination Launches Catapult Family of RISC-V CPU Cores: Breaking Into Heterogeneous SoCs

December is here, and with it comes several technical summits ahead of the holiday break. The most notable of which this week is the annual RISC-V summit, which is being put on by the Linux Foundation and sees the numerous (and ever increasing) parties involved in the open source ISA gather to talk about the latest products and advancements in the RISC-V ecosystem. The summit always tends to feature some new product announcements, and this year is no different, as Imagination Technologies is at the show to provide details on their first RISC-V CPU cores, along with announcing their intentions to develop a full suite of CPU cores over the next few years.
Tutorial on communication between access networks and the 5G core

By Lucas Baleeiro Dominato, Henrique Carvalho de Resende, Cristiano Bonato Both, Johann M. Marquez-Barja, Bruno O. Silvestre, Kleber V. Cardoso

Fifth-generation (5G) networks enable a variety of use cases, e.g., Ultra-Reliable and Low-Latency Communications, enhanced Mobile Broadband, and massive Machine Type Communication. To explore the full potential of these use cases, it is mandatory to understand the communication between User Equipment (UE), Radio Access Network (RAN), and 5G Core (5GC), which support new network concepts and paradigms. For example, network slicing plays a crucial role in the communication system to address the challenges expected by the 5G networks. The 3rd Generation Partnership Project has recently published Release 16, including the protocols used to communicate between RANs and 5GC, i.e., Non-Access Stratum (NAS) and NG Application Protocol (NGAP). The main goal of this article is to present a comprehensive tutorial about the NAS and NGAP specifications using a didactic and practical approach. The tutorial describes the protocol stacks and aspects of the functionality of these protocols in 5G networks, such as authentication and identification procedures, data session establishment, and allocation of resources. Moreover, we review message flows related to these protocols in UE and Next Generation Node B (gNodeB) registration. To illustrate the concepts presented in the tutorial, we introduce a 5GC tester that implements NAS and NGAP for evaluating three open-source 5GC projects with a black-box testing methodology.
Graph Neural Networks Accelerated Molecular Dynamics

Molecular Dynamics (MD) simulation is a powerful tool for understanding the dynamics and structure of matter. Since the resolution of MD is atomic-scale, achieving long time-scale simulations with femtosecond integration is very expensive. In each MD step, numerous redundant computations are performed which can be learnt and avoided. These redundant computations can be replaced by a deep learning surrogate model such as a Graph Neural Network (GNN). In this work, we developed a GNN Accelerated Molecular Dynamics (GAMD) model that achieves fast and accurate force predictions and generates trajectories consistent with classical MD simulations. Our results show that GAMD can accurately predict the dynamics of two typical molecular systems, Lennard-Jones (LJ) particles and Water (LJ+Electrostatics). GAMD's learning and inference are agnostic to scale, so it can scale to much larger systems at test time. We also performed a comprehensive benchmark test comparing our implementation of GAMD to production-level MD software, where we showed GAMD is competitive with them on large-scale simulation.
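For readers unfamiliar with the Lennard-Jones system mentioned above, the following is a minimal sketch of the classical pairwise potential and force that an MD code evaluates at every step, and that a learned surrogate would be trained to reproduce. It is not taken from the GAMD paper; the function names and the reduced units (`epsilon = sigma = 1`) are assumptions made for illustration.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force F(r) = -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)  # separation at which the force vanishes
    print(lj_potential(r_min), lj_force(r_min))
```

At the separation `r = 2**(1/6) * sigma` the force vanishes and the potential reaches its minimum of `-epsilon`, which is a quick sanity check for any implementation.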
Transferability Properties of Graph Neural Networks

Graph neural networks (GNNs) are deep convolutional architectures consisting of layers composed of graph convolutions and pointwise nonlinearities. Due to their invariance and stability properties, GNNs are provably successful at learning representations from network data. However, training them requires matrix computations which can be expensive for large graphs. To address this limitation, we investigate the ability of GNNs to be transferred across graphs. We consider graphons, which are both graph limits and generative models for weighted and stochastic graphs, to define limit objects of graph convolutions and GNNs -- graphon convolutions and graphon neural networks (WNNs) -- which we use as generative models for graph convolutions and GNNs. We show that these graphon filters and WNNs can be approximated by graph filters and GNNs sampled from them on weighted and stochastic graphs. Using these results, we then derive error bounds for transferring graph filters and GNNs across such graphs. These bounds show that transferability increases with the graph size, and reveal a tradeoff between transferability and spectral discriminability which in GNNs is alleviated by the pointwise nonlinearities. These findings are further verified empirically in numerical experiments in movie recommendation and decentralized robot control.
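The phrase "graph convolutions and pointwise nonlinearities" can be made concrete with a small sketch. The code below assumes a single-feature layer in which the graph convolution is a polynomial in a graph shift operator `S` (for example, the adjacency matrix) and the nonlinearity is a ReLU. The names `graph_filter`, `gnn_layer`, and `taps` are illustrative choices, not identifiers from the paper.

```python
def matvec(S, x):
    """Multiply a dense matrix S (list of rows) by a vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def graph_filter(S, x, taps):
    """Polynomial graph convolution: sum over k of taps[k] * S^k x."""
    out = [0.0] * len(x)
    z = list(x)  # holds S^k x, starting with k = 0
    for h in taps:
        out = [o + h * z_i for o, z_i in zip(out, z)]
        z = matvec(S, z)  # advance to the next power of S
    return out


def gnn_layer(S, x, taps):
    """One layer: graph convolution followed by a pointwise ReLU."""
    return [max(0.0, v) for v in graph_filter(S, x, taps)]


if __name__ == "__main__":
    # Path graph on three nodes, used here as the shift operator.
    S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(gnn_layer(S, [1.0, 2.0, 3.0], taps=[1.0, -1.0]))
```

Because the filter taps do not depend on the number of nodes, the same `taps` can be applied to a shift operator of any size, which is the property that makes transferring a trained filter across graphs possible in the first place.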
Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks

Building damage detection after natural disasters like earthquakes is crucial for initiating effective emergency response actions. Remotely sensed very high spatial resolution (VHR) imagery can provide vital information due to its ability to map the affected buildings with high geometric precision. Many approaches have been developed to detect buildings damaged by earthquakes. However, little attention has been paid to exploiting the rich features represented in VHR images using Deep Neural Networks (DNN). This paper presents a novel super-pixel based approach combining DNN and a modified segmentation method to detect damaged buildings from VHR imagery. Firstly, a modified Fast Scanning and Adaptive Merging method is extended to create an initial over-segmentation. Secondly, the segments are merged based on the Region Adjacency Graph (RAG), using an improved semantic similarity criterion composed of Local Binary Patterns (LBP) texture, spectral, and shape features. Thirdly, a pre-trained DNN using Stacked Denoising Auto-Encoders, called SDAE-DNN, is presented to exploit the rich semantic features for building damage detection. Deep-layer feature abstraction of SDAE-DNN could boost detection accuracy through learning more intrinsic and discriminative features, and it outperformed other methods using state-of-the-art alternative classifiers. We demonstrate the feasibility and effectiveness of our method using a subset of WorldView-2 imagery in the complex urban areas of Bhaktapur, Nepal, which was affected by the Nepal Earthquake of April 25, 2015.
Quantum processor swapped in for a neural network

It's become increasingly clear that quantum computers won't have a single moment when they become clearly superior to classical hardware. Instead, we're likely to see them becoming useful for a narrow set of problems and then gradually expand out from there to an increasing range of computations. The question obviously becomes one of where the utility will be seen first.
Spatial Graph Convolutional Neural Network via Structured Subdomain Adaptation and Domain Adversarial Learning for Bearing Fault Diagnosis

Unsupervised domain adaptation (UDA) has shown remarkable results in bearing fault diagnosis under changing working conditions in recent years. However, most UDA methods do not consider the geometric structure of the data. Furthermore, the global domain adaptation technique is commonly applied, which ignores the relation between subdomains. This paper addresses these challenges by presenting a novel deep subdomain adaptation graph convolution neural network (DSAGCN), which has two key characteristics. First, a graph convolution neural network (GCNN) is employed to model the structure of the data. Second, adversarial domain adaptation and local maximum mean discrepancy (LMMD) methods are applied concurrently to align the subdomains' distributions and reduce the structure discrepancy between relevant subdomains and global domains. The CWRU and Paderborn bearing datasets are used to validate the efficiency and superiority of the DSAGCN method over comparison models. The experimental results demonstrate the significance of aligning structured subdomains along with domain adaptation methods to obtain an accurate data-driven model in unsupervised fault diagnosis.
Merging Subject Matter Expertise and Deep Convolutional Neural Network for State-Based Online Machine-Part Interaction Classification

Machine-part interaction classification is a key capability required by Cyber-Physical Systems (CPS), a pivotal enabler of Smart Manufacturing (SM). While previous relevant studies on the subject have primarily focused on time series classification, change point detection is equally important because it provides temporal information on changes in the behavior of the machine. In this work, we address change point detection and time series classification for machine-part interactions with a deep Convolutional Neural Network (CNN) based framework. The CNN in this framework uses a two-stage encoder-classifier structure for efficient feature representation and convenient deployment customization for CPS. Though data-driven, the design and optimization of the framework are guided by Subject Matter Expertise (SME). An SME-defined Finite State Machine (FSM) is incorporated into the framework to prohibit intermittent misclassifications. In the case study, we implement the framework to perform machine-part interaction classification on a milling machine, and the performance is evaluated using a testing dataset and deployment simulations. The implementation achieved an average F1-Score of 0.946 across classes on the testing dataset and an average delay of 0.24 seconds on the deployment simulations.
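One way an FSM can prohibit intermittent misclassifications is to accept a classifier output only when it corresponds to a permitted transition from the current state. The sketch below illustrates that idea only. The state names and the transition table are hypothetical and are not the SME-defined machine from the paper.

```python
# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    "idle": {"idle", "approach"},
    "approach": {"approach", "cutting"},
    "cutting": {"cutting", "retract"},
    "retract": {"retract", "idle"},
}


def fsm_filter(predictions, start="idle", allowed=ALLOWED):
    """Replace predictions that would require a forbidden transition
    with the current state, so isolated misclassifications are suppressed."""
    state = start
    filtered = []
    for label in predictions:
        if label in allowed[state]:
            state = label  # legal transition (or self-loop): accept it
        filtered.append(state)  # otherwise keep reporting the current state
    return filtered


if __name__ == "__main__":
    raw = ["idle", "approach", "idle", "cutting", "cutting", "idle", "retract", "idle"]
    print(fsm_filter(raw))
```

In this example the two stray `"idle"` predictions in the middle of the sequence are rejected because the table does not allow `approach -> idle` or `cutting -> idle`.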
Universal computation using localized limit-cycle attractors in neural networks

Neural networks are dynamical systems that compute with their dynamics. One example is the Hopfield model, forming an associative memory which stores patterns as global attractors of the network dynamics. From studies of dynamical networks it is well known that localized attractors also exist. Yet, they have not been used in computing paradigms. Here we show that interacting localized attractors in threshold networks can result in universal computation. We develop a rewiring algorithm that builds universal Boolean gates in a biologically inspired two-dimensional threshold network with randomly placed and connected nodes using collision-based computing. We aim at demonstrating the computational capabilities and the ability to control local limit cycle attractors in such networks by creating simple Boolean gates by means of these local activations. The gates use glider guns, i.e., localized activity that periodically generates "gliders" of activity that propagate through space. Several such gliders are made to collide, and the result of their interaction is used as the output of a Boolean gate. We show that these gates can be used to build a universal computer.
Fast computation of distance-generalized cores using sampling

Core decomposition is a classic technique for discovering densely connected regions in a graph, with a wide range of applications. Formally, a $k$-core is a maximal subgraph where each vertex has at least $k$ neighbors. A natural extension of a $k$-core is a $(k, h)$-core, where each node must have at least $k$ nodes that can be reached with a path of length $h$. The downside of using $(k, h)$-core decomposition is the significant increase in computational complexity: whereas the standard core decomposition can be done in $O(m)$ time, the generalization can require $O(n^2m)$ time, where $n$ and $m$ are the number of nodes and edges in the given graph.
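The standard (non-generalized) core decomposition referred to above can be sketched with the usual peeling procedure: repeatedly remove a vertex of minimum remaining degree and record the largest minimum degree seen so far as its core number. The code below is a minimal illustration using degree buckets, not the sampling algorithm of the paper. The function name `core_numbers` is an assumption.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]  # buckets[d] = vertices of degree d
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    k = 0  # largest minimum degree encountered so far
    d = 0  # current position in the bucket array
    for _ in range(len(degree)):
        while not buckets[d]:
            d += 1
        k = max(k, d)
        v = buckets[d].pop()
        core[v] = k
        for w in adj[v]:
            if w not in core:  # w is still in the graph: lower its degree by one
                buckets[degree[w]].discard(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
        d = max(d - 1, 0)  # the minimum degree can drop by at most one
    return core


if __name__ == "__main__":
    # Triangle 1-2-3 with a pendant vertex 4 attached to 3.
    print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

The $k$-core is then the subgraph induced by the vertices whose core number is at least $k$. In the example, vertices 1, 2, and 3 form the 2-core and vertex 4 belongs only to the 1-core.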
SHGNN: Structure-Aware Heterogeneous Graph Neural Network

Many real-world graphs (networks) are heterogeneous with different types of nodes and edges. Heterogeneous graph embedding, aiming at learning the low-dimensional node representations of a heterogeneous graph, is vital for various downstream applications. Many meta-path based embedding methods have been proposed to learn the semantic information of heterogeneous graphs in recent years. However, most of the existing techniques overlook the graph structure information when learning the heterogeneous graph embeddings. This paper proposes a novel Structure-Aware Heterogeneous Graph Neural Network (SHGNN) to address the above limitations. In detail, we first utilize a feature propagation module to capture the local structure information of intermediate nodes in the meta-path. Next, we use a tree-attention aggregator to incorporate the graph structure information into the aggregation module on the meta-path. Finally, we leverage a meta-path aggregator to fuse the information aggregated from different meta-paths. We conducted experiments on node classification and clustering tasks and achieved state-of-the-art results on the benchmark datasets, which shows the effectiveness of our proposed method.
Neural Network Acceleration of Large-scale Structure Theory Calculations

We use neural networks to accelerate the calculation of power spectra required for the analysis of galaxy clustering and weak gravitational lensing data. For modern perturbation theory codes, evaluation time for a single cosmology and redshift can be on the order of two seconds. In combination with the comparable time required to compute linear predictions using a Boltzmann solver, these calculations are the bottleneck for many contemporary large-scale structure analyses. In this work, we construct neural network-based surrogate models for Lagrangian perturbation theory (LPT) predictions of matter power spectra, real and redshift space galaxy power spectra, and galaxy--matter cross power spectra that attain $\sim 0.1\%$ (at one sigma) accuracy over a broad range of scales in a $w$CDM parameter space. The neural network surrogates can be evaluated in approximately one millisecond, a factor of 1000 faster than the full Boltzmann code and LPT computations. In a simulated full-shape redshift space galaxy power spectrum analysis, we demonstrate that the posteriors obtained using our surrogates are accurate compared to those obtained using the full LPT model. We make our surrogate models public at this https URL, so that others may take advantage of the speed gains they provide to enable rapid iteration on analysis settings, something that is essential in complex contemporary large-scale structure analyses.
Mysterious "GPU-N" in research paper could be GH100 NVIDIA Hopper GPU with 100GB of HBM2 VRAM, 8576 CUDA Cores, and 779 TFLOPs of FP16 compute

Twitter user @Redfire75369 discovered a research paper on using low-precision GPU compute for deep learning with GPU-N. The mystery card appears to be a single-die GH100 NVIDIA Hopper GPU, offering a mammoth 779 TFLOPs of FP16 compute.
Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results and reveal the necessity of learning sparse structures for each document.
Acceleration techniques for optimization over trained neural network ensembles

We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit (ReLU) activation. Recent literature has explored the use of a single neural network to model either uncertain or complex elements within an objective function. However, it is well known that ensembles of neural networks produce more stable predictions and have better generalizability than models with single neural networks, which suggests the application of ensembles of neural networks in a decision-making pipeline. We study how to incorporate a neural network ensemble as the objective function of an optimization model and explore computational approaches for the ensuing problem. We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network. We develop two acceleration techniques for our model: the first is a preprocessing procedure to tighten bounds for critical neurons in the neural network, while the second is a set of valid inequalities based on Benders decomposition. Experimental evaluations of our solution methods are conducted on one global optimization problem and two real-world data sets; the results suggest that our optimization algorithm outperforms the adaptation of a state-of-the-art approach in terms of computational time and optimality gaps.
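For context, the big-$M$ encoding of a single ReLU neuron that such formulations build on can be written as follows. This is the commonly used textbook form, stated here as a sketch rather than as the exact model of the paper. The symbols $w$, $b$, $L$, $U$, and $z$ are notation introduced for illustration, with $L < 0 < U$ assumed to be valid bounds on the pre-activation $w^\top x + b$.

```latex
% Big-M encoding of y = max(0, w^T x + b), given bounds L <= w^T x + b <= U
\begin{aligned}
y &\ge w^\top x + b, \\
y &\ge 0, \\
y &\le w^\top x + b - L\,(1 - z), \\
y &\le U z, \\
z &\in \{0, 1\}.
\end{aligned}
```

With $z = 1$ the constraints force $y = w^\top x + b \ge 0$, and with $z = 0$ they force $y = 0$ and $w^\top x + b \le 0$, so $y$ equals the ReLU output in both cases. Tighter values of $L$ and $U$ give a stronger linear relaxation, which is why bound tightening appears among the acceleration techniques.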
