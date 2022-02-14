
Science

Random walk informed community detection reveals heterogeneities in large networks

By Solène Song, Malek Senoussi, Paul Escande, Paul Villoutreix
arxiv.org
 2 days ago

Random walks on networks are widely used to model stochastic processes such as search strategies, transportation problems or disease propagation. A prominent biological example of search by random walkers on a network is the guiding of naive T cells by the lymphatic conduits network in the lymph node. Motivated by this...

arxiv.org

Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine

Graph Neural Networks (GNNs) have emerged as a powerful model for ML over graph-structured data. Yet, scalability remains a major challenge for using GNNs over billion-edge inputs. The creation of mini-batches used for training incurs computational and data movement costs that grow exponentially with the number of GNN layers as state-of-the-art models aggregate information from the multi-hop neighborhood of each input node. In this paper, we focus on scalable training of GNNs with emphasis on resource efficiency. We show that out-of-core pipelined mini-batch training in a single machine outperforms resource-hungry multi-GPU solutions. We introduce Marius++, a system for training GNNs over billion-scale graphs. Marius++ provides disk-optimized training for GNNs and introduces a series of data organization and algorithmic contributions that 1) minimize the memory-footprint and end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in mixed CPU/GPU settings. We evaluate Marius++ against PyTorch Geometric and Deep Graph Library using seven benchmark (model, data set) settings and find that Marius++ with one GPU can achieve the same level of model accuracy up to 8$\times$ faster than these systems when they are using up to eight GPUs. For these experiments, disk-based training allows Marius++ deployments to be up to 64$\times$ cheaper in monetary cost than those of the competing systems.
CODING & PROGRAMMING
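
The exponential growth in mini-batch cost mentioned in the abstract above can be illustrated with a rough count. This is only a sketch under a stated assumption: every node samples a fixed number of neighbors (`fanout`) per layer, which is a hypothetical sampling scheme and not a description of how Marius++ builds mini-batches. Under that assumption, one seed node with `num_layers` layers touches at most 1 + f + f^2 + ... + f^L nodes.

```python
def sampled_nodes_upper_bound(fanout: int, num_layers: int) -> int:
    """Upper bound on nodes touched for ONE seed node.

    Assumes (hypothetically) that each node samples exactly `fanout`
    neighbors per layer and that no sampled neighbors overlap.
    """
    # Hop 0 is the seed itself; hop i contributes at most fanout**i nodes.
    return sum(fanout ** hop for hop in range(num_layers + 1))


if __name__ == "__main__":
    for layers in range(1, 5):
        # With fanout 10 the bound grows by roughly 10x per extra layer.
        print(layers, sampled_nodes_upper_bound(10, layers))
```

In real graphs the sampled neighborhoods overlap, so the actual count is lower, but the per-layer multiplicative growth is the effect the abstract refers to.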
arxiv.org

Reliable Community Search in Dynamic Networks

Local community search is an important research topic to support complex network data analysis in various scenarios like social networks, collaboration networks, and cellular networks. The evolution of networks over time has motivated several recent studies to identify local communities from dynamic networks. However, they only utilized the aggregation of disjoint structural information to measure the quality of communities, which ignores the reliability of communities in a continuous time interval. To fill this research gap, we propose a novel $(\theta,k)$-$core$ reliable community (CRC) model in the weighted dynamic networks, and define the problem of the most reliable community search that couples the desirable properties of connection strength, cohesive structure continuity, and the maximal member engagement. To solve this problem, we first develop an online CRC search algorithm by proposing a definition of eligible edge set and deriving the eligible edge set based pruning rules. After that, we devise a Weighted Core Forest-Index and index-based dynamic programming CRC search algorithm, which can prune a large number of insignificant intermediate results according to the maintained weight and structure information in the index, as well as the proposed upper bound properties. Finally, we conduct extensive experiments to verify the efficiency of our proposed algorithms and the effectiveness of our proposed community model on eight real datasets under different parameter settings.
TECHNOLOGY
arxiv.org

Bandwidth-Constrained Distributed Quickest Change Detection in Heterogeneous Sensor Networks: Anonymous vs Non-Anonymous Settings

The heterogeneous distributed quickest change detection (HetDQCD) problem with 1-bit feedback is studied, in which a fusion center monitors an abrupt change through a set of heterogeneous sensors via anonymous 1-bit feedback. Two fusion rules, one-shot and voting rules, are considered. We analyze the performance in terms of the worst-case expected detection delay and the average run length to false alarm for the two fusion rules. Our analysis unveils the mixed impact of involving more sensors in the decision and enables us to find near-optimal choices of parameters in the two schemes. Notably, it is shown that, in contrast to the homogeneous setting, the first alarm rule may no longer lead to the best performance among one-shot schemes. The non-anonymous setting is also investigated, where a simple scheme that only accepts alarms from the most informative sensors is shown to outperform all the above schemes and the mixture CUSUM scheme for the anonymous HetDQCD, hinting at the price of anonymity.
COMPUTERS
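
The two fusion rules named in the abstract above can be sketched as follows. The abstract does not define them, so this is one plausible reading and should be treated as an assumption: each sensor runs a local CUSUM and reports a 1-bit alarm when its statistic reaches a threshold; a one-shot rule stops once `r` sensors have each raised an alarm at some point (with `r = 1` being the first alarm rule), and a voting rule stops when `m` sensors are above threshold at the same time. The Gaussian mean-shift model, the thresholds, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence


def cusum_path(xs: Sequence[float], mu: float) -> List[float]:
    """Local CUSUM for an assumed shift N(0,1) -> N(mu,1)."""
    w = 0.0
    path = []
    for x in xs:
        # Log-likelihood ratio of one sample is mu*x - mu^2/2.
        w = max(0.0, w + mu * x - mu * mu / 2.0)
        path.append(w)
    return path


def one_shot_stop(paths: Sequence[Sequence[float]],
                  thresholds: Sequence[float], r: int) -> Optional[int]:
    """First time at which r sensors have each raised a (latched) alarm."""
    latched = [False] * len(paths)
    for t in range(len(paths[0])):
        for i, path in enumerate(paths):
            if path[t] >= thresholds[i]:
                latched[i] = True
        # Anonymous setting: only the number of alarms is used.
        if sum(latched) >= r:
            return t
    return None


def voting_stop(paths: Sequence[Sequence[float]],
                thresholds: Sequence[float], m: int) -> Optional[int]:
    """First time at which m sensors are above threshold simultaneously."""
    for t in range(len(paths[0])):
        votes = sum(p[t] >= thr for p, thr in zip(paths, thresholds))
        if votes >= m:
            return t
    return None


if __name__ == "__main__":
    # Two heterogeneous sensors with different post-change means.
    a = cusum_path([0, 0, 3, 0, 0, 0], mu=1.0)
    b = cusum_path([0, 0, 0, 0, 2, 2], mu=2.0)
    thr = [2.0, 3.0]
    print(one_shot_stop([a, b], thr, r=1))  # first alarm rule
    print(one_shot_stop([a, b], thr, r=2))
    print(voting_stop([a, b], thr, m=2))
```

On this toy input the first alarm rule stops at index 2, the one-shot rule with `r = 2` stops at index 5, and the voting rule with `m = 2` never stops, because the first sensor's statistic has decayed below its threshold by the time the second sensor crosses its own.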
arxiv.org

Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks

Seyyedali Hosseinalipour, Su Wang, Nicolo Michelusi, Vaneet Aggarwal, Christopher G. Brinton, David J. Love, Mung Chiang. Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. We finally solve this problem through proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times in-between the global aggregations, model/concept drift, and D2D cooperation configuration.
COMPUTERS
arxiv.org

GraphEye: A Novel Solution for Detecting Vulnerable Functions Based on Graph Attention Network

With the continuous extension of the Industrial Internet, cyber incidents caused by software vulnerabilities have been increasing in recent years. However, software vulnerability detection still relies heavily on code review done by experts, and how to automatically detect software vulnerabilities remains an open problem. In this paper, we propose a novel solution named GraphEye to identify whether a function of C/C++ code has vulnerabilities, which can greatly alleviate the burden of code auditors. GraphEye originates from the observation that the code property graph of a non-vulnerable function naturally differs from the code property graph of a vulnerable function with the same functionality. Hence, detecting vulnerable functions reduces to a graph classification problem. GraphEye is comprised of VecCPG and GcGAT. VecCPG is a vectorization for the code property graph, which is proposed to characterize the key syntax and semantic features of the corresponding source code. GcGAT is a deep learning model based on the graph attention network, which is proposed to solve the graph classification problem according to VecCPG. Finally, GraphEye is verified on the SARD Stack-based Buffer Overflow, Divide-Zero, Null Pointer Dereference, Buffer Error, and Resource Error datasets; the corresponding F1 scores are 95.6%, 95.6%, 96.1%, 92.6%, and 96.1% respectively, which validate the effectiveness of the proposed solution.
SOFTWARE
arxiv.org

Physics-informed neural networks for solving parametric magnetostatic problems

The optimal design of magnetic devices becomes intractable using current computational methods when the number of design parameters is high. The emerging physics-informed deep learning framework has the potential to alleviate this curse of dimensionality. The objective of this paper is to investigate the ability of physics-informed neural networks to learn the magnetic field response as a function of design parameters in the context of a two-dimensional (2-D) magnetostatic problem. Our approach is as follows. We derive the variational principle for 2-D parametric magnetostatic problems, and prove the existence and uniqueness of the solution that satisfies the equations of the governing physics, i.e., Maxwell's equations. We use a deep neural network (DNN) to represent the magnetic field as a function of space and a total of ten parameters that describe geometric features and operating point conditions. We train the DNN by minimizing the physics-informed loss function using a variant of stochastic gradient descent. Subsequently, we conduct systematic numerical studies using a parametric EI-core electromagnet problem. In these studies, we vary the DNN architecture trying more than one hundred different possibilities. For each study, we evaluate the accuracy of the DNN by comparing its predictions to those of finite element analysis. In an exhaustive non-parametric study, we observe that sufficiently parameterized dense networks result in relative errors of less than 1%. Residual connections always improve relative errors for the same number of training iterations. Also, we observe that Fourier encoding features aligned with the device geometry do improve the rate of convergence, albeit higher-order harmonics are not necessary. Finally, we demonstrate our approach on a ten-dimensional problem with parameterized geometry.
COMPUTERS
arxiv.org

Influence maximization under limited network information: Seeding high-degree neighbors

The diffusion of information, norms, and practices across a social network can be initiated by compelling a small number of seed individuals to adopt first. Strategies proposed in previous work assume either full network information or a large degree of control over what information is collected. However, privacy settings on the Internet and high non-response in surveys often severely limit available connectivity information. Here we propose a seeding strategy for scenarios with limited network information: only the degrees and connections of some random nodes are known. This new strategy is a modification of "random neighbor sampling" and seeds the highest-degree neighbors of randomly selected nodes. In simulations of a linear threshold model on a range of synthetic and real-world networks, we find that this new strategy outperforms other seeding strategies, including high-degree seeding and clustered seeding.
SCIENCE
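
A minimal sketch of the seeding idea described in the abstract above: pick random nodes, and for each one seed its highest-degree neighbor. The adjacency-dict representation, the function name, and the tie-breaking rule are assumptions made for illustration, as is the premise that the degree of each neighbor of a sampled node can be observed; the paper's exact procedure may differ.

```python
import random
from typing import Dict, Set


def seed_high_degree_neighbors(adj: Dict[int, Set[int]],
                               num_seeds: int,
                               rng: random.Random) -> Set[int]:
    """Seed the highest-degree neighbor of randomly selected nodes."""
    # Only nodes with at least one neighbor can nominate a seed.
    candidates = [v for v in sorted(adj) if adj[v]]
    rng.shuffle(candidates)  # visit nodes in random order
    seeds: Set[int] = set()
    for v in candidates:
        if len(seeds) >= num_seeds:
            break
        # Highest-degree neighbor of v; ties go to the smallest node id.
        best = max(sorted(adj[v]), key=lambda u: len(adj[u]))
        seeds.add(best)
    return seeds


if __name__ == "__main__":
    # Star graph: hub 0 connected to leaves 1..5.
    star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
    print(seed_high_degree_neighbors(star, 1, random.Random(0)))
```

On the star graph any sampled leaf nominates the hub, which shows why sampling neighbors tends to reach high-degree nodes even when the sampled nodes themselves are chosen uniformly at random.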
arxiv.org

Mean field limits of co-evolutionary heterogeneous networks

Many scientific phenomena are modelled as interacting particle systems (IPS) coupled on static networks. In reality, network connections are far more dynamic. Connections among individuals receive feedback from nearby individuals and make changes to better adapt to the world. Hence, it is reasonable to model myriad real-world phenomena as co-evolutionary (or adaptive) networks. In this paper, we propose a rigorous formulation for limits of a sequence of co-evolutionary Kuramoto oscillators coupled on heterogeneous co-evolutionary networks, which receive feedback from the dynamics of the oscillators on the networks. We show that, under mild conditions, the mean field limit (MFL) of the co-evolutionary network exists and the sequence of co-evolutionary Kuramoto networks converges to this MFL. This MFL is described by solutions of a generalized Vlasov type equation. We treat the graph limits as graph measures, motivated by the recent work in [Kuehn, Xu. Vlasov equations on digraph measures, arXiv:2107.08419, 2021]. Under a mild condition on the initial graph measure, we show that the graph measures are positive over a finite time interval. In comparison to the recently emerging works on MFLs of IPS coupled on non-co-evolutionary networks (i.e., static networks or time-dependent networks independent of the dynamics of the IPS), our work is the first to rigorously address the MFL of a co-evolutionary network model. The approach is based on our formulation of a generalization of the co-evolutionary network as a hybrid system of ODEs and measure differential equations parametrized by a vertex variable, together with an analogue of the variation of parameters formula, as well as the generalized Neunzert's in-cell-particle method developed in [Kuehn, Xu. Vlasov equations on digraph measures, arXiv:2107.08419, 2021].
SCIENCE
arxiv.org

NIMSA: Non-Interactive Multihoming Security Authentication Scheme for vehicular communications in Mobile Heterogeneous Networks

In vehicular communications, in-vehicle devices' mobile and multihoming characteristics bring new requirements for device security authentication. On the one hand, the existing network layer authentication methods rely on the PKI system; on the other hand, key negotiation needs interaction. These two points mean that the traditional security authentication method consumes bandwidth and adds delay. It is unsuitable for heterogeneous wireless scenarios with a high packet loss rate and limited bandwidth resources. In addition, the establishment of a security association state is contrary to the original design that the network layer only provides a forwarding function. We propose a non-interactive multihoming security authentication (NIMSA) scheme, a stateless network layer security authentication scheme triggered by data forwarding. Our scheme adopts an identity-based non-interactive key agreement strategy to avoid the interaction of signaling information, which is lightweight and has good support for mobile and multipath parallel transmission scenarios. The comparison with IKEv2 and its mobility and multihoming extension scheme (MOBIKE) shows that the proposed scheme has shorter authentication, handover, and data transmission delays and can bring a better bandwidth aggregation effect in the scenario of multipath parallel transmission.
CELL PHONES
arxiv.org

Spectrally Adapted Physics-Informed Neural Networks for Solving Unbounded Domain Problems

Solving analytically intractable partial differential equations (PDEs) that involve at least one variable defined in an unbounded domain requires efficient numerical methods that accurately resolve the dependence of the PDE on that variable over several orders of magnitude. Unbounded domain problems arise in various application areas and solving such problems is important for understanding multi-scale biological dynamics, resolving physical processes at long time scales and distances, and performing parameter inference in engineering problems. In this work, we combine two classes of numerical methods: (i) physics-informed neural networks (PINNs) and (ii) adaptive spectral methods. The numerical methods that we develop take advantage of the ability of physics-informed neural networks to easily implement high-order numerical schemes to efficiently solve PDEs. We then show how recently introduced adaptive techniques for spectral methods can be integrated into PINN-based PDE solvers to obtain numerical solutions of unbounded domain problems that cannot be efficiently approximated by standard PINNs. Through a number of examples, we demonstrate the advantages of the proposed spectrally adapted PINNs (s-PINNs) over standard PINNs in approximating functions, solving PDEs, and estimating model parameters from noisy observations in unbounded domains.
SCIENCE
arxiv.org

Hybridization of Capsule and LSTM Networks for unsupervised anomaly detection on multivariate data

Deep learning techniques have recently shown promise in the field of anomaly detection, providing a flexible and effective method of modelling systems in comparison to traditional statistical modelling and signal processing-based methods. However, there are a few well publicised issues that Neural Networks (NNs) face, such as generalisation ability, requiring large volumes of labelled data to be able to train effectively, and understanding spatial context in data. This paper introduces a novel NN architecture which hybridises the Long-Short-Term-Memory (LSTM) and Capsule Networks into a single network in a branched input Autoencoder architecture for use on multivariate time series data. The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data. Experimental results show that without hyperparameter optimisation, using Capsules significantly reduces overfitting and improves the training efficiency. Additionally, results also show that the branched input models can learn multivariate data more consistently with or without Capsules in comparison to the non-branched input models. The proposed model architecture was also tested on an open-source benchmark, where it achieved state-of-the-art performance in outlier detection, and overall performs best over the metrics tested in comparison to current state-of-the-art methods.
COMPUTERS
arxiv.org

Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks

Filippo Moro, E. Esmanhotto, T. Hirtzlin, N. Castellani, A. Trabelsi, T. Dalgaty, G. Molas, F. Andrieu, S. Brivio, S. Spiga, G. Indiveri, M. Payvand, E. Vianello. Spiking Neural Networks (SNNs) can unleash the full power of analog Resistive Random Access Memory (RRAM) based circuits for low power signal processing. Their inherent computational sparsity naturally results in energy efficiency benefits. The main challenge in implementing robust SNNs is the intrinsic variability (heterogeneity) of both analog CMOS circuits and RRAM technology. In this work, we assessed the performance and variability of RRAM-based neuromorphic circuits that were designed and fabricated using a 130 nm technology node. Based on these results, we propose a Neuromorphic Hardware Calibrated (NHC) SNN, where the learning circuits are calibrated on the measured data. We show that by taking into account the measured heterogeneity characteristics in the off-chip learning phase, the NHC SNN self-corrects its hardware non-idealities and learns to solve benchmark tasks with high accuracy. This work demonstrates how to cope with the heterogeneity of neurons and synapses for increasing classification accuracy in temporal tasks.
COMPUTERS
arxiv.org

Branching Random Walks with Two Types of Particles on Multidimensional Lattices

We consider a continuous-time branching random walk on a multidimensional lattice with two types of particles and an infinite number of initial particles. The main results are devoted to the study of the generating function and the limiting behavior of the moments of subpopulations generated by a single particle of each type. We assume that particle types differ from each other not only by the laws of branching, as in multi-type branching processes, but also by the laws of walking. For a critical branching process at each lattice point and recurrent random walk of particles, the effect of limit spatial clustering of particles over the lattice is studied. A model illustrating epidemic propagation is also considered. In this model, we consider two types of particles: infected and immunity generated. Initially, there is an infected particle that can infect others. Here, for the local number of particles of each type at a lattice point, we study the moments and their limiting behavior. Also, the effect of intermittency of the infected particles is studied for a supercritical branching process at each lattice point. Simulations are presented to demonstrate the effect of limit clustering for the epidemiological model.
SCIENCE
arxiv.org

Improving Fraud detection via Hierarchical Attention-based Graph Neural Network

Graph neural networks (GNNs) have emerged as a powerful tool for fraud detection tasks, where fraudulent nodes are identified by aggregating neighbor information via different relations. To get around such detection, crafty fraudsters resort to camouflage via connecting to legitimate users (i.e., relation camouflage) or providing seemingly legitimate feedback (i.e., feature camouflage). A widespread solution reinforces the GNN aggregation process with neighbor selectors according to original node features. This method may be limited when identifying fraudsters who use not only relation camouflage but also feature camouflage, which makes them hard to distinguish from their legitimate neighbors. In this paper, we propose a Hierarchical Attention-based Graph Neural Network (HA-GNN) for fraud detection, which incorporates weighted adjacency matrices across different relations against camouflage. This is motivated by the Relational Density Theory and is exploited for forming a hierarchical attention-based graph neural network. Specifically, we design a relation attention module to reflect the tie strength between two nodes, and a neighborhood attention module to capture the long-range structural affinity associated with the graph. We generate node embeddings by aggregating information from local/long-range structures and original node features. Experiments on three real-world datasets demonstrate the effectiveness of our model over the state of the art.
arxiv.org

Coded ResNeXt: a network for designing disentangled information paths

To avoid treating neural networks as highly complex black boxes, the deep learning research community has tried to build interpretable models allowing humans to understand the decisions taken by the model. Unfortunately, the focus is mostly on manipulating only the very high-level features associated with the last layers. In this work, we look at neural network architectures for classification in a more general way and introduce an algorithm which defines before the training the paths of the network through which the per-class information flows. We show that using our algorithm we can extract a lighter single-purpose binary classifier for a particular class by removing the parameters that do not participate in the predefined information path of that class, which is approximately 60% of the total parameters. Notably, leveraging coding theory to design the information paths enables us to use intermediate network layers for making early predictions without having to evaluate the full network. We demonstrate that a slightly modified ResNeXt model, trained with our algorithm, can achieve higher classification accuracy on CIFAR-10/100 and ImageNet than the original ResNeXt, while having all the aforementioned properties.
CODING & PROGRAMMING
helpnetsecurity.com

Cynamics launches cloud NDR to strengthen network monitoring and detection capabilities

Cynamics launched a cloud NDR offering, removing network visibility barriers by strengthening network monitoring, detection and prediction capabilities across any cloud-native or hybrid-cloud environments into a single-pane view, accommodating shifting requirements of modern organizations. Cynamics’ next-gen Cloud NDR collects small network samples from the cloud networks, infers from these on...
SOFTWARE
arxiv.org

Active Random Walks in One and Two Dimensions

We investigate active lattice walks: biased continuous time Random Walks which perform orientational diffusion between lattice directions in one and two spatial dimensions. We study the occupation probability of an arbitrary site on the lattice in one and two dimensions, and derive exact results in the continuum limit. Next, we compute the large deviation free energy function in both one and two dimensions, which we use to compute the moments and the cumulants of the displacements exactly at late times. Our exact results demonstrate that the cross-correlations between the motion in the $x$ and $y$ directions in two dimensions persist in the large deviation function. We also demonstrate that the large deviation function of an active particle with diffusion displays two regimes, with differing diffusive behaviors. We verify our analytic results with kinetic Monte Carlo simulations of an active lattice walker in one and two dimensions.
LIFESTYLE
arxiv.org

Towards machine learning for microscopic mechanisms: a formula search for crystal structure stability based on atomic properties

Machine Learning (ML) techniques are revolutionizing the way to perform efficient materials modeling. Nevertheless, not all ML approaches allow for the understanding of microscopic mechanisms at play in different phenomena. To address the latter aspect, we propose a combinatorial machine-learning approach to obtain physical formulas based on simple and easily accessible ingredients, such as atomic properties. The latter are used to build materials features that are finally employed, through Linear Regression, to predict the energetic stability of semiconducting binary compounds with respect to zincblende and rocksalt crystal structures. The adopted models are trained using a dataset built from first-principles calculations. Our results show that even one-dimensional (1D) formulas describe the energetics well; a simple grid-search optimization of the automatically obtained 1D formulas enhances the prediction performance at a very small computational cost. In addition, our approach allows us to highlight the role of the different atomic properties involved in the formulas. The computed formulas clearly indicate that "spatial" atomic properties (i.e. radii indicating maximum probability densities for $s,p,d$ electronic shells) drive the stabilization of one crystal structure with respect to the other, suggesting the major relevance of the radius associated with the $p$-shell of the cation species.
COMPUTERS
arxiv.org

Identifying strongly correlated groups of sections in a large motorway network

In a motorway network, correlations between the different links, i.e. between the parts of (different) motorways, are of considerable interest. Knowledge of fluxes and velocities on individual motorways is not sufficient; rather, their correlations determine or reflect, respectively, the functionality of and the dynamics on the network as a whole. These correlations are time dependent because the dynamics on the network is highly non-stationary: it varies strongly during the day and over the week. Correlations are indispensable to detect risks of failure in a traffic network. Discovery of alternative routes less correlated with the vulnerable ones helps to make the traffic network robust and to avoid a collapse. Hence, the identification of groups of strongly correlated road sections is especially needed. To this end, we employ an optimized $k$-means clustering method. A major ingredient is the spectral information of certain correlation matrices in which the leading collective motion of the network has been removed. We identify strongly correlated groups of sections in the large motorway network of North Rhine-Westphalia (NRW), Germany. The groups classify the motorway sections in terms of spectral and geographic features as well as of traffic phases during different time periods. The representation and visualization of the groups on the real topology, i.e. on the road map, provides new results on the dynamics on the motorway network. Our approach is very general and can also be applied to other correlated complex systems.
TRAFFIC
arxiv.org

Fast and accurate dose predictions for novel radiotherapy treatments in heterogeneous phantoms using conditional 3D-UNet generative adversarial networks

Florian Mentzel, Kevin Kröninger, Michael Lerch, Olaf Nackenhorst, Jason Paino, Anatoly Rosenfeld, Ayu Saraswati, Ah Chung Tsoi, Jens Weingarten, Markus Hagenbuchner, Susanna Guatelli. Novel radiotherapy techniques like synchrotron X-ray microbeam radiation therapy (MRT), require fast dose distribution predictions that are accurate at the sub-mm level, especially close to tissue/bone/air...
SCIENCE

