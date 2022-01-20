
High Performance Parallel I/O and In-Situ Analysis in the WRF Model with ADIOS2

By Michael Laufer, Erick Fredj
arxiv.org

As the computing power of large-scale HPC clusters approaches the Exascale, the gap between compute capabilities and storage systems is ever widening. In particular, the popular High Performance Computing (HPC) application, the Weather Research and Forecasting Model (WRF), is currently being utilized for high resolution forecasting and research...

hackaday.com

Exploring Tesla Model S High Voltage Cabling

When he’s not busy with his day job as a professor of computer and automotive engineering at Weber State University, [John Kelly] is a prolific producer of educational videos. We found his video tracing out the 22+ meters of high voltage cabling in a Tesla Model S quite interesting. [John] does warn that his videos are highly detailed and may not be for everyone.
CARS
Carscoops

This Original Tesla Model S Performance Model Is Nearing 1,000,000 Miles

A Tesla owner in Germany has just recorded more than 1,500,000 kilometers (roughly 932,000 miles) in his Model S P85. For those unfamiliar with the P85, it’s the original performance version of the Model S. Long before the Model S Plaid became the fastest accelerating production sedan on the planet,...
CARS
arxiv.org

Performance Evaluation of Stochastic Bipartite Matching Models

We consider a stochastic bipartite matching model consisting of multi-class customers and multi-class servers. Compatibility constraints between the customer and server classes are described by a bipartite graph. Each time slot, exactly one customer and one server arrive. The incoming customer (resp. server) is matched with the earliest arrived server (resp. customer) with a class that is compatible with its own class, if there is any, in which case the matched customer-server couple immediately leaves the system; otherwise, the incoming customer (resp. server) waits in the system until it is matched. Contrary to classical queueing models, both customers and servers may have to wait, so that their roles are interchangeable. While (the process underlying) this model was already known to have a product-form stationary distribution, this paper derives a new compact and manageable expression for the normalization constant of this distribution, as well as for the waiting probability and mean waiting time of customers and servers. We also provide a numerical example and make some important observations.
CODING & PROGRAMMING
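
The FCFS matching rule described in this abstract lends itself to a quick Monte Carlo check. The sketch below simulates it slot by slot. The class names, compatibility graph, and arrival probabilities are hypothetical; the within-slot order (the customer is handled before the server) is an assumed convention; and the sketch estimates waiting statistics empirically rather than through the paper's closed-form normalization constant.

```python
import random
from collections import deque


def _match_oldest(queue, is_compatible):
    """Remove and return the oldest compatible item in `queue` (FCFS), or None."""
    for idx, item in enumerate(queue):
        if is_compatible(item):
            del queue[idx]  # safe: iteration stops right after the mutation
            return item
    return None


def simulate_fcfs_matching(compat, p_cust, p_serv, steps, seed=0):
    """Monte Carlo sketch of an FCFS stochastic bipartite matching model.

    compat: customer class -> set of server classes it can be matched with.
    p_cust, p_serv: class -> arrival probability in each time slot.
    Returns the time-averaged numbers of waiting customers and waiting servers.
    """
    rng = random.Random(seed)
    c_classes, c_probs = zip(*p_cust.items())
    s_classes, s_probs = zip(*p_serv.items())
    waiting_c, waiting_s = deque(), deque()  # oldest arrival on the left
    total_c = total_s = 0
    for _ in range(steps):
        c = rng.choices(c_classes, weights=c_probs)[0]  # exactly one customer
        s = rng.choices(s_classes, weights=s_probs)[0]  # and one server per slot
        # Assumed convention: the customer is handled first, so the server that
        # arrives in the same slot can still take it if no older customer fits.
        if _match_oldest(waiting_s, lambda sc: sc in compat[c]) is None:
            waiting_c.append(c)
        if _match_oldest(waiting_c, lambda cc: s in compat[cc]) is None:
            waiting_s.append(s)
        total_c += len(waiting_c)
        total_s += len(waiting_s)
    return total_c / steps, total_s / steps


if __name__ == "__main__":
    compat = {"a": {"x"}, "b": {"x", "y"}}  # hypothetical compatibility graph
    print(simulate_fcfs_matching(compat, {"a": 0.3, "b": 0.7},
                                 {"x": 0.6, "y": 0.4}, steps=100_000))
```

Because every match removes one customer and one server while every slot adds one of each, the two queues always have equal length, so the two averages coincide; this mirrors the abstract's point that customers and servers play interchangeable roles. With full compatibility nobody is left waiting at the end of a slot.
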
vmware.com

Paravirtual RDMA for High Performance Computing

A new paper shows that paravirtual RDMA (PVRDMA) is a viable option for using remote direct memory access (RDMA) in vSphere for virtualized high performance computing (HPC), instead of the usual PCI passthrough (also known as vSphere DirectPath I/O), which doesn’t let you use typical vSphere features such as high availability, DRS, and vMotion.
SOFTWARE
arxiv.org

Calipers: A Criticality-aware Framework for Modeling Processor Performance

The computer architecture design space is vast and complex. Tools are needed to explore new ideas and gain insights quickly, with low effort and at a desired accuracy. We propose Calipers, a criticality-based framework to model key abstractions of complex architectures and a program's execution using dynamic event-dependence graphs. By applying graph algorithms, Calipers can track instruction and event dependencies, compute critical paths, and analyze architecture bottlenecks. By manipulating the graph, Calipers enables architects to investigate a wide range of Instruction Set Architecture (ISA) and microarchitecture design choices and "what-if" scenarios during both early- and late-stage design space exploration without recompiling and rerunning the program. Calipers can model in-order and out-of-order microarchitectures, structural hazards, and different types of ISAs, and can evaluate multiple ideas in a single run. Modeling algorithms are described in detail.
CODING & PROGRAMMING
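
Finding the critical path of a dependence graph, one of the operations Calipers is described as performing, can be illustrated with a longest-path pass over a DAG in topological order. This is a minimal sketch, not Calipers itself: the event names and cycle latencies are hypothetical, and only data dependencies are modeled (no structural hazards). A "what-if" scenario then amounts to editing a latency and recomputing, without rerunning any program.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(latency, deps):
    """Longest latency-weighted path through a dependence DAG.

    latency: event -> latency in cycles (hypothetical values).
    deps: event -> iterable of events it depends on.
    Returns (total_cycles, critical path as a list of events).
    """
    finish = {}  # earliest completion time of each event
    parent = {}  # predecessor that determines each event's start time
    graph = {e: list(deps.get(e, ())) for e in latency}
    for e in TopologicalSorter(graph).static_order():
        start, parent[e] = 0, None
        for p in graph[e]:
            if finish[p] > start:  # the latest-finishing predecessor gates this event
                start, parent[e] = finish[p], p
        finish[e] = start + latency[e]
    # Walk back from the event that completes last.
    last = max(finish, key=finish.get)
    path, node = [], last
    while node is not None:
        path.append(node)
        node = parent[node]
    return finish[last], path[::-1]


if __name__ == "__main__":
    lat = {"load_a": 4, "load_b": 4, "mul": 3, "add": 1, "store": 1}
    deps = {"mul": ["load_a", "load_b"], "add": ["load_b"], "store": ["mul", "add"]}
    print(critical_path(lat, deps))                   # (8, ['load_a', 'mul', 'store'])
    print(critical_path({**lat, "mul": 1}, deps)[0])  # what-if: faster multiplier -> 6
```
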
arxiv.org

Analysis of a five-factor capital market model

In this paper we analyse the five-factor capital market model of Munk et al. (2004). The model features a Vasicek interest rate model, an equity index with mean-reverting excess return and an index for realized inflation with mean-reverting expectation. The primary aim of the analysis is to facilitate so-called exact simulation from the model on a set of discrete time points. It turns out that this can be achieved by sampling from a (degenerate) seven-dimensional normal distribution. We derive the distributional results necessary and describe how to overcome the rank deficiency of the variance-covariance matrix in practice.
BUSINESS
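
Rank deficiency can be handled in several ways; one standard route, shown here as a sketch, is to sample through an eigendecomposition instead of a Cholesky factorization, which generally breaks down when the covariance matrix is singular. The 3×3 covariance below, in which the third component equals the sum of the first two, is a hypothetical stand-in for the paper's seven-dimensional distribution, the eigenvalue tolerance is an assumed choice, and the paper's own construction may differ.

```python
import numpy as np


def sample_degenerate_normal(mean, cov, n, rng=None, tol=1e-12):
    """Draw n samples from N(mean, cov) when cov is only positive semi-definite.

    Uses cov = Q diag(w) Q^T and keeps the eigenvalues above tol * max(w), so
    the samples live in the r-dimensional subspace where the mass actually is.
    """
    rng = np.random.default_rng() if rng is None else rng
    cov = np.asarray(cov, dtype=float)
    w, q = np.linalg.eigh(cov)                 # symmetric eigendecomposition
    keep = w > tol * w.max()                   # drop numerically zero directions
    factor = q[:, keep] * np.sqrt(w[keep])     # d x r matrix A with A @ A.T ~= cov
    z = rng.standard_normal((n, int(keep.sum())))
    return np.asarray(mean, dtype=float) + z @ factor.T


if __name__ == "__main__":
    cov = [[1.0, 0.0, 1.0], [0.0, 2.0, 2.0], [1.0, 2.0, 3.0]]  # rank 2
    x = sample_degenerate_normal([0.0, 1.0, 1.0], cov, 5, rng=np.random.default_rng(0))
    print(x[:, 2] - (x[:, 0] + x[:, 1]))  # ~0: the linear constraint holds per draw
```

Each draw satisfies x3 = x1 + x2 up to rounding, which is exactly the linear relation that makes the covariance matrix singular.
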
arxiv.org

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens. We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to determine quantitatively their contribution to the device idle time. Combining these two parts, we propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) in all kernel performance modeling, and 5.23% and 7.96% geomean errors for GPU active time and overall end-to-end per-batch training time prediction, respectively. We show that our general performance model not only achieves low prediction error on DLRM, which has highly customized configurations and is dominated by multiple factors, but also yields comparable accuracy on other compute-bound ML models targeted by most previous methods. Using this performance model and graph-level data and task dependency analyses, we show our system can provide more general model-system co-design than previous methods.
CODING & PROGRAMMING
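
The interplay between device active time and host-side overheads can be illustrated with a toy timeline. The sketch below assumes a single host thread launching kernels in order onto a single GPU stream, with hypothetical per-operator overheads and kernel runtimes in microseconds. It is far simpler than the paper's critical-path algorithm over the full execution graph, but it shows why overheads surface as device idle time when kernels are short.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    cpu_overhead_us: float  # host-side time spent before this kernel is launched
    kernel_us: float        # GPU kernel runtime (e.g., from a kernel performance model)


def predict_batch_time(ops):
    """Simulate in-order launches from one host thread onto one GPU stream.

    Returns (total_us, active_us, idle_us) for the device.
    """
    cpu_t = 0.0     # time at which the host has issued the current kernel
    gpu_free = 0.0  # time at which the GPU finishes its previous kernel
    active = 0.0
    for op in ops:
        cpu_t += op.cpu_overhead_us         # host prepares and launches the kernel
        start = max(cpu_t, gpu_free)        # kernel needs both the launch and a free GPU
        gpu_free = start + op.kernel_us
        active += op.kernel_us              # device active time: sum of kernel runtimes
    return gpu_free, active, gpu_free - active


if __name__ == "__main__":
    ops = [Op("embedding", 10, 50), Op("relu", 20, 5), Op("concat", 20, 5), Op("mlp", 10, 40)]
    print(predict_batch_time(ops))                   # (110.0, 100.0, 10.0)
    print(predict_batch_time([Op("k", 20, 5)] * 3))  # launch-bound: (65.0, 15.0, 50.0)
```

In the second call every kernel is shorter than the overhead preceding it, so the GPU spends most of the batch waiting on the host.
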
arxiv.org

Near-Field Modelling and Performance Analysis of Modular Extremely Large-Scale Array Communications

This letter studies a new array architecture, termed modular extremely large-scale array (XL-array), in which a large number of array elements are arranged in a modular manner. Each module consists of a moderate number of array elements, and the modules are regularly arranged with an inter-module spacing typically much larger than the signal wavelength to accommodate the actual mounting structure. We study the mathematical modelling and conduct the performance analysis for modular XL-array communications, by considering the non-uniform spherical wave (NUSW) characteristic that is more suitable than the conventional uniform plane wave (UPW) assumption for physically large arrays. A closed-form expression is derived for the maximum signal-to-noise ratio (SNR) in terms of the geometries of the modular XL-array, including the total array size and module separation, as well as the user's location. The asymptotic SNR scaling law is revealed as the size of the modular array goes to infinity. Furthermore, we show that the developed modelling and performance analysis include the existing results for collocated XL-arrays or the far-field UPW assumption as special cases. Numerical results demonstrate the importance of near-field modelling for modular XL-array communications, since it leads to significantly different results from conventional far-field UPW modelling.
COMPUTERS
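
A small numerical comparison of the NUSW and UPW models makes the near-field effect concrete. This sketch assumes free-space propagation to isotropic elements of a modular uniform linear array, a single-antenna user, and maximum-ratio combining, so the SNR is the transmit SNR times the sum of per-element power gains (λ/(4πr_n))². It ignores projected-aperture and element-pattern effects, is not the paper's closed-form expression, and uses hypothetical dimensions.

```python
import numpy as np


def modular_array_positions(n_modules, elems_per_module, d, gap):
    """y-coordinates of a modular linear array centred at the origin.

    d: element spacing inside a module; gap: distance from the last element
    of one module to the first element of the next.
    """
    pitch = (elems_per_module - 1) * d + gap  # first element to first element
    ys = np.array([m * pitch + k * d
                   for m in range(n_modules) for k in range(elems_per_module)],
                  dtype=float)
    return ys - ys.mean()


def mrc_snr(ys, user_xy, wavelength, p_over_noise):
    """MRC SNR under a spherical-wave (NUSW) and a plane-wave (UPW) model."""
    x, y = user_xy
    r_n = np.hypot(x, y - ys)                         # user-to-element distances
    g_nusw = (wavelength / (4 * np.pi * r_n)) ** 2    # per-element power gain
    r0 = np.hypot(x, y)                               # distance to the array centre
    g_upw = (wavelength / (4 * np.pi * r0)) ** 2      # UPW: identical gain everywhere
    return p_over_noise * g_nusw.sum(), p_over_noise * g_upw * len(ys)


if __name__ == "__main__":
    ys = modular_array_positions(10, 64, 0.005, 1.0)  # lambda = 1 cm, half-wave spacing
    for user in [(5.0, 0.0), (5000.0, 0.0)]:
        nusw, upw = mrc_snr(ys, user, 0.01, 1e12)
        print(user, nusw / upw)  # well below 1 near the array, ~1 far away
```

Under the UPW model the SNR keeps growing linearly with the number of elements, whereas in the spherical-wave sum distant elements contribute progressively less, which is why the two models diverge for physically large arrays and nearby users.
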
arxiv.org

Automated machine learning for secure key rate in discrete-modulated continuous-variable quantum key distribution

Continuous-variable quantum key distribution (CV QKD) with discrete modulation has attracted increasing attention due to its experimental simplicity, lower-cost implementation and compatibility with classical optical communication. Correspondingly, several novel numerical methods have been proposed to analyze the security of these protocols against collective attacks, yielding key rates over fiber distances of more than one hundred kilometers. However, these numerical methods are limited by their computation time and resource consumption, which prevents them from playing a larger role on mobile platforms in quantum networks. To address this issue, a neural network model that predicts key rates in nearly real time was proposed previously. Here, we go further and present a neural network model combined with Bayesian optimization, which automatically designs the best neural network architecture for computing key rates in real time. We demonstrate our model with two variants of CV QKD protocols with quaternary modulation. The results show high reliability, with secure probability as high as 99.15% to 99.59%, considerable tightness, and high efficiency, with a speedup of approximately 10⁷ in both cases. This model enables the key rates of unstructured quantum key distribution protocols to be computed more automatically and efficiently, meeting the growing need to implement QKD protocols on moving platforms.
SOFTWARE
towardsdatascience.com

Increasing model velocity for complex models by leveraging hybrid pipelines, parallelization and GPU acceleration

Data science is facing an overwhelming demand for CPU cycles as scientists try to work with datasets that are growing in complexity faster than Moore’s Law can keep up with. Considering the need to iterate and retrain quickly, model complexity has been outpacing available compute resources and CPUs for several years, and the problem is growing quickly. The data science industry will need to embrace parallelization and GPU processing to efficiently utilize increasingly complex datasets.
COMPUTERS
arxiv.org

Backdoor Defense with Machine Unlearning

Backdoor injection attacks are an emerging threat to the security of neural networks; however, effective defenses against them remain limited. In this paper, we propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. Specifically, BAERASE implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns that infected the victim model. Here, the trigger pattern recovery problem is equivalent to extracting an unknown noise distribution from the victim model, which can be easily solved by an entropy-maximization-based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient-ascent-based machine unlearning method. Compared with previous machine unlearning solutions, the proposed approach removes the reliance on full access to the training data for retraining and is more effective at erasing backdoors than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASE lowers the attack success rates of three kinds of state-of-the-art backdoor attacks by 99% on average across four benchmark datasets.
SOFTWARE
arxiv.org

A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT

IoT devices generating enormous amounts of data, together with state-of-the-art machine learning techniques, will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions that lack simple closed forms, such as obstacle detection and object recognition. Traditional cloud-based methods that transfer data to a central location for either training or inference place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Owing to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any function for the desired functional compression task in IoT. Importantly, the raw sensor data are never transferred to a central node for training or inference, which reduces communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches and are robust to changes in channel conditions and to sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.
SOFTWARE
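
The role of the Kolmogorov-Arnold theorem can be seen in a hand-built toy case. The theorem writes a continuous multivariate function as outer univariate functions applied to sums of inner univariate functions, and a GMAC delivers exactly such a sum when devices transmit at the same time. For the product of strictly positive readings a single inner sum suffices, with log as the per-device encoder and exp as the decoder. In the paper the encoders and decoders are learned; this sketch hand-picks them, ignores transmit power constraints, and uses a hypothetical noise level.

```python
import numpy as np


def gmac_compute_product(readings, noise_std, rng):
    """Compute prod(readings) over a simulated Gaussian multiple access channel.

    Each device i transmits phi(x_i) = log(x_i) simultaneously; the channel
    delivers sum_i phi(x_i) plus Gaussian noise; the receiver applies
    Phi(s) = exp(s). The receiver sees only the noisy sum, never the
    individual readings, and one channel use serves all devices.
    """
    x = np.asarray(readings, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the log/exp encoding assumes strictly positive readings")
    transmitted = np.log(x)                                     # per-device encoder phi
    received = transmitted.sum() + rng.normal(0.0, noise_std)   # over-the-air sum + noise
    return float(np.exp(received))                              # decoder Phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(gmac_compute_product([2.0, 3.0, 4.0], 0.0, rng))   # ~24.0 (noiseless)
    print(gmac_compute_product([2.0, 3.0, 4.0], 0.01, rng))  # close to 24 with noise
```
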
arxiv.org

PaRT: Parallel Learning Towards Robust and Transparent AI

Mahsa Paknezhad, Hamsawardhini Rengarajan, Chenghao Yuan, Sujanya Suresh, Manas Gupta, Savitha Ramasamy, Lee Hwee Kuan. This paper takes a parallel learning approach for robust and transparent AI. A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources. Each subset consists of network segments that can be combined and shared across specific tasks. Tasks can share resources with other tasks, while having independent task-related network resources. Therefore, the trained network can share similar representations across various tasks, while also enabling independent task-related representations. The above allows for some crucial outcomes. (1) The parallel nature of our approach negates the issue of catastrophic forgetting. (2) The sharing of segments uses network resources more efficiently. (3) We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations. (4) Through examination of individual task-related and shared representations, the model offers transparency in the network and in the relationships across tasks in a multi-task setting. Evaluation of the proposed approach against complex competing approaches such as Continual Learning, Neural Architecture Search, and Multi-task learning shows that it is capable of learning robust representations. This is the first effort to train a DL model on multiple tasks in parallel. Our code is available at this https URL.
COMPUTERS
arxiv.org

Bias in Automated Speaker Recognition

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition technologies are deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including model building, implementation, and data generation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.
SOFTWARE
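
Evaluating a verifier separately per demographic group is one way to surface the kind of performance degradation described above. A sketch, assuming the equal error rate (EER) as the metric (a standard speaker-verification metric; the paper's full set of metrics is not reproduced here) and hypothetical scores, labels, and group names; the EER is approximated by scanning the observed scores as thresholds.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER: the threshold where false accept and false reject rates meet.

    scores: similarity scores (higher means 'same speaker').
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[labels == 0])   # impostor trials wrongly accepted
        frr = np.mean(~accept[labels == 1])  # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def eer_by_group(scores, labels, groups):
    """EER computed separately for each group label."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    return {g: equal_error_rate(scores[groups == g], labels[groups == g])
            for g in np.unique(groups)}


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.1, 0.9, 0.4, 0.6, 0.1]
    labels = [1, 1, 0, 0, 1, 1, 0, 0]
    groups = ["A"] * 4 + ["B"] * 4  # hypothetical demographic groups
    print(eer_by_group(scores, labels, groups))  # group B has the higher EER
```

A gap between per-group EERs, rather than a single pooled figure, is what reveals that one group bears more of the errors.
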
arxiv.org

Communication-Efficient Stochastic Zeroth-Order Optimization for Federated Learning

Federated learning (FL), as an emerging edge artificial intelligence paradigm, enables many edge devices to collaboratively train a global model without sharing their private data. To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to second-order methods. However, these algorithms cannot be applied in scenarios where gradient information is not available, e.g., federated black-box attacks and federated hyperparameter tuning. To address this issue, in this paper we propose a derivative-free federated zeroth-order optimization (FedZO) algorithm that performs multiple local updates based on stochastic gradient estimators in each communication round and supports partial device participation. Under the non-convex setting, we derive the convergence performance of the FedZO algorithm and characterize the impact of the number of local iterations and the number of participating edge devices on convergence. To enable communication-efficient FedZO over wireless networks, we further propose an over-the-air computation (AirComp) assisted FedZO algorithm. With an appropriate transceiver design, we show that the convergence of AirComp-assisted FedZO can still be preserved under certain signal-to-noise ratio conditions. Simulation results demonstrate the effectiveness of the FedZO algorithm and validate the theoretical observations.
COMPUTERS
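
The main ingredients of a federated zeroth-order method can be sketched in a few lines: a two-point gradient estimator built only from loss evaluations, several local steps per round, and random partial participation followed by server-side averaging. This sketch assumes Gaussian random directions and a plain FedAvg-style average; the paper's exact estimator, step sizes, and AirComp variant are not reproduced, and the quadratic device losses in the example are hypothetical.

```python
import numpy as np


def zo_grad(f, x, mu, num_dirs, rng):
    """Two-point zeroth-order gradient estimate from loss values only."""
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)      # random Gaussian search direction
        g += (f(x + mu * u) - fx) / mu * u    # finite difference along u
    return g / num_dirs


def fedzo(local_losses, x0, rounds, local_steps, lr, participate,
          mu=1e-3, num_dirs=10, seed=0):
    """FedAvg-style loop in which devices treat their losses as black boxes.

    local_losses: list of callables f_i(x) -> float, one per device.
    participate: number of devices sampled uniformly at random each round.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        chosen = rng.choice(len(local_losses), size=participate, replace=False)
        local_models = []
        for i in chosen:
            xi = x.copy()
            for _ in range(local_steps):      # multiple local ZO-SGD updates
                xi -= lr * zo_grad(local_losses[i], xi, mu, num_dirs, rng)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)     # server averages the local models
    return x


if __name__ == "__main__":
    centers = [np.array(c, dtype=float) for c in ([1, 0], [0, 1], [-1, 0], [0, -1], [2, 2])]
    losses = [lambda x, c=c: 0.5 * float(np.sum((x - c) ** 2)) for c in centers]
    x = fedzo(losses, [5.0, -5.0], rounds=100, local_steps=5, lr=0.05, participate=5)
    print(x, np.mean(centers, axis=0))  # x should land near the average of the centers
```

With all devices participating, the global optimum of the summed quadratic losses is the mean of the device centers, so the final model should hover near it up to the noise of the zeroth-order estimates.
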
arxiv.org

Propagating uncertainty in a network of energy models

Victoria Volodina (UCL), Nikki Sonenberg (Heilbronn Institute of Mathematical Research, University of Bristol), Jim Q. Smith (University of Warwick), Peter G. Challenor (University of Exeter), Chris J. Dent (University of Edinburgh), Henry P. Wynn (London School of Economics) Computational models are widely used in decision support for energy system operation,...
ENERGY INDUSTRY
arxiv.org

Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models

Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and fail to model the interaction in multi-agent systems. In this work, we take a first step towards building multiple interacting generative models (GANs) that reflect the interaction in the real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique for using feedback from the higher-level GAN to improve the performance of lower-level GANs. We mathematically characterize the conditions under which our technique is impactful, including understanding the transfer learning nature of our set-up. We present three distinct experiments on synthetic data, time series data, and image domain, revealing the wide applicability of our technique.
CODING & PROGRAMMING
arxiv.org

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

Multi-agent path planning (MAPP) in continuous spaces is a challenging problem with significant practical importance. One promising approach is to first construct graphs approximating the spaces, called roadmaps, and then apply multi-agent pathfinding (MAPF) algorithms to derive a set of conflict-free paths. While conventional studies have utilized roadmap construction methods developed for single-agent planning, it remains largely unexplored how we can construct roadmaps that work effectively for multiple agents. To this end, we propose a novel concept of roadmaps called cooperative timed roadmaps (CTRMs). CTRMs enable each agent to focus on its important locations around potential solution paths in a way that considers the behavior of other agents to avoid inter-agent collisions (i.e., "cooperative"), while being augmented in the time direction to make it easy to derive a "timed" solution path. To construct CTRMs, we developed a machine-learning approach that learns a generative model from a collection of relevant problem instances and plausible solutions and then uses the learned model to sample the vertices of CTRMs for new, previously unseen problem instances. Our empirical evaluation revealed that the use of CTRMs significantly reduced the planning effort with acceptable overheads while maintaining a success rate and solution quality comparable to conventional roadmap construction approaches.
COMPUTERS
arxiv.org

Exponential ergodicity for a stochastic two-layer quasi-geostrophic model

Ergodic properties of a stochastic medium-complexity model for atmosphere and ocean dynamics are analysed. More specifically, a two-layer quasi-geostrophic model for geophysical flows is studied, with the upper layer perturbed by additive noise. This model is popular in the geosciences, for instance to study the effects of stochastic wind forcing on the ocean. A rigorous mathematical analysis, however, faces the challenge that in the model under study the noise is spatially degenerate, as the stochastic forcing acts only on the top layer. Exponential convergence of the laws of solutions to the invariant measure is established, implying a spectral gap of the associated Markov semigroup on a space of Hölder continuous functions. The approach provides a general framework for generalised coupling techniques suitable for applications to dissipative SPDEs. In the case of the two-layer quasi-geostrophic model, the results require the second layer to obey a certain passivity condition.
SCIENCE
arxiv.org

On the microlocal regularity of the analytic vectors for "sums of squares" of vector fields

We prove, via the FBI transform, a result concerning the microlocal Gevrey regularity of analytic vectors for operators that are sums of squares of vector fields with real-valued real analytic coefficients of Hörmander type, thus providing a microlocal version, in the analytic category, of a result due to M. Derridj in "Local estimates for Hörmander's operators of first kind with analytic Gevrey coefficients and application to the regularity of their Gevrey vectors", concerning the problem of the local regularity of Gevrey vectors for sums of squares of vector fields with real-valued real analytic/Gevrey coefficients.
MATHEMATICS

