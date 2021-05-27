

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights

By Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao
arxiv.org
 22 days ago

In recent years, the increasing complexity of scientific simulations and the emerging demand for training large artificial intelligence models have required massive, fast data access, pushing high-performance computing (HPC) platforms to adopt more advanced storage infrastructure such as solid-state disks (SSDs). While SSDs offer high-performance I/O, the reliability challenges that HPC applications face under SSD-related failures remain unclear, in particular for failures that result in data corruption. The goal of this paper is to understand the impact of SSD-related faults on the behavior of complex HPC applications. To this end, we propose FFIS, a FUSE-based fault injection framework that systematically introduces storage faults into the application layer to model errors originating from SSDs. FFIS can plant different I/O-related faults into the data returned from the underlying file system, which enables investigation of the error resilience characteristics of scientific file formats. We demonstrate the use of FFIS with three representative real HPC applications, showing how each application reacts to data corruption, and provide insights into the error resilience of the widely adopted HDF5 file format for HPC applications.
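To make the idea of corrupting data on the read path concrete, here is a minimal Python sketch. It is not the FFIS implementation and does not use FUSE. The class `FaultyReader`, the helper `flip_bit`, the `fault_rate` parameter, and the choice of a single-bit flip as the fault model are all assumptions made for illustration. The wrapper sits between an application and a file-like object and occasionally flips one bit in the bytes it hands back, which is roughly where a FUSE read handler would sit in a real deployment.

```python
import io
import random


def flip_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of `data` with one randomly chosen bit inverted."""
    if not data:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)   # index of the bit to corrupt
    buf[pos // 8] ^= 1 << (pos % 8)     # invert that bit in place
    return bytes(buf)


class FaultyReader:
    """Hypothetical stand-in for a read handler that injects faults.

    Wraps any file-like object and, with probability `fault_rate`,
    corrupts the bytes returned by a read() call with a single bit flip.
    """

    def __init__(self, raw, fault_rate: float, seed: int = 0):
        self.raw = raw
        self.fault_rate = fault_rate
        self.rng = random.Random(seed)  # seeded so experiments are repeatable
        self.injected = 0               # number of faults injected so far

    def read(self, size: int = -1) -> bytes:
        data = self.raw.read(size)
        if data and self.rng.random() < self.fault_rate:
            data = flip_bit(data, self.rng)
            self.injected += 1
        return data


if __name__ == "__main__":
    clean = bytes(range(16))
    reader = FaultyReader(io.BytesIO(clean), fault_rate=1.0, seed=42)
    corrupted = reader.read()
    changed = [i for i, (a, b) in enumerate(zip(clean, corrupted)) if a != b]
    print("faults injected:", reader.injected, "corrupted byte offsets:", changed)
```

Seeding the random generator keeps an injection campaign repeatable, so a run that produces an interesting application failure can be reproduced with the same fault location.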
