Task allocation for decentralized training in heterogeneous environment

By Yongyue Chao, Mingxue Liao, Jiaxin Gao
 8 days ago

The demand for large-scale deep learning is increasing, and distributed training is currently the mainstream solution. Ring AllReduce is a widely used decentralized data-parallel algorithm. However, in a heterogeneous environment each worker processes the same amount of data, so a lot of time is lost waiting among...
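The waiting-time problem described in this abstract can be illustrated with a little arithmetic. The sketch below is not the authors' method (the abstract is truncated before it describes one). It assumes a synchronous step in which every worker must finish its share before the AllReduce can complete, and it compares an equal split with a naive split proportional to worker speed. The function names and the throughput figures are hypothetical.

```python
# Illustrative sketch only: function names and numbers are assumptions,
# not taken from the paper.

def step_time(batch_sizes, speeds):
    """Duration of one synchronous step: all workers wait for the slowest."""
    return max(b / s for b, s in zip(batch_sizes, speeds))


def idle_times(batch_sizes, speeds):
    """Time each worker spends waiting for the slowest worker in one step."""
    t = step_time(batch_sizes, speeds)
    return [t - b / s for b, s in zip(batch_sizes, speeds)]


def proportional_split(total, speeds):
    """Split `total` samples in proportion to worker speed.

    Uses largest-remainder rounding so the sizes always sum to `total`.
    """
    total_speed = sum(speeds)
    raw = [total * s / total_speed for s in speeds]
    sizes = [int(r) for r in raw]
    leftover = total - sum(sizes)
    # Hand the remaining samples to the workers with the largest remainders.
    order = sorted(range(len(speeds)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


speeds = [100, 200, 300]          # hypothetical throughput in samples per second
equal = [400, 400, 400]           # same amount of data on every worker
balanced = proportional_split(1200, speeds)

print(step_time(equal, speeds), idle_times(equal, speeds))
print(balanced, step_time(balanced, speeds), idle_times(balanced, speeds))
```

With these made-up numbers, the equal split is paced by the slowest worker while the faster workers sit idle, and the proportional split lets all three finish together. A real allocator would also have to account for communication cost and for speeds that change over time.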

Related
arxiv.org

Detecting Quality Problems in Data Models by Clustering Heterogeneous Data Values

Data is of high quality if it is fit for its intended use. The quality of data is influenced by the underlying data model and its quality. One major quality problem is the heterogeneity of data as quality aspects such as understandability and interoperability are impaired. This heterogeneity may be caused by quality problems in the data model. Data heterogeneity can occur in particular when the information given is not structured enough and just captured in data values, often due to missing or non-suitable structure in the underlying data model. We propose a bottom-up approach to detecting quality problems in data models that manifest in heterogeneous data values. It supports an explorative analysis of the existing data and can be configured by domain experts according to their domain knowledge. All values of a selected data field are clustered by syntactic similarity. Thereby an overview of the data values' diversity in syntax is provided. It shall help domain experts to understand how the data model is used in practice and to derive potential quality problems of the data model. We outline a proof-of-concept implementation and evaluate our approach using cultural heritage data.
COMPUTERS
arxiv.org

Multi-Task Neural Processes

Neural processes have recently emerged as a class of powerful neural latent variable models that combine the strengths of neural networks and stochastic processes. As they can encode contextual data in the network's function space, they offer a new way to model task relatedness in multi-task learning. To study its potential, we develop multi-task neural processes, a new variant of neural processes for multi-task learning. In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task. To do so, we derive the function priors in a hierarchical Bayesian inference framework, which enables each task to incorporate the shared knowledge provided by related tasks into its context of the prediction function. Our multi-task neural processes methodologically expand the scope of vanilla neural processes and provide a new way of exploring task relatedness in function spaces for multi-task learning. The proposed multi-task neural processes are capable of learning multiple tasks with limited labeled data and in the presence of domain shift. We perform extensive experimental evaluations on several benchmarks for the multi-task regression and classification tasks. The results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning and superior performance in multi-task classification and brain image segmentation.
COMPUTERS
arxiv.org

HADFL: Heterogeneity-aware Decentralized Federated Learning Framework

Federated learning (FL) supports training models on geographically distributed devices. However, traditional FL systems adopt a centralized synchronous strategy, which creates high communication pressure and poses a model generalization challenge. Existing optimizations of FL either fail to speed up training on heterogeneous devices or suffer from poor communication efficiency. In this paper, we propose HADFL, a framework that supports decentralized asynchronous training on heterogeneous devices. The devices train the model locally with heterogeneity-aware local steps using local data. In each aggregation cycle, they are selected based on probability to perform model synchronization and aggregation. Compared with the traditional FL system, HADFL can relieve the central server's communication pressure, efficiently utilize heterogeneous computing power, and achieve a maximum speedup of 3.15x over decentralized-FedAvg and 4.68x over the PyTorch distributed training scheme, respectively, with almost no loss of convergence accuracy.
CODING & PROGRAMMING
arxiv.org

Differentially Private Federated Learning on Heterogeneous Data

Federated Learning (FL) is a paradigm for large-scale distributed learning which faces two key challenges: (i) efficient training from highly heterogeneous user data, and (ii) protecting the privacy of participating users. In this work, we propose a novel FL approach (DP-SCAFFOLD) to tackle these two challenges together by incorporating Differential Privacy (DP) constraints into the popular SCAFFOLD algorithm. We focus on the challenging setting where users communicate with an "honest-but-curious" server without any trusted intermediary, which requires ensuring privacy not only towards a third party with access to the final model but also towards the server, which observes all user communications. Using advanced results from DP theory, we establish the convergence of our algorithm for convex and non-convex objectives. Our analysis clearly highlights the privacy-utility trade-off under data heterogeneity, and demonstrates the superiority of DP-SCAFFOLD over the state-of-the-art algorithm DP-FedAvg when the number of local updates and the level of heterogeneity grow. Our numerical results confirm our analysis and show that DP-SCAFFOLD provides significant gains in practice.
CODING & PROGRAMMING
CoinTelegraph

NnsDAO, a globe-spanning network of decentralized autonomous organizations

The concept of decentralized autonomous organizations (DAOs) is perhaps the best management strategy that blockchain technology has offered yet: a completely decentralized setup where all qualifying stakeholders have a say in the operation of an organization. This means that there is no top management that controls the platform or organization, no board rooms, no board members and certainly no development team in charge. The power is distributed to the people.
MARKETS
theblockcrypto.com

Kava Network: Powering a Decentralized Future

In our last article on Kava, we covered its origins and explored the suite of high-yield DeFi protocols that make up the Kava Platform. In the second article of our three-part series, we dive deeper into the Kava network and take a look at the ecosystem of best-in-class DeFi, NFT, and GameFi services that are being built on top of it.
COMPUTERS
arxiv.org

Extending the coefficient of variation for measuring heterogeneity following a meta-regression

Meta-regression is often used to form hypotheses about what is associated with heterogeneity in a meta-analysis and to estimate the extent to which effects can vary between cohorts and other distinguishing factors. However, study-level variables, called moderators, that are available and used in the meta-regression analysis will rarely explain all of the heterogeneity. Therefore, measuring and trying to understand residual heterogeneity is still important in a meta-regression, although it is not clear how some heterogeneity measures should be used in the meta-regression context. The coefficient of variation, and its variants, are useful measures of relative heterogeneity. We consider these measures in the context of meta-regression which allows researchers to investigate heterogeneity at different levels of the moderator and also average relative heterogeneity overall. We also provide CIs for the measures and our simulation studies show that these intervals have good coverage properties. We recommend that these measures and corresponding intervals could provide useful insights into moderators that may be contributing to the presence of heterogeneity in a meta-analysis and lead to a better understanding of estimated mean effects.
SCIENCE
arxiv.org

A Survey on Task Assignment in Crowdsourcing

Quality improvement methods are essential to gathering high-quality crowdsourced data, both for research and industry applications. A popular and broadly applicable method is task assignment that dynamically adjusts crowd workflow parameters. In this survey, we review task assignment methods that address: heterogeneous task assignment, question assignment, and plurality problems in crowdsourcing. We discuss and contrast how these methods estimate worker performance, and highlight potential challenges in their implementation. Finally, we discuss future research directions for task assignment methods, and how crowdsourcing platforms and other stakeholders can benefit from them.
TECHNOLOGY
arxiv.org

Submodular Optimization for Coupled Task Allocation and Intermittent Deployment Problems

In this paper, we demonstrate a formulation for optimizing coupled submodular maximization problems with provable sub-optimality bounds. In robotics applications, it is quite common that optimization problems are coupled with one another and therefore cannot be solved independently. Specifically, we consider two problems coupled if the outcome of the first problem affects the solution of a second problem that operates over a longer time scale. For example, in our motivating problem of environmental monitoring, we posit that multi-robot task allocation will potentially impact environmental dynamics and thus influence the quality of future monitoring, here modeled as a multi-robot intermittent deployment problem. The general theoretical approach for solving this type of coupled problem is demonstrated through this motivating example. Specifically, we propose a method for solving coupled problems modeled by submodular set functions with matroid constraints. A greedy algorithm for solving this class of problem is presented, along with sub-optimality guarantees. Finally, practical optimality ratios are shown through Monte Carlo simulations to demonstrate that the proposed algorithm can generate near-optimal solutions with high efficiency.
ENGINEERING
cryptoslate.com

GridZoneDAO: The Decentralized Metaverse Is About To Launch

Disclosure: This is a sponsored post. Readers are encouraged to conduct further research prior to taking any actions. GridZoneDAO is an approach to create a next-gen art-focused Metaverse on Ethereum with unique digital identities, a 3D VR world, and interactive 3D NFT art. It is a place where users connect, interact, play, create and participate in a decentralized economy with profound societal impact.
COMPUTERS
arxiv.org

A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems

Owing to their asymptotic oracle property, non-convex penalties such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally challenging. Almost all existing algorithms converge locally, and the proper selection of initial values is crucial. Therefore, in actual operation, they are often combined with a warm-starting technique to meet the rigid requirement that the initial value must be sufficiently close to the optimal solution of the corresponding problem. In this paper, based on the DC (difference of convex functions) property of MCP and SCAD penalties, we aim to design a global two-stage algorithm for the high-dimensional least squares linear regression problems. A key idea for making the proposed algorithm efficient is to use the primal dual active set with continuation (PDASC) method, which is equivalent to the semi-smooth Newton (SSN) method, to solve the corresponding sub-problems. Theoretically, we not only prove the global convergence of the proposed algorithm, but also verify that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive experiments on simulated and real data show that the algorithm in this paper is superior to the latest SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
CODING & PROGRAMMING
arxiv.org

Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks

Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents can coordinate over a wireless network, are a promising approach to enable future intelligent and autonomous systems that rely on real-time decision-making in complex dynamic environments. Nonetheless, in practical scenarios, CDRL faces many challenges due to the heterogeneity of agents and their learning tasks, different environments, time constraints of the learning, and resource limitations of wireless networks. To address these challenges, in this paper, a novel semantic-aware CDRL method is proposed to enable a group of heterogeneous untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network. To this end, a new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration. The proposed approach then jointly optimizes the training loss and wireless bandwidth allocation for the cooperating selected agents in order to train each agent within the time limit of its real-time task. Simulation results show the superior performance of the proposed algorithm compared to state-of-the-art baselines.
COMPUTERS
arxiv.org

A Conservative Finite Element Solver for MHD Kinematics equations: Vector Potential method and Constraint Preconditioning

A new conservative finite element solver for the three-dimensional steady magnetohydrodynamic (MHD) kinematics equations is presented. The solver utilizes magnetic vector potential and current density as solution variables, which are discretized by H(curl)-conforming edge-element and H(div)-conforming face element respectively. As a result, the divergence-free constraints of discrete current density and magnetic induction are both satisfied. Moreover the solutions also preserve the total magnetic helicity. The generated linear algebraic equation is a typical dual saddle-point problem that is ill-conditioned and indefinite. To efficiently solve it, we develop a block preconditioner based on constraint preconditioning framework and devise a preconditioned FGMRES solver. Numerical experiments verify the conservative properties, the convergence rate of the discrete solutions and the robustness of the preconditioner.
MATHEMATICS
arxiv.org

Consensus formation on heterogeneous networks

Reaching consensus (a macroscopic state where almost all the system constituents display the same microscopic state) is a necessity in multiple complex socio-technical and econo-technical systems. In many distributed systems (of which blockchain-based applications are just the latest example), the process of consensus formation is crucial not only for the emergence of a leading majority but for the very system to function. Inspired by this application, but with broader applicability, we build a minimalistic network model of consensus formation for quantifying how central nodes - with respect to their average distance to others - can leverage their position to obtain a competitive advantage in the consensus process. We show that in a wide range of network topologies, the probability of forming a majority can significantly increase depending on the centrality of nodes that initiate the spreading. Further, we study the role that network topology plays in the consensus process: we show that central nodes in scale-free networks can win consensus in the network even if they broadcast states significantly later than peripheral ones.
COMPUTERS
arxiv.org

Hyperfine Interaction in a MoS$_2$ Quantum Dot: Decoherence of a Spin-Valley Qubit

A successful and promising device for the physical implementation of electron spin-valley based qubits is the Transition Metal Dichalcogenide monolayer (TMD-ML) semiconductor quantum dot. The electron spin in TMD-ML semiconductor quantum dots can be isolated and controlled with high accuracy, but it still suffers from decoherence due to the unavoidable coupling with the surrounding environment, such as nuclear spin environments. A common tool to investigate systems like the one considered in this work is the density matrix formalism by presenting an exact master equation for a central spin (spin-qubit) system in a time-dependent and coupled to a nuclear spin bath in terms of hyperfine interaction. The master equation provides a unified description of the dynamics of the central spin. Analyzing this in more detail, we calculate fidelity loss due to the Overhauser field from hyperfine interaction in a wide range number of nuclear spins $\mathcal{N}$.
PHYSICS
zycrypto.com

Hector DAO Introduces Decentralized Stablecoins In Hopes Of Delivering True Decentralization

Hector DAO plans to build a truly decentralized ecosystem by introducing algorithmic decentralized stablecoins. Hector DAO, a fork of OHM built on the Fantom network, wants to change how the current cryptocurrency space operates, hoping to realize Satoshi Nakamoto’s vision for true decentralization. According to the announcement, Hector DAO’s decentralized stablecoins will provide the perfect solution against price fluctuation experienced by stablecoins during volatility. Notably, inflation is also highly influenced by the fiat cryptocurrencies the tokens are pegged on, which are highly prone to inflation.
COMMODITIES & FUTURE
arxiv.org

Lifetime of skyrmions in discrete systems with infinitesimal lattice constant

Topological protection of chiral magnetic structures is investigated by taking a two-dimensional magnetic skyrmion as an example. The skyrmion lifetime is calculated based on harmonic transition state theory for a discrete lattice model using various values of the ratio of the lattice constant and the skyrmion size. Parameters of the system corresponding to exchange, anisotropy and Dzyaloshinsky-Moriya interaction are chosen in such a way as to keep the energy and size of the skyrmion unchanged for small values of the lattice constant, using scaling relations derived from continuous micromagnetic description. The number of magnetic moments included in the calculations reaches more than a million. The results indicate that in the limit of infinitesimal lattice constant, the energy barrier for skyrmion collapse approaches the Belavin-Polyakov lower bound of the energy of a topological soliton in the $\sigma$-model, the entropy contribution to the pre-exponential factor in the Arrhenius rate expression for collapse approaches a constant and the skyrmion lifetime can, for large enough number of spins, correspond to thermally stable skyrmion at room temperature even without magnetic dipole-dipole interaction.
PHYSICS
arxiv.org

The Pareto-Optimal Temporal Aggregation of Energy System Models

The growing share of intermittent renewable energy sources, storage technologies, and the increasing degree of so-called sector coupling necessitate optimization-based energy system models with high temporal and spatial resolutions, which significantly increases their runtimes and limits their maximum sizes. In order to maintain the computational viability of these models for large-scale application cases, temporal aggregation has emerged as a technique for reducing the number of considered time steps by reducing the original time horizon down to fewer, more representative ones. This study presents advanced but generally applicable clustering techniques that allow for ad-hoc improvements of state-of-the-art approaches without requiring profound knowledge of the individual energy system model. These improvements comprise the optimal tradeoff between the number of typical days and inner-daily temporal resolutions, as well as constituting a representation method that can reproduce the value distribution of the original time series. We prove the superiority of these approaches by applying them to two fundamentally different model types, namely a single-node building energy system and a European carbon-neutral energy scenario, and benchmark these against state-of-the-art approaches. This is performed for a variety of temporal resolutions, which leads to many hundreds of model runs. The results show that the proposed improvements on current methods strictly dominate the status quo with respect to Pareto-optimality in terms of runtime and accuracy. Although a speedup of one order of magnitude could be achieved using traditional aggregation methods within a cost deviation range of two percent, the algorithms proposed herein achieve this accuracy with a runtime speedup of two orders of magnitude.
ENERGY INDUSTRY
arxiv.org

QuantumCircuitOpt: An Open-source Framework for Provably Optimal Quantum Circuit Design

In recent years, the quantum computing community has seen an explosion of novel methods to implement non-trivial quantum computations on near-term hardware. An important direction of research has been to decompose an arbitrary entangled state, represented as a unitary, into a quantum circuit, that is, a sequence of gates supported by a quantum processor. It has been well known that circuits with longer decompositions and more entangling multi-qubit gates are error-prone for the current noisy, intermediate-scale quantum devices. To this end, there has been a significant interest to develop heuristic-based methods to discover compact circuits. We contribute to this effort by proposing QuantumCircuitOpt (QCOpt), a novel open-source framework which implements mathematical optimization formulations and algorithms for decomposing arbitrary unitary gates into a sequence of hardware-native gates. A core innovation of QCOpt is that it provides optimality guarantees on the quantum circuits that it produces. In particular, we show that QCOpt can find up to 57% reduction in the number of necessary gates on circuits with up to four qubits, and in run times less than a few minutes on commodity computing hardware. We also validate the efficacy of QCOpt as a tool for quantum circuit design in comparison with a naive brute-force enumeration algorithm. We also show how the QCOpt package can be adapted to various built-in types of native gate sets, based on different hardware platforms like those produced by IBM, Rigetti and Google. We hope this package will facilitate further algorithmic exploration for quantum processor designers, as well as quantum physicists.
CODING & PROGRAMMING

