
Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems

By Xijun Li, Weilin Luo, Mingxuan Yuan, Jun Wang, Jiawen Lu, Jie Wang, Jinhu Lu, Jia Zeng
arxiv.org
 22 days ago

The Dynamic Pickup and Delivery Problem (DPDP) aims to dynamically schedule vehicles among multiple sites so as to minimize cost when delivery orders are not known a priori. Although DPDP plays an important role in modern logistics and supply chain management, state-of-the-art DPDP algorithms are still limited in solution quality and efficiency. In practice, they fail to provide a scalable solution as the numbers of vehicles and sites become large. In this paper, we propose a data-driven approach, Spatial-Temporal Aided Double Deep Graph Network (ST-DDGN), to solve industry-scale DPDP. In our method, delivery demands are first forecast using a spatial-temporal prediction method, which guides the neural network to perceive the spatial-temporal distribution of delivery demand when dispatching vehicles. In addition, the relationships among individuals such as vehicles are modelled by establishing a graph-based value function. ST-DDGN incorporates attention-based graph embedding with Double DQN (DDQN). As such, it can perform inference across vehicles more efficiently than traditional methods. Our method is entirely data-driven and thus adaptive, i.e., the relational representation of adjacent vehicles can be learned and corrected by ST-DDGN from data periodically. We have conducted extensive experiments on real-world data to evaluate our solution. The results show that ST-DDGN reduces the number of vehicles used by 11.27% and the total transportation cost by 13.12% on average relative to strong baselines, including the heuristic algorithm deployed in our UAT (User Acceptance Test) environment and a variety of vanilla DRL methods. We are due to fully deploy our solution in our online logistics system, and it is estimated that millions of USD in logistics costs can be saved per year.
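
The abstract names Double DQN (DDQN) as the learning rule behind ST-DDGN but gives no details. As a rough illustration only, the generic Double DQN bootstrap target can be sketched as follows: the online network selects the next action and the target network evaluates it. The function name, the list-based Q-values, and the example numbers are assumptions made for this sketch; they are not taken from the paper and say nothing about its graph embedding or demand forecasting.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Generic Double DQN target for one transition (hypothetical helper).

    q_online_next / q_target_next hold the Q-values of every action in the
    next state, as estimated by the online and target networks respectively.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while the value of that action comes from the target network.
    return reward + gamma * q_target_next[best_action]


# Illustrative numbers only: the online net prefers action 1, so the
# target net's value for action 1 (0.3) is used, not its own maximum (0.8).
example = double_dqn_target(1.0, False, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.5)
```

Decoupling selection from evaluation in this way is what distinguishes Double DQN from vanilla DQN, which would take the maximum of the target network's own values and tends to overestimate them.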

Related
Industry | automationworld.com

Supply Chain Optimization and the Future of Industry

After March’s Suez Canal blockage and last year’s bout of empty store shelves during the fallout from the COVID-19 pandemic, the integrity of supply chains is on everyone’s mind. Yet even before these recent debacles, similar disruptions had begun to creep into view. In many ways, current trends merely represent the result of long-gestating issues.
Software | centricconsulting.com

What is Modern Software Delivery? Software Delivery at Speed and Scale

In the world of software development, speed is everything. If you don’t deliver fast enough, you can’t compete. The persistent question is, how can my organization truly achieve software delivery at speed and scale? Businesses must keep pace with their competitive markets to remain viable. Every organization faces challenges from...
Engineering | arxiv.org

Optimized Power Control Design for Over-the-Air Federated Edge Learning

This paper investigates the transmission power control in over-the-air federated edge learning (Air-FEEL) system. Different from conventional power control designs (e.g., to minimize the individual mean squared error (MSE) of the over-the-air aggregation at each round), we consider a new power control design aiming at directly maximizing the convergence speed. Towards this end, we first analyze the convergence behavior of Air-FEEL (in terms of the optimality gap) subject to aggregation errors at different communication rounds. It is revealed that if the aggregation estimates are unbiased, then the training algorithm would converge exactly to the optimal point with mild conditions; while if they are biased, then the algorithm would converge with an error floor determined by the accumulated estimate bias over communication rounds. Next, building upon the convergence results, we optimize the power control to directly minimize the derived optimality gaps under both biased and unbiased aggregations, subject to a set of average and maximum power constraints at individual edge devices. We transform both problems into convex forms, and obtain their structured optimal solutions, both appearing in a form of regularized channel inversion, by using the Lagrangian duality method. Finally, numerical results show that the proposed power control policies achieve significantly faster convergence for Air-FEEL, as compared with benchmark policies with fixed power transmission or conventional MSE minimization.
Electronics | techxplore.com

Creating 'digital twins' at scale to improve drone deliveries

Picture this: A delivery drone suffers some minor wing damage on its flight. Should it land immediately, carry on as usual, or reroute to a new destination? A digital twin, a computer model of the drone that has been flying the same route and now experiences the same damage in its virtual world, can help make the call.
arxiv.org

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller. Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or...
Mathematics | arxiv.org

The quantum annealing gap and dynamical quantum phase transitions in complex optimization problems

Quenching and annealing are extreme opposites in the time evolution of a quantum system: Annealing explores equilibrium phases of a Hamiltonian with slowly changing parameters and can be exploited as a tool for solving complex optimization problems. In contrast, quenches are sudden changes of the Hamiltonian, producing a non-equilibrium situation in which dynamical phase transitions can occur. Here, we investigate the relation between the two cases. Specifically, we show that the minimum of the annealing gap, which is an important bottleneck of quantum annealing algorithms, can be revealed from the order parameter which describes the dynamical quantum phase transition after the quench. Combined with statistical tools including the training of a neural network, the relation between quench and annealing dynamics can be exploited to reproduce the full functional behavior of the annealing gap from the quench data. We show that the partial or full knowledge about the annealing gap which can be gained in this way can be used to design optimized quantum annealing protocols with a practical time-to-solution benefit. Our results are obtained from simulating random Ising Hamiltonians, representing hard-to-solve instances of the exact cover problem.
Computers | arxiv.org

Cardinality-constrained optimization problems in general position and beyond

We study cardinality-constrained optimization problems (CCOP) in general position, i.e., those optimization-related properties that are fulfilled for a dense and open subset of their defining functions. We show that the well-known cardinality-constrained linear independence constraint qualification (CC-LICQ) is generic in this sense. For M-stationary points we define nondegeneracy and show that it is a generic property too. In particular, the sparsity constraint turns out to be active at all minimizers of a generic CCOP. Moreover, we describe the global structure of CCOP in the sense of Morse theory, emphasizing the strength of the generic approach. Here, we prove that multiple cells need to be attached, each of dimension coinciding with the proposed M-index of nondegenerate M-stationary points. Beyond this generic viewpoint, we study singularities of CCOP. For that, the relation between nondegeneracy and strong stability in the sense of Kojima (1980) is examined. We show that nondegeneracy implies the latter, while the reverse implication is in general not true. To fill the gap, we fully characterize the strong stability of M-stationary points under CC-LICQ by first- and second-order information of CCOP defining functions. Finally, we compare nondegeneracy and strong stability of M-stationary points with second-order sufficient conditions recently introduced in the literature.
Beauty & Fashion | trendynews9.com

Skills to Learn in the Cosmetic Packaging Suppliers Industry

The cosmetic industry is huge, with a great many brands producing makeup, which makes it hard for consumers to decide which one to try. New brands have to stand out against established ones, and producing good-quality makeup is only part of that. Cosmetic packaging suppliers can help them create attractive packaging that looks good in a store, draws consumers in, and keeps the box prominent among the competition.
Computers | arxiv.org

Extracting Global Dynamics of Loss Landscape in Deep Learning Models

Deep learning models evolve through training to learn the manifold in which the data exists to satisfy an objective. It is well known that evolution leads to different final states which produce inconsistent predictions of the same test data points. This calls for techniques to be able to empirically quantify the difference in the trajectories and highlight problematic regions. While much focus is placed on discovering what models learn, the question of how a model learns is less studied beyond theoretical landscape characterizations and local geometric approximations near optimal conditions. Here, we present a toolkit for the Dynamical Organization Of Deep Learning Loss Landscapes, or DOODL3. DOODL3 formulates the training of neural networks as a dynamical system, analyzes the learning process, and presents an interpretable global view of trajectories in the loss landscape. Our approach uses the coarseness of topology to capture the granularity of geometry to mitigate against states of instability or elongated training. Overall, our analysis presents an empirical framework to extract the global dynamics of a model and to use that information to guide the training of neural networks.
Mathematics | arxiv.org

Variable metric backward-forward dynamical systems for monotone inclusion problems

This paper investigates first-order variable metric backward-forward dynamical systems associated with monotone inclusion and convex minimization problems in real Hilbert space. The operators are chosen so that the backward-forward dynamical system is closely related to the forward-backward dynamical system and has the same computational complexity. We show existence, uniqueness, and weak asymptotic convergence of the generated trajectories, and strong convergence if one of the operators is uniformly monotone. We also establish that an equilibrium point of the trajectory is globally exponentially stable and a monotone attractor. As a particular case, we explore similar perspectives of the trajectories generated by a dynamical system related to the minimization of the sum of a nonsmooth convex and a smooth convex function. Numerical examples are given to illustrate the convergence of trajectories.
Software | the-next-tech.com

Machine Learning is set to change these 5 Industries

Machine learning is enabling a smooth transition in a world struck by COVID-19. It is among the most widely used technologies of this generation, with diverse capabilities that could transform companies across industries for the better. Once regarded as a niche technology, machine learning is now seeing increased adoption within firms in most industries.
Mathematics | arxiv.org

Learning the optimal regularizer for inverse problems

In this work, we consider the linear inverse problem $y=Ax+\epsilon$, where $A\colon X\to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$, $x$ is a random variable in $X$ and $\epsilon$ is a zero-mean random process in $Y$. This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography. Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data. Our first result is a characterization of the optimal generalized Tikhonov regularizer, with respect to the mean squared error. We find that it is completely independent of the forward operator $A$ and depends only on the mean and covariance of $x$. Then, we consider the problem of learning the regularizer from a finite training set in two different frameworks: one supervised, based on samples of both $x$ and $y$, and one unsupervised, based only on samples of $x$. In both cases, we prove generalization bounds, under some weak assumptions on the distribution of $x$ and $\epsilon$, including the case of sub-Gaussian variables. Our bounds hold in infinite-dimensional spaces, thereby showing that finer and finer discretizations do not make this learning problem harder. The results are validated through numerical simulations.
Science | arxiv.org

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg. The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
Coding & Programming | arxiv.org

Neural Optimization Kernel: Towards Robust Deep Learning

Recent studies show a close connection between neural networks (NN) and kernel methods. However, most of these analyses (e.g., NTK) focus on the influence of (infinite) width instead of the depth of NN models. There remains a gap between theory and practical network designs that benefit from the depth. This paper first proposes a novel kernel family named Neural Optimization Kernel (NOK). Our kernel is defined as the inner product between two $T$-step updated functionals in RKHS w.r.t. a regularized optimization problem. Theoretically, we prove the monotonic descent property of our update rule for both convex and non-convex problems, and an $O(1/T)$ convergence rate of our updates for convex problems. Moreover, we propose a data-dependent structured approximation of our NOK, which builds the connection between training deep NNs and kernel methods associated with NOK. The resultant computational graph is a ResNet-type finite-width NN. Our structured approximation preserves the monotonic descent property and the $O(1/T)$ convergence rate. Namely, a $T$-layer NN performs $T$-step monotonic descent updates. Notably, we show our $T$-layered structured NN with ReLU maintains an $O(1/T)$ convergence rate w.r.t. a convex regularized problem, which explains the success of ReLU on training deep NN from a NN architecture optimization perspective. For the unsupervised learning and the shared parameter case, we show the equivalence of training structured NN with GD and performing functional gradient descent in RKHS associated with a fixed (data-dependent) NOK in the infinite-width regime. For finite NOKs, we prove generalization bounds. Remarkably, we show that overparameterized deep NN (NOK) can increase the expressive power to reduce empirical risk and reduce the generalization bound at the same time. Extensive experiments verify the robustness of our structured NOK blocks.
Software | automotiveworld.com

Is explainable machine learning the key to realising Industry 4.0?

Artificial intelligence (AI) is increasingly making its way onto the factory floor, where it can be harnessed to optimise numerous aspects of manufacturing operations. It can improve planning and sourcing strategies in support of lean manufacturing techniques, as well as uncover factors impacting product quality and production costs. It can also tackle increasingly urgent issues around plant emissions and climate impact.
Coding & Programming | arxiv.org

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or, in the case of multiple machines, by machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient, for solving such min-max problems.
Computers | arxiv.org

Hyperspace Neighbor Penetration Approach to Dynamic Programming for Model-Based Reinforcement Learning Problems with Slowly Changing Variables in A Continuous State Space

Slowly changing variables in a continuous state space constitute an important category of reinforcement learning problems and appear in many domains, such as modeling a climate control system where temperature, humidity, etc. change slowly over time. However, this subject is less addressed in recent studies. Classical methods and certain variants, such as Dynamic Programming with Tile Coding, which discretizes the state space, fail to handle slowly changing variables because they cannot capture the tiny changes in each transition step: it is computationally expensive or impossible to establish an extremely granular grid system. In this paper, we introduce a Hyperspace Neighbor Penetration (HNP) approach that solves the problem. HNP captures in each transition step the state's partial "penetration" into its neighboring hyper-tiles in the gridded hyperspace, and thus does not require the transition to be inter-tile in order for the change to be captured. Therefore, HNP allows for a very coarse grid system, which makes the computation feasible. HNP assumes near linearity of the transition function in a local space, which is commonly satisfied. In summary, HNP can be orders of magnitude more efficient than classical methods in handling slowly changing variables in reinforcement learning. We have made an industrial implementation of HNP with great success.
Computers | arxiv.org

Stateless Reinforcement Learning for Multi-Agent Systems: the Case of Spectrum Allocation in Dynamic Channel Bonding WLANs

Spectrum allocation in the form of primary channel and bandwidth selection is a key factor for dynamic channel bonding (DCB) wireless local area networks (WLANs). To cope with varying environments, where networks change their configurations on their own, the wireless community is looking towards solutions aided by machine learning (ML), and especially reinforcement learning (RL) given its trial-and-error approach. However, strong assumptions are normally made to let complex RL models converge to near-optimal solutions. Our goal with this paper is two-fold: to justify in a comprehensible way why RL should be the approach for wireless network problems like decentralized spectrum allocation, and to call into question whether the use of complex RL algorithms helps the quest for rapid learning in realistic scenarios. We derive that stateless RL in the form of lightweight multi-armed bandits (MABs) is an efficient solution for rapid adaptation, avoiding the definition of extensive or meaningless RL states.
Science | arxiv.org

Complete Realization of Energy Landscape and Non-equilibrium Trapping Dynamics in Spin Glass and Optimization Problem

Energy landscapes are high-dimensional surfaces representing the dependence of system energy on variable configurations. They crucially determine the system's emergent behavior but are difficult to analyze due to their high-dimensional nature. In this article, we introduce an approach to reveal the complete energy landscapes of small spin glasses and Boolean satisfiability problems, which also unravels their non-equilibrium dynamics at an arbitrary temperature for an arbitrarily long time. Contrary to common belief, our results show that it can be less likely to identify the ground states when temperature decreases, due to trapping in individual local minima, which ceases at different times, leading to multiple abrupt jumps with time in the ground-state probability. Simulations agree well with theoretical predictions on these remarkable phenomena. Finally, for large systems, we introduce a variant approach to partially extract the energy landscapes and observe similar phenomena both analytically and in simulations. This work introduces new methodology to unravel the non-equilibrium dynamics of glassy systems, and provides us with a clear, complete and new physical picture of their long-time behaviors inaccessible by modern numerics.