Amazon Supply Chain and Logistics

AWS last mile solution for faster delivery, lower costs, and a better customer experience

In an increasingly digital world, the last mile is often the only physical interface a company has with its consumers, which makes it a great opportunity to differentiate in the market. With the rise of e-commerce, consumer preferences have become increasingly important. Consumers worldwide are demanding new experiences on the last mile, such as same-day delivery, ever shorter time window deliveries, and visibility on shipment status. Leading players, as well as start-ups in the parcel delivery market, have therefore set new standards in their last mile offerings and capabilities. These new experiences raise the bar for the overall last mile delivery market.

The last mile is expensive: poor routing and failed delivery attempts drive costs up. Last mile costs for parcel delivery often account for 50% of overall fulfillment costs, which also include pickup, line-haul, and sorting. Costs for the last mile more than double when a delivery fails, because a redelivery attempt is needed if the customer is not at home. Four common challenges drive last mile costs: (1) volatility in customer demand translates to a changing number of delivery parcels, which leads to low parcel drop densities and therefore high last mile costs, especially if a last mile provider works within fixed delivery zones; (2) while existing routing solutions tend to focus on finding the shortest possible route, they often struggle to make the required trade-offs between travel time and cost while still meeting all time window constraints; (3) standard routing software often does not allow for rerouting once the delivery vehicle has left the sort center or depot; (4) many companies have not yet automated the last mile planning process end to end and instead still use spreadsheets to plan sorting before submitting the input to the last mile routing software.

Driven by new consumer preferences and the need to ensure high asset utilization of drivers and vehicles to reduce costs per delivery, companies are increasingly looking to technology to help them optimize the last mile. In this blog post, we introduce AWS Dynamic Delivery Planner (DDP), an AWS last mile routing technology offering that provides last mile operators with faster delivery, improved reliability, lower costs, and greater flexibility.

Dynamic Delivery Planner overview

Inspired by Amazon’s last mile developments, AWS developed a Last Mile Routing offering for AWS customers that brings machine learning (ML) to the last mile. DDP provides AWS customers with the best route sequence, delivery time windows, and real-time routing. We applied reinforcement learning (RL) and graph neural networks (GNNs) to real-life data from the Amazon Last Mile Routing Research Challenge, which is publicly available at Open Data on AWS. The datasets contain package information, destination locations, parcel specifications, customer-preferred time windows, expected service times, and zone identifiers.

DDP is capable of real-time rerouting that handles constantly changing traffic conditions and customer time windows when drivers are already on the move. By developing two online policy improvement methods to satisfy time window constraints, we extended state-of-the-art research on solving the travelling salesperson problem (TSP) with policy gradient–based RL algorithms. DDP learns how to choose the best delivery routes based on historical data or how to decide the right route order within a set delivery schedule. This could involve selecting the most efficient delivery path, sequencing multiple routes into a single delivery attempt, or planning multiple delivery attempts using an optimal combination of orders.

Our technology offering uses Amazon SageMaker, an AWS machine learning service, to train and fine-tune the reinforcement learning model on multiple GPUs and multiple instances through Amazon SageMaker’s data parallelism library. The training approach uses the reinforcement learning policy gradient method called REINFORCE, so the model training mechanics look similar to the traditional supervised learning paradigm. The solution performs model inference either on an Amazon SageMaker real-time endpoint or through Amazon SageMaker batch transform jobs.
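To make the supervised-learning analogy concrete, here is a minimal sketch of a REINFORCE-style surrogate loss in plain Python. It is not the DDP training code: the function name, the inputs, and the use of the batch-mean cost as a baseline are all assumptions made for illustration. In a real training loop, the log-probabilities would come from the policy network and the loss would be minimized with an automatic differentiation framework.

```python
from typing import Sequence


def reinforce_surrogate_loss(
    log_probs: Sequence[float],
    tour_costs: Sequence[float],
) -> float:
    """Return a REINFORCE surrogate loss for a batch of sampled routes.

    log_probs[i]  : log-probability the policy assigned to sampled route i
    tour_costs[i] : cost (for example, travel time) of sampled route i
    The batch-mean cost serves as a simple baseline (an assumption here).
    """
    if not log_probs or len(log_probs) != len(tour_costs):
        raise ValueError("inputs must be non-empty and of equal length")
    baseline = sum(tour_costs) / len(tour_costs)
    # The advantage (cost - baseline) acts as a fixed per-sample weight.
    # Routes cheaper than the baseline get a negative weight, so minimizing
    # the loss pushes their log-probability up.
    weighted = [(c - baseline) * lp for lp, c in zip(log_probs, tour_costs)]
    return sum(weighted) / len(weighted)
```

Structurally this is a weighted log-likelihood objective, which is why the training loop resembles supervised learning: sample routes, score them, and take a gradient step on a per-sample loss.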

The map shows a complex route with 220 stops.

To give an example of DDP, the map above shows a route sequence with 220 drop-off points produced by DDP based on the Amazon Last Mile Routing Research Challenge dataset. Blue dots represent stops, and arcs between them form the travel sequence. Each arc starts green and ends red, which indicates the direction of travel. The time to run the model is about 10 seconds if we impose time window constraints and less than 1 second without time windows.

Unique features of Dynamic Delivery Planner

TSP is one of the most widely studied combinatorial optimization problems, with many applications in supply chain management (e.g., routing, scheduling). It involves finding the shortest possible route that visits each city in a given set exactly once and returns to the starting city. The problem is NP-hard, which has led to the development of several approximation algorithms that can be used to find near-optimal solutions. DDP further extends the conventional TSP solution by achieving three additional goals: (1) to produce a route sequence that is both cost optimal and driver friendly in terms of driving experience; (2) to handle time window constraints efficiently; and (3) to enable district optimization that reduces total travel time and improves delivery efficiency. In order to achieve these goals, we introduced the following unique features into DDP.
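For readers new to TSP, the following sketch shows the problem in miniature using the classic nearest-neighbor heuristic. This is a textbook method chosen for brevity, not the approach DDP uses, and the function names and the straight-line distance model are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def nearest_neighbor_tour(points: Sequence[Point], start: int = 0) -> List[int]:
    """Greedy heuristic: always travel to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by the lower stop index to keep results stable.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


stops = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
route = nearest_neighbor_tour(stops)
print(route, tour_length(stops, route))
```

A heuristic like this runs quickly but offers no guarantee on route quality and ignores time windows entirely, which is the gap that the goals listed above are meant to address.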

Cost-optimal routing and instant re-routing

Our evaluation on both simulated and actual Amazon routes shows that DDP is able to strike a balanced trade-off amongst optimality, feasibility, and time to run the model when the number of drop-off points is large (e.g., N >= 150).

Optimality – Refers to the travel time en route. DDP achieves better optimality compared to a conventional TSP solver for TSP-time window (TW) problems given a fixed amount of execution time.

Feasibility – Refers to the percentage of nodes in the route sequence whose time window constraints are met. DDP performs better in meeting time window constraints compared to conventional TSP solvers for TSP-TW problems.

Time – Refers to the execution time to run the application. For example, DDP can solve a fully-constrained, 100-node TSP-TW problem (i.e., with 99 narrow time windows) in less than 1 second (on a single Tesla T4 GPU on an AWS G4dn instance) while maintaining a feasibility of around 26 percent and an optimality gap of around 12 percent. On the same TSP problem instances, the baseline solver (a conventional TSP-TW solver) achieves a 3 percent feasibility and an optimality gap of around 604 percent when given 60 seconds of execution time budget.
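The feasibility and optimality gap figures can be made concrete with a small sketch. The post does not spell out the exact formulas, so the definitions below are assumptions: feasibility counts the share of time-window-constrained stops reached no later than the window closes (a driver who arrives early waits), and the optimality gap is the relative excess of a route's cost over a reference cost. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (earliest, latest) arrival time, or None if the stop is unconstrained
Window = Optional[Tuple[float, float]]


def arrival_times(
    travel: Sequence[Sequence[float]],
    route: Sequence[int],
    windows: Sequence[Window],
    service: float = 0.0,
) -> List[float]:
    """Simulate a route; the driver waits if a window has not yet opened."""
    t = 0.0
    arrivals = [t]  # the route starts at its first stop at time 0
    for prev, cur in zip(route, route[1:]):
        t += travel[prev][cur]
        arrivals.append(t)
        w = windows[cur]
        if w is not None and t < w[0]:
            t = w[0]  # wait until the window opens
        t += service
    return arrivals


def feasibility_pct(
    arrivals: Sequence[float], route: Sequence[int], windows: Sequence[Window]
) -> float:
    """Percentage of constrained stops reached before their window closes."""
    checked = [(a, windows[n]) for a, n in zip(arrivals, route)
               if windows[n] is not None]
    if not checked:
        return 100.0
    met = sum(1 for a, w in checked if a <= w[1])
    return 100.0 * met / len(checked)


def optimality_gap_pct(cost: float, reference_cost: float) -> float:
    """Relative excess of a route's cost over a reference cost, in percent."""
    return 100.0 * (cost - reference_cost) / reference_cost
```

Under these definitions, a route that reaches one of two constrained stops in time has a feasibility of 50 percent, and a route costing 112 against a reference of 100 has an optimality gap of 12 percent.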

We performed 160 tests on DDP using TSP instances of varying input. We used an existing open-source TSP solver as the baseline method. Our evaluation of results shows that as the number of stops increases, the baseline solver has difficulty maintaining a usable level of optimality gap and feasibility. In contrast, both variants of DDP (with two sets of different configurations: DDP-greedy and DDP-rollout-pi) perform reasonably well on all three evaluation criteria. Particularly in the case of 150 nodes, DDP-greedy is not only the fastest (as on all four lengths) but also has the highest feasibility, on par with the DDP-rollout-pi algorithm, but with better stability (i.e., a slightly smaller error bar). DDP-rollout-pi clearly has an advantage in optimality. This is because the value function (during rollout) was approximated by repeatedly evaluating a policy network that was trained solely for the purpose of reducing the optimality gap.

Data-driven and continuous learning

Equally important, and to the best of our knowledge, few TSP solvers on the market are able to produce a driver-friendly route sequence while at the same time keeping the delivery cost low. While decades of research and development in route optimization have made remarkable progress in solving TSP and vehicle routing problems, it is still challenging to directly apply these route optimization methods to last mile delivery. This is in part because the objectives (such as cost, time, distance) that these methods strive to optimize are not necessarily in line with the preferences and behavior of delivery drivers, who often face unexpected challenges while executing route plans on the ground. DDP tackles this challenge by encoding driver know-how and learning a sequence model from historical routes, which already reflect drivers’ preferences. The learned model, combined with the DDP route optimization engine, produces route sequences that narrow the gap between cost-optimal TSP solutions and real-life delivery problems. Moreover, since the DDP solution is learning-based and data-driven, it does not require any hand-crafted heuristics or their maintenance. Therefore, DDP has the potential to keep solving large-scale route sequence planning problems in a continuously changing operating environment.

Illustration shows two alternative routes for the same area.

The maps show two example sequences generated for the same Amazon route with 100 drop points for parcel deliveries without time window constraints. DDP calculated the sequence on the left and a conventional TSP solver generated the one on the right. While the costs of both routes are similar and close to optimal, the one on the right contains several narrow, sharp U-turns and leaps. These are unappealing to drivers, who may need additional driving time to find a suitable place to reverse course. The sequence on the left, in comparison, forms a convex hull that avoids those U-turns altogether. For example, the first stop that enters the delivery cluster from the southeast side is far away from the last stop that leaves the cluster in the southwest. The DDP sequence on the left therefore improves the driving experience.

Efficient and driver-friendly district optimization

DDP is able to perform optimization and refinement on existing delivery district plans and the package allocation scheme. For example, DDP is able to evenly distribute packages to different districts and drivers based on projected delivery demand. It can further reduce total travel time across multiple districts for a given route planner. Moreover, DDP enables last mile planners to quantitatively assess the trade-off between delivery efficiency and driver workload.
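As a rough illustration of what distributing packages evenly by projected demand means, the sketch below assigns delivery zones to drivers with a simple greedy rule: take zones from largest to smallest demand and give each one to the driver with the lightest load so far. This is a classic load-balancing heuristic substituted here for illustration, not the district optimization model in DDP, and the zone names and demand figures are made up.

```python
import heapq
from typing import Dict, List


def balance_zones(zone_demand: Dict[str, int], num_drivers: int) -> List[List[str]]:
    """Assign zones to drivers so projected package counts are roughly even."""
    if num_drivers < 1:
        raise ValueError("num_drivers must be at least 1")
    # Min-heap of (current load, driver index); the lightest driver pops first.
    heap = [(0, d) for d in range(num_drivers)]
    assignment: List[List[str]] = [[] for _ in range(num_drivers)]
    # Place the largest zones first; ties are broken by zone name.
    for zone, demand in sorted(zone_demand.items(), key=lambda kv: (-kv[1], kv[0])):
        load, driver = heapq.heappop(heap)
        assignment[driver].append(zone)
        heapq.heappush(heap, (load + demand, driver))
    return assignment


# Hypothetical projected demand (packages per zone) for two drivers.
demand = {"A": 40, "B": 30, "C": 20, "D": 10}
print(balance_zones(demand, 2))
```

A rule like this balances package counts only. It ignores travel time between zones, which is exactly the trade-off between delivery efficiency and driver workload that planners need to assess.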

Easy integration with a company’s existing transport management system and order management system

DDP integrates easily via APIs with a transport management system (TMS). This means that once the company receives a shipment order, DDP is triggered and starts the batching and routing process. Next, the output of the route planning is uploaded into the TMS for dispatching. The AWS offering uses the latest RL technology for the routing part. In essence, it helps to automate and optimize a company’s last mile planning.

DDP improves Aramex Australia operations

AWS developed the DDP solution jointly with Aramex Australia. Aramex Australia operates a last mile courier service under a franchise model. Its logistics network includes 29 regional franchises and over 900 franchise partners across the country. Each regional franchise operates a delivery station that serves a territory. Currently, couriers are assigned packages based on predefined geographical regions using postcode areas. However, this allocation scheme may not distribute packages optimally, resulting in a poor driver experience in some cases. Aramex Australia also found that postcode-based boundaries between couriers do not necessarily lead to the most efficient delivery practices.

Aramex Australia has explored DDP to improve its driver experience and delivery efficiency. As an outcome, Aramex Australia observed a 29 percent improvement in delivery efficiency[1] during its simulation studies. Moreover, DDP provides a quantitative tool for operators to determine a sweet spot where delivery efficiency can be further improved with a reasonable level of additional workload (travel time). Aramex Australia Chief Information Officer Ruby Wolff shares his feedback: “The way the solution is able to balance travel time with delivery efficiency is making a big difference for our drivers.”


In this post, we discussed the importance of the last mile in an increasingly digital world, not only from a cost perspective but even more from a customer experience perspective. Inspired by Amazon, AWS has built a technology offering for the last mile. DDP outperforms traditional TSP solvers in terms of routing, meeting time window delivery requirements, and instant rerouting when a delivery route has at least 100 drop points. We are eager to learn about and discuss your approaches to last mile routing, so please leave your opinions in the comments. If you also wish to explore how AWS could support you in cutting through the complexity in your supply chain, please reach out to your account manager to set up a discovery workshop with the Supply Chain, Transportation, and Logistics experts from AWS. We are happy to share our DDP open-source code as well.


We thank members of the AWS DDP team. Eden Duthie launched the DDP project. Yin Song contributed significantly to model design and distributed training for route optimization. Baichuan Sun developed the AutoML model for district optimization. Verdi March benchmarked model performance and Josiah Davis developed the initial data preprocessing pipeline. We thank colleagues from Aramex Australia—Renee Qian, Michael Krason, Tom Parkinson, and Ruby Wolff—for providing us with valuable insight in last mile planning.

  1. Delivery efficiency is measured as the average number of packages delivered by a driver per hour. For example, a driver who previously delivered 10 packages per hour can deliver about 13 packages per hour after using DDP, without additional working hours.
Chen Wu

Dr. Chen Wu is a Principal Applied Scientist at AWS based in Western Australia. Chen works directly with customers to solve their data science and machine learning problems in various industries such as logistics, mining, automotive, transportation, pharmacology, digital design, and manufacturing. Prior to joining AWS, Chen worked in the field of astronomy and high-performance computing.

Manuel Baeuml

Dr. Manuel Baeuml is the Head of AWS ProServe Supply Chain & Logistics in Asia Pacific and Japan. Manuel and his team are responsible for sharing leading digital supply chain practices and for solving our customers’ most pressing supply chain problems using AWS cloud technology and offerings. Over the last 15 years, he has had the privilege to work with industry leaders in Asia Pacific and Europe in mining, energy, retail/CPG, as well as transportation and logistics. Manuel is based in Singapore.