Amazon Supply Chain and Logistics

Maximize efficiency for replenishable deliveries with AWS Last Mile Replenishment Scheduler

Last mile delivery is typically one of the most expensive distribution legs in many industries, such as e-commerce and postal networks. Numerous stops, an even larger number of items to deliver (e.g., Amazon orders), and complex routing to minimize overall distance are common challenges in maximizing operational efficiency.

However, there is another subset of challenges in last mile delivery – maintaining critical levels of replenishable products at the consumption sites. For example, think about a network of gas stations. These need replenishment for different types of gasoline depending on their consumption. Topping up too often and in small quantities is an expensive proposition, while waiting until the last minute when tanks are almost empty is risky because of stock-outs and losing demand to competition.

Running out of stock becomes a more dangerous situation for products like medical consumables or vaccines. Deciding when and in what quantities to replenish stock levels and overlaying this with transport capacity and route design to minimize the cost of logistics becomes a nontrivial and highly combinatorial challenge for schedulers.

This blog post describes an AWS Last Mile Replenishment Scheduler to predict product usage at the consumption sites and decide on the optimal product volume to allocate to delivery vehicles, with the goal of preventing product stock-outs and optimizing delivery routes to maximize logistics efficiency.

Three modules of Last Mile Replenishment Scheduler (LMRS)

LMRS consists of three key modules: (1) a Demand Forecaster to predict short-term product demand at points of consumption; (2) a Simulator to generate potential customer orders and decide the delivery time window, delivery amount, and delivery priority for each customer order; (3) an Agent to schedule the resources and plan the trips in order to minimize the risk of stock-outs and maximize the logistic efficiency.

The picture shows three modules of the solution: (1) Demand Forecaster; (2) Simulator; (3) Agent. Modules interact with one another to derive the best outcome.

The following steps take place every day:

  1. The Demand Forecaster predicts the consumption rate for each customer in the next few days, according to the recent consumption patterns using machine learning (ML) algorithms and proven ML features.
  2. Using the customers’ forecasted consumption data and the estimation of stock-out risk, the Simulator generates orders for the next five days. For each order, the Simulator decides the delivery time window, range of delivery quantity, and delivery priority.
  3. The Agent takes the generated orders as input and plans the delivery routes for the next five days.
  4. The Agent only returns to the Simulator the next-day trips and discards the trips of the remaining four days. As the Agent will plan the next five days’ trips dynamically each day, we do not need to keep the trips for the following four days because these will be replanned on the next day in any case. This is similar to playing chess, where at each step, we look ahead N steps, but we only move one actual step each time and reevaluate the following one depending on the opponent’s move.
  5. Finally, once the delivery plan is confirmed, the Simulator consumes the first day’s trips and deliveries and correspondingly updates the internal simulated stock levels at the customers.

For the best outcome, it makes sense to run the Simulator and Agent feedback loop several times with different parameters, e.g., with different forecast probabilities, and assess their impact on the daily delivery plan to settle on the optimal one.
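The daily loop described above can be sketched as a rolling-horizon procedure. This is a minimal illustration under stated assumptions, not the LMRS implementation: `forecast`, `generate_orders`, and `plan_trips` are hypothetical stand-ins for the Demand Forecaster, Simulator, and Agent, and the stock update simply uses the first forecast day as the simulated consumption.

```python
from typing import Callable, Dict, List

HORIZON_DAYS = 5  # planning horizon used in the post

Stock = Dict[str, float]   # customer -> current stock level
Trip = Dict[str, float]    # customer -> quantity delivered on one day
Demand = Dict[str, List[float]]  # customer -> forecast consumption per day


def run_rolling_horizon(
    stock: Stock,
    n_days: int,
    forecast: Callable[[Stock, int], Demand],        # hypothetical Demand Forecaster
    generate_orders: Callable[[Stock, Demand], list],  # hypothetical Simulator step
    plan_trips: Callable[[list, int], List[Trip]],   # hypothetical Agent
) -> List[Trip]:
    """Plan HORIZON_DAYS ahead every day, but execute only the first day."""
    executed: List[Trip] = []
    for _ in range(n_days):
        demand = forecast(stock, HORIZON_DAYS)       # step 1: predict consumption
        orders = generate_orders(stock, demand)      # step 2: generate orders
        plan = plan_trips(orders, HORIZON_DAYS)      # step 3: plan five days
        today = plan[0]                              # step 4: keep next-day trips only
        executed.append(today)
        # Step 5: update simulated stock with day-one deliveries and consumption.
        # Assumption: simulated consumption equals the first forecast day.
        for customer in stock:
            stock[customer] += today.get(customer, 0.0) - demand[customer][0]
    return executed
```

Because the remaining four days are discarded, each iteration replans with the updated stock levels, which mirrors the look-ahead behavior in the chess analogy.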

Demand Forecaster

The Demand Forecaster uses a time-series forecasting approach. It is trained on historical consumption data to learn the consumption pattern for each customer. When making a prediction about the future, it utilizes the past 4–6 weeks of consumption data as a context input to make predictions for the next week of consumption.

We use Amazon Forecast to find the best time-series model. Amazon Forecast automatically inspects the data, identifies the key attributes, and selects the right algorithm by training different model types and selecting the one with the best forecast properties. Time-series forecast models are able to model and forecast temporal structures like seasonality, and more complex models learn and transfer insights gained from related time series like similar products or customer types.

Amazon Forecast contains algorithms that either train one model for each series or train a single model jointly over all time series. If the model is trained jointly over all time series, the model finds groupings of time series that behave similarly and uses information from each group to predict the members of the group. More advanced algorithms also accept item metadata alongside the time series values, for example, product features or types. This allows the model to learn from this metadata and to find hidden connections between the series. We use these advanced features to adjust our forecast to individual consumption patterns as well as patterns found in groups.

Once the model is trained, we can predict future values. Depending on the model type, the prediction can be a single point or a distribution of values. Having a model that predicts a distribution allows us to represent the uncertainty in an estimate and define a confidence band for a prediction. For example, using a 95 percent prediction interval, we can be 95 percent confident that the next new observation will fall within this range.
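To make the idea of a prediction interval concrete, the sketch below derives an empirical central interval from a set of sampled forecast values. It is only an illustration: the function name `prediction_interval` and the linear-interpolation quantile are assumptions made for this example, and a managed service may instead return the requested quantiles directly.

```python
from typing import Sequence, Tuple


def _quantile(sorted_values: Sequence[float], p: float) -> float:
    """Quantile with linear interpolation between neighboring sorted values."""
    pos = p * (len(sorted_values) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_values) - 1)
    frac = pos - low
    return sorted_values[low] * (1 - frac) + sorted_values[high] * frac


def prediction_interval(
    samples: Sequence[float], coverage: float = 0.95
) -> Tuple[float, float]:
    """Return the central interval that contains `coverage` of the samples."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)
    tail = (1 - coverage) / 2  # probability mass left in each tail
    return _quantile(ordered, tail), _quantile(ordered, 1 - tail)
```

A wider interval signals a less certain forecast, which a scheduler could translate into a more conservative stock-out estimate.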

To simplify the work with Amazon Forecast even more, AWS Solutions provides a one-click deployment using an AWS CloudFormation template. The solution automates the work to generate, test, compare, and iterate on Amazon Forecast predictions and automatically generates visualization dashboards to inspect the results.


Simulator

The Simulator takes the next five days of predicted consumption data and generates potential orders for each customer. The Simulator will decide on the delivery time window, delivery quantity range, and delivery priority or penalty.

Deciding on the delivery time window

The delivery time window of an order is first defined as $[T_e, T_h]$, where $T_e$ is the earliest time to deliver to a customer, and $T_h$ is the latest time to deliver to a customer (i.e., hit time). We derive $T_e$ and $T_h$ by looking up the corresponding product stock levels:

$$Y_{T_e} = h + x, \quad Y_{T_h} = h, \quad 0 \le x \le c - h$$

where $Y_{T_e}$ and $Y_{T_h}$ are the product stock levels at $T_e$ and $T_h$ respectively, $h$ is the hit level of the customer’s stock for risk control, which means the stock should not drop below this level, $c$ is the storage capacity of the customer, and $x$ is a decision variable between zero and the maximum top-up volume $c - h$. For example, we can try different values of $x$ and see which one produces the best outcome.

Note: for customers with large storage capacity, we change the hit level to be $h = \max(c - \mathrm{FTL}, h)$. By making this modification, we force delivery to customers with large storage as early as possible, provided that we can deliver a full truckload (FTL).

Finally, the delivery time window is refined as the overlap between $[T_e, T_h]$ and customer access hours, e.g., opening time.
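The window derivation can be sketched as a lookup over a projected stock curve. The sketch assumes an hourly projection that decreases over time, a single access interval per customer, and hypothetical function names (`delivery_window`, `clip_to_access_hours`); it is an illustration of the rule above rather than the actual Simulator code.

```python
from typing import List, Optional, Tuple


def delivery_window(
    projected_stock: List[float],  # projected stock per hour, starting now
    hit_level: float,              # h
    capacity: float,               # c
    x: float,                      # decision variable, 0 <= x <= c - h
) -> Optional[Tuple[int, int]]:
    """Return (earliest, hit) hour offsets, or None if the hit level is not reached."""
    if not 0 <= x <= capacity - hit_level:
        raise ValueError("x must satisfy 0 <= x <= c - h")
    earliest = None
    for hour, level in enumerate(projected_stock):
        # Earliest time: first hour at which stock has dropped to h + x.
        if earliest is None and level <= hit_level + x:
            earliest = hour
        # Hit time: first hour at which stock has dropped to h.
        if level <= hit_level:
            return earliest, hour
    return None


def clip_to_access_hours(
    window: Tuple[int, int], access: Tuple[int, int]
) -> Optional[Tuple[int, int]]:
    """Intersect the window with one access interval; None if they do not overlap."""
    start, end = max(window[0], access[0]), min(window[1], access[1])
    return (start, end) if start <= end else None
```

A larger x opens the window earlier, which gives the route planner more freedom at the cost of smaller top-ups.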

Deciding on delivery quantity

For an order, the quantity (e.g., weight or volume) that can be delivered to the customer is defined as a range $[q_{\min}, q_{\max}]$, where $q_{\max}$ is the maximum amount that can be delivered to the customer on the day $T_e$, and $q_{\min}$ is the minimum amount. In our method, $q_{\min}$ depends on when the customer will stock out. We would like $q_{\min}$ to be large when the customer will stock out on the next day, meaning we want to deliver as much as possible because the customer will run out of stock soon. However, we allow $q_{\min}$ to be low when the customer will stock out a couple of days later, because we do not necessarily need to deliver a lot if we project a later stock-out. Therefore, $q_{\min}$ can be defined as:

$$q_{\min} = (a \cdot \Delta T_d + b) \, q_{\max}$$

where $\Delta T_d$ is the number of days between the current time and the dry-out time, and $a \le 0$ and $b > 0$ are hyperparameters that can be tuned based on experiments. For example, we can try different values of $a$ and $b$ and check which ones produce the best results on the key metric (e.g., logistic efficiency). Setting $a = 0$ makes $q_{\min}$ independent of the time to dry-out, which suits cases where the dry-out is very far away and can be ignored in the volume decision. For example, if we let $a = -0.1$ and $b = 1$, we will have:

  • $q_{\min} = 0.9 \cdot q_{\max}$ if the customer stocks out on the next day (immediately)
  • $q_{\min} = 0.8 \cdot q_{\max}$ if the customer stocks out on the second day
  • $q_{\min} = 0.7 \cdot q_{\max}$ if the customer stocks out on the third day

During the optimization, transport will first try to deliver at least $q_{\min}$ for the orders and will deliver more than $q_{\min}$ if the vehicle has product left but is not able to serve a new order for any reason. However, the final delivered quantity will always be within $[q_{\min}, q_{\max}]$.
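The minimum-quantity rule is a one-line linear formula, sketched below with the example hyperparameters from the text (a = -0.1, b = 1). The function name `min_delivery_quantity` is hypothetical, and clamping the fraction to the range 0 to 1 is an added assumption so that the minimum never becomes negative or exceeds the maximum.

```python
def min_delivery_quantity(
    q_max: float,
    days_to_stock_out: float,
    a: float = -0.1,
    b: float = 1.0,
) -> float:
    """Minimum delivery quantity as a fraction (a * days + b) of q_max."""
    if a > 0 or b <= 0:
        raise ValueError("expected a <= 0 and b > 0")
    fraction = a * days_to_stock_out + b
    # Assumption: keep the fraction within [0, 1] so that 0 <= q_min <= q_max.
    fraction = min(max(fraction, 0.0), 1.0)
    return fraction * q_max


# Reproduce the three example cases for a maximum quantity of 1,000 units.
for days in (1, 2, 3):
    print(days, round(min_delivery_quantity(1000.0, days), 1))
```

With a = 0 the fraction is constant, so the minimum no longer depends on how soon the customer runs dry.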

Deciding on delivery priority

The Simulator will assign priority to each order according to how soon the corresponding customer will stock out. The priority is represented by a penalty by which the algorithm will be penalized for missing that order. As an example, we use a Gaussian kernel to model the penalty:

$$P = a \, e^{-b (\Delta T_h)^2}$$

where $\Delta T_h$ is the number of hours left between the current time and the projected stock-out event, and $a$ and $b$ are the scale and decay parameters of the kernel.

The penalty decay diagrams, plotted for different parameter values, show that an order has a much higher penalty when the customer will stock out soon than when the stock-out is a couple of days away.
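The Gaussian-kernel penalty can be evaluated in a few lines. In this sketch the function name `stock_out_penalty` and the default values a = 100 and b = 0.001 are illustrative assumptions only; the post does not state which parameter values were used.

```python
import math


def stock_out_penalty(
    hours_to_stock_out: float,
    a: float = 100.0,   # assumed scale: penalty when the stock-out is immediate
    b: float = 0.001,   # assumed decay rate per squared hour
) -> float:
    """Penalty for missing an order, decaying with the time left to stock-out."""
    return a * math.exp(-b * hours_to_stock_out ** 2)


# Compare orders whose customers stock out now, in one day, and in three days.
for hours in (0, 24, 72):
    print(hours, stock_out_penalty(hours))
```

A larger b makes the penalty fall off faster, so only orders close to stock-out carry meaningful weight in the plan.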

Once we decide on the delivery time window, delivery quantity range, and delivery priority for each order, we include the order in the Simulator’s output if its delivery time window overlaps with the next five days.


Agent

The Agent takes the orders generated above as inputs and plans delivery routes for the next five days. The goal of trip planning is to minimize an objective function that combines distance traveled and penalties for missed orders, subject to certain constraints.

Objective function

The essential goal of trip planning is to minimize the objective function below:

$$\text{objective} = D + \sum_{i=0}^{N} P_i \, \Phi(i)$$

where $D$ is the total distance (e.g., miles or kilometers) for all trips, and $N$ is the total number of orders generated by the Simulator. $P_i$ is the penalty of order $i$. $\Phi(i)$ is a binary variable, with 0 meaning the order is delivered and 1 meaning the order is missed in the solution. An order may be missed for reasons such as delivery window constraints or limited resources and capacity on that day. For example, it’s possible that the transport doesn’t have enough vehicles on that day to deliver all the generated orders within the shift duty time.

The Agent aims to find the trips that minimize the objective function with limited search time and computational resources. This means the Agent will try to minimize the penalty of missing orders while reducing the travel distance.
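Evaluating the objective for a candidate plan is straightforward once the plan's total distance and the set of delivered orders are known. The sketch below assumes a hypothetical `objective` helper and made-up numbers; it covers only the scoring step, not the search the Agent performs.

```python
from typing import Sequence


def objective(
    total_distance: float,
    penalties: Sequence[float],   # penalty of each order
    delivered: Sequence[bool],    # True if the order is served in the plan
) -> float:
    """Total distance plus the penalties of all missed orders."""
    if len(penalties) != len(delivered):
        raise ValueError("penalties and delivered must have the same length")
    missed_penalty = sum(p for p, ok in zip(penalties, delivered) if not ok)
    return total_distance + missed_penalty


penalties = [50.0, 10.0, 5.0]
# Plan A skips the second order; plan B serves every order with a longer route.
plan_a = objective(120.0, penalties, [True, False, True])
plan_b = objective(135.0, penalties, [True, True, True])
print(plan_a, plan_b)
```

In this made-up case the plan that misses a low-penalty order scores better, which shows how the penalties trade off against extra distance.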


Constraints

We provide a set of example constraints. Depending on the situation, they need to be adjusted to represent the existing operational logic.

  1. Time
    1. Customer access hours
    2. Service time, i.e., the time spent at a customer’s site to unload the product
      1. Unload time
      2. Transit time
      3. Setup time, e.g., loading
    3. Pre- and post-trip time, e.g., an initial trip from the delivery base to the first drop-off point
  2. Vehicle
    1. Vehicle type, e.g., van, semitrailer
    2. Capacity, i.e., maximum weight or volume of the product
  3. Time and distance matrix to get from every possible origin to every possible destination
  4. Source and product constraints, e.g., which products exist across loading locations, which customer needs specific products, etc.
  5. Shift constraints, e.g., working hours, duty time of drivers, based on the available drivers
  6. Other customer- or situation-specific constraints
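A few of the constraints above can be expressed as a simple feasibility check on a candidate trip. This sketch is an assumption-laden illustration: the `Stop` structure, the `trip_is_feasible` function, and the choice to cover only capacity, access hours, and shift duration are all hypothetical, and a real planner would also handle the distance matrix, product sourcing, and vehicle types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stop:
    arrival_hour: float                 # planned arrival at the customer
    service_hours: float                # time spent on site (setup, unload)
    quantity: float                     # product delivered at this stop
    access_hours: Tuple[float, float]   # (opens, closes) at the customer


def trip_is_feasible(
    stops: List[Stop],
    vehicle_capacity: float,
    shift_hours: float,
    start_hour: float = 0.0,
    post_trip_hours: float = 0.0,
) -> bool:
    """Check capacity, customer access hours, and shift duration for one trip."""
    # Vehicle constraint: total load must fit in the vehicle.
    if sum(stop.quantity for stop in stops) > vehicle_capacity:
        return False
    # Time constraint: each service must fit inside the customer's access hours.
    for stop in stops:
        opens, closes = stop.access_hours
        if stop.arrival_hour < opens or stop.arrival_hour + stop.service_hours > closes:
            return False
    if not stops:
        return True
    # Shift constraint: the trip, including the return leg, must fit in the shift.
    last = stops[-1]
    end_hour = last.arrival_hour + last.service_hours + post_trip_hours
    return end_hour - start_hour <= shift_hours
```

Keeping each constraint as a separate check makes it easier to add or remove rules when the operational logic differs between customers.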

LMRS benefits

We ran a proof of concept (PoC) using a dataset from a real business: a chemical manufacturer that supplies replenishable industrial products in tanker trucks to other manufacturers. Its business-critical objective is to always maintain product stock levels in the customers’ tanks and prevent stock-outs. Beyond that, its business objective is to maximize the volume of product delivered over total traveled distance and reduce operational delivery costs.

The PoC proved compliant with the mentioned real-world constraints, such as customer access hours, vehicle capacity and labor constraints, and source and product constraints. It demonstrated potential for multimillion-dollar savings on last-mile deliveries, or up to 10 percent improvement in the weight-over-distance metric for the pilot country. Furthermore, the benefits go beyond pure operational cost savings. LMRS enables expansion of the planning horizon to improve scheduling agility and reporting accuracy, and it helps optimize network efficiency by breaking the planning silos between regions.

LMRS also improves the quality of strategic investment decisions by simulating the Opex reduction that results from a Capex investment. For example, it can simulate delivery plans and total delivery cost over a period of time when opening a new truck loading site or adding a certain number of vehicles. LMRS reduces the effort and increases the accuracy of this type of strategic decision, so strategic improvement initiatives are not bound to the regular strategy planning cycle.

Architecture and integration with customer systems

LMRS is built using a serverless and event-driven architecture on AWS to optimize for cost, reduce the required operational effort to run and maintain LMRS, and focus on delivering the expected business value. In order to enable seamless integration of LMRS with existing scheduling systems and processes, the integration is driven by loosely coupled APIs that allow for on-demand forecasting of customer demand and optimization of delivery schedule and route.

The illustration shows an architecture starting from a scheduling system, Amazon API Gateway, AWS Lambda and moving onto decision-making using AWS Step Functions workflow and additional services to feed outputs back to the scheduling system.

Schedulers execute the optimization workflow from their current scheduling system, which provides the required data input for the optimization logic. With the execution, the latest snapshot of relevant data is collected by the existing scheduling system and sent to a REST API endpoint using Amazon API Gateway with Lambda proxy integration. The API receives the data and stores it in Amazon Simple Storage Service (Amazon S3) for further processing.
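The ingestion step can be sketched as a Lambda function behind an API Gateway proxy integration. This is a minimal sketch under stated assumptions, not the actual LMRS code: the bucket name `example-lmrs-input`, the `snapshots/` key prefix, and the injectable `s3_client` parameter (added so the function can be exercised without AWS access) are all hypothetical, and the request body is assumed to be plain JSON rather than base64-encoded.

```python
import json
import uuid


def handler(event, context, s3_client=None, bucket="example-lmrs-input"):
    """Store the posted scheduling snapshot in S3 and acknowledge the request."""
    if s3_client is None:
        # Imported lazily so the sketch can be tested with a stand-in client.
        import boto3
        s3_client = boto3.client("s3")

    # With proxy integration, the request body arrives as a string.
    body = event.get("body") or ""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}

    # Hypothetical key layout: one object per submitted snapshot.
    key = f"snapshots/{uuid.uuid4()}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    # 202 Accepted: the optimization runs asynchronously after the upload.
    return {"statusCode": 202, "body": json.dumps({"key": key})}
```

Returning immediately after the upload keeps the API call short, while the rest of the processing is driven by the storage event.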

Once the data is added to the S3 bucket, Amazon S3 Event Notification with Amazon EventBridge is used to start a state machine execution. The state machine is based on AWS Step Functions and AWS Lambda. The implemented workflow loads the data from the S3 bucket and preprocesses it to match the required input formats for the Demand Forecaster, Simulator, and optimization Agent. Subsequently, the optimization logic of the solution framework is executed in sequence to generate the optimized routes and product deliveries as an output. The output is then formatted to be loaded back into the customer scheduling system. Using the AWS Step Functions observability features, schedulers can use Amazon CloudWatch to audit the data processing and the generation of optimization results.

The output data is then stored in Amazon S3. Once the data is added to Amazon S3, an AWS Lambda function is executed, which is again triggered by the respective Amazon S3 change event and Amazon EventBridge rule. This AWS Lambda function fetches the output data from the S3 bucket and pushes the data via a REST API call to the customer scheduling system. The customer scheduling system then stores the data and loads it into its existing user interface.


Conclusion

Deciding when to replenish products at the points of their consumption is a complex combinatorial problem, and it is challenging for the human mind to evaluate all possible scenarios that might occur over the course of a few days in order to arrive at the most cost-efficient delivery option. In this post, we proposed a three-module LMRS to predict replenishable product consumption at the customer sites, simulate potential deliveries, and optimize their selection to prevent stock-outs at the customer while minimizing delivery costs.

LMRS is applicable to various products and industries, such as gas networks, chemical distribution, construction materials, and pharma and hospital networks. It allows businesses in these industries to maintain the minimum stock levels of required products, avoid costly stock-outs, and minimize logistics costs. If you wish to explore how LMRS could optimize your replenishable last-mile deliveries, please reach out to your account manager to set up a discovery workshop with the AWS Supply Chain, Transportation, and Logistics team.

Alex Artamonov

Alex Artamonov is a Principal in the AWS Supply Chain, Transportation, and Logistics team. He started his Amazon journey in 2017 as a Senior Program Manager in Amazon Transportation Services and joined AWS in 2020. Alex works with AWS customers to baseline supply chain challenges and jointly innovate and co-create cloud-based and data-driven solutions for immediate business impact. Alex holds a PhD in Operations Research, and he has 17+ years of cross-industry consulting experience with a long, successful track record of efficiency improvement and cost reduction using data, advanced analytics, and technology. Alex works at Amazon EU HQ in Luxembourg.

Andreas Braun

Andreas Braun is an AI/ML Data Scientist with the Emerging Technologies and Intelligent Platforms team at Amazon Web Services. By scoping and building solutions with a focus on AI/ML, Andreas helps customers innovate and solve their business challenges.

Feng Shi

Dr. Feng Shi is a Data Scientist for Emergent Technologies and Intelligence Platform at AWS Professional Services. He received both his master’s and bachelor’s degrees in Mechanical Engineering from Northwestern Polytechnical University in Xi'an, China. He earned his doctoral degree in Engineering Informatics from Imperial College London, specializing in semantic graphs and natural language processing. As a postdoctoral researcher at Imperial College London and the Alan Turing Institute, Dr. Feng Shi worked on applied machine learning for engineering problems. Feng now works with customers to solve the most challenging real-world problems through AI/ML and optimization.

Patrick Thoben

Patrick Thoben is Enterprise Service Manager for the Travel, Transportation, and Logistics practice of AWS Professional Services. In his role, he is responsible for helping customers engage with AWS Professional Services to define and accelerate their strategic cloud initiatives, develop implementation roadmaps, and build industry-specific solutions using AWS.

Rui Kang

Rui Kang is a Practice Manager at Amazon Web Services (AWS), managing AWS Professional Services’ business in the Media, Entertainment, and Digital Native Business sector. She joined AWS from Amazon in 2021 and led cloud innovation projects across different verticals. Rui holds two bachelor of science degrees, one in computer science and one in mechanical engineering, as well as an MBA. Rui has a track record of leading and owning business targets using technologies.