Vehicle routing optimization with Amazon Aurora PostgreSQL-Compatible Edition
Vehicle route planning with the goals of meeting customer satisfaction expectations, reducing fuel consumption, and reducing emissions can be challenging.
The vehicle routing problem (VRP) goal is to find optimal routes for multiple vehicles visiting multiple locations to fulfill business-specific constraints such as cost and time.
The VRP has various use cases in industries such as transport, travel, and logistics because it has a direct impact on customer experience and savings. For example, it’s applicable to last-mile delivery, international shipping, air route networks, city traffic routes and safety, or security service scenarios. Imagine a clothing retailer that’s planning to deliver a given number of customer orders using the company-owned vehicle fleet. Which customer location should be put on which vehicle’s route (starting and ending at the warehouse) and in which order, such that the total distance traveled is minimized?
In this post, I show you how to use Amazon Aurora PostgreSQL-Compatible Edition to efficiently store map topology data and provide geospatial routing functionality at scale. You can also use this solution with Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Aurora and PostGIS
Aurora is built for the cloud and combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases.
Aurora delivers up to three times the throughput of standard PostgreSQL running on the same hardware, which enables existing PostgreSQL applications and tools to run without requiring modification. Moreover, you can use replicas to support high-volume application requests or automate failover and Amazon Aurora Global Database for globally distributed applications.
PostGIS is open-source software that extends capabilities of PostgreSQL by adding geospatial types and functions to enhance spatial data management within a relational database structure, making it a fast, feature-plenty, and robust system.
For the geospatial functionality, we combine Aurora PostgreSQL with PostGIS.
Vehicle routing problems
One of the oldest vehicle routing problems is the traveling salesman problem (TSP), first formulated in 1930 by Karl Menger. The traveling salesman problem helps find the shortest way to visit different locations and return to the original point of departure given a finite number of locations and the distance of travel between them (see the following diagrams for an example).
The diagram illustrates a routable map as a series of edges linked by vertices (A, B, C) with an associated cost of travel (2, 3, 5). For instance, the cost can simply be the length of the edge or the length divided by Km/h.
In reality, the distance is only one factor to consider. For instance, the capacitated vehicle routing problem (CVRP) takes into consideration the weight and volume of what’s being transported, because each vehicle has a maximum load capacity, whereas the vehicle routing problem with time windows (VRPTW) deals with customers available during a specific period of time. As you can imagine, there are many other implementations of the TSP and more data points to consider, such as traffic, directions, weather, fuel, location of petrol stations or electric charging stations, and more.
In this solution, you connect to your Aurora cluster from your AWS Cloud9 environment with a psql shell and run queries to solve routing problems.
The following architecture describes the setup for this walkthrough.
The process involves using PostgreSQL, PostGIS, and pgRouting to provide geospatial routing functionality over an OpenStreet map.
The setup includes the following steps:
- Configure Aurora PostgreSQL to use PostGIS and pgRouting.
- Prepare your database.
- Import a map.
- Solve different type of routing problems using SQL queries.
For this walkthrough, you should have an AWS account.
Set up Aurora PostgreSQL
Working with a production-grade geographic information system requires particular attention to performance, reliability, and security. Aurora combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
To follow along, launch the Aurora database cluster using the provided AWS CloudFormation template. The template uses PostgreSQL 12.4, Aurora PostgreSQL release 4.0.
For this post, the template creates a single Aurora instance, but while running in production, we recommend having multiple DB instances in the same cluster for failover and high availability. For more information, see Replication with Amazon Aurora.
The database credential is stored in AWS Secrets Manager under a secret named
RDSSecret. With Secrets Manager, you can easily rotate, manage, and retrieve database credentials, API keys, and other secrets through their lifecycle.
After you launch the CloudFormation template, wait 5–10 minutes for the instance to become available.
You can check the instance status on the AWS Management Console:
- On the Amazon RDS console, choose DB Instances.
- Make sure the status shows as
- Choose the main DB identifier and save the writer endpoint name for later use.
Prepare your database
The CloudFormation template creates an AWS Cloud9 instance to perform operations on the database.
AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug code with just a browser. It comes with a terminal that includes a pre-authenticated AWS Command Line Interface (AWS CLI). This makes it easy for you to quickly run commands and directly access AWS services as Aurora.
- On the AWS Cloud9 console, choose Your environments.
- On your environment’s details page, choose Open IDE.
- Resize the volume to 20 GiB.
This allows you to operate with large map files.
Now it’s time to add the PostGIS and pgRouting extension and configure the database.
- On the Window menu, choose New Terminal.
- In the AWS Cloud9 terminal, enter the following command to install the PostgreSQL client:
- Get the database password from Secrets Manager with the following code:
- Establish a connection using the psql client:
DB_WRITER_ENDPOINT with the database cluster endpoint (writer endpoint) you saved earlier.
- When prompted, enter the database password obtained in the previous statement.
- First, let’s create a new database called
- Then we add extensions for PostGIS and pgRouting:
It’s important to see what version of the extension you’re using for later reference.
- Confirm the versions of PostGis and pgRouting with the following code:
The output of the preceding statements should show 2.4 version for PostGIS and 2.5.2 for pgRouting. Don’t worry if you’re running different version, just make a note of the
pgr_version() output for later reference.
Download the map
For this exercise, you download an OpenStreetMap for UK and Ireland. The map comes in PBF format (Protocolbuffer binary format), which is intended as an alternative to the XML format and primarily supported by the OSM project. This data is served under the Open Database License 1.0.
- In the AWS Cloud9 environment, on the Window menu, choose New Terminal so you can keep the database open for later use.
- In the new terminal, download and rename the map with the following code:
You should see a file called
mymap.pbf in the document tree.
The next step is to parse the OpenStreetMap in a routable format.
For this task, you use a tool called osm2po. The osm2po converter parses OpenStreetMap’s XML-Data and makes it routable. In other words, the OpenStreetMap file needs to be imported to a PostgreSQL database with the correct geometry projection, spatial reference (PostGIS), and directed topology for routing (pgRouting). The data generated is later imported in your Aurora PostgreSQL database.
- To install osm2po, enter the following in the terminal:
- With the following command, you convert the PBF map to a series of statements available in a SQL file under a folder called
The parameters are as follows:
- Xmx – Allocates 1 Gb of memory (JAVA parameter for memory reservation).
- prefix – The prefix for generated files or tables.
- tileSize – Controls the balance between available memory and data size. The x disables tiling, which is optimal.
- cmd – Tells osm2po what to do (
The output of the previous command should already give you a clue as to the next command to run.
You use the psql client to import the data into PostgreSQL.
- In the first terminal you opened for connection to Aurora (the one with the psql prompt), enter the following:
The statement takes approximately 12 minutes to complete; wait until you see the psql prompt again.
You can inspect the table schema with
\d table_name and because the table you just created is called
uk_2po_4pgr, the SQL statement is as follows:
The following screenshot shows our output.
The main elements are as follows:
- id – The unique identifier of the edge
- osm_name – The name of the edge (for example, Baker Street)
- km – The length of the edge in Km
- kmh – The speed limit
- source/target – The identifier of the first and last endpoint of the edge
- cost/reverse cost – The length of the edge divided by Km/h depending on the direction of travel (source to target or target to source).
- x/y – The latitude (x) and longitude (y) coordinates of the source (x1,y1) and target vertex (x2,y2).
This table contains approximately 3,797,170 rows, and working with geospatial data can be a challenge because the data can quickly grow in size. With Aurora, the size of your database volume automatically grows in increments of 10 GB, up to a maximum of 128 TB. For more information, see Amazon Aurora storage and reliability.
Solve routing problems
Finally, you’ve arrived at the fun part: using pgRouting to solve common routing problems.
The pgRouting library contains several features and algorithms. For more information about using pgRouting in your application, see the official documentation.
Make sure to read the documentation for the appropriate pgRouting version. For this post, we use version 2.5.
The Dijkstra algorithm finds the shortest paths between nodes in a graph, for example, road networks. The pgRouting library contains the
pgr_dijkstra function, which returns the shortest paths using the Dijkstra algorithm.
The following output shows a sample usage of the
pgr_dijkstra function. It takes as input a source node ID, a target node ID (in our example, 1 and 10, respectively), and returns a sequence of nodes, edges, and costs to traverse from one node using the edge to the next node, in the path sequence.
pgr_dijkstra function uses node IDs to calculate the route, whereas in real scenarios you might want to use geographic coordinates such as latitude and longitude.
Because the map is composed of nodes and edges, and not all geographical coordinates are represented by a specific node, you need a way to find the closest node to your coordinates.
With PostGIS, you can select the closest node to a specified latitude/longitude location using the
Three steps are involved:
- Create a point geometry.
- Set a spatial reference identifier (SRID) for the geometry.
- Use a function to calculate the distance between points to obtain the closest point ID.
The following code is a sample implementation:
The query has the following components:
- ST_MakePoint – Makes a point
- ST_SetSRID – Sets the spatial reference identifier (SRID) on a geometry
- ST_Distance – Returns the minimum 2D Cartesian (planar) distance between two geometries
Let’s say we want to generate the shortest path between the National Gallery in Trafalgar Square (latitude 51.5080, longitude -0.1291) and a north entrance of Hyde Park (latitude 51.5106, longitude -0.1643), as shown in the following map.
First, we need to identify the closest node ID given the National Gallery in Trafalgar Square and Hyde Park coordinates.
We can do so with the following queries. We first determine the node of Trafalgar Square:
The output should be 66763, which is the closest node identifier to the coordinates provided.
The following code is our second query:
The output should be 2515220.
Finally, we can calculate the shortest distance between the nodes 66763 (Trafalgar Square) and 2515220 (Hyde Park):
The following screenshot shows our output.
The output includes the following components:
- seq – The sequential value starting from 1.
- path_seq – The relative position in the path. This has the value 1 for the beginning of a path.
- node – The identifier of the node in the path from source to target.
- edge – The identifier of the edge used to go from one node to the next node in the path sequence. We use -1 for the last node of the path.
- cost – The cost to traverse from the node using the edge to the next node in the path sequence.
- agg_cost – The aggregate cost from the initial node.
As you can see, it’s difficult to understand the route from the output.
One option is to convert the output to GeoJSON and plot the route in a map. GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes.
You can run the following statement to obtain the GeoJSON representation of the previous result and save the output to file called
Now you can visualize the result using a service like geojson.io:
- Open the
out.jsonfile from the AWS Cloud9 environment.
- Copy only line 3, which starts with
- Open geojson.io in your browser.
- Enter the line you copied.
The route respects the direction of the traffic. The
pgr_dijkstra function accepts an additional parameter called
directed, which is by default set to
TRUE to consider the graph having a direction.
If you want to run the previous query ignoring the direction of the graph, you can set the
directed parameter to
You might have noticed that the query takes some time to run (approximately 11 seconds).
This is due to the size of the data. To improve the query, you can restrict the search scope within geographical boundaries.
Because the source and target location are Trafalgar Square and Hyde Park, the area can be restricted within the following latitudes and longitudes: -0.2090, 51.5160 and -0.1150, 51.4838.
We can then refactor the query using the
ST_MakeEnvelope function, which creates a rectangular polygon from the minimum and maximum values for X and Y:
Run the following query and you can see how much faster the response is:
Let’s go one step further. The
pgr_dijkstra supports different type of routings, such as one-to-many or many-to-many. To change the type of routing, you can simply submit an array instead of a single node parameter:
- One-to-one – One source node to one target node, with the parameters start, end
- One-to-many – One source node to many target nodes, with the parameters start, ARRAY[end_a, end_b]
- Many-to-one – Many source nodes to one target node, with the parameters ARRAY[start_a, start_b], end
- Many-to-many – Many source nodes to many target nodes, with the parameters ARRAY[start_a, start_b], ARRAY[end_a, end_b]
Consider the following table, which contains additional points on top of the others already provided.
|Name||ID||Source Node||Target Node|
|Natural History Museum||2789707||2318147||2535858|
A many-to-many query with two starting points (the National History Museum and the National Gallery) and two target points (Covent Garden and Hyde Park North entrance) could look like the following:
In the preceding image, every source node is reaching every target node.
What about the traveling salesman problem, which tries to find the shortest route that visits each location exactly once and returns to the origin location?
pgRouting provides additional functions to solve this challenge:
- pgr_TSP – When input is given as matrix cell information
- pgr_TSPeuclidean – When the input is coordinates
For simplicity, we use the
The main input parameter is another SQL statement that returns latitude and longitude coordinates for the chosen nodes.
Given a delivery service in Covent Garden (node ID 2097118) and customers based in our National History Museum (node ID 2789707) and Hyde Park (node ID 2763353) locations, what is the shortest route that visits each site once and returns to the origin?
You can find the solution using the following query:
The following screenshot shows our output.
The output indicates that the first stop should be Hyde Park (node ID 2763353) and the second is the National History Museum (node ID 2789707).
In this post, I walked you through how to use Aurora PostgreSQL to store and query GIS data to optimize and make informed decisions about which route to use for improved lead time estimation while saving costs.
Using Aurora instead a self-hosted solution has several benefits, because you no longer need to worry about hardware provisioning, patching, scaling, replicas, backup, and restore. Aurora is designed to offer greater than 99.99% availability, replicating six copies of your data across three Availability Zones and backing up your data continuously to Amazon Simple Storage Service (Amazon S3). With Global Database, a single Aurora database can span multiple AWS Regions to enable fast local reads and quick disaster recovery. All those benefits allow Aurora PostgreSQL to deliver GIS capability at scale, enabling you to thrive in this new era of logistics management.
I encourage you to experiment more with Aurora PostgreSQL and PostGIS, and also check out Amazon Location Service.
Please share your experience and your feedback in the comments section.
About the Author
Nicola Pietroluongo is an AWS Senior Solutions Architect supporting large enterprises in their strategic implementation solving technical and business challenges. He enjoys sharing innovative ways of tackling problems or delivering ideas. Follow him on Twitter at @niklongstone.