Develop and extract value from open data

Open data is fostering new opportunities for innovation, both in terms of entrepreneurship and public service. AWS embraces open data, providing the tools to develop and extract value in a single place. This includes direct hosting of public datasets at no cost on Amazon Simple Storage Service (Amazon S3).

In this blog post, we explore a use case for government organizations using the OpenStreetMap (OSM) dataset, a free, editable map of the world, created and maintained by volunteers and available for use with an open license. Using open source tools, we generate and render custom maps for a government’s digital properties. By leveraging Amazon S3, Amazon EC2, Amazon ECS, and multi-tiered architectures, a map tile server can run on an efficient and highly available infrastructure.

Generate and render map tiles with OpenStreetMap

Governments often provide geographic information to users on their webpages. A country’s ministry of foreign affairs may display a map with the location and contact information of each of its global embassies. In other cases, cities may use maps to provide directions and other details about tourist attractions.

OSM can provide agencies with geographic services with no data licensing costs, and with full control over how they use the data. This blog post will explain how to use Amazon S3 to access OSM data, how to use an EC2 instance to generate and compute OSM map tiles, and how to build a multi-tier, highly available architecture to serve map content.

The process to generate the tiles from OSM requires a number of open source tools. In addition to a PostgreSQL database, this includes:

PostGIS extensions: a spatial database extender for the PostgreSQL object-relational database
osm2pgsql: a tool that converts OSM data to PostGIS-enabled PostgreSQL databases
Mapnik: a map-rendering toolkit that includes bindings in Node, Python, and C++
renderd: a rendering daemon used with Mapnik and OSM
mod_tile: an Apache module that renders and serves map tiles
OpenLayers: a mapping library that includes markers and tiled layers

Using Docker containers to render OpenStreetMap tiles

OSM provides an overview on how to install, configure, and use these tools from scratch to get the tiles rendered. For the purpose of this post, we use a pre-built container image with the set of tools needed to create and generate the map tiles from the OSM data.

For this example, let’s use a container image from the GitHub community, built by the National Center for Atmospheric Research Earth Observing Laboratory (NCAR EOL) and based on the openstreetmap-docker-files image. The process is outlined below:

To begin, we need to start a Linux EC2 instance, which is where the rendering job will take place. After the tiles are created, they can be moved to S3, and the EC2 instance can be turned off until the next time we need to generate tiles.

Since the tile-rendering job is CPU-intensive, we recommend using an instance from the compute optimized C family. We also need an EBS volume attached to the instance with enough capacity for the raw PBF OSM map data.

Estimating infrastructure requirements before rendering tiles

In this example, we will render OSM PBF data from Spain, obtained from Geofabrik, which provides OSM map extracts. While we’re focusing on a single country, S3 provides direct access to data on a planetary level, accessible with any S3-compatible tool and through the same rendering process.

We need around 100 GB of storage for Spain, but if you are going to build the whole planet, plan for up to 400 GB. As shown in the image below, an EBS volume of 500 GiB is more than enough space to store the raw PBF data and the PostgreSQL database. Once the volume is created in the console, you need to attach it to the instance.

With our instance ready and the EBS volume attached, there are two steps required before we can begin the actual rendering job:

Install Docker on our Amazon Linux instance.
Move the PBF file with the OSM data to the newly created EBS volume (you can download it using the link provided by Geofabrik, or with the AWS CLI S3 tool by following the S3 instructions at the OpenStreetMap on AWS information site).

The container image runs with Docker Compose, so we also need to install Docker Compose on our instance.

Time to render

Now, we are ready to start the rendering. Initialize the PostgreSQL database with the following command:

docker-compose run osm initdb

After the database is ready, we can start to import the PBF data into the database with the following command:

docker-compose run osm import

Below is a partial summary of the imported Spain PBF file:

Processing: Node(75938k 303.8k/s) Way(6063k 13.21k/s) Relation(161710 388.73/s)  parse time: 1125s
All indexes on  planet_osm_point created  in 180s
Completed planet_osm_point
Creating osm_id index on  planet_osm_polygon
Creating indexes on  planet_osm_polygon finished
All indexes on  planet_osm_polygon created  in 208s
Completed planet_osm_polygon
Creating osm_id index on  planet_osm_line
Creating indexes on  planet_osm_line finished
All indexes on  planet_osm_line created  in 257s
Completed planet_osm_line

The process may take more or less time depending on the instance type selected and the size of the PBF file. Our example takes around seven hours. When all the data has been imported, we are ready to render the tiles. The Docker Compose command for the task is:

docker-compose run osm render

Note: At this stage, we need to take into account the NCAR EOL warning in the container image wiki. Ensure that the /var/lib/mod_tile directory exists and that the container’s www-data user has write permissions on it before rendering. This can be done by opening a shell in the container:

$ docker-compose run osm bash
docker # mkdir -p /var/lib/mod_tile/default
docker # chown www-data /var/lib/mod_tile/default

Assess the output

Total for all tiles rendered
renderd[36]: DEBUG: Connection 0, fd 8 closed, now 0 left
Meta tiles rendered: Rendered 349528 tiles in 26755.87 seconds (9.78 tiles/s)
Total tiles rendered: Rendered 22369792 tiles in 26755.87 seconds (625.63 tiles/s)
Total tiles handled: Rendered 349528 tiles in 26755.87 seconds (9.78 tiles/s)
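
The totals in this log can be cross-checked with a little arithmetic. The sketch below assumes that renderd groups tiles into 8x8 metatiles and that the job covered the whole world grid from zoom 0 through zoom 12. Both are our assumptions, inferred from the totals rather than stated in the log, and the function names are illustrative only.

```javascript
// Assumption: renderd renders tiles in square metatiles of 8x8 tiles.
const METATILE_SIDE = 8;

// Metatiles needed to cover the whole world grid at one zoom level.
function metaTilesAtZoom(zoom) {
  const tilesPerAxis = Math.pow(2, zoom);
  const metaPerAxis = Math.ceil(tilesPerAxis / METATILE_SIDE);
  return metaPerAxis * metaPerAxis;
}

// Metatiles needed for every zoom level from 0 up to and including maxZoom.
function metaTilesUpToZoom(maxZoom) {
  let total = 0;
  for (let z = 0; z <= maxZoom; z++) {
    total += metaTilesAtZoom(z);
  }
  return total;
}

// Totals copied from the log above.
const metaTilesRendered = 349528;
const tilesRendered = 22369792;

console.log(tilesRendered / metaTilesRendered); // 64 tiles per metatile (8 x 8)
console.log(metaTilesUpToZoom(12));             // 349528, the metatile total in the log
```

Under these assumptions, the tile total in the log is simply the metatile total multiplied by 64.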

Serve the map tiles

At this point, the tiles are rendered and we are ready to start serving them. The container image also contains an Apache server with the mod_tile module. Let’s bring it up with:

docker-compose up osm

We can now connect to the instance at port 8000 and check that the map displays correctly, as shown below. Remember to allow connectivity on port 8000 by opening the port in the instance’s security group.

We can zoom in and out of the map, up to the maximum zoom level we specified in the Docker Compose YAML file. Let’s zoom in on Spain’s capital, Madrid.

Specify points of interest

As a final step, we are going to set markers on the map to specify points of interest. For this task, OpenLayers is the tool of choice. With OpenLayers, we can add div elements into webpages containing maps with markers and related information.

The best way to show this at work is with a sample HTML/JS file, included below. It is important to note the format of the OpenLayers.Layer.OSM directive:

var newL = new OpenLayers.Layer.OSM("Default", "/osm_tiles/${z}/${x}/${y}.png", {numZoomLevels: 12});

OpenLayers connects to the tile server, in this case located on the local host at the path /osm_tiles, using the tile pattern ${z}/${x}/${y}.png. You should change the path to the URL of your own tile server. For more information, check the OpenLayers library documentation for Layer.OSM.
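
To make the tile pattern concrete, here is a minimal sketch of how a longitude and latitude map to the z, x, and y values in the URL. It assumes the tile server follows the common slippy-map tile numbering used by OSM tile servers. The helper names lonLatToTile and tileUrl are hypothetical and are not part of OpenLayers.

```javascript
// Convert a WGS 84 longitude/latitude (degrees) to tile indices at a zoom level.
// Assumption: standard slippy-map numbering, with x growing eastward and
// y growing southward from the top-left corner of the world.
function lonLatToTile(lon, lat, zoom) {
  const n = Math.pow(2, zoom); // number of tiles per axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  const rawX = Math.floor(((lon + 180) / 360) * n);
  const rawY = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so that edge coordinates (for example, lon = 180) stay in range.
  const x = Math.min(n - 1, Math.max(0, rawX));
  const y = Math.min(n - 1, Math.max(0, rawY));
  return { z: zoom, x: x, y: y };
}

// Fill the ${z}, ${x}, and ${y} placeholders of an OpenLayers-style template.
function tileUrl(template, tile) {
  return template
    .replace("${z}", String(tile.z))
    .replace("${x}", String(tile.x))
    .replace("${y}", String(tile.y));
}

// Example: the tile that contains the Madrid marker used later in this post.
const madridTile = lonLatToTile(-3.6896, 40.4531, 10);
console.log(tileUrl("/osm_tiles/${z}/${x}/${y}.png", madridTile));
```

OpenLayers performs this lookup internally for every visible tile, so a sketch like this is only useful for checking that a given tile exists on the server.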

<html>
  <head>
    <title>Dan OSM in AWS Blog</title>
    <style type="text/css">
      html, body, #basicMap {
          width: 100%;
          height: 100%;
          margin: 0;
      }
    </style>
    <script src="http://www.openlayers.org/api/OpenLayers.js"></script>
    <script>
      function init() {
           var options = {
                projection: new OpenLayers.Projection("EPSG:900913"),
                displayProjection: new OpenLayers.Projection("EPSG:4326"),
                units: "m",
                maxResolution: 156543.0339,
                maxExtent: new OpenLayers.Bounds(-20037508.34, -20037508.34,
                                                 20037508.34, 20037508.34),
                numZoomLevels: 12,
                controls: [
                        new OpenLayers.Control.Navigation(),
                        new OpenLayers.Control.PanZoomBar(),
                        new OpenLayers.Control.Permalink(),
                        new OpenLayers.Control.ScaleLine(),
                        new OpenLayers.Control.MousePosition(),
                        new OpenLayers.Control.KeyboardDefaults()

                  ]
            };
        var map = new OpenLayers.Map("basicMap", options);
        // Tile layer served by our own tile server (change the path to your tile URL)
        var newL = new OpenLayers.Layer.OSM("Default", "/osm_tiles/${z}/${x}/${y}.png", {numZoomLevels: 19});
        map.addLayer(newL);
        map.zoomIn();
        // Marker position as longitude and latitude in WGS 1984 degrees
        var lonLat = new OpenLayers.LonLat(-3.6896, 40.4531)
          .transform(
            new OpenLayers.Projection("EPSG:4326"), // transform from WGS 1984
            map.getProjectionObject() // to Spherical Mercator Projection
          );
        var zoom = 10;
        // Marker layer holding the point of interest
        var markers = new OpenLayers.Layer.Markers("Markers");
        map.addLayer(markers);
        markers.addMarker(new OpenLayers.Marker(lonLat));
        map.setCenter(lonLat, zoom);

      }
    </script>
  </head>
  <body onload="init();">
    <div id="basicMap"></div>
  </body>
</html>

The marked point on the map is defined in the variable lonLat, and the new marker layer is built along with it. The coordinates must be transformed from WGS 1984 geographic coordinates to the Web Mercator (Spherical Mercator) projected coordinate reference system, which is the usual standard for web mapping apps.
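
For readers curious about what the transformation does, the following sketch reproduces the Spherical Mercator formulas in plain JavaScript. It is an illustration of the math, not the code OpenLayers runs. The function names are hypothetical, and the 256-pixel tile size is an assumption.

```javascript
// WGS 84 semi-major axis in meters, used as the sphere radius in Spherical Mercator.
const EARTH_RADIUS_M = 6378137;

// Convert WGS 84 longitude/latitude (degrees) to Spherical Mercator meters.
function lonLatToSphericalMercator(lon, lat) {
  const x = (EARTH_RADIUS_M * lon * Math.PI) / 180;
  const y = EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + (lat * Math.PI) / 360));
  return { x: x, y: y };
}

// Map units (meters) per pixel at a zoom level, assuming 256-pixel tiles.
function resolutionAtZoom(zoom) {
  const worldWidth = 2 * Math.PI * EARTH_RADIUS_M;
  return worldWidth / 256 / Math.pow(2, zoom);
}

console.log(lonLatToSphericalMercator(180, 0).x.toFixed(2)); // 20037508.34
console.log(resolutionAtZoom(0).toFixed(4));                 // 156543.0339
console.log(lonLatToSphericalMercator(-3.6896, 40.4531));    // the Madrid marker in meters
```

The first two values match the maxExtent and maxResolution settings in the sample page, which is why those constants appear in its options.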

In the example, we provided the longitude and latitude of one of the most representative business districts in Madrid. The figure below is the result of the code and you can see the marker showing up with the map.
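
A page such as the embassy map mentioned earlier needs more than one marker. The sketch below wraps the same OpenLayers calls used in the sample page in a small helper. The helper name addPointsOfInterest and the shape of the points array are our own assumptions, and the function expects the OpenLayers library to be loaded already.

```javascript
// Add one marker per point of interest to an existing OpenLayers map.
// `map` is an OpenLayers.Map; `points` is an array of {lon, lat} objects
// in WGS 1984 degrees. Returns the marker layer that was created.
function addPointsOfInterest(map, points) {
  var wgs84 = new OpenLayers.Projection("EPSG:4326");
  var markers = new OpenLayers.Layer.Markers("Points of interest");
  map.addLayer(markers);
  points.forEach(function (point) {
    // Transform each point from WGS 1984 to the projection used by the map.
    var lonLat = new OpenLayers.LonLat(point.lon, point.lat)
      .transform(wgs84, map.getProjectionObject());
    markers.addMarker(new OpenLayers.Marker(lonLat));
  });
  return markers;
}

// Usage inside init(), after the map and the tile layer are created:
//   addPointsOfInterest(map, [{ lon: -3.6896, lat: 40.4531 }]);
```

Keeping the points in an array makes it straightforward to load them from a separate data file later.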

With the tiles generated and rendered, we are set to start serving maps by deploying them to Apache servers using mod_tile. It is important to consider the right architectures to provide geographic services, or embed them into webpages across digital properties.

Choose the right architecture

AWS helps organizations build a highly available and efficient multi-tier architecture for serving maps as part of their application logic. All the tile data can be transferred to Amazon S3 for persistent and durable storage.

Amazon CloudFront, the AWS global content delivery network service that securely delivers data, videos, applications, and APIs to viewers with low latency and high transfer speeds, can help to make the user experience faster and smoother, while also being cost-effective.

The Apache mod_tile module is capable of serving tiles stored in S3, so all Apache servers can share a common tile store instead of each server holding its own copy of the tiles.

With AWS, we can launch servers in EC2 instances leveraging AWS Availability Zones for increased availability of the service. A draft architecture is shown in the figure below.

We could also adopt a microservice-oriented architecture. This would entail customizing the container image that we used for the rendering, so that tile serving becomes a microservice within the application. Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Container Service for Kubernetes (Amazon EKS) are also services worth considering.

The potential of open data

Combining AWS and OSM, we have created a solution that can serve maps on government websites and can also be used as the basis for government organizations to create other map-based services for their citizens. OpenCycleMap is one such OSM-based service: it displays bike routes around the world, and cities could provide a similar service using the approach we’ve shown here.

This is just one example of what AWS can deliver with open data. With new public datasets increasingly becoming available on AWS, the opportunities for new, innovative public services are endless.
