AWS Compute Blog

Building a location-based, scalable, serverless web app – part 2

Part 1 introduces the Ask Around Me web application that allows users to send questions to other local users in real time. I explain the app’s functionality and how using a single-page application (SPA) framework complements a serverless backend. I configure Auth0 for authentication and show how to deploy the frontend and backend. I also introduce how SPA frontends can send and receive data using both a traditional API and real-time messaging via a WebSocket.

In this post, I review the backend architecture, Amazon API Gateway’s HTTP APIs, and the geohashing implementation. The code and instructions for this application are available in the GitHub repo.

Architecture overview

After deploying the application using the repo’s README.md instructions, the backend architecture looks like this:

Ask Around Me backend architecture

The Vue.js frontend primarily interacts with the backend via HTTP APIs using Amazon API Gateway. When users submit questions or answers, the data is sent via the POST API endpoints. When the frontend requests lists of questions or answers, this occurs via the GET API endpoints.

Incoming questions and answers are posted to separate Amazon SQS queues. These queues invoke AWS Lambda functions that process and store the data in the application’s Amazon DynamoDB tables. In the Questions table, the application saves geo-location data and aggregated statistics for each question. The Answers table maintains a record of user IDs and answers to ensure that each user can only post one answer per question.

When new answers are stored in the Answers tables, a DynamoDB stream triggers a Lambda aggregation function with the update. This calculates average scores for questions and aggregates data for the heat map, then stores the result in the main Questions table. When the Questions table is updated, this DynamoDB stream invokes the Publish Lambda function. This publishes updates to the relevant topic in AWS IoT Core, which the front-end application subscribes to.

Using HTTP APIs

API Gateway is a common integration service used between the frontend and backend of serverless web applications. You can choose between the standard REST APIs, and the newer HTTP APIs. The choice depends upon which features you need, and cost considerations for your workload.

This application uses JWT authentication via Auth0 and Lambda proxy integration, and both are supported by HTTP APIs. Many advanced features like API key management, Amazon Cognito integration, and usage plans are not required in this application. It’s also important to compare to cost of each service:

API type Hourly Daily Annually
PUT questions 1,000 24,000 8,760,000
GET questions 50,000 1,200,000 438,000,000
PUT answers 10,000 240,000 87,600,000
Total API requests 534,360,000
REST APIs cost $1,870.26
HTTP APIs $534.36

Using the predicted API usage covered in part 1, you can compare the REST APIs and HTTP APIs overall cost. At an estimated $534 annually, the HTTP APIs option is approximately 30% of the cost of REST APIs.

The AWS Serverless Application Model (SAM) template in the repo defines the HTTP API resource and CORS configuration. It also includes the Auth0 authorizer used to validate each API request:

  MyApi:
    Type: AWS::Serverless::HttpApi
    Properties:
     Auth:
        Authorizers:
          MyAuthorizer:
            JwtConfiguration:
              issuer: !Ref Auth0issuer
              audience:
                - https://auth0-jwt-authorizer
            IdentitySource: "$request.header.Authorization"
        DefaultAuthorizer: MyAuthorizer

      CorsConfiguration:
        AllowMethods:
          - GET
          - POST
          - DELETE
          - OPTIONS
        AllowHeaders:
          - "*"   
        AllowOrigins: 
          - "*"   

With the HTTP API resource defined, each Lambda function has an event configuration referencing this resource. All the functions referencing the HTTP API resource automatically use the Auth0 authorizer.

  GetAnswersFunction: 
    Type: AWS::Serverless::Function
    Properties:
      Description: Get all answers for a question
      ... 
      Events:
        Get:
          Type: HttpApi
          Properties:
            Path: /answers/{Key}
            Method: get
            ApiId: !Ref MyApi    

Using geohashing in web applications

A key part of the functionality in Ask Around Me is the ability to find and answer questions near the user. Given the expected volume of questions in this system, this requires an efficient way to query based upon location that maintains performance as traffic grows.

In a naïve implementation, you might compare the current geographical position of the user with the geo-location of each question and answer in the database. But with an expected 1,000 questions per hour, this would soon become a slow operation with O(n) performance.

A more efficient solution is geohashing. This divides the geographical area of the planet into series of grid cells that are identified by an alphanumeric hash. The first character of the hash identifies one of 32 cells in the grid, roughly 5000 km x 5000 km on the planet. The second character identifies one of 32 squares in that first cell, so combining the first two characters provides a resolution of approximately 1250 km x 1250 km. By the 12th character in the hash, you can identify an area as small as a couple of square inches on Earth. For a more detailed explanation, see this geohashing site.

When using this algorithm, it’s important to choose the correct level of resolution. For Ask Around Me, the frontend searches for questions within 5 miles of the user. You can identify these areas with a 5-character hash. This means you can compare the user’s current location using their geohash, to the geohash stored in the Questions table. This comparison allows you to immediately discard most questions from the search and quickly find the relevant items.

This solution uses the Geo Library for Amazon DynamoDB npm library. Both the GET and POST questions APIs use this library to calculate the geohash when storing and fetching questions. The library requires a dedicated DynamoDB table, which is why user answers are stored in a separate table.

The GET questions API uses the latitude and longitude from the query parameters to query the underlying DynamoDB using this library:

const AWS = require('aws-sdk')
AWS.config.update({region: process.env.AWS_REGION})

const ddb = new AWS.DynamoDB() 
const ddbGeo = require('dynamodb-geo')
const config = new ddbGeo.GeoDataManagerConfiguration(ddb, process.env.TableName)
config.hashKeyLength = 5

const myGeoTableManager = new ddbGeo.GeoDataManager(config)
const SEARCH_RADIUS_METERS = 4000

exports.handler = async (event) => {

  const latitude = parseFloat(event.queryStringParameters.lat)
  const longitude = parseFloat(event.queryStringParameters.lng)

  // Get questions within geo range
  const result = await myGeoTableManager.queryRadius({
    RadiusInMeter: SEARCH_RADIUS_METERS,
    CenterPoint: {
      latitude,
      longitude
    }
  })

  return {
    statusCode: 200,
    body: JSON.stringify(result)
  }
}

The publish/subscribe pattern for real time in web apps

Modern web applications frequently use real-time notifications to keep users informed of state changes. You could achieve this with frequently polling of the APIs to fetch new information. However, this approach is usually wasteful, both in cost and compute terms, because most API calls do not return new information. Additionally, if updates are evenly distributed and you poll every n seconds, there is an average delay of n/2 seconds between data becoming available and your application receiving it.

Instead of polling, a better option for many web applications is a WebSocket. Data availability is closer to real time, and the messaging is less frequent. This can be important for web applications used on mobile devices where unnecessary messaging can impact battery life.

This approach uses the publish-subscribe pattern. The frontend makes subscriptions to a backend service, indicating topics of interest. The backend service receives messages from publishers, which are upstream processes in the application. It filters the messages and routes to the appropriate subscribers.

Although powerful, this can be complex to implement due to connectivity issues over networks. For a web application, users may turn off their devices, disconnect Wi-Fi, or become unreachable due to limited coverage. This pattern is generally forward-only, meaning you only receive messages after the point of subscription.

AWS IoT Core simplifies this process, and the JavaScript SDK handles the common reconnection issues. The backend application sends messages to topics in AWS IoT Core, and the frontend application subscribes to topics of interest. The service maintains the list of active publishers and subscribers, and routes messages between the two. It also automatically manages fan-out, which occurs when there are many subscribers to a single topic.

From a pricing perspective, this is also a cost-efficient approach. At the time of writing, AWS IoT Core costs $0.08 per million minutes of connection, and $1.00 per million messages. There are also no servers to manage, and the service scales automatically to handle your application’s load.

In the example application, the real-time connection is configured and managed in a single component, IoT.vue. This initiates a connection to an IoT endpoint when the application first starts, and listens for messages on subscribed topics. It passes data back to the global Vuex store so other components automatically receive updates with no dependency on the IoT component.

Choosing publish-subscribe topics for web apps

In a typical synchronous API call, the client application makes a specific request and receives a response from a backend service. With a topic-based subscription, the topic itself is the equivalent of the request, but you usually don’t receive immediate information.

In this web application, there are a number of topics that are potentially important to users. Some topics are shared across multiple users, while other are private to a single user:

  • Account-level topic: messages relating only to a single user ID, such as billing and notifications. These are intended for any devices where that user is logged in.
  • Per-question topic: when a user asks a question, they need alerts when new answers arrive. Each question ID maps to an individual topic. Anyone who asks or watches a question subscribes to this topic.
  • Geo-fenced alert topic: a user receives alerts when new questions are asked in their local area. In this case, the geohash of their location is the topic identifier. New questions are published to their geohash topics, and users within the same geohash area receive those messages.
  • A system-wide topic: this is a single topic that all users subscribe to. This is reserved for important messages for all application users.

In web applications, you subscribe to some topics when the application initializes, such as account-level or system-wide topics. Other subscriptions are dynamic. For example, you subscribe to a question ID topic only after posting a question, or subscribe to different geo-fence hashes when the user’s location changes.

Conclusion

This post explores the backend architecture of the Ask Around Me application. I compare the cost and features in deciding between REST APIs and HTTP APIs in API Gateway. I introduce geohashing and the npm library used to handle geo-location queries in DynamoDB. And I show how you can build real-time messaging into your web applications using the publish-subscribe pattern with AWS IoT Core.

Continue to part 3, where I cover queuing messages, data aggregation, and deploying the application with AWS Amplify Console.