Enriching Snowflake Data with Amazon Location Service and AWS Lambda
By Andrea Amorosi, Sr. Partner Solutions Architect – AWS
By Pascal Vogel, Solutions Architect – AWS
By Bosco Albuquerque, Sr. Partner Solutions Architect – AWS
Organizations are collecting more data than ever, including location data such as street addresses, zip codes, GPS coordinates, and other geospatial data types. Combining this location data with existing business data can lead to the discovery of valuable insights and enables organizations to solve meaningful business challenges.
The integration of geospatial data into the broader business intelligence and decision-making process is referred to as location intelligence. This enables organizations to optimize marketing and sales activities with geo-targeting, improve supply chain and route planning processes, or offer unique localized customer experiences.
While transactional business data is oftentimes readily accessible for consumption by data analysts or downstream business processes through organizational data platforms, it may require enrichment to be actionable for location intelligence. For instance, street addresses may need to be geocoded into coordinates or vice versa for certain data-driven use cases.
On Amazon Web Services (AWS), you can use the Snowflake Data Cloud to integrate fragmented data, discover and securely share data, and execute diverse analytic workloads. Snowflake is an AWS Specialization Partner and AWS Marketplace Seller with the Data and Analytics Competency. It supports analyzing structured and unstructured data and offers comprehensive support for analyzing geospatial data.
This post shows how you can enrich your existing Snowflake data with location-based insights using Amazon Location Service for location intelligence workloads.
Amazon Location Service is a location-based service that developers can use to add geospatial data and location functionality to applications. Customers can visualize data on a map, recommend routes, use geocoding to convert plain text addresses into geographic coordinates, use reverse geocoding to convert latitude and longitude coordinates into addresses, and monitor and track assets such as fleets of vehicles.
Amazon Location Service provides cost-effective location-based services (LBS) using high-quality data from global, trusted providers Esri, HERE, Open Data, and GrabMaps.
External functions are a type of user-defined function (UDF) that calls code stored and executed outside of Snowflake. UDFs allow you to extend built-in system functionality through and can be reused across your code once defined.
Figure 1 – Integration architecture overview.
Implementing the external functions requires resources both in your Snowflake account and outside of Snowflake; in this case in your AWS account.
In your Snowflake account:
- An external function is stored as a database object in Snowflake. Its definition contains the URL of the proxy service endpoint used when the function is called, which arguments it accepts, and which API integration to use to connect to the external service.
- An API integration describes the interface between Snowflake and an external service. This Snowflake object stores configuration and security information necessary to connect to the external service, such as access credentials. In case of this integration, this includes the AWS Identity and Access Management (IAM) role used to set up a trust-relationship for invoking the external service.
In your AWS account:
- Amazon API Gateway acts as a proxy service between Snowflake and the external function implementation in AWS Lambda. It receives and authenticates requests from Snowflake and passes back results.
- Lambda functions implement the external function logic for calling Amazon Location Service. By doing so, they act as the Snowflake remote service.
- AWS Secrets Manager is used to securely store and programmatically access the credentials used to access your Snowflake account.
- AWS AppConfig is used to store securely and programmatically access parameters such as the name of your Snowflake database and warehouse.
Using the Amazon Location Service Integration
Deploying the Integration
Follow the detailed instructions in this GitHub repository to deploy the solution to your AWS and Snowflake accounts. Once deployed, take a look at the available external functions or try out the following examples to get started.
Geocoding Location Data
Amazon Location Service supports geocoding location data, such as determining the coordinates (described as latitude and longitude) from a street address or place name via the SearchPlaceIndexForText API action. This integration supports several location data providers.
Geocoding a Single Address
To use the Esri data provider for geocoding, use the following external function:
In a Snowflake query, you can use the external function as follows:
The result of this query is raw JSON data:
Figure 2 – Result of geocoding a single address.
Take a look at querying semi-structured data to learn more about handling JSON-formatted result data.
Geocoding Multiple Addresses
You can also use the external functions to geocode multiple rows of data in one query. To try this out, create a table to hold sample data:
Insert sample data into the table:
Call the geocoding external function on the data:
As a result, we receive the geocoded
LONGITUDE of the addresses:
Figure 3 – Result of geocoding multiple addresses.
Reverse Geocoding Location Data
Amazon Location Service supports reverse geocoding a location (described by geographic coordinates as latitude and longitude) to a human-readable address via the SearchPlaceIndexForPosition API action.
Reverse Geocoding a Single Coordinate
To reverse geocode a single coordinate—for instance with the Esri data provider—this integration providers the following external function:
The external function returns multiple address data points such as city, municipality, and the full name and address of the point of interest as label. See the documentation for a full overview of the external function response.
Access the results in a Snowflake query as follows:
Figure 4 – Result of reverse geocoding a single coordinate.
Reverse Geocoding Multiple Coordinates
You can also reverse geocode multiple coordinates in a single query. To demonstrate this, empty the sample table and insert some new sample data:
To use Esri to reverse geocode an address from this set of coordinates, call an external function as follows:
As a result, we receive a human-readable address for each coordinate:
Figure 5 – Result of reverse geocoding multiple coordinates.
Best Practices for Location Intelligence at Scale
Managing Amazon Location Service Quotas
The default Amazon Location Service quotas implement rate limits for the API operations performed through the external functions. For instance, the default quotas allow 50 requests per second for both the
Depending on the size of your queries, you need to increase the quotas that apply to your AWS account. Request a quota increase by following the instructions in the Amazon Location Service Developer Guide.
Scaling Your AWS Lambda Functions
With automatic scaling, built-in high availability, and a pay-for-use billing model, AWS Lambda is well-suited as the compute platform for location intelligence workloads.
To ensure your Lambda functions can handle large numbers of invocations—for instance, when geocoding large datasets—make sure to manage Lambda function concurrency appropriately. In Lambda, concurrency is the number of requests your function can handle at the same time. For your Lambda functions, you can configure reserved concurrency and provisioned concurrency.
Configure reserved concurrency to guarantee the maximum number of concurrent instances for the Lambda function. Allocate provisioned concurrency before an increase in invocations to ensure all requests are served by initialized Lambda instances with low latency. Lambda also integrates with application auto scaling, allowing you to manage provisioned concurrency on a schedule or based on utilization.
Learn more about Lambda function scaling and configuring reserved and provisioned concurrency in the AWS Lambda Developer Guide.
Consider enabling API Gateway caching to increase the responsiveness of the integration and to optimize the cost of repeat queries.
Securing Your API Endpoints
The integration presented in this post follows security best practices such as storing your Snowflake credentials in AWS Secrets Manager and utilizing IAM to secure access to resources in your AWS account. To protect your Snowflake account, you should regularly rotate your Snowflake credentials and update them in AWS Secrets Manager.
It’s also possible to set up key pair authentication for your Snowflake account and configure key pair rotation following instructions in the documentation.
Furthermore, set up access logging for the API Gateway with Amazon CloudWatch to keep a record of who accessed your API endpoint and how. For an overview of security recommendations for API Gateway, refer to security best practices in Amazon API Gateway.
Extending the Amazon Location Service Integration
The integration presented in this post serves as a starting point for integrating Snowflake and Amazon Location Service. Depending on your location intelligence use case, there are more capabilities of Amazon Location Service you can leverage.
These include, for example, geofencing, routing capabilities such as determining distance or travel time, generating autocomplete suggestions for locations, and more. Consult the Amazon Location Service API Reference to understand all of the available API actions and data types at your disposal.
This solution can be extended using the Snowflake Native Application Framework which allows user to build, distribute, and use data applications on Snowflake. Application providers or producers can use Snowflake’s core functionality to build data applications, and use Snowflake Marketplace to distribute and monetize them.
These data applications are deployed inside a consumer’s Snowflake account. Data application producers get a wide global audience of Snowflake customers, while customers who consume these data applications keep their data centralized, simplifying their application procurement process.
Location intelligence can contribute to drawing value from your business data by integrating geospatial data into the broader business intelligence and decision-making process.
This post demonstrates how you can enrich your business data stored in the Snowflake Data Cloud with location-based insights using Amazon Location Service for location intelligence workloads.
Consult the GitHub repository for the Amazon Location Service with Snowflake solution presented in this post to learn more about its architecture, capabilities, and for detailed deployment instructions. Open a GitHub issue to provide your feedback or create a pull request to extend the solution.
The Amazon Location Service Developer Guide provides an overview of concepts, functionality, and usage instructions.
To learn more about working with location data in Snowflake, take a look at geospatial data types and the following posts on the Snowflake blog:
- Getting Started with GEOGRAPHY Data—Why Don’t Planes Fly in a Straight Line?
- Getting Started with GEOMETRY Data—When a Map Is More Than a Map
Contact Snowflake to learn more about realizing the full potential of your data for location intelligence workloads and beyond.
Snowflake – AWS Partner Spotlight
Snowflake is an AWS Partner that supports analyzing structured and unstructured data and offers comprehensive support for analyzing geospatial data.