AWS Partner Network (APN) Blog
Batch Geocoding with Precisely Geo Addressing on Amazon EMR
By Diana Smith, Principal Sales Engineer – Precisely
By Mayank Kasturia, Solution Architect – Precisely
By Tamara Astakhova, Sr. Partner Solution Architect – AWS
Accurate address data is vital for business decision-making. Inaccurate addresses can compromise business analytics, increasing risks, fraud potential, and costs – issues that multiply when handling large volumes of data. Geo Addressing, which combines address verification, geocoding, and autocomplete solutions, offers a comprehensive approach to these challenges. Precisely helps accelerate the use of address data and quickly associate relevant contextual information to power faster, more confident decision-making with Geo Addressing solutions.
Precisely, a global leader in data integrity, provides data accuracy, consistency, and context for 12,000 customers worldwide. Precisely is an AWS Data and Analytics ISV Competency and AWS Migration and Modernization ISV Competency partner with service specializations in Amazon Redshift and Amazon Relational Database Service (Amazon RDS).
This post describes how users can run the Precisely Geo Addressing solution on Amazon EMR to take large address datasets, create an accurate master dataset, and extract valuable metadata for further analysis at scale.
Batch Geo Addressing Solution Overview
Amazon EMR is a managed cluster platform that simplifies running big data frameworks. Coupled with Precisely’s Geo Addressing solution and vast enrichment data portfolio, it lets users quickly take massive volumes of address data, assign a location and unique identifier to each address, and enrich the results using simple joins.
This solution is also easily deployable on Amazon EMR Serverless.
Figure 1 – Geo Addressing on Amazon EMR Architecture
Solution requirements:
- A Precisely Geo Addressing license and access to the Precisely Data Experience.
- An AWS Account with PowerUserAccess to the AWS Management Console.
- Basic knowledge of AWS and working knowledge of geo addressing concepts.
Get started with Precisely Geo Addressing on Amazon EMR in a few simple steps:
- Step 1: Retrieve the software distribution and reference data from the Precisely Data Experience.
- Step 2: Create an EMR cluster and select Spark as the application to install on the cluster. Follow Amazon EMR security best practices to meet the security and compliance objectives for your business. The Precisely Geocoding Spark driver application, pre-built and delivered with the Spectrum Geocoding for Big Data toolkit, executes the geocoding Spark job. Ensure the Geo Addressing software distribution and geocoding reference data are stored in Amazon Simple Storage Service (Amazon S3). To adhere to the principle of least privilege, ensure users and services in this solution have only the minimal access necessary. To deploy the Geocoding application on an EMR cluster, attach the “AmazonElasticMapReduceRole” policy to your role.
- Step 3: Submit the Spark job to the cluster via the command line or a script. Spark 2 and Spark 3 are both supported, and the distribution includes a driver jar file for each version. The user specifies the input, the Spark job codes, and the location of the output data, which can be stored in Amazon S3, Hadoop Distributed File System (HDFS), or another location. The result of a geocoding Spark job is a CSV or Parquet-formatted file with columns from the input and additional requested output fields written back to S3. Refer to the user guide for a sample script and instructions.
Terminate the cluster and remove associated resources to avoid unnecessary charges upon completion.
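The job submission in Step 3 can be sketched as a small Python helper that assembles a spark-submit command. This is a minimal illustration, not the script from the user guide: `--master`, `--deploy-mode`, and `--class` are standard spark-submit options, while the jar path, the main class name, the bucket names, and the `--input`/`--output` application arguments are hypothetical placeholders. Use the values documented in the Precisely user guide for a real run.

```python
import shlex


def build_spark_submit(driver_jar, main_class, input_path, output_path, extra_args=()):
    """Assemble a spark-submit command line as a list of arguments."""
    command = [
        "spark-submit",
        "--master", "yarn",            # EMR runs Spark on YARN
        "--deploy-mode", "cluster",
        "--class", main_class,
        driver_jar,
        # Everything after the jar is passed to the driver application.
        # These option names are hypothetical; use those in the user guide.
        "--input", input_path,
        "--output", output_path,
    ]
    command.extend(extra_args)
    return command


command = build_spark_submit(
    driver_jar="s3://example-bucket/geocoding/driver-spark3.jar",  # hypothetical path
    main_class="com.example.GeocodingDriver",                      # hypothetical class
    input_path="s3://example-bucket/input/addresses.csv",
    output_path="s3://example-bucket/output/geocoded/",
)

# Print a shell-safe version that could be pasted into a terminal or script.
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument intact if a path contains spaces, and the same list can be passed to `subprocess.run` on the cluster's primary node.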
Sample Output of the Geo Addressing Solution
The result of the geocoding Spark job is a CSV file with columns from the input and requested output fields indicating how well the input address matched Precisely’s address database. Response fields to pay attention to include Match Code and Location Code, which provide indicators of the address’s positional accuracy and the strength of the database match. The solution also provides the PreciselyID, a unique and persistent identifier for data enrichment and management. See Table 1 below for a description of each component.
Response Field | Description |
Match Code | Indicates which portions of the address matched to a record in the database. In this example, S903 signifies the address was standardized to USPS data, with changes to the ZIP+4, the street type, the street name, and the predirectional/postdirectional required to get a match. |
Location Code | Indicates the locational accuracy of the assigned geocode. In this example, AP05 signifies a point-level geocode at the structure centroid. This is the highest accuracy available. |
PreciselyID | A unique and persistent identifier for data management and simple joining to Precisely’s extensive data portfolio. |
Table 1 – Key Response Fields
For example, for the fabricated input address 49 RANDOM ST, UNIT 100, ANYCITY, CA, the Precisely Geo Addressing solution returns the following address and key metadata:
Field | Response |
Formatted Address | 49 RANDOM ST, UNIT 100, ANYCITY, CA |
Match Code | S903 |
Location Code | AP05 |
Latitude | 41.283772 |
Longitude | -72.811657 |
PreciselyID | P000046IAA3W |
Table 2 – Sample Data with Key Response Fields
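A quick way to review a batch run is to tally how many rows fall under each Match Code and Location Code combination. The sketch below does this with the Python standard library, using the fabricated row from Table 2 as stand-in data. The column names (`match_code`, `location_code`, and so on) are assumptions for illustration; the actual header names depend on the output fields requested for the job.

```python
import csv
import io
from collections import Counter

# Stand-in for a geocoding output file. Column names are hypothetical.
SAMPLE_OUTPUT = """address,match_code,location_code,latitude,longitude,precisely_id
"49 RANDOM ST, UNIT 100, ANYCITY, CA",S903,AP05,41.283772,-72.811657,P000046IAA3W
"""


def summarize_codes(csv_text):
    """Count output rows per (match code, location code) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["match_code"], row["location_code"]) for row in reader)


summary = summarize_codes(SAMPLE_OUTPUT)
for (match_code, location_code), count in summary.items():
    print(match_code, location_code, count)
```

On a full dataset, a summary like this makes it easy to see what share of addresses received the most precise codes and which groups may need manual review.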
The PreciselyID can be used to accelerate enrichment by joining to hundreds of other datasets to add context. Below are sample attributes added using a simple join on the PreciselyID.
Property Attributes | |
PreciselyID | P000046IAA3W |
Property Attributes ID | A00000GDSD45 |
Building Name | Smith Tower |
Square Footage | 710 |
Type | MCE |
Owner Name | John Smith |
USPS Delivery Point Validation | |
Deliverable | Yes |
Fire Protection | |
Fire Dept ID | 107532 |
Type | Mostly Volunteer |
Drive Distance | 0.75 |
Drive Time AM Peak | 2.4 |
Table 3 – Sample Attributes
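The join on the PreciselyID can be illustrated with plain Python dictionaries. This is a simplified sketch using the fabricated values from Tables 2 and 3, with hypothetical field names; at scale, the same left join would typically be expressed in Spark or SQL.

```python
# Geocoded records keyed by PreciselyID (values taken from Table 2).
geocoded = [
    {"address": "49 RANDOM ST, UNIT 100, ANYCITY, CA", "precisely_id": "P000046IAA3W"},
]

# Enrichment attributes keyed by the same identifier (values from Table 3).
property_attributes = [
    {
        "precisely_id": "P000046IAA3W",
        "building_name": "Smith Tower",
        "square_footage": 710,
        "owner_name": "John Smith",
    },
]


def enrich(records, attributes, key="precisely_id"):
    """Left-join attributes onto records using a shared key."""
    lookup = {row[key]: row for row in attributes}
    enriched = []
    for record in records:
        # Records without a matching attribute row are kept unchanged.
        match = lookup.get(record[key], {})
        enriched.append({**match, **record})
    return enriched


enriched = enrich(geocoded, property_attributes)
print(enriched[0]["address"], "->", enriched[0]["building_name"])
```

Because the identifier is unique and persistent, the lookup is a simple exact-match key comparison, with no fuzzy address matching needed at join time.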
Benefits of Using the Precisely Geo Addressing Solution and Data Enrichment Capabilities on AWS
- Users can reap the benefits of AWS cloud deployment and cloud-native technology, including elasticity, high availability, scaled costs, and more to accelerate workflows.
- Users can quickly and efficiently boost their cloud-native microservices with location capabilities, including address management, geo addressing, and location analytics.
- Users benefit from Precisely’s address verification capabilities, high-accuracy geocoding, and high-precision location coordinates (lat/long), ensuring trusted data for confident business decisions.
- Users can accelerate workflows from hours to seconds using the PreciselyID. Using a unique and persistent ID simplifies joining multiple datasets, adds context to business data, and reveals hidden relationships, helping to organize, manage, analyze, and visualize business data.
Customer Case Study: Accelerated Market Value Determination Process for Major Mortgage Provider
A leading mortgage provider wanted to enhance access to affordable mortgage options across diverse markets. The company recognized the importance of accurate market value determination for efficient loan processing and sought a solution to expedite this critical process.
Using the Precisely Geo Addressing solution on Amazon EMR, the company significantly improved the accuracy and speed of determining property values. The robust combination of Precisely’s advanced geocoding capabilities and Amazon EMR’s scalable and flexible infrastructure allowed the company to process vast amounts of property data efficiently. This capability can provide throughput of up to 200 transactions per second per core, drastically reducing processing time. With precise geolocation data and comprehensive address verification, the company was able to automate and streamline its market value determination workflows and improve operational efficiency. This reduced manual effort, eliminated errors, and enabled faster loan approvals, enhancing customer satisfaction and trust in the company’s services.
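To put the throughput figure in perspective, a back-of-the-envelope estimate of batch run time can be computed as follows. The sketch treats 200 transactions per second per core as an upper bound and assumes ideal linear scaling across cores, so real runs will generally take longer. The address count and core count in the example are hypothetical.

```python
def estimated_seconds(address_count, cores, tps_per_core=200):
    """Best-case run time in seconds, assuming linear scaling across cores."""
    if cores <= 0 or tps_per_core <= 0:
        raise ValueError("cores and tps_per_core must be positive")
    return address_count / (cores * tps_per_core)


# Hypothetical workload: 10 million addresses on 64 cores.
seconds = estimated_seconds(10_000_000, cores=64)
print(f"{seconds:.2f} seconds, or about {seconds / 60:.1f} minutes")
```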
The company expanded its mortgage offerings to previously underserved markets, providing affordable financing options to a broader range of customers. The accelerated market value determination process allowed it to process loan applications with greater agility, ensuring timely decisions and faster closings. This success story is a testament to the transformative power of leveraging cutting-edge technology to achieve business objectives while delivering exceptional customer experiences.
Conclusion
In this post, you learned how Precisely’s Geo Addressing capabilities can be deployed on AWS infrastructure to accelerate analysis and power real-time decision-making.
For more information about the solution or to see a demonstration, please contact Precisely or test a few of your addresses on the sample site.
Precisely – AWS Partner Spotlight
Precisely is an AWS Advanced Technology Partner and AWS Competency Partner that provides data accuracy, consistency, and context for 12,000 customers worldwide. Precisely’s data integration, data quality, data governance, location intelligence and data enrichment products power better business decisions that create better outcomes. Together, AWS and Precisely deliver the flexibility and agility you need to align real-time data.