ShopFully Improves Marketing Campaign Efficiency by 6x Using AWS Glue

2021

Italian technology company ShopFully wanted to maximize the processing speeds for large amounts of data that were ingested from local shoppers’ use of its app across hundreds of local regions. To manage its data, the company had been using a legacy data warehousing infrastructure and relational database, but these couldn’t scale to the company’s growing demands efficiently and affordably. To meet its goal of processing more than 100 million events in under 20 minutes, ShopFully built a new solution using Amazon Web Services (AWS). ShopFully significantly improved its ability to adjust its marketing campaigns in near real time, optimized its data processing times, and found a cost-effective way to scale automatically as it grows.


“Using AWS Glue has been an integral part of our data integration strategy. Our retailers and brands have seen new metrics, and our use of AWS is helping us to increase performance.”

Giuliano Formato
Head of Data Engineering, ShopFully

Searching for Speed and Scalability

Founded in 2011 and headquartered in Milan, Italy, ShopFully simplifies local shopping for both retailers and customers by providing a way for local stores to show their prices, products, promotions, and other information through a mobile app or online. More than 320,000 local stores use ShopFully to reach 40 million consumers who are looking for the best deals from nearby stores. In turn, consumers save money and time using the information that ShopFully provides, which a shopper might otherwise have to find in printed catalogs. ShopFully also offers its own marketing intelligence platform, called Hi!, which helps more than 700 partners globally create and deliver digital campaigns to multiple channels, including ShopFully’s own online marketplaces.

In looking to build a new real-time architecture, ShopFully hoped to remove the burden of supporting real-time applications from its data warehouse and free up capacity for analytics. Its existing extract, transform, load (ETL) tool required a lengthy process of extracting data from the original source, cleaning it, and reloading it into the target relational database. This resulted in frequent data load issues that required manual intervention to resolve. Plus, although campaigns needed to be managed in near real time, ShopFully required 2 hours to load information into its data warehouse for each individual campaign, making it difficult for the company to meet the parameters of its service-level agreements. Furthermore, the relational database had become expensive to use at scale for a company operating from eight offices on three continents, organizing hundreds of hyperlocal advertising campaigns every month, and fielding hundreds of millions of responses a day.

Finding Efficiency Using AWS Glue

ShopFully created a new architecture that was six times quicker than its previous data pipeline process, achieving its goal of processing 100 million events in under 20 minutes. The company also decreased the cost of running its pipeline by 30 percent. The pipeline begins with the users of ShopFully’s website and mobile app, which is where the company delivers its marketing campaigns. These services track the activity of users and send events in real time to a set of data endpoints. To move its data ingestion endpoints closer to users, ShopFully uses Amazon CloudFront, a content delivery network service built for high performance, security, and developer convenience. And behind these endpoints, the company uses Lambda@Edge, a feature of Amazon CloudFront that lets developers run code closer to users of their applications, which improves performance and reduces latency. ShopFully doesn’t have to provision or manage infrastructure, and it pays only for the compute time it uses, which saves on costs. Capturing and analyzing data from the edge—such as web click data or the time a user spends reviewing a page—helps the company reveal insights to improve the performance of its campaigns. ShopFully’s solution uses Lambda@Edge to push all of the data to Amazon Data Firehose—an easy way to reliably load streaming data into data lakes, data stores, and analytics services—which then offloads the data securely without loss. The solution then stores that data in raw format using Amazon Simple Storage Service (Amazon S3), an object storage service offering industry-leading scalability, data availability, security, and performance.
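
The ingestion path described above (edge function to Amazon Data Firehose to Amazon S3) can be sketched roughly as follows. This is a minimal illustration and not ShopFully's code: the stream name `events-raw`, the Region, the captured fields, and the choice of a viewer-request trigger are all assumptions. The Firehose client is passed in so the logic can be exercised without AWS credentials; `put_record` is the boto3 Firehose call for sending a single record.

```python
import json
import time

# Hypothetical delivery stream name; the case study does not name ShopFully's resources.
STREAM_NAME = "events-raw"


def build_record(cf_request, now=None):
    """Turn a CloudFront request into one newline-delimited JSON record."""
    payload = {
        "uri": cf_request["uri"],
        "query": cf_request.get("querystring", ""),
        "client_ip": cf_request.get("clientIp"),
        "received_at": int(now if now is not None else time.time()),
    }
    # The trailing newline keeps records separable once Firehose
    # concatenates them into larger objects in Amazon S3.
    return (json.dumps(payload, sort_keys=True) + "\n").encode("utf-8")


def handler(event, context, firehose=None):
    """Lambda@Edge-style entry point: forward one tracking hit to Firehose."""
    if firehose is None:
        # Imported lazily so the sketch also runs where boto3 is not installed.
        import boto3
        firehose = boto3.client("firehose", region_name="eu-west-1")  # assumed Region

    request = event["Records"][0]["cf"]["request"]
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": build_record(request)},
    )
    # Answer at the edge with an empty response instead of contacting an origin.
    return {"status": "204", "statusDescription": "No Content"}
```

Reading the event from the query string keeps the sketch simple. A production function would also need error handling and a decision about what to do when the Firehose call fails, which the case study does not describe.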

The company runs hundreds of thousands of marketing campaigns every year, and by using AWS, it can now process each campaign separately to track performance. At the core of ShopFully’s processing capabilities is AWS Glue, a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. By using AWS Glue, ShopFully can automate the ETL services that had taken up time on its previous solution and create individual clusters for each campaign, reducing latency and efficiently handling petabytes of data. Furthermore, AWS Glue uses job bookmarks to track data that has been processed by a previous run of an ETL job. Bookmarks prevent the reprocessing of old data, meaning that ShopFully saves time and money by automatically tracking new files instead of doing it manually. The framework that ShopFully uses in its AWS Glue–managed ETL is Apache Spark, an analytics engine for large-scale distributed data processing that runs alongside ShopFully’s AWS technology stack, performing complex calculations in near real time using hundreds of metrics that provide insights on campaign efficiency. Using the functionality of AWS Glue helps ShopFully complete data processing and meet deadlines outlined in its service-level agreements. “Using the faster performance of AWS Glue, the management of our advertising campaigns has shifted from batch to near real time,” says Giuliano Formato, head of data engineering at ShopFully. “Our pipelines are six times faster, and our teams focus on business outcomes instead of server maintenance.”
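
In AWS Glue itself, bookmarks are enabled with the job parameter `--job-bookmark-option job-bookmark-enable`, and the service maintains the state. The underlying idea can be illustrated with a toy model in plain Python. The sketch below is not AWS Glue's implementation: the `Bookmark` class, the JSON state file, and the object keys are hypothetical and exist only to show how remembering processed inputs avoids reprocessing old data.

```python
import json
from pathlib import Path


class Bookmark:
    """Toy stand-in for a job bookmark: remembers which input keys were processed."""

    def __init__(self, state_path):
        self.state_path = Path(state_path)
        if self.state_path.exists():
            self.seen = set(json.loads(self.state_path.read_text()))
        else:
            self.seen = set()

    def new_keys(self, listed_keys):
        """Return only the keys that no committed run has handled, in sorted order."""
        return sorted(k for k in listed_keys if k not in self.seen)

    def commit(self, processed_keys):
        """Persist state only after a run succeeds, so a failed run is retried in full."""
        self.seen.update(processed_keys)
        self.state_path.write_text(json.dumps(sorted(self.seen)))


def run_job(bookmark, listed_keys, process):
    """Process unseen inputs, then advance the bookmark."""
    todo = bookmark.new_keys(listed_keys)
    for key in todo:
        process(key)
    bookmark.commit(todo)
    return todo
```

Running `run_job` twice against the same state file processes each key once: the second run picks up only the keys that appeared after the first run committed.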

The company stores the final metrics using Amazon Relational Database Service (Amazon RDS), which makes it easy to set up, operate, and scale a relational database in the cloud. ShopFully can share aggregated data with retailers and brands seeking marketing leads. This in turn helps improve the shopping experiences of ShopFully’s users, who receive targeted messaging based upon the data analysis.

Moving toward Near-Real-Time Data Streaming

Using its suite of AWS solutions, ShopFully has improved the efficiency of the delivery of its marketing campaigns. Because it has confidence in the accuracy of its metrics, the company no longer has to push additional ad content after it reaches campaign targets. ShopFully now plans to improve the speed of data processing even further by using streaming ETL jobs in AWS Glue, which consume, cleanse, and transform data. “Using AWS Glue has been an integral part of our data integration strategy. Our retailers and brands have seen new metrics, and our use of AWS is helping us to increase performance,” says Formato.

ShopFully Reference Architecture



About ShopFully

ShopFully is a technology company that simplifies local shopping for retailers and brands as well as over 40 million customers by providing a way for local stores to show their prices, products, promotions, and other information.

Benefits of AWS

  • Processes 100 million events in under 20 minutes
  • Decreased the cost of running its data pipeline by 30%
  • Improved data pipeline efficiency by 6x
  • Simplified operations by using serverless solutions
  • Scaled up to handle petabytes of data
  • Saves developer time in provisioning and managing infrastructure
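
As a rough check on what these figures imply, the arithmetic can be worked through in a few lines. The event count, time window, speedup, and cost reduction come from the case study. Treating the earlier 2-hour load time as the baseline for the 6x figure is an assumption, since the source does not state the baseline explicitly.

```python
# Figures quoted in the case study.
EVENTS = 100_000_000     # events per processing run
WINDOW_MINUTES = 20      # target: finish in under 20 minutes
SPEEDUP = 6              # new pipeline relative to the previous one
COST_REDUCTION = 0.30    # 30 percent lower pipeline cost

# Minimum sustained rate needed to process every event inside the window.
min_events_per_second = EVENTS / (WINDOW_MINUTES * 60)

# Assumption: the 2-hour load time is the baseline for the 6x figure.
LEGACY_MINUTES = 120
implied_new_minutes = LEGACY_MINUTES / SPEEDUP

# Cost of the new pipeline relative to the old one (old pipeline = 1.0).
relative_cost = 1.0 - COST_REDUCTION

print(f"{min_events_per_second:,.0f} events/s minimum")
print(f"{implied_new_minutes:.0f} minutes implied by a {SPEEDUP}x speedup")
print(f"{relative_cost:.0%} of the previous pipeline cost")
```

Under that assumption the numbers are consistent with each other: a 6x speedup on a 2-hour load gives 20 minutes, and the 20-minute target requires a sustained rate of roughly 83,000 events per second.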

AWS Services Used

AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.


Amazon CloudFront

Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment.


Amazon Data Firehose

Amazon Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services.


Amazon RDS

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud.


