AWS for Industries

Leveraging AWS Glue and Amazon Redshift to unlock insights from various points of sale for CPGs

While industries are recovering from the disruptions created by the COVID-19 pandemic, consumer packaged goods (CPGs) continue to deal with major challenges like inflation, changing consumer preferences, and fluctuations in the oil and energy markets. A recently published McKinsey study shows that, during the pandemic, 75 percent of US consumers switched brands or stores, and 60 percent of them planned to incorporate new buying patterns and habits after it. Today, it is more important than ever for companies to get faster, more granular insights into the latest sales trends across geographies, brands, designated market areas, and retailers.

The first step to becoming data driven is to create a data lake that contains point-of-sale (POS) data from all the sales channels and can act as a single source of truth. These channels include in-person retailers (Walmart, Target, Best Buy, or Tesco), online retailers, direct-to-consumer (DTC) channels, and on-trade channels.

Introduction to POS Data Hub

A data hub is a central mediation point between various data sources and data consumers. It’s not a single technology but rather an architectural approach that unites storage, data integration, and orchestration tools. It serves as a single point of access for all data consumers, like applications, data scientists, or business users. It also helps manage data for various tasks, providing centralized governance and data flow control capabilities.

Over time, businesses felt the need to create separate data hubs for various sources, giving birth to concepts like Marketing Data Hub, POS Data Hub (POSDH), Supply Chain Data Hub, and more. A POSDH is a central point where data from various sales channels (retail, ecommerce, DTC, on-trade, and other sources) is stored and analyzed.

By creating a POSDH, CPGs can build a single source of truth to capture data from all major sales origins to help them make informed decisions on assortments, merchandising, pricing, demand, and sales. Some data sources of interest for a consumer brand are:

1. Sales data (sellout). Sales data is anything that can be measured in the sales process. For DTC and some retail channels, CPGs have access to the data from the end POS retail touch point. Revenue per sale, average customer lifetime value, net promoter score, and revenue per product are all examples of sales data that an enterprise can analyze. Enterprises use this data to do the following:

  • Analyze product performance in each retail environment
  • Provide guidance on correct product assortment in store
  • Improve existing price-pack architecture
  • Prevent over- or understocking events
  • Understand and predict customer and shopping behavior

2. Sell-in data. CPGs maintain data on orders, transactions, shipment, inventory, material management, and capacity management in enterprise resource planning systems like SAP, Oracle, and NetSuite, to name a few. These data points are essential for understanding the sales-to-demand ratio, order reconciliation, and management.

3. Market or syndicated data. For offline sales data, consumer companies rely on data providers like Nielsen, IRI, and Ipsos. These so-called syndicated data sources provide information about aggregated sales, brand positioning, ad performance, and more. Third-party market research agencies purchase the data from retailers and classify it into known hierarchies to generate market insights before selling it to CPGs. This data does the following:

  • Spots opportunity gaps in assortments
  • Compares in-store product performance versus competitors
  • Identifies shifting consumer demand

4. Customer data. Customer data can be broken down into two groups: demographic data and personally identifiable information (PII). Demographic data is available through market research agencies like Nielsen and IRI, and PII customer data is the information shoppers provide themselves, in store or online, when interacting with CPG DTC channels. PII includes information on customer interests, behavior, demographics, and more. Coupling POS data with accurate customer data makes the following possible:

  • Hyperpersonalized shopping experiences
  • Offer personalization
  • Ability to create a pleasant, enjoyable shopping experience

5. Inventory and product availability data. Retailers periodically share inventory and product availability data with CPGs so that CPGs have visibility into out-of-stock events and can meet demand on time in full (OTIF).

One of the biggest challenges of integrating these sources is building data pipelines that harmonize and normalize the data and store it in a data lake. By using Amazon Web Services (AWS), Sigmoid can provide enterprises with a data lake and a ring of purpose-built solutions.
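To make the harmonization step concrete, here is a minimal Python sketch of mapping two differently shaped feeds onto one common schema. Everything in it is an assumption made for illustration: the `SalesRecord` schema, the two feed layouts, and the field names such as `item_id` and `amount_cents` are hypothetical and do not come from any specific retailer or from the reference architecture.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SalesRecord:
    """Hypothetical common schema that every channel is mapped onto."""
    channel: str
    sku: str
    sale_date: date
    units: int
    revenue_usd: float


def from_retailer_feed(row: dict) -> SalesRecord:
    # Assumed retailer layout: ISO dates, revenue already in dollars.
    return SalesRecord(
        channel="retail",
        sku=row["item_id"].strip().upper(),
        sale_date=date.fromisoformat(row["date"]),
        units=int(row["qty"]),
        revenue_usd=float(row["sales"]),
    )


def from_dtc_feed(row: dict) -> SalesRecord:
    # Assumed DTC layout: US-style dates, revenue in cents.
    return SalesRecord(
        channel="dtc",
        sku=row["SKU"].strip().upper(),
        sale_date=datetime.strptime(row["order_date"], "%m/%d/%Y").date(),
        units=int(row["quantity"]),
        revenue_usd=int(row["amount_cents"]) / 100,
    )


def harmonize(retailer_rows, dtc_rows):
    """Map both feeds onto SalesRecord and return them in a stable order."""
    records = [from_retailer_feed(r) for r in retailer_rows]
    records += [from_dtc_feed(r) for r in dtc_rows]
    return sorted(records, key=lambda r: (r.sale_date, r.sku, r.channel))
```

Keeping one small mapping function per source means a new channel can be added without touching the others. In a production pipeline the same mapping logic would more likely live in an ETL job than in plain Python functions.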

Reference architecture

Sigmoid worked with the AWS team to create a reference POSDH architecture that broadly comprises ingestion, storage, analytics, dashboarding, reporting, and management layers. Figure 1 shows the typical reference architecture for POSDH.

Fig. 1: Reference architecture for POSDH

Figure 2 shows the logical representation of POSDH and insights that it can unlock.

Fig. 2: POSDH with primary data sources and insights that it can unlock

Customer use cases that POSDH can facilitate

The top three customer use cases that POSDH facilitates are listed below:

1. Reversing shrinking market share

A large multinational beverage company was losing its share of market (SOM) in the snacks category in South America. We worked with the customer teams to identify drivers of this market share loss and build artificial intelligence (AI)–driven solutions. Some of the customer’s core challenges included:

  • Difficulty tracking store capacities
  • Retailer noncompliance with inventory practices leading to data sharing errors
  • Lack of clear understanding of the effect of contributors (own and competitors’) on SOM at a granular level
  • Lack of available multivariate systems that could help analyze and provide business recommendations based on profitability, investments, competitor’s share, revenue growth rate, and more
  • No guided framework for testing and measuring results

Based on customer interviews and the data analysis, Sigmoid zeroed in on three solutions:

  • Intelligent order recommendation system: The solution analyzed inventory, sell-in and sellout data, profitability, and POS (active/inactive) data between two similar POS units (retailers). It shared recommendations like the number of new products, order quantity, and product rankings with the respective POS unit (retailer). The solution provided a 2 percent improvement in market share and a 1.5 percent improvement in portfolio-level profitability.
  • Competitor intelligence: Competitors use a variety of tactics, including assortments, packs, promotions, and marketing activity, to gain market share. Sigmoid identified two subsolutions to solve this issue based on market data and sellout data: assortment planning and key driver analysis. This in turn helped the customer improve the contribution margin across the entire snacks assortment by 3 percent and its country-level market share by 0.8 percent.
  • SOM prediction: Finally, Sigmoid built an AI-driven predictive model to improve category planning. This model helped to improve overall sales predictions, helping teams to make decisions for the next fiscal year.
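The idea of comparing two similar POS units can be illustrated with a deliberately simple heuristic. The sketch below is not Sigmoid's recommendation system; it only assumes that sellout units per SKU are available for a target store and a comparable peer store, and it suggests SKUs that sell at the peer but are absent at the target. The function name and the sample SKUs are hypothetical.

```python
def recommend_new_products(target_sales: dict, peer_sales: dict, top_n: int = 3):
    """Rank SKUs that sell at a similar peer store but not at the target store.

    Both arguments map SKU -> units sold over the same period.
    """
    gaps = {
        sku: units
        for sku, units in peer_sales.items()
        if units > 0 and target_sales.get(sku, 0) == 0
    }
    # Highest peer sellout first; ties broken alphabetically for stable output.
    ranked = sorted(gaps.items(), key=lambda item: (-item[1], item[0]))
    return [sku for sku, _ in ranked[:top_n]]


target = {"chips-50g": 120, "nuts-100g": 0}
peer = {"chips-50g": 110, "nuts-100g": 40, "popcorn-80g": 75, "bars-30g": 0}
suggestions = recommend_new_products(target, peer)
```

A real system would also weigh inventory, profitability, and store capacity, as described in the bullet above; this sketch covers only the assortment-gap signal.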

2. Personalized shopping experience

Many CPGs sell products through on-trade sources (like restaurant chains), ecommerce, and DTC channels.

CPGs use POS data points to personalize offerings for each consumer segment. They achieve this by profiling consumers based on their preferences, glance views on DTC and ecommerce channels, and existing purchases. In the same way, Sigmoid helped a large garden supply company predict customer churn and identify customer segments. Figure 3 shows the logical representation of the solution.

Fig. 3: Customer segmentation: Data sources and solution flow

Based on the solution, this CPG could identify different consumer segments with particular attributes. Sigmoid used AWS Analytics and machine learning (ML) services like

  • Amazon Kinesis, a service to easily collect, process, and analyze video and data streams in real time, to autoingest clickstream data from DTC channels;
  • AWS Glue, which helps discover, prepare, and integrate data at any scale, to build ETL/ELT pipelines with online retailers, media data, demographics, and weather data;
  • Amazon EMR, which makes it simple to run and scale Apache Spark, Hive, Presto, and other big data workloads, to prepare and process large volumes of data; and
  • Amazon SageMaker to build, deploy, and operationalize ML models.

Figure 4 shows the technical architecture.

Fig. 4: ETL/ELT flow

Sigmoid also implemented a solution to classify product quality reviews and identify product issues for this customer. By using Amazon Comprehend, which helps derive and understand valuable insights from text within documents, Sigmoid built a natural language processing-based system to analyze customer reviews and ratings so that the customer can identify dissatisfied customers, provide a better customer experience, reduce churn, and boost product quality. This solution helped the customer to reduce complaints by 20 percent per million and increase its repeat sales by 8 percent in one quarter. Figure 5 shows the logical representation of the solution.

Fig. 5: Product quality review and issue identification
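As a rough illustration of review analysis with Amazon Comprehend, the sketch below flags reviews that the service labels as negative. It assumes sentiment detection is the relevant feature, which the description above does not confirm, and the function name and the `min_score` threshold are hypothetical. The client is passed in as a parameter so the logic can be exercised without AWS credentials; with boto3 it would be created as `boto3.client("comprehend")`.

```python
def flag_negative_reviews(reviews, comprehend_client, min_score=0.7):
    """Return (review, negative_score) pairs for confidently negative reviews.

    comprehend_client is expected to expose detect_sentiment(Text=..., LanguageCode=...),
    as the boto3 Comprehend client does.
    """
    flagged = []
    for text in reviews:
        result = comprehend_client.detect_sentiment(Text=text, LanguageCode="en")
        negative_score = result["SentimentScore"]["Negative"]
        if result["Sentiment"] == "NEGATIVE" and negative_score >= min_score:
            flagged.append((text, negative_score))
    return flagged
```

The flagged reviews could then be routed to a customer care team or grouped by product to surface recurring quality issues.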

3. Demand forecasting on Amazon data

Sigmoid and the AWS team have built a solution to automate data ingestion from Amazon Vendor Central (or AVC, a centralized portal where CPGs track their out-of-stock events, pure-profit margins, glance views, conversions, chargebacks, and more) into Amazon Simple Storage Service (Amazon S3), object storage built to retrieve any amount of data from anywhere, or other processed data stores. This makes it possible for CPGs to predict demand accurately, meet OTIF requirements, measure sales effectiveness, and ultimately boost revenue growth. Figure 6 shows the reference architecture.

Fig. 6: Sigmoid and AWS AVC solution
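The landing step of such an ingestion flow might look like the following sketch. It assumes a report has already been downloaded as bytes; how the report is retrieved from AVC is out of scope here. The key layout (`avc/report_type=.../dt=...`), the function name, and the bucket name in the usage note are hypothetical. The client is passed in so the function can be exercised without AWS access; with boto3 it would be `boto3.client("s3")`.

```python
from datetime import date


def land_report(s3_client, bucket: str, report_type: str, report_date: date, body: bytes) -> str:
    """Write one report to a date-partitioned key in Amazon S3 and return the key.

    s3_client is expected to expose put_object(Bucket=..., Key=..., Body=...),
    as the boto3 S3 client does.
    """
    # Hypothetical partitioned layout so downstream jobs can filter by type and date.
    key = f"avc/report_type={report_type}/dt={report_date.isoformat()}/report.csv"
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

For example, `land_report(s3, "example-posdh-raw", "sales", date(2023, 1, 5), csv_bytes)` would write to `avc/report_type=sales/dt=2023-01-05/report.csv`. Partitioning keys by report type and date is a common convention that lets query engines and ETL jobs read only the slices they need.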


POSDH helps CPGs to identify major sales channels, measure channel effectiveness, raise profit margins, create personalized offers, and enhance the customer experience. With access to essential data points, CPGs can make informed decisions on assortment, merchandising, commercial planning, pricing, demand planning, and sales. Furthermore, if the sales data is analyzed along with media/marketing data and syndicated data, CPGs can attribute marketing spend to sales touch points throughout the entire buying journey.

AWS Partner Spotlight

Sigmoid delivers actionable intelligence for CPG enterprises. Sigmoid’s CPG analytics solution portfolio is specifically designed to equip CPG decision makers with targeted consumer insights to drive growth. Sigmoid’s expertise in CPG analytics helps companies build robust data infrastructures that simplify every step of managing big data in the CPG industry. By solving complex analytics use cases, brands can engage effectively with consumers, forecast demand accurately, optimize inventory levels, and take actions based on near-real-time sales data across the ecommerce and Retail Partners community.