AWS for Industries

Ingest Amazon Retail Data into a Serverless Modern Data Architecture

Consumer packaged goods (CPG) companies, which sell a vast number of products through Amazon.com, need scalable, cost-effective, and accessible data visibility into Amazon Seller and Vendor Central. This visibility lets them manage orders, keep product catalogs up to date, access inventory insights, and track sales, shipments, and payments.

Over the past several years, CPG companies have outsourced or built tools to scrape data from Amazon to get the necessary information. Recently, Amazon created the Selling Partner API (SP-API) to make Amazon retail data programmatically accessible for developers.

In this blog post, we will provide an overview of a Retail and CPG client application leveraging Amazon Web Services, Inc. (AWS) serverless and managed services, and show how to integrate that application with a Modern Data Architecture using SP-APIs. An IDC study found that organizations leveraging AWS serverless services were able to drive business value in four main areas: cost savings, staff productivity, operational resilience, and business agility. This solution provides the additional value of pulling in valuable Amazon retail data, which can help organizations make better data-driven decisions.

This solution provides a best practice approach to help CPG companies achieve their Amazon retail data analytics goals by providing an alternative to scraping data—enabling a more modern and efficient API approach. This solution is relevant to sellers, vendors, and third-party Amazon sellers who manage multi-brand Amazon sites.

Overview of solution

Using this solution, customers can land their Amazon data in their own storage bucket or in their data lake on AWS. AWS works closely with Amazon SPDS (Selling Partner Development Services) and can stay ahead of any changes to the data APIs for interfacing with and ingesting data from Amazon Seller and Vendor Central. The AWS solution removes the complexity of tracking those changes and speeds up delivery for customers.

Figure 1 – High-level overview of Amazon Seller and Vendor Central Data Producer

Walkthrough

The AWS solution consists of four main components:

  • Authentication and Authorization
  • Serverless Reports Application
  • Serverless Catalog Items and Listing Items Applications
  • Data Storage, Movement, and Insights

Figure 2 – High-level overview of Amazon Seller and Vendor Central Data Producer Authentication and Authorization

Authentication and Authorization

In order to interact with the Selling Partner APIs (SP-APIs), we must first register as a developer. Since this is either a private seller or vendor application, we follow the steps from the SP-API documentation entitled To register as a private developer for private seller applications or To register as a private developer for private vendor applications.

The core component of our application is AWS Step Functions, a serverless orchestration service that allows us to centrally manage a workflow. The steps of our AWS Step Functions workflow make API calls to the SP-API endpoint. To register our application with the SP-API, follow the steps from the SP-API documentation entitled Registering your Application.

The authorization model for the SP-API is based on Login with Amazon, Amazon’s implementation of OAuth 2.0. Since it is a private application, we use a self-authorization procedure from the SP-API documentation entitled To self-authorize your application (seller application) or To self-authorize your application (vendor application). When we authorize the application, a Login with Amazon refresh token appears each time we choose “Authorize App.” A Login with Amazon refresh token is a long-lived token that we will exchange later for an access token. It’s important to mention that choosing “Authorize App” multiple times will generate a new refresh token each time. Generating a new refresh token does not invalidate previous refresh tokens. If we have multiple seller accounts or vendor groups, we can save a refresh token for each one. To securely store our refresh tokens, we create a secret in AWS Secrets Manager. AWS Secrets Manager is a secrets management service which enables us to rotate, manage, and retrieve our Login with Amazon refresh and access tokens.

For each API call we make to the SP-API, we must include a Login with Amazon access token. To do this, for each application we create, an AWS Lambda function will be used as an authentication function. The authentication function follows this workflow:

  1. Check with AWS Secrets Manager for a valid Login with Amazon access token
  2. If no valid Login with Amazon access token exists, call AWS Secrets Manager to retrieve a Login with Amazon refresh token
  3. Make a secure HTTP POST to the Login with Amazon authentication server
  4. A successful response includes a Login with Amazon access token along with an expires_in value represented in seconds (a Login with Amazon access token expires one hour after it is issued)
  5. The Login with Amazon access token is then cached in AWS Secrets Manager and can be used for additional calls before it expires to avoid having to retrieve a new access token before each call
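
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the solution's actual Lambda code: `store` is a hypothetical wrapper around AWS Secrets Manager with `get` and `put` methods, the secret names `lwa/access_token` and `lwa/credentials` are invented for the example, and the 60-second safety margin is an arbitrary choice. The token URL and form fields follow the Login with Amazon refresh-token exchange.

```python
import json
import time
import urllib.parse
import urllib.request

LWA_TOKEN_URL = "https://api.amazon.com/auth/o2/token"


def post_form(url, fields):
    """POST form-encoded fields over HTTPS and return the parsed JSON body."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def get_access_token(store, http_post=post_form, now=time.time, safety_margin=60):
    """Return a valid access token, refreshing and caching it when needed.

    `store` is a hypothetical Secrets Manager wrapper exposing
    get(name) -> dict or None, and put(name, value_dict).
    """
    # Step 1: reuse the cached token if it is not about to expire.
    cached = store.get("lwa/access_token")
    if cached and cached["expires_at"] - safety_margin > now():
        return cached["access_token"]

    # Step 2: fall back to the long-lived refresh token and app credentials.
    creds = store.get("lwa/credentials")

    # Step 3: exchange the refresh token for a new access token.
    response = http_post(LWA_TOKEN_URL, {
        "grant_type": "refresh_token",
        "refresh_token": creds["refresh_token"],
        "client_id": creds["client_id"],
        "client_secret": creds["client_secret"],
    })

    # Steps 4 and 5: turn expires_in into an absolute expiry and cache the token.
    store.put("lwa/access_token", {
        "access_token": response["access_token"],
        "expires_at": now() + int(response["expires_in"]),
    })
    return response["access_token"]
```

Passing the HTTP call and the clock in as parameters keeps the caching logic easy to exercise without network access or a real secret.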

Figure 3 – High-level overview of Amazon Seller and Vendor Central Data Producer Serverless Reports Application

Serverless Reports Application

Now that we have discussed how to register and authorize an SP-API application, we will cover how to build out different applications, starting with the Reports Application. This Reports Application is a serverless application which will interact with the Reports API of the SP-API. For Sellers, our Reports Application is designed to automatically retrieve and process the reports we create through the use of notifications. However, Vendor applications do not yet support the REPORT_PROCESSING_FINISHED notification type, and must instead use a polling method to retrieve reports. This section will cover building the automated notifications workflow for seller applications.
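
Although this section focuses on notifications, the polling method mentioned for vendor applications could look roughly like the sketch below. It assumes a hypothetical `get_report` callable that wraps the Reports API getReport operation and returns the report as a dictionary with a `processingStatus` field; the interval and attempt limit are illustrative values only.

```python
import time

# Statuses after which the report will not change any further.
TERMINAL_STATUSES = {"CANCELLED", "DONE", "FATAL"}


def wait_for_report(get_report, report_id, interval=30.0, max_attempts=20, sleep=time.sleep):
    """Poll until the report reaches a terminal status or attempts run out.

    `get_report` is a hypothetical callable that takes a report ID and
    returns the report as a dict.
    """
    for attempt in range(max_attempts):
        report = get_report(report_id)
        if report["processingStatus"] in TERMINAL_STATUSES:
            return report
        # Do not sleep after the final attempt; fail straight away instead.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"report {report_id} not finished after {max_attempts} polls")
```

Inside an AWS Step Functions workflow, the same loop can be expressed with a Wait state and a Choice state rather than sleeping inside an AWS Lambda function, which avoids paying for idle compute time.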

In order to receive these notifications, we must first subscribe to the notification type of interest. This architecture leverages an AWS Lambda function to subscribe our application to the REPORT_PROCESSING_FINISHED notification type. To automate this workflow, we leverage Amazon Simple Queue Service (Amazon SQS) by following this tutorial: Set up notifications (Amazon Simple Queue Service workflow).

Now that we have our notifications configured, we can create reports. To do this, AWS Step Functions is used as our serverless orchestration service to centrally manage the workflow. Within our AWS Step Functions workflow, an AWS Lambda function gets a Login with Amazon access token (described earlier in the Authentication and Authorization section). The token is used to make the createReport API call to the Reports API, using the regional endpoint, marketplace ID, and report configuration data stored in AWS Systems Manager Parameter Store. The SP-API then creates the report, and upon completion a REPORT_PROCESSING_FINISHED notification event is sent to our Amazon SQS queue. The notification indicates whether report processing ended with a status of CANCELLED, DONE, or FATAL.

This notification event triggers an AWS Lambda function, which processes the notification. If the notification event has a status of DONE, a reportDocumentId is included and passed to a data processing function in our AWS Step Functions workflow. The data processing function uses the reportDocumentId to make a getReportDocument call to the SP-API. The SP-API returns a pre-signed URL for the location of the report document and, if the report document contents have been compressed, the compression algorithm used. These are then passed to our next AWS Lambda function, a storage function that downloads the report, decompresses it if needed, and stores the report document in an Amazon Simple Storage Service (Amazon S3) bucket.
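
Two pieces of this workflow lend themselves to a short Python sketch: picking the reportDocumentId out of the notification, and decompressing the downloaded document. The function names are hypothetical, and the notification field names (`notificationType`, `payload.reportProcessingFinishedNotification`) are assumed from the published notification schema and should be checked against the current SP-API documentation. The example treats GZIP as the only compression algorithm.

```python
import gzip
import json


def extract_report_document_ids(sqs_event):
    """Return the reportDocumentId of every DONE report in a Lambda SQS event."""
    document_ids = []
    for record in sqs_event.get("Records", []):
        # Each SQS record carries the notification as a JSON string in "body".
        notification = json.loads(record["body"])
        if notification.get("notificationType") != "REPORT_PROCESSING_FINISHED":
            continue
        details = notification["payload"]["reportProcessingFinishedNotification"]
        # Only DONE notifications are handed on to the data processing function.
        if details["processingStatus"] == "DONE":
            document_ids.append(details["reportDocumentId"])
    return document_ids


def decode_report_document(raw_bytes, compression_algorithm=None):
    """Decompress downloaded report bytes when the API reported compression."""
    if compression_algorithm == "GZIP":
        return gzip.decompress(raw_bytes)
    if compression_algorithm is None:
        return raw_bytes
    raise ValueError(f"unsupported compression algorithm: {compression_algorithm}")
```

Failing loudly on an unrecognized compression value is a deliberate choice here, so that a corrupted or still-compressed document is not silently written to Amazon S3.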

Now that the report data is in S3, it can be consumed by downstream analytics applications, which we talk more about later. AWS Key Management Service (AWS KMS) is used throughout this architecture to provide secure encryption. AWS KMS allows us to centrally manage encryption keys, which can be used to encrypt our secrets in AWS Secrets Manager and our data stored in Amazon S3 and AWS Systems Manager Parameter Store.

Figure 4 – High-level overview of Amazon Seller and Vendor Central Data Producer Serverless Catalog Items and Listing Items Applications

Serverless Catalog Items and Listing Items Applications

The Catalog Items and Listing Items applications are slightly different from the Reports application because the SP-API does not support notifications for these APIs. However, the same design principles were used for creating these applications. AWS Step Functions is used as a serverless orchestration service to centrally manage our workflow. Within this AWS Step Functions workflow, an AWS Lambda authentication function obtains a Login with Amazon access token (described earlier in the Authentication and Authorization section). This token is passed to the data processing function, which makes an API call to the SP-API using regional endpoints and marketplace IDs stored in AWS Systems Manager Parameter Store, together with Amazon Standard Identification Numbers (ASINs), Stock Keeping Units (SKUs), and Seller IDs stored in Amazon DynamoDB. When a response is returned, the data is passed to a storage function, which then stores the data in Amazon S3.
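
As a rough illustration of the data processing function, the Python sketch below builds one catalog lookup per ASIN and yields the responses for the storage function. It assumes the Catalog Items API version 2022-04-01 path and the `x-amz-access-token` header; the function names are hypothetical, and retries, rate-limit handling, and any further request requirements are left out. In the architecture described above, the endpoint and marketplace IDs would come from AWS Systems Manager Parameter Store and the ASINs from Amazon DynamoDB.

```python
import json
import urllib.parse
import urllib.request


def build_catalog_item_request(endpoint, asin, marketplace_ids, access_token):
    """Build the URL and headers for a single catalog item lookup."""
    query = urllib.parse.urlencode({"marketplaceIds": ",".join(marketplace_ids)})
    path = f"/catalog/2022-04-01/items/{urllib.parse.quote(asin, safe='')}"
    headers = {"x-amz-access-token": access_token, "accept": "application/json"}
    return f"{endpoint}{path}?{query}", headers


def http_get_json(url, headers):
    """Send a GET request and return the parsed JSON body."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_catalog_items(endpoint, asins, marketplace_ids, access_token, http_get=http_get_json):
    """Yield (asin, item) pairs for the storage function to write to Amazon S3."""
    for asin in asins:
        url, headers = build_catalog_item_request(endpoint, asin, marketplace_ids, access_token)
        yield asin, http_get(url, headers)
```

Because `fetch_catalog_items` is a generator, the storage function can write each item as it arrives rather than holding the whole catalog in memory.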

Figure 5 – High-level overview of Amazon Seller and Vendor Central Data Producer Data Storage, Movement and Insights

Data Storage, Movement, and Insights

Now that our Amazon retail data has been ingested by way of the serverless applications we created, we can use AWS analytics services to structure, move, and gain insights from that data. Amazon S3 is the main storage service used for our data lake. Amazon S3 is an object storage service capable of storing and retrieving any amount of data from anywhere. AWS Lake Formation is used to create our scalable and secure data lake. With AWS Lake Formation we can ingest, clean, catalog, transform, and secure our data. Lake Formation provides a central location for us to configure granular data access policies, enabling us to protect our data regardless of which services are accessing the data.

For seamless data movement, AWS Glue and AWS Glue DataBrew are used. AWS Glue is a serverless integration service that makes it straightforward for data engineers and ETL (extract, transform, and load) developers to create, run, and monitor ETL workflows with AWS Glue Studio. AWS Glue DataBrew provides an interactive point-and-click visual interface that enables data to be enriched, cleaned, and normalized without writing code.

After processing and preparing our data, there is a whole suite of purpose-built AWS analytics services we can use to consume this data. To learn which purpose-built AWS analytics services may be the best fit for your organization's use case, view the “Predictive analytics and Machine Learning” section under the AWS Analytics service section, and the “Analytics and Data Warehousing” tab under Solutions Areas.

Conclusion

In this blog post, we demonstrated how CPG companies can build applications leveraging AWS serverless and managed services to securely integrate with the Selling Partner API. This solution shows how to design for authentication, authorization, and API integration in order to ingest your Amazon retail data into your AWS account. Once the data is ingested, it is then stored in a secure and scalable data lake. Purpose-built AWS analytics services can then be incorporated to move, process, and gain valuable insights from your Amazon retail data by using best practices for building a Modern Data Architecture on AWS.

This solution helps CPG companies become more data-driven organizations by pulling in valuable Amazon retail data to inform their decisions. In addition, this AWS serverless-based modern architecture approach can help companies realize additional business value by way of cost savings, staff productivity, operational resilience, and business agility.

To learn more about AWS ecommerce solutions for CPG, contact an AWS Representative to get started today, or visit the AWS for Consumer Packaged Goods homepage.

Manikanta Gona

Manikanta Gona is a Data and ML Engineer at AWS Professional Services. He joined AWS in 2021 with 6+ years of experience in software engineering in test (data), and teaching experience as community faculty. At AWS, Mani is currently engaged with global CPG customers to implement data lake structures around their retail data, generating insights using AWS services that drive business decisions for the CPGs. He joined AWS from Blue Cross and Blue Shield of Minnesota, where he helped design strategy, automation, and test design using AWS services and open source tools. He also collaborated across teams in migrating their database from MySQL to Amazon Aurora and modernizing their front-end and back-end applications, including SAS-based products, using open source frameworks.

Daman Orestad

Daman Orestad is a Guidance Solutions Architect (SA) for retail and Consumer Packaged Goods (CPG) solutions. Daman joined AWS in 2019, and currently leads the Solution Guidance program for retail and CPG, which provides prescriptive technical advice to help retail and CPG customers build solutions on their own, leveraging AWS services. Daman brings a wealth of AWS architecture experience helping AWS customers of all sizes, from small start-ups to large enterprises, to come up with innovative and cost-effective solutions to company challenges. He leverages this experience to develop and publish easy-to-consume retail and CPG industry-oriented, secure, effective, and efficient solutions to real-world retail and CPG company challenges. Daman’s passions include automation, analytics, and serverless event-driven architectures.

Hina Vinayak

Hina Vinayak is a Sr. Solutions Architect on the Selling Partner API team at Amazon. With 5+ years of experience in the technology industry, Hina has a proven track record of designing and implementing scalable and reliable solutions for Selling Partners of all sizes. She specializes in leveraging Selling Partner APIs to create innovative and customized solutions that help Amazon Selling Partners improve their solutions and productivity. With her technical expertise and customer-focused approach, Hina is committed to helping businesses succeed by leveraging AWS services and the Amazon SP-API.

Jani Syed

Jani Syed is the Principal Architect, Analytics Specialist in AWS Industries - Strategic Accounts. He joined AWS in 2019 with 23+ years of experience. He has worked in various sectors, including telecommunications, banking, finance, insurance, retail consumer products, manufacturing, and services. He also founded startup companies in big data and analytics. Apart from his regular work, he mentors students and tech professionals in their career growth.

Norman Kwong

Norman Kwong is the Business Development Leader for AWS in the CPG industry. He joined AWS in 2019 with 20 years of experience leading sales, marketing, and business transformation within CPG, Retail, and Industrial/Wholesale distribution industries. At AWS, Norman helps global CPG customers build brands consumers love, increase organizational agility to react to market opportunities, and drive operational efficiency with proven, industry-specific innovations and solutions from AWS. He joined AWS from S.C. Johnson & Son Inc., where he successfully launched their Global Professional division serving commercial and institutional channels as the Head of Sales Customer Marketing. He also led various brands, developed new products, and drove retail category growth in S.C. Johnson’s consumer retail business.