AWS Machine Learning Blog

Capture and Analyze Customer Demographic Data Using Amazon Rekognition & Amazon Athena

This blog post includes a link to code that is no longer available. We are actively working to replace the code and we’ll update this post as soon as it’s available. 

Millions of customers shop in brick-and-mortar stores every day. Currently, most of these retailers have no efficient way to identify these shoppers and understand their purchasing behavior. They rely on third-party market research firms to provide customer demographic and purchase preference information.

This blog post walks you through how you can use AWS services to identify the purchasing behavior of your customers. We show you:

  • How retailers can use captured images in real time.
  • How Amazon Rekognition can be used to retrieve face attributes like age range, emotions, gender, etc.
  • How you can use Amazon Athena and Amazon QuickSight to analyze the face attributes.
  • How you can create unique insights and learn about customer emotions and demographics.
  • How to implement serverless architecture using AWS managed services.

The next section describes the basic AWS architecture.

How it works

The following diagram illustrates the steps in the process.

This is what happens in greater detail:

  1. You place the images in an Amazon Simple Storage Service (S3) bucket. This triggers the Lambda function.
  2. The Lambda function calls the Rekognition service to extract the image label information.
  3. Image attributes are stored in the .csv format in another S3 bucket.
  4. The Amazon Athena service reads all of the face attributes in the .csv files and loads the data for ad-hoc queries.
  5. We use Amazon QuickSight to build the customer insight dashboards.

Putting it together

Note:  Before you start, ensure that all the services you need to work with are available in the AWS Region where you plan to implement this solution.

Configuring an IAM role

You need to create an IAM role that gives the Lambda function access to read from and write to Amazon S3, and to invoke the Rekognition APIs.

    1. Open the IAM console, and choose Roles.
    2. Choose Create new role.
    3. In the Select role type section, choose AWS Service.
    4. Choose AWS Lambda as the service.
    5. Choose the Next: Permissions button.
    6. Choose Attach Policy and then select the following policies: AmazonS3FullAccess, AWSLambdaExecute, and AmazonRekognitionFullAccess.
    7. Choose Next: Review.
    8. Enter a name in the Role name field and provide a description for the role in the Role description field.
    9. Choose the Create role button.
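The same role can also be created programmatically. The following is a minimal sketch, assuming you pass in a boto3 IAM client; the function name `create_lambda_role` and the default description are illustrative and not part of the original post. The three managed policies are the ones listed in the steps above.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies named in the console steps.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AWSLambdaExecute",
    "arn:aws:iam::aws:policy/AmazonRekognitionFullAccess",
]


def create_lambda_role(iam_client, role_name,
                       description="Role for the Rekognition Lambda function"):
    """Create the role, attach the managed policies, and return the role ARN.

    `iam_client` is assumed to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
        Description=description,
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```

The full-access policies keep the walkthrough simple; for anything beyond a demo you would likely want to scope them down to the specific bucket and API actions.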

    Configuring the Amazon S3 bucket

    Let’s create the S3 bucket where image files will be dropped and processed. In the AWS Management Console, under Storage, choose S3.

    To create a new S3 bucket, choose Create bucket.

    For Bucket name, provide a unique name. For Region, select the Region where you want to create the bucket.

    Note: Ensure that you create the bucket in the same Region where the rest of the application will be. The bucket name must be globally unique.

    On the next page, leave the default settings and choose Next. Review the settings and then choose Create Bucket.

    In the bucket, create the following folders:



    This bucket will be used for holding the output as both JSON and .csv files.

    Configuring AWS Lambda with Amazon S3

    In the AWS Management Console, under Compute, select Lambda.

    Choose Create a function if this is the first time you are creating a Lambda function.

    Choose Select blueprint. On this page, choose the s3-get-object-python option.

  1. On the Basic Information page, fill in the following information:
    1. A name for the Lambda function.
    2. In the Role drop-down list, select “Choose an existing role”.
    3. In the Existing Role drop-down list, select the role we created in the earlier section.
  2. On the S3 page, fill in the following information:
    1. In the Bucket drop-down list, select the name of the bucket you created earlier.
    2. In the Event type drop-down list, select “Object Created (All)”.
    3. In the Suffix field, specify jpg. This ensures that the Lambda function won’t get triggered if other types of files are placed in the bucket.
    4. Select the Enable trigger check box.
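For reference, the trigger settings above correspond to an S3 bucket notification configuration. The following is a minimal sketch of that configuration, assuming a boto3 S3 client; the helper names `build_notification_config` and `enable_trigger` are illustrative and not part of the original post.

```python
def build_notification_config(lambda_function_arn, suffix="jpg"):
    """Build the notification configuration for "Object Created (All)"
    events, limited to object keys that end with the given suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


def enable_trigger(s3_client, bucket_name, lambda_function_arn):
    """Attach the notification to the bucket.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(lambda_function_arn),
    )
```

When you configure the trigger in the console, the permission that allows S3 to invoke the function is added for you. If you apply the configuration through the API instead, you need to grant that permission on the Lambda function yourself before this call succeeds.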

    On the next page, you will see the auto-generated Lambda function code. Choose the Create function button.

    You will see the following screen after the Lambda function is created successfully.

    You can verify the trigger in the S3 console. Choose the bucket name and then choose the Properties tab. In the Advanced settings section, choose the Events button.

    Configuring Lambda function

    Copy the Lambda function from the following link, paste it in the code section, and save it.

    Let’s add the code that processes the metadata coming from every trigger generated by the object creation events. At a high level, the Python code performs the following tasks:

    • Pick up an image from S3
    • Call Rekognition to get the labels for each image
    • Create a .csv file for the corresponding image file
    • Write the .csv file to the S3 bucket for downstream processing

    Here is a snapshot of the Lambda Python code:
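The original snapshot is not available here. As a stand-in, the following minimal sketch shows how a handler might read the bucket name and object key from the S3 event it receives. The helper names and the `csv/` output prefix are assumptions for illustration, since the post's folder names are not shown.

```python
from urllib.parse import unquote_plus


def extract_s3_object(event):
    """Return (bucket, key) for the first record of an S3 event.

    Object keys arrive URL-encoded in the event (for example, spaces
    become '+'), so the key is decoded before use.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


def csv_key_for(image_key, output_prefix="csv/"):
    """Derive the output .csv key from the image key.

    The "csv/" prefix is a hypothetical folder name.
    """
    file_name = image_key.rsplit("/", 1)[-1]
    base_name = file_name.rsplit(".", 1)[0]
    return f"{output_prefix}{base_name}.csv"
```

A handler would typically call `extract_s3_object(event)` first and pass the resulting bucket and key to the later steps.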

    Here is the code snapshot, which invokes the Rekognition service.
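The original snapshot is not available here. Because this post analyzes face attributes such as age range, gender, and emotions, the following minimal sketch assumes the Rekognition DetectFaces API; the original code may have used a different call. The function name `detect_face_attributes` is illustrative.

```python
def detect_face_attributes(rekognition_client, bucket, key):
    """Call Rekognition DetectFaces on an image stored in S3.

    `rekognition_client` is assumed to be a boto3 Rekognition client,
    e.g. boto3.client("rekognition"). Attributes=["ALL"] requests the
    full attribute set (age range, gender, smile, emotions, and so on)
    rather than the default subset.
    """
    response = rekognition_client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],
    )
    # One entry per detected face; an empty list if no face was found.
    return response["FaceDetails"]
```

Passing the S3 location instead of the image bytes means the function does not have to download the image first.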

    Writing labels to the .csv file
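The original snapshot is not available here. The following minimal sketch shows one way to flatten face details into .csv rows, assuming the input has the shape of a Rekognition DetectFaces `FaceDetails` entry. The columns `imagelocation`, `timestamp`, `gender`, `lowagerange`, and `smiling` appear in the queries later in this post; `highagerange`, `emotion`, and the column order are assumptions.

```python
import csv
import io

# Column order is an assumption; it must match the Athena table definition.
CSV_COLUMNS = [
    "imagelocation", "timestamp", "gender",
    "lowagerange", "highagerange", "smiling", "emotion",
]


def face_to_row(face, image_location, timestamp):
    """Flatten one face detail into a dict keyed by CSV_COLUMNS."""
    emotions = face.get("Emotions", [])
    # Keep only the emotion with the highest confidence, if any.
    top_emotion = max(emotions, key=lambda e: e["Confidence"])["Type"] if emotions else ""
    return {
        "imagelocation": image_location,
        "timestamp": timestamp,
        "gender": face["Gender"]["Value"],
        "lowagerange": face["AgeRange"]["Low"],
        "highagerange": face["AgeRange"]["High"],
        "smiling": face["Smile"]["Value"],
        "emotion": top_emotion,
    }


def faces_to_csv(faces, image_location, timestamp):
    """Return the .csv text for all faces in one image, one row per face."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    for face in faces:
        writer.writerow(face_to_row(face, image_location, timestamp))
    return buffer.getvalue()
```

No header row is written in this sketch, so that a table reading the files does not treat the header as data.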

    Writing the .csv file to the S3 bucket
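The original snapshot is not available here. The following minimal sketch uploads .csv text to S3, assuming a boto3 S3 client; the function name `upload_csv` is illustrative.

```python
def upload_csv(s3_client, bucket, key, csv_text):
    """Write .csv text to s3://bucket/key.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
```

If the output goes to the same bucket that triggers the function, the jpg suffix filter on the trigger keeps the new .csv object from invoking the function again.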

    Now let’s drop some files in the input S3 bucket and generate the .csv files. The output .csv files will look like this:

    Link to the sample image files:

    Download these images locally, and then upload them to the bucket created earlier in this section.

    Configuring the Athena service

    Open the AWS Management Console. From the Services menu, select Athena.

    Download the Athena table definition from the following link:

    Using this link, copy the SQL statement and paste it in the query editor. You need to replace the bucket name, db_name, and table name parameters before executing the statement. The bucket name is the same as the bucket name we created in the earlier section. If you don’t have a database already created in Athena, select “default” as the db_name from the Database drop-down list on the left-hand side of the console. Following is the snapshot of the SQL statement:
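The original snapshot is not available here. As a rough illustration only, the following sketch builds a table definition of the general kind Athena uses for comma-delimited files in S3. The columns `imagelocation`, `timestamp`, `gender`, `lowagerange`, and `smiling` are taken from the queries in this post; the remaining columns, the column types, the column order, and the `csv/` prefix are assumptions and may differ from the original definition.

```python
def build_create_table_ddl(db_name, table_name, bucket_name, prefix="csv/"):
    """Return a CREATE EXTERNAL TABLE statement for the generated .csv files.

    The "csv/" prefix and the column list are hypothetical; adjust them to
    match the files your Lambda function actually writes.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}.{table_name} (\n"
        "  imagelocation string,\n"
        "  `timestamp` string,\n"  # backticks: timestamp is a reserved word in DDL
        "  gender string,\n"
        "  lowagerange int,\n"
        "  highagerange int,\n"
        "  smiling string,\n"
        "  emotion string\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION 's3://{bucket_name}/{prefix}'"
    )


if __name__ == "__main__":
    # Placeholder names; replace them with your own values.
    print(build_create_table_ddl("default", "customer_faces", "my-bucket"))
```

The `LOCATION` clause points at a folder rather than a single file, which is what lets the table pick up every .csv file written under that prefix.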

    After you create the table, it will query the data from all the .csv files generated by the Lambda function. Go to the Athena Query Editor and run the basic “select *” query on the table.

    You will see the result set on the console. Now you can run various queries, such as how many customers visited between time x and y, how many of them were happy versus sad, their genders, and endless other possible combinations.
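Queries like these can also be submitted outside the console. The following is a minimal sketch, assuming a boto3 Athena client and an S3 location where Athena can write query results; the function name `run_athena_query` and its parameters are illustrative and not part of the original post.

```python
import time


def run_athena_query(athena_client, query, database, output_location,
                     poll_seconds=1.0):
    """Run a query, wait for it to finish, and return the first page of rows.

    `athena_client` is assumed to be a boto3 Athena client, e.g.
    boto3.client("athena"). `output_location` is an S3 URI such as
    "s3://<bucket_name>/athena-results/" (hypothetical path).
    """
    execution_id = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Query {status['State']}: {reason}")

    results = athena_client.get_query_results(QueryExecutionId=execution_id)
    return [
        [column.get("VarCharValue") for column in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

For a SELECT statement, the first returned row holds the column names. This sketch reads only the first page of results; larger result sets need pagination.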

    Here is a simple query:

    select * from <database_name>.<table_name> where gender='Female'

    Here is another query to get the details of unhappy customers on a particular day:

    select imagelocation, gender, count(*) as count, lowagerange from <table_name> where imagelocation in(
    		select imagelocation from <table_name>
    		where timestamp between '<timestamp_from>' and '<timestamp_to>' and smiling='False')
    group by gender,imagelocation,lowagerange; 


    Note: Replace table_name, timestamp_from, and timestamp_to with the appropriate values for your query.

    And here is the output:

    Configuring the QuickSight service

    We saw scanned data from the S3 bucket and the result set of queried data in Athena. Now, let’s use QuickSight to build a real-time dashboard to showcase business insight, as well as various possible customer demographic trends.

    Log in to the QuickSight service menu and select Athena as the data source.

    Specify the Athena DB name as the Data source name and choose Create data source.

    Choose the table name from the drop-down list.

    Now you have to choose whether you want to load the data into SPICE (Super-fast, Parallel, In-memory Calculation Engine) or query directly from the database. The QuickSight service is built on top of SPICE, which does all the hard work so that QuickSight can serve up a visualization of the data quickly. We recommend loading the data source into SPICE for better performance.

    From the designer window, you can choose the type of graph you want to build, as well as the data you want to see.

    This example shows the graph of happy shoppers by their age:

    Full code can be downloaded from the following links:


    In this blog post, I provided you with an example that can be used to design and build a simple application to analyze customer demographics. In addition, we explored how to integrate Rekognition with other AWS services, such as Lambda, S3, and IAM. This solution is a great example of a serverless, event-driven architecture. All of the services that we used are AWS managed services, so you don’t have to worry about the scalability of the solution. The simple application that we created can be a building block for an enterprise data lake. It can also be used to link customer transactions with their demographics to derive better propensity models in real time.

    Additional Reading

    Learn how to build your own face recognition service using Amazon Rekognition!

    About the Author

    Amit Agrawal is a Solution Architect with Amazon Web Services. He works with our System Integrator partners and customers to provide leadership on AI and IoT projects, helping them shorten their time to value when using AWS. In his spare time, he enjoys photography and building fun projects with his kid.