AWS for Industries

Driving Business Outcomes with Clickstream Data

In today’s competitive world, customer obsession isn’t an option; it’s an imperative. Mastering customer obsession starts with data: demographics, psychographics, transactions, purchase records, support cases, product usage, shopping habits, content preferences, and more. According to a Gartner report, fewer than 10% of companies have a 360-degree view of the consumer. That is not for lack of trying; it has always been hard. The inherent complexity of collecting and integrating data makes a complete view of consumers a persistent hurdle.

Because today’s business environment moves quickly, timely business decisions require access to new data in real time, not hours or days later. To stay competitive and make well-informed decisions that reflect current market conditions, organizations must have real-time information at their disposal.

Markets fluctuate rapidly and customer preferences change, so working with stale data can lead to missed opportunities or outdated insights and, ultimately, a suboptimal customer experience.

Businesses recognize that they must take back ownership of their data (first-party data) to harness their customer and prospect information and to compete and win on customer experience. One example of first-party data is clickstream data, which holds immense potential for companies to better understand customer behavior and preferences.

Clickstream data refers to the collection of digital interactions between a user and a website or mobile application. Capturing this data and turning it into usable insights in real time can be challenging. Amazon Web Services (AWS) serverless services can help by providing a scalable architecture to seamlessly capture, process, visualize, and load clickstream data into analytics platforms.

User interactions encompass a wide range of actions, including clicks on links or buttons, page views, time spent on specific pages, form submissions, file downloads, and many other activities within the digital environment. Clickstream data provides valuable insights into user behavior, preferences, and patterns, allowing organizations to optimize their websites or apps, enhance user experiences, and make data-driven decisions to improve their overall performance.
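To make the interaction types above concrete, here is a minimal sketch of what a single clickstream record might look like. The field names (user_id, session_id, event_type, page_url, element_id, item_id, duration_ms) and the ClickstreamEvent class are hypothetical, chosen for illustration; they are not a schema defined by this solution.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClickstreamEvent:
    """One user interaction from a website or mobile app (hypothetical schema)."""
    user_id: str
    session_id: str
    event_type: str                     # e.g. "click", "page_view", "form_submit", "download"
    page_url: str
    element_id: Optional[str] = None    # link or button that was clicked, if any
    item_id: Optional[str] = None       # product or content item involved, if any
    duration_ms: Optional[int] = None   # time spent on the page, for page views
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Drop unset optional fields so the payload sent to the API stays compact.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})
```

For example, `ClickstreamEvent("u-123", "s-456", "click", "/products/42", element_id="add-to-cart").to_json()` produces a JSON payload that a web or mobile client could send to the ingestion API.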

Businesses can gain several advantages from clickstream data:

  • Customer Insights: Clickstream data can provide valuable insights into customer behavior, preferences, and interests. By analyzing clickstream data, businesses can better understand how users navigate their website or mobile app, which pages or features are most popular, and what actions users take before converting.
  • Personalization: Clickstream data can be used to personalize the user experience. By analyzing a user’s clickstream data, businesses can make personalized recommendations, show targeted content, and deliver relevant ads.
  • Optimization: Clickstream data can be used to optimize websites and mobile apps. By analyzing clickstream data, businesses can identify areas of their website or mobile app that are causing friction or hindering conversions. They can then make changes to improve the user experience and increase conversions.
  • Marketing: Clickstream data can be used to optimize marketing campaigns. By analyzing clickstream data, businesses can better understand the customer journey and make more informed decisions about targeting, messaging, and channel selection.
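The Customer Insights benefit mentions finding the most popular pages and the actions users take before converting. As a minimal sketch, assume events have already been pulled from the data lake as time-ordered (session_id, event_type, page) tuples, and that a hypothetical "purchase" event type marks a conversion. Both questions then reduce to simple counting:

```python
from collections import Counter, defaultdict


def top_pages(events, n=3):
    """Return the n most-viewed pages as (page, view_count) pairs."""
    views = Counter(page for _, etype, page in events if etype == "page_view")
    return views.most_common(n)


def paths_before_conversion(events, conversion_event="purchase"):
    """For each converting session, return the pages viewed before its first conversion."""
    pages_so_far = defaultdict(list)
    converted = {}
    for session, etype, page in events:
        if session in converted:
            continue  # ignore activity after the session's first conversion
        if etype == conversion_event:
            converted[session] = list(pages_so_far[session])
        elif etype == "page_view":
            pages_so_far[session].append(page)
    return converted
```

At scale you would express the same logic in SQL over the data in Amazon S3, but the counting idea is the same.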

In this post, we cover a high-level architecture for capturing clickstream data in near real time with AWS serverless services, without the need to provision and manage servers.


The solution uses Amazon API Gateway, AWS Lambda and Amazon Kinesis Data Streams to ingest and process clickstream data, Amazon Kinesis Data Firehose to save the raw data in Amazon Simple Storage Service (Amazon S3), then Amazon Athena and Amazon QuickSight to analyze and visualize data in a user-friendly manner.

Why did we choose these services?

Clickstream data streams in continuously as a large volume of messages, at highly variable rates that depend on user traffic and behavior. When evaluating the performance of new application features, website layouts, or marketing campaigns, it is crucial to analyze this data in real time so that you can act promptly.

The AWS services selected for this architecture offer autoscaling capabilities and cost-efficient solutions for processing clickstream data. These services dynamically scale resources to accommodate the fluctuations in the incoming workload, ensuring near real-time processing and analysis. With a pay-as-you-go pricing model, you only pay for the resources consumed, eliminating the need for overprovisioning and minimizing costs.
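How much capacity does a variable clickstream need? If you run Kinesis Data Streams in provisioned mode rather than on-demand mode (which manages shard count for you), each shard accepts up to about 1 MB per second and 1,000 records per second of writes. The sketch below is a back-of-the-envelope estimate under that assumption; the function name and the headroom parameter are illustrative, and 1 MB is treated here as 10^6 bytes.

```python
import math

# Approximate per-shard write limits in provisioned mode.
MAX_BYTES_PER_SEC_PER_SHARD = 1_000_000
MAX_RECORDS_PER_SEC_PER_SHARD = 1_000


def estimate_shards(peak_records_per_sec: float, avg_record_bytes: float,
                    headroom: float = 1.0) -> int:
    """Estimate provisioned shards for a peak write rate (headroom=1.25 adds 25%)."""
    records_per_sec = peak_records_per_sec * headroom
    bytes_per_sec = records_per_sec * avg_record_bytes
    by_throughput = math.ceil(bytes_per_sec / MAX_BYTES_PER_SEC_PER_SHARD)
    by_record_count = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    # Whichever limit binds first determines the shard count; never fewer than one.
    return max(1, by_throughput, by_record_count)
```

For small clickstream records, the record-count limit usually binds before the byte limit: 5,000 events per second at 500 bytes each needs 5 shards by record count but only 3 by throughput.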

Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at any scale.

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications, and you pay only for what you use.

Amazon Kinesis Data Streams is a serverless streaming data service that facilitates the capture, processing, and storage of data streams at any scale.

Amazon Kinesis Data Firehose is an extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.
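Firehose buffers incoming records and delivers a batch when either the configured buffer size or the buffer interval is reached, whichever comes first; in a delivery stream these are the BufferingHints settings (SizeInMBs and IntervalInSeconds). The class below is only a simplified local model of that size-or-interval behavior, written to illustrate why step 5 of the architecture delivers roughly once a minute. It is not Firehose code, and BufferingSketch is a hypothetical name.

```python
from typing import Callable, List, Optional


class BufferingSketch:
    """Simplified model of size-or-interval buffering (illustration only).

    Buffered records are delivered when their total size reaches max_bytes,
    or when interval_sec has elapsed since the first buffered record,
    whichever happens first.
    """

    def __init__(self, max_bytes: int, interval_sec: float,
                 deliver: Callable[[List[bytes]], None]):
        self.max_bytes = max_bytes
        self.interval_sec = interval_sec
        self.deliver = deliver
        self.buffer: List[bytes] = []
        self.size = 0
        self.first_ts: Optional[float] = None

    def add(self, record: bytes, now: float) -> None:
        self.tick(now)  # flush an expired batch before starting a new one
        if self.first_ts is None:
            self.first_ts = now
        self.buffer.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def tick(self, now: float) -> None:
        """Flush if the interval has elapsed since the first buffered record."""
        if self.first_ts is not None and now - self.first_ts >= self.interval_sec:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.deliver(self.buffer)
        self.buffer, self.size, self.first_ts = [], 0, None
```

With a 60-second interval and a size limit that low traffic never reaches, batches go out about once a minute; during traffic spikes, the size limit triggers delivery sooner.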

Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps.

Amazon Athena provides a straightforward, flexible way to analyze petabytes of data where it lives. With Athena, you can analyze data or build applications from an Amazon S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python.
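Once Firehose has written clickstream records to S3, Athena can query them with SQL. The sketch below assumes a table named clickstream_events in a database named clickstream_db has already been defined over those records (for example, in the AWS Glue Data Catalog), and that query results go to a placeholder bucket; all of these names are hypothetical. It uses the boto3 Athena client methods start_query_execution, get_query_execution, and get_query_results, passed in as a parameter so the function can be exercised without AWS.

```python
import time

# Hypothetical table and column names; adjust to your own schema.
TOP_PAGES_SQL = """
SELECT page_url, COUNT(*) AS views
FROM clickstream_events
WHERE event_type = 'page_view'
GROUP BY page_url
ORDER BY views DESC
LIMIT 10
"""


def run_athena_query(athena, sql, database="clickstream_db",
                     output_location="s3://example-athena-results/", poll_sec=1.0):
    """Run a query with an Athena client (e.g. boto3.client("athena")) and return data rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_sec)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # The first row holds column headers. Only the first page of results is read here.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows[1:]]
```

The same table can serve as a dataset in QuickSight, so the query you prototype here can back a dashboard.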

Amazon QuickSight powers data-driven organizations with unified business intelligence (BI) at hyperscale. With QuickSight, all users can meet varying analytic needs from the same source of truth through modern interactive dashboards, paginated reports, embedded analytics, and natural language queries.

Neiman Marcus implemented this solution on their ecommerce websites, and Sravan Erukulla, Director of Omnichannel Personalization, says, “AWS collaborated with The Neiman Marcus Group to create a clickstream repository capable of processing millions of records daily. In just three weeks, the two teams worked together and built the solution and deployed it to production. This clickstream data empowers Neiman Marcus to train and tune the models in near real-time for user browsing behavior, abandoned carts, and more.”

Architecture diagram

Figure 1 presents the clickstream data flow architecture and shows how the clickstream payload progresses through a series of steps. The customer web portal in the diagram represents any digital platform, such as a website or mobile application, through which users interact with the system. As users navigate the portal and click links, the clickstream data flows through the following stages.

Figure 1 – Architecture

  1. The client (customer web portal) sends the clickstream payload (record) to the API Gateway.
  2. The API Gateway transmits the record to Lambda, where the data is standardized.
  3. Lambda sends the record to Kinesis Data Streams for asynchronous processing.
  4. Kinesis Data Streams delivers the record to Kinesis Data Firehose.
  5. Kinesis Data Firehose buffers the records and uploads them to an S3 bucket every minute.
  6. Athena is used to query and analyze the data stored in the S3 bucket.
  7. QuickSight is used to create dashboards and display the data visually.
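Steps 1 through 3 can be sketched as a single Lambda function behind an API Gateway proxy integration: it parses the request body, standardizes the record, and writes it to Kinesis Data Streams with the boto3 put_record call (StreamName, Data, PartitionKey). This is a minimal sketch under several assumptions: the stream name comes from a hypothetical CLICKSTREAM_STREAM_NAME environment variable, the required and optional field names are illustrative, and "standardize" here means only validating required fields, trimming values, lowercasing the event type, and filling in a missing timestamp. The Kinesis client is passed in so the handler can be tested without AWS.

```python
import base64
import json
import os
from datetime import datetime, timezone

# Hypothetical configuration and schema, for illustration only.
STREAM_NAME = os.environ.get("CLICKSTREAM_STREAM_NAME", "clickstream-events")
REQUIRED_FIELDS = ("user_id", "session_id", "event_type", "page_url")
OPTIONAL_FIELDS = ("event_id", "item_id", "element_id", "duration_ms")


def standardize(payload) -> dict:
    """Normalize a raw payload into one consistent record shape (assumed schema)."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not payload.get(f)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    record = {f: str(payload[f]).strip() for f in REQUIRED_FIELDS}
    record["event_type"] = record["event_type"].lower()
    for f in OPTIONAL_FIELDS:
        if payload.get(f) is not None:
            record[f] = payload[f]
    # Fall back to the server receive time if the client sent no timestamp.
    record["timestamp"] = payload.get("timestamp") or datetime.now(timezone.utc).isoformat()
    return record


def make_ingest_handler(kinesis):
    """Build an API Gateway (proxy integration) handler around a Kinesis client."""
    def handler(event, context):
        body = event.get("body") or "{}"
        try:
            if event.get("isBase64Encoded"):
                body = base64.b64decode(body).decode("utf-8")
            record = standardize(json.loads(body))
        except ValueError as err:  # covers bad base64, bad JSON, and missing fields
            return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(record).encode("utf-8"),
            # Same partition key -> same shard, so one session's events stay together.
            PartitionKey=record["session_id"],
        )
        return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
    return handler
```

In a deployed function you would build the handler once at module load, for example `handler = make_ingest_handler(boto3.client("kinesis"))`, so the client is reused across invocations. Returning 202 reflects that the record has been accepted for asynchronous processing, not yet stored in S3.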

An example clickstream use case:

Clickstream data is a valuable source of information for personalized product recommendations built with Amazon Personalize, a machine learning service that enables developers to add personalized recommendations to their applications. Amazon Personalize can use clickstream data to improve its recommendations and deliver more relevant, personalized experiences to customers. The clickstream solution described in this post can be extended to feed clickstream data to Amazon Personalize, as shown in Figure 2.

Figure 2 – Clickstream use case

  1. Kinesis Data Streams passes the clickstream payload to Lambda.
  2. Lambda formats the record and sends it to Amazon Personalize.
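The two steps above might look like the following Kinesis-triggered Lambda sketch. Kinesis records arrive in the Lambda event base64-encoded under Records[].kinesis.data; the function decodes them, keeps only records that name an item, groups them by user and session, and calls the boto3 personalize-events client's put_events method, which accepts at most 10 events per request. Assumptions to note: the clickstream records carry user_id, session_id, event_type, item_id, an ISO 8601 timestamp, and optionally event_id (a hypothetical schema), and the event tracker ID comes from a hypothetical PERSONALIZE_TRACKING_ID environment variable.

```python
import base64
import json
import os
from collections import defaultdict
from datetime import datetime

# Hypothetical: the Amazon Personalize event tracker ID, supplied via an env var.
TRACKING_ID = os.environ.get("PERSONALIZE_TRACKING_ID", "example-tracking-id")
MAX_EVENTS_PER_CALL = 10  # put_events accepts at most 10 events per request


def to_personalize_event(record: dict) -> dict:
    """Map one clickstream record (assumed schema) to a Personalize event."""
    event = {
        "eventType": record["event_type"],
        "itemId": record["item_id"],
        "sentAt": datetime.fromisoformat(record["timestamp"]),
    }
    if record.get("event_id"):
        event["eventId"] = record["event_id"]  # optional; Personalize assigns one if absent
    return event


def make_personalize_handler(personalize_events):
    """Build a Kinesis-triggered handler (in AWS: boto3.client("personalize-events"))."""
    def handler(event, context):
        by_session = defaultdict(list)
        for rec in event["Records"]:
            payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
            if payload.get("item_id"):  # only item interactions are useful for recommendations
                by_session[(payload["user_id"], payload["session_id"])].append(payload)
        for (user_id, session_id), records in by_session.items():
            events = [to_personalize_event(r) for r in records]
            for i in range(0, len(events), MAX_EVENTS_PER_CALL):
                personalize_events.put_events(
                    trackingId=TRACKING_ID,
                    userId=user_id,
                    sessionId=session_id,
                    eventList=events[i:i + MAX_EVENTS_PER_CALL],
                )
    return handler
```

This sketch omits error handling: if a put_events call raises, Lambda retries the whole Kinesis batch, so a production version would want idempotent event IDs or partial-batch failure reporting.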


Leveraging AWS serverless services provides a powerful and scalable solution for capturing clickstream data. By utilizing services such as Amazon API Gateway, AWS Lambda, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon S3, Amazon Athena and Amazon QuickSight, organizations can seamlessly capture, process, visualize and load clickstream data into analytics platforms. Kinesis Data Firehose facilitates the ingestion process by automatically scaling and buffering incoming data, while Lambda enables the execution of custom code for data transformation and enrichment.

With the serverless architecture, businesses can efficiently handle varying data volumes, reduce operational costs, and quickly iterate on and learn from changes to their customer-facing digital properties. They can also rapidly extract insights from clickstream data, enabling data-driven decision-making and enhanced customer experiences.

In our technically focused companion post, Capture clickstream data using AWS serverless services, we walk you through implementing this solution step by step to help you get started with capturing your clickstream data.

Contact an AWS Representative to learn how we can help accelerate your business.

Pritam Bedse

Pritam Bedse is a Senior Solutions Architect at Amazon Web Services, helping Enterprise customers. His interests and experience include AI/ML, Analytics, Serverless Technology, and customer engagement platforms. Outside of work, you can find Pritam outdoors gardening and grilling.

Christin Carter

Christin Carter is a Principal Account Manager at Amazon Web Services, helping Enterprise customers. Her interests and experience include Digital Transformations, AI/ML, Analytics, Serverless Technology, and Customer Experience solutions. Outside of work, you can find Christin traveling the world and seeking out new adventures big and small.

Jared Warren

Jared Warren is a Principal Solutions Architect at Amazon Web Services, working with our Enterprise customers. Outside of work, he plays board games (the nerdier the better) and smokes bar-b-que in his backyard.