AWS Partner Network (APN) Blog

Leveraging AWS Analytic Services and HCLTech Frameworks for OLAP Solutions

By Arunkumar Jambulingam, Data & Analytics Lead, ASEAN – HCLTech
By Shishir Choudhary, Sr. Partner Solutions Architect – AWS
By Jerry Li, Sr. Partner Solution Architect – AWS
By Deepak Chandrasekaran, Sr. Partner Development Manager – AWS

HCLTech-AWS-Partners-2022
HCLTech
Connect with HCLTech-2022

Online analytical processing (OLAP) is a method of organizing datasets in a multi-dimensional format for quick analysis. This provides deeper insights for decision-makers by enabling them to analyze datasets, understand trends and patterns, perform root cause analysis, identify drivers, and act.

Multi-dimension analysis is widely adopted by analysts, knowledge users, and power users for their decision support process.

In this post, we will explore utilizing Amazon Web Services (AWS) analytic services and migration tools together with HCLTech frameworks to orchestrate OLAP solutions.

HCLTech is an AWS Premier Tier Services Partner with Competencies in Migration, DevOps, SAP, Storage, Mainframe Modernization, and Cloud Operation Services. HCLTech is also a member of the AWS Managed Service Provider (MSP) and Well-Architected partner programs.

OLAP Architecture and Components

There are four types of online analytical processing:

  • Multi-dimensional OLAP (MOLAP): Stores aggregated and computed datasets in the proprietary multi-dimensional (in memory or on disk) cube format for analysis. Offers high performance and optimized storage.
  • Relational OLAP (ROLAP): Stores datasets on relational data store to perform multi-dimensional analysis. Data aggregation and computation happens on the fly.
  • Hybrid OLAP (HOLAP): Stores aggregated datasets in MOLAP cube and detail datasets in ROLAP database for analysis.
  • Virtual OLAP (VOLAP): Enables multi-dimensional analysis over the datasets available in various data sources through federated queries, optimized columnar storage, and distributed and parallel processing.

AWS services for OLAP are comprised of four layers, as described below:

Insights and Visualization

This layer provides capabilities to interact with datasets using reports and dashboards. It enables end users to perform ad-hoc analysis, author reports, dashboards, scorecards, and charts from the underlying datasets with role-based access controls (RBAC). Amazon QuickSight addresses the capabilities in this layer

Governance and Access

This layer provides the capability to catalog the underlying datasets and enables users to perform search, discovery, and secure data access. It enables federated access to a variety of datasets available in the underlying persistent storages, and to centrally manage permissions on data being shared across your organization. Key AWS services include AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Compute and Storage

This layer provides capabilities to persist any kind of datasets and perform data computations (ELT/ETL) over the same. Depending on the data requirements (cold, warm, hot), data will be stored and processed by Amazon Simple Storage Service (Amazon S3), Amazon EMR, Amazon Redshift Serverless, and AWS Glue.

Ingestion

This layer provides the capability to bring all source and upstream datasets onto an AWS data lake or staging layer. It meets all the requirements of batch, real-time streaming, change data capture, and secure transfers.

HCLTech-OLAP-Solutions-1

Figure 1 – AWS services for OLAP.

AWS Services for OLAP Solution

Next, we will discuss the AWS services leveraged in the top three layers for the OLAP solution.

Amazon QuickSight Capabilities

Preparing the QuickSight SPICE Cube

Amazon QuickSight uses SPICE (Super-fast, Parallel, In-memory Calculation Engine) to store data in a multi-dimensional format for faster processing of analytical queries. The cubes are encrypted and compressed for storage optimization.

As part of the demonstration, we’ll leverage a sample retail dataset which comprises product dimension, branches, daily sales transactions, and revenue target. As a broader process, sales data is copied to Amazon S3, loaded to Amazon Redshift Serverless data mart, and reported in QuickSight.

Now, let’s embed OLAP capability for the above process. While preparing your retail dataset for analysis in QuickSight, choose “Query Mode” as SPICE to import the dataset and build the in-memory cube. This imports the dataset from Amazon Redshift Serverless views/tables. Note that we need to regularly refresh the in-memory cube to update it with the latest/incremental records.

HCLTech-OLAP-Solutions-2

Figure 2 – Amazon QuickSight – SPICE configuration.

Next, go to the dataset and select “Refresh Property” to see the status of current and historical refreshes. We can schedule the cube refresh in any given frequency, and many users can access a SPICE dataset concurrently without performance issues.

Prepare Multi-Dimensional Analysis

Preparing hierarchies, level, and drill-downs navigations over the datasets is seamless in QuickSight. Note that QuickSight provides the option to convert the date field to year, month, or day automatically. The above features are available in charts as well.

HCLTech-OLAP-Solutions-3

Figure 3 – Amazon QuickSight hierarchy, levels, drill down, and slice/dice.

Amazon QuickSight provides ability to drill up and down levels, and also to slice and dice rows and columns as shown above.

Drill Through Navigation from Summary to Detail Reports

We can use the action options to drill through the navigation, from summary report to detail report, and pass the filter context for the same. This provides the flexibility of splitting the complexities of the reports while navigating seamlessly without losing the context.

HCLTech-OLAP-Solutions-4

Figure 4 – Amazon QuickSight – Drill through navigation.

Aggregate and Table Calculations

Amazon QuickSight provides rich sets of table calculations to discover how dimensions influence measures or each other. With the help of this table and aggregate calculation, we can address the mulit-dimensional expressions (MDX) equivalent expressions. Provided below are some of the business scenarios:

  • QTD: QuickSight calculate the quarter to date (QTD) automatically based on the time period hierarchy.
  • YTD: To calculate the year to date (YTD) value, use the following table function:

runningSum(sum(actual),[period ASC],[product ASC,{product_brand} ASC,{product_type} ASC,{product_line} ASC])

  • PY Act: To calculate the prior year actual value for the given date:

periodOverPeriodLastValue(sum(actual),{sale_date}, MONTH,12)

HCLTech-OLAP-Solutions-5

Figure 5 – Amazon QuickSight – PY Act, Forward One Month, Trail One Month.

  • T1M: To calculate the trailing one month (T1M), which is the sum of a given month and previous month value:

sum(actual)+periodOverPeriodLastValue(sum(actual),{sale_date},MONTH,1)

  • F1M: To calculate the forward one month (F1M), which is the sum of a given month and next month value:

sum(actual)+periodOverPeriodLastValue(sum(actual),{sale_date},MONTH,-1)

We can use Amazon QuickSight parameter control to customize the above calculations for specific months.

Machine-Learning Powered Forecasting

We can use QuickSight machine learning-powered forecasting to forecast complex, real-world scenarios such as data with multiple seasonality.

Auto Narratives

QuickSight automatically interprets the data, charts, and tables in the dashboard and provides a number of suggested insights in natural language. The suggested insights you can choose from are ready-made and come with words, calculations, and functions. These narratives can be customized per user requirements.

QuickSight and Redshift Serverless Integrations

Amazon QuickSight can perform ROLAP by directly integrating with Amazon Redshift using direct query mode. All of the QuickSight reports and insights developed in SPICE can be seamlessly repointed to its source Redshift tables, views, and materialized views.

Redshift views or materialized views encapsulate complexities of underlying data models (tables and joins) and create a subset of contextual datasets that can act as a base view for OLAP analysis.

We can embed window functions in the SQLs for complex analytical scenarios, pre-compute, aggregate, and store the processed result set in materialized view. Materialized views offer significant performance benefits compared to live table SQL or view queries.

Also, we can extend the QuickSight insights from Redshift managed tables to Amazon S3 datasets using Amazon Redshift Spectrum.

HCLTech-OLAP-Solutions-6

Figure 6 – Amazon Redshift Spectrum view.

We can query the S3 dataset by creating an external table from Redshift, as shown above. We can also combine datasets that include data stored in Redshift and in S3. Then, we can access them using the SQL syntax in Redshift.

QuickSight and Amazon Athena Integrations

Amazon Athena provides virtualization infrastructure where it submits a single SQL query to analyze data in relational, nonrelational, object, and custom data sources running on-premises or in the cloud. Athena accesses AWs Glue Data Catalog which maintains registry of sources metadata, and AWS Glue crawler continuously scans the source metadata and maintains up to date changes.

HCLTech-OLAP-Solutions-7

Figure 7 – Amazon QuickSight integration with Athena.

By leveraging the Amazon QuickSight Athena Connector, we can compose federated queries and generate OLAP insights.

QuickSight and Amazon EMR Integrations

Amazon QuickSight supports big data connectors to connect with Apache Spark on Amazon EMR. We can leverage Spark tables or through results of Spark SQL queries. For this process, QuickSight requires your Spark server to be secured and authenticated using LDAP, which is available to Spark version 2.0. Spark SQL can also be configured to use AWS Glue Data Catalog as its meta store.

HCLTech-OLAP-Solutions-8

Figure 8 – QuickSight integration with Apache Spark on Amazon EMR.

AWS Glue Crawlers can automatically infer schema from source data in S3 and store the associated metadata in the data catalog. QuickSight requires LDAP as authentication protocol for authenticating Spark cluster; hence, we need to install and configure OpenLDAP in Amazon EMR and enable Thrift Server.

OLAP Solution Deployments on AWS

In this section, we’ll cover OLAP deployment types which involves provisioning of set of AWS analytic services based on business functional and technical requirements, along with HCLTech accelerators to fast track the deployments.

HCLTech-OLAP-Solutions-9

Figure 9 – AWS OLAP deployment types.

Use Case Requirement #1

A customer wants to perform self-service business intelligence with multi-dimensional data analysis requiring quick query response and ad-hoc analysis over GBs of data volume, with acceptable data latency for reporting

MOLAP is the appropriate deployment type, and you can leverage Amazon QuickSight SPICE which supports a maximum of 2,000 fields per cube with 24 million records (25 GB dataset) for standard edition and one billion records (one TB dataset) for enterprise edition.

Use Case Requirement #2

A customer wants a managed report/scorecard in addition to dashboards and analysis over complex multi-dimensional data model, with TBs of datasets and zero data latency.

ROLAP is the deployment type, and Amazon QuickSight’s integration with Redshift Serverless leverages a massively parallel processing query optimizer and takes advantage of the columnar-oriented data storage. It can process complex analytic queries that often include multi-table joins, subqueries, and aggregation.

Generally, customers keep the cold data, unstructured data, and historical transactional data in Amazon S3 for low-cost, efficient storage. Redshift Spectrum can create complex queries of churning both Redshift-managed tables containing hot data and S3 datasets.

Use Case Requirement #3

A customer wants to perform analysis over a big data lake with petabytes of datasets.

ROLAP is the deployment type and leverages the Amazon EMR SparkSQL connector with QuickSight. Amazon EMR provides a simple and cost-effective way to run highly distributed processing frameworks such as Spark.

Use Case Requirement #4

A customer wants to perform hybrid data analysis over various data sources through virtualization.

VOLAP is the deployment type, and QuickSight’s integration with Amazon Athena can run OLAP analysis over federated queries churning datasets across S3, Redshift, and Amazon Relational Database Service (Amazon RDS). This offers zero footprint traversing of analytical queries from data mart to data lake.

Advantages of AWS and HCLTech

There are several advantages for customers when deploying or migrating OLAP workloads onto AWS:

  • Interoperability of analytical insights: Several AWS analytic services seamlessly integrate with your datasets, including Amazon QuickSight, Amazon Redshift, Amazon S3, Amazon Athena, Amazon EMR, and AWS Glue Data Catalog.
  • AWS serverless and cost-effective deployment: With low-cost data storage, serverless performance at scale, and pay-as-you-go compute services, AWS offers fully managed and cost-effective infrastructure for managing OLAP workload deployments.
  • Quicker deployment: AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (DMS) make it easy to migrate relational databases, data warehouses, and NoSQL databases to AWS and unlock analytic insights. HCLTech’s ADvantage Migrate fast tracks AWS data lake/data mart implementation and data migration to AWS, optimizes production release cycles, and drives productivity by adopting modular design approaches (reusability of codes) and template-based data preparation and testing activities.

Conclusion

In this post, we briefed you on orchestrating online analytical processing (OLAP) workloads using AWS analytics services and their key features in addressing OLAP requirements.

Unlike legacy products that require separate infrastructure and licenses, AWS analytics services offer seamless OLAP analysis over the data residing in the data lake, data mart, data warehouse, with the flexibility to scale up and down at a lower cost.

.
HCLTech-APN-Blog-Connect-2023
.


HCLTech – AWS Partner Spotlight

HCLTech is an AWS Premier Tier Services Partner and MSP that serves hundreds of global enterprises to solve day-to-day and complex challenges with a dedicated full-stack business unit.

Contact HCLTech | Partner Overview | AWS Marketplace | Case Studies