Microsoft Workloads on AWS

Data Reply: Modernizing Legacy SQL Server Data Warehouses with AWS

For many organizations, data warehouses have traditionally resided in on-premises data centers. These organizations are now looking to free themselves from the limitations of on-premises data warehouses.

Reply logoData Reply is a Reply group company, a Premier Amazon Web Services (AWS) Partner that offers a broad range of advanced analytics, AI/ML, and data processing services. Data Reply operates across many industries and business functions, enabling their customers to achieve meaningful business outcomes through the effective use of data, accelerating innovation, and time to value.

I chatted with Jyoti Mann, Manager at Data Reply focused on data transformation and analytics, to understand how they are helping customers modernize their SQL Server data warehouses on AWS to unlock the agility, security, scalability, and economics of the cloud.

Q. What challenges do you see organizations facing by using on-premises data warehouses?

A. In a traditional data warehouse, the inflexible architecture makes it hard to scale and introduce rapid changes affecting scalability and agility. Legacy systems also become slower over time, affecting performance, efficiency, and productivity – not to mention the high cost involved in managing existing infrastructure.

Also, it is crucial to meet security and compliance requirements that were non-existent when these legacy data warehouses were built. This means placing business-critical data on a secure platform with controlled access.

Many organizations still hold business-critical data in legacy on-premises Microsoft SQL Server data warehouses. These legacy data warehouses, sooner or later, need to be modernized.

Q: What do you mean by modernizing a legacy data warehouse?

A. Modernization means moving to a modern data warehouse in the cloud. This is particularly critical for organizations that need to store unstructured data, such as streaming voice and video data, social media data, clinical notes in healthcare, financial trade information, and so on – where business decisions depend on fast-growing and unpredictable data.

There are also management and strategy considerations. Companies need to think about how to get accurate and reliable business intelligence (BI) from their data. What more can they do with data? For example, would they benefit from AI/ML based analytics? Would they like to offer to customers historic data analytics to discover past trends and forecast the future?

Q: How does Data Reply help modernize their customers’ legacy data warehouses?

A. At Data Reply, we understand that each business case is unique. Through in-depth assessment of a company’s data warehouse needs, use case, and vision, our experts use their knowledge and experience to recommend the most suitable migration and modernization pathways on the AWS cloud.

From our experience, we have seen two data analytics options on AWS that suit most of our customers’ needs. In Figure 1, one option uses Amazon Redshift (Path A) and the other uses Amazon Athena (Path B), along with other AWS services. In few cases, both options are used in conjunction with each other based on the what customers are looking to achieve.

Figure 1: Data warehouse modernization with AWSFigure 1: Data warehouse modernization with AWS

Q. Interesting! Let’s start with Path A. Can you dive deep into how data warehouse modernization works with Amazon Redshift?

A. Sure thing! This option involves migrating your data warehouse to Amazon Redshift, an enterprise-level, petabyte scale, fully managed data warehousing service on AWS. Also, extending data warehouses to data lakes is part of modern data architecture practices. Amazon Redshift uses Amazon Redshift Spectrum to allow this extension. With Redshift Spectrum, you can query the data in your Amazon Simple Storage Service (Amazon S3) data lake directly.

Using Redshift Spectrum offers several advantages. First, this capability extends your petabyte-scale Amazon Redshift data warehouse to unbounded data storage limits using Amazon S3, which allows you to scale to exabytes of data cost-effectively. Second, it elastically scales compute resources separately from the storage layer in an Amazon S3 data lake. It also offers significantly higher concurrency, because you can run multiple Amazon Redshift clusters and query the same data in Amazon S3. Finally, with Redshift Spectrum, you have the freedom to store data in a multitude of formats, so that it is available for processing whenever you need it. You can run instant queries on unstructured data without having to load or transform it.

When integrated with other advanced AWS services, this architecture offers additional advantages. For example, using it with AWS Glue—which offers serverless data integration—makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. And by adding Amazon QuickSight and Amazon SageMaker, we can offer our customers scalable, serverless, embeddable, machine learning-powered business intelligence.

Q. For which scenarios or use cases would you recommend using Amazon Redshift and Redshift Spectrum?

A. We often recommend this approach for customers migrating large databases to the cloud. Amazon Redshift is especially useful for business-critical workloads for customers that use a central extract, transform, and load (ETL) cluster that shares data with multiple BI or analytic clusters.

One use case for Redshift Spectrum could be a high-velocity, high-volume e-commerce site. We recently supported a global fashion retailer with a solution that included customers’ order history in an Amazon Redshift data warehouse. Orders arrived in real time through an Apache Kafka stream, landing in Amazon S3 in Apache Parquet format. The customer challenge was to make quick and accurate decisions for orders using order history data in Amazon Redshift and real-time data in Amazon S3. We designed a solution that used Redshift Spectrum to query data from both Amazon Redshift and Amazon S3.

Q: Let’s talk about Path B now. Can you dive deep into how data warehouse modernization work with Amazon Athena?

A. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. It’s serverless, so there’s no infrastructure to manage—you pay only for the queries that you run. Users can analyze unstructured, semi-structured, and structured data stored in Amazon S3.

Both Amazon Athena and Redshift Spectrum query Amazon S3 using standard SQL, and store the results. But there’s one major difference: Amazon Athena stores query results on Amazon S3, from where they can be loaded into Amazon Redshift, while Redshift Spectrum can join tables directly on Amazon Redshift.

Q. For which use cases would you recommend customers use Amazon Athena?

A. Data Reply offers the Amazon Athena pathway to customers that, for example, want to run interactive ad hoc SQL queries against data on Amazon S3 without having to manage infrastructure or clusters.

Most data science and analytics teams’ key requirement is to analyze data to provide near-real-time insights. Many of the financial service industry customers we work with run ad hoc reports. In-depth data analysis helps them to make better decisions and optimize returns on investment (ROI) through forecasting and spotting trends. Data warehouses can use SQL for this type of analysis. One of the main challenges is that large volumes of data and diverse data sources may require additional ETL work. In such cases, we recommend using Amazon Athena, storing raw data in Amazon S3, and creating specific ETL flows for each use case, as demonstrated in Figure 2.

Figure 2: Amazon Athena for ad hoc analyticsFigure 2: Amazon Athena for ad hoc analytics

Q. How does Data Reply help customers in their data warehouse modernization journey?

A. We have a tried-and-tested, three-stage approach to migration: assess, mobilize, and modernize.

Figure 3: Data Reply three-stage approach

Figure 3: Data Reply three-stage approach

Our data warehouse and analytics assessment (DWAA) approach, as shown in Figure 3, helps customers identify options aligned with their business goals, establishes a clear vision of the future, and creates a high-level business case for modernization. Depending on the use case, we can deliver these assessments for potentially zero cost to customers.

We perform an in-depth review of the customer’s as-is environment to understand business requirements, goals, and potential challenges. We also look at a customer’s current capabilities—both technical and organizational—to identify gaps and constraints. We then design a best-fit data warehouse and analytics-complete solution to fit their needs. This typically includes a target operational model and governance around data warehouse management. We also create a roadmap and a business case. Breaking the journey into manageable chunks helps in reducing risk and ensures that we deliver tangible business outcomes at each stage.

Q. Are there any resources where our readers can find more about how Data Reply’s offerings?

A. Customers can learn more about how Data Reply helps organizations modernize their legacy SQL Server data warehouses on AWS on our webpage where we dive deep into our approach and success story of FSI customer.

About the Microsoft workloads on AWS Partner Spotlight series

Please continue to join me in this blog series as I highlight our AWS Partners’ capabilities in migrating and modernizing Microsoft workloads on AWS. As you do, I hope you’ll also ask the question: “What’s my organization’s plan for moving off SQL Server or Windows Server or whatever technology is keeping us from modernizing to better care for our customers?” Let AWS Partners with their specific niche offerings help you assess how your company can get the most out of cloud.

AWS can help you assess how your company can get the most out of cloud. Join the millions of AWS customers that trust us to migrate and modernize their most important applications in the cloud. To learn more on modernizing Windows Server or SQL Server, visit Windows on AWSContact us to start your modernization journey today.

Prasad Rao

Prasad Rao

Prasad Rao is a Principal Partner Solutions Architect at AWS based out of UK. His focus areas are .NET Application Modernization and Windows Workloads on AWS. He leverages his experience to help AWS Partners across EMEA for their long term technical enablement to build scalable architecture on AWS. He also mentors diverse people who are new to cloud and would like to get started on AWS.