Realize Faster Time to Value with IBM’s Modern Data Accelerators on AWS
By Tony Giordano, Sr. Partner, Global Leader Data Platform Services – IBM
By Ryan Keough, Manager, Solution Architecture – AWS
By Amit Chowdhury, Sr. Partner Solutions Architect – AWS
Use cases for data have changed and expanded since the data warehouse days, as digital transformation has unleashed new uses for data services.
Many organizations, however, still manage expensive, single-use case data environments. IBM’s Modern Data Accelerators on AWS can help build a modern implementation of a data fabric architecture, enabling customers to realize faster time to value.
Customers tend to maintain the following data environments:
- Data warehouses for batch business intelligence and reporting.
- Data lakes for data science predictive model testing and development.
- Digital databases for digital interactions.
These single-use environments tend to have significant duplication in data sourcing from both internal and external sources, leading to poor data quality, high maintenance costs, and inflexible data environments.
In this post, we will discuss how IBM’s Modern Data Accelerators can help build a modern implementation of a data fabric architecture, which standardizes data integration across the enterprise.
IBM Consulting is an AWS Premier Tier Services Partner recognized as a Global Systems Integrator (GSI) with many competencies, including Data and Analytics Consulting. This positions IBM to help customers who use AWS harness the power of innovation and drive their business transformation.
Challenges with Legacy Data Environments
Generally, legacy data environments are interconnected with event-driven inbound and outbound digital channels, using predictive models built in data science sandboxes. However, real-time visualization requires a higher level of integration of an organization’s data use cases.
This challenge can be solved by consolidating data environments into multi-use case data platforms that provide:
- Common data provisioning data lakes: Cost effective, cloud-based stores to source real-time and batch data from internal and external sources.
- Integrated conform layer: Enterprise data conformed for specific domains such as customer, product, and transaction.
- Consumption layer: For infomarts, data science sandboxes, digital data stores, and operational data use cases.
A multi-use case data platform with common data provisioning that’s instantiated on AWS generally provides a lower cost and more flexible data environment with high data quality.
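The three layers above can be sketched as a simple pipeline: raw records from multiple sources land in a common provisioning layer, are conformed to a shared domain schema, and are then published for consumption. The following is a minimal, hypothetical Python sketch; the layer names and the "customer" schema are illustrative assumptions, not part of IBM's assets:

```python
# Toy model of the three platform layers. In practice each layer would be
# backed by purpose-fit storage (for example, Amazon S3); plain dicts are
# used here only to make the flow visible.

raw_layer = {}          # common data provisioning: data stored as received
conformed_layer = {}    # integrated conform layer: one record per domain key
consumption_layer = {}  # consumption layer: views for infomarts, sandboxes, etc.

def ingest(source: str, records: list) -> None:
    """Land records from any internal or external source, unchanged."""
    raw_layer.setdefault(source, []).extend(records)

def conform_customers() -> None:
    """Conform raw records into a single 'customer' domain, keyed by email."""
    for records in raw_layer.values():
        for rec in records:
            key = rec["email"].strip().lower()
            conformed_layer[key] = {
                "email": key,
                "name": rec["name"].strip().title(),
            }

def publish(view_name: str) -> None:
    """Publish a conformed view for downstream consumption."""
    consumption_layer[view_name] = list(conformed_layer.values())

# Two sources supply overlapping customer data; the conform layer dedupes it,
# which is exactly the duplication problem single-use environments suffer from.
ingest("crm", [{"email": "Ann@Example.com", "name": "ann smith"}])
ingest("web", [{"email": "ann@example.com ", "name": "Ann Smith"}])
conform_customers()
publish("customer_360")
print(len(consumption_layer["customer_360"]))  # one conformed customer
```

Because every consumer reads from the same conformed layer, data is sourced and cleansed once rather than once per environment.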
Modern Data Accelerators Offering by IBM Consulting
IBM Consulting has built an integrated set of assets called Modern Data Accelerators that boost time to value and reduce both delivery time and cost.
Modern Data Accelerators allow enterprises to build out the multi-use case data platforms, and have been developed as an end-to-end set of assets that can quickly instantiate data capabilities for a data cloud implementation on AWS. These assets have been deployed in multiple geographies, and across industries.
Figure 1 – Modern Data Accelerators.
It’s important to note these assets are not just software. They are not extract, transform, load (ETL) tools, databases, big data stores, or other data management software. Rather, they combine all of these to enable faster use of data management services.
There are essentially two options to accelerate data management services:
- Perform traditional analysis, design, and coding.
- Install and configure assets, and then extend them for specific requirements.
The Modern Data Accelerators are known data management processes that have been codified to drive business value faster. These assets support the full lifecycle of a data environment, from development and ingestion through analytics and maintenance.
Modern Data Architecture Components
The following components provide a brief overview of IBM’s Modern Data Accelerators.
- Workload analysis and modernization automation: Creates persona-based, detailed inventories of data ecosystems and their dependencies. Provides automated translation of legacy data-processing code, and automated testing, which is essential for creating actionable migration roadmaps.
- Real-time and batch intelligent integration engine: A Kafka-based, real-time and batch engine that creates data pipelines for managing the ingestion, organization, and publication of data. These engines include custom connections, unified batch and stream capabilities, and ingestion APIs. The data onboarding process learns and adapts using machine learning models.
- Digital integration for intelligent workflows: This is a data provisioning asset for intelligent workflows that’s designed for enterprise and unstructured data.
- Data mesh console: Provides an operational interface for data mesh implementations, which helps manage the lifecycle of data products and their dependencies. The data mesh console is designed to integrate and leverage data catalogs and data marketplaces, while also providing data product observability metrics.
- AI-driven cognitive classifier: The artificial intelligence-driven cognitive classifier automates the classification and organization of data against enterprise canonical models and provides real-time insight into data quality. Automated data tagging provides classification for the application of security policies across the data platform. Once trained on an organization’s data, the tool can be integrated into the intelligent integration engine to classify and auto-map data, and to send notifications in both real-time and batch modes.
- Lightweight master data management (MDM): Provides multi-domain entity matching capabilities that identify duplicate or suspected duplicate entities across large datasets, using both probabilistic and deterministic matching logic. It’s built leveraging both graph database and Elasticsearch capabilities.
Figure 2 – Master data management.
- Real-time and batch monitoring: Data quality across all components in the data fabric environment is maintained using customizable, open-source Grafana dashboards and portlets. Using modern technologies such as Airflow, the solution is able to provide a high degree of automated data pipeline and structure observability.
- Data marketplace: This is the central provisioning point for data consumption in the modern data platform on AWS. Data is persisted in original and curated forms with purpose-fit storage, allowing for publication and subscription of data.
- Data science marketplace: Contains a set of cloud-ready data science models that can be accessed and stored in the model marketplace, hosted in the data platform.
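As an illustration of the kind of matching the lightweight MDM component performs, the sketch below combines a deterministic rule (exact match on a normalized identifier) with a probabilistic rule (fuzzy name similarity). It is a toy example using only Python's standard library, not IBM's implementation, which leverages graph database and Elasticsearch capabilities:

```python
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    """Strip punctuation, whitespace, and case before comparing."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    # Deterministic rule: identical normalized email is a certain match.
    if normalize(a["email"]) == normalize(b["email"]):
        return True
    # Probabilistic rule: name similarity above a threshold is a suspect match.
    score = SequenceMatcher(None, normalize(a["name"]),
                            normalize(b["name"])).ratio()
    return score >= threshold

records = [
    {"name": "Jonathan Q. Smith", "email": "jsmith@example.com"},
    {"name": "Jonathon Q Smith",  "email": "jon.smith@example.com"},
    {"name": "Maria Lopez",       "email": "mlopez@example.com"},
]

# Pairwise comparison for clarity; production MDM engines use blocking or
# indexing (e.g. Elasticsearch) to avoid this O(n^2) scan on large datasets.
suspects = [
    (i, j)
    for i in range(len(records))
    for j in range(i + 1, len(records))
    if is_duplicate(records[i], records[j])
]
print(suspects)  # the two "Jonathan/Jonathon Smith" records are flagged
```

The threshold, field names, and similarity measure here are assumptions chosen for illustration; real matching logic is tuned per domain.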
As stated earlier, each of these assets helps accelerate time to value of data cloud solutions on AWS. See Figure 3 to understand how all of these components come together to build the Modern Data Accelerators.
Figure 3 – Various components of Modern Data Accelerators.
Customer Success Story
The following case study is a good representation of how the Modern Data Accelerators have helped accelerate customers’ time to value.
IBM Consulting was engaged to help a large healthcare organization transform its use of analytic data in more productive and proactive ways while reducing overall cost.
The organization had a large legacy relational database that was primarily used for reporting purposes. The first step was to analyze the existing database workload with IBM’s Workload Analysis tool to determine the types of data and personas using that data.
With the objective of changing the way business users consume data, IBM Consulting used IBM’s real-time and batch intelligent integration engine to ingest data from hundreds of sources into a new Amazon Simple Storage Service (Amazon S3)-based data lake, where IBM envisioned 14 first-of-a-kind predictive models to define a “member health profile” in 12 weeks.
IBM used the data science marketplace to store the predictive models, making them easy to reuse in subsequent models. The early success provided the momentum to right-size the end-state data platform, curate 23 new datasets into production in three months, and leverage real-time and batch monitoring capabilities to bring the new, single AWS Cloud platform into production within six months.
IBM’s Modern Data Accelerators on AWS is a proven solution to reduce delivery time and cost for customers. It mitigates the risk of maintaining legacy data platforms while improving time to value.
The Modern Data Accelerators help achieve the following objectives in a data fabric, data mesh, or modern data stack implementation:
- Manage: A single, integrated means of managing data at rest, data in motion, and data integration.
- Govern: Map all enterprise assets to a single canonical model and data catalog.
- Secure: Information classification serves as the single means of defining security policies and entitlements.
To learn more about the solution, check out IBM’s data platforms page. You can also refer to this AWS blog post to explore modernizing data platforms, accelerating innovation, and unlocking business value with data mesh on AWS.
IBM – AWS Partner Spotlight
IBM Consulting is an AWS Premier Tier Services Partner and MSP that offers comprehensive service capabilities addressing both business and technology challenges that clients face today.