AWS Partner Network (APN) Blog

Teradata Vantage Real-Time API Integration with Amazon SageMaker Endpoints

By Mike Riordan, Director Product Management Data Science and Analytics – Teradata
By Doug Mbaya, Partner Solutions Architect – AWS
By Vjay Pawar, Solutions Architect ISV – AWS

Teradata-AWS-Partners-2022
Teradata
Connect with Teradata-1

Teradata, an AWS Data and Analytics Competency Partner, has expanded its collaboration with Amazon Web Services (AWS) by adding integration capabilities for Teradata Vantage, the data platform for enterprise analytics and AWS cloud services.

Vantage, with its native object store (NOS) read/write connector to Amazon Simple Storage Service (Amazon S3) data, already provides data integration with S3 data and Vantage enterprise data.

Now, Teradata introduces an API integration with Amazon SageMaker and Amazon Forecast. With this integration, customers can connect data from Vantage to these external services and return analytic results all using a Vantage query.

By integrating with Amazon SageMaker, Teradata gives customers the tools to operationalize analytic workflows on their Vantage platforms. This enables them to connect integrated enterprise data with AWS analytics in real-time queries, putting analytic results in the hands of business and analytic teams to drive outcomes.

For example, business analysts can retrieve the latest scores from Amazon Forecast to help plan next month’s delivery schedules. Support teams can have the latest churn-risk scores from sophisticated SageMaker models while still on the phone talking to customers.

Teradata-Vantage-SageMaker-1

Figure 1 – Teradata Vantage API integration.

This Vantage integration with deployed AWS services puts the results of those analytics into the hands of Vantage users and Vantage workflows across the enterprise, bringing more insights and analytic value to customers’ data.

Teradata High-Level Architecture

The following diagram illustrates an end-to-end data flow from data generation to value analytics using Teradata Vantage on AWS.

Teradata-Vantage-SageMaker-2

Figure 2 – Modern Teradata enterprise architecture on AWS.

Model Inference/Scoring

The primary use case for the Vantage API integration is model scoring of data with AWS analytic services for business analysts. For these users, connecting enterprise data in Vantage to these already trained and deployed AWS services is crucial for driving their workflows and delivering insights and outcomes.

With the existing Teradata NOS connector to Amazon S3, data science teams have been able to use integrated enterprise data from Vantage for model training. These teams leverage the SageMaker platform to create the best performing model and finally deploy it as an AWS endpoint.

Financial Fraud Demo

In this example notebook flow, the analytic model has already been created and is deployed on the AWS Cloud. Vantage business users have access to the analytic service by passing data from Vantage to the deployed endpoint through a query using the in-database API_Request function, with calculated results returned as the query’s result set.

This Jupyter notebook uses the Teradata SQL Kernel, a Jupyter notebook kernel that allows you to connect and query Teradata using SQL language within the notebook.

%var SystemName=Vantage-DEV



%connect ${SystemName}
Success: 'Vantage-DEV' connection established and activated for user 'alice', with default database 'alice'

The Vantage table NEW_FINANCIAL_TRANS contains the analytic dataset used by the model for detecting possible fraudulent accounts.

The analytic team creates this account behavior profile dataset by combining account history data. Using Teradata NOS to write this analytic data set to S3, a predictive model can be trained using SageMaker.

In [49]:

# retrieve and show a sample of records from Vantage
select rec_id, avg_pmt_05_mth, max_utilization_05_mth, times_nsf, credit_util_cur_mth, credit_util_prior_5_mth from new_financial_trans sample 10;
rec_id avg_pmt_05_mth max_utilization_05_mth times_nsf credit_util_cur_mth credit_util_prior_5_mth
76809 -1900.2 62.4 0 0 21.6
597417 -7424.4 34.22222 0 0 13.95741
631585 -5240.4 29.37143 0 0 16.41428
597417 -7424.4 34.22222 0 0 13.95741
45837 -961 0 0 0 0
392307 -2900.2 87.26667 0 0 45.49444
484301 -3240 9.375 0 9.375 9.375
603917 -7042 14.4 0 6.56 9.17167
12892 -4711.8 30.18333 0 0 14.33889
12862 -4711.8 30.18333 0 0 14.33889

Predictive Model with Amazon SageMaker

Once the data science team trains an acceptable model, it’s deployed as an AWS endpoint to be used by business teams to detect and minimize fraud exposure.

Here’s an example Python code from the SageMaker notebook used to deploy this analytic model. Note the AWS endpoint address, as this will be used by the Vantage API integration query to score data.

# Amazon SageMaker code:

# Deploy model to a hosted endpoint

from sagemaker.serializers import CSVSerializer

xgb_predictor = xgb.deploy( initial_instance_count = 1,instance_type = "ml.t2.medium"
                           ,serializer=CSVSerializer() )

# get the endpoint name 
xgb_predictor.endpoint_name

#'sagemaker-xgboost-2021-10-20-15-43-44-623'

In this example from the SageMaker Jupyter notebook code, the deployed model is now available at the endpoint address: sagemaker-xgboost-2021-10-20-15-43-44-623

Integrating Vantage Data with AWS Analytic Services

With the new Vantage API_Request function, business analysts have seamless integration of Teradata Vantage with these Amazon SageMaker analytic services, connecting Vantage data for inference and scoring execution.

Vantage API query using a VIEW: BUSTOUT_FRAUD_SCORING

Wrapping the underlying API integration query with a SQL VIEW makes this API integration easy to use. In fact, with a VIEW users can be completely unaware their query is reaching out to these external services.

Below is the VIEW syntax, created with the Vantage API_Request function. You can see as input parameters the SageMaker endpoint address created earlier by the SageMaker deployment, and the proper AWS access credentials, along with the Vantage data to be sent for inference/scoring.

CREATE VIEW BUSTOUT_FRAUD_SCORING
AS
SELECT rec_id, round(100 * output) as fraud_risk_score
FROM tapidb.API_Request 
(
    ON ( SELECT rec_id, avg_pmt_05_mth,max_utilization_05_mth,times_nsf
               ,credit_util_cur_mth,credit_util_prior_5_mth
               ,credit_util_cur_to_prior_ratio,days_since_lst_pymnt
               ,num_pymnt_lst_7_days,num_pymnt_lst_60_days
               ,num_pur_lst_7_days,num_pur_lst_60_days,tot_pymnt_chnl,last_12m_trans_ct
         FROM NEW_FINANCIAL_TRANS 
       ) 
    USING AUTHORIZATION('{ "Access_ID"  : "******", "Session_Token" : "*****", "Region" : "us-east-2" }' ) 
    API_TYPE('aws-sagemaker') 
    ENDPOINT('sagemaker-xgboost-2021-10-20-15-43-44-623') 
    CONTENT_TYPE('csv') 
    KEY_START_INDEX('1') 
) as N

Calling the API/VIEW with this simple SQL query, the Vantage user is able to retrieve model scoring results from the externally deployed SageMaker model from their Vantage data.

SELECT * from BUSTOUT_FRAUD_SCORING sample 10

rec_id fraud_risk_score
76809 47
597417 87
631585 42
392307 59
631585 42
744410 95
631585 42
484301 94
603917 93
554445 79

A Vantage business analyst can use a simple query to join data such as end-of-month balance and account credit limit with a real-time fraud risk score from BUSTOUT_FRAUD_SCORING, the integration with the AWS analytic service.

- Create Reporting Query to add Fraud Risk Score to Acct Balance Details

select DISTINCT A.acct_no, 
       A.credit_limit, 
       A.eom_bal_amount,
       100 * A.eom_bal_amount / A.credit_limit as percent_limit,
       S.fraud_risk_score
from bustout_fraud_scoring S  --API integration to AWS service
    ,acct_map              M
    ,acct_statement        A
    ,( select acct_no, max(as_of_dt_day) dt  --curr balance
       from acct_statement group by 1
     ) as D
where S.rec_id = M.rec_id
and   A.acct_no = M.acct_no
and   D.acct_no = A.acct_no
and   A.as_of_dt_day = D.dt
order by 5 desc
acct_no credit_limit eom_balance_amount percent_limit fraud_risk_score
1094 8000 735 9.19 95
87 per9000 731 8.12 95
1251 4000 2026 50.65 94
385 10000 440 4.4 94
587 2000 1710 85.5 93
716 9000 1641 18.23 88
58 6000 264 4.4 87
500 5000 734 14.68 73
1016 6000 2866 47.77 69
206 3000 535 17.83 59
899 7000 3177 45.39 55
1087 6000 463 7.72 48
1134 9000 3636 40.4 47
641 2000 401 20.05 42
1093 5000 0 0 28

This generated report is actionable by business analysts on accounts that are likely fraudulent, using tactical queries like this with Vantage.

This FRAUD_RISK_SCORE can be easily leveraged and joined with Vantage business data. In this way, connecting business users with the analytic models is transparent and used in their daily workflows.

Summary

Teradata understands that customers are leveraging a wide variety of data science tools and platforms to extract new value from their data. With Vantage, Teradata embraces this full ecosystem of tools with new API integration as a key foundation for that strategy.

The Vantage API integration with AWS analytic services enables business users to drive outcomes with real-time analytics.

Learn more about Teradata Vantage on AWS, Amazon SageMaker, and Amazon Forecast.

.
Teradata-APN-Blog-Connect-1
.


Teradata – AWS Partner Spotlight

Teradata is an AWS Competency Partner and cloud-first, enterprise analytics solution that delivers customer insights at scale.

Contact Teradata | Partner Overview | AWS Marketplace