Machine Learning for Telecommunication deploys a scalable, customizable machine learning (ML) architecture that provides a framework for end-to-end ML workloads in telecommunications use cases. This guidance streamlines ad hoc data exploration, data processing and feature engineering, and ML model building, which covers training, evaluation, and serving predictions from a model deployed to an endpoint.
The guidance also includes a synthetic telecom IP Data Record (IPDR) dataset to demonstrate how to use ML algorithms to train and test models for predictive analysis in telecommunications. You can use the included Jupyter notebooks as a starting point for your own research and custom ML models, or you can customize the notebooks for your own use case.
Overview
The diagram below presents the architecture you can build using the example code on GitHub.

Machine Learning for Telecommunication guidance architecture
An Amazon Simple Storage Service (Amazon S3) bucket holds a synthetic IP Data Record (IPDR) dataset, an AWS Glue job converts the dataset, and an Amazon SageMaker instance hosts the ML Jupyter notebooks.
The guidance ingests data from the Amazon S3 bucket into the Amazon SageMaker instance and runs the Jupyter notebooks on the dataset.
The notebooks preprocess the data, extract features, and divide the data into training and testing sets. Amazon S3 Select reads the compressed Parquet data produced by the AWS Glue job. ML algorithms then process the training dataset to build a model that identifies anomalies and predicts future ones.
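The split-then-train flow described above can be illustrated with a small, self-contained Python sketch. It is not the code from the guidance's notebooks: the record fields (`timestamp`, `bytes`), the helper names, and the z-score threshold are all hypothetical, and a simple statistical baseline stands in for whatever ML algorithms the notebooks actually use. It only shows the general shape of dividing data into training and testing sets and flagging anomalies against a model fitted on the training portion.

```python
import random
import statistics


def make_synthetic_records(n, seed=0):
    """Generate hypothetical IPDR-like records (assumed fields, not the real schema)."""
    rng = random.Random(seed)
    return [{"timestamp": t, "bytes": rng.gauss(1000.0, 50.0)} for t in range(n)]


def split_train_test(records, train_fraction=0.8):
    """Split chronologically so the test set only contains later records."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_baseline(train):
    """Fit a trivial 'model': the mean and standard deviation of the training values."""
    values = [r["bytes"] for r in train]
    return statistics.mean(values), statistics.stdev(values)


def flag_anomalies(records, mean, stdev, threshold=3.0):
    """Return records whose value lies more than `threshold` deviations from the mean."""
    return [r for r in records if abs(r["bytes"] - mean) > threshold * stdev]


records = make_synthetic_records(200)
records[190]["bytes"] = 5000.0  # inject an obvious spike into the later (test) portion

train, test = split_train_test(records)
mean, stdev = fit_baseline(train)
anomalies = flag_anomalies(test, mean, stdev)
```

Splitting by timestamp rather than at random is a design choice of this sketch: it keeps later records out of the training set, which matters when the goal is to predict future anomalies.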
Machine Learning for Telecommunication
Version 1.1.1
Last updated: 12/2019
Author: AWS
Features
Machine Learning for Telecommunication Guidance
Synthetic dataset for training
