Integrated data analysis infrastructure built on Amazon Redshift
Processes 50 TB of data per day at high speed, with a total data volume of 6 PB
Used to improve communication quality and advance digital marketing

2022

NTT DOCOMO, Inc., a leading mobile telecommunications carrier, has been using Amazon Redshift from Amazon Web Services (AWS) as its company-wide common integrated data analysis platform since 2014. The total data volume amounts to 6 PB, and more than 2,600 analysts use it for various analyses. In August 2021, the company migrated to Amazon Redshift's RA3 node type to support large data volumes and enhance performance. The integrated data analysis platform has become an indispensable part of the company's business, used to optimize the telecommunications network and advance digital marketing.

AWS Case Study | NTT DOCOMO, INC.

"Amazon Redshift contributes to the optimization of communication networks by processing petabytes of data without interruption and providing it quickly to users. We also use it to advance our digital marketing, and the integrated data analysis platform has become an indispensable part of our business."

Mr. Ken Ota
NTT DOCOMO, INC.
Service Innovation Department
General Manager

Selecting Amazon Redshift for security and scalability

NTT DOCOMO operates smart life, corporate, and international businesses, as well as R&D, centered on telecommunications. Data analysis is indispensable in all aspects of its business, such as improving the quality of communication services, planning and developing new services, and streamlining operations. Therefore, the company has built a company-wide integrated data analysis platform, and since October 2014 has provided it to analysts in business divisions.

The integrated data analysis platform accumulates equipment and quality-control data from the telecommunications business, data on the various services provided in the smart life business, and more. As of April 2022, approximately 2,600 analysts use the platform, which holds 6 PB of data in total and processes more than 50 TB per day. "The data we handle ranges from various logs of telecommunication devices to customer-related information on the use of various services. After ETL processing, we expose the database's SQL interface to analysts, so they can not only visualize data with tools but also analyze it freely using Excel, R, and Python," said Jun Sasaki, Manager in charge of Big Data, Service Innovation Department.

Before the integrated data analysis platform was built, each facility and service had its own data warehouse (DWH). As a result, integrated analysis required coordination between departments, and it took a long time before the data was ready. The company began developing an integrated data analysis infrastructure with the aim of creating an environment in which analysis could be performed immediately.
"AWS has abundant security services such as Amazon VPC, AWS Direct Connect, and AWS IAM. It has also obtained third-party certifications, and we were able to design the platform to meet the security standards we apply on premises. In addition, we chose Amazon Redshift as a high-capacity, high-speed DWH because it scales easily as data grows and its rich managed services reduce the operational burden," said Mr. Sasaki.

Speeding up ETL by 1.2 to 1.4x with automatic WLM on RA3 nodes

The integrated data analysis platform started operation in 2014 with 125 Amazon Redshift nodes. Since then, the company has run performance verification whenever Reserved Instances came up for renewal or new node types were introduced, and has adopted the best configuration each time. In 2017, NTT DOCOMO added 125 Amazon Redshift DS2 nodes. In 2020, it replaced an entire set of 125 nodes with Amazon Redshift Spectrum, which queries data stored on Amazon S3, creating a combined configuration of Amazon Redshift and Amazon Redshift Spectrum.

In August 2021, in anticipation of the data explosion caused by 5G, NTT DOCOMO migrated Amazon Redshift from DS2 nodes to the latest RA3 nodes, with a configuration of 64 nodes (8 PB). "In 2021, Amazon Redshift Spectrum, which uses external data, had issues with slow query execution and rising costs due to pay-as-you-go fees," said Hirokazu Hayakawa of the Service Innovation Department. "Therefore, we consolidated it onto Amazon Redshift RA3 nodes, which can hold large amounts of data internally. During the implementation, we conducted performance verification with the help of the AWS support team and migrated while they guided us on how to use the RA3 nodes efficiently."

The migration to RA3 nodes has greatly enhanced the performance of the integrated data analysis platform. In the previous environment, Amazon Redshift workload management (WLM) had to be configured manually, with memory and concurrency allocations tuned by time of day. As a result, unexpected load increases sometimes caused processing delays. The new environment applies automatic WLM, which optimizes Amazon Redshift performance by allocating appropriate processing resources.
"The processing speed is 1.2 to 1.4 times faster than before for ETL processing of 50 TB of data per day. The time it takes to make data available to users through SQL has been reduced by about 3 hours, enabling us to provide fresher data to analysts," says Mr. Hayakawa.
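To make the contrast between manual and automatic WLM concrete, here is a minimal Python sketch of the two kinds of `wlm_json_configuration` values. It is an illustration only, not NTT DOCOMO's actual settings: the time-of-day schedule and every number in it are hypothetical, and the property names (`query_concurrency`, `memory_percent_to_use`, `auto_wlm`) are the ones we understand the Redshift parameter group to accept, so check them against the current documentation before use.

```python
import json

# Hypothetical manual schedule: (start_hour, end_hour) -> one
# (query_concurrency, memory_percent_to_use) pair per queue.
# These values are invented for illustration.
MANUAL_SCHEDULE = {
    (0, 8): [(3, 70), (5, 30)],    # night: favor the ETL queue
    (8, 24): [(5, 40), (15, 60)],  # day: favor the analyst queue
}


def manual_wlm_config(hour: int) -> str:
    """Build the manual WLM JSON an operator would apply at this hour."""
    for (start, end), queues in MANUAL_SCHEDULE.items():
        if start <= hour < end:
            return json.dumps(
                [
                    {"query_concurrency": c, "memory_percent_to_use": m}
                    for c, m in queues
                ]
            )
    raise ValueError(f"hour out of range: {hour}")


def auto_wlm_config() -> str:
    """Build the single setting that leaves queue tuning to Redshift."""
    return json.dumps([{"auto_wlm": True}])


if __name__ == "__main__":
    print(manual_wlm_config(2))   # night-time manual settings
    print(manual_wlm_config(14))  # daytime manual settings
    print(auto_wlm_config())      # one value, no schedule to maintain
```

The manual variant needs a schedule that someone has to maintain and that cannot react to an unexpected spike, whereas the automatic variant is a single value that stays the same around the clock.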

Amazon Redshift Data Sharing, another feature available with RA3 nodes, allows secure and easy data sharing between Redshift clusters. This reduces data transfer times and eliminates problems of the conventional environment, such as duplicate data retention and increased load. In addition, a user management scheme in the Amazon Redshift environment has relieved the capacity pressure caused by some users retaining large amounts of data, and has curbed the growth of Amazon Redshift storage costs.
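As a rough illustration of how Data Sharing avoids copying data between clusters, the following Python sketch generates the SQL statements a producer cluster and a consumer cluster would run. The share, schema, and database names and the namespace IDs are hypothetical placeholders, and the helper functions are our own; only the SQL statement forms come from Amazon Redshift, and they should be checked against the current SQL reference.

```python
import re

# Simple patterns used to reject unsafe input before building SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_NAMESPACE = re.compile(r"^[0-9a-fA-F-]{36}$")  # cluster namespace GUID


def _checked(value: str, pattern: re.Pattern) -> str:
    """Return value unchanged if it matches pattern, else raise."""
    if not pattern.match(value):
        raise ValueError(f"invalid value: {value!r}")
    return value


def producer_statements(share: str, schema: str, consumer_ns: str) -> list:
    """SQL for the cluster that owns the data (hypothetical names)."""
    share = _checked(share, _IDENTIFIER)
    schema = _checked(schema, _IDENTIFIER)
    consumer_ns = _checked(consumer_ns, _NAMESPACE)
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_ns}';",
    ]


def consumer_statements(share: str, local_db: str, producer_ns: str) -> list:
    """SQL for the cluster that reads the shared data in place."""
    share = _checked(share, _IDENTIFIER)
    local_db = _checked(local_db, _IDENTIFIER)
    producer_ns = _checked(producer_ns, _NAMESPACE)
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_ns}';",
    ]


if __name__ == "__main__":
    consumer = "11111111-2222-3333-4444-555555555555"  # placeholder ID
    producer = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"  # placeholder ID
    for sql in producer_statements("analyst_share", "quality_logs", consumer):
        print(sql)
    for sql in consumer_statements("analyst_share", "shared_logs", producer):
        print(sql)
```

The consumer cluster queries the producer's tables through the database it creates from the share, so no second copy of the data is loaded or stored.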

Meanwhile, the new integrated data analysis platform now offers Amazon SageMaker Studio, which brings together the tools needed for machine learning. An AWS IAM policy is created for each project to ensure secure operation, eliminating the risk of unauthorized data sharing. Mr. Yuya Matsubara of the Service Innovation Department said, "Amazon SageMaker Studio allows analysts to create models without any machine learning expertise. Even when processing large amounts of data, we can launch and scale instances on a per-user basis, so there is no impact on other users, and the system is highly regarded for its comfort and ease of use."

Evolving into an indispensable integrated data analysis platform for telecommunications and service businesses

The integrated data analysis platform, which has evolved over the past eight years since its initial construction, is now an integral part of NTT DOCOMO's business.
"We contribute not only to the early detection of network anomalies in the telecommunications business, but also to improving communication quality through medium- to long-term analysis and to optimizing base station design. In addition, analyzing data on the membership base across services and businesses is useful for more sophisticated digital marketing. From the standpoint of system operation, as working styles change with COVID-19, the benefits of being freed from hardware maintenance are significant and contribute to a new way of working," says Mr. Sasaki.

Providing an environment where users can run analyses easily and freely

NTT DOCOMO plans to continue to provide fresh information by adding real-time analysis visualization functions to its integrated data analysis platform. By opening the platform to users in other divisions using Amazon Redshift Data Sharing, and expanding the analysis environment using Amazon SageMaker Studio, it aims to create an environment where users can easily and freely run analyses.
"From now on, we will also focus on SLA/SLO to improve stability, and improve the quality of data provision while strengthening the monitoring system. We will also promote dynamic data utilization through serverless architectures such as AWS Lambda and Amazon Redshift Serverless, and provide fresh information to analysts by storing data in Amazon Redshift as quickly as possible," said Mr. Matsubara.


Customer Profile: NTT DOCOMO, INC.

  • Established: July 1, 1992
  • Capital: JPY 949,679 million
  • Number of employees: 8,847 (Group 46,506) (as of March 31, 2022)
  • Business: Telecommunications business, smart life business, others

Benefits of adopting AWS and future prospects

  • Retains a large volume of data, 6 PB in total
  • 1.2 to 1.4x performance improvement for ETL processing of 50 TB of data per day
  • Reduces data transfer times and eliminates the problems of conventional environments, such as duplicate data retention and increased load
  • Secure and stable analysis environment
  • Future prospects: strengthening monitoring, introducing real-time analysis visualization, improving inter-departmental collaboration, expanding the analysis environment, and using serverless architectures

Key Services Currently In Use

Amazon Redshift

Redshift allows petabytes of structured and semi-structured data in data warehouses, operational databases, and data lakes to be queried using standard SQL.


Amazon SageMaker Studio

Amazon SageMaker Studio provides a single, web-based visual interface for performing all ML development steps, increasing the productivity of data science teams by up to 10x.


AWS Direct Connect

AWS Direct Connect is a cloud service that simplifies building dedicated network connections from on-premises environments to AWS. It allows you to establish a private connection between AWS and your data center, office, or co-location environment.


AWS IAM

With AWS Identity and Access Management (IAM), you can securely manage access to AWS services and resources. IAM can be used to create and manage AWS users and groups, and to grant or deny access to AWS resources through permissions.
