Near Streamlines Big Data Processing with Data Lake on AWS

2020

AI-Powered Marketing Technology Platform

With information on 1.6 billion people in 44 countries, Near has one of the largest datasets of people’s behavior on the planet. Near is a data intelligence company whose AI-powered platform helps media companies and publishing houses deliver more effective campaigns. Founded in Singapore in 2012, the company expanded to Australia in 2014 and Europe in 2015 and is now growing its business in the US. Near’s combination of products and analytics yields real-time results for customer queries that support marketing campaigns.

“Developers are happy, and our data science teams have benefited from using AWS to drive innovation.”

Mahesh Vandi Chalil
Senior Vice President and Head of Engineering, Near

 

An Efficient Data Science Team

The company migrated to the Amazon Web Services (AWS) Cloud in 2019 because it was struggling with latency and long provisioning lead times: scaling up its bare-metal cloud infrastructure from another provider often took 8 hours or more. “Our goals for migration were high elasticity, low latency, and increased efficiency in our data science team,” explains Mahesh Vandi Chalil, senior vice president and head of engineering at Near. Retrieving and processing raw data from various third-party sources was often a lengthy and painstaking process, which limited the big data analysis that engineers could perform.

“From a data science perspective, the motivators for moving to AWS included scaling our data platform in new countries and fully automating our lifecycle management for machine learning and artificial intelligence models,” adds Ravi Kaushik, vice president and head of data science at Near.

Security was also a major factor for Near. Amit Kumar, head of DevOps at Near, says, “By leveraging solutions such as AWS Identity and Access Management (IAM), we can now manage access, create roles, and set permissions to services and resources securely, and that’s a great plus.”
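To illustrate what setting permissions to resources can look like in practice, the sketch below builds an IAM policy document that grants read-only access to a single Amazon S3 bucket. The function name and the bucket name `example-data-lake` are hypothetical and chosen only for illustration; the case study does not describe Near's actual policies.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 bucket.

    The bucket name is a placeholder; this is an illustrative sketch,
    not a policy taken from the case study.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy("example-data-lake"), indent=2))
```

A document like this could be attached to a role so that, for example, an analytics job can read data without being able to modify or delete it.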

Support When and Where It Counts

Engineers performed tests before implementing AWS to ensure latency was consistently under 100 milliseconds, the target SLA required to serve global customers such as mass media company News Corp. Many of Near’s engineers were new to the AWS Cloud, so Mahesh subscribed to AWS Enterprise Support to get priority technical assistance, especially during migration. “The knowledge of AWS support teams is remarkable. This really stood out from our previous vendor,” Mahesh comments.

Near’s 30 engineers have regular meetings with their AWS technical account manager and solutions architect to learn about the nuances of AWS products currently in use and how to optimize their architecture. Even during the peak of migration, when support tickets were running into double digits, the company received the help it needed to avoid system errors and downtime. This assistance often entailed finding the right Amazon Elastic Compute Cloud (Amazon EC2) instance sizes or types for each workload. “With AWS, we get the right tools for the right job. An example is the Amazon EC2 R5 instances that are memory optimized and suited to run our Apache Spark workloads,” Ravi says.

New Capabilities with a Scalable Data Lake

Near had been using Apache Hadoop and its corresponding Hadoop Distributed File System (HDFS) to store and process data. However, the team switched to Amazon Simple Storage Service (Amazon S3) to improve storage scalability and start building a data lake. “Accessing data in Amazon S3 is much easier than our HDFS solution,” Mahesh says. “Amazon S3 keeps things organized with logical partitions and provides a single storage for all data types. It’s extremely stable with high availability, and the redundancy aspect is taken care of by automatically storing data across multiple devices spanning a minimum of three AWS Availability Zones.”
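One common way to keep data in Amazon S3 organized with logical partitions is to encode partition values in the object key, using the Hive-style `name=value` convention that query engines such as Amazon Athena can use to skip irrelevant data. The sketch below assumes a layout partitioned by country and date; the dataset name, partition columns, and file name are hypothetical, as the case study does not describe Near's actual key layout.

```python
from datetime import date


def partitioned_key(dataset: str, country: str, day: date, filename: str) -> str:
    """Return an S3 object key using Hive-style partition folders.

    The partition columns (country, year, month, day) are assumptions
    made for this example only.
    """
    return (
        f"{dataset}/country={country}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    print(partitioned_key("events", "SG", date(2020, 3, 7), "part-0000.parquet"))
```

With keys shaped this way, a query restricted to one country and one day only needs to read the objects under the matching prefix.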

Engineering teams are now incorporating Amazon Redshift and Amazon Athena to set up a data warehouse and run serverless queries on top of Amazon S3. “We are seeing firsthand how powerful these services are for running big data queries for some of our critical use cases,” Mahesh says.

For data processing, the company switched from Cloudera to Amazon EMR. “On-demand processing for our data science and engineering teams has become much easier with Amazon EMR,” Mahesh says. Capacity planning used to be a headache for Mahesh, but Amazon EMR has helped address the uncertainties that come with expansion. Furthermore, with the support of AWS Professional Services, Near experienced a smooth migration to Amazon EMR. “The AWS Professional Services team validated our architecture and adoption of Amazon EMR and was always available to provide highly skilled support. We worked as one team,” Mahesh adds.

More Data, More Clients

About 40 percent of Near’s Amazon EMR clusters are running on Amazon EC2 Spot Instances, which has reduced both the cost and time required to run big data queries. Since Near migrated to AWS, the number of data requests it handles daily has grown from five billion to eight billion, a 60 percent increase. Although its total spend has remained relatively stable, the volume of data has spiked, the number of data scientists has doubled, and more customers have come on board. Data scientists can also perform more complex queries for analysis.
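As a rough illustration of how an Amazon EMR cluster can draw on Spot capacity, the sketch below builds a task instance fleet definition in the shape accepted by the EMR API's instance fleet configuration. The fleet name, capacity numbers, and the `r5.2xlarge` instance size are assumptions for the example; the case study states only that R5 instances are used for Apache Spark workloads and does not describe how individual clusters are configured.

```python
def task_fleet(instance_type: str, spot_units: int, on_demand_units: int) -> dict:
    """Describe an EMR task instance fleet that mixes Spot and On-Demand capacity.

    All values passed in are illustrative; this is not Near's configuration.
    """
    return {
        "Name": "task-fleet",  # hypothetical fleet name
        "InstanceFleetType": "TASK",
        # Units of capacity to fill from Spot and On-Demand respectively.
        "TargetSpotCapacity": spot_units,
        "TargetOnDemandCapacity": on_demand_units,
        "InstanceTypeConfigs": [
            # Each instance of this type counts as one unit of capacity.
            {"InstanceType": instance_type, "WeightedCapacity": 1},
        ],
    }


if __name__ == "__main__":
    print(task_fleet("r5.2xlarge", spot_units=8, on_demand_units=2))
```

A dictionary like this would typically be one entry in the `InstanceFleets` list passed when a cluster is created, alongside separate fleets for the primary and core nodes.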

This growth is expected to yield long-term benefits in winning bigger projects and attracting new clients, because global marketers often require their marketing technology vendors to hold a minimum amount of data before awarding new contracts. Of particular interest to global marketers is Near’s data enrichment service. This service provides consumer intelligence that enhances marketers’ first-party data, ultimately leading to more detailed insights about their audiences. The more data Near is able to ingest and process, the greater its potential for attracting clients seeking data enrichment.

Confident and Happy Teams

Mahesh has peace of mind knowing that Near’s architecture is robust enough to handle expansion at any scale. “Whenever we decide to further expand in terms of new countries or markets, I’m not worried, because I can scale to any level with our global AWS infrastructure,” he says.

The company is also confident that it can manage costs while focusing on expansion. Sooraj Balakrishnan, principal engineer at Near, says, “By using tools such as Amazon CloudWatch, we’re able to gain a high level of visibility across our applications to better manage operational performance and resources, resulting in improved cost predictability and projections.”

Going forward, DevOps engineers are excited about trying new AWS tools such as Amazon CodeGuru to automate code hygiene. Mahesh says, “Developers are happy, and our data science teams have benefited from using AWS to drive innovation.”


About Near

Based in Singapore, Near is a data intelligence company with an AI-powered platform that processes data on 1.6 billion monthly active users in 44 countries. Its suite of software as a service (SaaS) offerings for marketing and data enrichment helps companies run high-impact campaigns.

Benefits of AWS

  • Ensures latency of 100 milliseconds or less
  • Scales to support an increase in daily data requests from 5 billion to 8 billion
  • Controls infrastructure costs even during rapid expansion
  • Gives teams confidence to expand to new countries with elastic global architecture
  • Enables larger, more complex queries for big data
  • Facilitates the creation of a data lake and data warehouse to simplify analysis

AWS Services Used

Amazon Simple Storage Service

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Amazon EMR

Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. With Amazon EMR, you can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark.

Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS Cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.

AWS Professional Services

Adopting the AWS Cloud can provide you with sustainable business advantages. Supplementing your team with specialized skills and experience can help you achieve those results. The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud.


Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.