Data Lakes on AWS

Quickly build, test, and deploy your data lake with AWS and partner solutions.

Traditional data storage and analytic tools can no longer provide the agility and flexibility required to deliver relevant business insights. That's why many organizations are shifting to a data lake architecture. With Data Lake Quick Starts and customer-ready solutions, AWS and APN Competency Partners make it faster and easier to build your data lake.

A data lake is an architectural approach that allows you to store massive amounts of data in a central location, so it's readily available to be categorized, processed, analyzed, and consumed by diverse groups within an organization. Because data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know in advance what questions you want to ask of your data.
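Storing data as-is and deciding on a schema only when the data is read is often called "schema-on-read." The following is a minimal Python sketch of that idea, assuming raw records land as JSON Lines. The record fields, `ORDER_SCHEMA`, and `read_with_schema` are hypothetical illustrations, not part of any AWS service. In a real data lake, a query engine typically applies the table definition at read time; this sketch only shows the principle in plain Python.

```python
import json
from datetime import date

# Hypothetical raw records, stored exactly as producers emitted them (JSON Lines).
# The two records do not share the same fields; no schema was enforced on write.
RAW_LINES = [
    '{"order_id": "1001", "amount": "19.99", "placed": "2018-03-01"}',
    '{"order_id": "1002", "amount": "5.00", "placed": "2018-03-02", "coupon": "SPRING"}',
]

# A hypothetical schema chosen by one consumer at read time: field name -> parser.
ORDER_SCHEMA = {
    "order_id": int,
    "amount": float,
    "placed": date.fromisoformat,
}


def read_with_schema(lines, schema):
    """Project each raw JSON record onto `schema`, parsing fields as they are read.

    Fields that are not in the schema (e.g. "coupon") are ignored, and fields
    missing from a record become None. The stored lines are never modified.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            name: (parse(record[name]) if name in record else None)
            for name, parse in schema.items()
        })
    return rows
```

Because the raw lines are left untouched, a second team could apply a different schema, such as `{"order_id": int, "coupon": str}`, to the same data later without reprocessing or rewriting what is already stored.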

Learn how AWS and APN Competency Partners have helped organizations migrate massive volumes of heterogeneous data to a data lake on AWS, where they can swiftly and simply leverage it for critical business insights.

A data lake on AWS can help you:

Collect and store any type of data, at any scale, and at low cost
Secure the data and prevent unauthorized access
Catalog, search, and find the relevant data in the central repository
Quickly and easily perform new types of data analysis
Use a broad set of analytic engines for ad hoc analytics, real-time streaming, predictive analytics, artificial intelligence (AI), and machine learning

A data lake can also complement and extend your existing data warehouse. If you’re already using a data warehouse, or are looking to implement one, a data lake can be used as a source for both structured and unstructured data.

Building a Data Lake on AWS

A data lake on AWS gives you access to the most complete platform for big data. AWS provides secure infrastructure and offers a broad set of scalable, cost-effective services to collect, store, categorize, and analyze your data to get meaningful insights. AWS makes it easy to build and tailor your data lake to your specific data analytics requirements. You can get started by using one of the available Quick Starts or by leveraging the skills and expertise of an APN Partner to implement one for you.

Advantages of a Data Lake on AWS

Flexibility

Easily ingest data in a variety of ways, using services such as Amazon Kinesis, AWS Import/Export Snowball, and AWS Direct Connect. Store all of your data, regardless of volume or format, using Amazon Simple Storage Service (Amazon S3).

Agility

Deploy the infrastructure you need almost instantly. This means your teams can be more productive, it’s easier to try new things, and projects can roll out sooner.

Security and Compliance

AWS provides capabilities across facilities, network, software, and business processes to meet the strictest requirements. Environments are continuously audited for certifications such as ISO 27001, FedRAMP, DoD SRG, and PCI DSS.

Broad and Deep Capabilities

Build virtually any big data application and support any workload, regardless of the volume, velocity, and variety of the data. With more than 50 services and hundreds of features added every year, AWS provides everything you need to collect, store, process, analyze, and visualize big data in the cloud.

Featured APN Technology Partners

Attunity

Fanatics, a popular sports apparel website and fan gear merchandiser, needed to ingest terabytes of data from multiple historical and streaming sources – transactional, e-commerce, and back-office systems – to a data lake on Amazon S3. Once ingested, the data would be analyzed to better identify, predict, and fulfill customer needs related to the products Fanatics offers in over 300 online and offline stores.

To accomplish this, Fanatics chose Attunity Replicate, a software solution featuring change data capture (CDC) and parallel threading for streaming data in real time from multiple sources into a data lake on Amazon S3. The data can then be consumed through Apache Kafka for real-time analytics. Attunity spares Fanatics the heavy lifting of manually extracting data from disparate sources and enables the organization to see results in real time.
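The core idea behind change data capture is that only the changes (inserts, updates, and deletes) made in a source system are shipped, typically read from a source database's transaction log, and the target copy is kept current by replaying those changes in order. The Python sketch below shows that replay step in isolation. `ChangeEvent` and `apply_changes` are hypothetical names, and this is a generic illustration of the technique, not how Attunity Replicate is implemented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical change record captured from a source table."""
    op: str                     # "insert", "update", or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # full new row image; None for deletes


def apply_changes(snapshot: Dict[str, dict], events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Replay change events, in order, on a copy of `snapshot` and return the result."""
    target = dict(snapshot)  # shallow copy; rows are replaced, never mutated
    for event in events:
        if event.op in ("insert", "update"):
            target[event.key] = event.row
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target
```

Ordering matters: if the delete of a key were replayed before its insert, the row would wrongly survive in the target, which is why CDC pipelines preserve the source's commit order.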

Webinar Title: Fanatics Ingests Streaming Data to a Data Lake on AWS

Customer Presenter: Alan Chang, Senior Product Manager, Fanatics
Attunity Presenter: Jordan Martz, Director of Technology
AWS Presenter: Paul Sears, Solutions Architect

Databricks

Performing data science workloads on data from disparate sources – data lake, data warehouse, streaming, and more – creates challenges for organizations needing to use their data to drive operational and product improvements. Textbook publisher McGraw-Hill needed to remove such data silos so it could transform its business model to accommodate a growing focus on digital learning. Specifically, the company wanted the ability to quickly perform complex analytics operations and enable collaboration between business analysts, data engineers, and data scientists.

McGraw-Hill deployed Databricks, a unified analytics platform that allows it to work efficiently with streaming data as well as historical data stored in data lakes on Amazon S3 and in multiple data warehouses. In this webinar, you'll learn how Databricks, developed by the original creators of Apache Spark™, enables McGraw-Hill to analyze streaming and historical data at a scale and speed its previous solution simply couldn't provide. Data science workloads that used to take weeks now take hours.

Webinar Title: McGraw-Hill Optimizes Analytics Workloads with Databricks
Customer Presenter: Matthew Ashbourne, Lead Software Engineer, McGraw-Hill Education
Databricks Presenter: Brian Dirking, Sr Director of Partner Marketing
AWS Presenter: Pratap Ramamurthy, Partner Solutions Architect

Qubole

Big data technologies can be complex and can involve time-consuming manual processes. Organizations that intelligently automate big data operations lower their costs, make their teams more productive, scale more efficiently, and reduce the risk of failure.

In our webinar, representatives from TiVo, creator of a digital recording platform for television content, will explain how they implemented a new big data and analytics platform that dynamically scales in response to changing demand. You'll learn how the solution enables TiVo to easily orchestrate big data clusters using Amazon Elastic Compute Cloud (Amazon EC2) and Amazon EC2 Spot Instances that read data from a data lake on Amazon Simple Storage Service (Amazon S3), and how this reduces the development cost and effort needed to support its network and advertiser users. TiVo will share lessons learned and best practices for quickly and affordably ingesting, processing, and making available for analysis terabytes of streaming and batch viewership data from millions of households.

Webinar Title: TiVo: How to scale new products with a data lake on AWS and Qubole

Customer Presenter: Ashish Mrig, Senior Manager, Big Data Analytics, TiVo
Qubole Presenter: Harsh Jetly, Solutions Architect
AWS Presenter: Paul Sears, Solutions Architect

Talend

Learn how to reduce development time and innovate on AWS. In this webinar, Beachbody, a seller of fitness, weight-loss, and muscle-building home-exercise videos, talks about its experience migrating to a data lake architecture on AWS using Talend. Beachbody will describe how it created an open enterprise data platform, giving its employees access to secure, well-governed data and increasing DevOps efficiency across the entire company.

Join our webinar and find out how Talend and AWS helped Beachbody migrate a variety of unstructured and structured data sources to a data lake, shorten development and testing cycles, and solve complex deployment challenges common with real-time data.

Webinar Title: Architecting an Open Data Lake for the Enterprise
Talend Presenter: Ashwin Viswanath, Director, Cloud Product Marketing
Customer Presenter: Eric Anderson, Executive Director, Data, Beachbody
AWS Presenter: Pratap Ramamurthy, Solutions Architect

Informatica

The Informatica Intelligent Data Lake Management solution enables you to ingest, cleanse, process, govern, and secure high volumes of raw data in a trusted data lake on AWS. Informatica's metadata-driven AI and enterprise cataloging capabilities empower business stakeholders such as analysts to quickly discover, profile, prepare, and secure data for timely, relevant business insights. In short, Informatica helps businesses harness a data lake on AWS and unlock big data insights that drive innovation and sales.

Looker

Today's businesses run on big data, and the metrics generated by that data need to be centrally defined and broadly accessible to be of real benefit. Looker is a modern data platform that allows everyone in the company to find and explore the data they need to make decisions. Looker is built for cloud platforms like Amazon Web Services (AWS) and allows you to directly query modern cloud data stores, including data lakes. Customers use Looker for internal analytics as well as to expose data to customers, partners, and vendors.