AWS Storage Blog

Break down silos with AWS Transfer Family and Dremio’s Data Lakehouse Platform

Becoming a data-driven organization is a top priority for the majority of executives. According to an IDC data culture survey from 2021, 83% of CEOs want their organizations to be more data-driven, and 87% of CxOs say that giving their organizations the ability to make better strategic, tactical and operational decisions is their top priority.

However, the reality on the ground is that only 30% of those surveyed say that their actions are driven by data, and only 29% are being asked to communicate using data-driven methods.

One reason for this roughly 50-percentage-point gap between what the C-suite is asking for and what practitioners are able to deliver is the growth of data silos. Business groups and functional teams in large enterprises have adopted a multitude of point solutions over the years, and many of these products don't integrate with the tools and technologies adopted by other parts of the organization. Each of these applications generates data that is often stored locally and may never move into an analytic database or data lake for collective use and big data insights.

An unfortunate result of the proliferation of point solutions is that few, if any, teams within a large enterprise actually operate with a complete view of their data, and many teams work from different versions of the truth. This lack of consistency often hurts operations and the customer experience, so breaking down data silos should be the top priority of every organization that wants to become data-driven.

The solution to this challenge is to build a centralized data lakehouse. The newest and fastest-growing sources of customer and operational data are already being stored primarily in cloud object storage, like Amazon S3. By moving application data out of silos into a centralized data lakehouse, and then providing users with access to analytics on the lakehouse, businesses can empower a range of technical and non-technical users to discover insights and collaborate using a single source of truth.

This post details how Dremio, an AWS Partner, and AWS Transfer Family give enterprises a seamless way to move data into a data lake for processing, analytics, and redistribution. With an AWS Transfer Family and Dremio data lake solution, you get a single, governed source of truth for business metrics, including data from external vendors and partners. This data powers BI dashboards and interactive analytics directly on data lake storage.

Walkthrough

AWS Transfer Family supports a variety of file transfer protocols (SFTP, FTPS, FTP, and AS2) that let you securely and efficiently move data from third-party vendors and external sources and store it in Amazon Simple Storage Service (Amazon S3). AWS Transfer Family helps businesses break down data silos by moving data from point solutions into a central repository, where it can be queried alongside data from a variety of other sources.
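As a rough illustration of the ingestion side, the sketch below builds the requests for an SFTP-only Transfer Family server that stores files in Amazon S3, plus a service-managed partner user whose home directory is a landing prefix in the data lake bucket. It assumes the boto3 `transfer` client (`create_server` and `create_user`); the bucket, prefix, IAM role ARN, user name, and SSH key are all hypothetical placeholders, and this is one possible setup rather than the configuration used in this solution.

```python
# Minimal sketch, assuming the boto3 "transfer" client. Bucket, prefix,
# role ARN, user name, and SSH key are hypothetical placeholders.
from typing import Any, Dict


def server_request() -> Dict[str, Any]:
    """Arguments for TransferClient.create_server: SFTP only, S3 storage."""
    return {
        "Domain": "S3",                             # store uploads in Amazon S3
        "Protocols": ["SFTP"],                      # partners connect over SFTP
        "IdentityProviderType": "SERVICE_MANAGED",  # users managed by the service
    }


def user_request(server_id: str, user_name: str, role_arn: str,
                 bucket: str, prefix: str, ssh_public_key: str) -> Dict[str, Any]:
    """Arguments for TransferClient.create_user.

    HomeDirectory takes the form "/<bucket>/<prefix>", so each partner's
    uploads land under its own prefix in the shared data lake bucket.
    """
    return {
        "ServerId": server_id,
        "UserName": user_name,
        "Role": role_arn,  # IAM role that grants this user access to the bucket
        "HomeDirectory": f"/{bucket}/{prefix.strip('/')}",
        "SshPublicKeyBody": ssh_public_key,
    }


def provision(client: Any, user_name: str, role_arn: str, bucket: str,
              prefix: str, ssh_public_key: str) -> str:
    """Create the server and one partner user; return the new ServerId.

    `client` is expected to be boto3.client("transfer"). Calling this
    creates billable AWS resources.
    """
    server_id = client.create_server(**server_request())["ServerId"]
    client.create_user(**user_request(server_id, user_name, role_arn,
                                      bucket, prefix, ssh_public_key))
    return server_id
```

For example, `provision(boto3.client("transfer"), "vendor-acme", role_arn, "example-data-lake", "landing/acme", public_key)` would create the endpoint and one user. Passing the client in, rather than creating it inside the function, keeps the request-building logic testable without AWS credentials.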

Dremio is an open lakehouse platform enabling you to query data directly on Amazon S3. Whereas standard proprietary data warehouses require data teams to build complex ETL pipelines in order to make the data available for analytics, Dremio gives organizations full ANSI SQL functionality directly on data lake storage, dramatically reducing the effort and time required to get insights from Amazon S3. Dremio features a semantic layer that broadens access to the data for technical and non-technical data consumers, and a query acceleration technology that enables sub-second queries on large-scale datasets.
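On the query side, one way client code can send SQL to Dremio is over Apache Arrow Flight. The sketch below assumes `pyarrow` is installed, that Dremio Cloud's Flight endpoint is `data.dremio.cloud:443`, and that a personal access token is sent as a bearer token. The table `lake.landing.partner_orders` and its columns are hypothetical stand-ins for partner files delivered through AWS Transfer Family.

```python
# Minimal sketch, assuming pyarrow, Dremio Cloud's Arrow Flight endpoint,
# and a personal access token. The table and column names are hypothetical.
from typing import List, Tuple

DREMIO_CLOUD_FLIGHT = "grpc+tls://data.dremio.cloud:443"

# Hypothetical query over partner data that arrived via AWS Transfer Family.
EXAMPLE_SQL = (
    "SELECT vendor_id, COUNT(*) AS order_count "
    "FROM lake.landing.partner_orders "
    "GROUP BY vendor_id"
)


def bearer_headers(token: str) -> List[Tuple[bytes, bytes]]:
    """gRPC metadata that authenticates each Flight call with a token."""
    return [(b"authorization", f"Bearer {token}".encode("utf-8"))]


def query_dremio(sql: str, token: str, location: str = DREMIO_CLOUD_FLIGHT):
    """Run `sql` on Dremio and return the result as a pyarrow.Table."""
    # Imported lazily so this module loads even where pyarrow is absent.
    import pyarrow as pa
    from pyarrow import flight

    client = flight.FlightClient(location)
    options = flight.FlightCallOptions(headers=bearer_headers(token))
    # Ask Dremio to plan the query; the response lists endpoints to read from.
    info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)
    # Fetch each endpoint's record batches and stitch them into one table.
    tables = [client.do_get(ep.ticket, options).read_all() for ep in info.endpoints]
    return pa.concat_tables(tables)
```

Calling `query_dremio(EXAMPLE_SQL, token)` would return the per-vendor counts as an Arrow table, with no ETL step between the files in Amazon S3 and the result. A self-managed Dremio deployment exposes its own Flight location, which you would pass as `location`.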

Together, Dremio and AWS Transfer Family provide a unified and consistent view of an organization's data, including siloed data sitting in individual business units and external data that provides additional insights, along with self-service access to analytics for a broad range of data consumers.

Solution architecture

The following architecture shows how Dremio and AWS Transfer Family work together to provide access not only to all of an organization's data, but also to key data from vendors and partners. This enables analytic use cases including mission-critical business intelligence (BI) and reporting, data science and ad hoc data exploration, and home-grown and commercial data applications that deliver analytics as a product or service to non-technical consumers.

In this architecture, data delivered through AWS Transfer Family is stored in Apache Iceberg tables. Apache Iceberg is an open table format for cloud object storage that is growing in popularity for its openness and ability to scale. Dremio gives data consumers the ability to query data in Amazon S3 directly, without any additional data movement, copying, or transformation. The semantic layer provides self-service access to Amazon S3 and other sources like relational databases for a wide range of data consumers and analytics use cases.

Figure 1: Architecture diagram

As a result of this architecture, data is available much more quickly than it is with solutions that rely on proprietary formats. The data remains open in Amazon S3, so data teams can leverage different tools and engines for specific use cases. And data consumers have self-service access to all of their data for analytics.

Conclusion

This post shows how AWS Transfer Family moves data into Amazon S3 and how Dremio queries it there, so data teams can build and continuously feed a data lake. By breaking down silos across the organization, this approach gives data consumers better access to a complete view of the data and empowers them to make better data-driven decisions.

To try Dremio Cloud, visit the get started page and sign up for the free tier in AWS. Dremio Cloud takes only a few minutes to spin up in your own AWS account and connect to an Amazon S3 data lake, and you can start querying your data and seeing results within minutes. You pay only for the AWS infrastructure and for the data you move with AWS Transfer Family. Try Dremio Cloud and AWS Transfer Family today, and start driving value and insights from your Amazon S3 data. For any questions, leave a comment for the AWS team, or visit the AWS Transfer Family product page to learn more.

Jeremiah Morrow

Jeremiah Morrow is the Partner Solution Marketing Director for Dremio. He is responsible for supporting all of Dremio's partner ecosystem, including cloud service providers, technology partners, and channel and systems integrators. Jeremiah joined Dremio in February of this year. Over the past ten years he has worked in partner and industry marketing, analyst relations, sales and business development for technology and advisory companies, including Vertica, OVH, SoftwareONE, and Gartner.

Louise Ping

Louise Ping is an AWS service-aligned Storage Specialist in the Worldwide Specialist Organization, who develops and implements Go-to-Market strategy for AWS Transfer Family and other key services. She helps internal teams and partners sell AWS services and solutions to enterprise customers. She has over 20 years of experience in product marketing and product management in startups as well as Microsoft, Apple, and Adobe.