AWS Big Data Blog
Tag: AWS Glue
Create cross-account and cross-region AWS Glue connections
In this blog post, we describe how to configure the networking routes and interfaces to give AWS Glue access to a data store in an AWS Region different from the one with your AWS Glue resources. In our example, we connect AWS Glue, located in Region A, to an Amazon Redshift data warehouse located in Region B.
Easily manage table metadata for Presto running on Amazon EMR using the AWS Glue Data Catalog
In this post, we will explore how the AWS Glue Data Catalog addresses discoverability and manageability for table metadata for Presto on Amazon EMR.
AWS Glue Now Supports Scala Scripts
We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal.
Simplify Querying Nested JSON with the AWS Glue Relationalize Transform
AWS Glue has a transform called Relationalize that simplifies the extract, transform, load (ETL) process by converting nested JSON into columns that you can easily import into relational databases. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON separated by periods. Let’s look at how Relationalize can help you with a sample use case.
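To make the dotted-key idea concrete, here is a minimal sketch in plain Python. It does not call AWS Glue or the Relationalize API; the `flatten` helper and the sample record are hypothetical and only illustrate how nested keys can be joined with periods at the outermost level.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with periods.

    Hypothetical helper for illustration only; this is not AWS Glue code.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Build the dotted path from the parent keys down to this key.
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Made-up sample record with two levels of nesting.
nested = {
    "player": {"name": "Ana", "team": {"city": "Lima", "rank": 3}},
    "score": 42,
}

flat = flatten(nested)
print(flat)
```

Each key in the flattened result, such as `player.team.city`, keeps the original path through the nested document, so it can be mapped to a column name in a relational table. The sketch handles nested objects only and leaves lists untouched.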
Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production
This is a guest post by Rafi Ton, founder and CEO of NUVIAD. The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time. The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.
Visualize AWS CloudTrail Logs Using AWS Glue and Amazon QuickSight
In this post, I walk through using AWS Glue and AWS Lambda to convert AWS CloudTrail logs from JSON into a dataset stored in a query-optimized format in Amazon S3. I then use Amazon Athena and Amazon QuickSight to query and visualize the data.
Build a Data Lake Foundation with AWS Glue and Amazon S3
A data lake is an increasingly popular way to store and analyze data that addresses the challenges of dealing with massive volumes of heterogeneous data. A data lake allows organizations to store all their data—structured and unstructured—in one centralized repository. Because data can be stored as-is, there is no need to convert it to a predefined schema. This post walks you through the process of using AWS Glue to crawl your data on Amazon S3 and build a metadata store that can be used with other AWS offerings.
Unite Real-Time and Batch Analytics Using the Big Data Lambda Architecture, Without Servers!
In this post, I show you how you can use AWS services like AWS Glue to build a Lambda Architecture completely without servers. I use a practical demonstration to examine the tight integration between serverless services on AWS and create a robust data processing Lambda Architecture system.
Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight
Have you ever been faced with many different data sources in different formats that need to be analyzed together to drive value and insights? You need to be able to query, analyze, process, and visualize all your data as one canonical dataset, regardless of the data source or original format. In this post, I walk […]
Upsert into Amazon Redshift using AWS Glue and SneaQL
This is a guest post by Jeremy Winters and Ritu Mishra, Solution Architects at Full 360. In their own words, “Full 360 is a cloud first, cloud native integrator, and true believers in the cloud since inception in 2007, our focus has been on helping customers with their journey into the cloud. Our practice areas […]