AWS Big Data Blog
Using QuickSight parameters and controls to drive interactivity in your dashboards
Amazon QuickSight added support for parameters, on-screen controls, and URL actions earlier this year. In this blog post, we walk through several examples to show how you can use these capabilities within interactive dashboards for your audience.
How SimilarWeb analyzes hundreds of terabytes of data every month with Amazon Athena and Upsolver
This is a guest post by Yossi Wasserman, a data collection and innovation team leader at SimilarWeb. In their own words: SimilarWeb is the pioneer of market intelligence and the standard for understanding the digital world. SimilarWeb provides granular insights about any website or mobile app across all industries in every region. SimilarWeb […]
How Pagely implemented a serverless data lake in AWS to facilitate customer support analytics
In this post, we discuss how Pagely worked with Beyondsoft, an AWS Advanced Consulting Partner, to use ConvergDB, an open-source tool developed by Beyondsoft, to build a DevOps-centric data pipeline. This pipeline uses AWS Glue to transform application logs into optimized tables that can be queried quickly and cost-effectively using Amazon Athena.
Amazon QuickSight now supports Email Reports and Data Labels
Today, we are excited to announce the availability of email reports and data labels in Amazon QuickSight. Email reports: With email reports in Amazon QuickSight, you can receive scheduled and one-off reports delivered directly to your email inbox. Using email reports, you have access to the latest information without logging in to your Amazon QuickSight […]
How to access and analyze on-premises data stores using AWS Glue
This post demonstrates how to set up AWS Glue in a hybrid environment. While using AWS Glue as a managed ETL service in the cloud, you can use existing connectivity between your VPC and data centers to reach an existing database service without significant migration effort, so you benefit immediately instead of first migrating the data.
Migrate RDBMS or On-Premises data to EMR Hive, S3, and Amazon Redshift using Sqoop on EMR
This blog post shows how you can benefit from the Apache Sqoop tool. The post shows how to use Sqoop to import data from a relational database management system (RDBMS) into the Hadoop Distributed File System (HDFS) on Amazon EMR, transform the data in Hadoop, and then export it to a data warehouse such as Hive or Amazon Redshift.
Viewing Amazon OpenSearch Service Error Logs
Today, Amazon OpenSearch Service announces support for publishing error logs to Amazon CloudWatch Logs. This new feature lets you capture error logs so you can access information about errors and warnings raised during the operation of the service. These details can be useful for troubleshooting. You can then use this information […]
Get sub-second query response times with Amazon Redshift result caching
In this post, we take a look at query result caching in Amazon Redshift. Result caching does exactly what its name implies: it caches the results of a query.
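To make the idea concrete, here is a toy Python model of a query result cache. It is a sketch under our own assumptions, not Redshift's implementation: results are keyed on whitespace-normalized query text and reused only while none of the tables the query reads have changed since the result was stored. The class and method names (`ToyResultCache`, `run`, `write`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class ToyResultCache:
    """Toy model of query result caching; NOT Amazon Redshift's implementation."""

    # table name -> version counter, bumped on every (simulated) data change
    table_versions: Dict[str, int] = field(default_factory=dict)
    # normalized SQL -> (snapshot of table versions when cached, cached rows)
    entries: Dict[str, Tuple[Dict[str, int], List[Any]]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    @staticmethod
    def normalize(sql: str) -> str:
        # Assumption: collapse whitespace so reformatted copies of a query share an entry.
        return " ".join(sql.split())

    def write(self, table: str) -> None:
        # Simulate a data change; any cached result that read this table becomes stale.
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, sql: str, tables: Iterable[str],
            execute: Callable[[], List[Any]]) -> List[Any]:
        key = self.normalize(sql)
        snapshot = {t: self.table_versions.get(t, 0) for t in tables}
        cached = self.entries.get(key)
        if cached is not None and cached[0] == snapshot:
            self.hits += 1  # same query, unchanged data: serve from cache
            return cached[1]
        self.misses += 1  # first run or stale entry: execute and store the result
        rows = execute()
        self.entries[key] = (snapshot, rows)
        return rows


if __name__ == "__main__":
    def query() -> List[Any]:
        # Stand-in for actually executing the SQL.
        return [("east", 100), ("west", 80)]

    cache = ToyResultCache()
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    cache.run(sql, ["sales"], query)  # miss: executes
    cache.run(sql, ["sales"], query)  # hit: served from cache
    cache.write("sales")              # underlying data changes
    cache.run(sql, ["sales"], query)  # miss: cached entry is stale
    print(cache.hits, cache.misses)   # prints "1 2"
```

In Amazon Redshift itself, you can turn result caching off for the current session with `SET enable_result_cache_for_session TO off;`, which is useful when you want to measure uncached query performance.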
Build a Concurrent Data Orchestration Pipeline Using Amazon EMR and Apache Livy
In this post, we explore orchestrating a Spark data pipeline on Amazon EMR using Apache Livy and Apache Airflow. We create a simple Airflow DAG to demonstrate how to run Spark jobs concurrently, and we see how Livy hides the complexity of submitting Spark jobs through its REST interface while making optimal use of EMR resources.
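Livy accepts Spark batch jobs through its REST API (`POST /batches`). The sketch below only builds such a request with Python's standard library and does not send it. The helper name `build_livy_batch_request`, the host `emr-master`, and the S3 path are hypothetical; 8998 is Livy's default port.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_livy_batch_request(livy_url: str, app_file: str,
                             class_name: Optional[str] = None,
                             args: Optional[List[str]] = None,
                             conf: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a Livy POST /batches request.

    `file`, `className`, `args`, and `conf` are fields of Livy's batch
    submission body; the values passed in are placeholders.
    """
    body: Dict[str, object] = {"file": app_file}
    if class_name:
        body["className"] = class_name  # main class, needed for JAR applications
    if args:
        body["args"] = list(args)       # command-line arguments for the application
    if conf:
        body["conf"] = dict(conf)       # Spark configuration properties
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical EMR master node running Livy on its default port.
    req = build_livy_batch_request(
        "http://emr-master:8998",
        "s3://example-bucket/jobs/etl.py",
        args=["2018-01-01"],
        conf={"spark.executor.memory": "2g"},
    )
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen` would return a JSON description of the new batch, including its `id`. A scheduler such as Airflow can then poll `GET /batches/{id}/state` for each job, so several Spark jobs can be submitted and tracked concurrently.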
Exploratory data analysis of genomic datasets using ADAM and Mango with Apache Spark on Amazon EMR
In this post, we describe how to set up and run ADAM and Mango on Amazon EMR. We demonstrate how you can use these tools in an interactive notebook environment to explore the 1000 Genomes dataset, which is available in Amazon S3 as a public dataset.