Tag: Amazon Quicksight


Amazon QuickSight Now Supports Federated Single Sign-On Using SAML 2.0

by Jose Kunnackal | on | | Comments

Since launch, Amazon QuickSight has enabled business users to quickly and easily analyze data from a wide variety of data sources with superfast visualization capabilities enabled by SPICE (Superfast, Parallel, In-memory Calculation Engine). When setting up Amazon QuickSight access for business users, administrators have a choice of authentication mechanisms. These include Amazon QuickSight–specific credentials, AWS credentials, or in the case of Amazon QuickSight Enterprise Edition, existing Microsoft Active Directory credentials. Although each of these mechanisms provides a reliable, secure authentication process, they all require end users to input their credentials every time users log in to Amazon QuickSight. In addition, the invitation model for user onboarding currently in place today requires administrators to add users to Amazon QuickSight accounts either via email invitations or via AD-group membership, which can contribute to delays in user provisioning.

Today, we are happy to announce two new features that will make user authentication and provisioning simpler – Federated Single-Sign-On (SSO) and just-in-time (JIT) user creation.

Federated Single Sign-On

Federated SSO authentication to web applications (including the AWS Management Console) and Software-as-a-Service products has become increasingly popular, because Federated SSO lets organizations consolidate end-user authentication to external applications.

Traditionally, SSO involves the use of a centralized identity store (such as Active Directory or LDAP) to authenticate the user against applications within a corporate network. The growing popularity of SaaS and web applications created the need to authenticate users outside corporate networks. Federated SSO makes this scenario possible. It provides a mechanism for external applications to direct authentication requests to the centralized identity store and receive an authentication token back with the response and validity. SAML is the most common protocol used as a basis for Federated SSO capabilities today.

With Federated SSO in place, business users sign in to their Identity Provider portals with existing credentials and access QuickSight with a single click, without having to enter any QuickSight-specific passwords or account names. This makes it simple for users to access Amazon QuickSight for data analysis needs.

(more…)

Build a Visualization and Monitoring Dashboard for IoT Data with Amazon Kinesis Analytics and Amazon QuickSight

by Karan Desai | on | | Comments

Customers across the world are increasingly building innovative Internet of Things (IoT) workloads on AWS. With AWS, they can handle the constant stream of data coming from millions of new, internet-connected devices. This data can be a valuable source of information if it can be processed, analyzed, and visualized quickly in a scalable, cost-efficient manner. Engineers and developers can monitor performance and troubleshoot issues while sales and marketing can track usage patterns and statistics to base business decisions.

In this post, I demonstrate a sample solution to build a quick and easy monitoring and visualization dashboard for your IoT data using AWS serverless and managed services. There’s no need for purchasing any additional software or hardware. If you are already using AWS IoT, you can build this dashboard to tap into your existing device data. If you are new to AWS IoT, you can be up and running in minutes using sample data. Later, you can customize it to your needs, as your business grows to millions of devices and messages.

Architecture

The following is a high-level architecture diagram showing the serverless setup to configure.

 

AWS service overview

AWS IoT is a managed cloud platform that lets connected devices interact easily and securely with cloud applications and other devices. AWS IoT can process and route billions of messages to AWS endpoints and to other devices reliably and securely.

Amazon Kinesis Firehose is the easiest way to capture, transform, and load streaming data continuously into AWS from thousands of data sources, such as IoT devices. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration.

Amazon Kinesis Analytics allows you to process streaming data coming from IoT devices in real time with standard SQL, without having to learn new programming languages or processing frameworks, providing actionable insights promptly.

(more…)

Visualize Big Data with Amazon QuickSight, Presto, and Apache Spark on Amazon EMR

by Luis Wang | on | | Comments

Last December, we introduced the Amazon Athena connector in Amazon QuickSight, in the Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight post.

The connector allows you to visualize your big data easily in Amazon S3 using Athena’s interactive query engine in a serverless fashion. This turned out to be a very popular combination, as customers benefit from the speed, agility, and cost benefit that serverless business intelligence (BI) and analytics architecture brings.

Today, we’re excited to announce two new native connectors in QuickSight for big data analytics: Presto and Spark. With the Presto and SparkSQL connector in QuickSight, you can easily create interactive visualizations over large datasets using Amazon EMR.

EMR provides a simple and cost effective way to run highly distributed processing frameworks such as Presto and Spark when compared to on-premises deployments. EMR provides you with the flexibility to define specific compute, memory, storage, and application parameters and optimize your analytic requirements.

In this post, I walk you through connecting QuickSight to an EMR cluster running Presto. If you’d like a walkthrough with Spark, let us know in the comments section!

Presto overview

Presto is an open source, distributed SQL query engine for running interactive analytic queries against data sources ranging from gigabytes to petabytes. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can run on multiple data sources, including Amazon S3.

Presto’s execution framework is fundamentally different from that of Hive/MapReduce. Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto a great tool for ad hoc data exploration over large data sets.

Walkthrough

Use the following steps to connect QuickSight to an EMR cluster running Presto:

  1. Create an EMR cluster with the latest 5.5.0 release.
  2. Configure LDAP for user authentication in QuickSight.
  3. Configure SSL using a QuickSight supported certificate authority (CA).
  4. Create tables for Presto in the Hive metastore.
  5. Whitelist the QuickSight IP address range in your EMR master security group rules.
  6. Connect QuickSight to Presto and create some visualizations.

Prerequisites

You need run Presto version 0.167, at a minimum, which is the first release that supports LDAP authentication. LDAP authentication is a requirement for the Presto and Spark connectors and QuickSight refuses to connect if LDAP is not configured on your cluster.

Create an EMR cluster with release version 5.5.0

In the EMR console, use the Quick Create option to create a cluster.  For this post, use most of the default settings with a few exceptions. To install both Presto and Spark on your cluster (and customize other settings), create your cluster from the Advanced Options wizard instead.

Make sure that EMR release 5.5.0 is selected and under Applications, choose Presto. If you have an EC2 key pair, you can use it. Otherwise, create a key pair (.PEM file) and then return to this page to create the cluster. 

(more…)

Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail

by Jose Kunnackal | on | | Comments

We launched Amazon QuickSight to democratize BI. Our goal is to make it easier and cheaper to roll out advanced business analytics capabilities to everyone in an organization. Overall, this enables better understanding of business, and allows faster data-driven decisions in an organization. In the past, the ability to share data presented an administrative challenge – that of knowing who has access to what data. Solving this problem ensures compliance with policies, and also provides an opportunity for businesses to see how employees use data to drive crucial decisions.

Today, we are happy to announce support for AWS CloudTrail in Amazon QuickSight, which allows logging of QuickSight events across an AWS account. Whether you have an enterprise setting or a small team scenario, this integration will allow QuickSight administrators to accurately answer questions such as who last changed an analysis, or who has connected to sensitive data. With CloudTrail, administrators have better governance, auditing and risk management of their QuickSight usage

You can get started with CloudTrail with just a few clicks. Any AWS account that is enabled for CloudTrail will automatically see QuickSight activity included in the CloudTrail logs. When enabled, CloudTrail starts logging events including:

  • Account subscribe/unsubscribe
  • Data source create/update/delete
  • Data set create/update/delete
  • Analysis create/access/update/delete
  • Dashboard create/access/update/delete
  • SPICE capacity purchases
  • User subscription purchases

A full list of all supported events can be found in the QuickSight documentation.

With CloudTrail logging enabled, you can easily track QuickSight activities in your account, starting with the question of who signed up for the service.

(more…)

Amazon QuickSight Spring Announcement: KPI Charts, Export to CSV, AD Connector, and More!

by Jose Kunnackal | on | | Comments

Today I’m excited to share with you a number of exciting new features and enhancements in Amazon QuickSight. You can now create key performance indicator (KPI) charts, define custom ranges when importing Microsoft Excel spreadsheets, export data to comma separated value (CSV) format, and create aggregate filters for SPICE data sets. In the Enterprise Edition, we added an additional option to connect to your on-premises Active Directory using AD Connector.

KPI charts

With KPI charts, you can present a single aggregated value from a measure, and also comparisons against another measure or over a time period. These values are ideal for building dashboards and surfacing key business metrics. For example, the following KPI chart shows the sales of the current month and the difference compared to the previous month.

To get started with this new chart type, simply select a measure and QuickSight’s AutoGraph capability automatically generates a KPI chart based on the measure. You can then choose to add another measure for a target value, or a dimension for the trend group to add comparisons. Alternatively, you can select the KPI chart type icon on the Visual Types panel to create the chart manually. 

Custom ranges for Microsoft Excel spreadsheets

In this release, we also added some enhancements to our Microsoft Excel support. When you upload an Excel spreadsheet, QuickSight now automatically detects the cell range of your table in the sheet. This approach makes it even easier to analyze your Excel data. Also, if you have a specific area in the sheet that you want to focus on, you can now define a custom range in QuickSight. 

(more…)

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

by Ian Robinson and Ben Snively | on | | Comments

Many business and operational processes require you to analyze large volumes of frequently updated data. Log analysis, for example, involves querying and visualizing large volumes of log data to identify behavioral patterns, understand application processing flows, and investigate and diagnose issues.

VPC flow logs capture information about the IP traffic going to and from network interfaces in VPCs in the Amazon VPC service. The logs allow you to investigate network traffic patterns and identify threats and risks across your VPC estate. Flow log data is stored using Amazon CloudWatch Logs. After you’ve created a flow log, you can view and retrieve its data in Amazon CloudWatch Logs.

Flow logs can help you with a number of tasks. For example, you can use them to troubleshoot why specific traffic is not reaching an instance, which in turn can help you diagnose overly restrictive security group rules. You can also use flow logs as a security tool to monitor the traffic that is reaching your instance.

This blog post shows how to build a serverless architecture by using Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Athena, and Amazon QuickSight to collect, store, query, and visualize flow logs. In building this solution, you will also learn how to implement Athena best practices with regard to compressing and partitioning data so as to reduce query latencies and drive down query costs.

Summary of the solution

The solution described here is divided into three parts:

  • Send VPC Flow Logs to S3 for Analysis with Athena. This section describes how to use Lambda and Firehose to publish flow log data to S3, and how to create a table in Athena so that you can query this data.
  • Visualize Your Logs in QuickSight. Here you’ll learn how to use QuickSight and its Athena connector to build flow log analysis dashboards that you can share with other users in your organization.
  • Partition Your Data in Athena for Improved Query Performance and Reduced Costs. This section shows how you can use a Lambda function to automatically partition Athena data as it arrives in S3. This function will work with any Firehose stream and any other delivery mechanism that writes data to S3 using a year/month/day/hour prefix.

Partitioning your data is one of three strategies for improving Athena query performance and reducing costs. The other two are compressing your data, and converting it into columnar formats such as Apache Parquet. The solution described here automatically compresses your data, but it doesn’t convert it into a columnar format. Even if you don’t convert your data to a columnar format, as is the case here, it’s always worth compressing and partitioning it. For any large-scale solution, you should also consider converting it to Parquet.

Serverless Architecture for Analyzing VPC Flow Logs

Below is a diagram showing how the various services work together.

VPC_Flowlogs_Ian_Ben

When you create a flow log for a VPC, the log data is published to a log group in CloudWatch Logs. By using a CloudWatch Logs subscription, you can send a real-time feed of these log events to a Lambda function that uses Firehose to write the log data to S3.

(more…)

Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS

by Ryan Jancaitis, Oliver Atoa, and Bob Strahan | on | | Comments

You have come up with an exciting hypothesis, and now you are keen to find and analyze as much data as possible to prove (or refute) it. There are many datasets that might be applicable, but they have been created at different times by different people and don’t conform to any common standard. They use different names for variables that mean the same thing and the same names for variables that mean different things. They use different units of measurement and different categories. Some have more variables than others. And they all have data quality issues (for example, badly formed dates and times, invalid geographic coordinates, and so on).

You first need a way to harmonize these datasets, to identify the variables that mean the same thing and make sure that these variables have the same names and units. You also need to clean up or remove records with invalid data.

After the datasets are harmonized, you need to search through the data to find the datasets you’re interested in. Not all of them have records that are relevant to your hypothesis, so you want to filter on a number of important variables to narrow down the datasets and verify they contain enough matching records to be significant.

Having identified the datasets of interest, you are ready to run your custom analyses on the data they contain so that you can prove your hypothesis and create beautiful visualizations to share with the world!

In this blog post, we will describe a sample application that illustrates how to solve these problems. You can install our sample app, which will:

  • Harmonize and index three disparate datasets to make them searchable.
  • Present a data-driven, customizable UI for searching the datasets to do preliminary analysis and to locate relevant datasets.
  • Integrate with Amazon Athena and Amazon QuickSight for custom analysis and visualization.

Example data

The Police Data Initiative seeks to improve community and law enforcement relations through the public availability of data related to police activity. Datasets from participating cities, available through the Public Safety Open Data Portal, have many of the problems just outlined. Despite the commonality of crime and location metadata, there is no standard naming or value scheme. Datasets are stored in various locations and in various formats. There is no central search and discovery engine. To gain insights and value from this data, you have to analyze datasets city by city.

Although the focus of this post is police incident data, the same approach can be used for datasets in other domains, such as IoT, personalized medicine, news, weather, finance, and much more.

Architecture

Our architecture uses the following AWS services:

The diagram below illustrates the solution architecture:

HarmonizeSearch_1 (more…)

Scheduled Refresh for SPICE Data Sets on Amazon QuickSight

by Jose Kunnackal | on | | Comments

Jose Kunnackal is a Senior Product Manager for Amazon Quicksight

This blog post has been translated into Japanese.

In November 2016, we launched Amazon QuickSight, a cloud-powered, business analytics service that lets you quickly and easily visualize your data. QuickSight uses SPICE (Super-fast, Parallel, In-Memory Calculation Engine), a fully managed data store that enables blazing fast visualizations and can ingest data from AWS, on-premises, and cloud sources. Data in SPICE could be refreshed at any time with the click of a button within QuickSight.

Today, we are launching the ability to schedule these refreshes!

To schedule a SPICE data set refresh

Select a SPICE data set and choose Schedule refresh. Then, specify the time zone, frequency, and start time of the refresh.

scheduledrefresh_2

(more…)

Converging Data Silos to Amazon Redshift Using AWS DMS

by Pratim Das | on | | Comments

Organizations often grow organically—and so does their data in individual silos. Such systems are often powered by traditional RDBMS systems and they grow orthogonally in size and features. To gain intelligence across heterogeneous data sources, you have to join the data sets. However, this imposes new challenges, as joining data over dblinks or into a single view is extremely cumbersome and an operational nightmare.

This post walks through using AWS Database Migration Service (AWS DMS) and other AWS services to make it easy to converge multiple heterogonous data sources to Amazon Redshift. You can then use Amazon QuickSight, to visualize the converged dataset to gain additional business insights.

AWS service overview

Here’s a brief overview of AWS services that help with data convergence.

AWS DMS

With DMS, you can migrate your data to and from most widely used commercial and open-source databases. The service supports homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to MySQL. It also allows you to stream data to Amazon Redshift from any of the supported sources including:

  • Amazon Aurora
  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • SAP ASE
  • SQL Server

DMS enables consolidation and easy analysis of data in the petabyte-scale data warehouse. It can also be used for continuous data replication with high availability.

Amazon QuickSight

Amazon QuickSight provides very fast, easy-to-use, cloud-powered business intelligence at 1/10th the cost of traditional BI solutions. QuickSight uses a new, super-fast, parallel, in-memory calculation engine (“SPICE”) to perform advanced calculations and render visualizations rapidly.

QuickSight integrates automatically with AWS data services, enables organizations to scale to hundreds of thousands of users, and delivers fast and responsive query performance to them. You can easily connect QuickSight to AWS data services, including Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon S3, and Amazon Athena. You can also upload CSV, TSV, and spreadsheet files or connect to third-party data sources such as Salesforce.

Amazon Redshift

Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift is typically priced at 1/10th of the price of the competition. We have many customers running petabyte scale data analytics on AWS using Amazon Redshift.

Amazon Redshift is also ANSI SQL compliant, supports JDBC/ODBC, and is easy to connect to your existing business intelligence (BI) solution. However, if your storage requirement is in the 10s of TB range and requires high levels of concurrency across small queries, you may want to consider Amazon Aurora as the target converged database.

Walkthrough

Assume that you have an events company specializing on sports, and have built a MySQL database that holds data for the players and the sporting events. Customers and ticket information is stored in another database; in this case, assume it is PostgresSQL and this gets updated when customer purchases tickets from our website and mobile apps. You can download a sample dataset from the aws-database-migration-samples GitHub repo.

These databases could be anywhere: at an on-premises facility; on AWS in Amazon EC2 or Amazon RDS, or other cloud provider; or in a mixture of such locations. To complicate things a little more, you can assume that the lost opportunities (where a customer didn’t complete buying the ticket even though it was added to the shopping cart) are streamed via clickstream through Amazon Kinesis and then stored on Amazon S3. We then use AWS Data Pipeline to orchestrate a process to cleanse that data using Amazon EMR and make it ready for loading to Amazon Redshift. The clickstream integration is not covered in this post but was demonstrated in the recent Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics post.

Architecture

In this solution, you use DMS to bring the two data sources into Amazon Redshift and run analytics to gain business insights. The following diagram demonstrates the proposed solution.

DataSilos_1_1

(more…)

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

by Ben Snively | on | | Comments

Ben Snively is a Solutions Architect with AWS

Speed and agility are essential with today’s analytics tools. The quicker you can get from idea to first results, the more you can experiment and innovate with your data, perform ad-hoc analysis, and drive answers to new business questions.

Serverless architectures help in this respect by taking care of the non-differentiated heavy lifting in your environment―tasks such as managing servers, clusters, and device endpoints – allowing you to focus on assembling your IoT system, analyzing data, and building meaningful reports quickly and efficiently.

In this post, I show how you can build a business intelligence capability for streaming IoT device data using AWS serverless and managed services. You can be up and running in minutes―starting small, but able to easily grow to millions of devices and billions of messages.

AWS serverless services

AWS services offer a quicker time to intelligence. The following is a high-level diagram showing how the services in this post are configured:

o_IoT_Minutes_1

(more…)