Karmasphere Analyst, used with Amazon EMR, provides an intuitive, high-productivity solution for working with large structured and unstructured data sets using Apache Hadoop. Karmasphere Analyst runs on Windows, Mac and Linux desktop systems. It provides a comprehensive workspace for data professionals and data analysts who explore and interact with Big Data stored on Amazon S3 using Elastic MapReduce. With Karmasphere Analyst you have immediate access to unstructured, semi-structured and structured data in Hadoop. Through familiar SQL and wizards you can make ad-hoc queries, interact with the results, and iterate.
How to get Karmasphere Analyst
Karmasphere Analyst is available in the same pay-as-you-go, hourly pricing model as Amazon Elastic MapReduce, providing a low cost of entry and a single payment process through Amazon. For pricing details and to download the Karmasphere software, please visit the Elastic MapReduce with Karmasphere Analytics detail page.
The Big Data Analytics Workflow
The workflow model comprises four stages: Access, Assemble, Analyze and Act.
How to Configure Access to Elastic MapReduce and Amazon S3
Configuring Your Amazon Credentials
To configure your AWS account credentials for the first time, open the ‘File’ main menu and select ‘Manage Cloud Credentials’.
Click on the ‘Add’ button.
Enter the name you want to use, then enter your Access Key ID and Secret Access Key. Click on the ‘Test’ button to test the credentials you entered. Next, click on the SSH Keys button, select the Amazon EMR region you use, and enter the SSH key information for that region. Click on the OK button.
Using an Existing JobFlow
If you have already configured a JobFlow and want to start the connection, select that connection and click on the Start Connection icon. Once the connection is up, the red cloud graphic turns green.
Example connections are shown in the connection pane.
Launching a New JobFlow
On the Karmasphere Analyst home view, click on the ‘Access’ icon. The Access view is shown.
Click on the New Cloud connection icon. The following window is displayed. Enter the additional information required.
The connection is created and starts the JobFlow on the cluster. The connection shows up in the connection pane.
Starting a Connection to a Job Flow
(The connection pane)
Select the connection in the connection pane and click on the Start Connection icon.
How to Assemble Data and Manage Tables
Assembling and managing tables is part of the Assemble stage, the second step of the Karmasphere Analyst workflow process. This stage allows you to collect data of any format, organize it for easy understanding and prepare it for analysis, the third stage of the workflow process. The result of the Assemble stage is one or more tables.
Analyst understands many common file and compression formats, such as ZIP and GZIP, and quickly prepares your data for analysis. The Analyst installation includes a sample excite data log in GZIP format for testing and analysis.
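As a rough illustration of what preparing a GZIP-compressed log involves, the following Python sketch decompresses a gzipped, tab-separated file and splits it into rows. It does not describe how Analyst works internally. The three-column layout and the sample values are assumptions made for this example only, not a description of the contents of excite.log.gz.

```python
import csv
import gzip
import io


def read_gzip_rows(raw_bytes, delimiter="\t"):
    """Decompress GZIP bytes and split each line into a list of fields."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt",
                   encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters in log lines as ordinary text.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        return list(reader)


# Hypothetical sample data (assumed layout: user id, timestamp, search text).
sample_text = (
    "user1\t970916105432\tlake tahoe hotels\n"
    "user2\t970916105501\tweather\n"
)
sample_gz = gzip.compress(sample_text.encode("utf-8"))

rows = read_gzip_rows(sample_gz)
for row in rows:
    print(row)
```

To read a file on disk instead, the bytes could be loaded with open(path, "rb").read() and passed to the same function.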
Note: Prior to starting the Assemble stage, please be sure to use the Amazon EMR connection you’ve created above in order to create a table on EMR.
To get to the Assemble view, click on the ‘Assemble’ icon on the home view. The Assemble view is shown.
Click on the icon to create and load a new table on EMR. You are asked for the name of the table and the location of the data file used to load it. The sample data file is excite.log.gz, located in the Sample folder of your Karmasphere Analyst installation directory.
Enter a name for this new table. If the table name entered already exists, the table name field has a red border. For source data, click on the Browse button and select the excite.log.gz file. Click on the Next button shown above. Click the Next button again to continue to the next screen.
Continue to accept the default values for the next two steps. Click on the Finish button to create and load this table.
The following dialog window is displayed, indicating that this operation was successful. Click the OK button.
Note: At the bottom of the Assemble view, a progress bar shows the data being copied to your S3 bucket. This is an example:
How to Analyze Your Data
The Analyze stage is the third step of the Analyst workflow process and offers powerful features and functionality for the user.
Once you start to see patterns and trends, you can iterate on the results by formatting, filtering and sorting them. Karmasphere Analyst lets you enter queries in HQL, the SQL-like query language used by Apache Hive. Syntax highlighting, auto-complete and visual query plans are also provided to help you arrive at a successful query.
Once the results have been generated, you can act on them: save and re-use scripts, export data to files and databases, and integrate the results into tools such as Microsoft Excel and Tableau.
To get to the Analyze view, click on the Analyze icon in the home view. The Analyze view is shown.
Note: Be sure to use the correct EMR connection when accessing the Analyze stage, so that you are accessing and working with the desired data.
Executing Queries – An Example
Enter the following command in the Query window. Any syntax errors are displayed below the Query window. Click on the ‘Run’ button on the right-hand side to execute the query.
SELECT col2, count(1) query_count FROM newtable WHERE col2 LIKE '%lake %' GROUP BY col2 ORDER BY query_count DESC
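To see what this query computes without a running cluster, the sketch below runs the same statement against an in-memory SQLite database from Python. SQLite stands in for Hive purely for illustration, and HQL behaviour may differ in details such as the case sensitivity of LIKE. The col0 and col1 columns and every row inserted into newtable are made-up assumptions, not data from the sample file.

```python
import sqlite3

# The query from the text, reflowed over several lines for readability.
QUERY = """
SELECT col2, count(1) query_count
FROM newtable
WHERE col2 LIKE '%lake %'
GROUP BY col2
ORDER BY query_count DESC
"""


def run_example():
    """Build a tiny hypothetical table and run the example query on it."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: only col2 (the search text) appears in the source.
        conn.execute("CREATE TABLE newtable (col0 TEXT, col1 TEXT, col2 TEXT)")
        conn.executemany(
            "INSERT INTO newtable VALUES (?, ?, ?)",
            [
                ("u1", "t1", "lake tahoe"),
                ("u2", "t2", "lake tahoe"),
                ("u3", "t3", "salt lake city"),
                ("u4", "t4", "weather"),
                ("u5", "t5", "lakeside"),  # no space after "lake": not matched
            ],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


results = run_example()
for search_text, query_count in results:
    print(search_text, query_count)
```

The pattern '%lake %' only matches values where "lake" is followed by a space, so the query counts searches such as "lake tahoe" but skips "lakeside". Grouping by col2 and ordering by the count puts the most frequent matching search first.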
The left-hand pane shows the table schema, as shown.
Filtering these results allows for further iteration. Click on the Filter results icon. A dialog window is displayed. Select the options as shown. Click the OK button.
This query shows us the records matching this filter.
Act on the results by saving them as an XLS file. Click on the Save to XLS file icon. Enter the name and location of the XLS file to save. Click on the OK button. Use the XLS file viewer to view the results.
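Outside Analyst, result rows like these can also be written to a file that spreadsheet tools open directly. The sketch below serializes hypothetical result rows as CSV using only Python's standard library. CSV is used here in place of XLS solely because it needs no third-party package. The column names and rows are assumptions carried over from the example query, not actual output.

```python
import csv
import io


def results_to_csv(header, rows):
    """Serialize a header and result rows to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical query results: (search text, number of occurrences).
example_rows = [("lake tahoe", 2), ("salt lake city", 1)]
csv_text = results_to_csv(["col2", "query_count"], example_rows)
print(csv_text)
```

To save the text to disk, it could be written with open(path, "w", newline="") and the resulting file opened in a spreadsheet application.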
How to Chart Results
Karmasphere Analyst allows you to chart your results in a variety of chart types. Click on the Chart Results icon. Select the chart type (Pie, Bar, Line, Column, Scatter) and other options, such as a chart title. A sample pie chart of the filtered query results above is shown.
How to Act on Results, Charts and SQL Queries
With Karmasphere Analyst, you act on the results by saving them in a variety of formats: as a database table, as a Hive table, or as a file for viewing in an XLS file viewer. You can also save charts you’ve created for later use in reports or as reference.
To launch the XLS file viewer, click on the Launch XLS viewer icon. Microsoft Excel starts and displays the file, as shown.