AWS Big Data Blog

Detect and resolve HBase inconsistencies faster with AI on Amazon EMR

HBase operations teams spend hours manually correlating logs, metadata, and consistency reports to identify root causes. Traditional approaches require deep expertise and extensive investigation across scattered data sources, directly impacting MTTR and operational efficiency. As HBase deployments scale and expertise becomes increasingly scarce, organizations face mounting pressure to maintain service reliability while managing growing operational complexity. The manual nature of troubleshooting creates bottlenecks that delay incident resolution, increase operational costs, and risk service degradation during critical business periods.

In this post, we show you how to build an AI-powered troubleshooting solution using Amazon OpenSearch Service vector search and intelligent analysis. This solution reduces HBase inconsistency resolution from hours to minutes and root cause identification from days to hours through natural language queries over operational data. This democratizes HBase troubleshooting capabilities across teams and reduces dependency on specialized expertise.

Solution overview

The solution addresses HBase troubleshooting challenges through data processing, vector search, and AI-powered analysis. It processes operational data from Amazon EMR clusters, generates semantic vector embeddings, and enables natural language queries for intelligent troubleshooting.
Key components include:

  • Amazon EMR HBase: Runs HBase workloads with Amazon S3 as the HBase rootdir for durable, scalable storage
  • Data Processing: Extracts and processes HBase logs, HBCK reports, and metadata with vector embeddings
  • Amazon OpenSearch Service: Provides vector search capabilities with k-NN algorithms for semantic analysis
  • AI Analysis Interface: Enables natural language queries with context-aware recommendations
  • Custom Knowledge Base: Supports organization-specific runbooks and troubleshooting procedures by ingesting Git repositories via Kiro CLI’s /knowledge add command, enabling the AI assistant to reference custom operational guides alongside HBase source code and operational tools
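To make the vector search component concrete, the following is a minimal sketch of the kind of k-NN query body that could be sent to Amazon OpenSearch Service. The index field names (embedding, job_flow_id) and the 384-dimension vector are illustrative assumptions, not the solution's actual schema:

```python
import json

def build_knn_query(embedding, k=5, job_flow_id=None):
    """Build an OpenSearch k-NN search body that retrieves the k log
    entries whose stored vectors are closest to the query embedding.
    Field names ("embedding", "job_flow_id") are illustrative."""
    knn_clause = {"knn": {"embedding": {"vector": embedding, "k": k}}}
    if job_flow_id is None:
        query = knn_clause
    else:
        # Narrow the semantic search to a single EMR cluster's logs.
        query = {
            "bool": {
                "filter": [{"term": {"job_flow_id": job_flow_id}}],
                "must": [knn_clause],
            }
        }
    return {"size": k, "query": query}

body = build_knn_query([0.1] * 384, k=3, job_flow_id="j-EXAMPLE")
print(json.dumps(body, indent=2))
```

A filter clause like the one above keeps semantic matches scoped to one cluster's logs while the knn clause ranks them by vector similarity.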

AWS cloud architecture diagram showing an HBase log analysis system with EMR cluster, VPC networking, IAM roles, Lambda functions, OpenSearch domain, and supporting services for scalable log processing and analytics.

The preceding diagram illustrates how the HBase log analysis system troubleshoots inconsistencies through automated workflows across AWS services.

When an operations team needs to investigate HBase issues, the engineer connects over SSH to the Amazon EMR primary node and runs the error collection script, which gathers logs from HBase master and RegionServer nodes and uploads them to Amazon S3.

Next, the engineer connects to the Analytics Amazon Elastic Compute Cloud (Amazon EC2) instance and executes the automated processing script, which downloads logs from Amazon S3, generates semantic vector embeddings, and ingests them into Amazon OpenSearch Service for k-NN-based semantic search. The engineer then queries the Kiro CLI AI Assistant using natural language to investigate. Kiro searches Amazon OpenSearch Service for relevant log entries and uses Amazon Bedrock to analyze patterns, correlate errors across components, and provide actionable recommendations. This reduces troubleshooting time from hours to minutes.

The system operates within an Amazon Virtual Private Cloud (Amazon VPC) with private subnets for Amazon EMR and Analytics Amazon EC2, AWS Identity and Access Management (AWS IAM) roles for access control, Parameter Store for configuration, and Amazon CloudWatch for monitoring.

Prerequisites

For this walkthrough, you need the following prerequisites:

AWS account setup

  • An AWS account with administrative access for initial deployment
  • AWS Command Line Interface (AWS CLI) configured with administrative credentials

Required AWS IAM permissions

For infrastructure deployment

Your deployment user or role needs the following permissions:

  • AWS CloudFormation stack creation and management
  • Sufficient access to create and manage the following resources:
    • Amazon OpenSearch Service domains
    • Amazon EC2 instances, Amazon VPCs, security groups, and networking components
    • AWS IAM roles and policies
    • AWS Systems Manager Parameter Store entries
    • Amazon CloudWatch Logs groups
    • Amazon S3 buckets for access logs and session logs

Runtime service roles

The AWS CloudFormation stack automatically creates two specialized AWS IAM roles designed with least-privilege access principles.

The first role is the Amazon OpenSearch Service Role, which manages Amazon VPC networking and Amazon CloudWatch logging for the Amazon OpenSearch Service domain.

The second role is the Application Role, which provides minimal Amazon OpenSearch Service and Amazon S3 access specifically for log processing applications and secure log ingestion operations.

Network requirements

  • Amazon VPC with private subnets for secure Amazon OpenSearch Service deployment
  • NAT Gateway for outbound internet access from private subnets
  • Security groups configured for HTTPS-only communication

Running Kiro CLI on Amazon EC2

Kiro platform requirements:

Kiro subscription

  • Active Kiro License: Valid subscription to Kiro platform
  • User Account: Registered Kiro user account with appropriate permissions
  • API Access: Kiro API keys or authentication tokens for CLI access

AWS Identity Center integration

  • AWS IAM Identity Center Setup: AWS IAM Identity Center enabled in your AWS organization
  • Permission Sets: Configured permission sets for Kiro users with appropriate AWS access
  • User Assignment: Users assigned to relevant AWS accounts and permission sets
  • SAML/OIDC Configuration: Identity provider integration if using external identity systems

Additional prerequisites

  • Python 3.7+ and Node.js installed locally
  • Python 3.11+ for AWS Lambda runtime environment (required for OpenSearch MCP server compatibility)
  • Sufficient service quotas for Amazon OpenSearch Service instances and Amazon EC2 resources
  • Access to the analysis instance via AWS Systems Manager Session Manager (recommended)
  • Amazon EMR clusters running HBase workloads
  • The Amazon EMR EC2 instance profile (for example, EMR_EC2_Default_Role) must be able to run describe-stacks on AWS CloudFormation stacks in us-east-1
  • Basic familiarity with HBase operations

The deployment follows AWS security best practices with resource-specific permissions, regional restrictions, and encrypted data storage. All AWS IAM policies implement least-privilege access patterns to help secure operation of the log analysis pipeline.

Walkthrough

This walkthrough demonstrates deploying and configuring the AI-powered HBase troubleshooting solution in five key steps:

  1. Deploy AWS infrastructure using AWS CloudFormation
  2. Configure Amazon EMR analysis log collection
  3. Process and index HBase data
  4. Enable AI-powered analysis
  5. Add custom knowledge base (optional)

The complete solution is available in our GitHub repository.

Step 1: Deploy the infrastructure

Deploy the required AWS infrastructure including Amazon OpenSearch Service domain, Amazon EC2 instances, and AWS IAM roles.

To deploy the infrastructure

  1. Deploy the AWS CloudFormation stack. Replace your-email@example.com with an email address for security alerts and Advanced Intrusion Detection Environment (AIDE) reports:
# Deploy to development environment
aws cloudformation create-stack \
  --stack-name dev-hbase-log-analysis \
  --template-body file://cloudformation/hbase-log-analysis-simple.yaml \
  --parameters \
    ParameterKey=EnvironmentName,ParameterValue=dev \
    ParameterKey=EC2InstanceType,ParameterValue=m7g.xlarge \
    ParameterKey=SecurityAlertEmail,ParameterValue=your-email@example.com \
  --capabilities CAPABILITY_IAM \
  --region us-east-1
# Wait for deployment to complete (~15-20 minutes)
aws cloudformation wait stack-create-complete \
  --stack-name dev-hbase-log-analysis \
  --region us-east-1
  2. Note the deployment outputs, including the Amazon OpenSearch Service endpoint and Amazon EC2 instance details, in the AWS CloudFormation console.

AWS CloudFormation stack outputs table displaying infrastructure resource identifiers including IAM roles, EC2 instances, security groups, S3 buckets, OpenSearch domain configuration, and VPC details for an HBase log analysis application in the development environment.

The deployment creates:

  • Amazon OpenSearch Service domain with vector search capabilities
  • Amazon EC2 instance for data processing and AI analysis
  • AWS IAM roles with appropriate permissions
  • Security groups and Amazon VPC configuration
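If you prefer scripting over the console, stack outputs can be read from the describe-stacks JSON that the AWS CLI returns. This is a minimal sketch with an illustrative payload; the EC2InstanceId output key appears in this stack, but the values shown are placeholders:

```python
import json

def get_stack_output(describe_stacks_json, output_key):
    """Return the value of a named output from `aws cloudformation
    describe-stacks` JSON, or None if the key is absent."""
    payload = json.loads(describe_stacks_json)
    for output in payload["Stacks"][0].get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None

# Example payload shaped like the CLI's output (values are illustrative).
sample = json.dumps({
    "Stacks": [{
        "StackName": "dev-hbase-log-analysis",
        "Outputs": [
            {"OutputKey": "EC2InstanceId", "OutputValue": "i-0123456789abcdef0"},
        ],
    }]
})
print(get_stack_output(sample, "EC2InstanceId"))  # → i-0123456789abcdef0
```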

Step 2: Connect to Amazon EC2 instance and set up system

Connect to the Amazon EC2 instance using AWS Systems Manager (SSM) and set up the required components.

To connect and set up the system

  1. Run the following commands to get the instance ID from AWS CloudFormation outputs and connect via AWS Systems Manager (SSM):
# Get instance ID
INSTANCE_ID=$(aws cloudformation describe-stacks \
  --stack-name dev-hbase-log-analysis \
  --query 'Stacks[0].Outputs[?OutputKey==`EC2InstanceId`].OutputValue' \
  --output text \
  --region us-east-1)
# Connect via SSM
aws ssm start-session --target $INSTANCE_ID --region us-east-1

Terminal screenshot showing AWS CLI commands to retrieve an EC2 instance ID from CloudFormation stack outputs and establish an AWS Systems Manager Session Manager connection to the instance in the us-east-1 region.

  2. Clone the repository and run automated setup:
# On EC2 instance
sudo su - ec2-user

# Re-install aws cli
sudo dnf remove awscli -y

# For ARM64 (Graviton instances - default)
curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o "awscliv2.zip"

# For x86_64 (if using non-Graviton instances)
# curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

unzip awscliv2.zip
sudo ./aws/install

# update $PATH in ~/.bashrc
echo 'export PATH=$PATH:/usr/local/bin/' >> ~/.bashrc

# Reload ~/.bashrc
source ~/.bashrc

# Fork and clone the source code repository on GitHub: sample-emr-hbase-inconsistencies-detection-recovery-mcp-kiro
git clone https://github.com/YOUR_USERNAME/sample-emr-hbase-inconsistencies-detection-recovery-mcp-kiro.git hbase-analysis
cd hbase-analysis

# Run automated setup
chmod +x ./scripts/setup/automated-system-setup.sh
./scripts/setup/automated-system-setup.sh \
  --emr-version emr-7.12.0 \
  --stack-name dev-hbase-log-analysis \
  --region us-east-1

The automated setup script installs:

  • System dependencies (awscli, git, unzip)
  • uv package manager and OpenSearch MCP Server
  • Kiro CLI, configured with AWS IAM Identity Center authentication. The script automatically adds the Apache HBase open source repository and the Apache HBase operational tools to the knowledge bases
  • HBase source repositories for your Amazon EMR version
  • Python dependencies and MCP server configuration
  3. Add your own knowledge base to Kiro CLI

To enhance Kiro CLI’s analysis with your organization’s HBase runbooks and troubleshooting guides, you can add your own knowledge base repositories alongside the Apache HBase open source repositories. Here are the commands. Periodically validate and maintain your runbook contents so that they remain accurate and up to date, reflecting any changes in your HBase environment, configurations, or operational procedures:

# Navigate to the HBase repositories directory
cd /opt/hbase-repositories
# Clone your organization's HBase runbook repository
git clone <runbook-repository-url> <your-own-runbook-repo>
# Example:
# git clone https://github.com/your-org/hbase-runbooks.git hbase-runbooks
# git clone https://gitlab.company.com/ops/hbase-troubleshooting.git hbase-troubleshooting
# Add your custom repositories to the Kiro CLI knowledge base (echo pipes the /knowledge command into kiro-cli):
echo "/knowledge add --name \"Your custom HBase knowledge base\" --path /opt/hbase-repositories/<your-own-runbook-repo>" | kiro-cli
# Example:
# echo "/knowledge add --name \"Company HBase runbooks\" --path /opt/hbase-repositories/hbase-runbooks" | kiro-cli
# echo "/knowledge add --name \"HBase troubleshooting guides\" --path /opt/hbase-repositories/hbase-troubleshooting" | kiro-cli

Step 3: Configure Amazon EMR log analysis collection

Set up data collection from your Amazon EMR clusters to gather HBase logs, metadata, and consistency reports using the recommended direct collection method.
To configure Amazon EMR log analysis collection

  1. On your Amazon EMR cluster primary node, run the following commands to download the collection scripts:
# On EMR primary node
sudo su - hadoop

# Fork and clone the source code repository on GitHub: sample-emr-hbase-inconsistencies-detection-recovery-mcp-kiro
git clone https://github.com/YOUR_USERNAME/sample-emr-hbase-inconsistencies-detection-recovery-mcp-kiro.git hbase-analysis
cd hbase-analysis
  2. Run the interactive collection wizard:
# Run collection wizard
python3 scripts/utilities/emr_log_collection/emr_cluster_wizard_v2.py

Enter the parameters, such as the EMR cluster’s jobflow ID, the log analysis Amazon S3 bucket name, and the lookback window in hours (the default is 4 hours).

Terminal screenshot of EMR Cluster Log Collection Wizard V2 showing an interactive command-line interface for configuring HBase diagnostic log collection from Amazon EMR clusters, with step indicators, input fields for job flow ID and S3 bucket, validation confirmations, and lookback hour configuration.

  3. The collection wizard performs these actions:
  • Collects HBase logs from the local filesystem (see the prerequisites for the required access permissions)
  • Runs sudo -u hbase hbase hbck -details (or hbck2 for HBase 2.x)
  • Runs hdfs dfs -ls -R /hbase or aws s3 ls <hbase-root-dir> --recursive
  • Runs hbase shell <<< 'scan "hbase:meta"'
  • Creates properly named files matching analysis system requirements
  • Uploads to Amazon S3 with correct naming conventions

Here’s the data collection summary:

Terminal screenshot showing EMR Cluster Log Collection Wizard V2 completion summary with job flow ID, S3 bucket location, 4-hour lookback period, green success confirmation message, S3 file path, and detailed listing of seven collected diagnostic files including HBCK reports, HBase meta table scans, root directory paths, process information, log collection summary, node logs from all servers, and collection metadata in JSON format.

You can check the uploaded contents through the AWS CLI.

aws s3 ls s3://<log-path> --recursive

Here’s a screenshot of the outputs.

Terminal screenshot showing AWS CLI command output listing HBase diagnostic files and logs collected from an EMR cluster and stored in Amazon S3, displaying timestamps, file sizes, and complete S3 object paths including diagnostics directory with HBCK reports, meta table scans, root directory listings, process information, and logs directory with compressed application logs from HBase master and regionserver nodes.

  4. Download the collected files to the Analysis Amazon EC2 instance:
# On analytics EC2 instance
sudo su - ec2-user

# Download logs from S3
mkdir -p /tmp/hbase-log-analysis
cd /tmp/hbase-log-analysis
aws s3 sync s3://<S3-BUCKET-NAME>/emr-logs/<EMR-JOBFLOW-ID>/ .

You can get your jobflow ID from the Amazon EMR console:

Amazon EMR clusters management dashboard displaying a table with clusters, showing one cluster entry named "test" in waiting status with green indicator, creation time, elapsed time, normalized instances, along with filter controls, search functionality, pagination showing page 1, and action buttons for View details, Terminate, Clone, and Create cluster operations.

The generated files (hbase-hbase-master-ip-xxx-xxx-xxx-xxx.ec2.internal.log.gz, hbase-hbase-regionserver-ip-xxx-xxx-xxx-xxx.ec2.internal.log.gz, hbck_report.txt, hbase_rootdir_paths.txt, hbase_meta.txt, hbase_processes.txt, log_copy_summary.txt) must match the automated processing script requirements, as shown in the following listing.

Terminal screenshot showing recursive ls -lRt command output listing HBase diagnostic files and logs in /tmp/hbase-log-analysis/ directory, displaying file permissions, ownership by ec2-user, file sizes, timestamps, and complete directory structure including diagnostics directory with text files (manifest.json, HBCK report, meta table scan, process information, root directory paths, log copy summary), logs directory with nested nodes subdirectory containing redacted instance IDs, and applications/hbase subdirectories with compressed RegionServer and Master log files.
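Before running the processing script, you can sanity-check that the download contains the diagnostic files listed above. This is a hypothetical helper for illustration, not part of the repository's tooling; the host-specific .log.gz filenames vary, so only the fixed-name diagnostics are checked:

```python
from pathlib import Path

# Fixed-name diagnostic files the processing step expects, per the listing above.
REQUIRED_DIAGNOSTICS = [
    "hbck_report.txt",
    "hbase_rootdir_paths.txt",
    "hbase_meta.txt",
    "hbase_processes.txt",
    "log_copy_summary.txt",
]

def missing_inputs(analysis_dir):
    """Return the required diagnostic files absent from analysis_dir,
    searching subdirectories recursively."""
    found = {p.name for p in Path(analysis_dir).rglob("*")}
    return [name for name in REQUIRED_DIAGNOSTICS if name not in found]

# Usage: print(missing_inputs("/tmp/hbase-log-analysis"))
```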

Step 4: Process and index data

Process the collected HBase data and create vector embeddings for intelligent search capabilities. To process and index the data, navigate to the project directory on the Analysis EC2 instance and run automated-log-processing.sh:

sudo su - ec2-user
cd ~/hbase-analysis
chmod +x ./scripts/processing/automated-log-processing.sh
./scripts/processing/automated-log-processing.sh \
  --job-flow-id j-YOUR-JOB-FLOW-ID \
  --stack-name dev-hbase-log-analysis

The processing scripts extract and parse HBase logs and generate vector embeddings from HBase log messages using sentence transformer models to enable semantic search beyond keyword matching. The system uses the all-MiniLM-L6-v2 model by default (producing 384-dimensional embeddings), but supports configurable models with different embedding dimensions, automatically adapting the OpenSearch vector index to match the chosen model’s output.

The system processes comprehensive HBase operational data including region operations, compaction activities, Write-Ahead Log events, memstore operations, and cluster management information from HMaster and RegionServer logs. Vector embeddings capture error messages, exception stack traces, performance warnings, and multi-line log entries through intelligent text preprocessing.

This semantic representation enables advanced troubleshooting where users can query conceptually for “region server performance issues” or “memory pressure” and receive contextually relevant results across different log files and time periods. The vector search capabilities support error correlation by grouping similar exceptions, performance analysis by identifying related bottlenecks, and operational pattern recognition.

Each log entry is stored in Amazon OpenSearch Service with original metadata (timestamp, log level, source file, job flow ID) alongside the embedding vector, enabling both structured queries and AI-powered semantic analysis. This approach transforms raw HBase logs into a searchable knowledge base supporting anomaly detection, trend analysis, and predictive insights for proactive cluster management and troubleshooting.
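As a rough illustration of the extraction stage, the sketch below parses HBase log lines into structured entries and folds continuation lines (such as stack traces) into the preceding entry. The regular expression assumes a common log4j layout (timestamp, level, thread, source, message); the actual scripts and your log pattern may differ:

```python
import re

# A typical HBase log line looks like:
# 2024-06-01 12:34:56,789 WARN  [RS_OPEN_REGION-0] regionserver.HRegion: message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<source>\S+):\s+"
    r"(?P<message>.*)$"
)

def parse_hbase_log(lines):
    """Parse HBase log lines into dicts, appending continuation lines
    (stack traces, wrapped messages) to the previous entry's message."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            entries[-1]["message"] += "\n" + line
    return entries
```

Entries structured this way carry the metadata (timestamp, level, source) that the post describes storing alongside each embedding vector.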

All scripts use AWS IAM authentication automatically. Here’s a screenshot of the data processing outputs.

Terminal screenshot showing successful completion of HBase log analysis processing, green checkmark, confirmation message "Successfully processed 4 file(s)", and next steps section displaying three numbered instructions with redacted URLs for accessing OpenSearch Dashboards, starting Kiro CLI for AI-powered analysis, and querying data using job flow ID, followed by troubleshooting documentation references for HBase inconsistency analysis and log analysis guides.

Step 5: Enable AI-powered analysis

Configure the AI analysis interface to enable natural language queries against your HBase operational data.

To set up AI-powered analysis

  1. Launch Kiro CLI (already configured by automated setup):

kiro-cli

Check the configured MCP servers and knowledge bases:

/mcp list

Terminal screenshot showing MCP list command output displaying one configured MCP server named "opensearch-mcp-server" with command "uvx" in green and white text on dark background with pink shell prompt, featuring a purple "Configured MCP Servers" header with checkbox icon and green horizontal separator line.

/knowledge show

Terminal screenshot showing "/knowledge show" command output displaying Agent kiro_default's knowledge base with repositories: Apache HBase source code, and HBase operational tools

If you cannot see these two knowledge bases, you can add them manually with the following commands:

# Note: Large repositories (~500MB) may take a while to index. Check progress with: /knowledge show
/knowledge add --name "HBase operational tools" --path /opt/hbase-repositories/hbase-operator-tools
/knowledge add --name "Apache HBase source code" --path /opt/hbase-repositories/hbase
  2. Use natural language queries to analyze your HBase data. The AI analysis uses both the OpenSearch MCP Server for querying indexed data and the Filesystem knowledge bases for accessing HBase source code. You can add your custom runbooks for Kiro’s reference as well.

For HBase inconsistency analysis:

# HBase Inconsistency Detection and Remediation Guidelines
## Search Strategy
- Use fuzzy search for case variations/typos, term query for exact region IDs, match_phrase for paths, query_string for logs
- Always use .keyword subfields for exact text matching
- Cross-reference filesystem (wildcard: {"wildcard": {"path": "*<region_id>*"}}) with hbase:meta (match: {"match": {"row_key": "<region_id>"}})
- The total region count in hbase:meta must match the total matched document count of wildcard path like "*/.regioninfo" in hbase rootdir path
- All terms of region_name.keyword for a region encoded name must match a wildcard path like "*/.regioninfo"
- All terms of table_name.keyword for a table must match a wildcard path like "*/.tabledesc*"
- 1595e783b53d99cd5eef43b6debb2682 is the master store region that will locate in <hbase-root-dir>/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/
- May cross check with the raw logs in /tmp/hbase-log-analysis/
## Issue Types
Orphan regions, missing .regioninfo, missing/extra regions in hbase:meta, rowkey holes, stuck RIT, master initialization failures
## Analysis Steps
### 1. Cross-Reference Meta vs Filesystem
- Filesystem regions NOT in hbase:meta → ORPHAN REGION
- Meta regions NOT in filesystem → MISSING REGION
### 2. Validate Region Chain Continuity
- Sort regions by STARTKEY, verify region[i].ENDKEY == region[i+1].STARTKEY
- First STARTKEY must be '', last ENDKEY must be ''
- Gaps → ROWKEY HOLE
### 3. Check Region States
- state != 'OPEN' → Check RIT
- Missing server assignment → UNASSIGNED
- Multiple servers → SPLIT BRAIN
- "deployed_servers" field must have only one region server address like "ip-xxx-xxx-xxx-xxx.ec2.internal,16020,1770781485397" . The value should not be null or have multiple values. 
### 4. Validate .regioninfo Files
- Missing .regioninfo in region directory → CORRUPT REGION
### 5. Cross-Check HBCK Report
- Compare orphan counts, RIT regions, filesystem vs meta region counts
### 6. Analyze Logs
- Search: "updating hbase:meta row=<region>", "STUCK", "RIT", "Failed" + "<region>", "Split"/"Merge" + "<region>"
## Remediation
- Reference knowledge bases: "Apache HBase source code", "HBase operational tools"
- Use hbck2: /usr/lib/hbase-operator-tools/hbase-hbck2.jar
- Prefix commands with sudo -u hbase
- Use aws s3 for S3-based rootdir
- Wait 300s after creating holes before hbck fixMeta (catalog janitor cycle)
- Use unassign instead of deprecated close_region
- If the region does not have .regioninfo in  <hbase-root-dir>/data/<namespace>/<table-name>/<region-encoded-name>/ but hbase:meta has that region's information and that region has been deployed on a healthy region server, you can use hbase shell to unassign and assign the region to re-generate .regioninfo
- Always add "sudo -u hbase hbase" before "hbase shell" and "hbase hbck" commands
## Job flow
Target: <your-job-flow-id>
Inconsistency to detect: All kinds of inconsistencies
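The meta-versus-filesystem cross-reference and region chain continuity checks described in the guidelines above can be sketched in a few lines of Python. This illustrates the logic only, not the tooling the solution ships:

```python
def find_rowkey_holes(regions):
    """regions: list of (start_key, end_key) pairs from hbase:meta.
    Sort by start key and report gaps where one region's end key is not
    the next region's start key ('' marks the table boundary)."""
    ordered = sorted(regions, key=lambda r: r[0])
    holes = []
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if end != next_start:
            holes.append((end, next_start))  # ROWKEY HOLE between these keys
    return holes

def classify_regions(meta_regions, fs_regions):
    """Cross-reference region encoded names seen in hbase:meta against
    the region directories found under the HBase root dir."""
    meta, fs = set(meta_regions), set(fs_regions)
    return {
        "orphan": sorted(fs - meta),    # on the filesystem, not in meta
        "missing": sorted(meta - fs),   # in meta, not on the filesystem
    }

# Usage: find_rowkey_holes([("", "b"), ("b", "m"), ("n", "")])
# flags the gap between end key "m" and start key "n".
```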

You can enter “y” (yes) or “t” (trust) to grant Kiro permission to search through the MCP server and knowledge bases.

Terminal screenshot showing MCP tool execution authorization prompt.

You may get output like the following, in which Kiro checks for HBase issues.

Terminal screenshot showing HBase database query results for user table entries with server configuration details and an HBase Inconsistency Detection Framework analysis report

Kiro summarized the examination results.

Terminal screenshot displaying HBase inconsistency detection analysis results for job flow, showing one critical missing .regioninfo file issue for HBase region in a HBase table, with cluster health metrics, risk assessment, recommended fixes, and generated diagnostic reports.

After summarizing the issue, Kiro provided mitigation commands.

Terminal screenshot displaying a structured HBase quick fix guide with three sections: recommended fix procedure with sequential steps for region reassignment, verification steps using AWS S3 and HBCK2 tools, and impact assessment showing 30-60 second downtime, zero data loss risk, and isolated region scope for fixing missing .regioninfo file in HBase region.

Cleaning up

To avoid incurring future charges, delete the resources created during this walkthrough.

To clean up the resources

  1. Delete the AWS CloudFormation stack from AWS Management Console:

AWS CloudFormation Stacks management console displaying a list view with stacks, showing the "dev-hbase-log-analysis" stack with CREATE_COMPLETE status, along with action buttons for Delete, Update stack, Stack actions, and Create stack.

  2. Clean up Amazon EMR cluster resources (if created only for this walkthrough):
AWS EMR Clusters management console showing page clusters with a cluster in "Waiting" status
  3. Check the AWS Management Console to verify that all resources are deleted, and review your AWS bill to confirm no unexpected charges.

Important considerations:

  • Amazon OpenSearch Service domains take several minutes to fully delete
  • Amazon S3 buckets with versioning retain object versions
  • Use smaller instance types for development to optimize costs
  • Monitor usage with AWS Cost Explorer

Conclusion

In this post, we showed you how to build an AI-powered HBase troubleshooting solution that transforms manual log analysis into an automated workflow. By combining Amazon OpenSearch Service vector search with Amazon Bedrock-powered analysis through the Kiro CLI, operations teams can resolve complex HBase inconsistencies faster and gain deeper operational insights. The solution demonstrates how AI augments human expertise to improve operational efficiency, reducing HBase inconsistency resolution from hours to minutes and root cause identification from days to hours. Ready to transform your HBase operations? Get started with the GitHub repository and explore the Amazon OpenSearch Service documentation for additional guidance on vector search capabilities.

Acknowledgments

The author would like to thank Xi Yang, Anirudh Chawla, and Sasidhar Puthambakkam for their contributions to developing the technical solution. Xi Yang is a Senior Hadoop System Engineer and Amazon EMR subject matter expert at AWS. Anirudh Chawla is an AWS Analytics Specialist Solutions Architect who helps organizations harness their data effectively through AWS’s analytics platform. Sasidhar Puthambakkam is a Senior Hadoop Systems Engineer and Amazon EMR subject matter expert who provides architectural guidance for complex big data workloads.


About the authors

Yu-Ting Su

Yu-ting Su, Sr. Hadoop System Engineer, AWS Support Engineering. Yu-Ting is a Sr. Hadoop Systems Engineer at Amazon Web Services (AWS). Her expertise is in Amazon EMR and Amazon OpenSearch Service. She’s passionate about distributing computation and helping people to bring their ideas to life.