AWS Partner Network (APN) Blog

How NeuBird’s Hawkeye Transforms IT Operations with Amazon Bedrock

By François Martel, Field CTO – NeuBird
By Vaishali Taneja, Partner Solutions Architect – AWS

In the high-stakes world of cloud operations, every minute of downtime costs both money and reputation. Despite investments in sophisticated monitoring tools, engineering teams still spend countless hours manually investigating alerts, correlating data across different services, and pinpointing root causes. This reactive troubleshooting cycle has long been accepted as an inevitable part of maintaining complex cloud environments—until now.

Today’s AWS environments generate millions of telemetry data points from Amazon CloudWatch metrics, logs, and alarms across thousands of resources. While this provides unprecedented visibility, it has created what we call the “observability paradox”: more visibility often results in less clarity. As one AWS enterprise customer recently told us: “We’ve instrumented everything, but we’re drowning in alerts. Our engineers spend most of their day investigating issues rather than building new capabilities.”

The numbers paint a stark picture:

  • 70% of alerts require manual correlation across multiple AWS services
  • Engineers typically spend 3-4 hours investigating complex incidents
  • Engineering teams monitor only a fraction of available Amazon CloudWatch metrics due to alert fatigue

This isn’t a technology problem—it’s a human problem. The scale of cloud operations has exceeded human cognitive capacity, creating a fundamental operational challenge.

Introducing Hawkeye: AI-Powered Site Reliability Engineer (SRE) for AWS Environments

To address these challenges, the AWS partner NeuBird has developed Hawkeye, an AI-powered solution. Hawkeye represents a fundamental shift in how organizations approach cloud operations—moving from reactive monitoring to proactive, AI-driven investigation. Built on Amazon Bedrock, Hawkeye functions as an AI-powered SRE teammate that rapidly analyzes, correlates, and diagnoses issues across your AWS environment.

Rather than simply forwarding alerts, Hawkeye investigates them. It examines Amazon CloudWatch metrics, logs, and configuration changes, understands the relationships between AWS services, and delivers detailed root cause analysis with actionable remediation steps, all within minutes.

How Hawkeye Works

Hawkeye’s intelligent investigative capabilities are powered by an innovative architecture that leverages Amazon Bedrock at its core:

  1. Smart Runbook Selection: When an incident occurs, Hawkeye first identifies the appropriate analysis strategy by searching its private vector database for similar historical patterns.
  2. Investigative Planning: Using Amazon Bedrock’s foundation models, Hawkeye creates a detailed chain of thought for the investigation, considering incident type, available telemetry, and your specific AWS architecture.
  3. Secure Telemetry Processing: Instead of sending sensitive data to Large Language Models (LLMs), Hawkeye creates a specialized retrieval program that handles data in isolated memory space while using Amazon Bedrock only for logic and reasoning.
  4. Real-time Data Analysis: The system executes its plan by securely accessing Amazon CloudWatch metrics, logs, and other AWS telemetry sources through read-only connections.
  5. Continuous Refinement: As the investigation progresses, Hawkeye iteratively refines its approach based on new information without exposing raw telemetry data to the LLMs.
  6. Actionable Results: Finally, Hawkeye produces a comprehensive analysis with clear evidence for its findings and specific recommended actions.
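To make the first step more concrete, here is a minimal sketch of similarity-based runbook selection. It is not Hawkeye's implementation: the runbook names, the three-dimensional vectors, and the `select_runbook` helper are all hypothetical, and a real vector database would store much larger embeddings produced by an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_runbook(incident_vec, runbooks):
    """Return the name of the runbook whose embedding is closest to the incident."""
    return max(runbooks, key=lambda name: cosine_similarity(incident_vec, runbooks[name]))


# Hypothetical runbook embeddings, shortened to three dimensions for readability.
RUNBOOKS = {
    "database-connectivity": [0.9, 0.1, 0.0],
    "pod-crash-loop": [0.1, 0.9, 0.1],
    "latency-regression": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an incoming incident description.
incident = [0.8, 0.2, 0.1]
print(select_runbook(incident, RUNBOOKS))  # prints "database-connectivity"
```

The point of the sketch is only that the incident is matched to prior patterns by vector distance rather than by keyword rules, so a new alert can reuse an analysis strategy that worked for a similar past incident.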

What makes this architecture particularly powerful is how it maintains high security standards while leveraging generative AI capabilities. Customer data never leaves the customer’s AWS environment: all processing happens locally inside the customer’s VPC, using LLMs provided through the customer’s Amazon Bedrock, with Amazon RDS and Amazon DocumentDB databases storing configuration and session analysis results, as shown in Figure 1 below.

For organizations concerned with implementing AI tools in sensitive operational environments, it’s worth highlighting Hawkeye’s security-first architecture:

  • Zero Data Storage: Hawkeye processes telemetry data in real-time and never stores it persistently.
  • Read-Only Access: All connections to AWS services use strictly read-only permissions enforced through IAM.
  • Customer-Controlled Access: Complete control through AWS IAM and custom trust policies with access revocable instantly.
  • Metadata-Only Approach: Only abstracted metadata, never raw logs or sensitive information, is shared with Amazon Bedrock for reasoning.
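As an illustration of the metadata-only idea, the sketch below reduces raw log lines to aggregate counts before anything would be handed to a model. The log format, the `summarize_logs` helper, and the sample lines are assumptions made for this example; they do not describe how Hawkeye abstracts telemetry internally.

```python
import re
from collections import Counter

# Matches a log level token; the log format used here is an assumption.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def summarize_logs(lines):
    """Reduce raw log lines to counts per level; no message text is retained."""
    levels = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        levels[match.group(1) if match else "UNKNOWN"] += 1
    return {"total_lines": len(lines), "by_level": dict(levels)}


# Hypothetical raw logs containing details that should stay local.
raw_logs = [
    "2024-01-01T10:00:00Z INFO request ok user=alice",
    "2024-01-01T10:00:01Z ERROR db timeout user=bob token=abc123",
    "2024-01-01T10:00:02Z ERROR db timeout user=carol",
]

summary = summarize_logs(raw_logs)
print(summary)
```

Only `summary` would be included in a prompt in this sketch; user names and tokens from `raw_logs` never appear in it.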

This approach allows organizations to leverage the power of generative AI while maintaining high security standards for their AWS environments.

Figure 1 – Hawkeye Architecture Diagram

Demonstrable Reduction in Incident Resolution Time

Let’s look at a real-world example we encountered during our recent AWS-NeuBird webinar. We demonstrated Hawkeye investigating an issue with an e-commerce application running on Amazon Elastic Kubernetes Service (EKS).

The scenario involved a customer-facing error (HTTP 500) in the online store. Traditional troubleshooting requires an engineer to:

  1. Review CloudWatch logs for error messages
  2. Check Prometheus metrics for performance anomalies
  3. Examine recent deployments for changes
  4. Manually trace the request flow through multiple microservices
  5. Correlate configuration details across services

This process typically takes 2-3 hours of dedicated debugging time, during which customers continue to experience errors.

With Hawkeye, the investigation unfolded automatically, with the results shown in Figure 2 below:

Figure 2 – Hawkeye investigation results

  1. Hawkeye identified the cart service was failing to connect to its Amazon DynamoDB backend
  2. It analyzed the cart service configuration in the EKS cluster
  3. It discovered a recent deployment had changed a critical parameter (CARTS_DYNAMODB_CREATETABLE) from “true” to “false”
  4. It correlated this with AWS CloudTrail events showing a Jenkins deployment 30 minutes prior
  5. It delivered a complete analysis with the exact configuration change needed to restore service
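The correlation in steps 3 and 4 can be sketched in a few lines. This is a simplified illustration, not Hawkeye's logic: the `LOG_LEVEL` key, the timestamps, the event dictionaries, and both helper functions are hypothetical. Only the `CARTS_DYNAMODB_CREATETABLE` change and the deployment 30 minutes before the alert come from the scenario above.

```python
from datetime import datetime, timedelta


def diff_config(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


def deployments_before(alert_time, events, window):
    """Return events that occurred within `window` before the alert."""
    return [e for e in events if timedelta(0) <= alert_time - e["time"] <= window]


# Configuration snapshots; LOG_LEVEL is a hypothetical unchanged key.
before = {"CARTS_DYNAMODB_CREATETABLE": "true", "LOG_LEVEL": "info"}
after = {"CARTS_DYNAMODB_CREATETABLE": "false", "LOG_LEVEL": "info"}

# Hypothetical timestamps standing in for CloudTrail deployment events.
alert_time = datetime(2024, 1, 1, 12, 0)
events = [
    {"source": "jenkins", "time": datetime(2024, 1, 1, 11, 30)},
    {"source": "jenkins", "time": datetime(2024, 1, 1, 8, 0)},
]

changes = diff_config(before, after)
suspects = deployments_before(alert_time, events, timedelta(hours=1))
print(changes)
print(suspects)
```

Here the only changed key is `CARTS_DYNAMODB_CREATETABLE`, and the only deployment inside the one-hour window is the one 30 minutes before the alert, which is the pairing an engineer would otherwise have to find by hand.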

Total time: 4 minutes. The entire investigation happened while the engineering team was still receiving the initial alert notifications.

The Amazon Bedrock Advantage

Hawkeye’s capabilities are directly enabled by Amazon Bedrock—AWS’s fully managed service for foundation models. Below are the key features that make Amazon Bedrock the ideal foundation for this type of AI agent:

Model Quality and Security

Amazon Bedrock provides access to leading foundation models with the enterprise-grade security controls required for sensitive operational data. The service’s commitment to not using customer data for model training aligns perfectly with NeuBird’s security-first approach.

Knowledge Integration

By combining the language model reasoning capabilities offered by Amazon Bedrock with customer-specific operational knowledge, Hawkeye creates an AI agent that understands both general cloud concepts and your specific AWS environment.

API-First Design

Amazon Bedrock’s API-first design lets Hawkeye integrate with existing AWS services and operational tooling. This allows organizations to enhance their current investments rather than replace them.

Deployment Flexibility

With support for private VPC endpoints and AWS PrivateLink, Amazon Bedrock lets organizations implement sophisticated AI capabilities while maintaining strict network isolation and data security requirements.

Customer Spotlight: Model Rocket’s AWS Ops Breakthrough using Hawkeye

Here’s how one NeuBird customer is transforming AWS operations with Hawkeye. Model Rocket, a custom technology solutions provider, runs a complex cloud-native environment spanning AWS Lambda, Amazon RDS, Amazon ElastiCache, Amazon SQS, Amazon Elastic Container Service and Amazon EKS. With a lean engineering team, they needed a way to manage growing operational complexity without pulling focus from core development work.

During load testing, several critical APIs began degrading. Instead of spending hours manually tracing the issue, the team turned to Hawkeye. Within minutes, it diagnosed the root cause—excessive database connections from Lambda functions—and recommended targeted configuration changes that immediately resolved the bottleneck.

With Hawkeye continuously investigating issues across their AWS stack, Model Rocket reduced mean time to recovery (MTTR) by over 90% and reclaimed valuable engineering time to focus on innovation.

Beyond Troubleshooting: Transforming Operations

While faster incident resolution is the most immediate benefit, customers deploying Hawkeye are discovering broader operational transformations:

Breaking Free from Dashboard Limitations

Traditional dashboarding approaches surface only a small portion of available telemetry data. Hawkeye dynamically analyzes the entire corpus of Amazon CloudWatch metrics, logs, and events relevant to an incident—not limited to those pre-selected for dashboards.

Knowledge Preservation and Distribution

Experienced engineers embed their AWS knowledge into Hawkeye through custom investigation playbooks and runbooks. This codifies tactical knowledge and ensures consistent troubleshooting approaches across the organization.

Engineering Focus Shift

With routine investigations handled automatically, engineering teams can focus on core improvements: refining system architecture, optimizing performance, and driving innovation. Several customers report 40%+ time savings for their Cloud Operations teams.

Getting Started: Your Path to AI-Enhanced Operations

Implementing Hawkeye in your AWS environment follows a straightforward process designed for quick time-to-value:

  1. Establish Secure Connections: Deploy Hawkeye with read-only access to your CloudWatch and CloudTrail logs.
  2. Integrate with Incident Management: Connect Hawkeye to your existing alerting systems (ServiceNow, PagerDuty, etc.).
  3. Knowledge Import: Add your operational runbooks and documentation to enhance Hawkeye’s understanding of your environment.
  4. Customization: Configure investigation priorities and playbooks to match your specific operational needs.
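For step 1, a read-only IAM policy might look roughly like the one generated below. This is an illustrative sketch, not NeuBird's published policy: the action list, the `ReadOnlyTelemetry` statement ID, and the `is_read_only` check are assumptions, and the AWS integration setup guide remains the authoritative source for the permissions Hawkeye actually needs.

```python
import json

# Read-style actions for CloudWatch, CloudWatch Logs, and CloudTrail.
# The exact set a deployment needs is an assumption in this sketch.
READ_ONLY_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "cloudwatch:DescribeAlarms",
    "logs:DescribeLogGroups",
    "logs:FilterLogEvents",
    "logs:GetLogEvents",
    "cloudtrail:LookupEvents",
]


def build_read_only_policy(actions):
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyTelemetry",
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


def is_read_only(policy):
    """Naive check: every action verb starts with a read-style prefix."""
    read_prefixes = ("Get", "List", "Describe", "Filter", "Lookup")
    for statement in policy["Statement"]:
        for action in statement["Action"]:
            verb = action.split(":", 1)[1]
            if not verb.startswith(read_prefixes):
                return False
    return True


policy = build_read_only_policy(READ_ONLY_ACTIONS)
print(json.dumps(policy, indent=2))
```

The prefix check is a quick sanity test rather than a security control; in practice, scoping `Resource` to specific log groups and reviewing the policy with IAM Access Analyzer would be the more rigorous route.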

Organizations see value within days of implementation, with investigation times dropping immediately and continuous improvement as Hawkeye learns your environment.

Conclusion: A New Era of Cloud Operations

The combination of NeuBird’s Hawkeye and Amazon Bedrock represents a step change in how organizations operate their AWS environments. By transforming raw telemetry data into intelligent, automated responses, this solution addresses both the technical challenges of cloud scale and the human challenges of alert fatigue and engineering burnout.

As cloud environments continue to grow in complexity, the partnership between human engineers and AI agents will become increasingly essential. Those who embrace this partnership earliest will gain competitive advantages through improved incident resolution, reliable systems, and optimized resource utilization.

Start with Hawkeye through these channels to enhance your AWS operations:

  1. Try Hawkeye FREE on AWS Marketplace
  2. AWS integration setup guide
  3. Hawkeye Demo

NeuBird – AWS Partner Spotlight

NeuBird is an AWS Advanced Technology Partner and AWS Competency Partner that offers Hawkeye, a GenAI-powered SRE assistant designed to streamline incident response and troubleshooting in AWS environments. It addresses the challenges of modern IT operations by automatically correlating telemetry data across AWS services, providing rapid root cause analysis and incident resolution through Amazon Bedrock’s foundation models.

Contact NeuBird | Partner Overview | AWS Marketplace