Overview
The first 30–90 minutes of most production incidents are spent on manual triage—collecting logs, reviewing deployments, mapping dependencies, and identifying what changed before remediation can begin.
HCLTech's logIQ delivers an AI-powered, always-on incident investigation and remediation capability on AWS. When an alert is triggered from platforms such as Datadog, Splunk, PagerDuty, Dynatrace, New Relic, or ServiceNow, logIQ immediately correlates logs, live resource topology, and deployment history to identify the root cause. Since most production incidents originate from recent changes, surfacing those changes significantly accelerates resolution.
Based on the findings, logIQ automatically:
Executes validated remediation runbooks for known issues with built-in verification and rollback.
Creates Infrastructure-as-Code pull requests for persistent infrastructure changes, ensuring governance through existing deployment workflows.
Generates a structured investigation report and recommended remediation plan for engineers when incidents require manual intervention.
The result is a reduction of up to 50–60% in Mean Time to Resolution (MTTR), lower on-call effort, and automation coverage that keeps expanding as new incident patterns are converted into reusable runbooks.
Governance is built in through separate IAM roles for investigation and remediation, approval gates, scoped permissions, and complete audit trails.
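The three outcomes listed in this overview (runbook execution, IaC pull request, engineer report) can be pictured as a simple routing decision. The sketch below is illustrative only and is not logIQ's implementation: the `Finding` fields, the `Action` names, and the rule that a persistent change takes precedence over a matching runbook are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN_RUNBOOK = "run_runbook"
    OPEN_IAC_PULL_REQUEST = "open_iac_pull_request"
    REPORT_FOR_ENGINEER = "report_for_engineer"


@dataclass(frozen=True)
class Finding:
    """Hypothetical summary of an investigation result."""
    failure_class: str             # illustrative label, e.g. "disk_full"
    needs_persistent_change: bool  # True if the fix must outlive the incident


def route(finding: Finding, known_runbooks: set[str]) -> Action:
    """Pick one of the three outcomes for a finding."""
    # Assumption: persistent infrastructure changes always go through review.
    if finding.needs_persistent_change:
        return Action.OPEN_IAC_PULL_REQUEST
    # Known failure classes are handled by a pre-authored runbook.
    if finding.failure_class in known_runbooks:
        return Action.RUN_RUNBOOK
    # Everything else is handed to an engineer with a report.
    return Action.REPORT_FOR_ENGINEER
```

For example, `route(Finding("disk_full", False), {"disk_full"})` selects the runbook path, while an unrecognised failure class falls through to the engineer report.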
Key Benefits and Features
- Stop paying the triage tax on every incident: Investigation begins within seconds of an alert, automatically correlating logs, deployment history, and live resource state. The 45 to 90 minutes of manual triage typical of a P2 incident is replaced by a structured root cause finding your team can act on immediately.
- Resolve faster, with less variability: For known failure classes, remediation executes through pre-authored, validated runbooks, removing the engineer-to-engineer variability in how fixes get applied. Persistent infrastructure changes go through pull requests, not direct live mutations, keeping changes inside your existing review controls.
- Structural Separation and Audit Trail: Investigation and remediation run under separate roles with distinct permissions. Approval gates are built into the execution path. Every action is auditable. The blast radius of any individual failure is bounded by design.
- Compounding Coverage: Novel incidents that require engineer handling become candidate runbooks, so the set of auto-remediable failure classes grows with usage instead of staying fixed.
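The structural separation described above (distinct roles, approval gates, an audit trail) might be modelled along the following lines. This is a minimal sketch under stated assumptions, not the product's actual policy model: the role names, the permission strings, and the rule that only remediation actions need approval are hypothetical. In a real deployment these controls would live in IAM policies and workflow tooling, not in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role names and permission scopes, for illustration only.
ROLE_PERMISSIONS = {
    "investigation": frozenset({"read_logs", "read_topology", "read_deployments"}),
    "remediation": frozenset({"run_runbook", "open_pull_request"}),
}


@dataclass
class AuditLog:
    """In-memory stand-in for an audit trail."""
    entries: list = field(default_factory=list)

    def record(self, role: str, action: str, outcome: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "action": action,
            "outcome": outcome,
        })


def authorize(role: str, action: str, approved: bool, audit: AuditLog) -> bool:
    """Allow an action only if it is in the role's scope and, for remediation, approved."""
    in_scope = action in ROLE_PERMISSIONS.get(role, frozenset())
    needs_approval = role == "remediation"  # assumption: read-only work needs no gate
    allowed = in_scope and (approved or not needs_approval)
    # Every decision is recorded, whether it was allowed or denied.
    audit.record(role, action, "allowed" if allowed else "denied")
    return allowed
```

Because the investigation role holds no write permissions in this sketch, a compromised or faulty investigation step cannot trigger a remediation, which is one way to read the "bounded blast radius" claim.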
Our Proven Methodology
Our engagement follows a clear five-stage process to accelerate time-to-value:
- Discovery: Understand your current incident response process, observability stack, CI/CD pipelines, ticketing integrations, and target outcomes.
- POC: Deploy and demonstrate an end-to-end investigation-to-remediation flow against representative failure scenarios from your environment.
- Integration: Connect your production observability tools, CI/CD pipelines, and ticketing systems; author the initial set of remediation runbooks and IaC workflows.
- Testing: Validate investigation accuracy, runbook safety, approval gate behaviour, and rollback paths under controlled fault injection.
- Pilot & Rollout: Go live on a defined set of failure classes, then expand coverage and roll out across business units.
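To make the Testing stage more concrete, the sketch below shows one way a rollback path could be exercised with an injected fault. It is an illustration under stated assumptions, not the engagement's test harness: `run_with_rollback` and the `state` dictionary are hypothetical stand-ins for a real runbook and a real resource.

```python
from typing import Callable


def run_with_rollback(
    apply_fix: Callable[[], None],
    verify: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Apply a fix, verify it, and roll back if it fails or does not verify."""
    try:
        apply_fix()
        if verify():
            return "remediated"
    except Exception:
        pass  # an error during apply or verify counts as a failed remediation
    rollback()
    return "rolled_back"


# Fault injection: verification is forced to fail, so the rollback must run.
state = {"replicas": 1}
outcome = run_with_rollback(
    apply_fix=lambda: state.update(replicas=3),
    verify=lambda: False,  # injected fault
    rollback=lambda: state.update(replicas=1),
)
assert outcome == "rolled_back" and state["replicas"] == 1
```

The same helper returns `"remediated"` when verification passes, so a test suite can cover both the success path and the rollback path with the same runbook definition.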
Solution Scope, Prerequisites, and Responsibilities
Scope of Offering:
This professional service offering includes assessment, AWS DevOps Agent Space and integration configuration (observability, CI/CD, ticketing), remediation agent design, SSM Automation runbook and IaC pull request workflow authoring, and deployment of a functional proof of concept. The engagement delivers a tailored architecture and handover for enterprise rollout.
Prerequisites
● Customer provides access to relevant observability tools, CI/CD pipelines (GitHub or GitLab), and ticketing/incident systems for AWS DevOps Agent integration.
● An AWS account and foundational landing zone must be in place.
● Optional: Customer may bring existing runbooks or remediation playbooks to integrate into the pipeline.
Shared Responsibility Model
● HCLTech: Delivers the professional services outlined above, including design, POC deployment, integration assistance, and optional managed services add-ons.
● Customer: Responsible for ongoing operations after handover, maintaining integration configurations, approving automation scopes, expanding runbook coverage, and securing IAM roles.
● AWS: Provides and maintains the underlying cloud infrastructure according to the AWS Shared Responsibility Model.
Highlights
- Automated Incident Investigation: Replace manual triage with an always-on investigation capability that correlates logs, metrics, topology, and deployment history the moment an alert fires.
- Structural Governance, Compounding Coverage: Investigation and execution run under separate IAM roles with built-in approval gates and audit trails. Novel incidents become candidate runbooks, so automated coverage grows with every incident handled.
- Pre-authored, validated remediation sequences execute automatically for covered incidents, cutting MTTR by 50 to 60% and removing variability from how fixes get applied.
Pricing
Custom pricing options
Support
Vendor support
This offering includes support during the engagement period. Post-deployment, customers can engage HCLTech for ongoing managed services and support through a separate agreement. Standard AWS Support models are available for any issues related to the underlying AWS services. For more information, contact awsecosystembu@hcltech.com.