AWS Cloud Operations Blog
Tracing ETL Workloads using AWS X-Ray and AWS Distro for OpenTelemetry
Data pipelines are essential for modern data-driven companies to gain critical business insights. However, data pipelines commonly fail when new files or datasets from data sources do not conform to the expected schema, leading to downstream job failures, workflow breakdowns, and delayed insights. Additionally, fluctuating data volumes, from a few gigabytes to multiple terabytes, […]
Introducing Just-in-time node access using AWS Systems Manager
Today, we’re excited to announce the general availability of just-in-time node access, a new capability in AWS Systems Manager. Just-in-time node access enables dynamic, time-bound access to Amazon Elastic Compute Cloud (Amazon EC2), on-premises, and multicloud nodes managed by AWS Systems Manager. It uses a policy-based approval process, allowing you to remove long-standing access while […]
Identifying resources driving Amazon CloudWatch GetMetricData charges using AWS CloudTrail
Organizations frequently use third-party monitoring tools to retrieve CloudWatch metric data for their dashboards and alerting systems. This practice often leads to significant GetMetricData API usage and results in high CloudWatch costs. A common challenge for cost optimization teams is identifying which specific resources or applications are driving these increased expenses, especially when they’re not […]
Application Performance Monitoring of AWS Lambda apps with Amazon CloudWatch Application Signals
Amazon CloudWatch Application Signals extends its powerful monitoring and diagnostic capabilities to AWS Lambda. This integration provides Lambda users with streamlined, no-code application performance monitoring, enabling easy access to key metrics such as invocation duration, error rates, cold starts, and throttling events. By bringing together telemetry data across Lambda functions with metrics, traces, and logs, […]
Scaling AWS Fault Injection Service Across Your Organization And Regions
In the first two parts of our series, we explored how to scale AWS Fault Injection Service (FIS) across AWS Organizations. Part one focused on implementing FIS in a single AWS account environment, introducing the concept of standardized IAM roles and Service Control Policies (SCPs) as guardrails for controlled chaos engineering experiments, particularly in centralized […]
Scaling AWS Fault Injection Service Across Your Organization And Accounts
Welcome to part two of our series, where we focus on scaling AWS Fault Injection Service (FIS) within your organization. In part one, we learned how customers can enable individual accounts within an organization to run network experiments by introducing a Service Control Policy (SCP) rule when operating with a centralized networking infrastructure. In this blog, […]
Scaling AWS Fault Injection Service Across Your Organization Using Account Controls
AWS Fault Injection Service (FIS) empowers you to adopt chaos engineering at scale within your AWS environment. Chaos engineering injects real-world, controlled failures into a system to verify resilience and reliability, ultimately improving the customer experience. This proactive, resilience-focused approach increases your confidence in a system’s ability to respond to adverse conditions in production. You […]
New AWS Fault Injection Service recovery action for zonal autoshift
We’re excited to announce that AWS Fault Injection Service (FIS) now supports a recovery action for Amazon Application Recovery Controller (ARC) zonal autoshift. With this integration, you can perform more comprehensive testing by creating disruptive events and triggering a zonal autoshift as part of the same experiment. That way, you can observe how your application […]
Analyze Azure Audit Logs with CloudTrail Lake
In the ever-evolving world of cloud computing, maintaining robust security and compliance is paramount. As usage of multicloud environments grows, the need for comprehensive monitoring and logging solutions becomes more critical. Enter the synergy of Azure Audit Logs and AWS CloudTrail Lake, a powerful combination that provides comprehensive visibility across your cloud environments. Azure Audit […]
Unlock the Power of AWS Config: Centralized Compliance and Resource Management
In this post, we will highlight how AWS Config can be used to help organizations implement capabilities related to management and governance, security, and more. Have you ever wondered how to maintain a centralized inventory of resources across your AWS accounts? Do you need to quickly identify the unencrypted resources in your AWS environment? Do you […]