AWS Cloud Operations Blog

Automate registering Windows managed nodes with AWS Systems Manager

Managing hybrid infrastructure across AWS and on-premises environments adds a layer of operational complexity to node management. Some teams use different tools to manage these systems depending on the platform they run on, while others use licensed Remote Monitoring and Management (RMM) software. Teams can use AWS Systems Manager hybrid activations to manage on-premise […]

Build Golden Images with CIS Linux Build Kit within Amazon EC2 Image Builder

Building and rolling out hardened, certified operating systems (OS) is imperative for any Cloud Operations (CloudOps) or Cloud Center of Excellence (CCoE) team within an organization. The guidelines and security controls used to certify the images come from the respective teams within your organization who, in turn, refer to the popular industry-wide […]

Announcing AWS CloudTrail network activity events for VPC Endpoints

Today, we are excited to announce AWS CloudTrail network activity for VPC endpoints, a new event type that captures actions transmitted through a Virtual Private Cloud Endpoint. In this preview, this new event type captures network activity events from VPC endpoints for Amazon Elastic Compute Cloud (EC2), AWS Key Management Service (KMS), Amazon S3, and […]

Manage Custom AWS Config Rules with Remediation Using AWS Config Conformance Pack

Organizations face unique compliance requirements across their AWS resources and accounts. While AWS Config provides managed rules, many organizations need custom rules and automated remediation capabilities that can scale across their AWS Organization. This blog post demonstrates how to use an AWS Config custom conformance pack to deploy and manage custom rules with remediation actions […]

Tracing ETL Workloads using AWS X-Ray and AWS Distro for OpenTelemetry

Data pipelines are essential for modern data-driven companies to gain critical business insights. However, data pipelines commonly fail when new files or datasets from data sources do not conform to the expected schema, leading to downstream job failures, workflow breakdowns, and delayed insights. Additionally, fluctuating data volumes, from a few gigabytes to multiple terabytes, […]

Introducing Just-in-time node access using AWS Systems Manager

Today, we’re excited to announce the general availability of just-in-time node access, a new capability in AWS Systems Manager. Just-in-time node access enables dynamic, time-bound access to Amazon Elastic Compute Cloud (Amazon EC2), on-premises, and multicloud nodes managed by AWS Systems Manager. It uses a policy-based approval process, allowing you to remove long-standing access while […]

Identify resources driving Amazon CloudWatch GetMetricData charges using AWS CloudTrail

Organizations frequently use third-party monitoring tools to retrieve CloudWatch metric data for their dashboards and alerting systems. This practice often leads to significant GetMetricData API usage and results in high CloudWatch costs. A common challenge for cost optimization teams is identifying which specific resources or applications are driving these increased expenses, especially when they’re not […]

Application Performance Monitoring of AWS Lambda apps with Amazon CloudWatch Application Signals

Amazon CloudWatch Application Signals extends its powerful monitoring and diagnostic capabilities to AWS Lambda. This integration provides Lambda users with streamlined, no-code application performance monitoring, enabling easy access to key metrics such as invocation duration, error rates, cold starts, and throttling events. By bringing together telemetry data across Lambda functions with metrics, traces, and logs, […]

Scaling AWS Fault Injection Service Across Your Organization And Regions

In the first two parts of our series, we explored how to scale AWS Fault Injection Service (FIS) across AWS Organizations. Part one focused on implementing FIS in a single AWS account environment, introducing the concept of standardized IAM roles and Service Control Policies (SCPs) as guardrails for controlled chaos engineering experiments, particularly in centralized […]

Scaling AWS Fault Injection Service Across Your Organization And Accounts

Welcome to part two of our series where we focus on scaling AWS Fault Injection Service (FIS) within your organization. In part one, we learned how customers can enable individual accounts within organizations by introducing a Service Control Policy (SCP) rule to run network experiments when operating with a centralized networking infrastructure. In this blog, […]