AWS Big Data Blog

How CyberSolutions built a scalable data pipeline using Amazon EMR Serverless and the AWS Data Lab

This post is co-written by Constantin Scoarță and Horațiu Măiereanu from CyberSolutions Tech. CyberSolutions is one of the leading ecommerce enablers in Germany. We design, implement, maintain, and optimize award-winning ecommerce platforms end to end. Our solutions are based on best-in-class software like SAP Hybris and Adobe Experience Manager, and complemented by unique services that […]

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]

SANS Institute uses Amazon QuickSight to drive transformational security awareness maturity within organizations

This is a guest post by Carl Marrelli from SANS Institute. The SANS Institute is a world leader in cybersecurity training and certification. For over 30 years, SANS has worked with leading organizations to help ensure security across their organization, as well as with individual IT professionals who want to build and grow their security […]

Reference guide to build inventory management and forecasting solutions on AWS

Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them. The consequences of poor inventory management can be severe. Overstocking can lead to […]

It’s the Amazon QuickSight Community’s 1st birthday!

Happy birthday Amazon QuickSight Community! We are celebrating 1 year since the launch of our new Community. The Amazon QuickSight Community website is a one-stop-shop where business intelligence (BI) authors and developers from across the globe can ask and answer questions, stay up to date, network, and learn together about Amazon QuickSight. In this post, […]

Push Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch logs

Amazon EMR is a big data service offered by AWS to run Apache Spark and other open-source applications on AWS to build scalable data pipelines in a cost-effective manner. Monitoring the logs generated from the jobs deployed on EMR clusters is essential to help detect critical issues in real time and identify root causes quickly. […]

Connect to Amazon MSK Serverless from your on-premises network

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available, and secure Apache Kafka service. Amazon MSK reduces the work needed to set up, scale, and manage Apache Kafka in production. With Amazon MSK, you can create a cluster in minutes and start sending data. With Amazon MSK Serverless, you can […]

How Morningstar used tag-based access controls in AWS Lake Formation to manage permissions for an Amazon Redshift data warehouse

This post was co-written by Ashish Prabhu, Stephen Johnston, and Colin Ingarfield at Morningstar and Don Drake, at AWS. With “Empowering Investor Success” as the core motto, Morningstar aims at providing our investors and advisors with the tools and information they need to make informed investment decisions. In this post, Morningstar’s Data Lake Team Leads […]

Patterns for updating Amazon OpenSearch Service index settings and mappings

Amazon OpenSearch Service is used for a broad set of use cases like real-time application monitoring, log analytics, and website search at scale. As your domain ages and you add additional consumers, you need to reevaluate and change the domain’s configuration to handle additional storage and compute needs. You want to minimize downtime and performance […]

Implement column-level encryption to protect sensitive data in Amazon Redshift with AWS Glue and AWS Lambda user-defined functions

Amazon Redshift is a massively parallel processing (MPP), fully managed petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using existing business intelligence tools. When businesses are modernizing their data warehousing solutions to Amazon Redshift, implementing additional data protection mechanisms for sensitive data, such as personally identifiable information (PII) or […]