AWS Big Data Blog
Amazon OpenSearch Serverless now supports automated time-based data deletion
We recently announced a new enhancement to OpenSearch Serverless for managing data retention of Time Series collections and Indexes. OpenSearch Serverless for Amazon OpenSearch Service makes it straightforward to run search and analytics workloads without having to think about infrastructure management. With the new automated time-based data deletion feature, you can specify how long they want […]
Run Kinesis Agent on Amazon ECS
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Kinesis Agent is a standalone Java software application that offers a straightforward way to collect and send data to Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose. The agent continuously monitors […]
How smava makes loans transparent and affordable using Amazon Redshift Serverless
This is a guest post co-written by Alex Naumov, Principal Data Architect at smava. smava GmbH is one of the leading financial services companies in Germany, making personal loans transparent, fair, and affordable for consumers. Based on digital processes, smava compares loan offers from more than 20 banks. In this way, borrowers can choose the […]
Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector
As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. Data is stored from online systems such as the databases, CRMs, and marketing systems to data stores such as data lakes on Amazon Simple Storage Service (Amazon S3), data warehouses […]
New Amazon CloudWatch log class to cost-effectively scale your AWS Glue workloads
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. One of […]
Sun King uses Amazon Redshift data sharing to accelerate data analytics and improve user experience
This post is co-authored with Guillaume Saint-Martin at Sun King. Sun King is the world’s leading off-grid solar energy company, and is on a mission to power access to brighter lives through off-grid solar. Sun King designs, distributes, installs, and finances solar home energy products for people currently living without reliable energy access. It serves […]
AWS re:Invent 2023 Amazon Redshift Sessions Recap
Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Customers use Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning […]
Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature
AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue, Amazon EMR, and Amazon Redshift. You can visually build the workflow by wiring individual data pipeline tasks and configuring payloads, retries, […]
Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue
With the exponential growth of data, companies are handling huge volumes and a wide variety of data including personally identifiable information (PII). PII is a legal term pertaining to information that can identify, contact, or locate a single person. Identifying and protecting sensitive data at scale has become increasingly complex, expensive, and time-consuming. Organizations have […]
Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK
Data loses value over time. We hear from our customers that they’d like to analyze the business transactions in real time. Traditionally, customers used batch-based approaches for data movement from operational systems to analytical systems. Batch load can run once or several times a day. A batch-based approach can introduce latency in data movement and […]