AWS Big Data Blog
Tag: Apache Flink
Optimize checkpointing in your Amazon Managed Service for Apache Flink applications with buffer debloating and unaligned checkpoints – Part 1
This post is the first of a two-part series regarding checkpointing mechanisms and in-flight data buffering. In this first part, we explain some of the fundamental Apache Flink internals and cover the buffer debloating feature. In the second part, we focus on unaligned checkpoints. Apache Flink is an open-source distributed engine for stateful processing over […]
Build a data lake with Apache Flink on Amazon EMR
To build a data-driven business, it is important to democratize enterprise data assets in a data catalog. With a unified data catalog, you can quickly search datasets and figure out data schema, data format, and location. The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep […]
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
NOTE: As of November 2018, you can run Apache Flink programs with Amazon Kinesis Analytics for Java Applications in a fully managed environment. You can find further details in a new blog post on the AWS Big Data Blog and in this Github repository. ————————– September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. […]


