General
Interactive analysis helps you explore streaming data in real time. With ad hoc queries or programs, you can inspect streams from Amazon MSK or Amazon Kinesis Data Streams and visualize what the data looks like within those streams. For example, you can view how a real-time metric that computes the average over a time window behaves and send the aggregated data to a destination of your choice. Interactive analysis also helps with iterative development of stream processing applications. The queries you build continuously update as new data arrives. With Amazon Managed Service for Apache Flink Studio, you can deploy these queries to run continuously with auto scaling and durable state backups enabled.
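For illustration, such a windowed average can be written as a single Flink SQL query in a Studio notebook section. In this sketch the sensor_readings table, its columns, and the one-minute window are hypothetical; in practice you would first map the table to your own stream:

%flink.ssql(type=update)
SELECT sensor_id,
       TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
       AVG(reading) AS avg_reading
FROM sensor_readings
GROUP BY sensor_id, TUMBLE(event_time, INTERVAL '1' MINUTE);

As new records arrive on the stream, the per-window averages update in place in the notebook's result view.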
Getting started
Amazon Managed Service for Apache Flink elastically scales your application to accommodate the data throughput of your source stream and your query complexity for most scenarios. For detailed information on service limits for Apache Flink applications, visit the Limits section in the Amazon Managed Service for Apache Flink Developer Guide.
Q: Does Amazon Managed Service for Apache Flink integrate with the AWS Glue Schema Registry?
Yes. By using Apache Flink DataStream connectors, Amazon Managed Service for Apache Flink applications can use AWS Glue Schema Registry, a serverless feature of AWS Glue. You can integrate Apache Kafka, Amazon MSK, and Amazon Kinesis Data Streams, as a sink or a source, with your Amazon Managed Service for Apache Flink workloads. Visit the AWS Glue Schema Registry Developer Guide to get started and learn more.
Key concepts
- Input: Input is the streaming source for your application. In the input configuration, you map the streaming sources to data streams. Data flows from your data sources into your data streams. You process data from these data streams using your application code, sending processed data to subsequent data streams or destinations. You add inputs inside application code for Apache Flink applications and Studio notebooks, and through the API for Amazon Managed Service for Apache Flink applications.
- Application code: Application code is a series of Apache Flink operators that process input and produce output. In its simplest form, application code can be a single Apache Flink operator that reads from a data stream associated with a streaming source and writes to another data stream associated with an output. For a Studio notebook, this could be a simple Flink SQL select query, with the results shown in context within the notebook. You can write Apache Flink code in its supported languages for Amazon Managed Service for Apache Flink applications or Studio notebooks.
- Output: You can optionally configure an application output to persist data to an external destination. You add these outputs inside application code for Amazon Managed Service for Apache Flink applications and Studio notebooks.
Q: What application code is supported?
Amazon Managed Service for Apache Flink supports applications built using Java, Scala, and Python with the Apache Flink APIs, and SQL through Studio notebooks.
Managing applications
Q: How do I monitor the operations and performance of my Amazon Managed Service for Apache Flink applications?
For more information, see:
- Monitoring Amazon Managed Service for Apache Flink in the Amazon Managed Service for Apache Flink Developer Guide.
- Monitoring Amazon Managed Service for Apache Flink in the Amazon Managed Service for Apache Flink Studio Developer Guide.
Q: How do I manage and control access to my Amazon Managed Service for Apache Flink applications?
For more information, see:
- Granting permissions in the Amazon Managed Service for Apache Flink Developer Guide.
- Granting permissions in the Amazon Managed Service for Apache Flink Studio Developer Guide.
Q: How does Amazon Managed Service for Apache Flink scale my application?
Amazon Managed Service for Apache Flink elastically scales your application to accommodate the data throughput of your source stream and your query complexity for most scenarios. Amazon Managed Service for Apache Flink provisions capacity in the form of Amazon KPUs (Kinesis Processing Units). One KPU provides you with 1 vCPU and 4 GB of memory.
Pricing and billing
You are charged an hourly rate based on the number of Amazon KPUs used to run your streaming application. A single KPU is a unit of stream processing capacity consisting of 1 vCPU of compute and 4 GB of memory. Amazon Managed Service for Apache Flink automatically scales the number of KPUs required by your stream processing application as memory and compute demands vary in response to processing complexity and the throughput of the streaming data processed.
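As a rough, illustrative calculation only (the per-KPU rate below is an assumption; rates vary by Region, so see the pricing page for actual figures): an application that runs steadily at 4 KPUs for a 30-day month (720 hours) at an assumed $0.11 per KPU-hour would cost 4 × 720 × $0.11 = $316.80, before any additional charges such as running application storage.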
Building Apache Flink applications
Authoring application code for applications using Apache Flink in your IDE
// Illustrative sketch: GameEvent, UserPerLevel, KinesisStreamSource, and
// KinesisStreamSink are placeholders for your own event classes and the
// Kinesis connector's source and sink.
DataStream<GameEvent> rawEvents = env.addSource(
    new KinesisStreamSource("input_events"));

// Extract the game, level, and user from each raw event.
DataStream<UserPerLevel> gameStream =
    rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId,
        event.gameMetadata.levelId, event.userId));

// Aggregate per game over one-minute tumbling windows.
gameStream.keyBy(event -> event.gameId)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    .apply(...);

// Write the processed stream to a Kinesis data stream.
gameStream.addSink(new KinesisStreamSink("myGameStateStream"));
You can build custom operators if these operators do not meet your needs, as in the sketch below. Find more examples in the Operators section of the Amazon Managed Service for Apache Flink Developer Guide. You can find a full list of Apache Flink operators in the Apache Flink documentation.
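As a minimal, hedged sketch of a custom operator, here is a user-defined ProcessFunction that forwards only events above a threshold; the SensorReading type and the threshold value are hypothetical:

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical event type for this sketch.
class SensorReading {
    String sensorId;
    double value;
}

// A custom operator: forwards only readings above a fixed threshold,
// tagging each with the current processing time.
class ThresholdFilter extends ProcessFunction<SensorReading, String> {
    private static final double THRESHOLD = 100.0;

    @Override
    public void processElement(SensorReading reading, Context ctx,
                               Collector<String> out) {
        if (reading.value > THRESHOLD) {
            out.collect(reading.sensorId + " exceeded threshold at "
                + ctx.timerService().currentProcessingTime());
        }
    }
}

You would attach it to a stream with stream.process(new ThresholdFilter()).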
Q: What integrations are supported in an Amazon Managed Service for Apache Flink application?
- Streaming data sources: Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams
- Destinations, or sinks: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon DynamoDB, Amazon Elasticsearch Service, and Amazon S3 (through file sink integrations)
Q: Can Amazon Managed Service for Apache Flink applications replicate data across streams and topics?
Yes. You can use Apache Flink connectors in your application to replicate data between sources and destinations such as Amazon Kinesis Data Streams and Amazon MSK topics.
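For example, here is a minimal sketch that mirrors records from a Kinesis data stream to an MSK topic using the open-source Flink connectors; the stream name, Region, broker addresses, and topic are hypothetical:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

public class ReplicateKinesisToKafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: a Kinesis data stream, read as plain strings.
        Properties consumerConfig = new Properties();
        consumerConfig.setProperty("aws.region", "us-east-1");
        DataStream<String> records = env.addSource(
            new FlinkKinesisConsumer<>("input_stream",
                new SimpleStringSchema(), consumerConfig));

        // Sink: an Apache Kafka / Amazon MSK topic.
        KafkaSink<String> sink = KafkaSink.<String>builder()
            .setBootstrapServers("broker-1:9092,broker-2:9092")
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("replicated_events")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
            .build();

        records.sinkTo(sink);
        env.execute("kinesis-to-msk-replication");
    }
}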
Building Amazon Managed Service for Apache Flink Studio applications in a managed notebook
Q: How do I develop a Studio application?
You can launch a serverless notebook in a few steps from the Amazon Managed Service for Apache Flink Studio, Amazon Kinesis Data Streams, or Amazon MSK console to immediately query data streams and perform interactive data analytics.
Interactive data analytics: You can write code in the notebook in SQL, Python, or Scala to interact with your streaming data, with query response times in seconds. You can use built-in visualizations to explore the data, view real-time insights on your streaming data from within your notebook, and develop stream processing applications powered by Apache Flink.
Once your code is ready to run as a production application, you can transition with a single step to a stream processing application that processes gigabytes of data per second, without servers.
Stream processing application: Once you are ready to promote your code to production, you can build your code by clicking “Deploy as stream processing application” in the notebook interface or issue a single command in the CLI. Studio takes care of all the infrastructure management necessary for you to run your stream processing application at scale, with auto scaling and durable state enabled, just as in an Amazon Managed Service for Apache Flink application.
Q: What does my application code look like?
You can write code in the notebook in your preferred language of SQL, Python, or Scala using Apache Flink's Table API. The Table API is a high-level abstraction and relational API that supports a superset of SQL's capabilities. It offers familiar operations, such as select, filter, join, group by, aggregate, and so on, along with stream-specific concepts, such as windowing. You use a % interpreter directive to specify the language for a section of the notebook and can switch between languages from section to section. Interpreters are Apache Zeppelin plugins, so you can specify a language or data processing engine for each section of the notebook. You can also build user-defined functions and reference them to improve code functionality.
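As a rough illustration of the Table API's relational style, here is a sketch in Java (in a Studio notebook you would write the equivalent in SQL, Python, or Scala); the orders table and its columns are hypothetical:

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TableApiSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical source table; in practice this would map to a
        // Kinesis data stream or an MSK topic rather than generated data.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  customer_id STRING," +
            "  amount DOUBLE" +
            ") WITH ('connector' = 'datagen', 'number-of-rows' = '1000')");

        // select / filter / group by / aggregate expressed with the Table API.
        Table totals = tEnv.from("orders")
            .filter($("amount").isGreater(10.0))
            .groupBy($("customer_id"))
            .select($("customer_id"), $("amount").sum().as("total_amount"));

        totals.execute().print();
    }
}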
Q: What SQL operations are supported?
You can perform SQL operations such as the following:
- Scan and filter (SELECT, WHERE)
- Aggregations (GROUP BY, GROUP BY WINDOW, HAVING)
- Set (UNION, UNION ALL, INTERSECT, IN, EXISTS)
- Order (ORDER BY, LIMIT)
- Joins (INNER, OUTER, Timed Window – BETWEEN, AND, Joining with Temporal Tables – tables that track changes over time)
- Top-N
- Deduplication
- Pattern recognition
Some of these queries, such as GROUP BY, OUTER JOIN, and Top-N, produce continuously updating results on streaming data, meaning that the results update as new streaming data is processed. DDL statements, such as CREATE, ALTER, and DROP, are also supported. For a complete list of queries and samples, see the Apache Flink Queries documentation.
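For instance, a continuously updating Top-N query can be written with ROW_NUMBER() in a notebook section, as in this sketch; the scores table, its columns, and the choice of top three are hypothetical:

%flink.ssql(type=update)
SELECT game_id, user_id, score
FROM (
  SELECT game_id, user_id, score,
         ROW_NUMBER() OVER (
           PARTITION BY game_id ORDER BY score DESC) AS row_num
  FROM scores)
WHERE row_num <= 3;

As higher scores arrive, rows enter and leave the result, which is what makes the query results updating.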
Q: How are Python and Scala supported?
Apache Flink's Table API supports Python and Scala through language integration using Python strings and Scala expressions. The operations supported are very similar to the SQL operations supported, including select, order, group, join, filter, and windowing. A full list of operations and samples is included in our developer guide.
Q: What versions of Apache Flink and Apache Zeppelin are supported?
To learn more about supported Apache Flink versions, visit the Amazon Managed Service for Apache Flink Release Notes page. This page also includes the versions of Apache Zeppelin, Apache Beam, Java, Scala, Python, and AWS SDKs that Amazon Managed Service for Apache Flink supports.
Q: What integrations are supported by default in an Amazon Managed Service for Apache Flink Studio application?
- Data sources: Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Amazon S3
- Destinations, or sinks: Amazon MSK, Amazon Kinesis Data Streams, and Amazon S3
Q: Are custom integrations supported?
You can configure additional integrations with a few more steps and lines of Apache Flink code (Python, Scala, or Java) to define connections with all Apache Flink-supported integrations. This includes destinations such as Amazon OpenSearch Service, Amazon ElastiCache for Redis, Amazon Aurora, Amazon Redshift, Amazon DynamoDB, Amazon Keyspaces, and more. You can attach executables for these custom connectors when you create or configure your Amazon Managed Service for Apache Flink Studio application.
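For example, after attaching the Apache Flink JDBC connector JAR to your Studio application, you could define a sink to an Aurora MySQL-compatible table with DDL along these lines; the endpoint, database, table, and credentials are placeholders:

%flink.ssql
CREATE TABLE aurora_sink (
  customer_id STRING,
  total_amount DOUBLE,
  PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://my-aurora-endpoint:3306/mydb',
  'table-name' = 'customer_totals',
  'username' = '...',
  'password' = '...'
);

An INSERT INTO aurora_sink SELECT ... statement then streams results into the table continuously.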
Service Level Agreement
Q: What does the Amazon Managed Service for Apache Flink SLA guarantee?
Our service level agreement (SLA) guarantees a Monthly Uptime Percentage of at least 99.9% for Amazon Managed Service for Apache Flink.
Q: How do I know if I qualify for an SLA Service Credit?
You are eligible for an SLA Service Credit for Amazon Managed Service for Apache Flink under the Amazon Managed Service for Apache Flink SLA if more than one Availability Zone in which you are running a task, within the same AWS Region, has a Monthly Uptime Percentage of less than 99.9% during any monthly billing cycle. For full details on all the SLA terms and conditions as well as details on how to submit a claim, visit the Amazon Managed Service for Apache Flink SLA details page.
Get started with Amazon Kinesis Data Analytics
Visit the Amazon Kinesis Data Analytics pricing page.
Learn how to use Amazon Kinesis Data Analytics in the step-by-step guide for SQL or Apache Flink.
Build your first streaming application from the Amazon Kinesis Data Analytics console.