Control the evolution of data streams using the AWS Glue Schema Registry

Posted on: Nov 19, 2020

AWS Glue Schema Registry, a serverless feature of AWS Glue, enables you to validate and control the evolution of streaming data using registered Apache Avro schemas, at no additional charge. Through Apache-licensed serializers and deserializers, the Schema Registry integrates with Java applications developed for Apache Kafka/Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Streams, Apache Flink/Amazon Kinesis Data Analytics for Apache Flink, and AWS Lambda.

Schemas define the structure and format of data records (also known as events) produced by applications. For example, a schema may define a group of fields, such as an event timestamp, customer ID, email address, and a unique identifier for an action taken on a webpage. When data-producing applications add or remove fields from a schema (for example, the email address is removed), data quality may be compromised and downstream applications can fail. To prevent these issues, developers often write defensive code within their applications, coordinate schema changes between upstream and downstream teams using maintenance windows, or use third-party schema registries that work with only a single technology.
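To make the example concrete, here is a minimal sketch of what such a schema could look like in Apache Avro's JSON format, written as a Python dictionary. The record name, namespace, field names, and sample values are all hypothetical and chosen only to mirror the fields listed above. The helper shows how a record that silently drops the email address no longer matches the schema.

```python
import json

# Hypothetical Avro record schema for a webpage click event.
# All names below are illustrative assumptions, not taken from AWS documentation.
CLICK_EVENT_V1 = {
    "type": "record",
    "name": "ClickEvent",
    "namespace": "com.example.events",
    "fields": [
        {"name": "event_timestamp", "type": "long"},
        {"name": "customer_id", "type": "string"},
        {"name": "email_address", "type": "string"},
        {"name": "action_id", "type": "string"},
    ],
}


def field_names(schema):
    """Return the field names of an Avro record schema, in declaration order."""
    return [field["name"] for field in schema["fields"]]


def missing_fields(record, schema):
    """Return the schema fields that are absent from a record."""
    return [name for name in field_names(schema) if name not in record]


# A producer that stopped sending the email address (made-up sample values).
record_without_email = {
    "event_timestamp": 1605744000000,
    "customer_id": "c-123",
    "action_id": "a-456",
}

if __name__ == "__main__":
    print(json.dumps(CLICK_EVENT_V1, indent=2))
    print(missing_fields(record_without_email, CLICK_EVENT_V1))
```

A downstream application that reads `email_address` unconditionally would fail on such a record, which is the kind of breakage described above.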

With the Schema Registry, you can eliminate defensive coding and cross-team coordination, improve data quality, reduce downstream application failures, and use a registry that’s integrated across multiple AWS services. When data streaming applications are integrated with the Schema Registry, schemas used for data production are validated against schemas within a central registry, allowing you to centrally control data quality. Each schema can be versioned within the guardrails of a compatibility mode, giving developers the flexibility to control schema evolution. Today you can use the Schema Registry with applications built for Apache Kafka/Amazon MSK and Amazon Kinesis Data Streams, or you can use its APIs to build your own integration. Over time, we plan to integrate the Schema Registry with other AWS services and open-source frameworks, and expand support for non-Avro data formats and non-Java clients.
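The idea of versioning within a compatibility mode can be illustrated with a simplified sketch. The code below is not the Schema Registry's implementation. It assumes two common compatibility notions, backward (consumers on the new schema can read data written with the previous one) and forward (consumers on the previous schema can read data written with the new one). It compares only field names and defaults, ignoring Avro type promotion, aliases, and unions. The schemas and field names are hypothetical.

```python
def _fields(schema):
    """Map field name -> field definition for an Avro record schema."""
    return {field["name"]: field for field in schema["fields"]}


def can_read(reader, writer):
    """Simplified check: every reader field must exist in the writer schema
    or declare a default value. Field types are not compared."""
    written = _fields(writer)
    return all(
        name in written or "default" in field
        for name, field in _fields(reader).items()
    )


def is_backward_compatible(new, old):
    """Consumers using the new schema can read data produced with the old one."""
    return can_read(reader=new, writer=old)


def is_forward_compatible(new, old):
    """Consumers using the old schema can read data produced with the new one."""
    return can_read(reader=old, writer=new)


def record(*fields):
    """Build a hypothetical Avro record schema from field definitions."""
    return {"type": "record", "name": "ClickEvent", "fields": list(fields)}


TS = {"name": "event_timestamp", "type": "long"}
CUSTOMER = {"name": "customer_id", "type": "string"}
EMAIL = {"name": "email_address", "type": "string"}
REFERRER_OPTIONAL = {"name": "referrer", "type": ["null", "string"], "default": None}
REFERRER_REQUIRED = {"name": "referrer", "type": "string"}

V1 = record(TS, CUSTOMER, EMAIL)
V2_DROPS_EMAIL = record(TS, CUSTOMER)
V2_ADDS_OPTIONAL = record(TS, CUSTOMER, EMAIL, REFERRER_OPTIONAL)
V2_ADDS_REQUIRED = record(TS, CUSTOMER, EMAIL, REFERRER_REQUIRED)

if __name__ == "__main__":
    for label, candidate in [
        ("drop email_address", V2_DROPS_EMAIL),
        ("add optional referrer", V2_ADDS_OPTIONAL),
        ("add required referrer", V2_ADDS_REQUIRED),
    ]:
        print(
            label,
            "backward:", is_backward_compatible(candidate, V1),
            "forward:", is_forward_compatible(candidate, V1),
        )
```

Under these assumptions, removing a field without a default passes the backward check but fails the forward check, because existing consumers still expect that field. A registry enforcing a forward-style mode would reject that version before any producer could use it.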

Visit the Schema Registry user documentation to get started and to learn more.  

The Schema Registry is available in the following AWS regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), South America (São Paulo), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), and Europe (Stockholm).