Posted On: Dec 9, 2020
Amazon Redshift, a fully managed cloud data warehouse, announces the preview of native support for JSON and semi-structured data. The support is based on the new data type ‘SUPER’, which lets you store semi-structured data in Redshift tables. Redshift also adds support for the PartiQL query language so that you can query and process that data seamlessly. Together, these features let you run advanced analytics that combine classic structured SQL data (such as strings, numerics, and timestamps) with semi-structured SUPER data, with superior performance, flexibility, and ease of use.
The generic data type SUPER is schemaless and can store nested values made up of Redshift scalar values, nested arrays, or other nested structures. Amazon Redshift supports parsing JSON data into SUPER, and inserting JSON/SUPER data is up to 5x faster than inserting similar data into classic scalar columns.
PartiQL is an extension of SQL that has been adopted across multiple AWS services. It gives you access to schemaless, nested SUPER data through efficient object and array navigation and unnesting, and it lets you flexibly compose those operations with classic analytic operations such as JOINs and aggregates. This enables new advanced analytics through ad hoc queries that discover combinations of structured and semi-structured data.
Furthermore, data engineers can achieve simplified, low-latency ELT (Extract, Load, Transform) processing of the inserted semi-structured data directly in their Redshift cluster, without integrating with external services. The PartiQL features that facilitate ELT include schemaless semantics, dynamic typing, and type introspection, in addition to navigation and unnesting. You can shred the semi-structured data by creating materialized views, which can make analytical queries orders of magnitude faster while Redshift keeps the materialized views automatically and incrementally maintained.
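The workflow described above can be sketched in Redshift SQL as follows. This is an illustrative sketch, not an official example: the table, column, and JSON field names (customer_events, payload, orders, sku, qty, and so on) are hypothetical, and the function names follow the Redshift documentation for SUPER, so behavior in the preview release may differ.

```sql
-- Hypothetical table: two classic scalar columns plus one SUPER column.
CREATE TABLE customer_events (
    event_id   INT,
    created_at TIMESTAMP,
    payload    SUPER
);

-- Parse a JSON document into SUPER at insert time.
INSERT INTO customer_events VALUES (
    1,
    '2020-12-09 10:00:00',
    JSON_PARSE('{"customer": {"id": 42, "tier": "gold"},
                 "orders": [{"sku": "A-100", "qty": 2},
                            {"sku": "B-200", "qty": 1}]}')
);

-- Object and array navigation with dot and bracket notation.
SELECT e.event_id,
       e.payload.customer.tier,
       e.payload.orders[0].sku
FROM customer_events e;

-- Type introspection: report the JSON type of a nested value.
SELECT JSON_TYPEOF(e.payload.orders)
FROM customer_events e;

-- Unnesting: iterate over the nested array in the FROM clause,
-- then combine with a classic aggregate.
SELECT e.event_id,
       SUM(o.qty::INT) AS total_qty
FROM customer_events e, e.payload.orders o
GROUP BY e.event_id;

-- Shredding: a materialized view that flattens the nested data
-- into conventional typed columns.
CREATE MATERIALIZED VIEW order_lines AS
SELECT e.event_id,
       e.payload.customer.id::INT    AS customer_id,
       o.sku::VARCHAR                AS sku,
       o.qty::INT                    AS qty
FROM customer_events e, e.payload.orders o;
```

The explicit casts (`::INT`, `::VARCHAR`) turn dynamically typed SUPER values into ordinary scalar columns, which is what lets downstream queries against the materialized view treat the shredded data like any other table.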