AWS Big Data Blog
Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless
Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and Piped Processing Language (PPL) and Structured Query Language (SQL), which give you new ways to query your data. Querying with SQL or PPL is useful if you’re already familiar with these languages or want to integrate your collection with an application that uses them.
OpenSearch Serverless is a scalable search and analytics engine that lets you store, search, and analyze large volumes of data without manually provisioning and scaling infrastructure. You can ingest, analyze, and visualize your time series and search data, which simplifies data management and helps you derive actionable insights. The vector engine for OpenSearch Serverless also makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (generative AI) applications without needing to manage the underlying vector database infrastructure.
PIT search
Point in Time (PIT) search lets you run different queries against a dataset that’s fixed in time. Typically, when you run the same query on the same index at different points in time, you receive different results because documents are constantly indexed, updated, and deleted. With PIT, you can query the state of your dataset as it was at a specific point in time. Although OpenSearch still supports other ways of paginating results, PIT search provides superior capabilities and performance because it isn’t bound to a query and supports consistent pagination. When you create a PIT for a set of indexes, OpenSearch creates contexts to access data at that point in time. When you then run a query with a PIT ID, OpenSearch searches those frozen contexts to provide consistent results.
Using PIT involves the following high-level steps:
- Create a PIT.
- Run search queries with a PIT ID and use the search_after parameter for the next page of results.
- Close the PIT.
Create a PIT
When you create a PIT, OpenSearch Serverless provides a PIT ID, which you can use to run multiple queries on the frozen dataset. Even though the indexes continue to ingest data and modify or delete documents, the PIT keeps referencing the data as it existed when the PIT was created.
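As a minimal sketch of the create step, the following Python builds the request rather than sending it. It assumes the standard OpenSearch PIT endpoint (POST to `/<index>/_search/point_in_time` with a `keep_alive` query parameter) and a response field named `pit_id`. The helper names and the index name `my-index-1` are hypothetical. Sending the request to a collection additionally needs a SigV4-signed HTTP call with a client of your choice, which is left out here.

```python
from urllib.parse import quote, urlencode


def build_create_pit_request(indexes, keep_alive="10m"):
    """Describe the HTTP request that creates a PIT for the given indexes.

    Assumption: the collection exposes the standard OpenSearch PIT endpoint.
    """
    # Multiple indexes are passed as a comma-separated list in the path.
    target = ",".join(quote(name, safe="*") for name in indexes)
    return {
        "method": "POST",
        "path": f"/{target}/_search/point_in_time",
        "query": urlencode({"keep_alive": keep_alive}),
    }


def extract_pit_id(response_json):
    """Return the PIT ID from a parsed create-PIT response body."""
    return response_json["pit_id"]


# Hypothetical index name, used only for illustration.
request = build_create_pit_request(["my-index-1"], keep_alive="100m")
print(request["method"], request["path"] + "?" + request["query"])
```

The PIT ID returned by the service is opaque, so the sketch only extracts it and passes it along to later search requests.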
Run a search query with the PIT ID
PIT search isn’t bound to a query, so you can run different queries on the same dataset, which is frozen in time.
When you run a query with a PIT ID, you can use the search_after parameter to retrieve the next page of results. This gives you control over the order of documents in the pages of results.
Suppose the first response contains the first 100 documents that match the query. To get the next set of documents, run the same query with the last document’s sort values as the search_after parameter, keeping the same sort and pit.id. You can use the optional keep_alive parameter to extend the PIT time.
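The paging logic described above can be sketched in Python as follows. This assumes the standard OpenSearch search body shape (`size`, `query`, `pit`, `sort`, `search_after`) and a response with hits under `hits.hits`, each carrying a `sort` array. The function names, the `timestamp` sort field, and the in-memory `fake_search` stand-in are hypothetical; in practice `search` would be a callable that sends the body to the `_search` endpoint without an index in the path, because the PIT already identifies the indexes.

```python
import copy


def first_page_body(pit_id, query, sort, size=100, keep_alive="100m"):
    """Build the search body for the first page of a PIT search."""
    return {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }


def next_page_body(previous_body, previous_response):
    """Reuse the same query, sort, and PIT, adding search_after from the last hit."""
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    body = copy.deepcopy(previous_body)
    body["search_after"] = hits[-1]["sort"]
    return body


def iterate_pages(search, body):
    """Yield pages of hits until the search returns an empty page."""
    while body is not None:
        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            break
        yield hits
        body = next_page_body(body, response)


# In-memory stand-in for the search call, for illustration only.
DOCS = [{"_id": str(i), "sort": [i]} for i in range(250)]


def fake_search(body):
    after = body.get("search_after", [-1])[0]
    remaining = [doc for doc in DOCS if doc["sort"][0] > after]
    return {"hits": {"hits": remaining[: body["size"]]}}


body = first_page_body("pit-123", {"match_all": {}}, [{"timestamp": "asc"}])
pages = list(iterate_pages(fake_search, body))
page_sizes = [len(page) for page in pages]
print(page_sizes)
```

Copying the previous body before adding `search_after` keeps the sort and PIT settings identical between pages, which is what makes the pagination consistent.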
Close the PIT
When your queries on the dataset are complete, you can delete the PIT using the DELETE operation. If you don’t delete it, the PIT expires automatically after the keep_alive duration.
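A minimal sketch of the close step, assuming the standard OpenSearch delete-PIT request (DELETE to `/_search/point_in_time` with a JSON body that lists the IDs under `pit_id`). The helper name and the placeholder ID `pit-123` are hypothetical, and the sketch only builds the request.

```python
import json


def build_delete_pit_request(pit_ids):
    """Describe the HTTP request that closes one or more PITs.

    Assumption: the body lists the PIT IDs under the "pit_id" key.
    """
    return {
        "method": "DELETE",
        "path": "/_search/point_in_time",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"pit_id": list(pit_ids)}),
    }


# Placeholder ID; a real PIT ID comes from the create-PIT response.
delete_request = build_delete_pit_request(["pit-123"])
print(delete_request["method"], delete_request["path"], delete_request["body"])
```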
Considerations and limitations
Keep in mind the following limitations when using this feature:
- Search slicing is not supported in OpenSearch Serverless
- PIT list segment is not supported
- The total number of open PITs is restricted to 300 for collections that share the same AWS Key Management Service (AWS KMS) key
SQL and PPL support
OpenSearch Serverless provides a primary query interface called query DSL that you can use to search your data. Query DSL is a flexible language with a JSON interface. In addition to DSL, you can now extract insights out of OpenSearch Serverless using the familiar SQL query syntax.
You can use the SQL and PPL API, through the /_plugins/_sql and /_plugins/_ppl endpoints respectively, to search the data. You can use aggregations, group by, and where clauses to investigate your data and read your data as JSON documents or CSV tables, so you have the flexibility to use the format that works best for you. By default, queries return data in JDBC format. You can specify the response format as JDBC, standard OpenSearch JSON, CSV, or raw.
Use the /_plugins/_sql endpoint to send SQL queries to the SQL plugin.
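As a hedged illustration, the following Python builds a SQL request with an explicit response format. It assumes the OpenSearch SQL plugin conventions: the path is written with a leading underscore (`/_plugins/_sql`), the query goes in a JSON body under `query`, and the response format is chosen with a `format` query parameter. The helper name, the `sales` index, and its `category` and `price` fields are hypothetical.

```python
import json
from urllib.parse import urlencode

# Response formats named in this post.
SUPPORTED_FORMATS = ("jdbc", "json", "csv", "raw")


def build_sql_request(sql, response_format="jdbc"):
    """Describe the HTTP request that sends a SQL query to the SQL plugin."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    return {
        "method": "POST",
        "path": "/_plugins/_sql",
        "query": urlencode({"format": response_format}),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"query": sql}),
    }


# Hypothetical index and fields, used only for illustration.
sql_request = build_sql_request(
    "SELECT category, COUNT(*) AS orders FROM sales "
    "WHERE price > 10 GROUP BY category",
    response_format="csv",
)
print(sql_request["path"] + "?" + sql_request["query"])
```

Validating the format on the client side is a design choice of this sketch, so that a typo fails early instead of producing a server-side error.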
Besides basic filtering and aggregation, OpenSearch SQL also supports complex queries, such as querying semi-structured data, set operations, sub-queries and limited JOINs. Beyond the standard functions, OpenSearch functions are provided for better analytics and visualization.
For PPL queries, use the /_plugins/_ppl endpoint to send queries to the SQL plugin.
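A PPL query is a pipeline of commands separated by the pipe character, which the following sketch makes explicit by joining stages. It assumes the OpenSearch convention of posting a JSON body with a `query` field to `/_plugins/_ppl` (written with a leading underscore). The helper name, the `sales` index, and its fields are hypothetical.

```python
import json


def build_ppl_request(stages):
    """Join PPL pipeline stages with pipes and describe the HTTP request."""
    ppl = " | ".join(stage.strip() for stage in stages)
    return {
        "method": "POST",
        "path": "/_plugins/_ppl",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"query": ppl}),
    }


# Hypothetical index and fields: filter rows, then count per category.
ppl_request = build_ppl_request([
    "source=sales",
    "where price > 10",
    "stats count() by category",
])
print(json.loads(ppl_request["body"])["query"])
```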
Considerations and limitations
Keep in mind the following:
- Query Workbench is not supported for SQL and PPL queries
- The SQL and PPL CLI is supported and can be used to issue SQL and PPL queries
- DELETE statements are not supported
- SQL plugin data sources are not supported
- The SQL query stats API is not supported
Summary
In this post, we discussed new features in OpenSearch Serverless. PIT is a useful feature when you need to maintain a consistent view of your data for pagination during search operations. SQL in OpenSearch Serverless bridges the gap between traditional relational database concepts and the flexibility of OpenSearch’s document-oriented data storage. You can send SQL and PPL queries to the _sql and _ppl endpoints, respectively, and use aggregations, group by, and where clauses to analyze your data.
For more information, refer to:
- Point in Time queries in Amazon OpenSearch Serverless
- SQL and PPL support in Amazon OpenSearch Serverless
About the Authors
Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.
Frank Dattalo is a Software Engineer with Amazon OpenSearch Service. He focuses on the search and plugin experience in Amazon OpenSearch Serverless. He has an extensive background in search, data ingestion, and AI/ML. In his free time, he likes to explore Seattle’s coffee landscape.
Milav Shah is an Engineering Leader with Amazon OpenSearch Service. He focuses on the search experience for OpenSearch customers. He has extensive experience building highly scalable solutions in databases, real-time streaming, and distributed computing. He also possesses functional domain expertise in verticals like Internet of Things, fraud protection, gaming, and ML/AI. In his free time, he likes to ride his bicycle, hike, and play chess.