AWS Storage Blog

Akridata accelerates processing of unstructured data with Amazon S3 Express One Zone

Deep learning processes often need to read full datasets, which are usually hundreds of gigabytes in size, before they can perform intelligent data processing. High data retrieval speed and low latency from storage are crucial for enterprises running these performance-critical workloads.

Akridata, an AWS independent software vendor (ISV) partner, helps make artificial intelligence (AI)-assisted unstructured-data exploration a reality by providing large-scale data curation, exploration, and analysis capabilities on unlabeled video and image datasets.

In this post, we describe how Akridata Data Explorer works through an example of how it helps autonomous vehicle creators unlock the true potential of their raw visual datasets. We then discuss how customers can reduce processing times by an average of 3.5x with Akridata Data Explorer by storing data in the Amazon S3 Express One Zone storage class, which can improve data access speeds by up to 10x and reduce request costs by 50% compared to S3 Standard.

Akridata Data Explorer: Automatic processing of unlabeled visual data

Akridata Data Explorer is a software as a service (SaaS) solution built entirely on AWS because of AWS’s proven operational experience, broad range of native services, and reach to potential customers around the globe. Akridata Data Explorer is a flexible solution that runs on Amazon Elastic Kubernetes Service (Amazon EKS). It provides a no-code environment for end customers to understand large amounts of unstructured image or video data stored in different Amazon S3 object storage classes, without manual labeling.

The following image shows the workflow of Akridata Data Explorer:

  1. Customers first upload visual datasets into Amazon S3 (in this case, the Amazon S3 Express One Zone storage class).
  2. Customers sign in to the Akridata Data Explorer SaaS service.
  3. Customers create a data processing pipeline by selecting the desired data model from Amazon Elastic Container Registry (Amazon ECR).
  4. Akridata Data Explorer starts the deep learning process by reading data from Amazon S3 Express One Zone.
  5. Once the data catalog is created, it is stored in Amazon Aurora.
  6. Finally, customers can perform visualization operations such as searching and data analytics against their unstructured visual datasets.

Figure 1: Akridata Data Explorer workflow
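Step 1 of the workflow can be sketched in a few lines of Python. This is a minimal illustration, not Akridata's ingestion code. It assumes a boto3-style S3 client is passed in, and the bucket base name, zone ID, and key prefix are hypothetical. The one fixed detail is the directory bucket naming format used by S3 Express One Zone, `<base-name>--<zone-id>--x-s3`.

```python
import os
from typing import Iterator, Tuple


def directory_bucket_name(base_name: str, zone_id: str) -> str:
    """Build an S3 Express One Zone directory bucket name.

    Directory bucket names follow the format <base-name>--<zone-id>--x-s3.
    """
    return f"{base_name}--{zone_id}--x-s3"


def iter_upload_plan(local_dir: str, prefix: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, object_key) pairs for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Object keys always use forward slashes, regardless of the OS.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            yield path, f"{prefix.rstrip('/')}/{rel}"


def upload_dataset(s3_client, bucket: str, local_dir: str, prefix: str) -> int:
    """Upload every file under local_dir and return the number of uploads.

    s3_client is assumed to expose a boto3-style upload_file method.
    """
    count = 0
    for path, key in iter_upload_plan(local_dir, prefix):
        s3_client.upload_file(Filename=path, Bucket=bucket, Key=key)
        count += 1
    return count
```

With boto3 installed and credentials configured, a call might look like `upload_dataset(boto3.client("s3"), directory_bucket_name("av-raw", "usw2-az1"), "./drive-001", "drive-001")`. The bucket must already exist as a directory bucket in the chosen Availability Zone.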

Akridata Data Explorer with autonomous vehicle data collection

To demonstrate Akridata Data Explorer, let’s use autonomous vehicle data collection as an example. Every day, hours of videos and images are collected from each test vehicle driving around different countries, producing large amounts of unstructured data. All of these datasets need to be sanitized, cleaned, and tagged.

Through its various ready-to-use pipelines, Akridata Data Explorer can automatically tag or label a dataset using any foundation or standard model. In the following image, you can see the result of automatic tagging by Akridata Data Explorer using the Recognize Anything Model (RAM). Infrastructure tags such as building, parking lot, and road, and vehicle tags such as car, jeep, suv, and van, are all applied automatically.

Tags generated for this image: building, car, parking lot, drive, jeep, license plate, park, road, suv, van, vehicle, white

The following image demonstrates how Akridata Data Explorer can transform the entire dataset into clusters, grouping data based on the similarity of images. This visual approach offers a quick and intuitive way to grasp the big picture within large datasets, while also pinpointing outliers.

Figure 3: High-level view of the entire dataset grouped into clusters based on similarity, with ML-based auto labeling of images
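The post does not say which clustering algorithm Akridata Data Explorer uses. As a rough illustration of grouping images by similarity, the following sketch runs plain k-means over small, made-up embedding vectors. The function names, the 2-D embeddings, and the choice of k-means are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int,
           iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups.

    Returns (assignments, centroids). Centroids are initialized from the
    first k points so that the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

In practice the vectors would be high-dimensional embeddings produced by a vision model, and a point that sits far from every centroid would be a candidate outlier.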

The following image illustrates a user’s ability to search for visually similar images. For static image operations, Akridata Data Explorer can search up to 25 million images in a single query. This process generates heavy read operations and depends on storage subsystem performance to complete the workflow in the shortest amount of time. Amazon S3 scalability and performance play a critical role in the user experience.

On the left are three examples of images with green bounding boxes. Each example shows crosswalks and individuals holding umbrellas. In contrast, there’s a negative case shown with a red bounding box that lacks either of these features.

To find similar examples, select the thumbs up icon for the images with crosswalks and individuals holding umbrellas and then choose Quick Search. The results, shown on the right, show similar areas of interest in diverse scenarios. This feature simplifies the discovery of intriguing patterns and scenarios.

Figure 4: Searching the dataset using specific areas of interest from images to discover similar patterns
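One common way to implement a search driven by thumbs-up and thumbs-down examples is to rank images by embedding similarity to the positive examples and penalize similarity to the negative ones. The sketch below shows that idea under stated assumptions. It is not Akridata's implementation, and the embeddings, the `neg_weight` parameter, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def quick_search(index: Dict[str, Vector], positives: List[Vector],
                 negatives: List[Vector], top_k: int = 3,
                 neg_weight: float = 0.5) -> List[str]:
    """Rank image IDs by similarity to positives, penalizing negatives."""
    pos_centroid = mean_vector(positives)
    neg_centroid = mean_vector(negatives) if negatives else None
    scored = []
    for image_id, vec in index.items():
        score = cosine(vec, pos_centroid)
        if neg_centroid is not None:
            score -= neg_weight * cosine(vec, neg_centroid)
        scored.append((score, image_id))
    # Highest score first.
    scored.sort(reverse=True)
    return [image_id for _score, image_id in scored[:top_k]]
```

A linear scan like this is only workable for small collections. At the scale of millions of images, an approximate nearest-neighbor index would normally replace the loop, and the read pattern against storage becomes the dominant cost.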

Akridata Data Explorer can also search unlabeled video or image datasets using natural language queries. In the following image, Akridata Data Explorer presents the outcome of a search for “crosswalk with person holding an umbrella.” With the power of text-based search, insights from data are just a few steps away.

Figure 5: Searching the dataset using natural language to discover unseen insights

Accelerating unstructured data exploration with Amazon S3 Express One Zone

When using Akridata Data Explorer, the storage subsystem directly affects how long deep learning, discovery, tagging, and searching take. Amazon S3 Express One Zone is the fastest cloud object storage class, combining scalability with single-digit millisecond latency for data analytics use cases, so customers can experience our best performance yet.

Compared with using Amazon S3 Standard, Akridata Data Explorer pipeline execution and processing times are reduced by an average of 3.5x when data is stored on Amazon S3 Express One Zone. This results in much faster data ingestion, preparation for visual data exploration, and discovery. When customers view the original high-resolution image or video recording, the low latency and high throughput of Amazon S3 Express One Zone can significantly reduce data preparation time and provide a much better user experience. Customers can now analyze more data in less time while reducing operating cost, which translates into higher productivity.
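To make the 3.5x figure concrete, the following sketch converts a baseline pipeline duration into an estimated duration and into the number of runs that fit in a fixed time window. The 7-hour baseline and 24-hour window are hypothetical inputs, and 3.5x is the average reported in this post, so individual workloads may see different results.

```python
def estimated_duration(baseline_hours: float, speedup: float = 3.5) -> float:
    """Estimate pipeline duration after applying a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


def runs_per_window(window_hours: float, baseline_hours: float,
                    speedup: float = 3.5) -> int:
    """Count how many complete pipeline runs fit in a time window."""
    return int(window_hours // estimated_duration(baseline_hours, speedup))


# Hypothetical example: a pipeline that takes 7 hours on S3 Standard.
baseline = 7.0
faster = estimated_duration(baseline)                        # 2.0 hours
runs_standard = runs_per_window(24.0, baseline, speedup=1.0)  # 3 runs per day
runs_express = runs_per_window(24.0, baseline)                # 12 runs per day
```

Under these assumed numbers, the same 24-hour window goes from 3 complete runs to 12.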

Conclusion

In this post, we showed how Akridata Data Explorer solves data classification problems by automatically tagging datasets from autonomous vehicle data collection and by searching for specific content through images. By storing data on Amazon S3 Express One Zone, Akridata Data Explorer customers can reduce the time it takes for data preparation by an average of 3.5x.

To learn more, visit Akridata and Amazon S3 Express One Zone or contact your AWS account manager. Akridata Data Explorer is available in AWS Marketplace.

Kunal Vasavada

Kunal is head of solution development and platform engineering at Akridata. His experience spans hybrid data infrastructure, storage, systems engineering, full stack ML, and big and fast data platforms. He currently leads Platform Engineering and Customer Solutions at Akridata, delivering intelligent, open-architecture edge-to-cloud DataOps platforms for large-scale, distributed data-centric AI applications.

Eric Yuen

Eric Yuen is a Senior Partner Solutions Architect with AWS. He works closely with AWS Storage Partners building solutions, and helps customers design storage environments on AWS. Eric brings 20 years of industry experience working with different storage and data protection technologies.