AWS Partner Network (APN) Blog

How Deloitte’s Image Recognition Framework Leverages AWS Artificial Intelligence and Machine Learning

By Ritesh Pal, Director, Cloud Engineering – Deloitte India
By Shubhangi Soni, Manager, Cloud Engineering – Deloitte India
By Gautam Jha, Partner Solutions Architect, AISPL – AWS


With exponential increases in the amount of visual data such as images and videos being generated, image analytics is fast becoming a major business driver for many organizations. The image recognition market is estimated to grow to US $53 billion by 2025, offering a vast landscape for business applications.

Enterprises are looking to adopt image analytics to solve real-world business problems—identifying products, landmarks and brands, digital identity verification, workplace safety, and flagging inappropriate content, to name a few.

Amazon Web Services (AWS) provides high-level artificial intelligence (AI) and machine learning (ML) capabilities that organizations can integrate with their applications to perform image analytics. However, without the technical expertise in analytics and AI/ML, many find it challenging to implement such solutions in their environment.

Deloitte Touché Tohmatsu India LLP (Deloitte) is an AWS Premier Consulting Partner that has developed the Deloitte Image Recognition Framework, a cloud-based image recognition platform leveraging ML to automatically distinguish between two images. It helps customers of all skill levels deploy image recognition and analytics solutions and customize them to their business requirements.

This post describes the high-level architecture of the Deloitte Image Recognition Framework, and explores some of the key features it offers.

Functional Architecture

The Deloitte Image Recognition Framework is a fully managed image recognition platform that provides the foundation for delivering image recognition and analytics solutions.

The platform offers a pre-built, web-based user interface (UI), an ML-based image extraction framework, a predefined attribute catalogue to enhance image classification, and a ready-to-use ML-based analytics layer that can be customized based on user requirements. The platform is scalable and agile to meet real-time performance needs.


Figure 1 – Functional architecture.

The platform is built and deployed on AWS and has the following architectural layers:

  • Ingestion layer: This provides a web-based UI to upload images and associate tags with the respective images.
  • Extraction layer: This uses an image extraction model, which extracts multiple attributes of the uploaded images. These attributes include color, shape, text, material type, and logo. These attributes are used for developing use cases based on analytical models.
  • Analytics layer: This is responsible for providing data analysis and processing capabilities to generate business insights.
  • Reporting and visualization layer: This provides real-time reporting and dashboard generation.

Solution Overview

The following high-level architecture diagram outlines the flow of events across the four layers stated above, and showcases other peripheral services which ensure the solution remains operationally efficient, scalable, and secure.


Figure 2 – Detailed deployment architecture.

The platform provides a UI to upload images. Once uploaded, the images land in Amazon Simple Storage Service (Amazon S3) buckets, which are used as the image repository.

Once an image is copied to the S3 bucket, an event is generated that triggers the initial AWS Lambda function, which determines whether the file is a document (such as .doc or .pdf) or an image (such as .jpeg or .png).
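As a minimal sketch of what this first step might look like, the handler below reads the bucket and key from an S3 event notification and classifies each file by its extension. The function names and the extension sets are assumptions for illustration (the post only names .doc, .pdf, .jpeg, and .png); this is not the framework's actual code.

```python
import os
from urllib.parse import unquote_plus

# Illustrative extension sets; ".jpg" is an added assumption.
DOCUMENT_EXTENSIONS = {".doc", ".pdf"}
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def classify_key(key):
    """Return 'image', 'document', or 'unknown' based on the file extension."""
    ext = os.path.splitext(key)[1].lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    return "unknown"


def lambda_handler(event, context):
    """Classify every object named in an S3 event notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append({"bucket": bucket, "key": key, "type": classify_key(key)})
    return results
```

A real handler would then route each record to the matching extraction step rather than just returning the classification.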

Based on the file type, the following actions are taken:

  • For image files, the Lambda function calls Amazon Rekognition pre-built and custom models, which extract the image attributes and update the image's .csv file, which holds the complete set of metadata associated with that image.
  • For document files, another Lambda function calls the Amazon Textract service, which detects the text in the document and stores it in the .csv file.
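The two branches above can be sketched roughly as follows. The helper names are hypothetical, and each function takes an already-created client (for example, `boto3.client("rekognition")` or `boto3.client("textract")`) so the sketch stays independent of credentials. The choice of the synchronous `detect_document_text` call is an assumption; multi-page documents typically need Textract's asynchronous API instead.

```python
def extract_image_labels(rekognition, bucket, key, min_confidence=80.0):
    """Return (name, confidence) pairs from the pre-built label model."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


def extract_custom_labels(rekognition, project_version_arn, bucket, key,
                          min_confidence=80.0):
    """Return (name, confidence) pairs from a Custom Labels model version."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=project_version_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"])
            for label in response["CustomLabels"]]


def extract_document_lines(textract, bucket, key):
    """Return the lines of text Textract detects in a document stored in S3."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract also returns PAGE and WORD blocks; keep whole lines only.
    return [block["Text"] for block in response["Blocks"]
            if block["BlockType"] == "LINE"]
```

Passing the client in as a parameter also makes these helpers easy to exercise with a stub during testing.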

On successful execution, the final .csv file contains the relevant metadata for the image or document file and is uploaded to the metadata repository in S3.
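To make the metadata step concrete, here is a small sketch that serializes extracted attributes to CSV and writes the result to S3. The column names, the `metadata/` key prefix, and both function names are assumptions chosen for illustration; the post does not describe the actual .csv layout.

```python
import csv
import io


def build_metadata_csv(source_key, attributes):
    """Serialize (attribute, value, confidence) rows for one file as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source_key", "attribute", "value", "confidence"])
    for attribute, value, confidence in attributes:
        writer.writerow([source_key, attribute, value, f"{confidence:.2f}"])
    return buffer.getvalue()


def upload_metadata(s3, bucket, source_key, csv_text):
    """Write the CSV under a hypothetical 'metadata/' prefix and return its key."""
    metadata_key = f"metadata/{source_key}.csv"
    s3.put_object(
        Bucket=bucket,
        Key=metadata_key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return metadata_key
```

Here `s3` is expected to be an S3 client such as `boto3.client("s3")`. Using the `csv` module rather than string joins keeps values containing commas or quotes correctly escaped.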


Figure 3 – Data extract from the framework for an image and document.

Using Amazon Athena, the metadata repository can be queried on the basis of the image or document name and other attributes such as dates. Once the data is extracted using Athena, Amazon QuickSight is used for further visualization, dashboarding, and reporting.
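A lookup by file name might be started along these lines. The table name `image_metadata`, its columns, and the function name are hypothetical, and `athena` is assumed to be a client such as `boto3.client("athena")`. The sketch uses Athena's parameterized query support; as we understand it, string parameters are supplied as quoted SQL literals, which is worth verifying against the Athena documentation before relying on it.

```python
def start_metadata_query(athena, database, output_location, file_name):
    """Start an Athena query for one file's metadata and return the query ID."""
    # Assumption: string parameters are passed as SQL literals, so wrap the
    # value in single quotes and double any embedded single quote.
    literal = "'" + file_name.replace("'", "''") + "'"
    response = athena.start_query_execution(
        # 'image_metadata' and its columns are hypothetical names.
        QueryString=(
            "SELECT attribute, value, confidence "
            "FROM image_metadata WHERE source_key = ?"
        ),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
        ExecutionParameters=[literal],
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so a caller would poll `get_query_execution` with the returned ID before reading results.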

The solution adopts a serverless approach wherever possible while leveraging AWS managed services like AWS Lambda, Amazon Rekognition, and Amazon Textract. The entire solution is consumed in a pay-as-you-go model, enabling cost efficiency and flexibility for end users.

The interface is highly secure and uses Amazon Cognito for authenticating users. Additionally, if required, users can deploy AWS WAF as an additional security measure.


Figure 4 – Charts based on the various data extracts from the framework.

AWS Services

The following section contains a brief overview of the key services used for the benefit of users who may not be familiar with AWS.

  • Amazon Rekognition is at the heart of this framework. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases.
    This platform leverages both pre-built and custom models of Amazon Rekognition for the image classification. With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs.
  • Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. In this solution, Amazon Textract extracts text and data from the uploaded document files.
  • AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware scaling logic, maintaining event integrations, or managing runtimes.
    In this solution, when an image is uploaded to S3, an event is generated that triggers the cycle of Lambda functions. These are responsible for the internal communication between the ML models of the platform and S3.
  • Amazon Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. Athena is serverless, so there’s no infrastructure to manage, and you pay only for the queries you run. In this solution, Athena is used to query and analyze the metadata repository for further reporting.
  • Amazon QuickSight is used for reporting and generating dashboards.
  • Amazon S3 is used to maintain the image repository. Images are archived using the lifecycle management feature of S3.
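As an illustration of the S3 lifecycle archival mentioned in the last item, the sketch below builds a single transition rule and applies it to a bucket. The prefix, the number of days, the storage class, and the function names are all assumptions; the post does not state the framework's actual archival policy. `s3` is assumed to be a client such as `boto3.client("s3")`.

```python
def build_archive_rule(prefix, days_until_archive, storage_class="GLACIER"):
    """Build a lifecycle rule that moves objects under a prefix to archival storage."""
    rule_name = prefix.strip("/").replace("/", "-") or "all"
    return {
        "ID": f"archive-{rule_name}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_until_archive, "StorageClass": storage_class}
        ],
    }


def apply_archive_rule(s3, bucket, rule):
    """Apply the rule; note this replaces the bucket's existing lifecycle rules."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )
```

For example, `build_archive_rule("images/", 90)` describes a rule that transitions objects under `images/` after 90 days.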

Platform Reliability

AWS has natively designed its services in a way that helps customers create a highly reliable architecture. Services are replicated across AWS Availability Zones (AZs), and in most cases failover happens seamlessly without a customer even knowing about it.

Though this involves a number of design-level considerations and complex implementation and maintenance, AWS does all of this on behalf of customers. You don’t need to do the undifferentiated heavy lifting to achieve higher reliability.

For example, Deloitte has chosen Amazon S3 to implement the image repository because it’s a highly reliable, scalable, and low-cost storage solution. With versioning, lost images can be restored from a previous version.
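One way such a restore could work on a versioned bucket is to copy the most recent non-current version back on top of the object. This is a sketch under assumptions: the function name is hypothetical, `s3` is a client such as `boto3.client("s3")`, and pagination of the version listing is ignored for brevity.

```python
def restore_previous_version(s3, bucket, key):
    """Copy the newest non-current version of an object back as the current one.

    Returns the restored version ID, or None if no earlier version exists.
    """
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys, so filter to the exact key.
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    previous = next((v for v in versions if not v["IsLatest"]), None)
    if previous is None:
        return None
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key,
                    "VersionId": previous["VersionId"]},
    )
    return previous["VersionId"]
```

Because the copy creates a new current version, the history is preserved and the restore itself can be undone.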

Similarly, AWS Lambda maintains compute capacity across multiple AZs in the region to protect the code against individual machine or single data center facility failures. Lambda provides a highly resilient architecture for the applications.

Platform Security

Security is a major concern when it comes to handling crucial data. AWS provides features to implement security at various stages, including encryption both in transit and at rest.

Fine-grained AWS Identity and Access Management (IAM) roles and policies can further restrict backend access, if necessary. Because the platform is a web application, Amazon Cognito is used to add one more layer of security.

Amazon Rekognition is integrated with IAM, and the IAM policies can be used to control access to the Amazon Rekognition API, as well as manage resource-level permissions for AWS accounts.
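For instance, a policy along the following lines could limit a role to label detection and to one Custom Labels model version. The statement IDs and the function name are made up for illustration, and the model ARN is left as a parameter because the post does not give one. `DetectLabels` does not support resource-level permissions, so its resource is `*`.

```python
import json


def build_rekognition_policy(project_version_arn):
    """Build an IAM policy document limiting Rekognition access to two actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPrebuiltLabelDetection",
                "Effect": "Allow",
                "Action": ["rekognition:DetectLabels"],
                # DetectLabels has no resource-level permissions.
                "Resource": "*",
            },
            {
                "Sid": "AllowOneCustomModel",
                "Effect": "Allow",
                "Action": ["rekognition:DetectCustomLabels"],
                "Resource": project_version_arn,
            },
        ],
    }


def policy_as_json(project_version_arn):
    """Return the policy as a JSON string, ready to attach to a role."""
    return json.dumps(build_rekognition_policy(project_version_arn), indent=2)
```

The JSON output can be supplied as the policy document when creating or updating an IAM policy.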

Finally, AWS CloudTrail logs all the API calls by default, and Amazon CloudWatch is used for maintaining all of the application logs. These logs can be analyzed to troubleshoot any security concerns that might arise.

Key Benefits

The key benefits of using the Deloitte Image Recognition Framework include:

  • Faster time to market: With key foundational components already in place, businesses can use the platform to quickly build and implement their specific use case.
  • Reduced overheads: The platform provides a built-in ML-based image training model, reducing development effort and the need for specialized or niche skills.
  • Scalability: The platform leverages native cloud services provided by AWS and can scale seamlessly to address an increase in load.
  • Security: The platform uses AWS security services to manage user access control and to maintain the security measures required for a Well-Architected platform.


As organizations strive to realize the value image recognition technologies can bring across industry sectors, there’s a need to implement such solutions in a quick and consistent way.

AWS provides a range of services which customers can use to set up a robust image recognition and analytics platform for their business needs. Customers can start small, experiment, pay per use, and expand when needed.

The Deloitte Image Recognition Framework provides a fully managed platform that can act as the foundational layer for businesses to implement innovative use cases around image recognition. It enables customers to embark on a journey to AWS for image recognition and analytics.

Deloitte can also provide a range of advisory, consulting, and operational support to help customers meet their growing business demands.


Deloitte – AWS Partner Spotlight

Deloitte is an AWS Premier Consulting Partner and MSP. Through a network of professionals, industry specialists, and an ecosystem of alliances, they assist clients in turning complex business issues into opportunities for growth, helping organizations transform in the digital era.
