Amazon Rekognition Announces More Accurate Object and Scene Detection, Can Now Locate Objects in Your Images

Posted on: Nov 2, 2018

Amazon Rekognition is a deep learning-based image and video analysis service that can identify objects, people, text, scenes, and activities, as well as detect unsafe content. Today we are announcing a major update to object and scene detection, also known as label detection, which identifies objects and scenes in images. Until now, Amazon Rekognition could identify the presence of an object in an image, but could not report where the object is within the image. Amazon Rekognition can now specify the location of common objects such as dogs, people, and cars in an image by returning object bounding boxes, and comes with significantly improved accuracy for all existing object and scene labels across a variety of use cases. In addition, customers can use the bounding box information to infer how many of each object occur in the image ("3 dogs"), and the relationship between objects ("dog on a couch"). These enhancements all come at no additional cost.

Customers in news, sports, and social media all face rapidly growing image libraries, and they are looking for ways to quickly search and filter such content. Human-provided metadata works to a degree for such applications, but that approach has limited accuracy and scalability. With Amazon Rekognition object and scene detection, customers can automatically index vast image libraries to make them searchable.

“GuruShots connects and inspires millions of photo enthusiasts around the world, reinventing the way people interact with their photos to make the experience more fun, exciting and rewarding. Previously, our end users were manually tagging images to get better insights. To provide a better customer experience, we’ve been looking for scalable ways to automatically tag uploaded images for further analysis. Using Amazon Rekognition, we now tag each user uploaded image and use the generated metadata to detect trends, improve search results, and adjust content to suit user preferences. This new streamlined process resulted in a 40% increase in user retention and a 50% increase in engagement.” - Eran Hazout, Founder and CTO, GuruShots

Now with object bounding boxes, customers can count how many of each object appear in an image (“3 dogs”), and can also determine which objects are prominent or important compared to others by using the position coordinates and bounding box size relative to image dimensions. This information can be used to infer user preferences. For example, someone who has a lot of photos where 'Car' is prominent is likely to be an automotive enthusiast. Some customers will also use bounding boxes to further process their images, for example to blur certain objects such as weapons. Bounding box information can also be used to search for specific types of images (images with multiple dogs, or prominent dogs, versus one dog in the background).

To make asset search even more powerful, Amazon Rekognition now returns parent labels in a hierarchical list. For example, the label 'Dog' has the parents 'Mammal', 'Canine' and 'Animal'. This metadata allows customers to group labels related by parent-child relationships to improve categorization and filtering.
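As a rough illustration of counting, prominence, and parent grouping, the Python sketch below works on a hand-written dictionary shaped like a DetectLabels response (a 'Labels' list whose entries carry 'Instances' with a 'BoundingBox', plus 'Parents'). The sample values, the 'Furniture' parent, the helper function names, and the choice of box area as a prominence measure are all assumptions made for illustration, not service output or a recommended method.

```python
from collections import defaultdict

# Hand-written sample in the shape of a DetectLabels response.
# All values are illustrative assumptions, not real service output.
SAMPLE_RESPONSE = {
    "Labels": [
        {
            "Name": "Dog",
            "Confidence": 98.0,
            "Instances": [
                {"BoundingBox": {"Width": 0.50, "Height": 0.60, "Left": 0.10, "Top": 0.30},
                 "Confidence": 98.0},
                {"BoundingBox": {"Width": 0.10, "Height": 0.10, "Left": 0.80, "Top": 0.05},
                 "Confidence": 91.0},
            ],
            "Parents": [{"Name": "Mammal"}, {"Name": "Canine"}, {"Name": "Animal"}],
        },
        {
            "Name": "Couch",
            "Confidence": 95.0,
            "Instances": [
                {"BoundingBox": {"Width": 0.90, "Height": 0.40, "Left": 0.05, "Top": 0.55},
                 "Confidence": 95.0},
            ],
            "Parents": [{"Name": "Furniture"}],  # assumed parent, for illustration
        },
        # Scene-style labels may carry no bounding boxes at all.
        {"Name": "Indoors", "Confidence": 90.0, "Instances": [], "Parents": []},
    ]
}


def count_objects(response):
    """Number of located instances per label; labels without boxes are skipped."""
    return {
        label["Name"]: len(label["Instances"])
        for label in response["Labels"]
        if label.get("Instances")
    }


def largest_box_share(response):
    """Largest bounding-box area per label, as a fraction of the image area.

    Box width and height are ratios of the image dimensions, so their
    product is the share of the image that the box covers.
    """
    shares = {}
    for label in response["Labels"]:
        areas = [
            inst["BoundingBox"]["Width"] * inst["BoundingBox"]["Height"]
            for inst in label.get("Instances", [])
        ]
        if areas:
            shares[label["Name"]] = max(areas)
    return shares


def group_by_parent(response):
    """Map each parent label name to the sorted child labels that list it."""
    groups = defaultdict(list)
    for label in response["Labels"]:
        for parent in label.get("Parents", []):
            groups[parent["Name"]].append(label["Name"])
    return {parent: sorted(children) for parent, children in groups.items()}
```

On this sample, count_objects reports two dogs and one couch, and largest_box_share lets an application apply its own threshold (for example, treating a label as prominent when its largest box covers more than a quarter of the image). Position can be factored in the same way using the 'Left' and 'Top' values.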

Bounding boxes, hierarchical metadata, and improved label detection accuracy are available today in all regions where Amazon Rekognition Image is offered. Label improvements for Amazon Rekognition Video are coming soon. You can get started today via the Rekognition Console or by downloading the latest AWS SDK. For more information please refer to the documentation.
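For readers starting from the SDK, here is a minimal Python sketch of requesting labels for a local image. It assumes a Rekognition client created with the AWS SDK for Python (boto3) is passed in; the helper name detect_labels_in_file, its default thresholds, and the file name in the usage comment are hypothetical choices for this example.

```python
def detect_labels_in_file(client, image_path, max_labels=10, min_confidence=75.0):
    """Send a local image to DetectLabels and return the list of labels.

    `client` is expected to be a Rekognition client, for example
    boto3.client("rekognition"). The defaults here are arbitrary.
    """
    with open(image_path, "rb") as image_file:
        image_bytes = image_file.read()

    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return response["Labels"]


# Usage (requires boto3, AWS credentials, and a configured region):
#   import boto3
#   labels = detect_labels_in_file(boto3.client("rekognition"), "photo.jpg")
#   for label in labels:
#       print(label["Name"], len(label.get("Instances", [])))
```

Passing the client in as a parameter keeps the helper easy to exercise without network access, since any object with a compatible detect_labels method can stand in for the real client.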