Amazon Rekognition Face Liveness - Responsible AI Service Card

Overview

Amazon Rekognition Face Liveness enables application providers to predict whether a user attempting to access services is a real person who is physically in front of a camera. Liveness is designed to detect presentation and digital injection attacks, where a third party attempts to spoof faces in order to impersonate another identity or evade being recognized. Examples of presentation attacks include printed photos, digital photos, digital videos, or 3D masks which are physically presented to a camera. Digital injection attacks are spoofs that bypass the camera, such as pre-recorded or deepfake videos. This AI Service Card describes considerations for responsibly deploying and using the Rekognition Face Liveness APIs and SDKs. Typically, customers leverage Amazon Rekognition Face Liveness to authenticate the physical presence of a user in front of a camera before optional subsequent tasks such as face verification, age verification, or bot detection.

Amazon Rekognition Face Liveness requires users to move their face into an oval on the screen and hold still while the device displays a series of different colored lights. Face Liveness uses this video to generate a confidence score, which customers can use to help predict whether a user is a real human physically present in front of a camera. The minimum confidence score is 0, implying a spoof attack, and the maximum is 100, implying a genuine user. A customer should select a confidence score threshold that meets their use case needs. For example, a moderate confidence score threshold (e.g., 50 or 60) may be suitable to detect presentation attacks and some digital injection attacks. Alternatively, a high confidence score threshold (e.g., 80 or 90) may be suitable to detect sophisticated digital injection attacks, such as deepfake or pre-recorded videos.

Overall performance is represented by two numbers: the true acceptance rate (TAR), which is the percentage of genuine users that pass a Liveness check by scoring above the confidence score threshold, and the true rejection rate (TRR), which is the percentage of spoof attacks that fail a Liveness check by scoring below the confidence score threshold. These success metrics are generally inversely related: raising the confidence score threshold tends to increase the TRR and decrease the TAR, and lowering it does the opposite. Rekognition Face Liveness itself does not independently determine if a face in the selfie video is real or a spoof attack. The decision is a result of several factors, which can include the confidence score threshold set by the customer, human judgment, or a combination of both.
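To make the relationship between the threshold, TAR, and TRR concrete, the following is a minimal sketch rather than anything provided by the service. The function name `tar_trr`, the sample scores, and the rule that a check passes when the score is greater than or equal to the threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def tar_trr(
    genuine_scores: Sequence[float],
    spoof_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (TAR, TRR) as percentages for one confidence score threshold.

    Assumption: a check passes when score >= threshold.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    accepted = sum(1 for s in genuine_scores if s >= threshold)
    rejected = sum(1 for s in spoof_scores if s < threshold)
    tar = 100.0 * accepted / len(genuine_scores)
    trr = 100.0 * rejected / len(spoof_scores)
    return tar, trr


# Made-up scores for illustration only; not measurements of the service.
GENUINE = [97.0, 92.5, 88.0, 71.0, 55.0]
SPOOF = [3.0, 12.0, 48.0, 64.0, 85.0]

if __name__ == "__main__":
    for threshold in (50, 70, 90):
        tar, trr = tar_trr(GENUINE, SPOOF, threshold)
        print(f"threshold={threshold}: TAR={tar:.0f}% TRR={trr:.0f}%")
```

On this toy data, moving the threshold from 50 to 90 lowers the TAR from 100% to 40% while raising the TRR from 60% to 100%, which is the trade-off described above.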

In addition to the confidence score, Rekognition Face Liveness will also return a high-quality selfie frame extracted from the video of the cooperating end user. A customer can use the selfie frame for other use cases, such as to compare the selfie with an identity document photo for verification (using the Rekognition CompareFaces API) or to perform an age estimate as an input for a minimum age check (using the Rekognition DetectFaces API).
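As a rough illustration of how the confidence score and selfie frame might be consumed, the sketch below wraps three Rekognition operations as exposed by the AWS SDK for Python (boto3): `get_face_liveness_session_results`, `compare_faces`, and `detect_faces`. The helper names, the threshold argument, and the choice to read the first detected face are assumptions, and the code presumes the reference image is returned as bytes (that is, no S3 output location was configured for the session). The client is passed in so that the helpers can be exercised without network access.

```python
from typing import Any, Dict, Optional, Tuple


def liveness_outcome(client: Any, session_id: str, threshold: float) -> Dict[str, Any]:
    """Fetch a liveness session result and apply a customer-chosen threshold."""
    response = client.get_face_liveness_session_results(SessionId=session_id)
    status = response.get("Status")
    confidence = float(response.get("Confidence", 0.0))
    reference = response.get("ReferenceImage") or {}
    return {
        "status": status,
        "confidence": confidence,
        # Assumption: pass only if the session succeeded and score >= threshold.
        "passed": status == "SUCCEEDED" and confidence >= threshold,
        "selfie_bytes": reference.get("Bytes"),
    }


def best_similarity(client: Any, selfie: bytes, id_photo: bytes) -> Optional[float]:
    """Compare the selfie frame with an ID photo; None if no match is returned."""
    response = client.compare_faces(
        SourceImage={"Bytes": selfie}, TargetImage={"Bytes": id_photo}
    )
    matches = response.get("FaceMatches", [])
    return max((m["Similarity"] for m in matches), default=None)


def age_range(client: Any, selfie: bytes) -> Optional[Tuple[int, int]]:
    """Return the (low, high) estimated age range of the first detected face."""
    response = client.detect_faces(Image={"Bytes": selfie}, Attributes=["ALL"])
    faces = response.get("FaceDetails", [])
    if not faces:
        return None
    estimate = faces[0]["AgeRange"]
    return estimate["Low"], estimate["High"]


# Typical wiring (requires boto3 and AWS credentials, so it is not run here):
#   import boto3
#   client = boto3.client("rekognition")
#   outcome = liveness_outcome(client, session_id, threshold=80.0)
```

Keeping the threshold as an explicit argument reflects the point that the pass or fail decision belongs to the customer's workflow, not to the API response itself.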

Rekognition Face Liveness is designed to return low confidence scores for faces from 2D photos, 3D full-face masks, digital photos or videos displayed on a device screen, and pre-recorded videos. Conversely, the confidence score is designed to be high for genuine user faces performing the requested user actions in real time during the liveness check. However, various elements, known as "confounding variations", may influence the appearance of a face in a video, which can affect Face Liveness confidence scores for genuine users. Some examples of confounding variations are: (1) variations in lighting direction and intensity; (2) head pose; (3) camera focus and video capture imperfections; (4) video resolution; and (5) occlusions caused by masks, sunglasses, hands, cell phones, scarves, hats, or other objects.

The performance of Rekognition Face Liveness will vary depending on several factors, including the expected amount of confounding variation; device screen characteristics such as brightness, size, and resolution; and camera characteristics such as resolution, focal length, frame rate, and bit rate.

Intended use cases and limitations

Amazon Rekognition Face Liveness can be deployed for a number of different use cases. Four examples are:

User onboarding use case: FinTech, EdTech, Healthcare, and other customers may choose to conduct a face liveness check to enroll new users to their online applications. After the user passes the liveness check, the selfie frame can be compared to the face on their government-issued ID card for further verification using the Rekognition CompareFaces API (more information on responsibly deploying face matching can be found in the Face Matching Service Card).

Step-up authentication use case: FinTech, ride-sharing, eCommerce, and other customers may choose to enhance security measures for important user activities, such as a device change, password change, or money transfers, by conducting a liveness check. After passing the liveness check, the selfie frame from the liveness check can then be compared to a previously stored image of the user using the Rekognition CompareFaces API.

User age verification use case: Online gaming and dating platforms may choose to implement age verification measures when enrolling new users. In these scenarios, the customer may choose to conduct a face liveness check prior to verifying the user's age through ID checks and the Rekognition DetectFaces API for age prediction.

Bot detection use case: To prevent the takeover of their platforms by bots, customers in the social media and dating industries may choose to implement measures for verifying their users are real people. In order to achieve this, these customers may choose to conduct a face liveness check each time a user logs into their application, thus helping to limit automated bot accounts.

Limitations: Rekognition Face Liveness has minimum device requirements for genuine users. Devices must have a front-facing camera, a minimum screen refresh rate of 60 Hz, a minimum screen size of 4 inches, and a minimum network bandwidth of 100 kbps, and must not be jailbroken or rooted. Cameras must record in color, must not be virtual camera software, and must support at least 15 frames per second and a minimum recording resolution of 320x240. Desktops must have webcams mounted on top of the screen being used for the liveness check. Lastly, for web users, the supported browsers are Google Chrome, Mozilla Firefox, Apple Safari, and Microsoft Edge.

The confounding variation described above can occur in both indoor and outdoor situations. However, when the face liveness check is performed indoors, confounding variation is typically minimized by capturing a selfie video with front-facing poses in well-lit conditions (e.g., office lighting), resulting in a high TAR for genuine users. In these optimal conditions, a higher confidence score threshold (e.g., 80 or 90) may be more effective in achieving the right balance between the TAR and TRR. Performing the face liveness check outdoors can result in unsuitable conditions, such as bright sunlight reducing the visibility of the different colored lights reflected from the user's face. As a consequence, there may be an increase in false rejections of genuine users. In these suboptimal conditions, a lower confidence score threshold (e.g., 50 or 60) may be more effective in achieving the right balance between the TAR and TRR.
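The minimum device requirements listed under Limitations could be encoded as a simple pre-check before a liveness session is started. This is only a sketch: the `DeviceProfile` type and its field names are hypothetical, and how an application would actually measure these properties on a given platform is outside the scope of this card. The numeric limits are the ones stated above.

```python
from typing import List, NamedTuple


class DeviceProfile(NamedTuple):
    """Hypothetical description of a user's device, gathered by the application."""
    has_front_camera: bool
    screen_refresh_hz: float
    screen_size_inches: float
    jailbroken_or_rooted: bool
    bandwidth_kbps: float
    camera_records_color: bool
    camera_is_virtual: bool
    camera_fps: float
    camera_width: int
    camera_height: int


def unmet_requirements(d: DeviceProfile) -> List[str]:
    """Return a message for each stated minimum requirement the device misses."""
    long_side = max(d.camera_width, d.camera_height)
    short_side = min(d.camera_width, d.camera_height)
    checks = [
        (d.has_front_camera, "front-facing camera required"),
        (d.screen_refresh_hz >= 60, "screen refresh rate below 60 Hz"),
        (d.screen_size_inches >= 4, "screen smaller than 4 inches"),
        (not d.jailbroken_or_rooted, "device is jailbroken or rooted"),
        (d.bandwidth_kbps >= 100, "network bandwidth below 100 kbps"),
        (d.camera_records_color, "camera does not record in color"),
        (not d.camera_is_virtual, "virtual camera software is not supported"),
        (d.camera_fps >= 15, "camera below 15 frames per second"),
        # Orientation-agnostic comparison against 320x240 (an assumption).
        (long_side >= 320 and short_side >= 240, "resolution below 320x240"),
    ]
    return [message for ok, message in checks if not ok]


# Example values only.
GOOD_DEVICE = DeviceProfile(True, 60.0, 6.1, False, 5000.0, True, False, 30.0, 1280, 720)

if __name__ == "__main__":
    print(unmet_requirements(GOOD_DEVICE))
    print(unmet_requirements(GOOD_DEVICE._replace(screen_refresh_hz=30.0)))
```

Returning every unmet requirement, rather than stopping at the first, lets a workflow give the user a complete explanation of why the check cannot proceed.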

Design of Rekognition Face Liveness

Machine learning: Rekognition Face Liveness is built using ML and computer vision technologies and involves the following steps: (1) The liveness check requires the user to move their face into an oval and hold still for a color sequence, which is recorded as a video by the frontend SDK. (2) The video is sent to the backend API and passed through a series of checks to validate successful completion. (3) A series of models check that the face is not a spoof, e.g., a photograph of a face, a full 3D face mask, or a virtual injection attack. (4) If any of the aforementioned checks do not pass, or pass with low confidence, the confidence score returned by the API is reduced. (5) Different frames of the video are analyzed for quality. The frame with the highest quality score is returned as the face reference image. (6) A number of audit images, configurable by the customer between 0 and 4, is also returned for future auditing purposes.
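From the customer backend's point of view, steps (5) and (6) surface as session settings and response fields. The sketch below builds the keyword arguments for the boto3 `create_face_liveness_session` call, using its `Settings.AuditImagesLimit` and `Settings.OutputConfig` fields, and summarizes a results response. The helper names and the sample response are hypothetical, and the range check simply mirrors the 0 to 4 limit stated above.

```python
from typing import Any, Dict, Optional


def session_request(
    audit_images_limit: int = 0,
    s3_bucket: Optional[str] = None,
    s3_key_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for create_face_liveness_session."""
    if not 0 <= audit_images_limit <= 4:
        raise ValueError("audit_images_limit must be between 0 and 4")
    settings: Dict[str, Any] = {"AuditImagesLimit": audit_images_limit}
    if s3_bucket:
        output_config = {"S3Bucket": s3_bucket}
        if s3_key_prefix:
            output_config["S3KeyPrefix"] = s3_key_prefix
        settings["OutputConfig"] = output_config
    return {"Settings": settings}


def summarize_results(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields a workflow typically inspects after a session."""
    return {
        "status": response.get("Status"),
        "confidence": response.get("Confidence"),
        "has_reference_image": "ReferenceImage" in response,
        "audit_image_count": len(response.get("AuditImages", [])),
    }


# Typical wiring (requires boto3 and AWS credentials, so it is not run here):
#   client = boto3.client("rekognition")
#   session = client.create_face_liveness_session(**session_request(2))
#   results = client.get_face_liveness_session_results(SessionId=session["SessionId"])
#   print(summarize_results(results))

if __name__ == "__main__":
    print(session_request(2, s3_bucket="example-bucket", s3_key_prefix="liveness/"))
```

Validating the audit image count locally gives a clearer error than waiting for the service to reject the request.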

Performance expectations: Confounding variation can require different applications to use different confidence scores to achieve their desired performance. Consider two identity verification applications A and B. With each, a user first enrolls with a passport-style image, and later verifies their identity using a real-time selfie video capture. These real-time selfie videos may be captured in different ways. For example, Application A could be an in-office work application that leverages a laptop camera to capture selfie videos that are well-lit before granting access to the service. Application B could use a tablet that is mounted at the entrance of a building to capture selfie videos outside. Application B’s approach may be subject to more extreme lighting conditions such as bright sunlight. Because A and B have differing kinds of inputs, they will likely have differing face liveness error rates at the same confidence score thresholds. This requires both to set the confidence score threshold that is appropriate for their use case.
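The Application A and Application B scenario can be sketched numerically. The scores below are invented to mimic well-lit indoor captures (A) and outdoor captures (B); they are not measurements. The hypothetical helper `highest_threshold_meeting_tar` finds the highest integer threshold at which a target TAR is still met, assuming a check passes when the score is greater than or equal to the threshold.

```python
from typing import Optional, Sequence


def tar_at(genuine_scores: Sequence[float], threshold: float) -> float:
    """TAR (percent) for genuine-user scores at a threshold (score >= threshold passes)."""
    return 100.0 * sum(1 for s in genuine_scores if s >= threshold) / len(genuine_scores)


def highest_threshold_meeting_tar(
    genuine_scores: Sequence[float], target_tar: float
) -> Optional[int]:
    """Highest integer threshold in 0..100 whose TAR is at least target_tar."""
    if not genuine_scores:
        raise ValueError("genuine_scores must be non-empty")
    meeting = [t for t in range(0, 101) if tar_at(genuine_scores, t) >= target_tar]
    return max(meeting) if meeting else None


# Invented genuine-user scores for two deployments.
APP_A_INDOOR = [99, 97, 95, 93, 91, 90, 88, 86, 96, 94]
APP_B_OUTDOOR = [95, 90, 84, 78, 72, 66, 61, 58, 88, 70]

if __name__ == "__main__":
    for name, scores in (("A", APP_A_INDOOR), ("B", APP_B_OUTDOOR)):
        print(name, highest_threshold_meeting_tar(scores, target_tar=90.0))
```

With a 90% TAR target, the toy data supports a threshold of 88 for Application A but only 61 for Application B. A real evaluation would also measure the TRR at each candidate threshold on representative spoof attempts before settling on a value.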

Test-driven methodology: We use multiple datasets to evaluate performance. No single evaluation dataset provides an absolute picture of performance. That’s because evaluation datasets vary based on their demographic makeup, the amount of confounding variation, the types and quality of labels available, and other factors. We measure Rekognition performance by testing it on evaluation datasets containing selfie videos of live users and spoof attacks. We choose a liveness score threshold, use Rekognition to compute the liveness score for each video, and based on the threshold, determine if the face in the video is live or a spoof attack. Groups in a dataset can be defined by demographic attributes (e.g., gender), confounding variables (e.g., the presence or absence of facial occlusion), or a mix of the two. Different evaluation datasets vary across these and other factors. Because of this, the TAR and TRR vary from dataset to dataset. Taking this variation into account, our ongoing development process examines Rekognition’s performance using multiple evaluation datasets and takes steps to increase the TAR and TRR.
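A per-group breakdown of the kind described above might look like the following sketch. It is not the evaluation code used for Rekognition; the `Sample` type, the group labels, and the scores are hypothetical, and a check is assumed to pass when the score is greater than or equal to the threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    group: str        # e.g., a demographic attribute or a confounding variable
    is_genuine: bool  # True for a live user, False for a spoof attack
    score: float      # liveness confidence score, 0 to 100


def rates_by_group(
    samples: Iterable[Sample], threshold: float
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Return {group: (TAR, TRR)} in percent; None where a group has no samples."""
    # Per group: [genuine total, genuine accepted, spoof total, spoof rejected]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for s in samples:
        c = counts[s.group]
        if s.is_genuine:
            c[0] += 1
            c[1] += int(s.score >= threshold)
        else:
            c[2] += 1
            c[3] += int(s.score < threshold)

    def pct(numerator: int, denominator: int) -> Optional[float]:
        return 100.0 * numerator / denominator if denominator else None

    return {g: (pct(c[1], c[0]), pct(c[3], c[2])) for g, c in counts.items()}


# Invented samples for illustration only.
SAMPLES = [
    Sample("unoccluded", True, 95), Sample("unoccluded", True, 88),
    Sample("unoccluded", False, 10), Sample("unoccluded", False, 40),
    Sample("occluded", True, 90), Sample("occluded", True, 45),
    Sample("occluded", False, 20), Sample("occluded", False, 65),
]

if __name__ == "__main__":
    for group, (tar, trr) in rates_by_group(SAMPLES, threshold=50).items():
        print(f"{group}: TAR={tar}% TRR={trr}%")
```

Reporting both rates per group makes gaps visible that an aggregate TAR and TRR would average away.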

Fairness and bias: Rekognition Face Liveness is designed to work well for all human faces. To achieve this, we use the iterative development process described above. As part of the process, we use datasets that capture a diverse range of human facial features and skin tones under a wide range of confounding variation. We routinely test across use cases on datasets of spoof attacks and face selfie videos from genuine end users for which we have reliable demographic labels such as gender, age, and skin tone. Overall, we find that Rekognition has high accuracy. For example, iBeta Quality Assurance, a NIST-accredited lab, used its Presentation Attack Detection (PAD) framework to conduct two tests (Level 1 and Level 2) of Rekognition Face Liveness using a series of presentation attacks and genuine user attempts. PAD testing is conducted in accordance with ISO/IEC 30107-3. For each test, iBeta worked with subjects to provide genuine samples as well as imposter samples (presentation attacks). iBeta recruited a diverse range of subjects across age, gender, and ethnic backgrounds. For Level 1 testing, 6 different attack types were used across varying categories of printed photos, 3D masks, photos on smartphones, and videos on laptops. 900 attack attempts resulted in a true rejection rate (TRR) of 100% at a confidence score threshold of 50. In that same test, 300 genuine user attempts resulted in a true acceptance rate (TAR) of 100% at a confidence score threshold of 50. For Level 2 testing, 5 different attack types were used: silicone masks, latex masks, 3D contoured masks, 3D printed masks, and 3D animation software. 750 attack attempts resulted in a TRR of 100% at a confidence score threshold of 50. In that same test, 250 genuine user attempts resulted in a TAR of 100% at a confidence score threshold of 50.

Explainability: If customers have questions about the liveness confidence score returned by Rekognition for a selfie video, we recommend that customers use the reference and audit images returned by Rekognition to manually review the face images for live user or spoof attack signals.

Robustness: We maximize robustness with a number of techniques, including using large training datasets that capture many kinds of variation across many individuals. Customers must establish expectations for false accept and false reject rates that are appropriate to their use case, and test workflow performance, including their choice of liveness confidence score threshold, on their own content.

Privacy and security: Rekognition Face Liveness processes user selfie videos and returns liveness scores along with select frames from the video to the customer for face matching, age estimation, or audit trail purposes. Inputs and outputs are never shared between customers. Customers can opt out of training on customer content via AWS Organizations or other opt out mechanisms we may provide. See Section 50.3 of the AWS Service Terms and the AWS Data Privacy FAQ for more information. For service-specific privacy and security information, see the Data Privacy section of the Rekognition FAQs and the Amazon Rekognition Security documentation.

Transparency: Customers who incorporate Amazon Rekognition Face Liveness APIs in their workflows should consider disclosing their use of ML and face analysis technology to end users and other individuals impacted by the application, and give their end users the ability to provide feedback to improve workflows. In their documentation, customers can also reference this AI Service Card.

Governance: We have rigorous methodologies to build our AWS AI services in a responsible way, including a working backwards product development process that incorporates Responsible AI at the design phase, design consultations and implementation assessments by dedicated Responsible AI science and data experts, routine testing, reviews with customers, best practice development, dissemination, and training.

Deployment and performance optimization best practices

We encourage customers to build and operate their applications responsibly, as described in the AWS Responsible Use of Machine Learning guide. This includes implementing Responsible AI practices to address key dimensions including fairness and bias, robustness, explainability, privacy and security, transparency, and governance.

Workflow Design: The accuracy of any application utilizing Rekognition Face Liveness relies on the customer's workflow design, which includes several factors: (1) the anticipated level of confounding variation, (2) the choice of liveness confidence score threshold, (3) the characteristics of the user's device screen, camera, and network, (4) keeping the Liveness SDKs on the latest version, and (5) ongoing tests to account for any changes or drift over time.
Confounding variation: During the recording of a selfie video, workflows should incorporate measures to maintain optimal lighting conditions that are neither excessively bright nor too dim. Users should be guided to perform the liveness check in a different area if the lighting conditions are not optimal, as this can help reduce the likelihood of false rejections. Additionally, workflows should establish clear policies regarding acceptable quality for the returned selfie frame and periodically sample selfie frames at random to monitor compliance with these policies and maintain the quality of the workflow.
Liveness confidence score threshold: Setting an appropriate liveness confidence score threshold for the application is crucial. Otherwise, the workflow may erroneously determine that a real user is present when there is none (resulting in a false accept) or that no real user is present when there is one (leading to a false reject). The impact of a false accept may differ from that of a false reject. For instance, fintech onboarding may require a significantly higher confidence score threshold than gig economy worker pre-delivery verification. To establish a suitable liveness confidence score threshold, customers should conduct representative real-world tests and determine the threshold that satisfies their needs.
Characteristics of the user’s device screen, camera, and network: It is important for the user device to meet the minimum specification provided in the Rekognition Face Liveness documentation. The device should have a camera that is mounted directly above the screen of the device. The device should be in good condition with the screen intact and the camera operational. The Wi-Fi or cellular network used to transmit the selfie video in real time should support the required speeds and latency.
Latest SDKs: AWS regularly updates Face Liveness AWS SDKs (used in customer backend) and FaceLivenessDetector components of AWS Amplify SDKs (used in client applications) to provide new features, updated APIs, enhanced security, bug fixes, usability improvements, and more. We recommend that you keep the SDKs up-to-date to ensure optimal functioning of the feature. If you continue to use older versions of SDKs, requests may be blocked for maintainability and security reasons.
Human oversight: If a customer's application workflow involves a high risk or sensitive use case, such as a decision that impacts an individual's rights or access to essential services, we recommend incorporating human review into the application workflow where appropriate. Face liveness systems can serve as tools to reduce the effort incurred by fully manual solutions, and to allow humans to expeditiously review and assess possible spoofs and live user rejects.
Consistency: Customers should set and enforce policies for how humans combine the use of liveness confidence score thresholding and their own judgment to determine liveness. These policies should be consistent across all demographic groups.
Performance drift: A change in the kinds of selfie videos that a customer submits to Rekognition Face Liveness may lead to different outputs. To address these changes, customers should consider ongoing testing of Rekognition performance and adjusting their workflow if necessary.
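Two of the practices above, human oversight and monitoring for performance drift, lend themselves to a small sketch. Everything here is a hypothetical policy, not a feature of the service: the `route` function sends scores in a middle band to human review, and `drifted` flags a shift in pass rate between a baseline sample and a recent sample. The band limits (50 and 80) and the 5-point tolerance are placeholder values that a customer would replace with ones derived from their own testing.

```python
from typing import Sequence


def route(confidence: float, accept_at: float = 80.0, reject_below: float = 50.0) -> str:
    """Three-way decision: accept, reject, or send to a human reviewer."""
    if reject_below > accept_at:
        raise ValueError("reject_below must not exceed accept_at")
    if confidence >= accept_at:
        return "accept"
    if confidence < reject_below:
        return "reject"
    return "human_review"


def pass_rate(scores: Sequence[float], threshold: float) -> float:
    """Percentage of scores at or above the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)


def drifted(
    baseline_scores: Sequence[float],
    recent_scores: Sequence[float],
    threshold: float,
    tolerance_points: float = 5.0,
) -> bool:
    """True if the pass rate moved by more than tolerance_points percentage points."""
    change = pass_rate(recent_scores, threshold) - pass_rate(baseline_scores, threshold)
    return abs(change) > tolerance_points


# Invented scores for illustration only.
BASELINE = [90, 85, 95, 70, 88]
RECENT = [60, 85, 55, 70, 91]

if __name__ == "__main__":
    print([route(score) for score in (92.0, 65.0, 20.0)])
    print(drifted(BASELINE, RECENT, threshold=80.0))
```

Applying the same band limits to every user is one way to keep the combination of thresholding and human judgment consistent across demographic groups. A drift flag is a prompt to re-run representative tests, not by itself a reason to change the threshold.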

Further information

For service documentation, see Rekognition, Face Liveness, Face Matching Service Card.

For an example of an authentication workflow design, see Identity Verification Using Amazon Rekognition.

For details on privacy and other legal considerations, see Legal, Compliance, Privacy.

For help optimizing a workflow, see AWS Customer Support, AWS Professional Services, Amazon SageMaker Ground Truth Plus, Amazon Augmented AI.

If you have any questions or feedback about AWS AI service cards, please complete this form.

Glossary

Fairness and Bias refer to how an AI system impacts different subpopulations of users (e.g., by gender, ethnicity).

Explainability refers to having mechanisms to understand and evaluate the outputs of an AI system.

Robustness refers to having mechanisms to ensure an AI system operates reliably.

Privacy and Security refer to data being protected from theft and exposure.

Governance refers to having processes to define, implement and enforce responsible AI practices within an organization.

Transparency refers to communicating information about an AI system so stakeholders can make informed choices about their use of the system.
