AWS Machine Learning Blog

Building an AR/AI vehicle manual using Amazon Sumerian and Amazon Lex

Auto manufacturers are continuously adding new controls, interfaces, and intelligence into their vehicles. They publish manuals detailing how to use these functions, but these handbooks are cumbersome. Because they consist of hundreds of pages in several languages, it can be difficult to search for relevant information about specific features. Attempts to replace paper-based manuals with video or mobile apps have not improved the experience. As a result, not all owners know about and take advantage of all the innovations offered by the auto manufacturers.

This post describes how you can use Amazon Sumerian and other AWS services to create an interactive auto manual. This solution uses augmented reality, an AI chatbot, and connected car data provided through AWS IoT. This is not a comprehensive step-by-step tutorial, but it does provide an overview of the logical components.

AWS services

This blog post uses the following six services:

  1. Amazon Sumerian lets you create and run virtual reality (VR), augmented reality (AR), and 3D applications quickly and easily without requiring any specialized programming or 3D graphics expertise. You can publish the 3D scenes you create with one click and then distribute them on the web, in VR headsets, and in mobile applications. In this post, Sumerian is used to render and animate a 3D model of the vehicle’s interior and, optionally, its exterior.
  2. Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex is powered by the same technology that powers Amazon Alexa. Amazon Lex democratizes deep learning technologies by putting the power of Alexa within reach of all developers. In this post, Amazon Lex is used to recognize voice commands and determine which function or feature the owner is asking about.
  3. Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. Amazon Polly allows you to create applications that talk and build entirely new categories of speech-enabled products. Amazon Polly supports dozens of voices, across a variety of languages, to enable applications working in different countries. In this post, Amazon Polly is used to vocalize Amazon Lex answers into lifelike speech.
  4. Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. DynamoDB is fully managed, has built-in security, backup and restore, and in-memory caching for internet-scale applications. In this post, DynamoDB serves as a document store for the steps that guide the owner through the controls in the interior of the vehicle.
  5. AWS Lambda lets you run code without provisioning or managing servers. In this demo, a Lambda function is used to populate an AWS IoT Core shadow document to contain the required
  6. AWS IoT Core is a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT Core enables billions of devices and trillions of messages to connect reliably and securely to AWS endpoints and to other devices. AWS IoT Core supports the concept of device shadows, which store the latest state of connected devices whether or not they are online. In this post, a device shadow document is used to exchange information between Amazon Lex, DynamoDB, Sumerian, and a virtual representation of the car.

The following diagram illustrates the architectural relationships between these services.

The diagram shows how the AWS services relate to each other, to the end user, and to the vehicle. The owner’s journey starts with the mobile application, which embeds the Sumerian scene containing the model of the car. The user taps a button to activate Amazon Lex and Amazon Polly. Once these are activated, the user can ask the application about a function and follow the series of steps it presents.

The content of the manual is stored in DynamoDB. Amazon Lex pulls this information by placing a Lambda call. The Lambda function queries the DynamoDB table and retrieves a JSON structure describing:

  1. the steps, ordered by time and marked with start and end values that signal when each control should be highlighted. For example,  …{"LeftTemperatureDial": {"start": 0, "end": 2}}…
  2. the prompt that needs to be announced while steps are shown in the Sumerian model. For example, “Press down left temperature dial for 2 seconds.”

This JSON document is then passed on to the AWS IoT Core device shadow document. Sumerian periodically polls the document for state changes and updates the model to reflect the steps by highlighting the corresponding interface controls.
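
To make the shape of this document more concrete, the following Python sketch builds one hypothetical manual entry and orders its steps by start time. Only the LeftTemperatureDial step and the prompt text come from the examples above; the key names `steps` and `prompt` and the second control, `AutoButton`, are assumptions made for illustration.

```python
import json

# Hypothetical manual entry. Only the LeftTemperatureDial step and the prompt
# text come from the post; the key names "steps" and "prompt" are assumed.
manual_entry = {
    "prompt": "Press down left temperature dial for 2 seconds.",
    "steps": [
        {"AutoButton": {"start": 2, "end": 4}},  # assumed second control
        {"LeftTemperatureDial": {"start": 0, "end": 2}},
    ],
}


def ordered_steps(entry):
    """Return (control_name, start, end) tuples sorted by start time."""
    flat = []
    for step in entry["steps"]:
        for control_name, timing in step.items():
            if timing["end"] <= timing["start"]:
                raise ValueError(f"{control_name}: end must be after start")
            flat.append((control_name, timing["start"], timing["end"]))
    return sorted(flat, key=lambda item: item[1])


if __name__ == "__main__":
    print(json.dumps(ordered_steps(manual_entry)))
```

Keeping the timing in seconds relative to the start of the prompt means the same document can drive both the spoken instruction and the highlighting.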

For a better visual and aural representation, see the AWS Auto Demo video.

How to build this demo

Follow these steps to build the demo:

  1. Create a basic scene.
  2. Label the control elements.
  3. Create the DynamoDB table.
  4. Create the Amazon Lex bot.
  5. Use the Lambda function.
  6. Create a state machine in Sumerian.
  7. Position the AR camera in the scene.
  8. Publish the scene.
  9. Link to the Amazon Lex bot.
  10. Deploy the application.

Step 1: Create a basic scene

Create a basic scene, with entities and AWS configuration.

  1. Using the Augmented Reality template, create a scene and import the 3D asset of the commercially available car. This model is sourced from the 3D model marketplace but can be imported from free 3D galleries or from 3D design software in any of the supported formats.
  2. Create an Amazon Cognito identity pool, allowing Sumerian to use both Amazon Lex and AWS IoT Core. This identity pool should have the appropriate policies to access AWS IoT, Amazon Lex, and Amazon Polly. For more information, see Amazon Cognito Setup Using AWS CloudFormation.
  3. Provide the created identity pool ID to the AWS Configuration component in the Sumerian scene and enable the check box on the AWS IoT Data Client.
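
As a rough illustration of the "appropriate policies" mentioned above, the following Python sketch assembles an IAM policy document for the identity pool's role. The exact set of actions and the wildcard resource are assumptions, not the policy used in the demo; in practice you would scope `Resource` to your own thing and bot.

```python
import json

# Assumed minimal permissions for the Cognito identity pool role.
# "Resource": "*" is a placeholder; scope it to your own thing and bot ARNs.
sumerian_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iot:GetThingShadow",      # Sumerian polls the device shadow
                "iot:UpdateThingShadow",   # Sumerian resets the device shadow
                "lex:PostText",            # text requests to the bot
                "lex:PostContent",         # voice requests to the bot
                "polly:SynthesizeSpeech",  # spoken answers
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(sumerian_policy, indent=2)
print(policy_json)
```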

Step 2: Label the control elements

Create 3D labels or entities covering most of the control elements (dial, button, flap, display, sign, etc.) that are present in the interior. I colored these markers red and made them semitransparent, so that they still allow the view of the actual control underneath. I named these entities to more easily identify them in my scripts. I also hid them, to mimic the initial state, where only the actual interior is visible, as seen in the following screenshot.

Step 3: Create the DynamoDB table

Create a table in DynamoDB and populate it with several vehicle functions and appropriate steps for enabling, disabling, setting, or unsetting that function. These instructions contain start/end times and durations for each child model entity that must appear, honoring the order in which you want to show them, as shown in the following screenshot.
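
The following Python sketch shows one way such an entry could be encoded for DynamoDB. It assumes a table keyed on `action_name` (the key used by the Lambda lookup in Step 5); the attribute names `prompt`, `steps`, `control`, `start`, and `end`, and the key value "increase temperature", are hypothetical.

```python
def build_manual_item(action_name, prompt, steps):
    """Encode one manual entry in DynamoDB's low-level attribute-value format.

    steps is a list of (control_name, start_seconds, end_seconds) tuples,
    listed in the order in which the controls should be shown.
    """
    return {
        "action_name": {"S": action_name},  # partition key used by the lookup
        "prompt": {"S": prompt},            # assumed attribute name
        "steps": {                          # assumed attribute name
            "L": [
                {
                    "M": {
                        "control": {"S": control},
                        "start": {"N": str(start)},  # numbers travel as strings
                        "end": {"N": str(end)},
                    }
                }
                for control, start, end in steps
            ]
        },
    }


def put_manual_item(dynamo_client, table_name, item):
    """Write the item using a boto3 DynamoDB client passed in by the caller."""
    return dynamo_client.put_item(TableName=table_name, Item=item)


item = build_manual_item(
    "increase temperature",  # hypothetical key value
    "Press down left temperature dial for 2 seconds.",
    [("LeftTemperatureDial", 0, 2)],
)
```

With AWS credentials configured, calling `put_manual_item(boto3.client("dynamodb"), "XXXautoYYY_manual", item)` would store the entry; the client is passed in so that the encoding can be checked without touching AWS.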

Step 4: Create the Amazon Lex bot

Create the Amazon Lex bot and populate it with intents and utterances, which enable Amazon Lex to understand owners’ questions. Amazon Lex determines which function the owner is asking about and sends this information to the Lambda function.

As seen in the two screenshots above, you are creating an intent called airconditioningManual. This intent then contains several sample utterances containing three custom slots:

  • {option} to describe the activity to perform, such as “turn on”, “increase”, “remove” and others
  • {action} to describe the function, such as “temperature”, “fan speed” and others
  • {conjunction} to allow for optional conjunctions, like “with”, “on”, “of”, etc.

You can add more intents for other interactions or other parts of the vehicle.
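
The post does not show how the slot values become the DynamoDB lookup key, so the following Python sketch is only one plausible approach. It assumes the Lex (V1) Lambda event format, in which slot values arrive under `currentIntent.slots`, and a hypothetical key convention of "option action" in lower case.

```python
def lookup_key_from_event(event):
    """Build a DynamoDB lookup key from the slots of a Lex (V1) Lambda event."""
    slots = event["currentIntent"]["slots"]
    # Unfilled slots arrive as None, so fall back to an empty string.
    option = (slots.get("option") or "").strip().lower()
    action = (slots.get("action") or "").strip().lower()
    if not option or not action:
        raise ValueError("both the option and action slots are required")
    # The conjunction slot only helps utterance matching, so it is ignored here.
    return f"{option} {action}"


# Trimmed-down sample event; a real event carries additional fields.
sample_event = {
    "currentIntent": {
        "name": "airconditioningManual",
        "slots": {
            "option": "Increase",
            "conjunction": None,
            "action": "temperature",
        },
    }
}

if __name__ == "__main__":
    print(lookup_key_from_event(sample_event))
```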

Step 5: Use the Lambda function

The Lambda function contains code that performs the following steps.

  1. It queries the DynamoDB table to obtain a document of ordered instructions including start times, end times, and durations of the control elements (dial, button, flap, display, sign, etc.) being visible or highlighted.
    # Look up the instructions stored under the requested action name
    response = dynamo_client.get_item(
        TableName='XXXautoYYY_manual',
        Key={
            'action_name': {
                'S': toget
            }
        }
    )
  2. It converts and stores this set of instructions into AWS IoT Core, via a device shadow document.
    # Write the steps into the desired state of the device shadow
    action = iot_client.update_thing_shadow(
        thingName='XXXautoYYY',
        payload=json.dumps({
            "state": {
                "desired": {
                    "steps": actionList
                }
            }
        })
    )
  3. It returns a response object to Amazon Lex, fulfilling the owner’s request. This response object contains the instructions to perform, wrapped in a sentence that is played back to the owner.
    rtrn = {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": rtrnmessage
            }
        }
    }
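
Taken together, the three snippets above could be wired up roughly as follows. This is a sketch rather than the demo's actual function: it assumes that each table item stores a `prompt` string and a `steps` list of maps with `control`, `start`, and `end` attributes, and the fallback message for a missing item is invented. The boto3 clients are passed in by the caller.

```python
import json


def steps_from_item(item):
    """Convert the assumed DynamoDB 'steps' list into plain Python dicts."""
    return [
        {
            step["M"]["control"]["S"]: {
                "start": float(step["M"]["start"]["N"]),
                "end": float(step["M"]["end"]["N"]),
            }
        }
        for step in item["steps"]["L"]
    ]


def close_response(message, state="Fulfilled"):
    """Build a Lex 'Close' response like the one shown in step 3."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": state,
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def fulfil(dynamo_client, iot_client, table_name, thing_name, toget):
    """Run steps 1 to 3 with boto3 clients supplied by the caller."""
    response = dynamo_client.get_item(
        TableName=table_name, Key={"action_name": {"S": toget}}
    )
    item = response.get("Item")
    if item is None:
        # Assumed fallback; the post does not describe this case.
        return close_response("Sorry, I could not find that function.", "Failed")
    iot_client.update_thing_shadow(
        thingName=thing_name,
        payload=json.dumps(
            {"state": {"desired": {"steps": steps_from_item(item)}}}
        ),
    )
    return close_response(item["prompt"]["S"])
```

In a Lambda handler, `dynamo_client` would typically be `boto3.client("dynamodb")` and `iot_client` would be `boto3.client("iot-data")`.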

Step 6: Create a state machine in Sumerian

Create a state machine in Sumerian using these steps.

  1. This state machine continuously polls the device shadow document for changes. There are three states in the state machine, as shown in the following diagram:
    1. loadSDK, which loads the AWS SDK
    2. getShadow (see the following step)
    3. A waiting state that calls the getShadow state in a looping routine.

    When the shadow changes, the instructions it contains are executed on the model: the marker entities are shown and hidden according to the specified start and end times. The device shadow is then reset. To learn more about state machines in Sumerian, see State Machine Basics.

  2. The getShadow state from the preceding step runs the script that retrieves the IoT device shadow and performs the actual animation of the individual layers. To learn more about scripting and retrieving IoT device shadows, see IoT Thing, Shadow, and Script Actions. The following example snippet from the script performs one step (show the highlight entity → wait → hide the highlight entity):
    function showControl(control, ctx, controlName) {
        // Show the highlight entity control.start seconds from now
        setTimeout(function () {
            var myWorld = ctx.entity.world.entityManager;
            var controlEnt = myWorld.getEntityByName(controlName);
            controlEnt.show();
            // Hide it again once the duration of the step has elapsed
            setTimeout(function () {
                controlEnt.hide();
            }, (control.end - control.start) * 1000);
        }, control.start * 1000);
    }
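
The poll, animate, and reset cycle can also be traced outside Sumerian. The following Python sketch reads `desired.steps` from the shadow, resets it, and expands the steps into the show and hide events that showControl would schedule. The `FakeIotClient` class is a stand-in invented here so that the sketch runs without AWS, and resetting by writing `null` to `steps` is an assumption about how the demo clears the shadow.

```python
import io
import json


def timeline(steps):
    """Expand steps into (seconds, action, control) events in time order."""
    events = []
    for step in steps:
        for control_name, timing in step.items():
            events.append((timing["start"], "show", control_name))
            events.append((timing["end"], "hide", control_name))
    # At equal times, hide sorts before show so one highlight hands over cleanly.
    return sorted(events, key=lambda e: (e[0], e[1] != "hide"))


def poll_shadow_once(iot_client, thing_name):
    """Read desired.steps from the shadow, reset it, and return the events."""
    response = iot_client.get_thing_shadow(thingName=thing_name)
    shadow = json.loads(response["payload"].read())
    steps = shadow.get("state", {}).get("desired", {}).get("steps")
    if not steps:
        return []  # nothing to animate; the waiting state would poll again
    # Setting a shadow property to null removes it from the document.
    iot_client.update_thing_shadow(
        thingName=thing_name,
        payload=json.dumps({"state": {"desired": {"steps": None}}}),
    )
    return timeline(steps)


class FakeIotClient:
    """Stand-in for boto3.client('iot-data'); replaces instead of merging."""

    def __init__(self, steps):
        self.document = {"state": {"desired": {"steps": steps}}}

    def get_thing_shadow(self, thingName):
        return {"payload": io.BytesIO(json.dumps(self.document).encode())}

    def update_thing_shadow(self, thingName, payload):
        self.document = json.loads(payload)
        return {"payload": io.BytesIO(payload.encode())}


client = FakeIotClient([{"LeftTemperatureDial": {"start": 0, "end": 2}}])
events = poll_shadow_once(client, "XXXautoYYY")
```

Because the shadow is reset after each read, a second poll returns no events until the Lambda function writes a new set of steps.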

Step 7: Position the AR camera in the scene

Position the AR camera entity in the scene so that it faces the dashboard of the vehicle. I also scale the car so that the owner using the mobile application sees the control elements (dial, button, flap, display, sign, etc.) at a size that matches the physical vehicle.

Step 8: Publish the scene

Publish the scene and embed its URL into one of the example placeholder applications, which are open source and available on GitHub for both iOS and Android.

private let sceneURL = URL(string: "https://us-east-1.sumerian.aws/ABCDEFGHIJKLMNOPRSTUVWXYZ1234567.scene/?arMode=true&&a")!

Step 9: Link to the Amazon Lex bot

Last but not least, I add an Amazon Lex button from another example project on GitHub and link it to the published Amazon Lex bot from Step 4.

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        
        let credentialProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast1, identityPoolId: "us-east-1:STUVWXYZ-0000-1111-2222-LKJIHGFEDCBA")
        
        let configuration = AWSServiceConfiguration(region: AWSRegionType.USEast1, credentialsProvider: credentialProvider)
        AWSServiceManager.default().defaultServiceConfiguration = configuration
        
        let chatConfig = AWSLexInteractionKitConfig.defaultInteractionKitConfig(withBotName: "XXXAWSYYY", botAlias: "$LATEST")
        chatConfig.autoPlayback = true
        AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "AWSLexVoiceButton")
        AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "chatConfig")
        
        return true
    }

Step 10: Deploy the application

The final step is to deploy the application to an iOS device and test the functionality. The demo video can be seen in the AWS services section of this post.

Conclusion

This is not meant to be a comprehensive guide to every single component plugged in to the manual, but it describes all logical components. Based on this post, you should feel confident enabling and deploying 3D models of any assets that need an interactive manual with both visual and aural feedback into the cloud.

Your solution can use Sumerian and other AI, compute, or storage services. You now understand how these services integrate, what role they play in the experience and how they can be extended beyond the scope of this use case.

Start by reviewing the steps above, subscribe to the Amazon Sumerian video channel, read more about integrations with Amazon Lex and Amazon Polly and IoT Shadow, and get building!


About the Author

Miro Masat is a Solutions Architect at Amazon Web Services, based in London, UK. He focuses on engineering accounts, mainly in the automotive industry. Miro is a massive fan of virtual, augmented, and mixed reality and always seeks ways to bring engineering to VR/AR/MR and vice versa. Outside of work, he enjoys traveling, learning languages, and building DIY projects.