AWS Media Blog

How to: Build virtual environments integrated with Amazon Sumerian


2020 showed the importance of having tools and services that allow us to interact virtually in an efficient, safe, and cost-optimized way; yet we often continue to use traditional platforms. In this post, we show you how to build a serverless, pay-per-use solution that allows users to interact in a virtual environment with a smart host that shares information with them. For example, the solution can be used in interactive kiosks, as a presenter for specific products, or as a friendlier interface between the user and a traditional backend.


Amazon Sumerian is a managed service that allows you to create and run 3D scenes using Augmented Reality (AR) and Virtual Reality (VR) technology. It can run on VR devices, mobile devices, and browsers. This workflow is web-based, so all you need is a browser to start using it. It is also compatible with Oculus (Go and Rift) and HTC (Vive and Vive Pro).

In the following tutorial, we show you how to build a 3D scenario with the following elements:

  • a virtual room
  • a host who explains the use of the different elements within the room, and how to interact with them
  • a TV screen to show pre-recorded videos
  • a second screen where your webcam is activated to interact with your audience

In the backend, we build a complete multimedia signal chain to upload your videos (in any format and resolution) so they are processed and adapted to a streaming format for your scene.




Solution overview


The architecture uses serverless services, in which infrastructure management tasks like capacity provisioning and patching are handled by AWS, so you can focus on delivering value to your customers. Natively, the solution includes automatic scaling, built-in high availability, and a pay-per-use billing model.


1. Media ingest:

a. Videos are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
b. Video parameters are obtained using AWS Elemental MediaConvert (a service that processes video files and clips to prepare on-demand content for distribution or archiving).
c. Video parameters (name, width, height, resolution, bitrate, and path) are stored in Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond performance at any scale.
d. A topic is generated in Amazon Simple Notification Service (Amazon SNS) and an email is sent indicating the start of the video conversion.
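To make step 1c concrete, the stored record could be built as in the following sketch. The attribute names and resolution thresholds here are illustrative assumptions, not the solution's exact schema:

```javascript
// Hypothetical sketch of the metadata record written to DynamoDB in step 1c.
// Field names mirror the parameters listed above; the resolution labels and
// thresholds are assumptions, not the solution's exact logic.
function buildVideoItem(guid, name, info) {
  return {
    guid,                  // unique ID used as the table's partition key
    name,                  // source object key in the Amazon S3 bucket
    width: info.width,
    height: info.height,
    bitrate: info.bitrate,
    path: info.path,
    resolution: info.height >= 2160 ? '4K'
              : info.height >= 1080 ? '1080P'
              : '720P'
  };
}
```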


2. Video processing:

a. Using the video file parameters, the conversion profile is selected (original video in 4K, 1080P, or 720P). AWS Elemental MediaConvert then converts the videos into adaptive bitrate (ABR) DASH or HLS (the format used in this tutorial), as well as MP4.
b. The converted videos are stored in Amazon S3, and the metadata in Amazon DynamoDB is updated.
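The profile selection in step 2a amounts to mapping the source height to a MediaConvert job template. A minimal sketch follows; the template names are placeholders, not the solution's actual ones:

```javascript
// Illustrative only: pick a conversion profile from the source height (step 2a).
// The template names are hypothetical placeholders.
function selectJobTemplate(sourceHeight) {
  if (sourceHeight >= 2160) return 'Template_Ott_2160p';
  if (sourceHeight >= 1080) return 'Template_Ott_1080p';
  return 'Template_Ott_720p';
}
```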


3. Publishing:

a. A URL pointing to the location from which the video can be downloaded through an Amazon CloudFront distribution is stored in Amazon DynamoDB.
b. The original video file is stored in Amazon S3 Glacier, a low-cost, long-term archival storage solution with 99.999999999% (11 nines) durability.
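As an illustration of step 3a, the stored playback URL is simply the CloudFront distribution domain joined with the object key of the converted rendition. A hypothetical helper:

```javascript
// Illustrative helper: build the CloudFront playback URL stored in DynamoDB.
// distributionDomain is the CloudFront domain (a placeholder here);
// objectKey is the rendition's key in the destination Amazon S3 bucket.
function playbackUrl(distributionDomain, objectKey) {
  // Strip any leading slashes from the key before joining
  return `https://${distributionDomain}/${objectKey.replace(/^\/+/, '')}`;
}
```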



4. Consumption:

a. The virtual reality scene is delivered to your browser through an Amazon CloudFront content delivery network.
b. The user’s browser is assigned an unauthenticated role in Amazon Cognito, an access control solution for simple and secure user sign-up, and acquires permissions to run AWS services. In our case, we are going to use Amazon Polly, our text-to-speech deep learning service.
c. The browser executes the scene, and downloads the videos generated from Amazon CloudFront and the components required to animate the host.
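As a hedged sketch of step 4b: once AWS.config.credentials has been populated from the Cognito identity pool, the browser can call Amazon Polly through the AWS SDK for JavaScript. The helper below only builds the request parameters; the voice and text are placeholders:

```javascript
// Builds the parameter object for an Amazon Polly synthesizeSpeech call.
// VoiceId and Text are placeholders; in the scene, this object would be passed
// to new AWS.Polly().synthesizeSpeech(params, callback) after the Cognito
// identity pool has supplied unauthenticated credentials.
function pollyRequest(text) {
  return {
    OutputFormat: 'mp3',   // audio returned in the response's AudioStream
    TextType: 'text',      // plain text, not SSML
    VoiceId: 'Joanna',     // any Amazon Polly voice works here
    Text: text
  };
}
```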

The implementation of this workflow consists of two parts:

  1. Video on Demand on AWS Solution
  2. Amazon Sumerian component for virtual reality


Part 1: Video on Demand on AWS Solution

Ingestion, processing, and publication

We can perform these processes through a pre-built AWS solution:

1. Deploy the Video on Demand on AWS solution. The implementation guide can be found here.

a. Make sure to upload at least one video to the Amazon S3 source bucket so you can present it in your 3D scene. You can use this video for testing.

Once you implement the solution, the following workflow will be installed:


Part 2: Amazon Sumerian component for virtual reality

Consumption (scenes in Virtual Reality)

In the following steps, the virtual reality scene is built in Amazon Sumerian with these components:


The host of the scene

Amazon Sumerian has components called “Hosts”, which are characters with the following characteristics:

  • They are pre-built – it is only necessary to select the host and insert it into the scene.
  • Ready to use – the host has a set of pre-defined behaviors that are available without coding. Examples are:
            • Coordinated movement of the host (gestures and body language) depending on the situation.
            • Eye-tracking to the camera.
  • They run on AWS services, like Amazon Polly and Amazon Lex, our conversational AI service for chatbots.





The virtual room 

The virtual room is the environment in which the host and the objects interact.

The room characteristics are:

  • Two screens: one to display video and another to display the video transmission from our camera.
  • It includes various 3D components from the Amazon Sumerian repository such as surfaces, furniture, and textures.
  • Different types of light sources to create a more natural environment.






Service Integration

Amazon Sumerian generates virtual reality scenes viewable in a web browser. To provide the full immersive experience securely, we need to assign an AWS Identity and Access Management (IAM) Role to the scene, which then lets our users access additional AWS services used in this solution.

For this solution, we use Amazon Polly.






The complete scene can be downloaded for import into Amazon Sumerian from this link. If using this file, you’ll need to create the roles in Amazon IAM and the User Pool in Amazon Cognito.

Step 1 – Amazon Cognito

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily by creating Identity Pools.

To assign IAM Roles to an Amazon Sumerian scene

1. On the AWS Management Console, open Amazon Cognito.

2. On Manage Identity Pools, create a new Identity Pool.



3. Enter an Identity Pool Name and create the pool. Make sure to select Enable access to unauthenticated identities.



4. Two IAM roles are created: one authenticated and one unauthenticated. Write down the names of both roles, as they will be used later, then click on Allow.


5. The Pool and the roles are generated.



6. Click on Edit Identity Pool.



7. Make a note of the Identity Pool ARN; you will use it when configuring Amazon Sumerian.


Step 2 – Amazon IAM

In this step, the roles created are modified to provide the necessary permissions to access the Amazon Polly service. We will use the names of the roles from the previous step.

1. Enter Amazon IAM, and select Roles.



2. Click on the unauthenticated role name that was created on Step 1 – Amazon Cognito.



3. Click Attach policies.


4. Find and select the AmazonPollyReadAccess policy, then click on the Attach Policy button.
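For reference, AmazonPollyReadAccess is an AWS managed policy; at the time of writing, it grants read and synthesize access to Amazon Polly, roughly equivalent to the following policy document:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "polly:DescribeVoices",
        "polly:GetLexicon",
        "polly:ListLexicons",
        "polly:SynthesizeSpeech"
      ],
      "Resource": ["*"]
    }
  ]
}
```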


5. Validate that the policy was attached to the role correctly.


Step 3 – Amazon Sumerian (Configuration)

In this step, you create and configure the Amazon Sumerian scene, tying together the serverless backend and the rest of the AWS services used.

1. From the AWS Management Console, open Amazon Sumerian, and click on Create new scene.


2. On the upper right corner of the editor screen, configure the Amazon Cognito Identity Pool using the ID we obtained during Step 1 – Amazon Cognito.



Step 4 – Amazon Sumerian (Scene construction)

At this point, you can add as many components (elements like furniture or details) as you would like to the scene. The complexity of the scene may be limited by the processing power of your device.

Tip: To navigate within your 3D scene, use the following commands:

a. To move the scene around the camera: Left mouse button + drag
b. To select objects within the scene: Right mouse button + drag
c. To focus on an object: select it from the Entities panel, and press the F key on your keyboard.


1. All components in Amazon Sumerian are entities, which appear at the top of the Amazon Sumerian window. In this step, we import elements to generate entity instances.

2. Click on Import Assets.



3. A window opens with all the items you can use in Amazon Sumerian. In the upper right text entry box, type Room, select the object that is shown on the main window within the Amazon Sumerian console, and click Add to include the room in your scene.



4. In the Assets section, the Room element is displayed. Select the item room_ViewRoom.fbx, and drag it to the Amazon Sumerian main screen with the left mouse button.



5. Be sure that the room coordinates are 0,0,0, as shown on the screen.



6. Be sure that the XYZ axis is aligned to the lower left corner of the room.



7. Attach a new entity. Click on Create Entity.



8. Select a light source. Type will be Spot.



9. Set the following coordinates and characteristics for the light source.



10. The room should look as shown in the following image.



11. Two TV screens are integrated: one to watch your videos, and a second to transmit the video signal from the webcam. Go back to the Assets section, select Import Assets and search for Television Hanging. Select this asset by clicking Add.

12. Drag the object Television_Wall_Viewroom.fbx to the scene with the left mouse button. The coordinates of the object should be as shown in the following.



13. In order to add the second TV screen, repeat Step 12, but set the coordinates as noted in the following.



14. The result is two TV screens as shown in the following image.



15. In order to use the screen for video on demand (VoD), an HTML3D entity must be included, which allows us to show our media. Click on Create Entity, and select HTML3D. Use the following coordinates for the object.

16. Enter the Amazon DynamoDB service console, open the VideoOnDemand table, search the elements for the field mp4Outputs and take note of the URL of the previously transcoded video. The DynamoDB table was generated when we deployed the Video on Demand on AWS solution.
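If you retrieve the item programmatically (for example, with the DynamoDB DocumentClient), extracting the URL could look like this hypothetical helper. The mp4Outputs attribute name comes from the step above, but its exact shape in your table may differ:

```javascript
// Hypothetical helper: pull the first MP4 rendition URL out of an item
// returned from the VideoOnDemand table. Assumes mp4Outputs is an array
// of URL strings; returns null when no renditions are present.
function firstMp4Url(item) {
  const outputs = item.mp4Outputs || [];
  return outputs.length > 0 ? outputs[0] : null;
}
```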



17. Select the first HTML3D object (the TV screen for VoD), enter the value 300 in Width, and click Open in Editor.



18. Generate an iFrame with the following code. Click on Save.
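The code from the original screenshot is not reproduced here, but a minimal sketch of HTML3D content that plays the transcoded video might look like the following; the src URL is a placeholder for the value copied from the mp4Outputs field:

```html
<!-- Minimal sketch: replace src with the Amazon CloudFront URL copied from
     the mp4Outputs field in the VideoOnDemand DynamoDB table -->
<video width="100%" height="100%" controls autoplay muted
       src="https://EXAMPLE_DISTRIBUTION.cloudfront.net/path/to/video.mp4">
</video>
```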



19. For the Webcam screen, select the second screen and configure the following values.



20. Select Add Component, and click on Script.



21. Click on “+” and select Custom (Legacy Format).



22. Edit the script and put in the following code. Once the script is edited, click Save.




// Called when play mode starts
function setup(args, ctx) {
    const material = ctx.entity.meshRendererComponent.materials[0];
    new sumerian.TextureCreator().loadTextureWebCam().then((data) => {
        // Map the webcam texture onto the screen's material
        material.setTexture('DIFFUSE_MAP', data.texture);
        // Keep a reference to the media stream so it can be closed on cleanup
        ctx.mediaStream = data.stream;
    }, () => {
        console.error('Error loading webcam texture');
    });
}

// Called when play mode stops
function cleanup(args, ctx) {
    // Close the webcam media stream
    if (ctx.mediaStream) {
        ctx.mediaStream.getTracks()[0].stop();
    }
}



23. The playback control icons are at the bottom of the working screen. Click on the Play icon to play the scene.



24. Allow the use of the webcam, and validate the correct playback of the selected video on the first screen and the playback of the webcam on the second screen.



25. Next, the host is inserted. Select Import Assets, type Cristine Hoodie in the search box, and add the asset to the scene.

26. In the Assets window, select the object Cristine Hoodie with a hexagon on the left, and drag it to the scene. Place it at the following coordinates.



27. Expand the host characteristics, set Point of Interest to Look at Entity, and Target Entity to Fly Cam. This makes the host look at the camera during the scene.



28. The host can communicate with the audience to add interaction. To enable this, configure Speech with the following parameters within the host configuration.



29. In Speech Files, click on the “+”, and in the pop-up window type the text that you want the host to speak during the scene, then click Save.



30. Create a State Machine so that, the moment the scene starts, Amazon Polly makes the host speak.

31. Click + Add Component.



32. In the pop-up window, click Add Action.



33. Find and select Start Speech.



34. Configure Speech as shown in the following:


35. Configure the scene camera: select the object in the Assets window.



36. Position the camera using these coordinates.



Step 5 – Amazon Sumerian – publish the scene

1. Save the scene (Scene / Save). From the menu in the upper right corner, select Publish.

2. The scene is saved and published.



3. Once the scene is published, copy the URL of your scene and open it in the browser of your choice. Run the scene: select the video controls, observe the host’s expressions, and validate the webcam display.




To remove the installed components, delete the AWS CloudFormation stack that was created, as described in the following manual:

Note: Don’t forget to delete any files that you have stored on Amazon S3.


With this solution, we showed you how to configure a virtual reality scene using serverless technology with AWS services. This is just a starting point; you can extend the scene using services like Amazon Lex to chat with the host, or use the state machine to add multiple behaviors to the scene.



If you have questions, feedback, or would like to get involved in discussions with other community members, visit the AWS Developer Forums: Media Services.