Transforming audio and shared content in the Amazon Chime SDK for JavaScript

During the course of a virtual meeting, participants sometimes want to share rich media in addition to their standard audio and video. The Amazon Chime SDK enables customers to share media from other applications or browsers into a meeting, such as music for an online fitness class. You can share audio and video from your local computer, local media files, browser tabs, or other sources on the Internet.

However, audio and video sources might not always be ready to be shared with other participants. Audio might be too loud or too quiet, or otherwise need to be adjusted in real time. A video or presentation might need to be cropped or rotated.

One common need is to temporarily reduce the volume of music or the audio track of a video while an instructor or presenter speaks, or while a translator performs live translation. This process is called ducking. The Amazon Chime SDK for JavaScript allows you to define transform devices to express these steps, leveraging the power of Web Audio and a video transform pipeline.

This blog post shows you how to apply a Web Audio transform to microphone input and to content share audio. You then extend this logic into a ducker that can be used to automatically reduce content share volume while the user speaks.

Prerequisites

You understand the basics of how the Amazon Chime SDK is integrated into an application, and have read and satisfied the prerequisites of Building a Meeting Application using the Amazon Chime SDK.
You have the meetingV2 demo application checked out, running, and open in your development tools.
For testing on a single device you will need headphones and a browser that supports choosing audio output (e.g., Google Chrome).

Note: deploying and using the meeting demo code may incur AWS charges.

License

The code in this blog post is licensed under the terms of the Apache License 2.0, along with the rest of the code in the amazon-chime-sdk-js repo.

Getting started

The microphone input, the audio element output, and the WebRTC layer that Amazon Chime uses to exchange audio with other participants all use the web’s media stream abstraction to define the passage of audio through the application. Each stream consists of several tracks, either audio or video.

A diagram showing how camera and microphone inputs are media streams consumed by the Amazon Chime SDK, sent via the WebRTC peer connection

Web Audio is a web technology that connects input streams to a graph of audio nodes within an audio context. Those nodes can be used to transform or generate audio tracks. Each node has at least one input or at least one output, and often both. You can read more about audio nodes in the AudioNode documentation on MDN. The Amazon Chime SDK offers an equivalent concept for processing video; see the Video Processor documentation for more details.

A diagram showing how video pipeline processors and Web Audio nodes can be used to adjust camera and microphone input before the Amazon Chime SDK sends it via the WebRTC peer connection

You will use Web Audio to define an audio transform device that can adjust the volume of microphone input, and test it in a web browser using the meetingV2 demo.

You will use the same technique to modify the demo to adjust the volume of content share audio, and then link that to the demo’s existing realtime volume observer to adjust volume automatically.

Defining and using a simple audio transform

With the Amazon Chime SDK for JavaScript, much of the work needed to apply Web Audio nodes to microphone inputs is done for you. A JavaScript class that implements AudioTransformDevice can define changes to the constraints of another device or a graph of audio nodes that modify the input audio, and the device controller takes care of the rest. You can define a transform for a single AudioNode — the simplest case — by using the SingleNodeAudioTransformDevice abstract class.

To create an AudioTransformDevice that adjusts volume you can use a GainNode. A gain node takes a single stream as input, adjusts the volume, and emits a single stream as output. The audio transform device exposes a method to adjust the settings on the gain node. Your application code can directly select an instance of your new transform device using DeviceController.chooseAudioInputDevice, anywhere you would use a microphone device.

Create a new file, meetingV2/audiotransform/VolumeTransformDevice.ts with the following code:

import {
  SingleNodeAudioTransformDevice,
} from 'amazon-chime-sdk-js';

export class VolumeTransformDevice extends SingleNodeAudioTransformDevice<GainNode> {
  private volume: number = 1.0; // So we can adjust volume prior to creating the node.

  async createSingleAudioNode(context: AudioContext): Promise<GainNode> {
    const node = context.createGain();

    // Start at whatever volume the user already picked for this device, whether
    // or not we were connected to the audio graph.
    node.gain.setValueAtTime(this.volume, context.currentTime);
    return node;
  }

  setVolume(volume: number): void {
    this.volume = volume;
    if (this.node) {
      this.node.gain.linearRampToValueAtTime(volume, this.node.context.currentTime + 0.25);
    }
  }
}

You can try this out in the demo application by importing the above class in meetingV2.ts:

import { VolumeTransformDevice } from './audiotransform/VolumeTransformDevice';

and adding this fragment of code at the top of selectAudioInputDevice:

    // Let's interject volume control into the device selection process.
    if (!isAudioTransformDevice(device)) {
      const volumeTransformDevice = new VolumeTransformDevice(device);
      
      // Make `setVolume` visible to the page so we can change it!
      (window as any).setVolume = volumeTransformDevice.setVolume.bind(volumeTransformDevice);
      return this.selectAudioInputDevice(volumeTransformDevice);
    }

Rebuild and relaunch:

cd demos/browser; npm run start

Now put on some headphones and join your test meeting from two browser windows, making sure to check the Web Audio checkbox.

Mute one of the two windows, and in the other open the Console (in Firefox, click Tools → Web Developer → Web Console; in Chrome, View → Developer → JavaScript Console). You can tweak the volume as follows, and as you speak you will hear the volume change in the output from the other window.

window.setVolume(0.1);
window.setVolume(0.5);

This example is enough to illustrate the point, and I won’t go into detail here about how to build a user interface for this transform device. If you want to take this further, you could add an input slider to your HTML:

<input id="volume-in" type="range" min="0" max="1" step="0.1" value="1">

Instead of storing the function setVolume on the window object, attach it to the change handler for the input:

document.getElementById('volume-in').onchange = setVolume;

Your application probably uses a framework like Vue, React, or jQuery, but the concept remains the same — associate the change in an input element all the way through to calling linearRampToValueAtTime on the GainNode.

Changing the volume of content share

Some browsers, including Google Chrome, allow you to share the audio playing within a tab as well as its visual contents, or share your entire desktop alongside the audio playing through your computer speakers. You can use content share within the Amazon Chime SDK to share presentations, videos, or music with other participants.

Just like microphone and camera input, content share uses media streams. In the case of content share with audio, the stream contains both audio and video tracks.

The Amazon Chime SDK doesn’t support applying AudioTransformDevice transforms directly to content share, and Web Audio nodes don’t pass through video tracks. Instead, you can translate the AudioTransformDevice code to work directly on audio streams, then peel apart the combined stream into its tracks, apply the audio transform, and recombine them for use.

Create a new file, meetingV2/audiotransform/volume.ts. In that file, define a helper function to apply an AudioNode to a stream containing both video and audio tracks:

function addAudioNodeToCombinedStream(context: AudioContext, node: AudioNode, inputStream: MediaStream): MediaStream {
  const audioTracks = inputStream.getAudioTracks();

  // This is a new stream containing just the audio tracks from the input.
  const audioInput = new MediaStream(audioTracks);

  // These are the input and output nodes in the audio graph.
  const source = context.createMediaStreamSource(audioInput);
  const destination = context.createMediaStreamDestination();

  source.connect(node);
  node.connect(destination);

  // Now create a new stream consisting of the gain-adjusted audio stream
  // and the video tracks from the original input.
  const combinedStream = new MediaStream(destination.stream);
  for (const v of inputStream.getVideoTracks()) {
    combinedStream.addTrack(v);
  }

  return combinedStream;
}

Use that helper to apply a gain node to a content share stream:

import { DefaultDeviceController } from 'amazon-chime-sdk-js';

export function addAudioVolumeControlToStream(inputStream: MediaStream): { stream: MediaStream, setVolume?: (volume: number) => void } {
  // Handle the case where this is a silent screen share: just
  // return the input stream with no volume adjustment.
  if (!inputStream.getAudioTracks().length) {
    return { stream: inputStream };
  }

  // This is the Web Audio context to use for our audio graph.
  const audioContext: AudioContext = DefaultDeviceController.getAudioContext();

  // This node applies a gain to its input. Start it at 1.0.
  const gainNode = audioContext.createGain();
  gainNode.gain.setValueAtTime(1.0, audioContext.currentTime);

  // This function lets you adjust the volume. It uses a quick linear ramp
  // to avoid jarring volume changes.
  const setVolume = (to: number, rampSec = 0.25): void => {
    gainNode.gain.linearRampToValueAtTime(to, audioContext.currentTime + rampSec);
  }

  // Now apply the node to the stream using the helper.
  const stream = addAudioNodeToCombinedStream(audioContext, gainNode, inputStream);
  return {
    setVolume,
    stream,
  };
}

Import addAudioVolumeControlToStream in meetingV2.ts:

import { addAudioVolumeControlToStream } from './audiotransform/volume';

Now you can extend the definition of contentShareStart in meetingV2.ts to add volume adjustment to the content share stream. Replace the first switch case and amend the second, adding an import for ContentShareMediaStreamBroker to the import block at the top of the file:

import {
  …
  ContentShareMediaStreamBroker,
  …
} from 'amazon-chime-sdk-js';

…

    switch (this.contentShareType) {
      case ContentShareType.ScreenCapture: {
        const contentShareMediaStreamBroker = new ContentShareMediaStreamBroker(this.meetingLogger);
        const mediaStream = await contentShareMediaStreamBroker.acquireScreenCaptureDisplayInputStream();
        const { stream, setVolume } = addAudioVolumeControlToStream(mediaStream);
        (window as any).setVolume = setVolume;
        await this.audioVideo.startContentShare(stream);
        break;
      }
      case ContentShareType.VideoFile: {
        …
        const { stream, setVolume } = addAudioVolumeControlToStream(mediaStream);
        (window as any).setVolume = setVolume;
        await this.audioVideo.startContentShare(stream);
        break;
      }
    }

Rebuild and relaunch as before, and then share a tab with audio as seen in the example below:

As with the microphone input you transformed earlier, you can now use window.setVolume to adjust the volume of whatever is playing in your shared tab.

For more details on how to test these changes, see Testing on a single machine later in this post.

Bringing it all together: ducking content share during speech

With the code you just added you are able to adjust the volume of the content share audio stream. To implement ducking this adjustment needs to be triggered in real time as the user speaks.

The AudioVideoFacade interface exposes a real-time observer interface. You can use realtimeSubscribeToVolumeIndicator with an attendee ID to monitor that attendee’s input volume. This is the same observer that the demo app uses to show who is speaking.

Using the user’s own attendee ID gives us a neat interface for speech detection without having to monitor the microphone input. You can implement a similar approach by adding an AnalyserNode to the input graph — this is how the lobby view shows the animated microphone input preview.

If you use Amazon Voice Focus via a VoiceFocusTransformDevice on the user’s microphone, the volume should only represent human speech, because Amazon Voice Focus is designed to reduce environmental noise including most non-speech audio.

Calling the setVolume function with a very short ramp time to turn the volume down, and with a longer ramp time to restore the original volume, strikes a balance between response time and smoothness.

The following method does all of this, using the start and stop events on content share to enable and disable the behavior.

/**
 * Use the volume of the speaker to reduce the volume of content share.
 */
private configureDucking(setContentVolume: ((vol: number, rampSec?: number) => void)): void {
  const callback = async (
    _attendeeId: string,
    speakerVolume: number | null,
    _muted: boolean | null,
    _signalStrength: number | null
  ): Promise<void> => {
    if (speakerVolume > 0.1) {
      setContentVolume(0.1, 0.05);
    } else {
      setContentVolume(1.0, 0.5);
    }
  };
  
  const me = this.meetingSession.configuration.credentials.attendeeId;

  const observer: ContentShareObserver = {
    contentShareDidStart: () => {
      this.audioVideo.realtimeSubscribeToVolumeIndicator(me, callback);
    },

    contentShareDidStop: () => {
      this.audioVideo.realtimeUnsubscribeFromVolumeIndicator(me, callback);
      this.audioVideo.removeContentShareObserver(observer);
    },
  };

  this.audioVideo.addContentShareObserver(observer);
}

You can use this new method to reimplement contentShareStart in the meeting demo:

private async contentShareStart(videoUrl?: string): Promise<void> {
  const startAndDuck = async (mediaStream: MediaStream) => {
    const { stream, setVolume } = addAudioVolumeControlToStream(mediaStream);
    if (setVolume) {
      // This won't be set if this is a silent video stream.
      this.configureDucking(setVolume);
    }
    await this.audioVideo.startContentShare(stream);

    this.toggleButton('button-content-share', 'on');
    this.updateContentShareDropdown(true);
  };

  switch (this.contentShareType) {
    case ContentShareType.ScreenCapture: {
      const contentShareMediaStreamBroker = new ContentShareMediaStreamBroker(this.meetingLogger);
      const mediaStream = await contentShareMediaStreamBroker.acquireScreenCaptureDisplayInputStream();
      return startAndDuck(mediaStream);
    }

    case ContentShareType.VideoFile: {
      const videoFile = document.getElementById('content-share-video') as HTMLVideoElement;
      if (videoUrl) {
        videoFile.src = videoUrl;
      }
      return startAndDuck(await this.playToStream(videoFile));
    }
  }
}

With this code in place, content share volume will automatically be ducked when the user speaks. Here is a demo I recorded using this code.

Credit: NASA’s Goddard Space Flight Center (the SDO Team, Genna Duberstein and Scott Wiessinger, Producers)

Testing on a single machine

It is possible to test this scenario on a single computer, but it requires some preparation and two audio output devices. Built-in speakers and a USB or Bluetooth headset are enough.

You will use two windows in Google Chrome so that you can choose different audio devices. One will be for the presenter, and one will be for the listener. You will mute your built-in audio device so that you can clearly hear what’s happening, and also to prevent feedback (“howling”)

Setting up your audio output devices

Connect your headset to your computer. On a Mac you won’t be able to use wired headphones because the OS disables the speakers automatically when headphones are plugged into the headphone jack. A USB headset, an external audio device like a Thunderbolt dock, or Bluetooth headphones will work.

After connecting the device, make sure that your Mac’s built-in speakers are the default device.

On macOS you can do this by clicking the volume indicator in the menu bar and making sure “Internal Speakers” is checked. Click the device name to switch if needed.

A screenshot of the macOS sound device picker

On Windows, open Settings > System > Sound and choose your output device from the drop-down menu.

A screenshot of the Windows sound device picker

Setting up Google Chrome

Open two windows and position them next to each other. I put the presenter on the left and the listener on the right. Open your demo app twice: once in each window. Don’t join the meeting yet.

In the presenter’s window, open a new tab for content share. I used a NASA video, but rhythmic music works well too. Start the video playing. Make sure you can hear the audio through your speakers.

Muting the built-in audio output

Now turn down the volume on your default audio device. In most cases you can just hit the Mute key, or you can drag the volume slider to zero. This will prevent howling and allow you to only hear the content share audio on the listener, rather than hearing it through multiple devices at the same time.

Join the meeting

In the listener’s demo app tab, join the meeting and mute yourself by clicking the microphone at the top. Click the drop-down next to the speaker icon and choose your headset or headphones as output.

In the presenter’s demo app tab, join the same meeting, but leave the audio output as the default, which will be your muted built-in speakers.

When you speak, the unmuted microphone input in the presenter will pick up your voice, and the unmuted headphone output in the listener will allow you to hear your own voice in your headphones.

Now share content in the presenter window by clicking the Camera button. In the dialog box that appears, choose the tab that is playing video and audio, click the “Share audio” checkbox in the bottom left, and then click Share in the bottom right.

You should now still not hear anything from your built-in speakers, but you will hear both your voice and the previously muted content share audio through your headphones. When you speak, the content share audio will temporarily lower in volume.

Cleaning up

Remember to end your demo meeting by clicking the red button in the top right in either demo window. You can close the tabs you opened, disconnect your headphones, and unmute your speakers.

If you deployed the demo serverless application while following this post, remember to delete the CloudFormation stack to avoid incurring future costs.

Conclusion

In this blog post, I showed you how to define audio transforms for microphone input using the Web Audio API, and how to extend those transforms to apply to content share audio. You integrated both of these in the meetingV2 demo, and then went further, implementing ducking by monitoring speech volume using the real-time APIs available in the Amazon Chime SDK. Finally, you tested this by using two audio devices with two browser windows.

Now you can extend these capabilities into your own applications to build volume controls, as well as explore other kinds of transforms — panner nodes for spatial audio, distortion, reverb, filtering, and compression, and even generating audio programmatically. The MDN Web Audio API documents offer an excellent overview of what Web Audio can do.

You can learn more about audio transform devices and about Amazon Voice Focus, which uses these interfaces and technologies to implement noise suppression, in the Amazon Chime SDK documentation. Please reach out to us via the comments or via GitHub issues if you have questions or suggestions.

Author bio

Richard Newman is a Principal Engineer on the Amazon Chime team. He has a background in telephony and building web browsers.

Business Productivity