AWS Mobile Blog

Amplify CLI announces new GraphQL transform feature for orchestrating multiple AI/ML use cases

The Amplify Framework is an open source project for building cloud-enabled mobile and web applications.

The launch of the Predictions category in the Amplify Framework a few months ago enabled developers to easily add AI/ML use cases to their web applications. Use cases such as translating text from one language to another, generating speech from text, and others can be achieved with the Predictions category in a few lines of code. No machine learning experience is required. As exciting use cases are built with this feature, we see a pattern where multiple actions are combined to achieve a more powerful scenario. For example, you may want to translate text from one language to another and then have the translated text spoken in the target language. This can be achieved with the Predictions library today by calling each action separately.

Today, we are happy to share that Amplify now makes it simple to orchestrate multiple chained AI/ML actions by using a @predictions directive on a query field of your GraphQL schema. The Amplify CLI sets up the backend, policies, and configuration needed for every action in the directive without you configuring each one. You can then use the Amplify API category library to invoke a single GraphQL query operation and get the result of the chained inference calls. This simplifies your code and reduces the number of steps needed to orchestrate multiple chained actions, both in your frontend and your backend.

In this blog, we use the @predictions directive and the Amplify API library to build a React app that first identifies text in an image (English), then translates the text to another language (Spanish), and finally converts the translated text to speech (Spanish), all with one directive on a query field of your GraphQL schema. The directive sets up a GraphQL API endpoint with HTTP data sources for the Amazon AI/ML services corresponding to the individual actions: Amazon Rekognition for identifying text, Amazon Translate for translating text, and Amazon Polly for generating speech. It also sets up the IAM policies for each service and the AppSync VTL resolver functions.

The sequences of actions supported today are:

  • IdentifyText –> TranslateText –> ConvertTextToSpeech
  • IdentifyLabels –> TranslateText –> ConvertTextToSpeech
  • TranslateText –> ConvertTextToSpeech

In addition, these actions can be called individually.
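For example, the two-action chain can be declared on its own query field in your GraphQL schema (the field name speakTranslatedText here is hypothetical; use any field name you like):

```graphql
type Query {
    speakTranslatedText: String @predictions(actions: [ translateText convertTextToSpeech ])
}
```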

Here is how a sample flow for the actions IdentifyText followed by TranslateText looks at a high level:

The application UI we are building looks like this:

The upload action in both cases stores the images in an S3 bucket provisioned by the Amplify CLI.

Prerequisites

Install Node.js and npm if they are not already installed on your machine.
Note: At the time of writing this blog, the minimum required versions are Node.js 8.x and npm 5.x.

Install and configure the Amplify CLI:

$ npm install -g @aws-amplify/cli
$ amplify configure

The configure step guides you through creating a new IAM user. Select the default options. If you already have the CLI configured, you do not need to run the configure command again.

Initialize the project

For this blog, let’s assume you have a React application. If you do not have one, you can create one using the following commands:

$ npx create-react-app coolapp
$ cd coolapp

From the root of your project folder, run the following command and accept the defaults where applicable:

$ amplify init
? Enter a name for the project: coolapp
? Enter a name for the environment: dev
? Choose your default editor: Visual Studio Code
? Choose the type of app that you're building: javascript
? What javascript framework are you using: react
? Source Directory Path:  src
? Distribution Directory Path: build
? Build Command:  npm run-script build
? Start Command: npm run-script start
? Do you want to use an AWS profile? Yes
? Please choose the profile you want to use: default 

Using the @predictions directive in your GraphQL schema

The @predictions directive is added to a query field in your GraphQL schema. The Amplify CLI provisions the backend resources for the actions listed in the directive. In our sample, we want to identify text in an image, translate it to another language, and then convert the translated text to speech.

First, let us add an API to our backend using the amplify add api command. This creates a GraphQL endpoint that communicates with the HTTP endpoints for the services corresponding to each action in our directive.

$ amplify add api
? Please select from one of the below mentioned services: GraphQL
? Provide API name: coolapp
? Choose the default authorization type for the API API key
? Enter a description for the API key: predictionstest
? After how many days from now the API key should expire (1-365): 7
? Do you want to configure advanced settings for the GraphQL API No, I am done.
? Do you have an annotated GraphQL schema? No
? Do you want a guided schema creation? No
? Provide a custom type name MyType

GraphQL schema compiled successfully.

Edit your schema at /Users/nikdabn/tests/predictivetest/coolapp/amplify/backend/api/coolapp/schema.graphql
Successfully added resource coolapp locally

Edit your schema.graphql file to add the predictions directive as shown below.

type Query {
    speakTranslatedImageText: String @predictions(actions: [ identifyText translateText convertTextToSpeech ])
    speakLabels: String @predictions(actions: [ identifyLabels convertTextToSpeech ])
}

As you can see, multiple actions can be added to the directive, and they are executed in the order you list them.

Once you have updated your schema file, run the gql-compile command to verify that it compiles correctly:

$ amplify api gql-compile

You can also invoke single actions using the @predictions directive. For example, if you only want to translate text, you can add the following field to the Query type in your GraphQL schema:

translateThis: String @predictions(actions: [ translateText ])
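Once pushed, this single-action field is called the same way as a chained one: you pass an input object whose keys match the actions in the directive. A minimal sketch of the input shape (the helper name buildTranslateInput is ours, for illustration):

```javascript
// Build the input object for the single-action translateThis query.
// The shape mirrors the per-action input convention used by @predictions.
function buildTranslateInput(text, sourceLanguage = "en", targetLanguage = "es") {
  return { translateText: { sourceLanguage, targetLanguage, text } };
}

// You would then invoke the generated query with the Amplify API library, e.g.:
// API.graphql(graphqlOperation(translateThis, { input: buildTranslateInput("welcome") }));
```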

Add Storage

We add storage to hold the images that we identify text from. Run the following command from the terminal:

$ amplify add storage
? Please select from one of the below mentioned services: Content (Images, audio, video, etc.)
? You need to add auth (Amazon Cognito) to your project in order to add storage for user files. Do you want to add auth now? Yes
 Do you want to use the default authentication and security configuration? Default configuration
 How do you want users to be able to sign in? Username
 Do you want to configure advanced settings? No, I am done.
Successfully added auth resource

? Please provide a friendly name for your resource that will be used to label this category in the project: s3a7bced65
? Please provide bucket name: mypredictionsbucket
? Who should have access: Auth and guest users
? What kind of access do you want for Authenticated users? create/update, read, delete
? What kind of access do you want for Guest users? create/update, read, delete
? Do you want to add a Lambda Trigger for your S3 Bucket? No

Push your changes to the cloud

$ amplify push

The push command will provision the backend in the cloud.

Making query calls from your client application

First, install the dependencies by running the following command from the root of your application folder.

$ npm install aws-amplify 

Replace the code in src/App.js with the following:

import React, { useState } from 'react';
import './App.css';
import API, { graphqlOperation } from '@aws-amplify/api';
import Amplify, { Storage } from 'aws-amplify';
import awsconfig from './aws-exports';
import { speakTranslatedImageText,speakLabels } from './graphql/queries';

/* Configure Exports */
Amplify.configure(awsconfig);

function SpeakTranslatedImage() {
  const [ src, setSrc ] = useState("");
  const [ img, setImg ] = useState("");
  
  // Upload the selected file to S3, then run the chained prediction query on it
  function putS3Image(event) {
    const file = event.target.files[0];
    Storage.put(file.name, file)
      .then(async (result) => {
        setSrc(await speakTranslatedImageTextOP(result.key));
        setImg(await Storage.get(result.key));
      })
      .catch(err => console.log(err));
  }


  return (
    <div className="Text">
      <div>
        <h3>Upload Image</h3>
        <input
          type="file" accept="image/jpeg"
          onChange={(event) => putS3Image(event)}
        />
        <br />
        {img && <img src={img} alt="uploaded" />}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

function SpeakLabels() {
  const [ src, setSrc ] = useState("");
  const [ img, setImg ] = useState("");
  
  // Upload the selected file to S3, then run the label prediction query on it
  function putS3Image(event) {
    const file = event.target.files[0];
    Storage.put(file.name, file)
      .then(async (result) => {
        setSrc(await speakLabelsOP(result.key));
        setImg(await Storage.get(result.key));
      })
      .catch(err => console.log(err));
  }

  return (
    <div className="Text">
      <div>
        <h3>Upload Image with Labels</h3>
        <input
          type="file" accept="image/jpeg"
          onChange={(event) => putS3Image(event)}
        />
        <br />
        {img && <img src={img} alt="uploaded" />}
        {src && (
          <div>
            <audio id="audioPlayback" controls>
              <source id="audioSource" type="audio/mp3" src={src} />
            </audio>
          </div>
        )}
      </div>
    </div>
  );
}

async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: { sourceLanguage: "en", targetLanguage: "es" },
    identifyText: { key },
    convertTextToSpeech: { voiceID: "Mia" }
  };
  const response = await API.graphql(graphqlOperation(speakTranslatedImageText, { input: inputObj }));
  console.log(response.data);
  return response.data.speakTranslatedImageText;
}

async function speakLabelsOP(key) {
  const input = { identifyLabels: { key }, convertTextToSpeech: { voiceID: "Geraint" } };
  const response = await API.graphql(graphqlOperation(speakLabels, { input }));
  console.log(response);
  return response.data.speakLabels;
}

function App() {
  return (
    <div className="App">
      <h1>Speak Translated Image</h1>
      <SpeakTranslatedImage />
      <h1>Speak Labels</h1>
      <SpeakLabels />
    </div>
  );
}

export default App;

Let’s take a look at the important pieces here:

async function speakTranslatedImageTextOP(key) {
  const inputObj = {
    translateText: { sourceLanguage: "en", targetLanguage: "es" },
    identifyText: { key },
    convertTextToSpeech: { voiceID: "Mia" }
  };
  const response = await API.graphql(graphqlOperation(speakTranslatedImageText, { input: inputObj }));
  console.log(response.data);
  return response.data.speakTranslatedImageText;
}

As you can see, we call a GraphQL query using the Amplify API library. We pass an input object to the API call which contains:

  1. The translate text action, which specifies the source and target languages, in this case English and Spanish, respectively.
  2. The identify text action, which passes in the S3 key of the image to identify text in.
  3. The convert to speech action, which specifies the voice ID of the speaker, “Mia”. The list of supported voices in different languages can be found in the Amazon Polly documentation.

Save the file. Start the app from the terminal by running the command:

$ npm start

This will open up the application:

Next, we choose a picture that has text in it and upload it for the first use case:

The image is uploaded to the S3 bucket that was provisioned by the CLI. Then, a call to the HTTP endpoint for Amazon Rekognition is made through the GraphQL endpoint to identify the text in the image. Next, a similar HTTP call is made to Amazon Translate to translate the identified text to Spanish. Finally, a call is made to Amazon Polly to convert the translated text to speech.
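Conceptually, the chained query behaves like function composition: each action’s output becomes the next action’s input. A simplified sketch with stub implementations (the real calls are made by the generated AppSync resolvers and AWS services, not by client code):

```javascript
// Stubs standing in for the Rekognition, Translate, and Polly calls
const identifyText = (s3Key) => "welcome";                          // text found in the image
const translateText = (text) => ({ welcome: "bienvenida" }[text]);  // English to Spanish
const convertTextToSpeech = (text) => `audio/${text}.mp3`;          // a presigned audio URL in reality

// The chained query is equivalent to composing the actions in order
function speakTranslatedImageText(s3Key) {
  return convertTextToSpeech(translateText(identifyText(s3Key)));
}
```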

When you click the play button below the image, it plays the audio for the translated text in Spanish, saying “bienvenida”.

Next, let’s test the use case of speaking labels in an image.

Upload the following image in the “Speak Labels” section:

When you click play, it identifies the following labels in the image and plays the audio in the voice of the selected speaker: “outdoor”, “nature”, “water”, “boat”, “vehicle”, “transportation”, “rowboats”, “mountains”, “scenery”, and “landscape”.

Conclusion

We were able to use the @predictions directive on the query fields of a GraphQL schema and invoke multiple chained AI/ML actions with a single API query call. The Amplify CLI provisioned the GraphQL API and set up the HTTP endpoints as data sources, the IAM policies needed to interact with those endpoints, and the AppSync VTL resolvers for the individual actions against the respective services.

Feedback

We hope you like these new features! Let us know how we are doing, and submit any feedback in the Amplify Framework GitHub repository. You can read more about this feature on the Amplify Framework website.