Field Notes: Inference C++ Models Using SageMaker Processing

Machine learning has existed for decades. Before Python became the prevalent language for machine learning, many other languages, such as Java and C++, were used to build models. Refactoring legacy models written in C++ or Java can be prohibitively expensive and time-consuming. Customers need to know how they can bring their legacy C++ models to the cloud so that they can run model inference faster and at a lower cost.

Amazon SageMaker Processing is a capability of Amazon SageMaker for running data processing and model evaluation workloads in a fully managed environment. It lets customers run data engineering and model evaluation jobs on Amazon SageMaker easily and at scale, with all the security and compliance built into Amazon SageMaker.

In this blog post, we demonstrate how to run inference on a C++ model using SageMaker Processing. We first explain the C++ program that represents a simple linear regression model and the Python script that runs inference. Then, we build a custom container that contains the C++ model and the Python script. Lastly, we run a SageMaker ScriptProcessor job for inference. The code from this post is available in the GitHub repo.

Prerequisites

To run this code, you need to have permissions to access Amazon S3, push a Docker image to Amazon ECR, and create SageMaker Processing jobs.

Prepare a C++ Model

We use a simple C++ program for demonstration purposes. It accepts input data as a single string of comma-separated values. For example, “2,3” represents one row of input data, with 2 in the first column and 3 in the second. A longer string such as “2,3,4,5” contains several rows, two values per row.

We use a simple linear regression model, y = x1 + x2, in this blog post for demonstration purposes. Customers can modify the C++ code to run inference on more realistic and sophisticated models. The C++ code performs the following steps:

Receives the data records for inference as a command-line argument.
Splits the argument on “,” into a vector of column values.
Loops through the values and sums each pair of columns (x1 + x2) to produce one prediction per row.
Prints the predictions to the standard output stream as a comma-separated list.
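To make the four steps concrete, here is a minimal Python sketch that mirrors the C++ logic. It is not part of the original solution: reference_predict and format_output are hypothetical helper names, and the sketch assumes a non-empty string of integer values. A reference like this can be handy for spot-checking the executable's output.

```python
from typing import List


def reference_predict(arg: str) -> List[int]:
    """Python mirror of the C++ steps, for spot-checking a.out (integer inputs assumed)."""
    # Step 2: split the argument on "," and parse each column value.
    values = [int(tok) for tok in arg.split(",")]
    # Step 3: walk the values two at a time and sum each (x1, x2) pair. An unpaired
    # trailing value is ignored, as in the C++ loop, which only emits after column 2.
    return [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]


def format_output(predictions: List[int]) -> str:
    """Step 4: comma-separated with no trailing newline, like the C++ print()."""
    return ",".join(str(p) for p in predictions)


print(format_output(reference_predict("2,3,4,5")))  # prints 5,9
```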

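We compile the C++ program into an executable with g++. When no -o option is given, g++ names the output a.out, which is the file name that the Dockerfile and the processing script use. Compile it on Linux x86-64 so that the binary runs in the processing container. The complete C++ program is shown in the following code: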

#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Print the predictions to stdout as a comma-separated list (no trailing newline).
void print(const std::vector<int>& input)
{
    for (std::size_t i = 0; i < input.size(); i++)
    {
        std::cout << input[i];
        if (i + 1 != input.size())
            std::cout << ',';
    }
}

// Split a string into tokens on the given delimiter.
std::vector<std::string> split(const std::string& s, char delimiter)
{
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(s);
    while (std::getline(tokenStream, token, delimiter))
    {
        tokens.push_back(token);
    }
    return tokens;
}

int main(int argc, char* argv[])
{
    // All records arrive as one comma-separated argument, for example
    // "2,3,4,5" for the two rows (2, 3) and (4, 5).
    if (argc < 2)
    {
        std::cerr << "usage: a.out x1,x2[,x1,x2...]" << std::endl;
        return 1;
    }

    std::vector<int> result;
    int counter = 0;      // column index within the current row (0 or 1)
    int result_temp = 0;  // running sum y = x1 + x2 for the current row

    for (const std::string& field : split(argv[1], ','))
    {
        int temp_int = 0;
        std::istringstream field_stream(field);
        field_stream >> temp_int;
        result_temp += temp_int;

        if (counter == 0)
        {
            counter = 1;  // first column seen; wait for the second one
            continue;
        }

        // Second column: the row is complete, so store its prediction.
        result.push_back(result_temp);
        result_temp = 0;
        counter = 0;
    }
    print(result);
    return 0;
}

Create a SageMaker Processing script

The notebook for this post uses the ScriptProcessor class from the Amazon SageMaker Python SDK. ScriptProcessor runs a Python script in your own Docker image to process input data, and saves the processed data to Amazon S3. For more information, review Run Scripts with Your own Processing Container.

When the processing job starts, SageMaker automatically downloads the data files from Amazon S3 to the designated local directory on each processing instance.

Your Python script, process_script.py, first finds all CSV data files under the /opt/ml/processing/input/ directory. By default (s3_data_distribution_type='FullyReplicated'), when you use multiple instances, the data files from S3 are duplicated to each processing instance, so every instance gets the full dataset. If you set s3_data_distribution_type='ShardedByS3Key' on the ProcessingInput, each instance instead receives approximately 1/n of the input data files, where n is the number of instances. Because sharding works at the file level, partition the input data into multiple files so that each instance processes a different subset.
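The advice to partition input into multiple files can be sketched with the Python standard library. This is a hypothetical helper: partition_rows and the part_NNN.csv naming are our own choices, not part of the original code. It splits rows, in order, into n headerless CSV files of near-equal size, and it only splits between rows, so every file keeps complete (x1, x2) pairs for the C++ model.

```python
import csv
import os
from typing import List, Sequence


def partition_rows(rows: Sequence[Sequence[object]], n_files: int, out_dir: str) -> List[str]:
    """Split rows, in order, into n_files headerless CSV files of near-equal size."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    base, extra = divmod(len(rows), n_files)
    paths = []
    start = 0
    for k in range(n_files):
        # The first `extra` files take one additional row each.
        size = base + (1 if k < extra else 0)
        path = os.path.join(out_dir, "part_{:03d}.csv".format(k))
        with open(path, "w", newline="") as fh:
            # Whole rows only, so every file keeps complete (x1, x2) pairs.
            csv.writer(fh).writerows(rows[start:start + size])
        paths.append(path)
        start += size
    return paths
```

You would then upload the output directory to S3 and pass its prefix as the ProcessingInput source. Choosing n_files as a multiple of instance_count keeps each instance's share of files roughly even.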

The Python script reads each data file into memory and flattens it, row by row, into a single comma-separated string for the C++ executable to consume. It then uses Python's subprocess module to run the executable and capture its output, and saves the predictions as a CSV file in the /opt/ml/processing/output directory. When the job completes, SageMaker Processing uploads the files in this directory from every processing instance to Amazon S3.
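Before the full script, here is a minimal, self-contained sketch of the subprocess pattern it relies on, which you can run locally without compiling the C++ code. FAKE_MODEL and run_model are hypothetical names: FAKE_MODEL is a small Python program standing in for a.out, and run_model is a variant of the script's call_one_exe that takes the command as a parameter and raises an error on a non-zero exit code.

```python
import subprocess
import sys
from typing import List

# Hypothetical stand-in for a.out: a tiny Python program that, like the C++ model,
# sums consecutive (x1, x2) pairs from its first argument and prints them comma-separated.
FAKE_MODEL = (
    "import sys\n"
    "v = [int(t) for t in sys.argv[1].split(',')]\n"
    "print(','.join(str(v[i] + v[i + 1]) for i in range(0, len(v) - 1, 2)), end='')\n"
)


def run_model(cmd: List[str], arg: str) -> List[str]:
    """Run cmd with arg as its final argument and split its comma-separated stdout."""
    proc = subprocess.run(cmd + [arg], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        # Surface the executable's error output instead of returning partial results.
        raise RuntimeError(proc.stderr.decode("utf-8"))
    out = proc.stdout.decode("utf-8").strip()
    return out.split(",") if out else []


print(run_model([sys.executable, "-c", FAKE_MODEL], "2,3,4,5"))  # prints ['5', '9']
```

Inside the container, the same call would use the path of the compiled executable in place of the stand-in command.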

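import argparse
import os
import subprocess
from glob import glob

import pandas as pd

# Path of the C++ executable inside the container (the Dockerfile copies a.out to /).
EXE_PATH = '/a.out'


def call_one_exe(a):
    """Run the C++ model on one comma-separated input string and return its predictions."""
    p = subprocess.Popen([EXE_PATH, a], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p_out, err = p.communicate()
    if p.returncode != 0:
        raise RuntimeError(err.decode('utf-8'))
    output = p_out.decode('utf-8').strip()
    return output.split(',')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Users can pass their own arguments from the Processor (the `arguments` parameter of run()).
    args, _ = parser.parse_known_args()
    print('Received arguments {}'.format(args))

    # Sort so that the i-th output file always corresponds to the same input file.
    files = sorted(glob('/opt/ml/processing/input/*.csv'))
    for i, f in enumerate(files):
        try:
            print(f)
            data = pd.read_csv(f, header=None)
            # Flatten row by row: [[2, 3], [4, 5]] -> "2,3,4,5"
            string = ','.join(str(v) for v in data.values.flat)
            predictions = call_one_exe(string)
            output_path = os.path.join('/opt/ml/processing/output', str(i) + '_out.csv')
            print('Saving predictions to {}'.format(output_path))
            pd.DataFrame({'results': predictions}).to_csv(output_path, header=False, index=False)
        except Exception as e:
            print(str(e))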
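The script passes an entire data file to a.out as one command-line argument. On typical Linux systems a single argument is limited to about 128 KiB (MAX_ARG_STRLEN), so a very large file can make the call fail. One possible workaround, sketched below as a hypothetical helper that is not part of the original code, is to split the rows into several argument strings that each hold only complete rows, call the executable once per batch, and concatenate the predictions in order. The 100,000-character budget is our assumption, chosen to leave headroom below the limit.

```python
from typing import Iterator, List, Sequence

# Assumed budget per argument, kept well below the ~128 KiB Linux per-argument limit.
DEFAULT_MAX_ARG_BYTES = 100_000


def batch_arguments(rows: Sequence[Sequence[int]],
                    max_bytes: int = DEFAULT_MAX_ARG_BYTES) -> Iterator[str]:
    """Yield comma-separated argument strings that each contain only whole rows."""
    batch: List[str] = []
    size = 0  # length of ",".join(batch)
    for row in rows:
        encoded = ",".join(str(v) for v in row)
        # +1 for the comma that joins this row to the previous one.
        extra = len(encoded) + (1 if batch else 0)
        if batch and size + extra > max_bytes:
            yield ",".join(batch)
            batch, size = [], 0
            extra = len(encoded)
        # A single row longer than max_bytes still becomes its own batch.
        batch.append(encoded)
        size += extra
    if batch:
        yield ",".join(batch)
```

Because every batch ends on a row boundary, the C++ code still sees complete (x1, x2) pairs in each call.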

Build your own SageMaker Processing container

The processing container has Miniconda and pandas installed. a.out is the compiled C++ executable that contains the model inference logic, and process_script.py is the Python script that calls the executable and saves the results; both are described in the previous sections. Build the Docker image and push it to Amazon ECR. The Dockerfile looks like the following code:

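FROM ubuntu:16.04

# Compiler toolchain and utilities (build-essential provides g++)
RUN apt-get update && \
    apt-get -y install build-essential libatlas-dev git wget curl

# Install Miniconda into /miniconda3
RUN curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
    rm Miniconda3-latest-Linux-x86_64.sh

ENV PATH=/miniconda3/bin:${PATH}

# -y answers conda's confirmation prompt, which docker build cannot answer interactively
RUN conda update -y conda && \
    conda install -y -c anaconda scipy

# Python won’t try to write .pyc or .pyo files on the import of source modules
# Force stdout and stderr to be unbuffered. Good for logging
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8

# These pins predate recent Python releases; if pip cannot install them on the
# Python version the latest Miniconda ships, pin a matching Python or relax the pins.
RUN pip install --no-cache-dir -I scikit-learn==0.20.0 pandas==1.0.3 boto3 sagemaker retrying

# The processing script and the C++ model (a Linux x86-64 binary compiled with g++)
ADD process_script.py /
ADD a.out /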

Set up the `ScriptProcessor` and run your script

We include 10 sample data files in this demo, each containing 5000 rows of arbitrarily generated data. We first upload these files to Amazon S3. We use one ml.c5.xlarge instance for inference; you can increase the instance count for a bigger dataset. Amazon SageMaker Processing runs the script in a similar way to the following command, where EntryPoint is the command passed to ScriptProcessor (python3) together with process_script.py, and ImageUri is the Docker image we built earlier.

docker run --entrypoint [EntryPoint] [ImageUri]
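The sample files are in the GitHub repo. If you want to create a dataset of the same shape yourself (10 headerless CSV files with 5000 rows and two integer columns each, which is what process_script.py reads), a hypothetical generator like the following works. The file names, value range, and fixed seed are our assumptions, not how the original data was produced.

```python
import csv
import os
import random
from typing import List


def generate_sample_files(out_dir: str, n_files: int = 10, n_rows: int = 5000,
                          low: int = 0, high: int = 100, seed: int = 0) -> List[str]:
    """Write n_files headerless CSVs, each with n_rows random (x1, x2) integer pairs."""
    rng = random.Random(seed)  # fixed seed so the files are reproducible
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for k in range(n_files):
        path = os.path.join(out_dir, "data_{}.csv".format(k))
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            for _ in range(n_rows):
                # randint is inclusive on both ends.
                writer.writerow([rng.randint(low, high), rng.randint(low, high)])
        paths.append(path)
    return paths
```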

The SageMaker Processing job is set up as follows:

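import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

role = get_execution_role()
default_s3_bucket = sagemaker.Session().default_bucket()
# Account that owns the ECR repository holding the cpp_processing image
Account_number = boto3.client('sts').get_caller_identity()['Account']
# input_data is the S3 URI of the sample files uploaded earlier

script_processor = ScriptProcessor(command=['python3'],
                                   image_uri=Account_number + '.dkr.ecr.us-east-1.amazonaws.com/cpp_processing:latest',
                                   role=role,
                                   instance_count=1,
                                   base_job_name='run-exe-processing',
                                   instance_type='ml.c5.xlarge')

output_location = 's3://{}/processing_output'.format(default_s3_bucket)
script_processor.run(code='process_script.py',
                     # With instance_count > 1, add s3_data_distribution_type='ShardedByS3Key'
                     # to give each instance a different subset of the files.
                     inputs=[ProcessingInput(source=input_data,
                                             destination='/opt/ml/processing/input')],
                     outputs=[ProcessingOutput(source='/opt/ml/processing/output',
                                               destination=output_location)])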

After the processing job starts, Amazon SageMaker displays the job's progress and reports information such as the job name and the input and output locations. When the job completes, we can review a few rows of the output to make sure that the processing job was successful.

print('Top 5 rows from 0_out.csv')
!aws s3 cp $output_location/0_out.csv - | head -n5
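Looking at the first rows is a quick sanity check. To verify a whole file, you can compare each prediction with x1 + x2 computed from the matching input file. The sketch below is hypothetical: check_predictions is our own helper, and it assumes you have downloaded an input file and its output file to local paths (for example, with aws s3 cp) and that the input has two integer columns.

```python
import csv
from typing import List


def check_predictions(input_csv: str, output_csv: str) -> List[int]:
    """Return the 0-based row indexes where the prediction differs from x1 + x2."""
    with open(input_csv, newline="") as fh:
        expected = [int(r[0]) + int(r[1]) for r in csv.reader(fh) if r]
    with open(output_csv, newline="") as fh:
        actual = [int(r[0]) for r in csv.reader(fh) if r]
    if len(expected) != len(actual):
        raise ValueError("row count mismatch: {} inputs vs {} outputs".format(len(expected), len(actual)))
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
```

An empty list means every row in the output file matches the model y = x1 + x2.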

Conclusion

In this post, we used Amazon SageMaker Processing to run inference on C++ models. Customers can bring legacy C++ models to SageMaker for faster inference at a lower cost. For more information, review Amazon SageMaker Processing.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

AWS Architecture Blog