IBM & Red Hat on AWS

IBM Granite Code Models can now be deployed in Amazon Bedrock and Amazon SageMaker

IBM has open-sourced its Granite Code Models, which are now available in Amazon SageMaker JumpStart and can also be deployed in Amazon Bedrock using Custom Model Import. These models provide developers and data scientists with generative AI capabilities for code generation.

IBM Granite AI foundation models are designed to be cost-efficient and enterprise-ready. They use a decoder-only architecture, generating new text from the input prompt, and are trained on a diverse dataset of code spanning 116 programming languages.

Granite Code models perform well across various code-related tasks, including generation, explanation, fixing, editing, and translation. Trained on license-compliant data, they follow IBM’s AI ethics principles and legal standards, ensuring responsible use. These models emphasize trust and transparency, providing reliable solutions for enterprise software development.

Choosing the optimal model depends on the use case, cost, and performance needs. Larger models, such as the 20B and 34B variants, are better suited to complex tasks and offer higher accuracy; the 20B model, for example, is often used for COBOL. The smaller 3B and 8B models, with their larger 128K-token context windows, can handle larger code blocks; the 8B model is commonly used for tasks such as general code completion and documentation.

The following models are available on AWS:

  • Granite-3B-Code-Instruct-128K: A 3-billion-parameter model specifically designed for solving long-context problems up to 128K tokens, making it ideal for coding-related tasks and building AI assistants.
  • Granite-8B-Code-Instruct-128K: An 8-billion-parameter model exposed to both short and long context data. It aims to enhance long-context capability while maintaining strong code generation performance for short input contexts, making it highly versatile for a range of tasks.
  • Granite-20B-Code-Instruct-8K: A 20-billion-parameter model optimized for logical reasoning and problem-solving capabilities.
  • Granite-34B-Code-Instruct-8K: A 34-billion-parameter model intended for creating sophisticated coding assistants capable of tasks such as writing, debugging, and code comprehension.

Deploy IBM Granite Code models with Amazon Bedrock through Custom Model Import

Amazon Bedrock is a fully managed service offering access to foundation models (FMs) from leading AI providers. It includes a comprehensive set of tools for building generative AI applications, with security, privacy, and responsible AI features.

Customers can now use IBM Granite models in Amazon Bedrock, the easiest place to build and scale generative AI applications with foundation models. Through Amazon Bedrock’s Custom Model Import, customers can import Granite models and use them alongside other foundation models through a single, unified API, without the overhead of managing model lifecycles and infrastructure. This flexibility delivers more value from your model customization efforts and seamlessly integrates Granite models into your applications built on Amazon Bedrock.

Foundation models can be imported from Amazon SageMaker, or Amazon Simple Storage Service (Amazon S3). When using Amazon S3, customers download the model weights, upload them to an S3 bucket, and then import them into Bedrock. For detailed instructions on creating a custom model through the import process, refer to the Amazon Bedrock documentation.
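The S3 import path described above can be sketched in Python with the Bedrock `CreateModelImportJob` API. In this sketch the bucket name, job name, model name, and role ARN are all placeholders for your own resources; the boto3 call itself is shown commented out because it requires AWS credentials and a role Bedrock can assume:

```python
import json

def build_import_job_request(job_name, model_name, role_arn, s3_uri):
    """Assemble the request for Bedrock's CreateModelImportJob API.
    All names and ARNs here are placeholders for your own resources."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,  # IAM role Bedrock assumes to read the weights
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }

# Hypothetical bucket/prefix where the Granite model weights were uploaded
request = build_import_job_request(
    job_name="granite-code-import-job",
    model_name="granite-8b-code-instruct",
    role_arn="arn:aws:iam::123456789012:role/BedrockModelImportRole",
    s3_uri="s3://my-model-bucket/granite-8b-code-instruct/",
)
print(json.dumps(request, indent=2))

# With AWS credentials configured, the job is started with boto3:
# import boto3
# bedrock = boto3.client("bedrock")
# response = bedrock.create_model_import_job(**request)
```

Once the import job completes, Bedrock assigns the model an ARN that you use in place of a model ID when invoking it.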

Once imported, you can test your model in the Amazon Bedrock Text playground (Figure 1).

Figure 1. IBM Granite code model tested in Amazon Bedrock Text playground.

You can also run inference on your imported Granite Code model through the InvokeModel or InvokeModelWithResponseStream APIs. For detailed instructions on submitting a single prompt with the InvokeModel API operation, refer to the Amazon Bedrock documentation.
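As a minimal sketch of an InvokeModel call against an imported model: the request body schema depends on the model's architecture, so the prompt/parameter shape below is an assumption to verify against your model, and the model ARN is a placeholder. The boto3 call is shown commented out because it needs AWS credentials and the ARN Bedrock assigned to your import:

```python
import json

def build_invoke_body(prompt, max_tokens=512, temperature=0.1):
    """Request body for InvokeModel. The exact schema an imported model
    expects depends on its architecture; this prompt/parameter shape is
    a common one, so verify it against your model's documentation."""
    return json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

body = build_invoke_body("def fibonacci(n):")
print(body)

# With AWS credentials and your imported model's ARN (placeholder below):
# import boto3
# runtime = boto3.client("bedrock-runtime")
# response = runtime.invoke_model(
#     modelId="arn:aws:bedrock:us-east-1:123456789012:imported-model/abc1234",
#     body=body,
# )
# print(response["body"].read().decode("utf-8"))
```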

Deploy IBM Granite Code models on Amazon SageMaker

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) for any use case. With SageMaker, you can build, train, and deploy ML models, including foundation models, at scale using tools like notebooks, debuggers, profilers, pipelines, and MLOps, all in one integrated development environment (IDE).

Amazon SageMaker JumpStart is an ML hub that provides easy access to pre-trained models, including foundation models for generative AI tasks like content creation and code generation. It simplifies the evaluation, fine-tuning, and deployment of models.

In Amazon SageMaker Studio, you can easily access SageMaker JumpStart to discover and deploy the IBM Granite Code models. Simply search for IBM in the SageMaker JumpStart interface (Figure 2).

Figure 2. Search for IBM Granite Code models on SageMaker JumpStart.

Choose the IBM tile to view the available models, select the Granite Code model you want to use, and choose Deploy (Figure 3).

Figure 3. Deploy IBM Granite Code model from SageMaker JumpStart.

Refer to the SageMaker JumpStart documentation for more details on deploying a foundation model in SageMaker Studio.
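The console flow above can also be done programmatically with the SageMaker Python SDK's `JumpStartModel` class. The model ID and instance type below are hypothetical placeholders; copy the real values from the model's detail page in SageMaker JumpStart. The deployment call is shown commented out because it provisions a real endpoint and requires SageMaker permissions:

```python
def jumpstart_deploy_config(model_id, instance_type="ml.g5.12xlarge"):
    """Settings for a JumpStart deployment. The model ID and instance
    type here are hypothetical; copy the real ones from the model's
    JumpStart detail page."""
    return {"model_id": model_id, "instance_type": instance_type}

cfg = jumpstart_deploy_config("granite-8b-code-instruct")
print(cfg)

# In a SageMaker environment with the sagemaker SDK and permissions set up:
# from sagemaker.jumpstart.model import JumpStartModel
# model = JumpStartModel(model_id=cfg["model_id"],
#                        instance_type=cfg["instance_type"])
# predictor = model.deploy(accept_eula=True)
```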

Code example

Once your IBM Granite Code model is deployed, note the inference endpoint name; you will use it to test the model by invoking its SageMaker inference endpoint.

The following Python code example demonstrates how to invoke the IBM Granite Code model endpoint:

import json
import argparse
import sagemaker
from sagemaker.predictor import Predictor

# Set up command line argument parsing
parser = argparse.ArgumentParser(description='Send a question to the SageMaker inference endpoint.')
parser.add_argument('question', type=str, help='The question to ask the code assistant.')
parser.add_argument('endpoint', type=str, help='The SageMaker endpoint name.')
args = parser.parse_args()

# Create a SageMaker session
sagemaker_session = sagemaker.Session()
endpoint_name = args.endpoint

# Construct the prompt
prompt = f"""Using the directions below, generate Python code for the specified task.
Question:# Write a Python function that prints 'Hello World!' string 'n' times.

Answer:
def print_n_times(n):
    for i in range(n):
        print("Hello World!")

<end of code>

Question:
{args.question}

Answer:"""

try:
    # Create a predictor object
    predictor = Predictor(endpoint_name=endpoint_name, sagemaker_session=sagemaker_session)

    # Prepare the payload for inference
    payload = {
        "inputs": prompt.strip(),
        "parameters": {
            "do_sample": True,
            "top_p": 0.6,
            "temperature": 0.1,
            "top_k": 50,
            "max_new_tokens": 1000,
            "repetition_penalty": 1.03,
            "stop": ["<end of code>"],
        }
    }

    # Send the payload to the endpoint and decode the response
    response = predictor.predict(
        data=json.dumps(payload),
        initial_args={"Accept": "application/json", "ContentType": "application/json"},
    ).decode("utf-8")

    # Extract the generated text (some inference containers wrap results in a list)
    result = json.loads(response)
    if isinstance(result, list):
        result = result[0]
    print(result["generated_text"])

except json.JSONDecodeError as json_err:
    print(f"JSON decoding error: {json_err}")
except Exception as e:
    # botocore raises ClientError for endpoint and service failures;
    # a broad catch keeps this sample short
    print(f"An error occurred: {e}")

Review the Identity and Access Management (IAM) documentation for Amazon SageMaker to set up the required permissions for running the sample script. Ensure that you apply least-privilege permissions by granting only what is needed to execute the script. The following image shows the expected output from the Python script:

Figure 4. Invoking your Amazon SageMaker Real-time inference with IBM Granite Code Model.
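As one illustration of the least-privilege guidance above, the script only needs permission to invoke a single endpoint. The following sketch builds such a policy; the Region, account ID, and endpoint name are placeholders:

```python
import json

def invoke_endpoint_policy(region, account_id, endpoint_name):
    """Least-privilege IAM policy allowing InvokeEndpoint on a single
    endpoint. Region, account ID, and endpoint name are placeholders."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": f"arn:aws:sagemaker:{region}:{account_id}"
                            f":endpoint/{endpoint_name}",
            }
        ],
    }

policy = invoke_endpoint_policy("us-east-1", "123456789012",
                                "granite-code-endpoint")
print(json.dumps(policy, indent=2))
```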

Summary

In this post, we’ve shown how to deploy and use IBM Granite Code foundation models in both Amazon Bedrock and Amazon SageMaker.

When choosing between Amazon Bedrock and Amazon SageMaker for generative AI, consider your specific needs. Bedrock is ideal for using pre-trained foundation models with minimal setup, while SageMaker offers greater control for building, training, and customizing models.

Now that you’re aware of the availability of IBM Granite Code models on AWS, we encourage you to deploy Granite Code models in SageMaker Studio to test their capabilities and explore the responses firsthand.

For more information, visit the AWS Marketplace for IBM watsonx AI solutions on AWS.

Eduardo Monich Fronza

Eduardo Monich Fronza is a Partner Solutions Architect at AWS. His experience includes Cloud, solutions architecture, application platforms, containers, workload modernization, and hybrid solutions. In his current role, Eduardo helps AWS partners and customers in their cloud adoption journey.

Karan Sachdeva

Karan is the Global Business Development Leader for Strategic Partnerships at IBM, where he designs and executes transformative sales and partner programs, including IBM and AWS.

Marc Karp

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Sammy Amirghodsi

Siamak (Sammy) Amirghodsi is a Principal Architect at AWS. His experience includes multi-cloud architecture, high performance computing (HPC), regulatory data management, capital markets, and quantum computing. In his current role, Sammy helps AWS customers and scientists with HPC, cryptography, and generative AI implementation needs in their cloud adoption journey.

Vincent Nelis

Vincent Nelis is a Senior Product Manager at IBM, specializing in AI and machine learning technologies. He completed his Ph.D. in computer science at the age of 25, with expertise in statistics, scheduling theory, and AI. His career spans both academia and industry, including roles as an academic researcher, software developer, DevOps, data scientist, and team leader. At IBM, Vincent contributes to product management for the AI Portfolio, focusing on Generative AI and Traditional ML products such as watsonx.ai and watsonx.gov.

Vishwani Dua

Vishwani Dua is the CTO for the IBM AWS Partnership at IBM, where she is responsible for driving the technical strategy across the IBM and AWS portfolios and building innovative solutions for clients. She has been building AI solutions for over 12 years and has held technical leadership roles across development, pre- and post-sales, technical sales, and operations.