AWS Partner Network (APN) Blog

Best Practices from Quantiphi for Unleashing Generative AI Functionality by Fine-Tuning LLMs

By Neelesh Kumar Yadav, AI/ML Architect – Quantiphi
By Stephanie Pace, AI/ML Practice Leader – Quantiphi
By Matt Chastain, Sr. Solutions Architect, AWS


Generative artificial intelligence (AI) is rapidly reshaping the technology landscape. For enterprises exploring generative AI use cases, experimenting with fine-tuned large language models (LLMs) can have a profound impact on results. Fine-tuning isn’t just a technique; it’s the key to harnessing the full potential of LLMs and adapting them to the unique business demands of various industries.

In healthcare, for example, fine-tuning LLMs enables the creation of personalized treatment plans that can change lives. In finance, it can guide more precise market predictions, helping investors make informed decisions. Legal professionals gain efficiency by expediting contract reviews, paving the way for a more productive future.

In this post, Quantiphi shows how to fine-tune LLMs to deliver domain-specific outcomes. Imagine not just offering personalized diagnostics, but also improving market trend predictions and expediting contract reviews by automating the identification of crucial clauses and potential risks.

Quantiphi, a category-defining analytics, machine learning, and cloud modernization company, is an AWS Premier Tier Services Partner and AWS Marketplace Seller with the Generative AI Software Competency.

The Vital Need for Fine-Tuning Large Language Models

Traditional fine-tuning methods have been resource-intensive and time-consuming. However, with Parameter Efficient Fine-Tuning (PEFT), an open-source library from Hugging Face, adapting pre-trained LLMs on AWS for various applications becomes a streamlined journey.


Figure 1 – Domain-specific fine-tuning.


Fine-tuning large language models offers several distinct advantages:

  • Domain-specific expertise: By fine-tuning the LLM on specific data related to a particular domain or industry, the model gains a deeper understanding of the domain’s jargon, terminology, and context. This leads to precise and pertinent responses customized to the specific domain, like a fine-tuned model that can assist legal professionals by reviewing contracts, identifying potential risks and suggesting legal language adjustments.
  • Faster inference: A smaller fine-tuned model can often match a much larger general-purpose model on its target task. Fewer parameters mean lower memory consumption and quicker inference times, which optimizes speed in real-time applications and large-scale deployments.
  • Enhanced efficiency: Fine-tuning allows customers to focus on specific aspects of their model, reducing the need for extensive training from scratch. This results in significant time and cost savings within the development process.
  • Data privacy and security: Fine-tuning on in-house data within your own environment helps keep sensitive information inside the organization. This reduces the risk of exposing proprietary or sensitive data to external parties.
  • Adaptability: Fine-tuned models can be rapidly updated and retrained with new data to remain current with domain trends. This ensures ongoing performance enhancement.


Fine-tuning LLMs comes with its own set of challenges, however. These challenges can be broadly classified into data, technical, and resource-related categories.

  • Training data: Fine-tuning models requires a large amount of diverse and high-quality training data to produce accurate and reliable results. Acquiring and curating such datasets can be time-consuming and resource-intensive.
  • Model complexity: Generative AI models are complex and resource-intensive during training and deployment, resulting in elevated computational expenses and extended processing times.
  • Ethical challenges: Fine-tuned models may inadvertently generate sensitive or personally identifiable information (PII), leading to potential privacy violations. There’s a risk of the model generating inaccurate or biased content, which could influence decision-making or propagate misinformation. Models can also be misused to create fake documents, signatures, or identities, leading to potentially fraudulent activities. As such, we must dive deep into ethical considerations with a commitment to responsible AI deployment.

AWS provides a powerful platform with various tools to help overcome these challenges effectively. It provides easy access to scalable and managed infrastructure, allowing for seamless selection of instance type, number of instances, and distributed training. AWS allows users to deploy models easily using endpoints and provides automatic scaling and load balancing.
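As a rough illustration of the automatic scaling mentioned above, the sketch below attaches a target-tracking policy to a SageMaker endpoint through Application Auto Scaling. The endpoint name, capacity limits, cooldowns, and target value are hypothetical placeholders, not values from this post; "AllTraffic" is assumed as the variant name because it is the default when deploying with the SageMaker Python SDK.

```python
# Sketch only: endpoint names, limits, and targets are hypothetical placeholders.

def endpoint_resource_id(endpoint_name: str, variant_name: str = "AllTraffic") -> str:
    """Build the resource ID Application Auto Scaling uses for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def target_tracking_config(invocations_per_instance: float) -> dict:
    """Target-tracking settings keyed on invocations per instance per minute."""
    return {
        "TargetValue": invocations_per_instance,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # seconds to wait before removing instances
        "ScaleOutCooldown": 60,  # seconds to wait before adding more instances
    }


def enable_autoscaling(endpoint_name, min_instances=1, max_instances=4, target=50.0):
    """Register the variant and attach the policy (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("application-autoscaling")
    resource_id = endpoint_resource_id(endpoint_name)
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=target_tracking_config(target),
    )
```

Calling enable_autoscaling("my-llm-endpoint") with a real endpoint name would let the instance count float between the configured minimum and maximum as traffic changes.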

Quantiphi, which also holds the Amazon SageMaker specialization, insists on prioritizing diverse and representative data during training, and implements bias detection techniques. It enables users to provide the synthetic data needed to train and enhance LLMs’ capabilities, overcoming challenges associated with real data limitations, privacy concerns, and specialized domain requirements.

Process of Fine-Tuning LLMs

Fine-tuning a large language model on AWS involves several steps. Below is the detailed process:


Figure 2 – Fine-tuning workflow.

  • Set up AWS environment: Create or configure your AWS account, then configure an IAM role with the permissions Amazon SageMaker needs to access other AWS services. Next, set up AWS Lambda functions to create SageMaker training jobs and endpoints.
  • Prepare dataset: Gather a domain-specific dataset that is representative of the target task and store the documents on Amazon Simple Storage Service (Amazon S3). Clean and preprocess the data using containerized Lambda functions: remove duplicates, handle missing values, and normalize the text to strip noisy or irrelevant information. After preprocessing, split the dataset into training, validation, and testing sets, and format the samples so that multiple samples are packed into one sequence for efficient training, tokenizing them with AutoTokenizer from the Transformers library.
  • Fine-tune the model: Start a SageMaker training job using the Hugging Face Estimator from the SageMaker Python SDK to fine-tune the selected large language model with the QLoRA method, using the PEFT library.
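The dataset preparation step above can be sketched roughly as follows. This is a minimal illustration, not Quantiphi's pipeline: the function names, split fractions, and block size are assumptions, and the cleaning rules are deliberately simple. The tokenize_with_hf helper shows where AutoTokenizer would fit; the other functions are plain Python and run without any external library.

```python
import random
import re


def clean_documents(docs):
    """Normalize whitespace, drop missing or empty entries, remove exact duplicates."""
    seen, cleaned = set(), []
    for doc in docs:
        if doc is None:
            continue
        text = re.sub(r"\s+", " ", doc).strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def split_dataset(docs, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle deterministically, then split into train, validation, and test lists."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    n_train = len(shuffled) - n_val - n_test
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def tokenize_with_hf(texts, model_id):
    """Tokenize with the Transformers library (model_id is a hypothetical placeholder)."""
    from transformers import AutoTokenizer  # lazy import: only needed for this step

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer(texts)["input_ids"], tokenizer.eos_token_id


def pack_sequences(tokenized_docs, block_size, eos_id):
    """Concatenate tokenized samples (each followed by EOS) and cut into equal blocks."""
    stream = []
    for ids in tokenized_docs:
        stream.extend(ids)
        stream.append(eos_id)
    usable = (len(stream) // block_size) * block_size  # drop the trailing remainder
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]
```

Packing avoids padding short samples up to the maximum length, so each training sequence is filled with real tokens; the trade-off in this simple version is that the final partial block is discarded.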

The Estimator handles the end-to-end training and deployment process within SageMaker and manages the underlying infrastructure. It selects the appropriate Hugging Face container, uploads the designated scripts, and downloads the data from Amazon S3 into the container at /opt/ml/input/data.

PEFT (Parameter Efficient Fine-Tuning) is an open-source toolkit by Hugging Face designed to streamline the process of adapting pre-trained language models for diverse downstream tasks. The library trains only a small, targeted set of parameters, improving efficiency without requiring fine-tuning of the entire model.

QLoRA (from the paper “QLoRA: Efficient Finetuning of Quantized LLMs”) is an effective approach for fine-tuning large language models: it quantizes a pretrained model to just four bits, keeps those weights frozen, and adds compact “Low-Rank Adapters” that are the only weights trained. This makes it possible to fine-tune even massive models containing up to 65 billion parameters on a single graphics processing unit (GPU). Despite this efficiency, QLoRA performs on par with full-precision fine-tuning on language-related tasks and achieves state-of-the-art results.
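To make the QLoRA idea more concrete, here is a minimal sketch under stated assumptions. The two small helpers do back-of-the-envelope arithmetic: the memory needed for the weights alone at a given bit width, and the number of trainable parameters a low-rank adapter adds to one weight matrix. The build_qlora_model function shows one common way to combine 4-bit loading from Transformers with a LoRA adapter from PEFT; the rank, alpha, dropout, and target module names are illustrative assumptions, and the module names in particular vary by model architecture.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only (no activations or optimizer state)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """A rank-r adapter on a d_out x d_in weight adds a d_out x r and an r x d_in matrix."""
    return rank * (d_out + d_in)


def build_qlora_model(model_id: str, rank: int = 64, alpha: int = 16):
    """Load a 4-bit base model and attach LoRA adapters (needs a GPU plus the
    torch, transformers, peft, and bitsandbytes packages)."""
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store base weights in 4 bits
        bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matrix multiplies
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"],  # assumption: names depend on the model
    )
    return get_peft_model(model, lora_config)
```

For a sense of scale, weight_memory_gb(65e9, 16) gives 130 GB for a 65-billion-parameter model at 16 bits, while weight_memory_gb(65e9, 4) gives 32.5 GB, which is why 4-bit weights can fit on a single large GPU. Similarly, lora_trainable_params(4096, 4096, 8) is 65,536, compared with roughly 16.8 million parameters in the full 4096 x 4096 matrix.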

Initiate the training job by calling the .fit() method on the Estimator, passing the S3 path of the training data so that SageMaker makes it available to the training script.
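A minimal sketch of launching such a job with the SageMaker Python SDK might look like the following. The script name, source folder, instance type, framework versions, job name, and hyperparameter names are all hypothetical placeholders chosen for illustration; the versions must match a Hugging Face container that SageMaker actually offers in your Region.

```python
def training_hyperparameters(model_id: str, epochs: int = 3, lr: float = 2e-4) -> dict:
    """Values SageMaker forwards to the training script as command-line arguments."""
    return {
        "model_id": model_id,
        "epochs": epochs,
        "per_device_train_batch_size": 2,
        "lr": lr,
    }


def channel_mount_path(channel: str) -> str:
    """Where SageMaker places the data for a named input channel inside the container."""
    return f"/opt/ml/input/data/{channel}"


def launch_training_job(role_arn: str, train_s3_uri: str, model_id: str):
    """Create the Estimator and start the job (requires AWS credentials)."""
    from sagemaker.huggingface import HuggingFace  # lazy import of the SageMaker SDK

    estimator = HuggingFace(
        entry_point="run_qlora.py",     # hypothetical training script
        source_dir="scripts",           # hypothetical local folder holding the script
        instance_type="ml.g5.4xlarge",  # assumption: size this to the model
        instance_count=1,
        role=role_arn,
        transformers_version="4.28",    # assumption: pick a supported combination
        pytorch_version="2.0",
        py_version="py310",
        hyperparameters=training_hyperparameters(model_id),
        base_job_name="qlora-finetune",
    )
    # The "training" channel is downloaded to channel_mount_path("training").
    estimator.fit({"training": train_s3_uri}, wait=False)
    return estimator
```

Passing wait=False returns immediately after the job is submitted; omit it to block and stream the training logs instead.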

Deploy the Model

  • To deploy the fine-tuned model on SageMaker, create a HuggingFaceModel object and define the endpoint configuration, including the hf_model_id, image_uri, model_data, and instance_type.
  • To get the latest Hugging Face LLM deep learning container (DLC) on SageMaker, utilize the get_huggingface_llm_image_uri function available through the SageMaker SDK; model_data is the S3 path of fine-tuned model artifacts.
  • Based on the model size, select the appropriate instance_type (ml.g5 series instances, for example), to create the SageMaker real-time endpoint.
  • Once the endpoint is deployed and shows an “InService” status, run inferences using the predict method.
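The deployment steps above can be sketched as follows. This is an illustrative outline rather than a definitive implementation: the instance type, token limits, timeout, and generation parameters are assumptions, and the environment variable names follow the conventions of the Hugging Face LLM container, where HF_MODEL_ID set to /opt/ml/model points the server at the artifacts extracted from model_data.

```python
import json


def endpoint_env(num_gpus: int, max_input_length: int = 1024,
                 max_total_tokens: int = 2048) -> dict:
    """Environment variables for the Hugging Face LLM container (values are strings)."""
    return {
        "HF_MODEL_ID": "/opt/ml/model",  # load the artifacts extracted from model_data
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def build_payload(prompt: str, max_new_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Request body in the inputs/parameters shape the container expects."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": True,
        },
    }


def deploy_and_query(role_arn: str, model_s3_uri: str, prompt: str):
    """Deploy a real-time endpoint and send one request (requires AWS credentials)."""
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    image_uri = get_huggingface_llm_image_uri("huggingface")  # LLM container image
    model = HuggingFaceModel(
        role=role_arn,
        image_uri=image_uri,
        model_data=model_s3_uri,        # S3 path of the fine-tuned model artifacts
        env=endpoint_env(num_gpus=1),
    )
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",  # assumption: choose based on model size
        container_startup_health_check_timeout=300,  # allow time to load the weights
    )
    return predictor.predict(build_payload(prompt))
```

By default, deploy() returns once the endpoint is ready to serve, so the predict call can follow directly. Remember to delete the endpoint when it is no longer needed to avoid ongoing charges.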


The potential of generative AI functionality is truly revolutionary, expanding horizons and pushing the boundaries of what’s possible.

By harnessing the capabilities of a fine-tuned large language model (LLM) on AWS, professionals across various industries can achieve domain mastery and ignite innovation. Fine-tuned AI models (models that understand your business) can reshape how we build solutions and deliver unique customer experiences in fields like healthcare, fintech, and manufacturing.

Quantiphi is at the forefront of AI innovation, helping businesses across industries to unlock the power of cutting-edge AI and machine learning technologies.


Quantiphi – AWS Partner Spotlight

Quantiphi is an AWS Premier Tier Services Partner and category-defining analytics, machine learning, and cloud modernization company with the AWS Generative AI Competency.
