亚马逊AWS官方博客

基于 Amazon Bedrock 和 Llama2 构建智能导购解决方案

一、背景和目标

在传统电商平台,用户购买商品是搜索+推荐的模式,平台按分类组织商品,客户在购买时需要有明确的购买意图,按分类浏览或搜索商品,电商平台在接收到浏览和搜索需求后,根据一定的推荐、排序算法,给客户推荐商品。然而随着电商行业竞争的白热化、垂直电商平台的增加,流量竞争越来越激烈。平台需要拓展流量,缩短决策流程或者帮助客户做购买决策,所以各电商平台开始尝试电商导购,通过导购机器人自动、快速、精准识别用户购买意图并推荐合适的商品,帮客户做决策。

对于传统的广告平台,拿到电商平台的预算后,他们会制作各种商品推广素材,投放到媒体端,依赖媒体端的流量和广告投放策略来被动的接受请求,并且流量没有粘性,媒体带来的用户是一次性访问,无法沉淀价值,广告素材也无法提供更加全面的信息帮用户做决策,因此部分广告平台开始做业务的延伸,针对垂直行业的广告,会深入对接后向操作,也会向前拓展流量能力,比如对于电商行业的广告平台,试图去做导购机器人,吸引精准流量。

本文所构建的解决方案,就是为了解决上述问题,通过大语言模型构建一个智能导购系统,该系统能够提供一个智能机器人,跟客户对话,理解客户的意图,识别客户的购买偏好,结合电商推荐系统,为客户寻找到合适的商品,并用简短而最具说服力的文案,向客户推荐商品。主要任务有:

  1. 提供用户和导购机器人对话能力,识别购买意图;
  2. 在识别到购买意图后,能生成商品推荐 API 的 Function Call;
  3. 在对话中,及时理解用户当前的状态;
  4. 在最后结束对话,选定商品后,根据商品属性、商品介绍、商品评价,生成推荐文案;

二、解决方案分析

在该场景中,模型能够扮演导购机器人,与用户进行对话,能理解用户的意图,并引导用户说出购买详细需求,帮用户做出购买决策,在这里,对话与对话理解是模型的关键能力要求,通常有以下两种做法:

第一种方案,不需要做模型微调,直接通过 Prompt 来实现,能够减少模型训练的成本,但是因为场景中会涉及商品品类信息,API 调用需要用到品类商品的特殊属性,这些信息需要在 Prompt 中携带,每一轮都会携带大量的对话历史和商品信息,Token 成本会变得更高。

第二种方案,可以将对话风格、针对购物的对话示例和商品信息一起训到模型里,这样在每次对话时的 Prompt 会变小,通过训练,能更加精细的控制模型对每个任务的适配度。训练是低频行为,而对话是高频行为,这一方案也是成本最小的方案。

因此,本方案采用开源模型+全参微调的方式设计。

三、解决方案架构

整体业务流程和技术方案如下:

第一步:准备真实购物对话样本数据,产生 Llama2 模型预训练的训练数据;

第二步:用生成的训练数据,训练 Llama2 模型;

第三步:当用户发起对话时,将用户对话和对话历史提交 Llama2 模型请求回复,模型回复包括 Function Call ;

第四步:根据 Llama2 模型回复,调用推荐系统 API,获得推荐商品列表;调用 Amazon Bedrock Claude3 Haiku 模型,获得当前用户的购买意图;

第五步:根据模型回复,用户发起下一轮对话;

第六步:当用户做出购买决策后,将商品信息+商品评论+对话历史提交 Bedrock Claude3 Haiku 模型,生成商品推荐文案。

第七步:用户根据最终推荐,提交订单。

四、实现流程

1. 训练数据构建

这里我们需要准备一些真实购物对话数据,这些对话数据包含用户的提问、导购机器人的回复、Function Call、API 的返回,以下是一个完整的对话流程:

下面是一个 3c 产品的对话样例:

Customer: Hi, I'm looking to buy a new smartphone. Can you help recommend one based on my needs?  
Shopping Guide: Absolutely! I'd be happy to help you find the perfect smartphone. To get started, what are some of the key features you're looking for in your new phone?  
Customer: I need a phone with a really good camera and display since I watch a lot of videos and take tons of pictures. Long battery life is also pretty important to me.  
Shopping Guide: Okay, it sounds like you need a phone with a high-quality camera, a nice display and good battery life. Do you have any preferences for the brand or operating system?  
Customer: I'm open to either iOS or Android. As for brands, I like Samsung and Google Pixel phones. I also want a 6 inch screen.  

下面是一个 3c 产品的 api 调用样例:

"get_recommds_by_sku": {
 "brand": "Samsung and Google Pixel phones", "category": "smartphone",
"screen size": "6 inch",
 "screen panel": "good display",
"camera": "good camera",
 "operating system": "iOS or Android", "battery": "long battery life"  
}  

2. Llama2模型预训练

1)数据处理

下面是一个样例数据:

2)模型训练

下面是一个真实的模型训练的环境和训练过程:

Meta Info:

Data Size: 42478 (dialog+api)

Model:Llama-2-13b-chat

Method: Full Fine-tuning

Instance: ml.p4d.24xlarge(8*40G A100) * 1

Framework: LLaMA-Factory

DeepSpeed: Z3+Offload Optimizer + Offload Parameters

Training Specific:

Learning Rate: 5e-5

Global Batch Size: 2(single batch size)*8(num of GPU)*4(Gradient Accumulation)

Max Length: 4096

Epoch: 3

3. 意图识别

在真实客户跟导购机器人交流过程中,每一轮对话,我们都需要提取客户的购买意图,进行一些额外的处理,这里我们采用Claude3 Haiku模型,意图定义:

  • Random chat: First is casual chat, with no product,feature,function and shopping intent at all, so it is completely unable to recall any products.
  • Random shopping: It’s like coming to the mall and just browsing around, not specific products/feature/function info,not knowing what you need or what to buy.
  • Intended shopping: There is a clear shopping intent, meaning they have come to shop. This already counts as having the “buy” need, so they are the target users.
  • Specific scenarios: Compared to the previous one, there is a clear “buy” need, and it is also clear “where” this product is needed.
  • Specific needs: This category is mainly about the user clearly needing a product for “what purpose” or needing a product for a certain use,or user has make a decision for purchase and ask buy program.
  • Specific brand: Finally, this is when the user clearly states which brand of product they need  and user has not decided purchase, so we can directly recall products from that brand,If the bot provides brand information, we should classify it as “Specific needs”.

提示词示例:

You are a skilled shopping guider. Your task is to analyze the given chat text carefully and identify users’ purchasing intentions,What starts with "user" is the user's input, and what starts with "bot" is the output of ai bot.purchasing intentions are defined by category_and_description tag.
<category_and_description>

<category> random chat </category>
<category_description>In the user's conversation, there are no topics related to purchases, products or scenarios.</category_description>

<category> random shopping </category>
<category_description>maybe a shopping topic but not specific info</category_description>

<category> intended shopping </category>
<category_description>Users express their willingness to purchase or need</category_description>

<category> specific scenarios</category>
<category_description>The user mentioned an itinerary or other scenario that requires the use of some equipment, tools, etc.</category_description>

<category> specific needs </category>
<category_description>The user has specified specific names, attributes, functions, or decided purchase etc.</category_description>

<category> specific brand</category>
<category_description>The user(not bot) has specified a specific product name and brand</category_description>

</category_and_description>

<hints>
These above six intents build upon each other in a progressive manner.

* Random chat: First is casual chat, with no product,feature,function and shopping intent at all, so it is completely unable to recall any products.
* Random shopping: It's like coming to the mall and just browsing around, not specific products/feature/function info,not knowing what you need or what to buy.
* Intended shopping: There is a clear shopping intent, meaning they have come to shop. This already counts as having the "buy" need, so they are the target users.
* Specific scenarios: Compared to the previous one, there is a clear "buy" need, and it is also clear "where" this product is needed.
* Specific needs: This category is mainly about the user clearly needing a product for "what purpose" or needing a product for a certain use,or user has make a decision for purchase and ask buy program.
* Specific brand: Finally, this is when the user clearly states which brand of product they need  and user has not decided purchase, so we can directly recall products from that brand,If the bot provides brand information, we should classify it as "Specific needs".

</hints>
Here is the chat text for you:
<text>
{msg}
</text>

Before providing your response, please ** think step-by-step ** within <thinking></thinking> tags about which category best fits the text and your reasoning behind it.

Then, provide your final categorization and explanation within <response></response> tags, using the JSON format:
{{
  "category": "category name",
  "reason": "your reasoning"
}}

Make sure to follow the specified format exactly.

识别结果示例:

{
        "origin_cont": "I'm interested in a large, high-performance device. How's the battery life on this model?",
        "haiku_intent": "specific needs",
        "reason": "The chat text expresses a clear shopping intent, with the user inquiring about the details of a specific product (battery life of a large, high-performance device). This indicates a specific need for a product with those characteristics, rather than just random browsing or a general shopping intent."
    }

五、部分参考代码

模型训练:

import time
from sagemaker.estimator import Estimator
from sagemaker.pytorch import PyTorch
from datetime import datetime

instance_count = 1
instance_type = 'ml.p4d.24xlarge'  ## 8*40G
max_time = 200000

# Get the current time
current_time = datetime.now()

# Format the current time as a string
formatted_time = current_time.strftime("%Y%m%d%H%M%S")
print(formatted_time)

environment = {
    'NODE_NUMBER':str(instance_count),
    'MODEL_S3_PATH': f's3://{sagemaker_default_bucket}/Foundation-Models/Llama-2-13b-chat-hf/*', # source model files
    'OUTPUT_MODEL_S3_PATH': f's3://{sagemaker_default_bucket}/Llama-2-13b-chat-hf/{formatted_time}/', # destination
}

estimator = PyTorch(entry_point='entry.py',
                            source_dir='LLaMA-Factory/',
                            role=role,
                            environment=environment,
                            framework_version='2.1.0',
                            py_version='py310',
                            script_mode=True,
                            instance_count=instance_count,
                            instance_type=instance_type,
                            max_run=max_time)

# # data in channel will be automatically copied to each node - /opt/ml/input/data/train1
estimator.fit()

前端 demo 和对话理解、意图识别:

def generate(message, history):
    history = history[-10:]
    status_messages = [status_sys_message]
    dialog_messages = [dialog_sys_message]
    api_messages = [api_sys_message]
    chosen_api_output = []
    text_message = ''
    if len(history) > 0:
        for h in history:
            q, a = h[0], h[1]
            api_output, a = a.split(splitter)
            a = a.strip()
            api_output = api_output.split("api返回(部分信息):\n")[-1].strip() if "api返回(部分信息):\n" in api_output else "[]"
            text_message += f"User: {q}\n"
            text_message += f"Shopping Guide: {a}\n"
            dialog_messages.append({"role": "user", "content": f"User: {q}\n[API Return]:\n{api_output}"})
            dialog_messages.append({"role": "assistant", "content": f"Shopping Guide: {a}"})
    text_message += f"User: {message}"

    #1. extract status
    display_output = "后台进程:\n1. 用户status:\n"
    yield display_output
    user_prompt = '''意图识别Prompt'''
    user_prompt += f"<conversation>\n{text_message}\n</conversation>"
    stream = invoke_model(user_prompt)
    status = ''
    if stream:
        for event in stream:
            # Use chaining of .get() to simplify access to nested data
            delta_content = event.get("chunk", {}).get("bytes")
            if not delta_content:
                continue  # Skip to the next iteration if delta_content is None or empty

            text_segment = json.loads(delta_content.decode()).get('delta', {}).get('text', '')
            if not text_segment:
                continue  # Skip to the next iteration if text_segment is empty

            status += text_segment
            display_output += text_segment
            yield display_output
            
    #2. extract scene
    display_output += "\n\n2. scene信息提取:\n"
    yield display_output
    scene_prompt = '''
    You task is to determine the aim and scene of the User in the last round of the conversation. The aim in scenario information means how would the user use the product, and it should not include the scenario location. If the user does not mention the aim or the scene, the answer should be empty string. You should format the output as a json string, whose keys are "aim" and "scene", without any extra content when answering.
    '''
    scene_prompt += f"<conversation>\n{text_message}\n</conversation>"
    stream = invoke_model(scene_prompt)
    if stream:
        for event in stream:
            # Use chaining of .get() to simplify access to nested data
            delta_content = event.get("chunk", {}).get("bytes")
            if not delta_content:
                continue  # Skip to the next iteration if delta_content is None or empty

            text_segment = json.loads(delta_content.decode()).get('delta', {}).get('text', '')
            if not text_segment:
                continue  # Skip to the next iteration if text_segment is empty

            display_output += text_segment
            yield display_output
        
    # check whether need extract api
    if 'Random Chat' not in status and 'Random Shopping' not in status:
        display_output += "\n\n3. 抽取api:\n"
        yield display_output
        api_messages.append({"role": "user", "content": text_message})
        result = llama_inference(api_messages, 0.3)
        api = ''
        for new_tokens in result:
            api += new_tokens
            display_output += new_tokens
            yield display_output

        api = api.strip()
        print(text_message)
        print(api_messages)
        print(api)

        display_output += "\n\n4. api返回(部分信息):\n"
        yield display_output
        api_output = request_api(api)
        api_output = MessageToJson(api_output)
        api_output = json.loads(api_output)

        for api in api_output['adList']:
            info = {k: v for k, v in api['skuInfo'].items() if k in used_field}
            if len(api['skuInfo']['skuCommentList'])>0:
                info['User_Comment'] = api['skuInfo']['skuCommentList'][0].get('commentContent','')
            chosen_api_output.append(info)

        display_output += str(chosen_api_output)
        yield display_output
    
        display_output += "\n\n"+splitter+'\n'
        yield display_output

        dialog_messages.append({"role": "user", "content": f"User: {message}\n[API Return]:\n{chosen_api_output}"})
        print(dialog_messages)
        response = llama_inference(dialog_messages, 1, ' Shopping Guide: ')
        for new_tokens in response:
            display_output += new_tokens
            yield display_output
    else:
        display_output += "\n\n"+splitter+'\n'
        yield display_output
        user_prompt = '''You are an intelligent shopping guide. The following is a conversation history between the user and you. Please respond to user's last utterance without any preamble or "Shopping Guide:" in front. '''
        user_prompt += f"<conversation>\n{text_message}\n</conversation>"
        stream = invoke_model(user_prompt, temperature=0.8)
        if stream:
            for event in stream:
                # Use chaining of .get() to simplify access to nested data
                delta_content = event.get("chunk", {}).get("bytes")
                if not delta_content:
                    continue  # Skip to the next iteration if delta_content is None or empty

                text_segment = json.loads(delta_content.decode()).get('delta', {}).get('text', '')
                if not text_segment:
                    continue  # Skip to the next iteration if text_segment is empty

                status += text_segment
                display_output += text_segment
                yield display_output

六、实现效果

七、总结

本文介绍了一种电商导购的实现方案,该方案考虑了电商这个垂直行业的行业属性,导购场景的特殊语料和商品属性,有针对性的预训练一个电商导购对话模型,同时结合电商既有的商品推荐 Agent,训练了模型的 Function Call 任务,在整个购物对话过程中,还通过 Bedrock Claude3 Haiku 模型完成了意图识别和商品推荐任务,经过客户真实数据和商品推荐 API 的验证,其对话理解能力、API 抽取能力、意图识别能力、推荐文案生成能力都能满足导购需求。


*前述特定亚马逊云科技生成式人工智能相关的服务仅在亚马逊云科技海外区域可用,亚马逊云科技中国仅为帮助您了解行业前沿技术和发展海外业务选择推介该服务。

本篇作者

魏亦豪

亚马逊云科技应用科学家,长期从事生成式 AI、自然语言处理、多模态预训练等领域的研究和开发工作。支持 GenAI 实验室项目,在对话系统、智能客服、虚拟陪伴、预训练、多模态模型等方向有丰富的算法开发以及落地实践经验。

冉晨伟

亚马逊云科技应用科学家,长期从事生成式 AI、自然语言处理、信息检索等领域的研究和开发工作。支持 GenAI 实验室项目,在大语言模型、搜索排序、预训练、多模态模型等方向有丰富的算法开发以及落地实践经验。

彭赟

亚马逊云科技资深解决方案架构师,负责基于 AWS 的云计算方案架构咨询和设计,20 多年软件架构、设计、开发、项目管理交付经验,擅长业务咨询、产品设计、软件架构,在大数据、区块链、容器化方向有较深的入研究,具有丰富的解决客户实际问题的经验。

王鹤男

亚马逊云科技资深应用科学家,负责生成式 AI 实验室,在生成式 AI 领域有丰富的实践经验,对于大语言模型、文生图模型、多模态模型等都有研究和应用,熟悉计算机视觉、自然语言处理、传统机器学习模型等领域,领导了首汽约车语音降噪、LiveMe 直播场景反欺诈等项目,为企业客户提供云上的人工智能和机器学习赋能。曾任汉迪推荐算法工程师,神州优车集团人工智能实验室负责人等职位。

龚德强

亚马逊云科技资深客户解决方案经理,2019 年加入 AWS。入职 AWS 之前在软件行业、电信行业工作 20 多年,也曾 2 次创业 ToB 公司,熟悉软件开发、项目管理、项目交付,擅长与客户进行需求沟通。