AWS Official Blog
New: Amazon Comprehend Medical Adds Ontology Linking
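Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights in unstructured text. It is very easy to use, with no machine learning experience required. You can customize Comprehend for your specific use case, for example by creating custom document classifiers to organize documents into your own categories, or custom entity types to analyze text for your own specific terms. However, medical terminology can be very complex and is specific to the healthcare domain.
For this reason, last year we introduced Amazon Comprehend Medical, a HIPAA-eligible natural language processing service that makes it easy to use machine learning to extract relevant medical information from unstructured text. With Comprehend Medical, you can quickly and accurately gather information such as medical condition, medication, dosage, strength, and frequency from a variety of sources, including doctors' notes, clinical trial reports, and patient health records.
Today, we are adding the ability to link the information extracted by Comprehend Medical to medical ontologies.
An ontology provides a declarative model of a domain that defines and represents the concepts existing in that domain, their attributes, and the relationships between them. It is typically represented as a knowledge base and made available to applications that need to use or share knowledge. Within health informatics, an ontology is a formal description of a health-related domain.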
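The ontologies supported by Comprehend Medical are:
- ICD-10-CM: identifies medical conditions as entities and links related information such as diagnosis, severity, and anatomical distinctions as attributes of that entity. This is a diagnosis code set that is very useful for population health analytics and for getting payments from insurance companies based on the medical services rendered.
- RxNorm: identifies medications as entities and links attributes such as dose, frequency, strength, and route of administration to that entity. Healthcare providers use these concepts to enable use cases such as medication reconciliation, the process of creating the most accurate list possible of all medications a patient is taking.
For each entity, Comprehend Medical returns a ranked list of potential matches. You can use the confidence scores to decide which matches make sense and which need further review. Let's see how this works with an example.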
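Before the walkthrough, here is a minimal sketch of how the confidence scores might be used to separate likely matches from those that need a human look. The `triage_concepts` helper and the 0.7 cut-off are assumptions made for illustration, not part of the service; the sample values are three of the candidates returned for "Vyvanse" in the RxNorm output shown later in this post.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off chosen for illustration; tune it for your own data.
REVIEW_THRESHOLD = 0.7


def triage_concepts(
    concepts: List[Dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Split candidate concepts into (accepted, needs_review) by confidence score."""
    # Sort by descending score so both output lists stay ranked.
    ranked = sorted(concepts, key=lambda c: c["Score"], reverse=True)
    accepted = [c for c in ranked if c["Score"] >= threshold]
    needs_review = [c for c in ranked if c["Score"] < threshold]
    return accepted, needs_review


# Three of the candidates returned for the entity "Vyvanse".
vyvanse_concepts = [
    {"Description": "Vyvanse", "Code": "711043", "Score": 0.7041242122650146},
    {
        "Description": "lisdexamfetamine dimesylate 50 MG Oral Capsule [Vyvanse]",
        "Code": "854852",
        "Score": 0.8883932828903198,
    },
    {
        "Description": "lisdexamfetamine dimesylate 70 MG Oral Capsule [Vyvanse]",
        "Code": "854844",
        "Score": 0.23675969243049622,
    },
]

accepted, needs_review = triage_concepts(vyvanse_concepts)
print([c["Code"] for c in accepted])      # ['854852', '711043']
print([c["Code"] for c in needs_review])  # ['854844']
```

A single global threshold is the simplest possible policy. In practice the right cut-off probably depends on the ontology and on how costly a wrong code is in your workflow.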
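Using Ontology Linking
In the Comprehend Medical console, I start by giving some unstructured doctor's notes as input:
First, I use some features that were already available in Comprehend Medical to detect medical entities and Protected Health Information (PHI).
The recognized entities (see this post for more information) include some symptoms and medications. Medications are recognized as generic names or brand names. Let's see how to connect some of these entities to more specific concepts.
I use the new features to link those entities to RxNorm concepts for medications.
In the text, only the parts mentioning medications are detected. In the details of the answer, I see more information. For example, let's look at one of the detected medications:
- The first occurrence of the term "Clonidine" (in the second line of the input text above) is linked to the generic concept in the RxNorm ontology (on the left in the image below).
- The second occurrence of the term "Clonidine" (in the fourth line of the input text above) is followed by an explicit dosage, and is linked to a more prescriptive format in the RxNorm ontology that includes the dosage (on the right in the image below).
To look for medical conditions using ICD-10-CM concepts, I give a different input:
Again, the idea is to link the detected entities, such as symptoms and diagnoses, to specific concepts.
As expected, diagnoses and symptoms are recognized as entities. In the detailed results, those entities are linked to medical conditions in the ICD-10-CM ontology. For example, the two main diagnoses described in the input text are the top results, and the specific concepts in the ontology are inferred by Comprehend Medical, each with its own score.
In production, you can use Comprehend Medical through its API to integrate these features into your processing workflow. All the screenshots above are visual renderings of the structured information that the API returns in JSON format. For example, this is the result of detecting medications (RxNorm concepts):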
{
"Entities": [
{
"Id": 0,
"Text": "Clonidine",
"Category": "MEDICATION",
"Type": "GENERIC_NAME",
"Score": 0.9933062195777893,
"BeginOffset": 83,
"EndOffset": 92,
"Attributes": [],
"Traits": [],
"RxNormConcepts": [
{
"Description": "Clonidine",
"Code": "2599",
"Score": 0.9148101806640625
},
{
"Description": "168 HR Clonidine 0.00417 MG/HR Transdermal System",
"Code": "998671",
"Score": 0.8215734958648682
},
{
"Description": "Clonidine Hydrochloride 0.025 MG Oral Tablet",
"Code": "892791",
"Score": 0.7519310116767883
},
{
"Description": "10 ML Clonidine Hydrochloride 0.5 MG/ML Injection",
"Code": "884225",
"Score": 0.7171697020530701
},
{
"Description": "Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884185",
"Score": 0.6776907444000244
}
]
},
{
"Id": 1,
"Text": "Vyvanse",
"Category": "MEDICATION",
"Type": "BRAND_NAME",
"Score": 0.9995427131652832,
"BeginOffset": 148,
"EndOffset": 155,
"Attributes": [
{
"Type": "DOSAGE",
"Score": 0.9910679459571838,
"RelationshipScore": 0.9999822378158569,
"Id": 2,
"BeginOffset": 156,
"EndOffset": 162,
"Text": "50 mgs",
"Traits": []
},
{
"Type": "ROUTE_OR_MODE",
"Score": 0.9997182488441467,
"RelationshipScore": 0.9993833303451538,
"Id": 3,
"BeginOffset": 163,
"EndOffset": 165,
"Text": "po",
"Traits": []
},
{
"Type": "FREQUENCY",
"Score": 0.983681321144104,
"RelationshipScore": 0.9999642372131348,
"Id": 4,
"BeginOffset": 166,
"EndOffset": 184,
"Text": "at breakfast daily",
"Traits": []
}
],
"Traits": [],
"RxNormConcepts": [
{
"Description": "lisdexamfetamine dimesylate 50 MG Oral Capsule [Vyvanse]",
"Code": "854852",
"Score": 0.8883932828903198
},
{
"Description": "lisdexamfetamine dimesylate 50 MG Chewable Tablet [Vyvanse]",
"Code": "1871469",
"Score": 0.7482635378837585
},
{
"Description": "Vyvanse",
"Code": "711043",
"Score": 0.7041242122650146
},
{
"Description": "lisdexamfetamine dimesylate 70 MG Oral Capsule [Vyvanse]",
"Code": "854844",
"Score": 0.23675969243049622
},
{
"Description": "lisdexamfetamine dimesylate 60 MG Oral Capsule [Vyvanse]",
"Code": "854848",
"Score": 0.14077001810073853
}
]
},
{
"Id": 5,
"Text": "Clonidine",
"Category": "MEDICATION",
"Type": "GENERIC_NAME",
"Score": 0.9982216954231262,
"BeginOffset": 199,
"EndOffset": 208,
"Attributes": [
{
"Type": "STRENGTH",
"Score": 0.7696017026901245,
"RelationshipScore": 0.9999960660934448,
"Id": 6,
"BeginOffset": 209,
"EndOffset": 216,
"Text": "0.2 mgs",
"Traits": []
},
{
"Type": "DOSAGE",
"Score": 0.777644693851471,
"RelationshipScore": 0.9999927282333374,
"Id": 7,
"BeginOffset": 220,
"EndOffset": 236,
"Text": "1 and 1 / 2 tabs",
"Traits": []
},
{
"Type": "ROUTE_OR_MODE",
"Score": 0.9981689453125,
"RelationshipScore": 0.999950647354126,
"Id": 8,
"BeginOffset": 237,
"EndOffset": 239,
"Text": "po",
"Traits": []
},
{
"Type": "FREQUENCY",
"Score": 0.99753737449646,
"RelationshipScore": 0.9999889135360718,
"Id": 9,
"BeginOffset": 240,
"EndOffset": 243,
"Text": "qhs",
"Traits": []
}
],
"Traits": [],
"RxNormConcepts": [
{
"Description": "Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884185",
"Score": 0.9600071907043457
},
{
"Description": "Clonidine Hydrochloride 0.025 MG Oral Tablet",
"Code": "892791",
"Score": 0.8955953121185303
},
{
"Description": "24 HR Clonidine Hydrochloride 0.2 MG Extended Release Oral Tablet",
"Code": "885880",
"Score": 0.8706559538841248
},
{
"Description": "12 HR Clonidine Hydrochloride 0.2 MG Extended Release Oral Tablet",
"Code": "1013937",
"Score": 0.786146879196167
},
{
"Description": "Chlorthalidone 15 MG / Clonidine Hydrochloride 0.2 MG Oral Tablet",
"Code": "884198",
"Score": 0.601354718208313
}
]
}
],
"ModelVersion": "0.0.0"
}
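To consume a response like this in code, one option is to keep only the top-ranked RxNorm concept for each detected medication. The sketch below is illustrative: `top_rxnorm_codes` and `link_medications` are hypothetical helper names, and the client is passed in as a parameter on the assumption that it behaves like `boto3.client("comprehendmedical")`, whose `infer_rx_norm(Text=...)` call returns a dictionary shaped like the JSON above. The sample data is a trimmed copy of that JSON.

```python
from typing import Any, Dict, List


def top_rxnorm_codes(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the highest-scoring RxNorm concept for each detected medication."""
    results = []
    for entity in response.get("Entities", []):
        concepts = entity.get("RxNormConcepts", [])
        if not concepts:
            continue  # nothing was linked for this entity
        best = max(concepts, key=lambda c: c["Score"])
        results.append(
            {
                "text": entity["Text"],
                "offsets": (entity["BeginOffset"], entity["EndOffset"]),
                "code": best["Code"],
                "description": best["Description"],
                "score": best["Score"],
            }
        )
    return results


def link_medications(client: Any, text: str) -> List[Dict[str, Any]]:
    """Call InferRxNorm through the given client and summarize the response.

    Assumption: `client` behaves like boto3.client("comprehendmedical").
    """
    return top_rxnorm_codes(client.infer_rx_norm(Text=text))


# Trimmed copy of the response above: both "Clonidine" mentions, two candidates each.
sample = {
    "Entities": [
        {
            "Text": "Clonidine", "BeginOffset": 83, "EndOffset": 92,
            "RxNormConcepts": [
                {"Description": "Clonidine", "Code": "2599",
                 "Score": 0.9148101806640625},
                {"Description": "168 HR Clonidine 0.00417 MG/HR Transdermal System",
                 "Code": "998671", "Score": 0.8215734958648682},
            ],
        },
        {
            "Text": "Clonidine", "BeginOffset": 199, "EndOffset": 208,
            "RxNormConcepts": [
                {"Description": "Clonidine Hydrochloride 0.2 MG Oral Tablet",
                 "Code": "884185", "Score": 0.9600071907043457},
                {"Description": "Clonidine Hydrochloride 0.025 MG Oral Tablet",
                 "Code": "892791", "Score": 0.8955953121185303},
            ],
        },
    ]
}

for row in top_rxnorm_codes(sample):
    print(row["offsets"], row["code"], row["description"])
```

Keeping the character offsets alongside the code makes it possible to tell the two "Clonidine" mentions apart: the first resolves to the generic concept, the second to the 0.2 MG oral tablet.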
Similarly, this is the output when detecting medical conditions (ICD-10-CM concepts):
{
"Entities": [
{
"Id": 0,
"Text": "coronary artery disease",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9933860898017883,
"BeginOffset": 90,
"EndOffset": 113,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9682672023773193
}
],
"ICD10CMConcepts": [
{
"Description": "Atherosclerotic heart disease of native coronary artery without angina pectoris",
"Code": "I25.10",
"Score": 0.8199513554573059
},
{
"Description": "Atherosclerotic heart disease of native coronary artery",
"Code": "I25.1",
"Score": 0.4950370192527771
},
{
"Description": "Old myocardial infarction",
"Code": "I25.2",
"Score": 0.18753206729888916
},
{
"Description": "Atherosclerotic heart disease of native coronary artery with unstable angina pectoris",
"Code": "I25.110",
"Score": 0.16535982489585876
},
{
"Description": "Atherosclerotic heart disease of native coronary artery with unspecified angina pectoris",
"Code": "I25.119",
"Score": 0.15222692489624023
}
]
},
{
"Id": 2,
"Text": "atrial fibrillation",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9923409223556519,
"BeginOffset": 116,
"EndOffset": 135,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9708861708641052
}
],
"ICD10CMConcepts": [
{
"Description": "Unspecified atrial fibrillation",
"Code": "I48.91",
"Score": 0.7011875510215759
},
{
"Description": "Chronic atrial fibrillation",
"Code": "I48.2",
"Score": 0.28612759709358215
},
{
"Description": "Paroxysmal atrial fibrillation",
"Code": "I48.0",
"Score": 0.21157972514629364
},
{
"Description": "Persistent atrial fibrillation",
"Code": "I48.1",
"Score": 0.16996538639068604
},
{
"Description": "Atrial premature depolarization",
"Code": "I49.1",
"Score": 0.16715925931930542
}
]
},
{
"Id": 3,
"Text": "hypertension",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9993137121200562,
"BeginOffset": 138,
"EndOffset": 150,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9734011888504028
}
],
"ICD10CMConcepts": [
{
"Description": "Essential (primary) hypertension",
"Code": "I10",
"Score": 0.6827990412712097
},
{
"Description": "Hypertensive heart disease without heart failure",
"Code": "I11.9",
"Score": 0.09846580773591995
},
{
"Description": "Hypertensive heart disease with heart failure",
"Code": "I11.0",
"Score": 0.09182810038328171
},
{
"Description": "Pulmonary hypertension, unspecified",
"Code": "I27.20",
"Score": 0.0866364985704422
},
{
"Description": "Primary pulmonary hypertension",
"Code": "I27.0",
"Score": 0.07662317156791687
}
]
},
{
"Id": 4,
"Text": "hyperlipidemia",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9998835325241089,
"BeginOffset": 153,
"EndOffset": 167,
"Attributes": [],
"Traits": [
{
"Name": "DIAGNOSIS",
"Score": 0.9702492356300354
}
],
"ICD10CMConcepts": [
{
"Description": "Hyperlipidemia, unspecified",
"Code": "E78.5",
"Score": 0.8378056883811951
},
{
"Description": "Disorders of lipoprotein metabolism and other lipidemias",
"Code": "E78",
"Score": 0.20186281204223633
},
{
"Description": "Lipid storage disorder, unspecified",
"Code": "E75.6",
"Score": 0.18514418601989746
},
{
"Description": "Pure hyperglyceridemia",
"Code": "E78.1",
"Score": 0.1438658982515335
},
{
"Description": "Other hyperlipidemia",
"Code": "E78.49",
"Score": 0.13983778655529022
}
]
},
{
"Id": 5,
"Text": "chills",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9989762306213379,
"BeginOffset": 211,
"EndOffset": 217,
"Attributes": [],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.9510533213615417
}
],
"ICD10CMConcepts": [
{
"Description": "Chills (without fever)",
"Code": "R68.83",
"Score": 0.7460958361625671
},
{
"Description": "Fever, unspecified",
"Code": "R50.9",
"Score": 0.11848161369562149
},
{
"Description": "Typhus fever, unspecified",
"Code": "A75.9",
"Score": 0.07497859001159668
},
{
"Description": "Neutropenia, unspecified",
"Code": "D70.9",
"Score": 0.07332006841897964
},
{
"Description": "Lassa fever",
"Code": "A96.2",
"Score": 0.0721040666103363
}
]
},
{
"Id": 6,
"Text": "nausea",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9993392825126648,
"BeginOffset": 220,
"EndOffset": 226,
"Attributes": [],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.9175007939338684
}
],
"ICD10CMConcepts": [
{
"Description": "Nausea",
"Code": "R11.0",
"Score": 0.7333012819290161
},
{
"Description": "Nausea with vomiting, unspecified",
"Code": "R11.2",
"Score": 0.20183530449867249
},
{
"Description": "Hematemesis",
"Code": "K92.0",
"Score": 0.1203150525689125
},
{
"Description": "Vomiting, unspecified",
"Code": "R11.10",
"Score": 0.11658868193626404
},
{
"Description": "Nausea and vomiting",
"Code": "R11",
"Score": 0.11535880714654922
}
]
},
{
"Id": 8,
"Text": "flank pain",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9315784573554993,
"BeginOffset": 235,
"EndOffset": 245,
"Attributes": [
{
"Type": "ACUITY",
"Score": 0.9809532761573792,
"RelationshipScore": 0.9999837875366211,
"Id": 7,
"BeginOffset": 229,
"EndOffset": 234,
"Text": "acute",
"Traits": []
}
],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.8182812929153442
}
],
"ICD10CMConcepts": [
{
"Description": "Unspecified abdominal pain",
"Code": "R10.9",
"Score": 0.4959934949874878
},
{
"Description": "Generalized abdominal pain",
"Code": "R10.84",
"Score": 0.12332479655742645
},
{
"Description": "Lower abdominal pain, unspecified",
"Code": "R10.30",
"Score": 0.08319114148616791
},
{
"Description": "Upper abdominal pain, unspecified",
"Code": "R10.10",
"Score": 0.08275411278009415
},
{
"Description": "Jaw pain",
"Code": "R68.84",
"Score": 0.07797083258628845
}
]
},
{
"Id": 10,
"Text": "numbness",
"Category": "MEDICAL_CONDITION",
"Type": "DX_NAME",
"Score": 0.9659366011619568,
"BeginOffset": 255,
"EndOffset": 263,
"Attributes": [
{
"Type": "SYSTEM_ORGAN_SITE",
"Score": 0.9976192116737366,
"RelationshipScore": 0.9999089241027832,
"Id": 11,
"BeginOffset": 271,
"EndOffset": 274,
"Text": "leg",
"Traits": []
}
],
"Traits": [
{
"Name": "SYMPTOM",
"Score": 0.7310190796852112
}
],
"ICD10CMConcepts": [
{
"Description": "Anesthesia of skin",
"Code": "R20.0",
"Score": 0.767346203327179
},
{
"Description": "Paresthesia of skin",
"Code": "R20.2",
"Score": 0.13602739572525024
},
{
"Description": "Other complications of anesthesia",
"Code": "T88.59",
"Score": 0.09990577399730682
},
{
"Description": "Hypothermia following anesthesia",
"Code": "T88.51",
"Score": 0.09953102469444275
},
{
"Description": "Disorder of the skin and subcutaneous tissue, unspecified",
"Code": "L98.9",
"Score": 0.08736388385295868
}
]
}
],
"ModelVersion": "0.0.0"
}
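A response like this can also be reorganized for downstream use, for example by separating diagnoses from symptoms. The following sketch groups each entity's top-ranked ICD-10-CM code by trait name. `codes_by_trait` is a hypothetical helper, and the sample is a trimmed copy of the JSON above; with boto3, a comparable response would come from the `infer_icd10_cm(Text=...)` call on a `comprehendmedical` client.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def codes_by_trait(response: Dict[str, Any]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (label, top ICD-10-CM code) pairs by trait name.

    Entities without linked concepts or without traits are skipped.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for entity in response.get("Entities", []):
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue
        best = max(concepts, key=lambda c: c["Score"])
        # Prefix attribute text (for example the ACUITY attribute "acute")
        # to the entity text to build a readable label.
        qualifiers = [a["Text"] for a in entity.get("Attributes", [])]
        label = " ".join(qualifiers + [entity["Text"]])
        for trait in entity.get("Traits", []):
            grouped[trait["Name"]].append((label, best["Code"]))
    return dict(grouped)


# Trimmed copy of the response above: three entities, top two candidates each.
sample = {
    "Entities": [
        {
            "Text": "coronary artery disease", "Attributes": [],
            "Traits": [{"Name": "DIAGNOSIS", "Score": 0.9682672023773193}],
            "ICD10CMConcepts": [
                {"Code": "I25.10", "Score": 0.8199513554573059},
                {"Code": "I25.1", "Score": 0.4950370192527771},
            ],
        },
        {
            "Text": "hypertension", "Attributes": [],
            "Traits": [{"Name": "DIAGNOSIS", "Score": 0.9734011888504028}],
            "ICD10CMConcepts": [
                {"Code": "I10", "Score": 0.6827990412712097},
                {"Code": "I11.9", "Score": 0.09846580773591995},
            ],
        },
        {
            "Text": "flank pain",
            "Attributes": [{"Type": "ACUITY", "Text": "acute"}],
            "Traits": [{"Name": "SYMPTOM", "Score": 0.8182812929153442}],
            "ICD10CMConcepts": [
                {"Code": "R10.9", "Score": 0.4959934949874878},
                {"Code": "R10.84", "Score": 0.12332479655742645},
            ],
        },
    ]
}

for trait, items in codes_by_trait(sample).items():
    print(trait, items)
```

Note that the top candidate for "flank pain" scores just under 0.5, so a workflow built on this grouping would likely still want to route low-confidence codes to manual review.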
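Available Now
You can use Amazon Comprehend Medical through the console, the AWS Command Line Interface (CLI), or the AWS SDKs. With Comprehend Medical, you pay only for what you use: you are charged based on the amount of text you process each month, so the cost depends on your actual usage. For more information, see the "Comprehend Medical" section of the Comprehend pricing page. Ontology linking is available now in all regions where Amazon Comprehend Medical is offered, as listed in the AWS Regions Table.
The new ontology linking APIs make it easy to detect medications and medical conditions in unstructured clinical text and link them to RxNorm and ICD-10-CM codes, respectively. This new feature can help you reduce the cost, time, and effort of processing large amounts of unstructured medical text with high accuracy.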
– Danilo