
Overview
The Clinical De-Identification model is designed to recognize and anonymize PHI in French-language clinical notes. It employs state-of-the-art natural language processing techniques to detect sensitive information such as patient names, addresses, medical record numbers, and other identifiers. Once identified, the PHI is effectively masked or obfuscated, rendering the text safe for broader use while maintaining its informational integrity.
IMPORTANT USAGE INFORMATION:
After subscribing to this product and creating a SageMaker endpoint, billing occurs on an HOURLY BASIS for as long as the endpoint is running.
- Charges apply even if the endpoint is idle and not actively processing requests.
- To stop charges, you MUST DELETE the endpoint in your SageMaker console.
- Simply stopping requests will NOT stop billing.
Deleting the endpoint as soon as you are done ensures you are billed only for the hours it was running.
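The cleanup step above can also be scripted instead of done in the console. The following is a minimal sketch, assuming a boto3 SageMaker client is passed in. The endpoint name in the usage comment is hypothetical, and the helper name `delete_endpoint` is ours.

```python
def delete_endpoint(sm_client, endpoint_name):
    """Delete a SageMaker endpoint so that hourly billing for it stops.

    `sm_client` is assumed to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"). Returns the endpoint name that was deleted.
    """
    # Stopping traffic is not enough: the endpoint stays in service and
    # billable until it is deleted.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    return endpoint_name


# Usage (needs AWS credentials; "deid-fr-endpoint" is a hypothetical name):
#   import boto3
#   delete_endpoint(boto3.client("sagemaker"), "deid-fr-endpoint")
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object.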
Highlights
- Process up to 14 million characters per hour in real-time mode and 25 million characters per hour in batch mode.
- **Key Features:**
- The model is tuned to identify a wide range of PHI elements in medical texts, ensuring comprehensive de-identification.
- The process aligns with GDPR and other healthcare privacy regulations, aiding in legal compliance and data protection.
- Ideal for research, analytics, and training purposes, this model enables the safe utilization of medical texts without compromising patient privacy.
- Covered entities: DATE, AGE, SEX, PROFESSION, ORGANIZATION, PHONE, E-MAIL, ZIP, STREET, CITY, COUNTRY, PATIENT, DOCTOR, HOSPITAL, MEDICALRECORD, SSN, IDNUM, ACCOUNT, PLATE, USERNAME, URL, and IPADDR.
- This model is a useful asset in the healthcare and research sectors, where the protection of patient privacy is paramount. It allows for the ethical and legal use of valuable medical data, promoting research and analysis while upholding the highest standards of data privacy and security.
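To put the throughput figures from the first highlight into perspective, here is a rough sketch that converts a corpus size into a lower bound on processing time. It treats the quoted figures as peak rates. The function name and the mode keys are illustrative, and real throughput will depend on the instance type and document mix.

```python
# Peak throughput quoted in the listing, in characters per hour.
CHARS_PER_HOUR = {"real-time": 14_000_000, "batch": 25_000_000}


def min_processing_hours(total_chars, mode):
    """Return a lower bound on hours needed at the quoted peak throughput."""
    if mode not in CHARS_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if total_chars < 0:
        raise ValueError("total_chars must be non-negative")
    return total_chars / CHARS_PER_HOUR[mode]


# Example: a 50-million-character corpus in batch mode.
print(min_processing_hours(50_000_000, "batch"))
```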
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.m4.2xlarge Inference (Batch), recommended | Model inference on the ml.m4.2xlarge instance type, batch mode | $47.52 |
| ml.m4.xlarge Inference (Real-Time), recommended | Model inference on the ml.m4.xlarge instance type, real-time mode | $23.76 |
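Because the rates are quoted per host per hour, the cost of keeping an endpoint or batch job running is simple multiplication. The sketch below covers only the two rates listed in the table. Any other charges are outside its scope, and the function name and mode keys are illustrative.

```python
from decimal import Decimal

# Rates from the pricing table, in USD per host per hour.
HOURLY_RATE_USD = {
    "batch": Decimal("47.52"),      # ml.m4.2xlarge
    "real-time": Decimal("23.76"),  # ml.m4.xlarge
}


def estimate_cost(hours, mode, hosts=1):
    """Estimate the listed cost for `hours` (int or Decimal) on `hosts` hosts."""
    if mode not in HOURLY_RATE_USD:
        raise ValueError(f"unknown mode: {mode!r}")
    # Decimal keeps currency arithmetic exact.
    return HOURLY_RATE_USD[mode] * hours * hosts


# Example: one real-time endpoint left running for a full day.
print(estimate_cost(24, "real-time"))
```

The example shows why an idle endpoint is worth deleting: a single day at the real-time rate already costs several hundred dollars.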
Vendor refund policy
No refunds are possible.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Upgraded johnsnowlabs to 6.0.0, Spark-NLP to 6.0.0, and Spark-Healthcare to 6.0.0.
Additional details
Inputs
- Summary
Input Format
JSON Format
Provide input as JSON. We support two variations within this format:
1. Array of Text Documents: Use an array containing multiple text documents. Each element represents a separate text document.
{ "text": [ "Text document 1", "Text document 2", ... ] }
2. Single Text Document: Provide a single text document as a string.
{ "text": "Single text document" }
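Both JSON variations can be produced by one small helper. This is a sketch using only Python's standard `json` module, and the function name `build_json_payload` is ours, not part of the product.

```python
import json


def build_json_payload(documents):
    """Serialize one document (str) or several (iterable of str) as a JSON body."""
    if isinstance(documents, str):
        body = {"text": documents}        # single-document variation
    else:
        docs = list(documents)
        if not all(isinstance(d, str) for d in docs):
            raise TypeError("every document must be a string")
        body = {"text": docs}             # array variation
    # ensure_ascii=False keeps accented French characters readable in the body.
    return json.dumps(body, ensure_ascii=False)


print(build_json_payload("Le patient est arrivé hier."))
print(build_json_payload(["Note 1", "Note 2"]))
```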
JSON Lines (JSONL) Format
Provide input in JSON Lines format, where each line is a JSON object representing a text document.
{"text": "Text document 1"}
{"text": "Text document 2"}
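A JSON Lines body is one JSON object per line, which can be assembled as in the following sketch. The helper name is illustrative. `json.dumps` escapes any newline inside a document, so each record stays on a single line.

```python
import json


def build_jsonl_payload(documents):
    """Serialize an iterable of documents as JSON Lines, one object per line."""
    if isinstance(documents, str):
        raise TypeError("pass a list of documents, not a single string")
    lines = [json.dumps({"text": doc}, ensure_ascii=False) for doc in documents]
    return "\n".join(lines)


print(build_jsonl_payload(["Text document 1", "Text document 2"]))
```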
Important Parameter
masking_policy: str
Select a masking policy to control how detected sensitive entities are replaced. The outputs shown for each policy below use this example input:
"PRENOM : Éric, NOM : Lejeune, NUMÉRO DE SÉCURITÉ SOCIALE : 2730235238020, ADRESSE : 74 Boulevard Riou, VILLE : Sainte Annenec"
masked: Default policy that masks entities with their type.
-> 'PRENOM : , NOM : , NUMÉRO DE SÉCURITÉ SOCIALE : , ADRESSE : , VILLE : '
obfuscated: Replaces sensitive entities with random values of the same type.
-> 'PRENOM : Mlle Goncalves, NOM : Mme Garnier-Roussel, NUMÉRO DE SÉCURITÉ SOCIALE : 129022554007724, ADRESSE : avenue de Mahe, VILLE : Meyer-sur-Lemoine'
masked_fixed_length_chars: Masks entities with a fixed length of asterisks (*).
-> 'PRENOM : ****, NOM : ****, NUMÉRO DE SÉCURITÉ SOCIALE : ****, ADRESSE : ****, VILLE : ****'
masked_with_chars: Masks entities with asterisks (*).
-> 'PRENOM : [], NOM : [], NUMÉRO DE SÉCURITÉ SOCIALE : [****], ADRESSE : [*******], VILLE : [********]'
You can specify this parameter in the input as follows:
{ "text": [ "Text document 1", "Text document 2", ... ], "masking_policy": "masked" }
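Since the policy is passed as a plain string, a typo would only be caught at request time. The sketch below checks the value against the four policy names listed above before building the request body. The helper and constant names are ours.

```python
import json

# The four policy names documented in this listing.
MASKING_POLICIES = (
    "masked",
    "obfuscated",
    "masked_fixed_length_chars",
    "masked_with_chars",
)


def build_request(documents, masking_policy="masked"):
    """Build a JSON request body with a validated masking_policy."""
    if masking_policy not in MASKING_POLICIES:
        raise ValueError(
            f"masking_policy must be one of {MASKING_POLICIES}, got {masking_policy!r}"
        )
    body = {"text": documents, "masking_policy": masking_policy}
    return json.dumps(body, ensure_ascii=False)


print(build_request(["PRENOM : Éric, NOM : Lejeune"], "obfuscated"))
```

The default of "masked" mirrors the listing, which describes it as the default policy.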
- Input MIME type
- application/json, application/jsonlines
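The MIME type is sent as the request's content type when the endpoint is invoked. The following is a minimal sketch, assuming a boto3 `sagemaker-runtime` client is passed in and that the response body is UTF-8 text. The listing does not document the response structure, so the sketch returns it undecoded as a string. The endpoint name in the usage comment is hypothetical.

```python
SUPPORTED_MIME_TYPES = ("application/json", "application/jsonlines")


def invoke_deid(runtime_client, endpoint_name, body, content_type="application/json"):
    """Send a request body (str) to the endpoint and return the raw response text.

    `runtime_client` is assumed to be boto3.client("sagemaker-runtime").
    """
    if content_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported content type: {content_type!r}")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=body.encode("utf-8"),
    )
    # The response "Body" is a stream; UTF-8 decoding is an assumption here.
    return response["Body"].read().decode("utf-8")


# Usage (needs AWS credentials; "deid-fr-endpoint" is a hypothetical name):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(invoke_deid(client, "deid-fr-endpoint", '{"text": "Bonjour"}'))
```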
Support
Vendor support
For any assistance, please reach out to support@johnsnowlabs.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.