Overview
This pipeline masks Protected Health Information (PHI) in SVS files. It removes PHI from both the metadata tags and the pixel data of the input file. You can also remove specific metadata tags through custom parameters, for example ImageDescription.ScanScope ID, ImageDescription.Time Zone, or ImageDescription.ScannerType.
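Tag names such as ImageDescription.ScanScope ID refer to fields inside the SVS ImageDescription TIFF tag. In Aperio SVS files this tag typically holds a free-text header followed by pipe-delimited `Key = Value` pairs. The sketch below illustrates what removing such a field means. It assumes that layout, uses a hypothetical helper name (`strip_description_fields`), and is not the product's implementation or its custom-parameter API.

```python
from collections.abc import Iterable


def strip_description_fields(description: str, fields: Iterable[str]) -> str:
    """Drop selected 'Key = Value' fields from an Aperio-style ImageDescription.

    Assumes a free-text header segment followed by '|Key = Value' segments.
    Illustrative only; the header segment is always kept unchanged.
    """
    targets = {field.strip() for field in fields}
    segments = description.split("|")
    kept = [segments[0]]  # header segment (library version, dimensions, codec)
    for segment in segments[1:]:
        key, sep, _value = segment.partition("=")
        # Keep segments that are not key/value pairs or whose key is not targeted.
        if not sep or key.strip() not in targets:
            kept.append(segment)
    return "|".join(kept)
```

For example, with a made-up description string, removing `ScanScope ID` and `Time Zone` leaves only the header, `AppMag = 20`, and `MPP = 0.4990`.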
Masked entities include AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, and IPADDR.
The output is an SVS file similar to the input, but with black bounding boxes drawn over the targeted entities in the pixel data and PHI removed from the metadata tags.
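The pixel-level masking idea can be sketched as filling each detected region with black. This minimal sketch assumes the boxes, as `(left, top, right, bottom)` with exclusive right/bottom edges, already come from an upstream OCR and entity-detection step (not shown). It also uses a small grayscale image stored as nested lists instead of real SVS tiles. `black_out` is a hypothetical name, not part of the product.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def black_out(image: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Return a copy of a grayscale image with every box filled with 0 (black)."""
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # copy so the input stays unchanged
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before filling it.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                masked[y][x] = 0
    return masked
```

Working on a copy keeps the original pixels available, for example to compare the input and the masked output during review.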
IMPORTANT USAGE INFORMATION:
After you subscribe to this product and create a SageMaker endpoint, billing occurs on an HOURLY BASIS for as long as the endpoint is running.
- Charges apply even if the endpoint is idle and not actively processing requests.
- To stop charges, you MUST DELETE the endpoint in your SageMaker console.
- Simply stopping requests will NOT stop billing.
Deleting the endpoint as soon as you are finished ensures you are billed only for the time you actually use the service.
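Besides the console, the endpoint can also be deleted programmatically. This sketch assumes you pass in a SageMaker client such as `boto3.client("sagemaker")`. The helper name is hypothetical, and the endpoint name in the usage line is a placeholder. The calls (`describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, `delete_model`) are standard SageMaker API operations exposed by boto3.

```python
def delete_endpoint_resources(sm_client, endpoint_name: str) -> dict:
    """Delete a SageMaker endpoint, then its endpoint config and models.

    `sm_client` is assumed to be a boto3 SageMaker client (or compatible object).
    """
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Deleting the endpoint is the step that stops the hourly billing.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    # Optional cleanup of the resources that were created alongside the endpoint.
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}
```

Usage, with a placeholder name: `delete_endpoint_resources(boto3.client("sagemaker"), "svs-deid-endpoint")`. The endpoint is looked up before deletion so that its config and model names are still available for cleanup.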
Highlights
- A comprehensive, multi-layered approach to de-identifying SVS files, combining deep-learning-based NLP, OCR, and binary processing to detect and mask Protected Health Information (PHI) in both pixel data and metadata.
- By targeting a wide range of entity types, from patient names and medical IDs to geographic locations and digital identifiers, the solution helps support compliance with privacy regulations while preserving the integrity and usability of the original SVS file.
Details
Pricing
Free trial
| Dimension | Description | Cost per host per hour (USD) |
|---|---|---|
| ml.m5.4xlarge Inference (Batch), recommended | Model inference on the ml.m5.4xlarge instance type, batch mode | $95.04 |
| ml.m5.4xlarge Inference (Real-Time), recommended | Model inference on the ml.m5.4xlarge instance type, real-time mode | $95.04 |
| ml.m5.xlarge Inference (Batch) | Model inference on the ml.m5.xlarge instance type, batch mode | $95.04 |
| ml.m5.2xlarge Inference (Batch) | Model inference on the ml.m5.2xlarge instance type, batch mode | $95.04 |
| ml.m6i.xlarge Inference (Real-Time) | Model inference on the ml.m6i.xlarge instance type, real-time mode | $95.04 |
| ml.m5.2xlarge Inference (Real-Time) | Model inference on the ml.m5.2xlarge instance type, real-time mode | $95.04 |
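Because billing runs for every hour an endpoint exists, even when idle, the listed rate adds up quickly. For example, one host left running for a day costs 24 × $95.04 = $2,280.96. The sketch below only multiplies the listed per-host hourly figure from the table. Any AWS infrastructure charges for the underlying instance are not modeled, and the function name is hypothetical.

```python
from decimal import Decimal

# Listed cost per host per hour; identical for every dimension in the table.
HOURLY_RATE_USD = Decimal("95.04")


def estimate_software_cost(hours: float, hosts: int = 1) -> Decimal:
    """Estimate the listed charge for `hosts` instances running for `hours`."""
    total = HOURLY_RATE_USD * Decimal(str(hours)) * hosts
    return total.quantize(Decimal("0.01"))  # round to whole cents
```

`Decimal` is used instead of `float` so that currency amounts such as 95.04 are represented exactly. For instance, `estimate_software_cost(720)` (a 30-day month) gives `68428.80`.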
Vendor refund policy
No refunds are possible.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
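For real-time inference, creating a model from the package and deploying it takes three SageMaker API calls: `create_model`, `create_endpoint_config`, and `create_endpoint`. This sketch assumes a boto3 SageMaker client, the model package ARN you receive after subscribing, and an IAM execution role. The helper name, resource-naming scheme, and `"AllTraffic"` variant name are illustrative choices. The default instance type is the recommended ml.m5.4xlarge from the pricing table. Network isolation is enabled because AWS Marketplace model packages are generally run that way.

```python
def deploy_model_package(sm_client, model_package_arn: str, role_arn: str,
                         name: str, instance_type: str = "ml.m5.4xlarge") -> str:
    """Create a model, endpoint config, and real-time endpoint from a model package."""
    model_name = f"{name}-model"
    config_name = f"{name}-config"
    sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"ModelPackageName": model_package_arn},
        EnableNetworkIsolation=True,  # Marketplace packages typically require this
    )
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,  # billing is per host, so start with one
        }],
    )
    # Hourly billing starts once this endpoint is created.
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    return name
```

Endpoint creation is asynchronous. With boto3 you can block until it is ready using `sm_client.get_waiter("endpoint_in_service").wait(EndpointName=name)`.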
Version release notes
This endpoint de-identifies SVS files, which come from microscopes and other sources. It removes all Protected Health Information (PHI) from the pixel data and metadata tags of the input SVS file, using a combination of a state-of-the-art (SOTA) deep-learning-based NLP and OCR pipeline and binary processing.
johnsnowlabs_version: 6.0.2
Spark-OCR==6.0.0 Spark-Healthcare==6.0.2 Spark-NLP==6.0.1
Additional details
Inputs
- Summary
Supported input format: SVS files.
- Input MIME type
- application/octet-stream
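Given the `application/octet-stream` MIME type, a real-time request can send the file's raw bytes as the body. This sketch assumes a boto3 SageMaker Runtime client (`boto3.client("sagemaker-runtime")`) and assumes the response body contains the masked file's bytes. The helper name and file paths are hypothetical. `invoke_endpoint` is the standard runtime call.

```python
from pathlib import Path


def deidentify_svs(runtime_client, endpoint_name: str, svs_path: str, out_path: str) -> int:
    """Send one file to a real-time endpoint and save the returned bytes.

    Assumes the endpoint replies with the de-identified file as raw bytes.
    Returns the number of bytes written.
    """
    payload = Path(svs_path).read_bytes()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/octet-stream",  # MIME type from the listing
        Body=payload,
    )
    masked = response["Body"].read()  # streaming body; read it fully
    Path(out_path).write_bytes(masked)
    return len(masked)
```

Real-time SageMaker invocations have a small maximum request payload (6 MB at the time of writing), while whole-slide SVS files are often much larger. Large slides will therefore likely need the batch-mode dimensions listed in the pricing table.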
Resources
Vendor resources
Support
Vendor support
For any assistance, please contact support@johnsnowlabs.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. It helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.