Overview
Overview
Tech 42 packages Google's MedGemma 1.5 4B as a production-ready Amazon SageMaker Model Package so your team can deploy a state-of-the-art medical AI backend in minutes - no infrastructure setup, no model serving complexity. MedGemma 1.5 4B is the latest generation of Google's open-weight multimodal models built specifically for healthcare AI applications.
What MedGemma 1.5 4B Can Do
- Interpret CT and MRI scans in 3D
- Analyze whole-slide histopathology images
- Compare longitudinal chest X-rays against prior images
- Extract structured data from medical lab reports
- Interpret EHR text using FHIR-compatible records
- Answer medical questions across radiology, dermatology, ophthalmology, and pathology
Deployment
Deploy with one click. The endpoint exposes an OpenAI-compatible REST API (/v1/chat/completions) - a drop-in replacement for existing OpenAI API clients. Compatible with the SageMaker Python SDK and AWS SDK (Boto3). No GPU orchestration or container management required.
Validated on Real-World Medical Data
Tested against established medical datasets covering imaging, clinical reasoning, and document understanding. View full benchmark results on Hugging Face .
- Imaging - MIMIC-CXR, CheXpert, CXR14, CT-RATE, MS-CXR-T, PathMCQA, WSI-Path, PAD-UFES-20, SCIN, ISIC, EyePACS, SLAKE, VQA-RAD, Chest ImaGenome, MedXpertQA
- Text Reasoning - MedQA, MedMCQA, PubMedQA, MMLU Med, MedXpertQA, AfriMed-QA
- Medical Records - EHRNoteQA, EHRQA
- Document Understanding - Mendeley Clinical Lab Reports
SageMaker Real-Time Benchmark Results
Benchmarks run on G6e and G7e endpoints under streaming load, scaling concurrency from c1 to c64. G7e delivers substantially stronger performance.
| Instance | c8 avg RPS | c8 p90 RPS | c8 p90 TTFT | c8 p90 full response |
|---|---|---|---|---|
| G6e | 0.47 RPS | 0.58 RPS | 15.02s | 17.93s |
| G7e | 2.55 RPS | 2.74 RPS | 1.41s | 3.15s |
Highlights
- Google's Medical Multimodal AI - Private, In Your AWS Account: Deploy MedGemma 1.5 4B as a managed SageMaker endpoint. Radiology, pathology, EHR, and lab report understanding - no PHI leaves your VPC.
- OpenAI-Compatible API via vLLM - Zero SDK Changes for Your App: Served by vLLM with a native /v1/chat/completions endpoint. Integrate using the OpenAI Python SDK, LangChain, or any HTTP client - no custom wrappers needed.
- Deploy in minutes: Zero infrastructure management
Details
Introducing multi-product solutions
You can now purchase comprehensive solutions tailored to use cases and industries.
Features and programs
Financing for AWS Marketplace purchases
Pricing
Dimension | Description | Cost/host/hour |
|---|---|---|
ml.g5.2xlarge Inference (Batch) Recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $1.50 |
ml.g7e.2xlarge Inference (Real-Time) Recommended | Model inference on the ml.g7e.2xlarge instance type, real-time mode | $1.50 |
ml.g5.xlarge Inference (Batch) | Model inference on the ml.g5.xlarge instance type, batch mode | $1.50 |
ml.g5.4xlarge Inference (Batch) | Model inference on the ml.g5.4xlarge instance type, batch mode | $1.50 |
ml.g5.8xlarge Inference (Batch) | Model inference on the ml.g5.8xlarge instance type, batch mode | $0.00 |
ml.g5.12xlarge Inference (Batch) | Model inference on the ml.g5.12xlarge instance type, batch mode | $0.00 |
ml.g5.16xlarge Inference (Batch) | Model inference on the ml.g5.16xlarge instance type, batch mode | $0.00 |
ml.g5.24xlarge Inference (Batch) | Model inference on the ml.g5.24xlarge instance type, batch mode | $0.00 |
ml.g5.48xlarge Inference (Batch) | Model inference on the ml.g5.48xlarge instance type, batch mode | $0.00 |
ml.p3.2xlarge Inference (Batch) | Model inference on the ml.p3.2xlarge instance type, batch mode | $0.00 |
Vendor refund policy
Currently we do not support refunds, but you can cancel your subscription to the service at any time.
How can we make this page better?
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Initial release of MedGemma 1.5 4B on AWS Marketplace. This version includes the multimodal 4B parameter model supporting medical text reasoning and image comprehension across CT, MRI, chest X-ray, dermatology, and whole-slide histopathology modalities. Built on Gemma 3 architecture with improved accuracy on medical text benchmarks over MedGemma 1.
Additional details
Inputs
- Summary
Accepts JSON payloads via the /invocations endpoint. Supports two modes: (1) text-only, providing a prompt string under the "inputs" key; (2) multimodal, providing a messages array with interleaved text and base64-encoded image content following the Gemma chat template format. Supported image types: JPEG, PNG. Maximum image size: 1024×1024 pixels. Content-Type must be application/json.
- Limitations for input type
- Maximum prompt length: 8,192 tokens. Images must be base64-encoded and embedded inline; external image URLs are not supported (no network access at inference time). Supported content types: application/json only. Batch transform inputs must use JSON Lines format (.jsonl), one JSON object per line.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
Field name | Description | Constraints | Required |
|---|---|---|---|
inputs | Text prompt string, or a messages array for multimodal inputs | Max 8,192 tokens | Yes |
max_new_tokens | Maximum number of tokens to generate in the response | 1–2048. Default: 512 | No |
temperature | Controls randomness. Lower values produce more deterministic output | 0.0–1.0. Default: 0.3 | No |
top_p | Nucleus sampling probability threshold | 0.0–1.0. Default: 0.9 | No |
top_k | Limits vocabulary to the top-k most likely tokens at each step | 1–100. Default: 50 | No |
return_full_text | If true, the input prompt is prepended to the generated output | Default: false | No |
Resources
Vendor resources
Support
Vendor support
Contact us at support@tech42consulting.com or visit our website at tech42consulting.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.