Overview
Traditional Vision AI demands extensive data labeling and repetitive model retraining, a labor-intensive process that consumes significant time, cost, and specialized expertise. Superb AI's "ZERO" brings a paradigm shift as an industrial-specialized Vision Foundation Model (VFM). Leveraging Open World Visual Grounding technology, ZERO comprehends novel concepts without prior training. This zero-shot capability empowers instant AI adoption for new tasks and flexible, on-the-fly changes to detection targets, eliminating the need for additional training.
A key advantage of ZERO is its Multi-Prompt capability. This feature lets you apply the model in real-world scenarios immediately, using intuitive prompts such as text and image boxes, with no separate model training required. Instead of time-consuming retraining, you simply describe your target in text or provide an example image, and ZERO adapts instantly. This dramatically cuts the time and cost of AI solution development, making AI adoption faster and more accessible.
Furthermore, ZERO is engineered for high efficiency, with a lightweight 622M parameters and high-performance processing at 1.03 TFLOPS. This optimized design supports operation across a wide range of applications and industries, from cloud to secure on-premises environments and remote edge devices. Its efficiency significantly reduces the need for heavy computing hardware investments, making advanced AI more attainable for diverse industrial deployments.
Highlights
- Zero-shot Deployment: Instantly detect untrained objects without complex data collection, labeling, or model retraining. Adapt immediately to new products, defect types, or environment changes, dramatically cutting development time and costs.
- Flexible Multi-Prompt Input: Deploy and operate AI instantly by simply describing your target object in text or providing an example image. ZERO supports diverse input prompts for intuitive interaction.
- Industrial-Specialized VFM: Trained on real-world data from dozens of industrial sectors, including manufacturing, logistics, and retail. ZERO delivers high performance and immediate usability across complex industrial domains.
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.xlarge Inference (Batch) (Recommended) | Model inference on the ml.g4dn.xlarge instance type, batch mode | $5.00 |
| ml.g4dn.xlarge Inference (Real-Time) (Recommended) | Model inference on the ml.g4dn.xlarge instance type, real-time mode | $5.00 |
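As a rough illustration of the per-host, per-hour rate in the table, the sketch below estimates the software charge for a given number of hosts and hours. It assumes simple linear billing with no free-trial credit, and it covers only the listed software rate; AWS infrastructure charges for the underlying instance are typically billed separately and are not included. The function name is hypothetical.

```python
# Listed software rate for ml.g4dn.xlarge (batch or real-time), from the table above.
SOFTWARE_RATE_USD_PER_HOST_HOUR = 5.00


def estimate_software_cost(hosts: int, hours: float,
                           rate: float = SOFTWARE_RATE_USD_PER_HOST_HOUR) -> float:
    """Estimate the software charge only (assumption: linear hosts x hours x rate).

    AWS infrastructure charges for the instance itself are not included.
    """
    if hosts < 0 or hours < 0:
        raise ValueError("hosts and hours must be non-negative")
    return hosts * hours * rate


# Two hosts running for eight hours each.
example_cost = estimate_software_cost(hosts=2, hours=8)
print(f"Estimated software charge: ${example_cost:.2f}")
```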
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
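For real-time inference, a deployed endpoint can be called through the SageMaker runtime API. The sketch below is a minimal example, assuming an endpoint has already been created from this model package and that the response body is JSON (the listing does not document the output format). The helper name `invoke_zero` and the endpoint name are hypothetical; `invoke_endpoint` is the standard boto3 `sagemaker-runtime` call.

```python
import json


def invoke_zero(runtime_client, endpoint_name: str, payload: dict) -> dict:
    """Send one JSON request to a SageMaker endpoint and decode the reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    Assumption: the endpoint returns a JSON document.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # one of the listed input MIME types
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


# Usage (requires AWS credentials and a deployed endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = invoke_zero(runtime, "my-zero-endpoint", payload)
# where payload follows the request fields described under Inputs.
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves region and credential configuration to the caller.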
Version release notes
Enables object detection with text-based and vision-based prompts on a single image.
Additional details
Inputs
Parameters
- search_image (Type: String; Required: Yes): The source image for object detection. This can be a URL or a Base64-encoded string.
- queries (Type: List[Dict]; Required: Yes): A list of query objects that define what to search for in the source search_image. At least one query must be provided. Each query object contains:
  - prompt_image (Type: String; Required: Conditional): An image used for semantic queries. Required when you want to use part of an image (defined by a bounding box) as the search prompt. Can be an empty string "", a URL, or a Base64-encoded string.
  - prompts (Type: List[Dict]; Required: Yes): A list of prompt objects, each defining a specific target to detect. Each object in the list specifies the actual search criteria: text-based, semantic (visual), or a combination. Each prompt object contains:
    - text (Type: String; Required: Conditional): A text description of the object to detect (e.g., "person", "red car"). Required for text-based detection. Can be an empty string "" if a box is provided for a semantic query.
    - box (Type: List[Float]; Required: Conditional): A list of four floats representing a bounding box [x, y, w, h] that crops a region from the prompt_image. This cropped region is used as a visual prompt for detection. Required for semantic (visual) search. Provide an empty list [] for text-based search.
    - box_threshold (Type: Float; Required: Yes): A threshold between 0.0 and 1.0. Detection boxes with a score below this value are filtered out.
    - multimodal_threshold (Type: Float; Required: Yes): A threshold between 0.0 and 1.0, applied to the corresponding individual prompt. For this product, prefer adjusting this threshold rather than box_threshold.
- Input MIME type: application/json, application/jsonlines
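The request fields above can be assembled as in the following minimal sketch. The helper names, the default threshold of 0.3, and the example.com URLs are illustrative assumptions, not values from the vendor. The nesting (queries containing prompts) follows the parameter descriptions above, and the units of the box values (pixels or normalized) are not stated in the listing, so check the vendor documentation before relying on them.

```python
import json


def _check_threshold(name: str, value: float) -> float:
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return float(value)


def text_prompt(text: str, box_threshold: float = 0.3,
                multimodal_threshold: float = 0.3) -> dict:
    """Text-based prompt: non-empty text, empty box."""
    if not text:
        raise ValueError("text must be non-empty for a text-based prompt")
    return {
        "text": text,
        "box": [],
        "box_threshold": _check_threshold("box_threshold", box_threshold),
        "multimodal_threshold": _check_threshold(
            "multimodal_threshold", multimodal_threshold),
    }


def visual_prompt(box, box_threshold: float = 0.3,
                  multimodal_threshold: float = 0.3) -> dict:
    """Semantic (visual) prompt: empty text, box given as [x, y, w, h]."""
    if len(box) != 4:
        raise ValueError("box must have four values: [x, y, w, h]")
    return {
        "text": "",
        "box": [float(v) for v in box],
        "box_threshold": _check_threshold("box_threshold", box_threshold),
        "multimodal_threshold": _check_threshold(
            "multimodal_threshold", multimodal_threshold),
    }


def build_query(prompts, prompt_image: str = "") -> dict:
    """One query object; prompt_image is needed when any prompt uses a box."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("a query needs at least one prompt")
    if any(p["box"] for p in prompts) and not prompt_image:
        raise ValueError("prompt_image is required when a prompt uses a box")
    return {"prompt_image": prompt_image, "prompts": prompts}


def build_request(search_image: str, queries) -> dict:
    """Top-level request body; at least one query must be provided."""
    queries = list(queries)
    if not queries:
        raise ValueError("at least one query must be provided")
    return {"search_image": search_image, "queries": queries}


def to_json_lines(requests) -> str:
    """Serialize several requests as application/jsonlines (one per line)."""
    return "\n".join(json.dumps(r) for r in requests)


# A text query and a visual query against the same (hypothetical) image.
request = build_request(
    "https://example.com/shelf.jpg",
    [
        build_query([text_prompt("red car")]),
        build_query([visual_prompt([10, 20, 100, 50])],
                    prompt_image="https://example.com/reference.jpg"),
    ],
)
print(json.dumps(request, indent=2))
```

A single request serialized with `json.dumps` matches the application/json input type, while `to_json_lines` produces one request per line for application/jsonlines input.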
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.