
Overview
KanjuTech's Transcription and Diarization model provides secure, end-to-end recognition of multi-participant conversations. It converts recorded dialogue into precise transcripts with labeled speakers and lines, automatically detecting any number of participants. It delivers low word error rates (WER) and diarization error rates (DER) on real-life data and supports ten languages with human-level accuracy (WER 3-8%). The model processes over 12 hours of recording in just one hour on an ml.p3.2xlarge instance. In line with AWS security policy, only you, the user, can access the processed data through SageMaker products. Industries such as translation, transcription, media, broadcasting, call centers, corporate governance, and education will find our solution invaluable for enhancing their products and services.
Highlights
- Achieve human-level precision in transcription and diarization with our industry-grade solution. Seamlessly convert conversations into accurate transcripts, complete with speaker detection and labeling. Your data remains secure: only you, as the user, can access it.
- Our model accepts widely used pre-recorded audio and video formats. You can specify the number of speakers or rely on automatic detection for added flexibility.
- Experience industry-grade transcription quality in 10 languages: English, Spanish, French, Portuguese, Russian, Indonesian, German, Japanese, Turkish, and Italian.
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.p3.2xlarge Inference (Batch), Recommended | Model inference on the ml.p3.2xlarge instance type, batch mode | $1.28 |
| ml.p3.2xlarge Inference (Real-Time), Recommended | Model inference on the ml.p3.2xlarge instance type, real-time mode | $1.28 |
| ml.p2.xlarge Inference (Batch) | Model inference on the ml.p2.xlarge instance type, batch mode | $0.38 |
| ml.p2.xlarge Inference (Real-Time) | Model inference on the ml.p2.xlarge instance type, real-time mode | $0.38 |
| ml.g4dn.4xlarge Inference (Real-Time) | Model inference on the ml.g4dn.4xlarge instance type, real-time mode | $0.51 |
| ml.g4dn.16xlarge Inference (Real-Time) | Model inference on the ml.g4dn.16xlarge instance type, real-time mode | $1.82 |
| ml.g5.xlarge Inference (Real-Time) | Model inference on the ml.g5.xlarge instance type, real-time mode | $0.47 |
| ml.g5.8xlarge Inference (Real-Time) | Model inference on the ml.g5.8xlarge instance type, real-time mode | $1.02 |
| ml.g4dn.2xlarge Inference (Real-Time) | Model inference on the ml.g4dn.2xlarge instance type, real-time mode | $0.32 |
| ml.g5.4xlarge Inference (Real-Time) | Model inference on the ml.g5.4xlarge instance type, real-time mode | $0.68 |
Vendor refund policy
Please contact our support team: kanju@kanju.tech
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
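As an illustration of how such a model package can be deployed, the sketch below uses the SageMaker Python SDK to create a real-time endpoint on the recommended instance type. The IAM role, model package ARN, and endpoint name are placeholder assumptions, not values from this listing; use the ARN shown on your AWS Marketplace subscription page and your own execution role.

```python
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()

# Placeholders: substitute your own SageMaker execution role and the
# model package ARN shown on your AWS Marketplace subscription page.
role = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"
model_package_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-transcription-diarization"
)

# Wrap the pre-trained model package as a deployable SageMaker model.
model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session,
)

# Deploy a real-time endpoint on the recommended instance type from the pricing table.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    endpoint_name="kanjutech-transcription-diarization",  # hypothetical endpoint name
)
```

For batch workloads, the same model object can instead be used to run a SageMaker batch transform job.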
Version release notes
Feature improvements
- Improved: Word-level timestamps were added to the model output, enabling better-quality subtitles.
Additional details
Inputs
- Summary
The model supports common audio and video input formats. Because AWS limits the size of input data, we recommend converting video files to audio before passing them to the model.
- Limitations for input type
- The maximum audio file size is 15MB for real-time inference and 75MB per file for batch transform jobs. The recommended duration of a single audio file for real-time inference is 11 minutes on ml.p3.2xlarge and 7 minutes on ml.g4dn.xlarge (see the sizing sketch after this list).
- Input MIME type
- application/json
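As a quick illustration of these limits, here is a minimal sketch (assuming a hypothetical helper name and using the per-file limits quoted above) that checks a local audio file and suggests whether to send it to a real-time endpoint or a batch transform job.

```python
import os

# Per-file size limits documented in this listing.
REALTIME_LIMIT_MB = 15
BATCH_LIMIT_MB = 75

def choose_inference_mode(path: str) -> str:
    """Hypothetical helper: pick an inference mode based on file size."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb <= REALTIME_LIMIT_MB:
        return "real-time"
    if size_mb <= BATCH_LIMIT_MB:
        return "batch"
    raise ValueError(
        f"{path} is {size_mb:.1f} MB; split the recording or extract/compress "
        "the audio track before inference."
    )

print(choose_inference_mode("meeting.wav"))  # e.g. "real-time"
```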
Input data descriptions
The following table describes the supported input data fields for real-time inference and batch transform. The model input is a JSON request containing these fields; an example request body is shown after the table.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| 'file' | Base64-encoded audio | Type: FreeText | Yes |
| 'language' | Language of transcription ("auto" or a specific language, e.g. "en") | Type: FreeText | Yes |
| 'num_speakers' | Number of speakers ("auto" or a specific number, e.g. 2) | Type: FreeText | Yes |
| 'f_name' | Name of the input audio file | Type: FreeText | Yes |
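As an example, a minimal real-time invocation with boto3 might look like the sketch below. The endpoint name and audio file name are placeholder assumptions; the payload fields match the table above, and the response is simply parsed as JSON and printed, since handling of the output will depend on your application.

```python
import base64
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Read and base64-encode the audio file (mind the size limits noted above).
with open("meeting.wav", "rb") as f:  # placeholder file name
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Request payload fields as described in the input table.
payload = {
    "file": audio_b64,
    "language": "auto",      # or a specific language code, e.g. "en"
    "num_speakers": "auto",  # or a specific number, e.g. 2
    "f_name": "meeting.wav",
}

response = runtime.invoke_endpoint(
    EndpointName="kanjutech-transcription-diarization",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
print(result)
```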
Resources
Vendor resources
Support
Vendor support
If you have any questions about our product, please feel free to contact us at kanju@kanju.tech.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
