
Overview
Japanese dictionaries and pre-trained models (word embeddings and language models) for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of Japanese pretrained word embeddings (word vectors) trained on NWJC, the ultra-large-scale web corpus by the National Institute for Japanese Language and Linguistics, as analyzed by Sudachi. chiTra is a library for using large-scale pre-trained language models with the Japanese tokenizer SudachiPy.
Features and programs
Open Data Sponsorship Program
Pricing
This is a publicly available data set. No subscription is required.
Legal
Content disclaimer
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
- How to use
- To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description
- SudachiDict: binary format of the morphological analysis dictionaries. chiVe: pretrained word embeddings in various formats.
- Resource type
- S3 bucket
- Amazon Resource Name (ARN)
- arn:aws:s3:::sudachi
- AWS region
- ap-northeast-1
- AWS CLI access (No AWS account required)
- aws s3 ls --no-sign-request s3://sudachi/
- Description
- CloudFront CDN mirror
- Resource type
- CloudFront distribution
- Resource link
- d2ej7fkh96fzlu.cloudfront.net
- AWS region
- ap-northeast-1
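For readers without the AWS CLI, the anonymous listing shown above can be approximated in plain Python. This is a minimal sketch, not an official client: it assumes the bucket accepts unsigned S3 ListObjectsV2 requests over HTTPS (as the `--no-sign-request` command suggests) and that the virtual-hosted-style endpoint built from the bucket name and region above is reachable. The function names `build_list_url`, `parse_listing`, and `fetch_listing` are hypothetical helpers introduced only for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "sudachi"          # from the ARN arn:aws:s3:::sudachi
REGION = "ap-northeast-1"   # region listed for the bucket
# XML namespace used by S3 list responses
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(prefix: str = "", delimiter: str = "/") -> str:
    """Build an unsigned ListObjectsV2 URL (assumed endpoint style)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "delimiter": delimiter}
    )
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (folder-like prefixes, object keys) from a list response."""
    root = ET.fromstring(xml_text)
    prefixes = [
        e.text
        for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
        if e.text
    ]
    keys = [
        e.text
        for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")
        if e.text
    ]
    return prefixes, keys


def fetch_listing(prefix: str = "") -> tuple[list[str], list[str]]:
    """Fetch one page of the listing; needs network access.

    Only the first page is read; pagination is not handled here.
    """
    with urllib.request.urlopen(build_list_url(prefix), timeout=30) as resp:
        return parse_listing(resp.read().decode("utf-8"))


# Example (requires network access):
# folders, files = fetch_listing()
# print(folders, files)
```

Only the standard library is used, so no AWS account or SDK is needed. For large prefixes, a fuller client would follow the continuation token returned by the service, which this sketch omits.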
Resources
Vendor resources
Support
Contact
Managed By
How to cite
Sudachi Language Resources was accessed on DATE from https://registry.opendata.aws/sudachi.
License
Apache-2.0