
Overview
N-grams are fixed-size tuples of items. In this case, the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
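The sliding-window idea described above can be sketched in a few lines of Python. This is only an illustration: the function name `ngrams` and the whitespace tokenization are assumptions made for the example, and the tokenization actually used to build the Google Books corpus is not described on this page.

```python
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens, one record per window position."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The window starts at each index that still leaves room for n tokens.
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


# Assumption: plain whitespace splitting stands in for real tokenization.
tokens = "the quick brown fox jumps over the lazy dog".split()
five_grams = list(ngrams(tokens, 5))

for gram in five_grams:
    print(" ".join(gram))
```

A text shorter than n tokens produces no records at all, because there is no position where a full window fits.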
Features and programs
Open Data Sponsorship Program
This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available, high-value, cloud-optimized datasets.
Pricing
This is a publicly available data set. No subscription is required.
Legal
Content disclaimer
Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
- How to use: To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description: A data set containing Google Books n-gram corpora in a Hadoop-friendly file format.
- Resource type: S3 bucket
- Amazon Resource Name (ARN): arn:aws:s3:::datasets.elasticmapreduce/ngrams/books/
- AWS region: us-east-1
- AWS CLI access (no AWS account required): aws s3 ls --no-sign-request s3://datasets.elasticmapreduce/ngrams/books/
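For readers who prefer Python to the CLI, the sketch below shows one way to turn the ARN listed above into a bucket and key prefix and list what sits under it anonymously. The helper names (`arn_to_s3_uri`, `split_s3_uri`, `list_prefixes`) are hypothetical, and `list_prefixes` assumes the third-party `boto3` package is installed and network access is available. It has not been run against the bucket here, so treat it as a starting point rather than a verified recipe.

```python
from typing import List, Tuple

S3_ARN_PREFIX = "arn:aws:s3:::"


def arn_to_s3_uri(arn: str) -> str:
    """Convert an S3 ARN such as arn:aws:s3:::bucket/key into s3://bucket/key."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    return "s3://" + arn[len(S3_ARN_PREFIX):]


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key/prefix into (bucket, key prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key_prefix = uri[len("s3://"):].partition("/")
    return bucket, key_prefix


def list_prefixes(bucket: str, key_prefix: str) -> List[str]:
    """List the immediate sub-prefixes without signing the request.

    Assumption: boto3 is installed. Imports are local so the helpers
    above stay usable without it.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # UNSIGNED is the boto3 counterpart of the CLI's --no-sign-request flag.
    s3 = boto3.client(
        "s3",
        region_name="us-east-1",
        config=Config(signature_version=UNSIGNED),
    )
    response = s3.list_objects_v2(Bucket=bucket, Prefix=key_prefix, Delimiter="/")
    return [entry["Prefix"] for entry in response.get("CommonPrefixes", [])]


ARN = "arn:aws:s3:::datasets.elasticmapreduce/ngrams/books/"
uri = arn_to_s3_uri(ARN)
bucket, key_prefix = split_s3_uri(uri)
print(uri)
```

Calling `list_prefixes(bucket, key_prefix)` would then issue the anonymous listing request. It is kept out of the top-level script so the parsing helpers can be tried offline.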
Resources
Vendor resources
Support
Managed By
Not managed
How to cite
Google Books Ngrams was accessed on DATE from https://registry.opendata.aws/google-ngrams.
License
Creative Commons Attribution 3.0 Unported License