forked from awslabs/open-data-registry
google-ngrams.yaml
Name: Google Books Ngrams
Description: N-grams are fixed-size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or characters. The n-grams in this dataset were produced by passing a sliding window over the text of books and outputting a record for each new token.
Documentation: http://books.google.com/ngrams/
Contact: https://books.google.com/ngrams
UpdateFrequency: Not updated
Tags:
- aws-pds
- natural language processing
License: Creative Commons Attribution 3.0 Unported License
Resources:
- Description: A data set containing Google Books n-gram corpora in a Hadoop-friendly file format.
  ARN: arn:aws:s3:::datasets.elasticmapreduce/ngrams/books/
  Region: us-east-1
  Type: S3 Bucket
DataAtWork:
  Tutorials:
  Tools & Applications:
  Publications:
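
The sliding-window construction mentioned in the Description can be illustrated with a minimal sketch. It assumes naive whitespace tokenization, which is almost certainly simpler than whatever was used to build the actual corpus; the function name `word_ngrams` is hypothetical and not part of the dataset or any tooling around it.

```python
from typing import Iterator, List, Tuple


def word_ngrams(text: str, n: int) -> Iterator[Tuple[str, ...]]:
    """Yield word n-grams by sliding a window of n tokens over the text.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Assumption: tokens are separated by whitespace.
    tokens: List[str] = text.split()
    # Advance the window one token at a time, emitting one record per position.
    # If the text has fewer than n tokens, the range is empty and nothing is yielded.
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


if __name__ == "__main__":
    for gram in word_ngrams("the quick brown fox jumps", 3):
        print(" ".join(gram))
```

Each step of the window adds one new token and drops the oldest one, so a text of k tokens yields k - n + 1 n-grams when k is at least n.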