    NIH NLM NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS

    Open data | Deployed on AWS
    Overview

    PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS by PMCID and version number, and they fall into the following datasets:

    The PMC Open Access (OA) Subset, which includes all articles in PMC that are available for reuse based on terms specified by the publisher. The majority of available articles have machine-readable Creative Commons licenses.

    The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining. NOTE: Author manuscripts with Creative Commons licenses are also part of the PMC Open Access Subset.

    The Historical OCR Dataset, which includes historical articles, published in the 18th, 19th, and 20th centuries, scanned as part of an NLM digitization project, that have Creative Commons licenses. NOTE: These articles are also part of the PMC Open Access Subset.

    These datasets collectively span more than half of PMC's total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery. The bucket in this registry contains individual article versions in NISO Z39.96-2015 JATS XML format, as plain text extracted from the XML, and as the full article PDF. Media files and supplementary material files are also available for all open access articles. The bucket also includes a JSON metadata object for each article version and a CSV inventory file. It is updated continuously with new and updated articles.
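    Because the article files are JATS XML, the Python standard library can handle basic text mining. The following is a minimal sketch, not an official PMC recipe: `SAMPLE_JATS` is a hypothetical, heavily trimmed fragment written for illustration, and the helper names `article_title` and `body_paragraphs` are assumptions. Real PMC files are much larger and may need extra handling (for example, DOCTYPE declarations or namespaced elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS fragment for illustration only.
# It is not taken from the bucket; real article files carry far more metadata.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>A <italic>sample</italic> title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body><p>First paragraph.</p><p>Second paragraph.</p></body>
</article>"""


def _flatten(node):
    """Join all text inside a node (including inline markup) and collapse whitespace."""
    return " ".join("".join(node.itertext()).split())


def article_title(root):
    """Return the article title from the JATS front matter, or None if absent."""
    node = root.find("./front/article-meta/title-group/article-title")
    return None if node is None else _flatten(node)


def body_paragraphs(root):
    """Return the text of every <p> element found under <body>."""
    return [_flatten(p) for p in root.iterfind("./body//p")]


root = ET.fromstring(SAMPLE_JATS)
print(article_title(root))
print(body_paragraphs(root))
```

    Using `itertext()` rather than `.text` keeps text that sits inside inline elements such as `<italic>`, which are common in article titles.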

    Features and programs

    Open Data Sponsorship Program

    This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

    Pricing

    This is a publicly available dataset. No subscription is required.

    Legal

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    AWS Data Exchange (ADX)

    AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

    Open data resources

    Available with or without an AWS account.

    How to use
    To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).

    Description: .xml, .txt, and PDF files with the full text of articles; media and supplementary material files; all located in a public S3 bucket
    Resource type: S3 bucket
    Amazon Resource Name (ARN): arn:aws:s3:::pmc-oa-opendata
    AWS region: us-east-1
    AWS CLI access (No AWS account required): aws s3 ls --no-sign-request s3://pmc-oa-opendata/
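
    The unsigned CLI listing above can also be approximated from Python without an AWS account or extra packages. This is a sketch under assumptions: it calls the standard S3 ListObjectsV2 REST endpoint anonymously, `BUCKET_URL` assumes the bucket's usual virtual-hosted address, and the function names are illustrative. It makes no assumption about the key layout inside the bucket.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public bucket answers at the standard virtual-hosted S3 endpoint.
BUCKET_URL = "https://pmc-oa-opendata.s3.amazonaws.com/"
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL that groups keys by '/'."""
    query = urllib.parse.urlencode({
        "list-type": "2",
        "delimiter": "/",
        "prefix": prefix,
        "max-keys": str(max_keys),
    })
    return BUCKET_URL + "?" + query


def parse_listing(xml_data):
    """Return (prefixes, keys) from a ListObjectsV2 XML response (str or bytes)."""
    root = ET.fromstring(xml_data)
    prefixes = [e.text for e in root.iterfind("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.iterfind("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_bucket(prefix="", max_keys=20):
    """Fetch one page of the listing over HTTPS; requires network access."""
    with urllib.request.urlopen(build_list_url(prefix, max_keys), timeout=30) as resp:
        return parse_listing(resp.read())
```

    Calling `list_bucket()` returns one page of top-level prefixes and keys, and passing a returned prefix back in as `prefix` descends one level. Only the first page is fetched; a full traversal would need to follow the response's continuation token.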

    Resources

    Support

    How to cite

    NIH NLM NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-pmc.