AWS Public Sector Blog
IRS 990 Filing Data Now Available as an AWS Public Data Set
We are excited to announce that over one million electronic IRS 990 filings are available via Amazon Simple Storage Service (Amazon S3). Filings from 2011 to the present are currently available, and the IRS will add new 990 filing data each month.
Form 990 is the form used by the United States Internal Revenue Service (IRS) to gather financial information about nonprofit organizations. By making electronic 990 filing data available, the IRS has made it possible for anyone to programmatically access and analyze information about individual nonprofits or the entire nonprofit sector in the United States. Users can also analyze the data in the cloud without having to download or store it themselves, which lowers the cost of product development and accelerates analysis.
Each electronic 990 filing is available as a unique XML file in the “irs-form-990” S3 bucket in the AWS US East (N. Virginia) region. Information on how the data is organized and what it contains is available on the IRS 990 Filings on AWS Public Data Set landing page.
Users of the data can easily access individual XML files or write scripts to organize 990 data into a database using Amazon Relational Database Service (Amazon RDS) or into a data warehouse using Amazon Redshift. Amazon Elastic MapReduce (Amazon EMR) can also be used to quickly process the entire set of filings for analysis.
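As a rough illustration of accessing an individual XML file, here is a minimal Python sketch that uses only the standard library. It rests on several assumptions that the landing page should confirm: that objects in the "irs-form-990" bucket can be read anonymously over HTTPS, that the virtual-hosted-style URL shown here applies, and that you already know an object key. The key in the usage note and the helper names (`object_url`, `fetch_filing`, `element_counts`) are hypothetical and chosen for illustration only.

```python
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "irs-form-990"  # bucket name given in this post


def object_url(key: str) -> str:
    """Build a virtual-hosted-style S3 URL for a key in the bucket.

    Assumption: the bucket permits anonymous HTTPS reads at this address.
    """
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"


def fetch_filing(key: str) -> bytes:
    """Download one filing's raw XML bytes (requires network access)."""
    with urllib.request.urlopen(object_url(key)) as resp:
        return resp.read()


def strip_namespace(tag: str) -> str:
    """Turn '{namespace}Name' into 'Name'; leave plain tags unchanged."""
    return tag.rsplit("}", 1)[-1]


def element_counts(xml_bytes: bytes) -> dict:
    """Count how often each element name appears in a filing.

    This makes no assumption about the 990 schema; it is only a quick
    way to see which elements a given file contains.
    """
    root = ET.fromstring(xml_bytes)
    counts = {}
    for element in root.iter():
        name = strip_namespace(element.tag)
        counts[name] = counts.get(name, 0) + 1
    return counts
```

With a real key taken from the landing page's documentation, usage would look like `element_counts(fetch_filing("some_filing_key.xml"))`, where the key shown is a placeholder. From there, a script could pull out the elements of interest and write them to Amazon RDS or Amazon Redshift.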
Collaborating with the IRS allows us to improve access to this valuable data. Making machine-readable data available in bulk on Amazon S3 is an efficient way to empower a variety of users to analyze the data using whatever tools or services they prefer. We look forward to seeing what new services people are able to create to analyze the 990 filing data available on Amazon S3.