Why Share Data?
This blog post is part of a series on “Open Data for Public Good,” a collaboration between the AWS Institute and AWS Open Data aimed at identifying emerging issues around open data and offering best practices for data practitioners.
As open data policies become commonplace, it is worth examining the history and value of open data and discussing why we share it in the cloud.
The idea of sharing data dates back at least to the 1950s, when the International Council of Scientific Unions established World Data Centers to facilitate sharing of data among scientists. In recent years, governments have created open data policies that require government agencies to share data with the public.
In May 2013, President Barack Obama issued an executive order stating that government data should be open and machine readable by default, which has accelerated the adoption of open data policies at U.S. federal government agencies. The Sunlight Foundation has tracked the creation of 119 state and local open data policies in the United States. The move towards open data is global. More than 72 countries have signed the Open Government Partnership, which was created in 2011 to make governments more open – and there are no signs of this trend stopping.
The premise behind open data is that the best way to get value out of data is to make it available to as many people as possible. Bill Joy, co-founder of Sun Microsystems, famously said, “No matter who you are, most of the smartest people work for someone else.” If Joy’s Law, as it came to be known, is true, then it leads to the question: How can organizations benefit from all the smart people who do not work for them? Joy’s Law explains how open source software projects benefit from contributions made by smart people, regardless of where they work. In the context of open data, Joy’s Law tells us that there are many people who may be able to derive value from your data, if only they could get it. Open data policies are important because they seek to maximize the number of people who can derive value and find insights from data.
Open Data in Practice
In London, access to data is facilitating transportation for city commuters. When Transport for London opened up access to data, application developers and researchers used it to create more than 600 applications providing services to 42 percent of Londoners – from getting real-time traffic information to finding charging stations for zero emission capable (ZEC) vehicles in the city. Transport for London estimates that opening up data has resulted in saving up to £130 million per year.
In the United States, the US Census Bureau’s (USCB) move towards open data has empowered small business owners to obtain demographic and economic data about communities in which they hope to open or expand their business. Yet another open data resource, OnTheMap for Emergency Management, allows people to assess the impact of natural disasters on the local workforce.
The US Geological Survey’s (USGS) Landsat program used to charge for access to imagery produced by Landsat’s Earth observing satellites. In 2008, USGS determined that charging for access to Landsat data was hindering research because scientists couldn’t afford to acquire it. Since Landsat data became free and open, its usage has increased dramatically, and insights from Landsat generated an estimated $1.8 billion in 2011. In northeastern Iowa alone, it is estimated that Landsat data helped improve agricultural outcomes, producing an estimated $858 million in economic value per year, by enhancing analysis of the quality of crops and allowing for proper allocation of irrigation resources.
Data for a Purpose: Working Backwards
Opening data can have a positive impact on the economy by spurring entrepreneurs and accelerating innovation, but it’s important to remember that not all open data will drive economic activity.
When developing new services for our customers, we use what we call the “working backwards” method to make sure our projects meet real customer needs. The working backwards approach can be taken when considering what data to open up and how to go about it. Before opening up a dataset, it’s useful to try to determine who might want to access the data, what questions they would like to answer with the data, and what skills and tools data users are likely to have. Attempting to answer these questions can help inform the approach taken to share the data and to set appropriate expectations for how the data will be used.
Advantages of Sharing Data in the Cloud
If an organization needs to share large volumes of data, the traditional approach of making data available for download will likely have limited use. For example, using the traditional approach, sharing a 100TB dataset would require a data user to have 100TB of their own storage capacity to copy the data as well as the time to download 100TB of data. With a 50 Mbps Internet connection, downloading 100TB of data would take around 203 days. These costs of acquiring data, both in time and money, are a barrier to research and innovation.
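The download estimate above can be checked with a little arithmetic. The sketch below is a rough calculation, not a measurement: it assumes the connection runs at its full 50 Mbps the whole time with no protocol overhead, and the function name `download_days` is our own. The 203-day figure appears to count a terabyte as 2^40 bytes; counting it as 10^12 bytes gives roughly 185 days instead.

```python
SECONDS_PER_DAY = 86_400


def download_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to transfer size_bytes over a link running at full speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


# Assumption: "TB" may mean 2**40 bytes (binary) or 10**12 bytes (decimal).
TB_BINARY = 2 ** 40
TB_DECIMAL = 10 ** 12
MBPS = 10 ** 6  # one megabit per second, in bits per second

days_binary = download_days(100 * TB_BINARY, 50 * MBPS)
days_decimal = download_days(100 * TB_DECIMAL, 50 * MBPS)

print(f"100 TB (binary) at 50 Mbps:  {days_binary:.1f} days")
print(f"100 TB (decimal) at 50 Mbps: {days_decimal:.1f} days")
```

Either way, the transfer takes about half a year before any analysis can begin, which is the barrier the paragraph describes.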
In the cloud, people can bring computing resources to the data instead of downloading data to their computing resources. When data is staged for analysis in a commercial cloud environment, anyone can analyze it without needing to download it or store it themselves. Data users can turn on as many servers as they need to interrogate or analyze the data in place. This means that the data can be accessed by anyone in the world regardless of how much storage space or computing capacity they have access to. When data is available in the cloud, data users can analyze it using a wide variety of computing resources including GPUs, high-memory instances, networking-optimized instances, and other analytical tools like Amazon Athena.
The amount of data generated by public and private institutions is enormous, but its utility is directly linked to its openness and ease of accessibility – both features that can be facilitated by the cloud. As these institutions open their data up, they empower economic and social entrepreneurs to turn it into useful information and build new services and products for the general public, driving innovation. Thanks to open data, people have accurate information about the weather, transportation services, and demographics in markets across the world, among other things. Thanks to the cloud, this information is reliable, secure, and available at their fingertips.
Find Out More
AWS continually launches new services and features that allow data to be used in new ways, which means that data becomes even more useful in the cloud over time. AWS also reduces prices as it achieves economies of scale, which makes data analysis less expensive and more accessible. You can learn how organizations are putting data to work by opening it on AWS at https://opendata.aws.
A post by Jed Sundwall, Manager, Open Data Program, AWS, and Maysam Ali, Content Manager, AWS Institute, AWS