AWS Government, Education, & Nonprofits Blog

Achieving Sustainable Development in Africa with Open Data

Achieving sustainable development and addressing local and national needs requires access to, and analysis of, large datasets, as well as the use of complex tools and algorithms. This creates barriers for many users, particularly communities in developing countries, where resources for data storage and analysis are limited.

The African Regional Data Cube (ARDC), a new data management technology developed on AWS, aims to address this challenge by building the capacity of those communities to access large datasets in support of their local and national needs, framed around the Sustainable Development Goals (SDGs) and broader development priorities. It seeks to create a standardized, free system that users anywhere can use to access, process, and analyze key sustainability data. To do this, it works closely with five African countries (Ghana, Kenya, Senegal, Sierra Leone, and Tanzania) to understand their data-related challenges and respond to them with improved access to, and use of, geospatial data.

The ARDC launched in May 2018 and is based on the Open Data Cube (ODC) infrastructure, which has been successfully demonstrated in Australia, Colombia, and Switzerland and is under development in many other countries.

The Technology Behind the Cube

The African Regional Data Cube includes leadership and contributions from the Global Partnership for Sustainable Development Data (GPSDD), the Committee on Earth Observation Satellites (CEOS), Amazon Web Services (AWS), the Government of Kenya, Strathmore University, and the Group on Earth Observations (GEO).

The ARDC team was awarded a two-year Amazon Web Services grant through Strathmore University and is using AWS Cloud Credits for Research to prototype and implement the data cube on AWS. The ARDC uses a combination of Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2) to host satellite data cubes, Open Data Cube software, and several domain-specific application algorithms. The project plans to use other AWS services, including Amazon EC2 Spot Instances for on-demand processing, AWS Lambda functions, Elastic Load Balancing, and Amazon WorkSpaces.
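To make the "data cube" idea concrete: a cube stacks many satellite acquisitions, aligned to a common pixel grid, along a time axis, so that a single query can pull a spatial window across a range of dates. The following is a minimal in-memory sketch, assuming one measurement and grids that are already aligned. The class `MiniDataCube` and its `add` and `query` methods are hypothetical illustrations, not the Open Data Cube API, and the sketch ignores storage in Amazon S3 entirely.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

# One 2-D grid of pixel values (rows x columns) for a single measurement.
Grid = List[List[float]]


@dataclass
class MiniDataCube:
    """Toy data cube: one aligned 2-D grid per acquisition date.

    Hypothetical illustration only; this is not the Open Data Cube API.
    """

    layers: Dict[date, Grid] = field(default_factory=dict)

    def add(self, acquired: date, grid: Grid) -> None:
        # Assumes every grid is already aligned to the same pixel grid.
        self.layers[acquired] = grid

    def query(self, start: date, end: date, rows: range, cols: range) -> Dict[date, Grid]:
        """Return the [rows x cols] window for every date in [start, end], oldest first."""
        return {
            acquired: [[grid[r][c] for c in cols] for r in rows]
            for acquired, grid in sorted(self.layers.items())
            if start <= acquired <= end
        }


cube = MiniDataCube()
cube.add(date(2000, 1, 1), [[1.0, 2.0], [3.0, 4.0]])
cube.add(date(2017, 1, 1), [[5.0, 6.0], [7.0, 8.0]])
print(cube.query(date(2000, 1, 1), date(2010, 12, 31), range(0, 1), range(1, 2)))
# {datetime.date(2000, 1, 1): [[2.0]]}
```

Keying layers by date keeps time-range queries simple; a production system would index far larger, tiled arrays rather than nested Python lists.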

The First ARDC Implementation

The first implementation of the ARDC focused on enabling access to satellite observations, specifically for Landsat and Sentinel data. As a result, users in each of the five African countries involved in the project can now easily access satellite data and study their land and water resources. One of the key advantages of the ARDC is the ability to use the complete time series of satellite data to detect changes over time, and gain a better understanding of the impacts of these changes to support decisions that could preserve and sustain natural resources for the future.
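One simple way to use a full time series, rather than two single images, is to compare each pixel's average over a baseline period with its average over a recent period. The following is a minimal sketch, assuming per-pixel index values (for example, a vegetation index) have already been computed for every time step. The function name, the window lengths, and the threshold are hypothetical choices for illustration, not the ARDC's own change-detection method.

```python
from statistics import mean
from typing import List, Sequence

# series[t][r][c]: index value of pixel (r, c) at time step t, oldest first.
Series = Sequence[Sequence[Sequence[float]]]


def flag_changes(series: Series, baseline: int, recent: int, threshold: float) -> List[List[bool]]:
    """Flag pixels whose recent mean differs from their baseline mean by more than threshold."""
    if baseline < 1 or recent < 1 or baseline + recent > len(series):
        raise ValueError("baseline and recent windows must be non-empty and fit in the series")
    rows, cols = len(series[0]), len(series[0][0])
    flags: List[List[bool]] = []
    for r in range(rows):
        row_flags = []
        for c in range(cols):
            pixel = [layer[r][c] for layer in series]  # this pixel's full time series
            before = mean(pixel[:baseline])             # first `baseline` time steps
            after = mean(pixel[-recent:])               # last `recent` time steps
            row_flags.append(abs(after - before) > threshold)
        flags.append(row_flags)
    return flags


# Hypothetical values: four time steps over a 1 x 2 grid.
# The first pixel drops sharply; the second stays stable.
series = [[[0.8, 0.5]], [[0.8, 0.5]], [[0.2, 0.5]], [[0.2, 0.5]]]
print(flag_changes(series, baseline=2, recent=2, threshold=0.3))  # [[True, False]]
```

Averaging several dates on each side of the comparison makes the test less sensitive to a single cloudy or noisy acquisition than differencing two individual images.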

Since the ARDC launched, several detailed analyses of land-based changes have yielded new insight into countries' sustainability problems and prompted interest in further research. One example is the analysis of urbanization in the Dadaab refugee camps in Kenya.

The ARDC team is working with the United Nations and the Government of Kenya to use the ARDC to investigate the refugee camps in Dadaab, Kenya, and assess their level of “urbanization,” or expansion. Dadaab, a site of the United Nations High Commissioner for Refugees (UNHCR), is the second-largest refugee camp in the world. Identifying refugee camps in satellite imagery is non-trivial, because it is difficult for classification algorithms to distinguish camp areas from ordinary bare soil. The ARDC team is using Landsat data on AWS and cloud computing to quickly and efficiently implement and test several algorithms for accurately mapping the extent of the camps. More accurate maps of camp extent are expected to support better planning and execution. Other current uses of the ARDC include studies of urbanization and deforestation in the Chenene Forest Reserve, Tanzania, and the detection of illegal mining in Ghana.
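Once an algorithm has produced a per-pixel camp/not-camp mask for each year, measuring expansion comes down to counting pixels and converting the count to area. The following is a minimal sketch, assuming binary masks on Landsat's 30 m multispectral pixel grid. The classification step itself, which the team is still testing, is not shown, and the function names and example masks are hypothetical.

```python
from typing import Sequence

# Landsat multispectral bands are delivered on a 30 m pixel grid.
LANDSAT_PIXEL_SIZE_M = 30.0


def extent_km2(mask: Sequence[Sequence[bool]], pixel_size_m: float = LANDSAT_PIXEL_SIZE_M) -> float:
    """Area of all pixels flagged True (e.g. classified as camp), in square kilometres."""
    flagged = sum(1 for row in mask for is_camp in row if is_camp)
    return flagged * pixel_size_m ** 2 / 1_000_000  # m^2 -> km^2


def growth_percent(before_km2: float, after_km2: float) -> float:
    """Relative change in extent between two dates, in percent."""
    if before_km2 <= 0:
        raise ValueError("baseline extent must be positive")
    return 100.0 * (after_km2 - before_km2) / before_km2


# Hypothetical masks: 1,000 flagged pixels in one year, 2,000 in a later year.
earlier = [[True] * 100 for _ in range(10)]
later = [[True] * 100 for _ in range(20)]
a, b = extent_km2(earlier), extent_km2(later)
print(round(a, 3), round(b, 3), round(growth_percent(a, b), 1))  # 0.9 1.8 100.0
```

Each 30 m Landsat pixel covers 900 m², so 1,000 flagged pixels correspond to 0.9 km².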

The images above show Fractional Cover (FC) in 2000 (left) and 2017 (right). FC is an iterative algorithm that estimates, for every pixel, the fractions of bare soil (red), photosynthetic vegetation (green), and non-photosynthetic vegetation (blue). FC can therefore be used to identify areas that have changed from green vegetation (green) to bare soil (red) or non-photosynthetic vegetation (blue). Illegal mining in Ghana begins with deforestation and ends with small bodies of surface water caused by exploitation of underground aquifers. Mining areas are therefore easy to locate: the dense forest (green) is replaced with bare soil or contaminated water (red) or with non-photosynthetic vegetation (blue). This change is evident between the 2000 and 2017 images.
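The green-to-red-or-blue test described above can be expressed per pixel. The following is a minimal sketch, assuming FC fractions for each year are already computed and ordered (bare soil, photosynthetic vegetation, non-photosynthetic vegetation). Labeling each pixel by its largest fraction is a simplifying assumption for illustration, not the FC product's own rule, and the function names and pixel values are hypothetical.

```python
from typing import List, Sequence, Tuple

# Per-pixel FC fractions, ordered (bare soil, photosynthetic veg, non-photosynthetic veg).
Fractions = Tuple[float, float, float]
CLASSES = ("bare_soil", "photosynthetic_veg", "non_photosynthetic_veg")


def dominant(fc: Fractions) -> str:
    """Label a pixel by its largest cover fraction (a simplifying assumption)."""
    return CLASSES[max(range(len(CLASSES)), key=lambda i: fc[i])]


def lost_green_cover(before: Sequence[Sequence[Fractions]],
                     after: Sequence[Sequence[Fractions]]) -> List[List[bool]]:
    """True where green vegetation dominated before but bare soil or NPV dominates after."""
    return [
        [dominant(b) == "photosynthetic_veg" and dominant(a) != "photosynthetic_veg"
         for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


# Hypothetical fractions for two forest pixels; only the first is cleared later.
year_2000 = [[(0.1, 0.8, 0.1), (0.1, 0.8, 0.1)]]
year_2017 = [[(0.7, 0.1, 0.2), (0.1, 0.7, 0.2)]]
print(lost_green_cover(year_2000, year_2017))  # [[True, False]]
```

A real analysis would more likely threshold the change in each fraction than compare dominant labels, but the dominant-label version keeps the vegetated-to-non-vegetated logic easy to follow.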

The ARDC plans to expand its datasets to include more recent satellite acquisitions (into 2018), add new Sentinel-1 (radar) and Sentinel-2 (optical) data by early 2019, and explore access to weather forecast data. The ARDC also plans to hold a second training session in Africa in late 2018 or early 2019 to build further capacity in local communities and to better understand the need for new algorithms and decision-making products that address key sustainability issues.