AWS Public Sector Blog
Urban Climates: Calculating the Sky View Factor for the Netherlands
Imagine you are on a city street surrounded by skyscrapers. When you look up at the stars, you only see the part of the sky that is not blocked by buildings. The fraction of sky that remains visible is known as the sky view factor (SVF).
The SVF denotes the ratio between the radiation received at a point on Earth and the radiation available to a hemisphere over that point. Dr. Andrea Pagani, a data scientist at the DataLab of the Royal Netherlands Meteorological Institute (KNMI), has been looking at how to compute the SVF efficiently for the Netherlands.
Read on to learn more about Dr. Pagani’s work to help us understand what’s between us and the sky – and why that matters.
Q1. What is the Sky View Factor and why is it important?
The Sky View Factor is the ratio, at a certain point in space, between the visible sky and the entire hemisphere centered at that point. To put it simply, think about lying on the ground and looking up around you: if you see only sky, with nothing blocking your view, you have a full SVF (SVF is 1); if obstacles around you prevent a full view of the sky, you have a lower SVF.
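To make the definition concrete, the SVF at a point is often approximated by sampling the horizon elevation angle in a number of azimuth directions and averaging how much of the hemisphere remains open. The base R sketch below is only a toy illustration of that idea, not the routine used at KNMI, and the angle values are made up.

# Toy illustration of the SVF definition (not the KNMI routine):
# average, over n azimuth directions, the open fraction of the hemisphere,
# given the horizon elevation angle (in degrees) in each direction.
sky_view_factor <- function(horizon_angles_deg) {
  gamma <- horizon_angles_deg * pi / 180   # horizon elevation angles in radians
  mean(1 - sin(gamma))                     # 1 = fully open sky, 0 = fully blocked
}

sky_view_factor(rep(0, 36))    # open field: SVF = 1
sky_view_factor(rep(45, 36))   # ringed by tall buildings: SVF is roughly 0.29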
The SVF is important for many applications because it can be used as a proxy for radiation, which influences air temperature and other related weather phenomena. SVF is a key ingredient for studying urban climate and urban heat island effects, both of which matter for citizens' well-being.
Q2. What are the technical challenges in calculating the SFV?
We decided to perform the computation with an openly available R package that is fit for the purpose. The main challenge comes from handling the digital elevation model data: the data we needed for our analysis is roughly 1.5 TB even in a highly compressed format. When working with data of this volume, we are limited by how much we can fit into available memory and by the time required for the computation.
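To make the memory constraint concrete, here is a minimal sketch of processing a large DEM tile by tile with the terra R package. The file name, the 10 km tile size, and the compute_svf() helper are hypothetical placeholders, not the actual KNMI workflow.

library(terra)

dem    <- rast("ahn_dem.tif")     # hypothetical file; opened lazily, pixels read on demand
tile_m <- 10000                   # tile edge length in map units (metres)
e      <- ext(dem)

# Placeholder standing in for the actual SVF routine applied to one tile.
compute_svf <- function(tile) tile

for (x0 in seq(xmin(e), xmax(e) - 1, by = tile_m)) {
  for (y0 in seq(ymin(e), ymax(e) - 1, by = tile_m)) {
    window   <- ext(x0, min(x0 + tile_m, xmax(e)), y0, min(y0 + tile_m, ymax(e)))
    tile     <- crop(dem, window)                 # read only this window into memory
    svf_tile <- compute_svf(tile)
    writeRaster(svf_tile, sprintf("svf_%.0f_%.0f.tif", x0, y0), overwrite = TRUE)
  }
}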
Q3. What tools did you use to meet these challenges?
R was the main environment for performing the SVF computation. In addition, we built a few proofs of concept to understand how to ease data access and preparation. One proof of concept consisted of using a data cube to ingest the data (converted to a raster format) and then using the data cube to facilitate data access, rather than working with the data files directly. A data cube is a time series of multi-dimensional (space, time, data type) spatially aligned raster images that are ready for analysis. Traditionally, working with such images places a significant burden on the researcher, because each dimension requires a different approach to extracting and subsetting data. A data cube removes this burden by providing a software interface that enables the same extracting and subsetting actions, regardless of dimension.
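As one possible illustration of that interface, the sketch below uses the stars R package to open a raster as a lazy data cube and pull back only a small window; the actual proof of concept may have used different tooling, and the file name and window indices are made up.

library(stars)

# proxy = TRUE builds a lazy data cube: no pixel values are read at this point
cube <- read_stars("ahn_dem.tif", proxy = TRUE)   # hypothetical file name

# The same subsetting syntax addresses every dimension (attribute, x, y, ...);
# the selection is recorded and evaluated only when the result is materialised.
window <- cube[, 10001:12000, 20001:22000]

window_values <- st_as_stars(window)   # fetch just this 2000 x 2000 block for analysis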
Q4. Can you describe the solution reached and what AWS technology was used?
AWS was key to achieving the computation in a reasonable time and to storing the input and output data. The idea was to exploit the distributed computation capabilities that R libraries provide essentially out of the box. The setup used a number of multicore machines available in Amazon Elastic Compute Cloud (Amazon EC2): two m4.4xlarge and three r4.4xlarge instances, for a total of 80 cores working simultaneously. For access to the data and storage of the results from all of the machines, Amazon Elastic File System (Amazon EFS) was useful. With EFS, we didn't have to worry about capacity or about whether machines could read and write simultaneously, because it is a shared file system. Finally, we used Amazon Simple Storage Service (Amazon S3) to share the results with the community.
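As a rough sketch of what such a setup can look like on one of those instances, the snippet below uses base R's parallel package to spread tiles over the local cores, with every worker reading from and writing to a shared EFS mount; each machine would then work through its own share of the tiles. The paths, tile layout, and process_tile() helper are hypothetical, not the actual KNMI code.

library(parallel)

efs_root <- "/mnt/efs/svf"                       # hypothetical EFS mount shared by all instances
tiles    <- list.files(file.path(efs_root, "dem_tiles"), full.names = TRUE)

# Placeholder for the per-tile SVF computation done by the real workflow.
process_tile <- function(tile_file) {
  sprintf("SVF computed for %s", tile_file)
}

cl <- makeCluster(detectCores())                 # e.g. 16 workers on a 16-core instance
clusterExport(cl, c("efs_root", "process_tile"))

out_files <- parLapply(cl, tiles, function(tile_file) {
  result   <- process_tile(tile_file)
  out_file <- file.path(efs_root, "svf_out", basename(tile_file))
  saveRDS(result, out_file)                      # results land on the shared storage
  out_file
})

stopCluster(cl)
# The finished files on EFS can then be copied to Amazon S3 (for example with the AWS CLI).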
Q5. What was the outcome and what were the lessons learned?
We managed to compute the SVF for all of the Netherlands at a 1-meter resolution. Unfortunately, we found a bug in the R library that produces inconsistent results under certain digital elevation model conditions. When this is fixed, we will re-compute the SVF for the affected regions. As it turns out, it's good that we found this issue, because now the whole community can benefit from an improved software library. And because we are using AWS, it is straightforward to switch our workflow stack back on for the new computations. I think we learned a lot about performing these computations.
Q6. What were the highlights for you?
The first one is geeky: watching five 16-core machines working at 100% of their CPU capacity is just cool! The second was overcoming the relatively mild learning curve needed to perform computations on AWS. There is excellent documentation on how to use AWS on the web; the documentation on mounting Amazon EFS was particularly useful for us. The third is more of a wish. Having tested data cube-like solutions, I hope that the geospatial data community transitions to the data cube world and moves away from the current file-based solutions, as this would greatly facilitate data access and processing. For data science problems, you don't always need to download a multi-GB geospatial file to extract just a few interesting points for your analysis. Distributed data access services that provide access to just the relevant data – for example, web-service technologies that allow subsetting of data in space and time – are promising and needed.
Q7. What is next for SVF?
At a future date, when the bug in the R library we used is fixed, we intend to compute a fresh SVF dataset and make it available through the Registry of Open Data on AWS.
The digital elevation model for the Netherlands gets updated every few years, so we want to compute the SVF again with the new dataset as soon as it is ready. Our meteorologists would also like detailed SVF information (at even higher resolution) for specific key locations; our goal is to understand the effects of shadowing on weather sensors located along motorways.