How CAPMO uses AWS to Revolutionize the Construction Industry

Guest post by Robert Schweizer, Senior Software Engineer, CAPMO

The analog disaster of (not so) modern construction sites

The construction industry in Germany is second to last when it comes to digitalization. Only the fishing and hunting industry relies more heavily on analog processes and paper-based communication. This backwardness is certainly one of the causes of construction disasters like the never-ending tale of the “new” Berlin airport. Started in 2006 and originally intended to open in 2011, the project remains unfinished to this day. It has exceeded both its due date and its budget by more than twofold. While that is a drastic example, it is also a representative one. On average, construction projects tend to cost 60% more and take 80% longer than anticipated. With more than 400,000 construction sites in Germany per year, the resulting financial damage is easy to imagine.

Interestingly enough, construction is the industry that generates the most data after the financial sector. But unlike the latter, it does not effectively collect, analyze, and use this data. Considering how other industries have benefitted from digitalization, it could revolutionize one of the oldest industries in the world. Construction accounts for more than 11% of global GDP and rising, so disrupting it could truly have tremendous effects. Seamless integrations with modern technologies and advancements in other fields, as well as newly gained insights from analyzing data and making it actionable, are examples of what we at CAPMO aim to achieve with our digital solutions for construction.

Creating a uniform experience across many devices

Major construction sites often involve hundreds of people. With such a high number of individuals coming from very different professional backgrounds and working on entirely distinct aspects of the construction project comes not only a wide range of technical affinity, but also an abundance of devices.

To support a uniform experience on a large variety of those devices with a code base that is as isomorphic as possible, our web and hybrid mobile apps are built on top of React and React Native respectively. To cope with the enormous differences in performance of many handhelds, we try to offload as much computation as possible to our distributed server-side infrastructure. Some aspects, however, such as displaying and navigating huge construction plans, cannot be moved off the device and can only be optimized.

Rendering image pyramids to optimize construction plans

Many mobile devices – especially, but interestingly not at all limited to, older ones – cannot cope with displaying, zooming in and panning around on even moderately sized, vector-based construction plan PDFs. At the same time, it is crucial for our users to identify even the smallest details and accurately place markers on those plans. The larger the construction site, the higher the scale of its plans and thus the more detail is needed.

We needed a way to reduce the amount of data the device would need to hold in memory while still being able to provide the necessary level of precision and detail. Naturally, we started looking into well-established concepts like map tiling, Deep Zoom, and image pyramids.

Sounds like something AWS Lambda could help with

We wanted to ensure a seamless experience for our users, so speed and parallelism seemed crucial to hide the heavy lifting we need to do in the background to convert, prepare, and slice uploaded plans. Quickly turning our heads to AWS Lambda, we soon came up with our initial workflow:

A user would upload a plan using our interface as before. Those plans were already stored on Amazon S3, which turned out to be a good starting point. Using S3 Events, we wanted to invoke our Lambda workflow whenever a new vector-based plan PDF was stored in the respective bucket.
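As a minimal sketch of that trigger, assuming the standard S3 event notification payload: the types below cover only the fields used here, and the function names (`extractPlanUploads`, `handler`) and the `.pdf` suffix check are illustrative assumptions, not CAPMO's actual code.

```typescript
// Minimal local types covering only the S3 event fields used below.
interface S3EventRecord {
  s3: { bucket: { name: string }; object: { key: string } };
}
interface S3Event {
  Records: S3EventRecord[];
}

interface PlanUpload {
  bucket: string;
  key: string;
}

// S3 URL-encodes object keys in event payloads and encodes spaces as "+".
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Collects the uploaded PDFs from an event; all other objects are ignored.
function extractPlanUploads(event: S3Event): PlanUpload[] {
  return event.Records
    .map((record) => ({
      bucket: record.s3.bucket.name,
      key: decodeS3Key(record.s3.object.key),
    }))
    .filter((upload) => upload.key.toLowerCase().endsWith(".pdf"));
}

// Hypothetical Lambda entry point: a real handler would start the
// rendering workflow here instead of only logging.
async function handler(event: S3Event): Promise<PlanUpload[]> {
  const uploads = extractPlanUploads(event);
  for (const upload of uploads) {
    console.log(`Plan uploaded: s3://${upload.bucket}/${upload.key}`);
  }
  return uploads;
}
```

Filtering inside the function is a defensive choice. The same restriction can also be configured as a suffix filter on the bucket notification itself.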

To provide enough detail, we wanted to render six zoom levels with a base tile size of 256 pixels. The first Lambda function would rasterize the uploaded vector-based PDF into six accordingly sized source images, one for each zoom level, from which all required tiles would be extracted. Completely underestimating the scale, we would then fan out, invoking one Lambda function per tile. Each of those would load the source image for its zoom level and extract a single 256 by 256 pixel excerpt. After all, parallelism would make the process super fast, right?
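To make the fan-out concrete, here is a rough sketch of the per-tile job list. It assumes a square plan whose source image doubles in edge length with every zoom level, starting from a single 256 pixel tile at level 1. The `TileJob` shape and function names are hypothetical.

```typescript
const TILE_SIZE = 256; // base tile size in pixels, as stated above

interface TileJob {
  zoomLevel: number;
  column: number;
  row: number;
  left: number; // pixel offset within the zoom level's source image
  top: number;
}

// Assumption: level 1 is a single tile and each level doubles the edge length.
function tilesPerAxis(zoomLevel: number): number {
  return 2 ** (zoomLevel - 1);
}

// One job per tile, which is what the initial workflow fanned out on.
function listTileJobs(zoomLevels: number): TileJob[] {
  const jobs: TileJob[] = [];
  for (let zoomLevel = 1; zoomLevel <= zoomLevels; zoomLevel++) {
    const perAxis = tilesPerAxis(zoomLevel);
    for (let row = 0; row < perAxis; row++) {
      for (let column = 0; column < perAxis; column++) {
        jobs.push({
          zoomLevel,
          column,
          row,
          left: column * TILE_SIZE,
          top: row * TILE_SIZE,
        });
      }
    }
  }
  return jobs;
}

const tileJobs = listTileJobs(6);
console.log(tileJobs.length); // 1365
```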

There were many more problems to come with this approach, but just to quickly illustrate the dimensions before moving on, the following formula calculates the overall number of tiles needed:

// Each zoom level doubles the tiles per axis, so level n has (2^(n-1))^2 tiles.
function getNumberOfTiles(zoomLevel: number): number {
    return (2 ** (zoomLevel - 1)) ** 2
}

[1,2,3,4,5,6].reduce((sum, zoomLevel) => sum + getNumberOfTiles(zoomLevel), 0)
// => 1365

With 1,365 tiles, a single plan would invoke 1,365 Lambda functions in parallel in the approach described above. That is an excessive effort for a relatively small task. Whatever your concurrency limit is, you will reach it quickly as soon as a handful of plans are uploaded and processed in parallel.

To make matters worse, we need to sync those tiles to our users’ mobile devices for offline access. To shrink the amount of data being sent over the wire, we would create a ZIP archive of all tiles of a plan. However, we needed to wait until all those Lambda functions had rendered their individual tiles. Without any orchestration mechanism in place, that meant the initial Lambda function that rendered all the source images and invoked all of the 1,365 tile extraction functions had to wait until the very last one finished before initiating yet another function to package them up. This could not be right.

Orchestration using AWS Step Functions

We reached for AWS Step Functions to orchestrate the entire process and make the amount of concurrency predictable and manageable. This turned out to be much easier than expected, and very powerful. Using a rather new addition to Step Functions, Dynamic Parallelism, we were able to achieve two things: external orchestration of the entire workflow, which lets each of our Lambda functions do one thing and thus reduces its scope, and a limit on the number of Lambda functions running simultaneously to process a single plan.
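Dynamic Parallelism is exposed in the Amazon States Language as the `Map` state, whose `MaxConcurrency` field caps the number of parallel iterations. The following is only a sketch of how such a definition could look, written as a TypeScript object that serializes to JSON. The state names, the `$.tiles` path, the limit of 20, and the placeholder ARNs are all assumptions for illustration.

```typescript
// Hypothetical state machine: rasterize once, extract tiles with capped
// concurrency, then package the results. ARNs are placeholders.
const tileWorkflow = {
  StartAt: "RasterizePlan",
  States: {
    RasterizePlan: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:rasterize-plan",
      Next: "ExtractTiles",
    },
    ExtractTiles: {
      Type: "Map",
      ItemsPath: "$.tiles", // array of tile descriptions in the state input
      MaxConcurrency: 20, // upper bound on tile extractions running at once
      Iterator: {
        StartAt: "ExtractTile",
        States: {
          ExtractTile: {
            Type: "Task",
            Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:extract-tile",
            End: true,
          },
        },
      },
      ResultPath: "$.renderedTiles",
      Next: "PackageTiles",
    },
    PackageTiles: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:package-tiles",
      End: true,
    },
  },
};

// Serialize to the JSON document that Step Functions expects.
const tileWorkflowJson = JSON.stringify(tileWorkflow, null, 2);
console.log(tileWorkflowJson);
```

Because the `Map` state only completes once every iteration has finished, the packaging step no longer needs a Lambda function that waits on the others.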

Remember our initial assumption:

After all, parallelism would make the process super fast, right?

That actually turned out to be our drawback here. The Step Functions orchestration was great, but it turned our workflow into a slightly longer-running process because it intentionally limited the amount of parallel work. But having laid out the process in its separate, individual steps helped us realize how much work we kept repeating and thus identify potential optimizations. Because we parallelized on the tile level, we needed to fetch and load the respective zoom level source image for each individual tile. The higher the zoom level, the larger the source image, but also the more tiles to extract. So we were not only loading the source images over and over again, but loading the larger ones exponentially more often.
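A back-of-the-envelope calculation illustrates how much of that work was repeated. It assumes square source images whose edge length is 256 pixels times the number of tiles per axis, and it counts loaded pixels only, ignoring invocation and network overhead. The constant and function names are made up for this sketch.

```typescript
const BASE_TILE_SIZE = 256;
const ZOOM_LEVELS = [1, 2, 3, 4, 5, 6];

// Tiles on a zoom level: (2^(z-1))^2 = 4^(z-1).
const tilesOnLevel = (zoomLevel: number): number => 4 ** (zoomLevel - 1);

// Pixels in the source image of a zoom level (assumed square).
const sourcePixels = (zoomLevel: number): number =>
  (BASE_TILE_SIZE * 2 ** (zoomLevel - 1)) ** 2;

// Per-tile parallelism: every tile function loads its level's full source image.
const pixelsLoadedPerTile = ZOOM_LEVELS.reduce(
  (sum, z) => sum + tilesOnLevel(z) * sourcePixels(z),
  0,
);

// Loading each source image only once.
const pixelsLoadedOnce = ZOOM_LEVELS.reduce((sum, z) => sum + sourcePixels(z), 0);

const repeatedWorkFactor = pixelsLoadedPerTile / pixelsLoadedOnce;
console.log(Math.round(repeatedWorkFactor)); // 819
```

Under these assumptions, the per-tile approach reads roughly 800 times more image data than strictly necessary, and most of that comes from the highest zoom levels.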

What if we could instead speed up the process by reducing the concurrency and increasing the work done by each individual function? After all, Lambda is powerful not only because of its ability for mass concurrency. A single function is quite a powerhouse in itself. I mean, think about it: up to 3GB of RAM and proportional CPU power. My Nintendo 64 had approximately 0.15% of that computation power, and I found it quite impressive growing up. So, if that was good enough for the games at the time, the computational resources of a Lambda function should be good enough to rasterize a bunch of PDFs.

Layer for layer using layers

Lambda Layers. I mean, what’s not to love about them? With the ability to augment the default Lambda runtime, our possibilities just grew exponentially. Sure enough, another iteration and two Lambda Layers later, our initial workflow of over 1,365 executions had transitioned into a single – yep, one! – execution.

Offloading most of the work into two C and C++ libraries, we were able to rasterize the PDF, extract all the tiles, and even package them up in a single execution, while still cutting a third of the time and, above all, reducing the cost to 0.0075% of the original solution. However, there is still one drawback. As mentioned earlier, we are currently rendering six zoom levels but are already seeing plans where another one or two could massively increase detail quality and thus user experience. But because each zoom level has double the dimensions of the previous one, the tile count grows exponentially. Adding just one more zoom level adds a whopping 4,096 tiles. And because the offloaded work takes as long as it takes, and we need to wait for it to complete entirely before we can start uploading the individual tiles to S3 and serving them, this multiplies the rendering time by a factor of 4.5! Needless to say, I was hooked and curious to see how we could improve.
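The growth can be checked with the closed form of the geometric series behind the tile count: a pyramid with n zoom levels has (4^n - 1) / 3 tiles in total. This is a small sketch under the same assumption as before (level 1 is a single tile, each level quadruples the count). The function name is illustrative.

```typescript
// Total tiles of a pyramid with `zoomLevels` levels: 1 + 4 + ... + 4^(n-1).
function totalTiles(zoomLevels: number): number {
  return (4 ** zoomLevels - 1) / 3;
}

// Tiles contributed by the top level alone.
function tilesAddedByLevel(zoomLevel: number): number {
  return 4 ** (zoomLevel - 1);
}

for (const levels of [6, 7, 8]) {
  console.log(levels, totalTiles(levels), tilesAddedByLevel(levels));
}
// 6 1365 1024
// 7 5461 4096
// 8 21845 16384
```

Each additional level contributes about three times as many tiles as all previous levels combined, which is why a single-execution approach stops scaling so quickly.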

As described earlier, the image pyramids we are rendering consist of multiple layers representing individual zoom levels. So maybe a hybrid approach in which we invoke one Lambda function per zoom level is the silver bullet in this scenario. Each function rasterizes the original plan at the required dimensions and slices it up into the respective number of tiles for that zoom level, using the aforementioned C and C++ libraries.

This way, we are able to serve the lower, i.e. smaller, zoom levels faster and progressively add layers on top of them. Orchestrating the entire process using Step Functions, we can still package all the tiles for sync once the entire rendering is complete.
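The control flow of this hybrid approach can be sketched in plain TypeScript. In the workflow described here the orchestration is done by Step Functions, so the in-process `Promise.all` below merely stands in for it, and `renderLevel`, `onLevelReady`, and `packageTiles` are hypothetical callbacks rather than real APIs.

```typescript
interface RenderedLevel {
  zoomLevel: number;
  tileKeys: string[]; // storage keys of the tiles rendered for this level
}

type RenderLevel = (zoomLevel: number) => Promise<RenderedLevel>;
type PackageTiles = (levels: RenderedLevel[]) => Promise<string>;

// One unit of work per zoom level; levels become available as they finish,
// and packaging starts only after all of them are complete.
async function renderPyramid(
  zoomLevels: number[],
  renderLevel: RenderLevel,
  onLevelReady: (level: RenderedLevel) => void,
  packageTiles: PackageTiles,
): Promise<string> {
  const levels = await Promise.all(
    zoomLevels.map(async (zoomLevel) => {
      const level = await renderLevel(zoomLevel);
      onLevelReady(level); // smaller levels can be served before larger ones finish
      return level;
    }),
  );
  return packageTiles(levels);
}

// Usage with stub callbacks that only simulate rendering and packaging.
const stubRenderLevel: RenderLevel = async (zoomLevel) => ({
  zoomLevel,
  tileKeys: Array.from({ length: 4 ** (zoomLevel - 1) }, (_, i) => `${zoomLevel}/${i}.png`),
});
const stubPackageTiles: PackageTiles = async (levels) =>
  `${levels.reduce((count, level) => count + level.tileKeys.length, 0)} tiles`;

renderPyramid([1, 2, 3], stubRenderLevel, (level) => {
  console.log(`zoom level ${level.zoomLevel} ready`);
}, stubPackageTiles).then((archive) => console.log(archive));
```

The number of concurrent units now equals the number of zoom levels rather than the number of tiles, which keeps concurrency small and predictable per plan.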

So, best of both worlds?

You bet! We were able to substantially decrease the duration yet again. But more importantly, we are far more flexible and scalable regarding the number of zoom levels we can render, without our user experience suffering from exponentially increasing waiting times.

At first sight, hyper-parallelism often seems intriguing for these workloads. But, for us, it meant we were doing some of the work over and over again. Simply capping the concurrency slowed the process down accordingly. While that might be more sustainable, it is not scalable.

Taking that into account, our biggest learning was to look for the balance between the sheer force and the overhead of parallelism. To find a solution that is both sustainable and scalable, we needed to identify the sweet spot of being fast yet simple enough. For our particular use case, it meant doing more work per Lambda function and parallelizing less, on a higher level.

AWS Lambda is a powerhouse. Using layers to augment the default runtime without slowing down cold starts exponentially increased our possibilities. Layers, paired with Step Functions to externally orchestrate single-purpose functions, make for an incredible trio. If you have not had the chance to check out this powerful combination yet, I encourage you to do so. Pick a workflow you think will benefit from the described advantages and see for yourself.

Building on AWS helps us at CAPMO to quickly go from idea to implementation, keep the iteration pace high and scale fast. If you are interested in reading more stories like this one, check out the engineering blog we just launched. Or, if you want to build solutions like these yourself and help us revolutionize the construction industry with digitalization and smart data analysis, don’t miss your chance to apply – we’re hiring!

AWS Startups Blog