Zerobase creates private, secure, and automated contact tracing using Amazon Neptune
This is a guest post from the Zerobase Foundation. In their own words, “The Zerobase Foundation is a nonprofit organization whose mission is to build free, open-source public health technology for the good of communities around the world. Zerobase’s privacy-first contact tracing platform empowers individuals, communities, and local officials to stop the spread of COVID-19.”
As the COVID-19 pandemic runs its global course, widespread testing for the virus alone won’t be sufficient to contain its spread. Tests are lagging indicators; by the time someone tests positive, they might have already exposed dozens of others, even before developing symptoms. Communities are looking to implement effective contact tracing, which is a crucial part of modern pandemic response that allows each test to apply not just to one person, but to the hundreds they may have exposed, directly or indirectly.
Contact tracing is the identification of disease-spreading interactions between individuals in a community. When someone tests positive, tracing helps pinpoint the others, called contacts, they may have infected, so that the contacts can self-isolate or get tested, thereby preventing the further spread of disease.
Historically, contact tracing has been a labor-intensive and manual process. That’s because contact tracing has typically been needed only at a local scale, such as during an outbreak of a respiratory disease like pertussis (whooping cough) in a particular town, or because it requires highly sensitive communication with contacts, such as while tracing the spread of HIV. In these cases, public health officials call down a phone list or even travel door to door, interviewing those who have tested positive to see whom they may have infected.
COVID-19 has rendered manual contact tracing impossible. The virus is highly contagious, spreading invisibly between individuals and even indirectly through contact with contaminated public objects like ATMs. Worse, studies show that an infected person is most contagious in the days before they develop symptoms (if they develop them at all), meaning that there is an extremely narrow critical window in which isolating someone who may have been exposed can be helpful. For more information, see the following studies:
- Temporal dynamics in viral shedding and transmissibility of COVID-19
- Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study
- Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany
- Asymptomatic Transmission, the Achilles’ Heel of Current Strategies to Control Covid-19
- Presymptomatic Transmission of SARS-CoV-2 — Singapore, January 23–March 16, 2020
Because of this, even if we could do massive-scale manual contact tracing, it would be ineffective. If a public health official called you after you tested positive, you could never hope to name everyone you might have exposed at your grocery store, pharmacy, or bank.
Conducting effective contact tracing for a highly contagious disease such as COVID-19 requires a more sophisticated, digital approach. Zerobase is a privacy-first, universally accessible COVID-19 contact tracing platform built on a Quick Response (QR) code mesh network that is deployed at the community level (towns, cities, regions, or health systems) to slow the spread of the coronavirus. Zerobase is free and open source, licensed under the Apache 2.0 License. You don’t download an app, Zerobase doesn’t monitor GPS location or Bluetooth interactions, and everyone can enroll immediately. In fact, Zerobase is working with public health experts to develop robust systems for individuals who do not have access to smartphones, for example, anonymous QR codes that can be carried as printouts or downloaded to feature phones and then scanned at participating locations. Every Zerobase user, smartphone-carrying or not, can expect the same privacy protections, such as anonymity by default and robust data security. A user takes a few seconds to scan a code at a participating location in exchange for the peace of mind that they are never tracked and that the information they transmit is only the information they provide.
Our solution runs on AWS, and our fundamental data structure, the contact graph, is stored in Amazon Neptune, a powerful, fully managed graph database service. Data are encrypted in transit and at rest using industry-standard data security procedures, including AWS’s robust suite of data protection layers. With Zerobase, you can receive notifications of possible exposures in real time, and public health officials are empowered with insights about the current and future dynamics of the virus, which helps contain outbreaks and plan for a safe return to normalcy.
How Zerobase contact tracing works
Partnering with local governments, Zerobase asks essential businesses like grocery stores and pharmacies to post Zerobase QR codes at their entrances and points of sale. Each QR code is unique and is associated with that specific business. When you visit the location, you scan the QR code with your smartphone’s camera, for example as you check out with your groceries. This prompts the Zerobase website to open in a browser, which associates a random, anonymous identifier that persists in browser storage with the specific QR code that was just scanned. As more individuals scan codes and therefore associate their phones with check-ins, we can build up a fine-grained, anonymous contact network around a community. You never install an app, no software runs in the background, and your GPS locations and Bluetooth interactions are never tracked. And individuals without smartphones can simply carry anonymous QR code printouts with them to be scanned by any employee when they visit a participating location, and these individuals can be alerted to possible exposure via phone, email, or through trusted institutions, should they opt in to these communications. Zerobase is the only tracing solution that is universally accessible in this way.
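To make the check-in step concrete, here is a minimal sketch of how a random, anonymous identifier might be tied to a scanned code. It is written in Python for brevity, whereas the real identifier lives in browser storage; the names `new_device_id`, `CheckIn`, and `record_check_in` are hypothetical and are not taken from the Zerobase codebase.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


def new_device_id() -> str:
    """Generate a random identifier that carries no personal information."""
    return str(uuid.uuid4())


@dataclass(frozen=True)
class CheckIn:
    device_id: str       # random, anonymous identifier for the device
    scannable_id: str    # identifier encoded in the scanned QR code (assumed format)
    timestamp: datetime  # when the scan happened, in UTC


def record_check_in(device_id: str, scannable_id: str, log: list) -> CheckIn:
    """Associate a device with the QR code it just scanned."""
    check_in = CheckIn(device_id, scannable_id, datetime.now(timezone.utc))
    log.append(check_in)
    return check_in


# Hypothetical usage: one device scans the code at a grocery checkout.
log = []
device = new_device_id()
record_check_in(device, "grocery-checkout-1", log)
```

Nothing in the record identifies a person: the only link between check-ins is the random identifier itself.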
One challenge in anonymous contact tracing is ingesting verified positive diagnostic information into the system. Apps developed thus far must rely either on self-reported test results or on healthcare workers to perform laborious data entry. Our use of QR codes offers an attractive way to avoid the inaccuracy and inefficiency of these approaches.
Zerobase provides unique, single-use QR codes on paper stickers to COVID-19 testing facilities. When a patient arrives for a test, they receive one of these codes, which is attached to their basic intake form. They scan the code, associating their device (and all its past check-ins) with that particular form and test. When their test comes back (whether minutes or days later), a physician, nurse, or other responsible healthcare professional can retrieve the patient’s form from their medical record and scan the attached code, thus linking their device (and its trace) with a verified positive diagnosis. Because these special codes are treated as sensitive medical materials (similar to prescription drugs or patient records), the facilities’ built-in security infrastructure and chain of custody protocols can be used without modification to prevent tampering with the testing codes.
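The two-scan flow described above can be sketched as a small in-memory registry. This is an illustration under assumptions, not Zerobase's implementation: the class `DiagnosticCodeRegistry` and its methods are hypothetical, and a real deployment would persist this state and authenticate the healthcare professional.

```python
class DiagnosticCodeRegistry:
    """Tracks single-use testing codes (hypothetical, in-memory sketch)."""

    def __init__(self, issued_codes):
        # Each issued code starts with no device linked to it.
        self._device_for_code = {code: None for code in issued_codes}
        self._results = {}

    def patient_scan(self, code: str, device_id: str) -> None:
        """Link the patient's device to the code attached to their intake form."""
        if code not in self._device_for_code:
            raise KeyError("unknown testing code")
        if self._device_for_code[code] is not None:
            raise ValueError("testing code already used")
        self._device_for_code[code] = device_id

    def record_result(self, code: str, positive: bool) -> str:
        """Record a verified result for the code and return the linked device."""
        device_id = self._device_for_code.get(code)
        if device_id is None:
            raise ValueError("no device linked to this code")
        if code in self._results:
            raise ValueError("result already recorded")
        self._results[code] = positive
        return device_id


# Hypothetical usage: the patient scans at intake, the clinician scans later.
registry = DiagnosticCodeRegistry(["code-001"])
registry.patient_scan("code-001", "device-abc")
linked_device = registry.record_result("code-001", positive=True)
```

Rejecting a second patient scan is what makes the code single-use in this sketch.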
When a positive test is associated with a device, the race begins. Zerobase uses Neptune to query efficiently for the locations that the patient checked into and the other individuals who may have been there at the same time or shortly thereafter. These individuals are known as first ring exposures, and they can receive a request—still anonymously—to isolate or get tested the next time they check in at a location.
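As a rough illustration of the first ring lookup, the sketch below finds devices that checked in at the same code as a positive device, at the same moment or within a time window afterwards. It uses plain Python over an in-memory list rather than a Neptune query; the function name `first_ring_exposures` and the two-hour window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def first_ring_exposures(check_ins, positive_device, window=timedelta(hours=2)):
    """Return devices seen at the same scannable as the positive device,
    at the same time or up to `window` afterwards.

    `check_ins` is a list of (device_id, scannable_id, timestamp) tuples.
    """
    positive_visits = [(s, t) for d, s, t in check_ins if d == positive_device]
    exposed = set()
    for device, scannable, ts in check_ins:
        if device == positive_device:
            continue
        for p_scannable, p_ts in positive_visits:
            if scannable == p_scannable and p_ts <= ts <= p_ts + window:
                exposed.add(device)
    return exposed


# Hypothetical data: device A later tests positive.
check_ins = [
    ("A", "pharmacy", datetime(2020, 5, 1, 10, 0)),
    ("B", "pharmacy", datetime(2020, 5, 1, 10, 30)),  # same place, 30 minutes later
    ("C", "pharmacy", datetime(2020, 5, 1, 15, 0)),   # same place, outside the window
    ("D", "grocery", datetime(2020, 5, 1, 10, 15)),   # different place
]
exposed = first_ring_exposures(check_ins, "A")
```

Only device B falls inside the window at the same location, so it is the only first ring exposure here.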
But Zerobase can go even further. We can calculate real-time risk scores for both individuals and locations that take into account the totality of the community contact graph, not just the first ring, by leveraging more advanced machine learning (ML) algorithms developed through collaborative research with the Max Planck Society. Our models show that the contact tracing protocol and ML techniques developed and used by Zerobase could flatten the curve in a community by up to 60% and remain effective even at adoption levels under 50%.
Projection showing the potential impact of Zerobase contact tracing in reducing the number of infected people in a community
By deploying these community-level ML models on the AWS ML stack of Neptune, Amazon Simple Storage Service (Amazon S3), and Amazon SageMaker, we go beyond what an app using Bluetooth on a phone could do; because such apps generally only record pairwise data between a single device and its contacts, they can’t deliver actionable, community-level insight that can help flatten the curve.
Event-driven architecture and Neptune
We took a domain-driven approach to the system architecture. The primary data store backing our API is Amazon DynamoDB, in which we use a single-table design. DynamoDB has high availability and consistent, low-latency interactions at scale that are ideal for the throughput that large-scale contact tracing deployments require. Its low operational overhead is also key to reducing the number of engineers required to maintain it. Amazon DynamoDB Streams enables the domain-driven, decoupled architecture we need for asynchronous processing and ETL so that we can process tracing data as a graph. This approach makes sure we only pay for the compute time we require and allows us to filter and transform the data before it lands in the graph database. The following diagram shows our event-driven architecture on AWS.
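The filter-and-transform step between the stream and the graph might look like the sketch below. The event layout (`Records`, `eventName`, `dynamodb.NewImage` with typed attributes such as `{"S": ...}`) follows the standard DynamoDB Streams record format, assuming the stream includes new images. The attribute names `entity_type`, `device_id`, `scannable_id`, and `timestamp`, as well as the function `to_graph_edges`, are hypothetical; the actual single-table schema is not described here.

```python
def to_graph_edges(event: dict) -> list:
    """Turn newly inserted check-in items from a DynamoDB Streams event
    into edge descriptions that a later step could write to the graph."""
    edges = []
    for record in event.get("Records", []):
        # Only newly created items matter for this transformation.
        if record.get("eventName") != "INSERT":
            continue
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Single-table design: filter on an assumed entity-type attribute.
        if image.get("entity_type", {}).get("S") != "CHECK_IN":
            continue
        edges.append({
            "label": "checked_in",
            "from_device": image["device_id"]["S"],
            "to_scannable": image["scannable_id"]["S"],
            "timestamp": image["timestamp"]["S"],
        })
    return edges


# Hypothetical stream event: one new check-in and one unrelated update.
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {"NewImage": {
                "entity_type": {"S": "CHECK_IN"},
                "device_id": {"S": "device-abc"},
                "scannable_id": {"S": "grocery-checkout-1"},
                "timestamp": {"S": "2020-05-01T10:00:00Z"},
            }},
        },
        {
            "eventName": "MODIFY",
            "dynamodb": {"NewImage": {"entity_type": {"S": "SITE"}}},
        },
    ]
}
edges = to_graph_edges(sample_event)
```

Keeping the transformation as a pure function makes it easy to test separately from whatever writes the edges into the graph database.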
One of the vital components of large-scale digital contact tracing is a real-time, efficient updating mechanism for each individual’s risk score. Calculating this value for each person in a large graph is computationally intensive, because every member of a community might influence every other person’s risk, either directly or indirectly. Performing this update using a traditional relational database would be challenging; it would require recursive queries and repeated self-joins to emulate graph traversals over tables. For this reason, we must use a graph database designed to support exactly the sort of traversal queries and whole-community analytics we need to protect a large group of interconnected people.
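To show why indirect contacts make this a traversal problem, here is a deliberately simple toy model, not the Zerobase risk model: each person's risk is `decay` raised to the number of hops from the nearest positive case, computed with a breadth-first traversal. The function `propagate_risk` and the decay value of 0.5 are assumptions chosen for the example.

```python
from collections import deque


def propagate_risk(contacts, positives, decay=0.5):
    """Assign each reachable person a risk of decay**hops, where hops is the
    shortest contact distance to any positive case (toy model only).

    `contacts` maps a person to the people they were in contact with.
    """
    risk = {person: 1.0 for person in positives}
    queue = deque(positives)
    while queue:
        person = queue.popleft()
        for neighbor in contacts.get(person, ()):
            # Breadth-first order guarantees the first visit is the shortest path.
            if neighbor not in risk:
                risk[neighbor] = risk[person] * decay
                queue.append(neighbor)
    return risk


# Hypothetical contact chain A - B - C - D, with E isolated.
contacts = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
risk = propagate_risk(contacts, ["A"])
```

Even in this toy version, D receives a nonzero score despite never meeting A, which is the kind of multi-hop effect that is awkward to express with joins over tables.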
We chose Neptune because it is a powerful graph database that is secure, performant, and analytics-friendly. In our model, each user node is connected to a device node. When a device checks in to a location, an edge forms between that device and a scannable (a QR code), which is associated with a particular site (a physical store) and linked organization (a corporate entity). Neptune allows us to store these rich relationships between users, check-ins, and locations to derive insight about the spread of the virus. The following diagram illustrates our backend ERD.
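The relationships described above can be sketched as a tiny labeled graph. This is a plain Python stand-in for illustration, not Neptune or Gremlin code; the class `ContactGraph`, the helper `sites_visited`, and the edge labels `uses`, `checked_in`, `located_at`, and `belongs_to` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ContactGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> label
    edges: list = field(default_factory=list)     # (from id, edge label, to id)

    def add_vertex(self, vertex_id: str, label: str) -> None:
        self.vertices[vertex_id] = label

    def add_edge(self, from_id: str, label: str, to_id: str) -> None:
        self.edges.append((from_id, label, to_id))

    def out(self, vertex_id: str, label: str) -> list:
        """Follow outgoing edges with the given label from a vertex."""
        return [dst for src, lbl, dst in self.edges
                if src == vertex_id and lbl == label]


def sites_visited(graph: ContactGraph, user_id: str) -> list:
    """Walk user -> device -> scannable -> site."""
    sites = []
    for device in graph.out(user_id, "uses"):
        for scannable in graph.out(device, "checked_in"):
            sites.extend(graph.out(scannable, "located_at"))
    return sites


# Hypothetical data: one user, one device, one check-in at one store.
graph = ContactGraph()
for vertex_id, label in [("user-1", "user"), ("device-1", "device"),
                         ("scannable-1", "scannable"), ("site-1", "site"),
                         ("org-1", "organization")]:
    graph.add_vertex(vertex_id, label)
graph.add_edge("user-1", "uses", "device-1")
graph.add_edge("device-1", "checked_in", "scannable-1")
graph.add_edge("scannable-1", "located_at", "site-1")
graph.add_edge("site-1", "belongs_to", "org-1")
visited = sites_visited(graph, "user-1")
```

Each hop in `sites_visited` corresponds to one edge type in the model, which is the shape of query a graph database is designed to answer directly.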
With Neptune, we write queries in Gremlin that perform complex traversals efficiently to update risk scores based on new check-in or test result data. Neptune quickly propagates these risk index evaluations through the whole graph so we can get a near-real-time view of the spread of the virus. For example, we can apply our predictive ML models using tools like Amazon SageMaker to forecast where new hot spots might emerge by looking for clusters of high-risk individuals. Performance is key—every minute that we wait for a database to finish updating a pre-symptomatic person’s risk score is an opportunity for them to expose others. Neptune’s performance delivers on this critical task and allows us to send real-time notifications that keep communities safe.
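As a much simpler stand-in for the predictive models mentioned above, the sketch below ranks sites by how many distinct high-risk devices visited them. It is only an illustration of the idea of looking for clusters; the function `rank_hot_spots`, the risk values, and the 0.5 threshold are assumptions and do not reflect the Zerobase ML models.

```python
from collections import Counter


def rank_hot_spots(visits, risk, threshold=0.5):
    """Count distinct high-risk devices per site, most affected sites first.

    `visits` is a list of (device_id, site_id) pairs; `risk` maps a device
    to a score between 0 and 1.
    """
    # A set removes repeat visits by the same device to the same site.
    high_risk_pairs = {
        (site, device)
        for device, site in visits
        if risk.get(device, 0.0) >= threshold
    }
    counts = Counter(site for site, _ in high_risk_pairs)
    return counts.most_common()


# Hypothetical visits and risk scores.
visits = [
    ("A", "grocery"), ("B", "grocery"), ("A", "grocery"),
    ("C", "pharmacy"), ("D", "pharmacy"),
]
risk = {"A": 0.9, "B": 0.6, "C": 0.7, "D": 0.1}
hot_spots = rank_hot_spots(visits, risk)
```

Here the grocery store ranks first with two distinct high-risk visitors, ahead of the pharmacy with one.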
Another reason we chose Neptune is its rock-solid security. Zerobase is built to ensure participants’ privacy; it’s a founding principle of our organization and our platform. Neptune provides multiple levels of security out of the box, including network isolation using Amazon VPC, support for AWS Identity and Access Management (IAM) authentication for endpoint access, HTTPS-encrypted client connections, and encryption at rest. On an encrypted Neptune instance, data in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster. These security guarantees are one reason the users of our HIPAA-compliant system trust that their privacy is protected.
The road ahead
We’re still in the early days of fighting COVID-19. Zerobase is currently conducting trials of the system in New Hampshire and Cologne, Germany.
As powerful as Zerobase is, no single technology will be enough to eliminate COVID-19. Zerobase is a key member of the eventual multimodal constellation of solutions that will be needed to contain the virus. As such, Zerobase’s QR-based API framework is designed to integrate seamlessly into any contact tracing system or app that is widely deployed in an area, augmenting a community’s response in two critical ways. First, it pushes a system to 100% inclusiveness, because everyone, not just those with the most advanced smartphones, can be protected. Second, it enhances the app’s effectiveness with Zerobase’s machine learning models that go beyond simple proximity sensing and provide actionable predictive insight to community officials.
As we fine-tune Zerobase, we’ll partner with universities, businesses, other tracing app developers, and government and public health officials to better understand, respond to, and ultimately help stop the spread of the virus.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
About the Authors
Aron Szanto is CEO and Founder of the Zerobase Foundation.
David Harris is Principal Architect at the Zerobase Foundation.