The DARPA HIVE Program: Understanding Relationships with Data
Social media, sensor feeds, and scientific studies generate large volumes of data, and understanding the relationships within that data can be challenging. Graph analytics has emerged as a way to make sense of it, allowing analysts to draw conclusions from patterns in the data and to ask and answer questions they previously had no hope of answering.
By understanding the complex relationships between different data feeds, a more complete picture of a problem can be formed. Drawing on lessons learned from innovations in the expanding realm of deep neural networks, the Defense Advanced Research Projects Agency's (DARPA) Hierarchical Identify Verify Exploit (HIVE) program seeks to advance graph analytics.
The DARPA HIVE program is looking to build a graph analytics processor that can process streaming graphs 1,000x faster, and at much lower power, than current processing technology. This will provide the power to advance graph analytics to solve challenges in areas such as cyber security and infrastructure monitoring. In parallel with the development of the HIVE processor, DARPA is hosting the HIVE Challenge to develop a trillion-edge dataset, with solutions that will contribute to this initiative. The goal is to accelerate innovation in graph analytics, opening new pathways for meeting the challenge of understanding an ever-increasing torrent of data.
Organizers will provide specifications, datasets, data generators, and serial implementations in various languages to participants. As part of the Challenge, AWS and DARPA have entered into a collaborative agreement, making DARPA the first Department of Defense (DoD) agency to participate in the AWS Public Datasets program. Additionally, eligible researchers working on the DARPA HIVE Challenge are encouraged to apply for AWS usage credits via the AWS Cloud Credits for Research program.
There are two initial challenges:
- The first is a static graph problem focused on subgraph isomorphism. Solving it provides the ability to search a large graph in order to identify a particular subsection of that graph matching a given pattern.
- The second is a dynamic graph problem focused on finding optimal clusters of data within the graph.

Both challenges will include a small-graph problem in the billions of nodes and a large-graph problem in the trillions of nodes.
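To make the first challenge concrete, here is a minimal sketch of subgraph isomorphism in plain Python: given a small pattern graph, does a larger target graph contain a subgraph matching it? This is purely illustrative (the graphs, function name, and brute-force approach are this sketch's assumptions, not part of the HIVE specification); the naive search grows factorially with graph size, which is exactly why billion- and trillion-node graphs demand the kind of hardware acceleration HIVE pursues.

```python
from itertools import permutations

def subgraph_isomorphic(pattern, target):
    """Brute-force check: does `target` contain a subgraph isomorphic
    to `pattern`? Graphs are dicts mapping each node to a set of its
    neighbours (undirected). Illustrative only; not scalable."""
    p_nodes = list(pattern)
    t_nodes = list(target)
    # Try every injective mapping of pattern nodes onto target nodes.
    for candidate in permutations(t_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, candidate))
        # Every pattern edge must map onto an existing target edge.
        if all(mapping[v] in target[mapping[u]]
               for u in pattern for v in pattern[u]):
            return True
    return False

# Hypothetical example: search for a triangle pattern.
triangle = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
square   = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
wheel    = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(subgraph_isomorphic(triangle, square))  # False: a 4-cycle has no triangle
print(subgraph_isomorphic(triangle, wheel))   # True: nodes 1-2-3 form a triangle
```

Real solutions use pruned search (e.g., Ullmann's algorithm or VF2) rather than exhaustive enumeration, but the sketch shows what "identifying a particular subsection of a graph" means in practice.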