Enhancing Hadoop for GeoTemporal Graph Analytics

Data relationships play a vital role in many forms of data analytics, structured machine learning, and data mining applications. Data architects have begun to realize that these relationships are more naturally expressed as graphs. We discuss how graph databases for enhanced analytics have become an integral part of the Hadoop technology stack.

Hadoop provides a well-established framework for the distributed processing of large data sets across clusters of computers using simple programming models. However, processing large graphs in a distributed computing environment presents multiple system-level challenges, from parallelizing the graph construction algorithms to achieving balanced system utilization. In addition, the resultant graph must be carefully partitioned to ensure that subsequent application processing is properly balanced.
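To make the partitioning concern concrete, here is a minimal sketch in Python of one simple strategy: assigning each edge to a partition by a stable hash and then measuring how evenly the edges are spread. It is an illustration under stated assumptions, not how Hadoop or any particular library partitions graphs. The function names (`partition_of`, `partition_edges`, `imbalance`) and the toy graph are hypothetical.

```python
import zlib


def partition_of(edge, num_partitions):
    """Assign an edge (src, dst) to a partition using a stable hash.

    zlib.crc32 is used instead of the built-in hash() so the result
    is the same from one run to the next.
    """
    key = f"{edge[0]}->{edge[1]}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


def partition_edges(edges, num_partitions):
    """Group edges into num_partitions lists by hashed partition id."""
    parts = [[] for _ in range(num_partitions)]
    for edge in edges:
        parts[partition_of(edge, num_partitions)].append(edge)
    return parts


def imbalance(parts):
    """Largest partition size divided by the mean size.

    A value of 1.0 means perfectly balanced; larger values mean the
    busiest worker holds proportionally more edges than average.
    """
    sizes = [len(p) for p in parts]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical toy graph: a ring of 1000 vertices plus a high-degree hub.
edges = [(i, (i + 1) % 1000) for i in range(1000)]
edges += [(0, i) for i in range(2, 500)]

parts = partition_edges(edges, 4)
print("partition sizes:", [len(p) for p in parts])
print("imbalance:", round(imbalance(parts), 3))
```

Hashing edges tends to even out edge counts, but it ignores locality: the edges of a high-degree vertex are scattered across partitions, which can increase communication in later processing. That trade-off is one reason the partitioning step needs care.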

Recognizing these challenges around Hadoop, Intel recently released “GraphBuilder”. GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data. The library offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization.

We’ll discuss a number of use cases that rely on GraphBuilder to extract graphs from Hadoop into a graph database and perform complex graph analytics. The use cases come from the life sciences, fraud detection in online banking, and risk mitigation in the construction industry.

Jans Aasman started his career as an experimental and cognitive psychologist, earning his PhD in cognitive science with a detailed model of car driver behavior using Lisp and Soar. He has spent most of his professional life in telecommunications research, specializing in intelligent user interfaces and applied artificial intelligence projects. From 1995 to 2004, he was also a part-time professor in the Industrial Design department of the Technical University of Delft. Jans is currently the CEO of Franz Inc., the leading supplier of commercial, persistent, and scalable database products that provide the storage layer for powerful reasoning and ontology modeling capabilities for graph-based applications.