Data relationships play a vital role in various forms of data analytics, structured machine learning, and data mining applications. Data architects have begun to realize that these relationships are more naturally expressed as graphs. We discuss how graph databases, used for enhanced analytics, have become an integral part of the Hadoop technology stack.
Hadoop provides a well-established framework for the distributed processing of large data sets across clusters of computers using simple programming models. However, processing large graphs in a distributed computing environment presents multiple system-level challenges, from parallelizing the graph construction algorithms to achieving balanced system utilization. In addition, the resulting graph must be carefully partitioned to ensure that subsequent application processing is properly balanced.
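As a rough illustration of why partitioning matters, the Python sketch below compares two simple ways of assigning edges to workers on a toy graph with one heavily connected "hub" vertex. It is not GraphBuilder's algorithm or API. The function names, the toy graph, and the hashing scheme are all assumptions made for this example.

```python
from collections import Counter


def partition_by_source(edges, num_parts):
    """Place each edge on the partition that owns its source vertex."""
    loads = Counter()
    for src, _dst in edges:
        loads[src % num_parts] += 1
    return [loads[p] for p in range(num_parts)]


def partition_by_edge(edges, num_parts):
    """Place each edge using a hash of both endpoints, so a hub's edges spread out."""
    loads = Counter()
    for src, dst in edges:
        loads[(src * 31 + dst) % num_parts] += 1
    return [loads[p] for p in range(num_parts)]


def imbalance(loads):
    """Ratio of the busiest partition to the average load; 1.0 is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical toy graph: vertex 0 is a hub with 96 outgoing edges,
# plus a short chain of four edges among low-degree vertices.
edges = [(0, d) for d in range(1, 97)] + [(s, s + 1) for s in range(1, 5)]

by_source = partition_by_source(edges, 4)
by_edge = partition_by_edge(edges, 4)

print("by source:", by_source, "imbalance:", imbalance(by_source))
print("by edge:  ", by_edge, "imbalance:", imbalance(by_edge))
```

On this toy input, assigning edges by source vertex leaves almost all of the work on the single partition that owns the hub, while hashing on both endpoints spreads the load nearly evenly. Real graphs with skewed degree distributions show the same effect at scale. The even spread comes at a price: the hub's edges now sit on several partitions, so its state must be shared between them.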
Recognizing these challenges around Hadoop, Intel recently released “GraphBuilder”. GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data. The library offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization.
We’ll discuss a number of use cases that rely on GraphBuilder to extract graphs from Hadoop into a graph database and perform complex graph analytics. The use cases span life sciences, fraud detection in online banking, and risk mitigation in the construction industry.