Avoiding a Data Swamp: Practical Data Lake Strategies
  Sean Martin
Cambridge Semantics


Thursday, August 21, 2014
09:30 AM - 10:00 AM

Level: Technical - Introductory

One of the promises of NoSQL databases is the ability to build Data Lakes: scalable, high-volume, flexible repositories that bring together diverse data sources with minimal upfront data preparation. But NoSQL stores can easily turn into Data Swamps instead: dead-end repositories filled with data that is hard to understand and analyze.

In this talk, we will present strategies for designing and implementing a practical Data Lake that delivers on the promise by:

  • Using tools that let subject-matter experts get data into and out of a Data Lake quickly and easily
  • Leveraging industry-standard models to richly describe the semantics of the data
  • Exposing targeted data feeds from a Data Lake through services that analysts can consume
  • Tracking data lineage to understand where data comes from and which data can be trusted
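To make the last point concrete, here is a minimal sketch of lineage tracking, assuming a simple in-memory catalog in which each dataset records the datasets it was derived from (loosely in the spirit of the W3C PROV "wasDerivedFrom" relation). The LineageCatalog class, its methods, and the dataset names are hypothetical and for illustration only; a real Data Lake would persist this metadata alongside the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Catalog entry for one dataset in the lake (hypothetical schema)."""
    name: str
    derived_from: list[str] = field(default_factory=list)  # upstream dataset names
    transformation: str = "ingested"  # free-text note on how the dataset was produced


class LineageCatalog:
    """Hypothetical in-memory lineage catalog: datasets linked to their inputs."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, name, derived_from=(), transformation="ingested"):
        # Refuse datasets whose inputs are unknown, so every entry can be
        # traced back to something already recorded in the catalog.
        missing = [d for d in derived_from if d not in self._records]
        if missing:
            raise KeyError(f"unknown upstream datasets: {missing}")
        self._records[name] = DatasetRecord(name, list(derived_from), transformation)

    def original_sources(self, name):
        """Return the raw (non-derived) datasets that `name` ultimately depends on."""
        seen, sources, stack = set(), set(), [name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited via another path
            seen.add(current)
            record = self._records[current]
            if not record.derived_from:
                sources.add(current)  # no inputs: this is an original source
            stack.extend(record.derived_from)
        return sources


# Usage: two raw sources feed a joined dataset, which feeds a report.
catalog = LineageCatalog()
catalog.register("crm_export")
catalog.register("erp_orders")
catalog.register("customer_orders", ["crm_export", "erp_orders"], "join on customer_id")
catalog.register("quarterly_revenue", ["customer_orders"], "aggregate by quarter")
print(sorted(catalog.original_sources("quarterly_revenue")))  # ['crm_export', 'erp_orders']
```

Walking the derivation links back to datasets with no upstream inputs answers the question "where did this number come from?", and rejecting datasets with unregistered inputs keeps that chain from silently breaking.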

Sean Martin has been on the leading edge of Internet technology innovation since the early nineties. His greatest strength has been identifying and pioneering next-generation software and networking technologies and techniques. Before founding Cambridge Semantics, he spent fifteen years at IBM Corporation, where he was a founder and the technology visionary of the IBM Advanced Internet Technology group. Sean has an astonishing number of Internet "firsts" to his credit. These include inventing and implementing IBM's first Web application server and content manager, WOM, and its distributed Web application hosting environment, the Womplex, as well as a number of other important website scaling technologies.
