Avoiding a Data Swamp: Practical Data Lake Strategies

One of the promises of NoSQL databases is the ability to build Data Lakes: scalable, high-volume, flexible data repositories that bring together diverse sources of data with minimal upfront data preparation. But it is easy for NoSQL stores to turn into Data Swamps instead: dead-end repositories filled with data that is hard to understand and analyze.

In this talk, we will present strategies for designing and implementing a practical Data Lake that delivers on the promise by:

Using tools that allow subject-matter experts to quickly and easily get data into and out of a Data Lake
Leveraging industry-standard models to richly describe the semantics of the data
Exposing targeted data feeds from a Data Lake via services that can be consumed by analysts
Tracking data lineage to understand where data comes from and what can be trusted

Sean Martin has been on the leading edge of Internet technology innovation since the early nineties. His greatest strength has been identifying and pioneering next-generation software and networking technologies and techniques. Prior to founding Cambridge Semantics, he spent fifteen years with IBM Corporation, where he was a founder of, and the technology visionary for, the IBM Advanced Internet Technology group. Sean has an astonishing number of Internet "firsts" to his credit. These include inventing and implementing IBM's first Web application server and content manager, WOM, and its distributed Web application hosting environment, the Womplex, as well as a number of other important Web site scaling technologies.