In the know.

The Platfora blog.

Big Data Analytics – Reuniting Data Science and Big Data


When I first started hearing (and later talking) about big data and how it represented the next evolutionary step for data technologies, it was all about getting value out of the mountains of data that organizations were already collecting. Over the past several years, though, the term has come to refer to storing and processing data, and is less frequently used to describe the process of actually getting any value out of that data. That’s a problem, because corporations have become the digital-age equivalent of hoarders: afraid to throw any data away lest they need it, but unable to make sense of the mess they have and thus get any value out of it.

Hadoop and related technologies weren’t designed with storage and processing as their primary goal – quite the contrary. They were designed to make sense of the mounds of data collected so users can actually do something with it all. The storage and processing capabilities were merely a byproduct of solving that problem.

With the term “big data” being stretched out beyond all recognition, the industry came up with a new name for the work of getting value out of the data: Data Science. Data science was quickly recognized as the true seat of innovation for an organization.  That was true in a sense – companies with strong data science teams have been more successful than their peers that lack such resources.

But there is a downside. It turns out that Data Science, at least as it’s practiced today, is very difficult to scale. Individuals who understand things like the difference between heteroscedasticity and homoscedasticity rarely understand business problems, and the few who do are expensive, hard to keep, and unable (or unwilling) to teach others to do what they do.

What we need is something that acts as a bridge between the two worlds of big data and data science. At Platfora, we call that something Big Data Analytics. Big Data Analytics brings Business Analysts closer to the Data Scientists, allowing both groups to interact with and make sense of their Big Data holdings. It gives the two groups something of a common language. It also enables Data Scientists to present their findings back to the business in an easy-to-consume format.

The insights discovered can now be leveraged by the entire organization, not just the select few who understand things like scoring and modeling.  This has the side effect of making certain high-cost, low-value functions easy and fast in the world of Big Data, allowing those expensive data scientist resources to spend their time doing what they do best rather than reinventing the wheel.

When you’re evaluating your first (or next) Big Data project, make sure you include a tool that bridges the gap between your business and your technologists.  You shouldn’t have to wait 6-12 months to get a return on your investment. And the outcome should be approachable enough that your Business Analysts will be just as excited about it as your Data Scientists.  Big Data Analytics can bridge that gap and provide exactly those kinds of benefits. Trust me — you’ll be happy you tried it.

Keith McClellan leads Federal Engineering at Platfora and has been focused on Big Data and related technologies for most of his career. If you’re interested in his random musings, he tweets @keithmcc and occasionally writes for the Platfora blog, as well as other publications.