Spark: the Bridge Across the Chasm

There is no question that Hadoop has achieved significant momentum in adoption by mainstream IT buyers over the last couple of years. But has it “crossed the chasm?” Answering that question requires both understanding what we mean by “the chasm” and understanding what is happening inside the businesses that are driving this momentum.

*The original version of this blog post was published as a guest post on the IBM Big Data & Analytics Hub, here.

What Is the Chasm?

Geoffrey Moore introduced the idea of the chasm in the first edition of his book, Crossing the Chasm, in 1991. Considered a modern business classic, the book has been frequently revised and updated since then, and Moore will tell you that the ideas in it are as relevant as ever. In it, he explains that when a new technology arrives, a small group of sophisticated and enthusiastic users adopts it right away. These Innovators and Early Adopters independently explore the new technology and figure out how it works.

If all goes well, these first adopters are followed by a much larger group called the Early Majority. Members of the Early Majority lack the enthusiasm and, by and large, the technical know-how of the first two groups. They are less interested in experimenting than they are in solving a particular problem, or a set of problems. They need something that works out of the box.

[Image: Crossing the Chasm adoption curve, applied to Hadoop]

It is no exaggeration to say that the Early Majority holds the fate of the new technology in its hands. If such a class of users emerges, the new technology is a success. If it doesn't, the new technology fails. The gap between the Early Adopters and the Early Majority is what Moore calls the chasm.

Not Quite Ready to Cross

The parallels between what Moore describes happening in the early stages of technology adoption and what has occurred in recent years in the big data space are fairly obvious. Hadoop has benefited from the experimentation that both Innovators and Early Adopters have subjected it to within a growing number of organizations. And, of course, some of these organizations have grown far beyond experimentation into fully functioning production big data environments. For such environments to become the norm, big data solutions must evolve.

But in all too many businesses, access to and analysis of big data remain elusive for the get-the-job-done users who should be making up the Early Majority. Analysis is still primarily the domain of Data Scientists and other power users. And even when analysis of big data is made available to a broader group of users within the organization, it is not in the iterative, dynamic form those users require. To make things worse, bringing these new users on board creates new administrative and data prep burdens for the power users who got things started, limiting how effectively they can leverage the big data environment.

Building the Bridge

What these businesses need in order to get to the other side of the chasm is Big Data Discovery, a technology that empowers a broad class of end users to leverage big data, expands the possibilities of working with it, and increases the business value they can derive from it. Big Data Discovery requires an underlying distributed processing framework that supports a variety of data and analytical processes: data preparation, descriptive analysis, search, predictive analysis, and more advanced capabilities like machine learning and graph processing. Moreover, businesses need a toolset that allows them to leverage the existing skill sets of their employees. Until quite recently, no single processing framework met all of those criteria. But Apache Spark has changed that.
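To make the "one framework for all of those processes" claim concrete, here is a minimal sketch in Scala against the Spark 2.x APIs. It is an illustration only: the HDFS path, the column names, and the choice of k-means clustering are assumptions made up for the example, not a prescribed Platfora or Spark workflow.

    // A minimal sketch: data preparation, SQL-style descriptive analysis, and
    // machine learning in one Spark application. Paths and columns are hypothetical.
    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object DiscoverySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("discovery-sketch").getOrCreate()

        // Data preparation: read raw events from HDFS and drop incomplete rows.
        val events = spark.read.json("hdfs:///data/raw/events").na.drop()

        // Descriptive analysis: the same DataFrame is queryable with SQL.
        events.createOrReplaceTempView("events")
        spark.sql("SELECT region, count(*) AS sessions FROM events GROUP BY region").show()

        // Predictive / advanced analysis: cluster records on numeric features with MLlib.
        val features = new VectorAssembler()
          .setInputCols(Array("page_views", "spend"))
          .setOutputCol("features")
          .transform(events)
        val model = new KMeans().setK(5).setFeaturesCol("features").fit(features)
        println(s"Cluster centers: ${model.clusterCenters.mkString(", ")}")

        spark.stop()
      }
    }

The point of the sketch is simply that the same session and the same DataFrame move from preparation to SQL to machine learning without switching frameworks.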

Spark is rapidly changing how businesses that rely on Hadoop “do” big data. With Spark, they can optimize, accelerate, and automate data preparation. This frees the power users from data prep drudgery (letting them get back to work on more sophisticated problems) while placing significant out-of-the-box advanced analytics capabilities into the hands of the power users who need them. Additionally, Spark lowers the technical bar from expertise in MapReduce and Java to a basic understanding of databases and scripting, allowing these businesses to draw from a much broader talent pool when implementing and managing their analytics environment. Spark also opens up new options for SQL access to Hadoop data while eliminating concerns about which Hadoop “distro” the business uses.
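As a hedged illustration of that "basic SQL and scripting" point, the snippet below is the sort of thing an analyst could paste into spark-shell (which provides the spark session automatically), with no MapReduce or Java involved. The Parquet path, table name, and columns are assumptions invented for the example.

    // Querying Hadoop-resident data with plain SQL from spark-shell.
    // The dataset, path, and columns are hypothetical.
    val orders = spark.read.parquet("hdfs:///warehouse/orders")
    orders.createOrReplaceTempView("orders")

    // Top customers by spend, computed directly over data in HDFS,
    // independent of which Hadoop distribution sits underneath.
    spark.sql("""
      SELECT customer_id, SUM(amount) AS total_spend
      FROM orders
      WHERE order_date >= '2015-01-01'
      GROUP BY customer_id
      ORDER BY total_spend DESC
      LIMIT 10
    """).show()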

In other words, Spark not only provides the kinds of finished-product solutions that Early Majority adopters are looking for; it enables them to build those solutions for themselves.

Across the Chasm

Getting across the chasm requires putting big data to work across the organization and truly tapping its potential to add value to, and often completely transform, the business. Companies that do Big Data Discovery powered by Spark are able to successfully integrate Hadoop into how they do business. You can learn more about Spark-enabled Big Data Discovery, and how Platfora is making it a reality, here.