Advanced Analytics Powered by Spark

Advanced Analytics with Spark and Platfora

There’s still time to download your complimentary chapter of Advanced Analytics with Spark, an O’Reilly guide to performing large-scale data analysis using Apache Spark.

Written by four data scientists from Cloudera ‒ each a leading authority on meeting data science challenges within a big data environment ‒ this book provides step-by-step instructions for implementing an advanced analytics environment using Spark, statistical methods, and real-world data sets. As an early adopter and supporter of the Spark framework, Platfora is excited to provide this valuable resource to the data analyst and data science communities.

Spark’s emergence as a processing framework for big data is a game-changer for many businesses. Although MapReduce is a core big data technology (one that isn’t going anywhere), it doesn’t address every situation. It is becoming increasingly clear that businesses that rely on Hadoop need a variety of infrastructures to best meet their analytics needs. In addition to data preparation and descriptive and visual analytics, they need predictive analysis and other advanced capabilities such as machine learning and graph processing.

Not long ago, we learned that while 80% of our customers reported having data scientists on staff, only 20% reported that they are currently deploying advanced analytics across the enterprise. Powered by Spark, Platfora enables businesses to make the best use of both their data analyst and data scientist resources. Now they have a solution that meets them where they are, allowing them to leverage the skillsets and other resources they already have. This truly represents the future of big data analytics.

A Spark-enabled analytics environment provides three key capabilities for advanced analytics:

  • Data Preparation At Scale
    One reason many organizations have not yet implemented advanced analytics is that their data scientists are too busy working on cleansing and preparing data for themselves and others. Spark enables fully integrated, simplified data preparation at scale ‒ freeing data scientists to prioritize their work.
  • Iterative Analysis
    The iterative nature of data science is often hampered by workflow and performance issues. Platfora provides integrated access to advanced analytics ‒ machine learning, predictive analytics, and graph processing ‒ by enabling data scientists to use Spark directly within the Platfora workflow.
  • Reaching Production
    The data science lab and the data analyst workbench are very different environments. Optimizing an analytics solution for one or the other can severely limit an organization’s analytics options. Platfora and Spark combine to provide the sophisticated analytics capabilities that data scientists need with the simplified workflow that data analysts need.

There has never been a better time to incorporate advanced analytics capability into your analytics stack. Chapter 2 of Advanced Analytics with Spark is entitled “Introduction to Data Analysis with Scala and Spark”; this background will give you a head start in implementing your own advanced analytics initiative using Platfora and Spark.

Download your free copy today.