The Platfora blog.

The End of the Data Warehouse

Today is a major milestone on the Platfora journey. But it is more than that. Today we reach out beyond our early beta customers and share what we know is possible.

We’ve been living in the dark ages of data management. We’ve been conditioned to believe that it is right and proper to spend a year or more architecting and implementing a data warehouse and business intelligence solution. That you need teams of consultants and IT people to make sense of data. We are living in the status quo of practices developed 30 years ago — practices that are the lifeblood of companies like Oracle, IBM and Teradata.

When I ran product at Greenplum, we understood this reality. Working with brilliant folks like Joe Hellerstein (UC Berkeley) and Brian Dolan (then at Fox Interactive), the team developed practices to navigate around the outmoded approaches of the past. Joe coined the name ‘MAD Skills’ (Magnetic, Agile and Deep).

But we could only distort reality so far. At the end of the day it was still a big relational database. When the rubber met the road, DBAs were doing what they always do — designing data models, building ETL jobs, and tuning indexes and aggregates.

The insight for Platfora came a number of months after leaving Greenplum (post EMC acquisition). I’d been spending a lot of time thinking about Hadoop and why it was gaining so much momentum. Clearly it was cost-effective and scalable, and was intimately linked in people’s minds to companies like Google, Yahoo and Facebook. But there was more to it. Everywhere I looked, companies were generating more and more data: interactions, logs, views, purchases, clicks, etc. These were being linked with increasing numbers of new and interesting datasets: location data, purchased user demographics, Twitter sentiment, etc. The questions that these swirling datasets would one day support couldn’t be known yet. And yet to build a data warehouse I’d be expected to perfectly predict what data would be important and how I’d want to question it, years in advance, or spend months rearchitecting every time I was wrong. This is actually considered ‘best practice’.

The brilliance of what Hadoop does differently is that it doesn’t ask for any of these decisions up front. You can land raw data, in any format and at any size, in Hadoop with virtually no friction. You don’t have to think twice about how you are going to use the data when you write it. No more throwing away data because of cost, friction or politics.

Which brings us to the insight.

In the view of the status-quo players, Hadoop is just another data source. It is a dumping ground, and from there you can pull chunks into their carefully architected data warehouses – their ‘system of record’. They’ll even provide you a ‘connector’ to make the medicine go down sweet. Sure, you are back in the land of consultants and 12-18 month IT projects, but you can rest easy because you know the ‘important’ data is safely being pumped into your multi-million dollar database box. Just don’t change your mind about what data or questions are important.

But let’s go through the looking glass. The database isn’t the ‘system of record’; it is just a shadow of the data in Hadoop. In fact there is nothing more authentic than all of that raw data sitting in Hadoop. With just a bit of metadata to describe the data, it’d be possible to materialize any ‘data warehouse’ from that data in a completely automated way. These ephemeral ‘data warehouses’ could be built, maintained, and disposed of with a click of a button.

Imagine what is possible. Raw data of any kind or type lands in Hadoop with no friction. Everyday business users can interactively explore, visualize and analyze any of that data immediately, with no waiting for an IT project. One question can lead to the next and take them anywhere through the data. And the connective tissue that makes this possible, bridging between lumbering batch-processing Hadoop and this interactive experience, is ‘software defined’ scale-out in-memory data marts that automatically evolve with users’ questions and interests. Enter… Platfora.

Through the looking glass, there is no need for a traditional data warehouse. It is an inflexible, expensive relic of a bygone age. It is time to leave the dark ages.