Wednesday, May 30, 2012

Big Data ... say what?

Everyone is talking about Big Data, which probably has 10 different definitions depending on who you ask, yet hardly anyone is blogging about it.  Seems odd.

My goal is to take an open and honest look at what Big Data means and what it can offer, with a focus on who is doing it right.  My gut tells me that the promise of Big Data is more transformative than incremental, and that very few of the solutions and use cases one sees peddled lean in that direction.

Let's start with a few of the buzzwords frequently thrown around with Big Data:

  • data warehousing
  • business intelligence
  • KPI
  • structured/unstructured data (NoSQL)
  • data visualization 
  • machine learning
I've gone from least to most interesting here.  The reason I say that is that even though the era of Big Data is impacting, and will continue to impact, the DW and BI worlds, those are pretty limited and very costly things to do.  You are generally only going to see incremental gains ... and costly ones at that.

That doesn't mean that DWs aren't going to need to reorganize in an age where we don't really ever have to throw away data, or that BI tools aren't going to need the ability to access Hadoop clusters for dealing with larger data sets.  They are.

But there are fundamental problems with traditional BI: it is resource intensive and slow as a feedback loop.

Let's say we want to do a typical BI project.  We convene a group of product, technical, and BI folks to talk over an idea.  Say we want to increase the number of connections someone on a network has.  The product folks tell us that increasing connections will increase all things good (usage, revenue, etc.) ... basically improve all of our KPIs.

We then ask the BI/tech folks how we might do this, or feed them some ideas to investigate.  They come back with ideas that are fed into models.  These models are trained and tested and, if they seem good, launched.  We unleash them on the world and see what happens.  Wash, rinse, repeat.

Let's be charitable and say each of these iterations takes months.  Now, let's look at what LinkedIn did.

They exposed the data they had to end users and let them do the work.  It's called "People You May Know".  FB does something similar.  More people need to be working on gathering and exposing data to end users, who can do the required BI way better than any crack team of technologists.
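To make the idea concrete, here is a minimal sketch of what "exposing the data" could look like.  To be clear, this is not LinkedIn's or Facebook's actual algorithm, which I have no inside knowledge of.  It just assumes a toy network stored as a dictionary of sets and ranks people by how many connections they share with you.  The function name, the graph and the people in it are all made up for illustration.

```python
from collections import Counter


def suggest_connections(graph, user, top_n=3):
    """Rank people `user` is not yet connected to by shared connections.

    Hypothetical helper: `graph` maps each person to the set of people
    they are already connected to.
    """
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared connections first; ties broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up toy network, purely for illustration.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}

suggestions = suggest_connections(graph, "ann")
print(suggestions)
```

The point is that the code only surfaces candidates.  The end user looks at the list and decides who they actually know, which is exactly the part no model does as well.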

In the end, it is less about the amount of data we have access to than getting that little nugget of information to the right person at the right time.