I've always claimed that I have more complexity issues, tying together large data sets, than actual Big Data problems. I run a smallish 8-node, ~50GB EMR cluster on Amazon, processing a few hours daily. My issues are generally around throughput: getting data in and out of the cluster in a reasonable time (say, under 2 hours).
Then I watched the BigQuery guys from Google do a full table scan on a 500TB table in a minute. Ouch.
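For a sense of what that demo looks like from the client side, here's a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical stand-ins, not whatever Google actually scanned on stage.

```python
# Minimal sketch: a full-table-scan aggregate in BigQuery from Python.
# Assumes the google-cloud-bigquery library is installed and credentials
# are configured; all names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

# No WHERE clause, so BigQuery scans the entire table.
# It bills by bytes scanned, so a 500TB table is not a cheap query.
sql = """
    SELECT COUNT(*) AS total_rows,
           COUNT(DISTINCT user_id) AS distinct_users
    FROM `my-project.my_dataset.events`
"""

job = client.query(sql)        # kicks off the query job
for row in job.result():       # blocks until the scan completes
    print(row["total_rows"], row["distinct_users"])
```

The point of the demo was that a query like this comes back in about a minute, with no cluster for you to size, provision, or tune.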
In session after session, numbers in the hundreds of terabytes or even petabytes were being geek-dropped, dwarfing anything I've seen in our local Chicago Hadoop/Big Data/Machine Learning groups. There were interesting discussions just about moving that volume of information around.
What I liked most was the interest in extracting value (information) from data, not necessarily in creating the biggest pile. I think the R folks (from Revolution Analytics) were spot on here. I need to spend some time there.
It was also nice to see the lack of BI tool vendors pitching how their wares could "integrate" with a Hadoop cluster in some lame-ass way (JDBC, anyone???). That sort of thing sometimes infects Big Data gatherings and is insufferable. So happy to avoid it.
A friend in sales recently asked ...