Monday, January 7, 2013

Why I switched to MapR and love EMR ...

A few months back I was approached by a colleague interested in using Hadoop/Hive to power our Tableau clients.  We played around for a bit and learned, the hard way, that the only way we were going to get this working easily was by switching to a MapR distribution.

Fortunately, EMR (Elastic MapReduce) makes this a snap.

./elastic-mapreduce --create --alive --hive-interactive --name "Hive Dev Flow" --instance-type m1.large --num-instances 8 --hadoop-version 1.0.3 --hive-versions 0.8.1.6 --ami-version 2.3.0 --with-supported-products mapr-m3

Literally, you just need to add the parameter (--with-supported-products mapr-m3).

Next you need to open up port 8453 (and possibly some others) on your Hadoop master at EC2 (in Security Groups), and then run the MapR ODBC connector tool.  Now you are ready to rock (with any ODBC Hadoop client).  One caveat is that you can't run Ganglia for stats, but as I noted in the EMR forum, you can use the MapR Control Center to the same effect.
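If the ODBC client can't connect, it helps to first confirm that the security group change actually took effect. Here is a rough sketch of a reachability check from the client machine. The port_open helper and the MASTER_HOST variable are names I made up for illustration (set MASTER_HOST to your master's public DNS name), and it assumes bash with /dev/tcp support plus the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EMR or MapR): returns 0 if a TCP
# connection to host:port succeeds within 5 seconds, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection on redirect;
  # timeout caps the wait when a firewall silently drops the packets.
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# MASTER_HOST is a placeholder; nothing runs unless you set it.
if [ -n "${MASTER_HOST:-}" ]; then
  if port_open "$MASTER_HOST" 8453; then
    echo "port 8453 reachable on $MASTER_HOST"
  else
    echo "port 8453 NOT reachable on $MASTER_HOST (check Security Groups)"
  fi
fi
```

Run it as MASTER_HOST=<your master's public DNS> bash check_port.sh (the script name is arbitrary). A failure only tells you the port is unreachable from where you ran it. It does not tell you whether the security group or something else is to blame.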

Now I have no idea if I will stick with MapR, but the ease of switching between Hadoop distributions, AMIs, versions, etc. with EMR is pretty slick.  It did take some trial and error to figure out a working Hadoop/Hive/AMI version combo ... but nothing too crazy.

I know things like Apache Whirr are making strides, but for now I am very happy with EMR.

Peace,
Tom