Wednesday, April 2, 2014

a workable Neo4J config @ AWS

A recent problem set has just been screaming for a graph db (large number of nodes, massive number of edges, shortest path traversal).

After a bit of research we decided it was best to go with Neo4J over the graph engines that run on Hadoop (Giraph, etc.), at least to get started. Here was our rationale:
  • No one on our team has any graph db experience.
  • The Neo4J community is way more active than its Hadoop-based counterparts.
  • We already run a pretty heterogeneous stack (RStudio/Server, Hive, Impala, Python).
  • The ability to show the graph off visually is somewhat important.
  • Our graph db won't see frequent updates, and usage will be minimal.
  • Hydrating the graph from our MySQL db via Python was pretty trivial.
  • Neo4J offers very easy REST API access.
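
To give a feel for the REST access mentioned in the last bullet, here is a minimal sketch rather than our actual code. It assumes a Neo4J 2.x server on the default port 7474 and its transactional Cypher endpoint (/db/data/transaction/commit). The helper names build_cypher_request and run_cypher are made up for illustration, and only the Python standard library is used.

```python
import json
import urllib.request


def build_cypher_request(base_url, statement, parameters=None):
    """Build (but do not send) a POST request carrying one Cypher statement.

    Assumption: the server exposes the Neo4J 2.x transactional endpoint.
    """
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def run_cypher(base_url, statement, parameters=None):
    """Send the statement and return the decoded JSON response."""
    request = build_cypher_request(base_url, statement, parameters)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running server, something like run_cypher("http://localhost:7474", "MATCH (n) RETURN count(n)") should come back with a JSON document holding "results" and "errors" keys.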

So I went about getting Neo4J running at AWS. Here is a blow-by-blow of how I got it done.

Don't follow the main Neo4J EC2 instructions (the ones that use CloudFormation); they were hosed when I tried them and could not locate the AMI. Go manual, you will thank me.


Notes / extra steps:
  • Neo4J will be installed in /var/lib/neo4j/
  • The Neo4J start script is /etc/init.d/neo4j-service (stop/start/restart) ... this is incorrectly noted in the docs
  • Raise the open file limit for the neo4j user. Edit /etc/security/limits.conf and add these two lines:
neo4j   soft    nofile  40000
neo4j   hard    nofile  40000
  • Neo4J is only accessible locally by default. To reach it from other machines, edit /var/lib/neo4j/conf/neo4j-server.properties and uncomment this line:
org.neo4j.server.webserver.address=0.0.0.0
  • Make sure the limits set above actually get applied. Edit /etc/pam.d/su and uncomment or add the following line:
session    required   pam_limits.so
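
The file edits above are easy to do by hand, but here is a rough Python sketch of the same changes in case you want to script them. It is only an illustration: the helper names uncomment_setting and ensure_lines are made up, the functions work on text you have already read from the files, and writing the result back (as root) is left to you.

```python
import re


def uncomment_setting(text, key):
    """Strip the leading '#' from any commented-out line that sets `key`."""
    pattern = re.compile(r"^\s*#\s*(" + re.escape(key) + r"\s*=.*)$")
    lines = []
    for line in text.splitlines():
        match = pattern.match(line)
        lines.append(match.group(1) if match else line)
    trailing = "\n" if text.endswith("\n") else ""
    return "\n".join(lines) + trailing


def ensure_lines(text, required):
    """Append each required line unless an active (uncommented) copy exists.

    Lines are compared after collapsing runs of whitespace, so column
    alignment differences do not cause duplicates.
    """
    def normalize(line):
        return " ".join(line.split())

    active = {
        normalize(line)
        for line in text.splitlines()
        if not line.lstrip().startswith("#")
    }
    missing = [line for line in required if normalize(line) not in active]
    if not missing:
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(missing) + "\n"


# The exact lines from the notes above.
LIMITS = ["neo4j   soft    nofile  40000", "neo4j   hard    nofile  40000"]
PAM_LINE = ["session    required   pam_limits.so"]
WEB_KEY = "org.neo4j.server.webserver.address"
```

For example, read /etc/security/limits.conf into a string, pass it through ensure_lines(text, LIMITS), and write it back. Both helpers leave the text unchanged when the edit is already in place, so running them twice is harmless.
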
That should get you up and running, and accessible. Here is a picture of a graph we created based on Wikipedia categories and pages centered on "Machine Learning". We are just getting our feet wet, but love what we are seeing!