Single Node CentOS Hadoop

Starting from a base install of CentOS 6.3, you’re only a few commands away from having a single-node Hadoop cluster up and running.

* For the filesystem type, ext4 seems to be the winner, so we suggest formatting your data disks with it (a quick sketch follows below). This article gives benchmarks.
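If you do carve out a dedicated data disk, a minimal sketch looks like this; the device name and mount point are hypothetical, so adjust them to your hardware:

# /dev/sdb1 and /data/hadoop are made-up examples -- substitute your own device and mount point
mkfs.ext4 /dev/sdb1
mkdir -p /data/hadoop
# noatime skips access-time updates, a common tweak for Hadoop data directories
mount -o noatime /dev/sdb1 /data/hadoop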
Install Java:
The Java build from the yum repos does just fine, so let’s install that. If you want to use another version for whatever reason, make sure /usr/java/default is symlinked to wherever you extract it.

yum -y install java
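If you do go with a manually extracted JDK instead, the symlink it needs would look something like this; the /opt path is purely a hypothetical example:

# /opt/jdk1.6.0_45 is a made-up path -- point this at wherever you actually extracted the JDK
mkdir -p /usr/java
ln -sf /opt/jdk1.6.0_45 /usr/java/default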

Install the Hadoop RPM:
We download the Hadoop 1.0.3 RPM from apache.org and install it, then delete the RPM file to clean up.

wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3-1.i386.rpm && rpm -ivh hadoop-1.0.3-1.i386.rpm && rm -f hadoop-1.0.3-1.i386.rpm
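A quick way to confirm the package actually landed before going further:

# verify the installed package and peek at where its files went
rpm -q hadoop
rpm -ql hadoop | head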

Make the /usr/java directory and symlink the Java install to /usr/java/default:

mkdir /usr/java && ln -sf /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre /usr/java/default
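It’s worth verifying the symlink resolves and that the hadoop CLI now runs, since the RPM’s default config expects Java at /usr/java/default:

# the symlink should point at the JRE, and both commands should print version info
ls -l /usr/java/default
/usr/java/default/bin/java -version
hadoop version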

Run the single-node setup script:

hadoop-setup-single-node.sh

Answer “y” to everything, like this:

[root@hadoop1 ssl]# hadoop-setup-single-node.sh
 
Welcome to Hadoop single node setup wizard
 
Would you like to use default single node configuration? (y/n) y
Would you like to format name node? (y/n) y
Would you like to setup default directory structure? (y/n) y
Would you like to start up Hadoop? (y/n) y
Would you like to start up Hadoop on reboot? (y/n) y
 
Review your choices:
Setup single node configuration    : y
Format namenode                    : y
Setup default file system structure: y
Start up Hadoop                    : y
Start up Hadoop on reboot          : y
Proceed with setup? (y/n) y


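Before digging into the one known snag, you can sanity-check that HDFS came up; dfsadmin -report is a stock Hadoop 1.x command and should report one live datanode:

# run as the hdfs user; look for "Datanodes available: 1" in the output
su - hdfs -c 'hadoop dfsadmin -report'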
After this completes
Everything will be configured and started, except for one small permissions issue with mapred that keeps the JobTracker from starting properly.

You will see this in the logfile: /var/log/hadoop/mapred/hadoop-mapred-jobtracker-hadoop1.example.motorrobot.net.log

2012-08-02 21:57:55,610 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir(hdfs://localhost:8020/mapred/mapredsystem) because of permissions.
2012-08-02 21:57:55,610 WARN org.apache.hadoop.mapred.JobTracker: Manually delete the mapred.system.dir(hdfs://localhost:8020/mapred/mapredsystem) and then start the JobTracker.
2012-08-02 21:57:55,611 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="":hdfs:supergroup:rwxr-xr-x

Creating the mapred directory in HDFS and then changing its ownership to the mapred user solves the problem.

Fix the permissions and restart the JobTracker:

su - hdfs
hadoop fs -mkdir /mapred
hadoop fs -chown mapred /mapred
exit
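Then restart the JobTracker. The exact init script name below is an assumption based on the Apache RPM layout, so check /etc/init.d if yours differs:

# confirm /mapred is now owned by the mapred user
su - hdfs -c 'hadoop fs -ls /'
# restart the jobtracker; script name assumes the Apache RPM -- check /etc/init.d on your box
/etc/init.d/hadoop-jobtracker restart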

Validate our new setup

hadoop-validate-setup.sh --user=hdfs

If all goes well, things should look like this:

12/08/02 19:48:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1881120768
12/08/02 19:48:56 INFO mapred.JobClient:     Map output records=2
teragen, terasort, teravalidate passed.
Cleaning the data created by tests: validate_deploy_1343951216
su -c '/usr/libexec/../bin/hadoop --config /usr/libexec/../etc/hadoop dfs -rmr -skipTrash validate_deploy_1343951216' hdfs

If it doesn’t, hopefully the error is fairly obvious; if not, drop a comment.
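For one more smoke test beyond the bundled script, any of the example jobs will do; the jar path below assumes the Apache RPM layout, so locate yours first if it differs:

# estimate pi with 2 maps and 100 samples each
# the jar path is an assumption -- find yours with: rpm -ql hadoop | grep examples
su - hdfs -c 'hadoop jar /usr/share/hadoop/hadoop-examples-1.0.3.jar pi 2 100'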
