8 December 2010 / HADOOP

Installing Hadoop Cluster

I have four VM machines in dev and I want to configure my own hadoop cluster to use as a tool and analysis.

I’m going to follow the general process out lined by hadoop’s instructions and yahoo help, here and here.

This is what the final setup will look like

HadoopCluster-Dev

Prework

I found that hadoop has default ports that need to be opened between servers before it will work.

# Add local ssh support on each machine
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

# Configure master to talk to each slave.
a.brlamore@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub a.brlamore@slave

# Configure each slave to talk to the master.
a.brlamore@slave:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub a.brlamore@master

Important Directories

Directory	Description	Suggested location
HADOOP_LOG_DIR	Output location for log files from daemons	/var/log/hadoop
hadoop.tmp.dir	A base for other temporary directories	/tmp/hadoop
dfs.name.dir	Where the NameNode metadata should be stored	/home/hadoop/dfs/name
dfs.data.dir	Where DataNodes store their blocks	/home/hadoop/dfs/data
mapred.system.dir	The in-HDFS path to shared MapReduce system files	/hadoop/mapred/system

Configuration

Hadoop Home

HADOOP_HOME is set to /u01/accts/a.brlamore/tmp/hadoop-0.21.0
I’ve put in a request for root access so I can change this to /opt/hadoop

Edit Slaves file

vi /conf/slaves

ali-graph002.devapollogrp.edu
ali-graph003.devapollogrp.edu
ali-graph004.devapollogrp.edu

Site Configuration

Set the JAVA_HOME in conf/hadoop-env.sh to export JAVA_HOME=/usr/java/default
Set values in conf/core-site.xml

<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/u01/accts/a.brlamore/tmp/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories. Default location /tmp/hadoop-${user.name}. Suggested Location /tmp/hadoop</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://ali-graph001.devapollogrp.edu:8020</value>
    <description>The name of the default file system. This specifies the NameNode</description>
</property>
</configuration>

Set values in conf/hdfs-site.xml

<configuration>
<!--
<property>
    <name>dfs.name.dir</name>
    <value>/u01/accts/a.brlamore/tmp/path/to/namenode/namespace/</value>
<description>Where the NameNode metadata should be stored. Default location is ${hadoop.tmp.dir}/dfs/name. Suggested location /home/hadoop/dfs/name</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/u01/accts/a.brlamore/tmp/path/to/datanode/namespace/</value>
    <description>Where DataNodes store their blocks. Default location ${hadoop.tmp.dir}/dfs/data. Suggested location /home/hadoop/dfs/data</description>
</property>
-->
</configuration>

Set values in conf/mapred-site.xml

<configuration>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>ali-graph001.devapollogrp.edu:8021</value>
    <description>Host or IP and port of JobTracker</description>
</property>
</configuration>

Hadoop Startup

# Format the filesystem
bin/hadoop namenode -format

# Start the HDFS on the NameNode
bin/start-dfs.sh

# Start Map-Reduce on the TrackerNode
bin/start-mapred.sh

Installing Hadoop Cluster

This is what the final setup will look like

Prework

Important Directories

Configuration

Week 49 Objective

Finding Bad Code